Producing a chi-squared distribution by Monte Carlo

In summary, the chi-squared test is used to determine whether two categorical variables are independent. In this thread the test gives a p-value of 0.0184, and the question is how to verify that value with a Monte Carlo method. This can be done by simulating other data sets under the assumption of the null hypothesis and comparing the chi-square statistic of each simulated data set with the value obtained from the actual data. It is important to define the null hypothesis precisely and to fill the cells of the data table accordingly. The expected frequency for each cell must also be known in order to calculate the chi-square statistic.
  • #1
Whenry
Hi all,

I am using a chi-squared test to test for independence of two sets of categorical data.

So let's say I have a vector x1 of 1s and 0s and x2 of 1s and 0s, and I am testing to see if x1 and x2 are independent.

and let's say, for my given data, with n = 200, I have

        x1=1   x1=0
x2=1     40     80
x2=0     40     40


For this particular distribution, I get a p value of 0.0184.
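
As a check on the quoted figure, the p-value can be recomputed directly from the table. This is a minimal sketch, assuming the Pearson chi-square statistic without a continuity correction and 1 degree of freedom (a 2x2 table). The function names are illustrative only, and the closed form `erfc(sqrt(s / 2))` for the tail probability holds only for 1 degree of freedom.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence, from the table's own margins.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_df1(stat):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Rows are x2 = 1, 0 and columns are x1 = 1, 0, as in the table above.
observed = [[40, 80], [40, 40]]
stat = chi_square_2x2(observed)
p = p_value_df1(stat)
print(round(stat, 4), round(p, 4))
```

Under those assumptions the p-value rounds to 0.0184, in agreement with the value quoted here.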

How can I 'verify' this using a Monte Carlo method?

I tried two ways so far.

First I calculated, from above, P(x2=1|x1=1) = 40/80 = 0.5

I then randomly generated 10000 tables like the one above with p(x1=1) = 0.4 and p(x2=1) = 0.6.
I then counted the number of tables which had P(x2=1|x1=1) > 0.5.

This didn't work... and I realized I am not checking for the correct thing. But I am using the chi-squared test in the first place to see if the conditional probability is 'significant', so this should tell me something?

I tried another way in which I generated 10000 above tables, just as before.

The average of these tables is

48 72
32 48

so I looked for all the tables with

<40 >80
>40 <40


Now, one more related question: if I find that I can reject the null hypothesis that x1 and x2 are independent, what do I use to measure the accuracy of the calculated conditional probability?

For example if I have x1 = [ zeros(1,998), 1,1] and x2 = [ zeros(1,998), 1,1] .

Then I find that I can reject the null hypothesis, but with what certainty can I say p(x1|x2) = 1?
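
One common way to put a number on this is a confidence interval for the conditional probability. In the example, x2 = 1 occurs in only two cases, so P(x1=1|x2=1) = 1 is estimated from two observations. The sketch below assumes an exact (Clopper-Pearson) two-sided interval, which is a technique brought in here for illustration and not something proposed in the thread; the function name is hypothetical. When all n trials are successes the lower bound has the closed form (alpha/2)^(1/n) and the upper bound is 1.

```python
def lower_bound_all_successes(n, confidence=0.95):
    """Exact (Clopper-Pearson) two-sided lower confidence bound for a proportion
    when all n observed trials were successes; the upper bound is 1."""
    alpha = 1 - confidence
    return (alpha / 2) ** (1 / n)

# Only two cases have x2 = 1, and both of them have x1 = 1.
print(lower_bound_all_successes(2))
```

With two observations the lower bound is roughly 0.16, so the data are compatible with a conditional probability far below 1 even though independence is rejected.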

Will
 
  • #2
Whenry said:
For this particular distribution, I get a p value of 0.0184.

How can I 'verify' this using a Monte Carlo method?

This is a good question. I find that writing a simulation (or just thinking about how to do it) clarifies problems in probability.

The simplistic answer is:
1) Compute the value [itex] \chi^2_{data} [/itex] of the chi-square statistic on the test data you have.
2) Simulate the generation of other test data under the assumption of the "null hypothesis" many times, and compute the value of the chi-square statistic on each of these simulated data sets.
3) Compute what fraction of the chi-square statistics of the simulated data sets are greater than or equal to [itex] \chi^2_{data} [/itex]. See if that is approximately 0.0184.
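
The three steps above can be sketched as follows. This is only one possible reading of the null hypothesis: it assumes x1 and x2 are independent with marginal probabilities equal to the observed proportions, which is an assumption made here for illustration. All function names are hypothetical, and each simulated table gets its expected counts from its own row and column totals.

```python
import random

def chi_square(table):
    """Pearson chi-square statistic for a 2x2 table, with expected counts
    taken from the table's own row and column totals."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected > 0:  # skip degenerate cells from an empty row or column
                stat += (table[i][j] - expected) ** 2 / expected
    return stat

def simulate_table(n, p_x1, p_x2, rng):
    """Draw n independent (x1, x2) pairs; rows are x2 = 1, 0, columns are x1 = 1, 0."""
    table = [[0, 0], [0, 0]]
    for _ in range(n):
        x1 = rng.random() < p_x1
        x2 = rng.random() < p_x2
        table[0 if x2 else 1][0 if x1 else 1] += 1
    return table

def monte_carlo_p_value(observed, n_sims=10_000, seed=0):
    rng = random.Random(seed)
    n = sum(map(sum, observed))
    # Assumption: the null's marginal probabilities equal the observed proportions.
    p_x1 = (observed[0][0] + observed[1][0]) / n  # P(x1 = 1)
    p_x2 = (observed[0][0] + observed[0][1]) / n  # P(x2 = 1)
    chi_data = chi_square(observed)               # step 1
    hits = 0
    for _ in range(n_sims):                       # step 2
        simulated = simulate_table(n, p_x1, p_x2, rng)
        # Small tolerance so floating-point ties count as "greater than or equal".
        if chi_square(simulated) >= chi_data - 1e-9:
            hits += 1
    return hits / n_sims                          # step 3

print(monte_carlo_p_value([[40, 80], [40, 40]]))
```

The printed fraction is a Monte Carlo estimate, so it varies with the seed and the number of simulations, and it should only be expected to land near 0.0184, not exactly on it.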


The difficulty is determining what it means to simulate the test data under the assumption of the null hypothesis. Your examples involve assumptions about conditional probability. I don't know exactly how you implemented these assumptions.
Nor do I profess to remember exactly how people who do chi-squared tests state the null hypothesis if they are pressed to state it precisely, so all I can tell you is my guess about this.

For the data table:
           X1 = 1   X1 = 0
X2 = 1       40       80
X2 = 0       40       40

I think the null hypothesis assumes that X1 will be 1 in 80 cases, that X1 will be 0 in 120 cases, that X2 will be 1 in 120 cases and that X2 will be 0 in 80 cases. Subject to those constraints, we randomly assign the cases to the 4 cells in the data table.

This is different from assuming probabilities such as P(X1=1). If we said P(X1=1) = 80/200, then in simulating 200 cases we might not get exactly 80 cases where X1 = 1.

In expositions of the chi-square test, one often reads that the null hypothesis is merely about the independence of two events. If that were true, then all it would tell us in this problem is things like P(X1=1 and X2=1) = P(X1=1)P(X2=1), and it wouldn't tell us the values of these probabilities. I think the proper statement of the null hypothesis is that we assume that the "marginal frequencies" of the data table come out as observed. For example, we assume X1 will be 1 in 80 cases. This is a much more restrictive assumption than assuming that the marginal frequencies are equal to the true probabilities of events such as X1=1.

So, how shall we fill the cells of the table "randomly" but still make it come out so the marginal totals are fixed?
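
One way to do this, offered as a sketch and not as the method the thread settles on, is to hold the x1 values fixed, shuffle the x2 values against them, and cross-tabulate. Every shuffle keeps 80 cases with X1 = 1 and 120 cases with X2 = 1, so the marginal totals never change. The helper name `shuffled_table` is hypothetical.

```python
import random

def shuffled_table(x1, x2, rng):
    """Shuffle x2 against x1 and cross-tabulate; both marginal totals are preserved."""
    x2 = x2[:]                # copy so the caller's list is left alone
    rng.shuffle(x2)
    table = [[0, 0], [0, 0]]  # rows: X2 = 1, 0; columns: X1 = 1, 0
    for a, b in zip(x1, x2):
        table[0 if b else 1][0 if a else 1] += 1
    return table

rng = random.Random(0)
x1 = [1] * 80 + [0] * 120     # X1 = 1 in 80 cases, 0 in 120
x2 = [1] * 120 + [0] * 80     # X2 = 1 in 120 cases, 0 in 80

expected_11 = 120 * 80 / 200          # expected count in the (X2=1, X1=1) cell
observed_dev = abs(40 - expected_11)  # deviation of the observed table in that cell

n_sims = 10_000
hits = 0
for _ in range(n_sims):
    t = shuffled_table(x1, x2, rng)
    # With fixed margins all four cells deviate from expectation by the same
    # amount, so ranking tables by chi-square equals ranking by |deviation|.
    if abs(t[0][0] - expected_11) >= observed_dev:
        hits += 1
fraction = hits / n_sims
print(fraction)
```

Because the cell counts are discrete and the chi-square p-value is a large-sample approximation, the fraction obtained this way need not match 0.0184 exactly.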

Also, the chi-squared statistic requires that we know the expected frequency for each cell in the table. What are the expected frequencies for each cell?
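
For reference, here is the usual answer worked out for this table, assuming the standard rule that each expected frequency is the row total times the column total divided by n. The symbols O, E and the row/column indexing are notation introduced here, with rows X2 = 1, 0 and columns X1 = 1, 0.

```latex
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n}

\begin{aligned}
E_{11} &= \frac{120 \times 80}{200} = 48, & E_{12} &= \frac{120 \times 120}{200} = 72, \\
E_{21} &= \frac{80 \times 80}{200} = 32,  & E_{22} &= \frac{80 \times 120}{200} = 48
\end{aligned}

\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
       = \frac{64}{48} + \frac{64}{72} + \frac{64}{32} + \frac{64}{48} \approx 5.56
```

These expected counts agree with the average table 48, 72, 32, 48 reported in the first post.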
 

Related to Producing a chi-squared distribution by Monte Carlo

1. What is the Monte Carlo method used for in producing a chi-squared distribution?

The Monte Carlo method is a computational technique that uses random sampling to solve mathematical problems. In producing a chi-squared distribution, it is used to generate a large number of random data sets whose chi-squared statistics together approximate the distribution.

2. How does the Monte Carlo method produce a chi-squared distribution?

A random number generator is used to create a large number of simulated data sets from a given probability model. The chi-squared statistic is then calculated for each data set, and for large enough samples the collection of these statistics approximately follows a chi-squared distribution.

3. Why is the Monte Carlo method a useful tool for producing a chi-squared distribution?

The Monte Carlo method is useful here because it allows the calculation of statistical measures that would be difficult or impossible to obtain analytically. It also allows a large number of samples to be produced, which improves the accuracy of the estimates.

4. What are the advantages of using the Monte Carlo method in producing a chi-squared distribution?

The advantages include its ability to handle complex problems, its flexibility in handling different types of distributions, and its ability to produce a large number of samples for improved accuracy. It also allows the analysis of distributions that do not have a closed-form solution.

5. Are there any limitations to using the Monte Carlo method in producing a chi-squared distribution?

While the Monte Carlo method is a powerful tool for producing a chi-squared distribution, it does have some limitations. These include the need for a large number of samples to obtain accurate estimates, the potential for bias in the results, and the computational resources required for complex problems. Additionally, the Monte Carlo method may not be the most efficient approach for simple problems with closed-form solutions.
