Does Returning Marbles After Each Draw Affect Chi-square Test Results?

JFS321 · Mar 6, 2019

Hi all, I'm a high school physics / AP biology teacher looking to expand my understanding of the Chi-square test. I planned an activity in which students randomly draw colored marbles out of a bag to see whether the results match predicted ratios (2:1, 1:1, others). I'm having them draw blindly 50 times (there are around 20-30 marbles per bag), returning the marble to the bag after each draw.

I'm wondering, though -- how does performing the test in this manner differ from simply having them dump all the marbles out and count the actual numbers and doing the Chi-square test with those exact values? If you do a Chi-sq test on flipping a coin, for example, this seems to be similar to returning the marble after each draw. They are all independent events. So, hopefully my question makes sense -- how does the Chi-square test "change" between these two scenarios? Does it?

Thanks. Please note, I am no mathematician.

andrewkirk · Mar 6, 2019

JFS321 said:

I'm wondering, though -- how does performing the test in this manner differ from simply having them dump all the marbles out and count the actual numbers and doing the Chi-square test with those exact values?

A Chi square test is designed to test a hypothesis about proportions in a population. It is only used when the entire population cannot be tested, either because it is too big, or because testing is intrusive or unpleasant. So a sample is tested and the Chi Square test tells us how consistent the sample is with the proportions we hypothesised for the population.

If our population is the marbles in the bag and we take them all out and count the proportions of each colour then we know the exact, correct proportions for the population and no Chi Square test is needed.

But if we only take out a few marbles from the bag then we use the sample with a Chi Square test to judge whether what we drew is consistent with our hypothesis about the whole bag.

The hypothesis in this case might be something like 'The bag contains equal numbers of each colour'.
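For the classroom activity described in the first post, one student's run can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 24-marble bag with a 2:1 red-to-blue ratio, the seed and the function names are all made up for the example, and the hypothesis tested is the predicted 2:1 ratio rather than equal numbers. With two colours there is one degree of freedom, and for that case the upper tail of the Chi square distribution has the closed form erfc(sqrt(x / 2)).

```python
import math
import random

def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_one_df(statistic):
    """Upper-tail probability for a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))

rng = random.Random(2019)                  # arbitrary seed, for repeatability
bag = ["red"] * 16 + ["blue"] * 8          # hypothetical bag, 2:1 ratio
n_draws = 50

# Draw a marble, note its colour, return it, repeat: sampling with replacement.
draws = rng.choices(bag, k=n_draws)

observed = [draws.count("red"), draws.count("blue")]
expected = [n_draws * 2 / 3, n_draws * 1 / 3]   # counts predicted by 2:1

stat = chi_square_statistic(observed, expected)
print(observed, round(stat, 3), round(p_value_one_df(stat), 3))
```

Changing the seed plays the role of a different student repeating the activity; the observed counts wobble around 33 and 17 even though the bag never changes.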

If instead the population were all marbles produced by the factory and sold in bags of that type then we cannot test the whole population and if we examine the entire contents of one bag then that is a sample of the population and Chi Square may be used, just as it could be if we only inspected some of the marbles from the bag.

But I'd advise caution about using that second approach, because the selection of bag is not random. It was bought from a particular shop in a particular area at a particular time. For all we know, the proportions in bags may have changed over time, or may differ between different distribution markets. Chi Square approaches always have a cloud of uncertainty over them when selection of the sample is not sufficiently random. Drawing marbles out of a bag while not looking, and having first given them a good mixing, is a pretty good random selection method where the population is just what's in the bag.

JFS321 · Mar 6, 2019

Ok, thanks a lot. I think I can make sense of that -- basically, if we did dump out all the marbles (I'm not going to do it that way, but...), those marbles would theoretically represent a random sample from the larger population. Then the Chi sq can tell us the probability of getting those results based on our random sampling efforts -- in other words, the likelihood of receiving another sample at least as extreme as that one. Any issues there?

andrewkirk · Mar 6, 2019

The p-value one gets from a Chi square test is not the probability that one would get the same results from another random sample. Rather it is the probability, assuming the Null Hypothesis is true, of getting a Chi square value at least as large as the one observed. In a situation like this, the Null Hypothesis would typically be that all colours have the same frequency in the whole population.

If there are three colours, Red, Green and Blue, and we draw out twenty marbles and see that eight are red, two are green and ten are blue, we perform a Chi square test. With an expected count of 20/3 for each colour, the Chi square value is 5.2 and there are two degrees of freedom. The p-value is about 7.4%. That tells us that the probability of getting a Chi square value that high or higher from a sample of twenty is only 7.4% if the three colours are equally frequent. People sometimes say this as 'the probability that the three colours are equally frequent is 7.4%', which is a paraphrase that will send some statisticians into fits of rage because it is not what the p-value means, but it may be OK if you are talking to school students.

Note that the test says nothing about the probability of getting another sample with the same proportions.
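The arithmetic in the red/green/blue example can be checked with a short Python sketch. The function names are illustrative only. It relies on one fact specific to two degrees of freedom: the upper tail of the Chi square distribution is exactly exp(-x / 2), so no statistics library is needed.

```python
import math

def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_two_df(statistic):
    """Upper-tail probability for a chi-square with 2 degrees of freedom.

    For exactly 2 degrees of freedom the tail is exp(-x / 2).
    """
    return math.exp(-statistic / 2)

observed = [8, 2, 10]            # red, green, blue out of twenty draws
n = sum(observed)
expected = [n / 3] * 3           # Null Hypothesis: equal frequencies

stat = chi_square_statistic(observed, expected)
p = p_value_two_df(stat)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")   # chi-square = 5.20, p = 0.074
```

The p-value is computed entirely under the assumption that the colours are equally frequent, which is why it cannot be read as the probability that they are.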

JFS321 · Mar 6, 2019

Also, let me be sure I am clear on this. There are only about 25 marbles in the bag -- by having them draw/replace 50x, am I increasing my statistical power because we are much more likely to get a ratio that is closest to the actual ratio in the bag? I think it's the "pretending" part that is getting me...sampling 25 marbles 50x is basically sampling the whole population, but we are pretending it's just a sample. Perhaps I should have done 100 marbles per bag and none of these questions would have jumped in my mind!

andrewkirk · Mar 6, 2019

JFS321 said:

There are only about 25 marbles in the bag -- by having them draw/replace 50x, am I increasing my statistical power because we are much more likely to get a ratio that is closest to the actual ratio in the bag?

Yes. But it's by virtue of the number of draws, not by virtue of the ratio of that to the number in the bag. Because each marble is returned before the next draw, the statistical power is the same regardless of whether there are 25 in the bag or 25,000, as long as the colour proportions are the same. For given proportions, it is only the number of draws and the number of different colours that matters.
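A rough simulation makes the point about bag size. The sketch below is an illustration, not a formal power calculation: the bag contents (12 red, 8 green, 5 blue), the trial count and the function names are assumptions chosen for the example, and the critical value 5.991 is the standard 5% cut-off for two degrees of freedom, so the code is only valid for three colours. It estimates how often the equal-frequency Null Hypothesis is rejected when drawing with replacement from a 25-marble bag and from a 25,000-marble bag with the same proportions.

```python
import random

CRITICAL_2DF_5PCT = 5.991   # chi-square cut-off: 2 degrees of freedom, 5% level

def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def estimate_power(bag, n_draws, n_trials, rng):
    """Fraction of simulated runs that reject 'all colours equally frequent'.

    Assumes exactly three colours, to match the hard-coded critical value.
    """
    colours = sorted(set(bag))
    expected = [n_draws / len(colours)] * len(colours)
    rejections = 0
    for _ in range(n_trials):
        draws = rng.choices(bag, k=n_draws)            # with replacement
        observed = [draws.count(c) for c in colours]
        if chi_square_statistic(observed, expected) > CRITICAL_2DF_5PCT:
            rejections += 1
    return rejections / n_trials

rng = random.Random(0)                                  # arbitrary seed
small_bag = ["R"] * 12 + ["G"] * 8 + ["B"] * 5          # hypothetical 25 marbles
large_bag = small_bag * 1000                            # 25,000, same proportions

for n_draws in (20, 50, 200):
    power_small = estimate_power(small_bag, n_draws, 2000, rng)
    power_large = estimate_power(large_bag, n_draws, 2000, rng)
    print(n_draws, power_small, power_large)
```

For each number of draws the two estimates should agree to within simulation noise, while both rise as the number of draws goes up. If marbles were not returned to the bag, the size of the bag would start to matter.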

JFS321 said:

sampling 25 marbles 50x is basically sampling the whole population,

It feels like that, but that is not the case. For all we know we picked only as many different marbles as the number of different colours we sampled. If all our samples were the same colour, it might have even been the same marble every time.
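To put a number on this for a 25-marble bag and 50 draws with replacement, here is a small Python sketch. It assumes every marble is equally likely on every draw, and the function names are made up for the illustration. Each marble is missed on a single draw with probability 24/25, so the expected number of different marbles seen is 25 * (1 - (24/25)^50).

```python
import random

def expected_distinct(bag_size, n_draws):
    """Expected number of different marbles seen when drawing with replacement."""
    return bag_size * (1 - (1 - 1 / bag_size) ** n_draws)

def simulate_distinct(bag_size, n_draws, n_trials, rng):
    """Average number of different marbles seen over many simulated runs."""
    total = 0
    for _ in range(n_trials):
        seen = {rng.randrange(bag_size) for _ in range(n_draws)}
        total += len(seen)
    return total / n_trials

rng = random.Random(0)                           # arbitrary seed
print(expected_distinct(25, 50))                 # about 21.75
print(simulate_distinct(25, 50, 10000, rng))     # should land close to the line above
```

So on average roughly three of the 25 marbles are never touched in 50 draws, and in any single run the number could be much larger.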

JFS321 · Mar 7, 2019

Thanks. All of this makes good sense.

When you said "for all we know we picked only as many different marbles as the number of different colors..." ... Do you mean if we had 100 red, 50 blue, and 10 green, we may have picked the same 3 red, blue, and green marble each sampling event?

andrewkirk · Mar 7, 2019

JFS321 said:

Thanks. All of this makes good sense.

When you said "for all we know we picked only as many different marbles as the number of different colors..." ... Do you mean if we had 100 red, 50 blue, and 10 green, we may have picked the same 3 red, blue, and green marble each sampling event?

Yes. Since we cannot tell marbles of the same colour apart, it may be that, by coincidence, every red draw was the same red marble, every blue draw the same blue one and every green draw the same green one.

JFS321 · Mar 7, 2019

Thanks for all of the help!
