# Yet another Bayesian probability question

1. Jul 16, 2016

### lavoisier

But a very simple one. Just to check I'm not getting it wrong.

Suppose you have a very large enclosure with 100 animals.
70 of these animals are cats, 30 are dogs.
There is enough food for all the animals, but you introduce a new type of food, to see whether either cats or dogs will show a preference for it.

You place the new food in the enclosure, in the form of 100 equal pieces. Each animal can take at most one piece. All animals can see and smell the food and have free access to it, undisturbed by their peers or by the other species.
After some time, you observe that 8 pieces are eaten by dogs, 2 by cats.

Can we say the following?
- assuming that past observations predict future observations, given that the next piece of food is eaten, the probability that it is a dog eating it is 8/10.
- so given that another N pieces of food are eaten, N*8/10 will most probably be eaten by dogs, and N*2/10 by cats, resulting in 4:1 odds that the additional N pieces of food are eaten by dogs rather than cats.
- by Bayes theorem, given that an animal approaching the new food is a dog, the probability that it will eat a piece is (8/10) / (30/100) * P(E), where P(E) is the overall probability that a piece of food is eaten; and similarly, given that an animal approaching the new food is a cat, the probability that it will eat a piece is (2/10) / (70/100) * P(E).
- so if one cat and one dog were presented with one piece of the new food, the odds it will be eaten by the dog rather than by the cat are (8/10) / (30/100) / [(2/10) / (70/100)] = 28:3.
- we would get 1:1 odds only if each group of animals in the 'test' had eaten a number of pieces proportional to their representation in the population, i.e. in this example, 3 pieces by dogs and 7 pieces by cats.
- or in other words, in our example dogs are disproportionately more drawn to the new food than cats are, with odds of about 9:1.

Does it make sense?

Thanks!
L

PS
Please consider that this is just a thought experiment to check my understanding of a statistical theory, no need to point out zoological or ethological contradictions.

2. Jul 16, 2016

### Orodruin

Staff Emeritus
This is not correct because your initial distribution has changed. Consider the extreme case where all dogs have eaten their food and only one cat. The probability of the next food being eaten by a dog is zero because no dogs have food left.

In general, your setup seems to involve some unstated assumptions and the result will generally depend on those.

3. Jul 16, 2016

### EnumaElish

Pr(eaten and by a dog) = P(eaten|dog) P(dog) = P(dog|eaten) P(eaten)

P(E | dog) = P(dog | E) P(E)/P(dog)

= 0.8 × 0.1/0.3 = 0.8/3

P(E | cat) = P(cat | E) P(E)/P(cat)

= 0.2 × 0.1/0.7 = 0.2/7

P(E | dog)/P(E | cat) = 0.8 × 7/(0.2 × 3) = 5.6/0.6 = 9.3333...
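The arithmetic above can be checked with exact fractions. This is only a minimal sketch: the variable names are illustrative, and the inputs are the counts from the opening post (30 dogs, 70 cats, 10 pieces eaten, 8 of them by dogs).

```python
from fractions import Fraction

# Observed data from the opening post
p_dog = Fraction(30, 100)            # P(dog): share of dogs in the enclosure
p_cat = Fraction(70, 100)            # P(cat)
p_eaten = Fraction(10, 100)          # P(E): 10 of 100 pieces eaten
p_dog_given_eaten = Fraction(8, 10)  # P(dog | E)
p_cat_given_eaten = Fraction(2, 10)  # P(cat | E)

# Bayes' theorem: P(E | species) = P(species | E) * P(E) / P(species)
p_eaten_given_dog = p_dog_given_eaten * p_eaten / p_dog
p_eaten_given_cat = p_cat_given_eaten * p_eaten / p_cat

ratio = p_eaten_given_dog / p_eaten_given_cat

print(p_eaten_given_dog)  # 4/15
print(p_eaten_given_cat)  # 1/35
print(ratio)              # 28/3
```

The ratio 28/3 is the 9.3333... above; P(E) cancels in it, which is why it also matches the 28:3 odds in the opening post.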

As I understand it, @Orodruin's point is that P(E) changes over time. You could say that does not affect the odds because the P(E)'s for cats and dogs cancel out. However, when P(E) = 0 you'd get 0/0, which is undefined.

4. Jul 16, 2016

### Orodruin

Staff Emeritus
No, this is not my point. My point is that there is a difference between having 70 cats and 30 dogs and having 68 cats and 22 dogs. Since the relative number of cats is higher in the second case, any sensible model for how the food is eaten (e.g., the time for each animal eating its food being exponentially distributed) will result in a larger probability of the next piece being eaten by a cat.
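One way to make this concrete is a small sketch of the exponential-waiting-time model, using the numbers from the opening post (30 dogs and 70 cats, of which 8 dogs and 2 cats have already eaten). The per-animal rates are hypothetical: they are set in the ratio 28:3 only so that the model starts out at the observed 8/10. When independent exponential clocks compete, the chance that the first one to ring belongs to a given group is that group's total rate divided by the overall total rate.

```python
from fractions import Fraction

def prob_next_eater_is_dog(n_dogs, n_cats, rate_dog, rate_cat):
    """P(next piece is taken by a dog) when each remaining animal waits an
    independent exponential time (with a per-animal rate) before eating."""
    dog_total = n_dogs * rate_dog
    cat_total = n_cats * rate_cat
    return Fraction(dog_total, dog_total + cat_total)

# Hypothetical per-animal eating rates in the ratio 28:3 (dog:cat)
RATE_DOG, RATE_CAT = 28, 3

before = prob_next_eater_is_dog(30, 70, RATE_DOG, RATE_CAT)  # nobody has eaten yet
after = prob_next_eater_is_dog(22, 68, RATE_DOG, RATE_CAT)   # 8 dogs and 2 cats are done

print(before)  # 4/5
print(after)   # 154/205, about 0.751
```

Under these assumptions the probability that the next piece goes to a dog drops from 0.8 to roughly 0.75 once the ten animals that have eaten are taken out, even though no animal's individual preference has changed.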

5. Jul 16, 2016

### lavoisier

Thank you both!

If I understand @Orodruin's point, it's a bit like the difference between binomial and hypergeometric, or sampling with or without replacement.
But then I'm even doubting my previous conclusion P(dog | E) = 0.8 .
Can I conclude this, not only for the test set where it's evidence, but as a prediction for the future, from the observation that 8/10 pieces were eaten by dogs? Or should I take into account the faster decreasing number of dogs compared to cats?

Maybe I could reformulate the problem as follows (not sure it makes it easier, but that's what I'm trying to figure out).

Suppose we have a very large population of animals with 70% cats and 30% dogs, and after a short 'test' on the population we observe that 8 dogs showed a preference for a new type of food, whereas only 2 cats did.
Knowing this, can we make inferences on the future behaviour of these animals?

In particular:

1 - if we make the new food available on the market, and we assume that the animals themselves (from the above large population) can choose it, after we sell 1000 pieces, what is the most likely proportion of pieces sold to cats and to dogs? Is it 80%:20% like in the test scenario, so 800:200?

2 - if we take a single animal from the population, and we confront it with the new food, what is the probability that it will choose it, if it's a dog? And if it's a cat? Is it correct to use Bayes' theorem as I showed above (numerically) and @EnumaElish formulated? And if so, would it be correct to conclude that dogs are about 9 times as likely as cats to choose the new food? And that our marketing campaign will be about 9 times as successful if we 'target' dogs rather than cats?

After this discussion I will have nightmares with cats and dogs attacking me from all sides, as a punishment for using them in thought experiments :O(

6. Jul 16, 2016

### Stephen Tashi

You haven't described a specific model for the experiment yet. Let's try this model:

The population of dogs consists of two types of dogs, D1 and D2. When a type D1 dog is offered the new food, the dog always chooses to eat it. When a type D2 dog is offered the new food, it always declines to eat it.

The population of cats consists of two types of cats, C1 and C2. When a type C1 cat is offered the new food the cat always chooses to eat it. When a type C2 cat is offered the new food, it always declines to eat it.

The experiment consists of randomly selecting 30 dogs and 70 cats and offering each animal the new food.

Assume that each choice of a dog for the experiment has a probability of p_d1 of selecting a dog of type D1.
Assume that each choice of a cat for the experiment has a probability of p_c1 of picking a cat of type C1.
Assume p_d1 = 8/30 and p_c1 = 2/70.

Assume that in the 100 offerings of the new food, there were 10 where the animal chose to eat the food.

If we pick one of those 10 offerings at random, what is the probability that we pick an offering where the animal that ate the food was a dog?
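For anyone who wants to try this numerically, here is one possible sketch. It assumes the number of D1 dogs in the sample is Binomial(30, p_d1) and the number of C1 cats is Binomial(70, p_c1), independently, and then conditions on exactly 10 offerings being eaten. The helper name `binom_pmf` and the constants are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

N_DOGS, N_CATS = 30, 70
P_D1, P_C1 = 8 / 30, 2 / 70
EATEN = 10

# Joint probability of k dog eaters and (EATEN - k) cat eaters
weights = {
    k: binom_pmf(k, N_DOGS, P_D1) * binom_pmf(EATEN - k, N_CATS, P_C1)
    for k in range(EATEN + 1)
}
total = sum(weights.values())  # P(exactly 10 offerings were eaten)

# Average fraction of the eaten offerings that belong to dogs, given 10 were eaten
p_dog = sum((k / EATEN) * w for k, w in weights.items()) / total
print(round(p_dog, 3))
```

The answer comes out close to 0.8. The calculation does not assume that exactly 8 dogs and 2 cats ate; it averages over every split of the 10 eaten offerings that the model allows.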

7. Aug 8, 2016

### lavoisier

If I understand correctly, you see the selection of the 30 dogs and 70 cats as the initial part of the experiment, implying that if we repeated the experiment with another group of 30 dogs and 70 cats, we may get a different result, because our sample may not be perfectly representative of the whole population.

This is very interesting, and I believe it applies to many similar situations where sampling is involved.
I studied a problem about elections in the past, and I found there is a correction factor ( sqrt [(N-n)/(N-1)], where N is the total number of votes to count and n is the number of votes counted so far ) that must be applied to the standard deviation of the proportion of votes. This factor is clearly close to 1 at the beginning and becomes exactly zero when you've finished counting the votes, because of course there can be no doubt about the population proportion once you've actually counted all the votes.
I wonder if and how something like this could be applied here. Would it allow me to make a statement about the uncertainty on the proportion of dogs or cats that behave in a certain way (i.e. on your p_d1 and p_c1, if I'm not mistaken), given that I only sampled 30 or 70 individuals from a much larger population?
But OK, for now let's say that I'm only interested in the proportion means, and that they are 8/30 and 2/70, respectively, as you pointed out.
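As a small sketch of how that correction factor behaves (the function name and the population size are made up for illustration):

```python
from math import sqrt

def finite_population_correction(population_size, sample_size):
    """sqrt((N - n) / (N - 1)) for a sample of n drawn without replacement from N."""
    return sqrt((population_size - sample_size) / (population_size - 1))

N = 1_000_000  # hypothetical total number of votes
for n in (1, 1_000, 500_000, 1_000_000):
    print(n, finite_population_correction(N, n))
```

The factor is 1 for a single counted vote, stays very close to 1 while n is small compared with N, and is exactly 0 once n = N.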

The second part of the experiment consisted in offering each of the 100 animals the new food and observing if they ate it or not.
From this observation we derive the above proportions.
Doesn't this actually answer my initial question, i.e. the propensity of an individual animal to eat the new food?
I would basically conclude that p_d1 is the proportion of new-food-eating dogs and p_c1 is the proportion of new-food-eating cats, giving a ratio of 8/30 / (2/70) = 28/3 as calculated before.

We already know that 10 pieces of food were eaten, 8 by dogs and 2 by cats.
So if I pick one of these 10 pieces at random, for me the probability that it was eaten by a dog is 80%, by a cat 20%, here too giving 80/20=4/1 as before.

Of course if I decided to sample the population differently, say 50 dogs and 50 cats, the number of pieces eaten per species may be different.
If we believe the proportions of category 1 animals from the first experiment represent well the whole population, then 50*8/30 = 13.33, 50*2/70 = 1.43, so I would expect about 14-15 pieces to be eaten, about 13 by dogs and about 1-2 by cats.
So about 13.33/14.76 = 90% of the pieces would be eaten by dogs and 10% by cats.
But once it is established what the percentage of dogs and cats in the overall population is, this proportion is also fixed.

Am I getting this right?

Thanks
L

8. Aug 9, 2016

### lavoisier

I just had a look at the Wikipedia page on the hypergeometric distribution, which I think is the theoretical basis of sampling in a finite population.

https://en.wikipedia.org/wiki/Hypergeometric_distribution

If I get it correctly, if we have a population of N dogs, K of which are of category 1 (they eat the new food), and we sample n dogs, we are most likely to pick:

n * K/N

dogs of category 1, with a variance:

n * (K/N) * (1 - K/N) * (N-n)/(N-1)

Of course we don't know K, so we assume the experimentally found number of dogs of category 1 is close to the above mean, thus:

n_d1_exp ≈ n*K/N

hence:

p = p_d1_exp = n_d1_exp / n ≈ K/N

and the variance is then:

var ≈ n * p * (1-p) * (N-n)/(N-1)

So if I sample 30 dogs from a very large population (--> the correcting factor is not important) and find that 8 are of category 1:

p = 8/30

var ≈ 30 * 8/30 * (1-8/30) = 5.87

If I wanted the standard deviation on n_d1_exp, is it just the square root of var? Like:

SD_n_d1_exp ≈ sqrt(5.87) = 2.42

I see from other sources that the standard deviation on p would be:

SD_p ≈ sqrt ( p * (1-p) / n ) = sqrt ( 8/30 * (1-8/30) / 30 ) = 0.08

Which seems OK, because if p = 8/30 = 26.7% with SD 8%, on a sample of 30 dogs I would expect an average 30*26.7% = 8 category 1 dogs with a SD of 30*8% = 2.4.

For cats then p_c1 = 2/70 = 0.0286, and SD_p_c1 = sqrt(2/70 * (1-2/70) / 70) = 0.02.
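The numbers above can be reproduced with a short sketch. It uses the large-population (binomial) formulas and plugs the sample proportions in for the unknown p, so the results are estimates; the function names are illustrative.

```python
from math import sqrt

def count_variance(n, p):
    """Binomial variance n*p*(1-p) of the count (large-population limit)."""
    return n * p * (1 - p)

def proportion_sd(n, p):
    """Standard deviation sqrt(p*(1-p)/n) of the sample proportion."""
    return sqrt(p * (1 - p) / n)

n_dogs, p_dogs = 30, 8 / 30
n_cats, p_cats = 70, 2 / 70

var_dogs = count_variance(n_dogs, p_dogs)
print(round(var_dogs, 2))                       # 5.87
print(round(sqrt(var_dogs), 2))                 # 2.42
print(round(proportion_sd(n_dogs, p_dogs), 4))  # 0.0807
print(round(proportion_sd(n_cats, p_cats), 4))  # 0.0199
```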

Am I doing anything wrong in this reasoning so far?

If not, the next step for me is to calculate the uncertainty on the ratio between p_d1 and p_c1.
From the relative error, quite large especially for cats, I suspect the uncertainty on the ratio could be quite large too.

There are still a few things I don't get though.
If by chance we found in our sample p_exp = 1, would we conclude that the variance is 0?
And in general, what is the meaning of a SD on a proportion like p, when p is actually between 0 and 1?
Isn't SD only applicable when the variable can take any real value?

Thanks
L

9. Aug 9, 2016

### Stephen Tashi

No, because p_d1 and p_c1 are not random variables. They are population parameters. So they don't have "uncertainty" in the sense of a standard deviation. A statistic that attempted to estimate p_d1 or p_c1 from sample data is a random variable, so that statistic would have a standard deviation.

Asking about "the propensity of an individual animal to eat the new food" is not a mathematical question unless you can state it as a question about a specific event.

A ratio is a ratio. Are you saying this ratio has some significance?

No, we don't. That wasn't given in my statement of the problem.

You're answering the question " Given that 8 pieces of food are eaten by a dog and 2 pieces are eaten by a cat, what is the probability that if we select one piece of food at random that it was eaten by a dog?" That question completely ignores any differences in how the population of dogs differs from the population of cats in the probability of eating the new food. Once you say "Given 8 pieces are eaten by a dog and 2 pieces are eaten by a cat" , it doesn't matter if 99% of the dogs like the new food or 23% of dogs like the new food.

10. Aug 10, 2016

### Stephen Tashi

No, the mean value is not the most likely value. The mode is the most likely value.

You keep using language that fails to distinguish between population parameters and the statistics that estimate them. What you call "var" is not the variance of p_d1_exp. It is an estimator of the variance of p_d1_exp. The variance of p_d1_exp is a single number. The estimator that you call "var" is a random variable because it depends on the number of dogs that liked the food, which you assume is equal to K for purposes of calculating "var".

Yes, those are estimates. But if we do things correctly, we have to ask if the formulas you are using are the "best" estimators. It often happens that a formula for a "good" estimator resembles the formula for the population parameter, but this is not always the case. (For example, you could consider the alternative of assuming the number of dogs that liked the food in the sample is equal to the mode of the hypergeometric distribution instead of the mean.) The formula for an estimator contains variables that represent something observed from the sample, so the estimator is a random variable. The formula for a population parameter contains only variables representing parameters of the population.

You could declare the square root to be an estimator of the standard deviation. It might be a good one. But rules like "There is .6827 probability that the observed value of n_d1 is within one standard deviation of the mean value of n_d1" don't necessarily apply because we aren't dealing with a normal distribution. So it isn't clear what knowing an estimate of the variance is worth.

I have no idea where you're going with these estimates. You still have not clearly stated the mathematical question (or questions) that you wish to answer.

I have no idea why you think that's the next step.

I don't know whether you intend to "calculate" a population parameter from other population parameters or estimate a population parameter from data in a sample. Presumably if p_d1/p_c1 is something that can have an "uncertainty" then p_d1/p_c1 is the ratio between two random variables.

The value of the estimator for the variance (that you are using) would be zero. What you conclude about the population variance from knowing the value of the estimator is up to you.

The population parameter "standard deviation" has a definition that expresses it as a calculation that depends on the distribution of the random variable in question. There's nothing in that definition that restricts the domain of the random variable. If you are asking about "meaning" in the sense of "Does the 68% rule apply?" then it may not. The common rules of thumb relating probability to distances in terms of standard deviations are developed from the normal distribution and don't necessarily apply to other distributions. Chebyshev's inequality is widely applicable. https://en.wikipedia.org/wiki/Chebyshev's_inequality
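To see the difference between a distribution-specific probability and the Chebyshev guarantee, here is a rough sketch. It assumes a Binomial(30, 8/30) count purely as an illustration (those values are borrowed from the dog sample) and compares the exact probability of landing within k standard deviations of the mean against the Chebyshev lower bound 1 - 1/k^2.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_within(n, p, num_sd):
    """Exact P(|X - mean| <= num_sd * sd) for X ~ Binomial(n, p)."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return sum(
        binom_pmf(k, n, p)
        for k in range(n + 1)
        if abs(k - mean) <= num_sd * sd
    )

n, p = 30, 8 / 30  # illustrative values taken from the dog sample
for num_sd in (1, 2, 3):
    exact = prob_within(n, p, num_sd)
    chebyshev_bound = max(0.0, 1 - 1 / num_sd**2)
    print(num_sd, round(exact, 4), round(chebyshev_bound, 4))
```

The Chebyshev column is only a floor that holds for any distribution with a finite variance, so the exact values sit above it; for k = 1 the bound is 0 and says nothing at all.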

11. Aug 10, 2016

### lavoisier

OK Stephen, thanks.
I think this discussion is getting a bit circular.
Just for clarity, no, I did not ask exact 'mathematical questions' because I am no mathematician; I proposed a situation that as a non-expert I didn't know how to model / interpret mathematically, and I asked you expert people to help me with that.
I did ask some questions, and with your reply you proposed an approach and asked _me_ some other questions, which I tried to answer with the disastrous result that apparently I didn't get at all what you meant.

The simplest formulation of the problem I would like to solve is the following.
Suppose we have in the world a total population of 7 000 000 cats and 3 000 000 dogs.
We want to increase our knowledge about one specific behaviour of cats and dogs (their propensity to eat a certain new food).
Of course we can't test all the animals in the population, so we take a random sample of 70 cats and 30 dogs.
We confront each animal in this sample with the food, and observe that only 2 cats and 8 dogs eat it.

Questions:
1. can we conclude that the best estimate of the total number of animals in the overall population that would eat the new food is 2/70 * 7 000 000 = 200 000 cats and 8/30 * 3 000 000 = 800 000 dogs? And if not, what would it be instead?
2. given the small size of our sample compared to the population (assuming that it matters - if not please ignore it), I imagine that our estimate of the above total numbers (200 000 and 800 000, or whatever the right numbers are) is subject to uncertainty. If so, can we estimate such uncertainty?
3. if one randomly selected animal taken from the population of all cats is presented with the new food, what is the probability that it will eat it? And if it's taken from the population of all dogs?
4. if a new group of randomly selected N cats and M dogs is presented with the new food, can we estimate how many cats and how many dogs would eat the new food?

If there is an answer to the above questions (which may also be 'there is no answer') I would like to know it or be pointed to some reference or theory addressing it, please.
But I'm OK to leave it if it takes such an enormous (and unproductive) effort from all of us.

Thanks!

12. Aug 10, 2016

### Heinera

1. Yes (given that we have no further information than what you have already told us). Obviously, the word "best" is subjective and will in general require a precise mathematical definition (which would take the discussion too far in this case), so my answer presumes the generic understanding of "best". But there are other definitions of "best".
2. Yes, the estimate is subject to uncertainty, and with some further information, the uncertainty can also be estimated.
3. Given the information we have, the subjective probability that a cat sampled by a uniformly random procedure will eat the food is 2/70, and for the dogs it's 8/30 (aka 1/35 and 4/15).
4. Yes. Multiply by the factors in answer number 3.

Last edited: Aug 10, 2016
13. Aug 10, 2016

### Stephen Tashi

It's a significant condition that each animal has the opportunity to eat or decline the food. Since each animal has the opportunity to eat the food we don't have to imagine that animals interact with other animals. We can imagine that each animal is tested in isolation from the other animals.

This can be answered mathematically if we quantify "best" and "better". An estimator E is a random variable. There are different criteria that can be used to determine if an estimator is "best". Among these criteria are:

1) Unbiased: An estimator E is defined to be "unbiased" iff its mean value is equal to the quantity that it is attempting to estimate.

2) Maximum likelihood: An estimator E is defined to be a maximum likelihood estimator iff setting the parameter it estimates equal to the realized value of E maximizes the likelihood of the sample. (There is a technical distinction between "the probability" of the sample and "the likelihood" of the sample in the case of continuous random variables. The "likelihood" of the sample is the value of the joint probability density function for the sample and the value of a density function is technically not a "probability of" something.)

3) Minimum variance: An estimator E is defined to be a minimum variance estimator if, among the set of all possible estimators for a parameter, its distribution has the smallest variance. (The requirement of "minimum variance" is usually combined with the requirement of "unbiased" since we want an estimator whose variance is computed about the correct value of the parameter that it is trying to estimate.)

The good news is that "the first estimator that comes to mind" is often the best estimator by one or more of those criteria. I'm tempted to agree that the estimates you propose are best, but there are sometimes surprises, so I'm going to research the matter first.

I think we can find a good estimator for it. (I see now why you were investigating the hypergeometric distribution.) When you search for material on estimators, it's a good idea to include the term "estimator" in the keywords. For example "estimator of the variance of ..." is a different topic than "variance of...".

Your model for the experiment is two examples of the same situation. You could have asked the above questions about a population of dogs and not mentioned anything about the population of cats. The population of cats problem is the same general problem with different constants.

There are problems where the two populations would be significant. For example, suppose we wish to estimate the answer (yes or no) to the question "Is a randomly selected dog more likely to eat the new food than a randomly selected cat?" The intuitive way to answer the question is to estimate the probability that a dog will eat the food, estimate the probability that a cat will eat the food and give a definite yes or no answer depending on which probability is greater. However, a procedure to estimate the answer (yes or no) is actually a random variable. So we can ask if a particular estimation procedure for the answer is unbiased, maximum likelihood, etc.

The questions you are asking about the "uncertainty" of the estimates 2/70 and 8/30 indicate that you understand the complexity of answering the yes-or-no question. If the estimators producing these numbers have standard deviations that are large then some of the time they can produce the wrong answer to the "who is more likely" question.

14. Aug 11, 2016

### lavoisier

Thank you both very much, it's much clearer now.
My approach is clearly too unsophisticated, a bit due to what I learned at university (statistics for chemists).
There are surely assumptions and simplifications in my way of looking at this that make the problem too vague for those who studied higher mathematics.

The reason why I kept together cats and dogs (!), although I see that the two groups could be considered separately, is because at the end of the process I would like to understand the following.
If I am looking at a group of N cats and M dogs, and from the above percentages I estimate that in this group the number of new-food-eating dogs is larger than the number of new-food-eating cats, can I, and if so how, test if the difference is statistically significant (with a given level of confidence)?

This link explains how to test the difference between two proportions, which I guess is not exactly the same.

http://stattrek.com/hypothesis-test/difference-in-proportions.aspx

15. Aug 11, 2016

### Stephen Tashi

Do you mean "proportion of food eating...." or "number of food eating ...." ?

Estimation and Hypothesis Testing are different statistical procedures. "Confidence" is a term that applies to estimation. "Significance" is a term that applies to hypothesis testing.

That hypothesis test is applicable to your problem if you accept that sampling with replacement is an approximation for sampling without replacement. That's a reasonable assumption when the total population is large compared to the sample.
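For reference, a pooled two-proportion z-test on the sample counts from this thread (8 of 30 dogs, 2 of 70 cats) could be sketched as follows. The function name is illustrative, and the test relies on a normal approximation that is questionable when one group has only 2 eaters, so treat the p-value as rough.

```python
from math import erfc, sqrt

def two_proportion_z_test(successes_1, n_1, successes_2, n_2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / std_err
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return z, p_value

z, p_value = two_proportion_z_test(8, 30, 2, 70)
print(round(z, 2), p_value)  # z is about 3.64
```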

But you must understand that a hypothesis test does not tell you the probability that you are making the right decision. (In particular, the level of "statistical significance" doesn't tell you this.)

The "frequentist" approach to answering the question "What is the probability that I will make the right decision" is to create a "power curve" for the test. The informal substitute for a power curve is to use simulation or calculation to answer the question "What is the probability that I will make the right decision" in some specific situations. For example, you can pick two proportions (e.g. 23/28 and 83/330 ) and find the probability that your test makes the correct decision given that those are the actual proportions. By doing this for a number of different assumed proportions, you can get a feel of how reliable the test is in detecting various size disparities in the proportions.
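A simulation along those lines might look like the sketch below. The "true" proportions, the sample sizes of 30 and 70, the 5% level, the number of trials and the seed are all assumptions chosen for illustration; the test being evaluated is a pooled two-proportion z-test.

```python
import random
from math import erfc, sqrt

def rejects_null(x_1, n_1, x_2, n_2, alpha=0.05):
    """Two-sided pooled z-test for equal proportions; True if H0 is rejected."""
    pooled = (x_1 + x_2) / (n_1 + n_2)
    if pooled in (0.0, 1.0):
        return False  # no variation in the sample, so the test cannot reject
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (x_1 / n_1 - x_2 / n_2) / std_err
    return erfc(abs(z) / sqrt(2)) < alpha

def rejection_rate(p_1, n_1, p_2, n_2, trials=2000, seed=0):
    """Fraction of simulated experiments in which the test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        x_1 = sum(rng.random() < p_1 for _ in range(n_1))
        x_2 = sum(rng.random() < p_2 for _ in range(n_2))
        rejections += rejects_null(x_1, n_1, x_2, n_2)
    return rejections / trials

# Assumed "true" proportions, chosen only for illustration
print(rejection_rate(8 / 30, 30, 2 / 70, 70))  # how often a real difference is detected
print(rejection_rate(0.10, 30, 0.10, 70))      # false-alarm rate when the proportions are equal
```

Repeating the first call over a grid of assumed proportions traces out points on the power curve.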

The Bayesian approach to the problem can tackle the question "What is the probability that I make the correct decision" head-on, but it requires assigning a joint prior distribution for the two proportions. If you have data on proportions of animals liking other new foods then you might fit a prior distribution to that data.
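As a minimal sketch of the Bayesian route, suppose (purely as an assumption, since no prior is given in this thread) that the two proportions have independent uniform Beta(1, 1) priors. Each posterior is then a Beta distribution, and the probability that the dog proportion exceeds the cat proportion can be estimated by sampling. The function name, number of draws and seed are illustrative.

```python
import random

def prob_dog_rate_exceeds_cat_rate(dog_eaters, n_dogs, cat_eaters, n_cats,
                                   draws=20000, seed=0):
    """Monte Carlo estimate of P(p_dog > p_cat | data) under independent
    uniform Beta(1, 1) priors on the two proportions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # With a Beta(1, 1) prior the posterior is Beta(1 + successes, 1 + failures)
        p_dog = rng.betavariate(1 + dog_eaters, 1 + n_dogs - dog_eaters)
        p_cat = rng.betavariate(1 + cat_eaters, 1 + n_cats - cat_eaters)
        hits += p_dog > p_cat
    return hits / draws

print(prob_dog_rate_exceeds_cat_rate(8, 30, 2, 70))
```

A prior fitted to data on other foods would replace the two `1 +` terms with that prior's parameters, and could change the answer.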

-----

Returning to the topic of estimation. This documentation page for the "R" language gives some estimators for parameters of the hypergeometric distribution. http://rpackages.ianhowson.com/cran/EnvStats/man/ehyper.html [Broken]

Last edited by a moderator: May 8, 2017