# Work out an estimate for the total number of ponies in the forest

paulb203
Homework Statement
Wyatt wants to work out an estimate for the total number of wild ponies in a forest

In July, Wyatt catches 42 ponies in the forest
He puts a tag on each of these ponies and releases them

In August, Wyatt catches 60 ponies in the forest
He finds that 5 of the 60 ponies are tagged

Work out an estimate for the total number of ponies in the forest
Relevant Equations
N/A
My question: which branch of maths is this?
Also, can you give me a clue as to where to start with solving this? Just a hint please, not a full explanation.

I'm struggling to even guess at this one. I did think: 60 ponies, 5 of which are tagged, so 5/60 are tagged, which is 1/12.
1/12 of the 60 are tagged...
How does this relate to the initial number of 42 caught, tagged, then released?
Of the initial 42 caught, 100% were tagged...
Why did he catch only 42 in July, but 60 in August?

If only 5 of the 60 ponies were tagged, that means 55 weren't. If 55 weren't, then none of those 55 belong to the first lot of 42 that were caught and tagged. So there are AT LEAST 42 + 55 = 97 ponies in the forest (?).

Q. They've asked for an estimate for the total number of ponies in the forest; do they want us to include any of the 60 that Wyatt has caught at the time of asking and hasn't released yet?

paulb203 said:
1/12 of the 60 are tagged...
If you catch 100 and find 30 have black manes, the rest brown manes, what would you guess about the proportion of black manes in the whole population?

paulb203 said:
1/12 of the 60 are tagged...
How does this relate to the initial number of 42 caught, and tagged, then released..?
Of the initial 42 caught, 100% were tagged...
Why did he only catch 42 in July, but caught 60 in August?
Suppose you compare these two ratios:
(5 of second sample tagged)/(60 total in second sample) = 1/12
versus
(42 of entire population tagged)/(??? total in entire population)

How would you expect those ratios to compare? First greater? Equal? Second greater?
From that, can you estimate (??? total in entire population)?

PS. This type of question is in the basic field of probability. Advanced examples are in the field of sampling theory.

paulb203 said:
If only 5 of the 60 ponies were tagged that means 55 weren't. If 55 weren't, then none of those 55 belong to the first lot of 42 that were caught and tagged. So there is AT LEAST 97 ponies in the forest (?).
The ideas in this question form the basis of statistical sampling. You can always have some fun with these questions. I like your answer of at least 97, because it might be hard to catch the same pony twice!

The underlying assumption is that Wyatt's method of pony counting is free from bias. So, 97 is definitely not the expected answer.

Have you enrolled in a statistical methods course without realising it?

I think the problem, the way it is stated, is kind of misleading and doesn't guide the problem solver towards a simple solution. Suppose instead the problem was something like this:

"From a box that contains an unknown number of white balls, we select 42 balls, paint them black and put them back in the box. Then we select (with a random procedure) 60 balls from the box and find 5 to be black. What is an estimate for the initial number of white balls in the box?"

Then I think the statement is not misleading and the problem solver would have an easier time finding the solution.

haruspex said:
If you catch 100 and find 30 have black manes, the rest brown manes, what would you guess about the proportion of black manes in the whole population?
Thanks, haruspex.

If I caught 100 and found 30 had black manes, and 70 had brown manes, I would guess 30% of the whole population had black manes.

How does this relate to my question..?

The result of the second ‘round-up’ of ponies was:

He caught 60 and found 5 of them were tagged, 55 weren’t tagged.

So 5 of them had already been caught in the first round-up (of 42); 55 hadn’t already been caught, they were first-time captures.

Now, there are 60 captive ponies, 5 of which were caught first time around, and at least 37 other tagged ponies out in the forest, yes?

FactChecker said:
Suppose you compare these two ratios:
(5 of second sample tagged)/(60 total in second sample) = 1/12
versus
(42 of entire population tagged)/(??? total in entire population)

How would you expect those ratios to compare? first greater? equal? second greater?
From that, can you estimate (??? total in entire population)

PS. This type of question is in the basic field of probability. Advanced examples are in the field of sampling theory.
Thanks, FactChecker.

I would expect the ratios to be equal; 1/12=42/504

An estimate for the total number of ponies in the forest is 504?

If so, I think I understand it better working backwards;

There are 504 ponies in a forest. Wyatt rounds up 42, tags them, then releases them. He then rounds up 60 to find 1/12 of them (5) are tagged. This is approximately what he expected as he had tagged 1/12 of the total population.

You said this is in the field of probability. My first thought when I started to read your answer was proportion (direct proportion, to be more specific). I’m guessing now that the field is probability and that some of the tools are ratio and proportion?
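
The "working backwards" check can also be tried numerically. The following is only a sketch, assuming a forest of exactly 504 ponies, 42 of them tagged, and a second catch in which every pony is equally likely to be rounded up. The function name, the number of trials and the seed are illustrative choices, not part of the original question.

```python
import random

def average_tagged_in_second_catch(population=504, tagged=42, catch=60,
                                   trials=5000, seed=0):
    """Simulate the August round-up many times and average the tagged count."""
    rng = random.Random(seed)
    # Ponies are numbered 0..population-1; the first `tagged` of them carry tags.
    ponies = range(population)
    total_tagged_seen = 0
    for _ in range(trials):
        # Assumption: every pony is equally likely to be caught (no bias).
        second_catch = rng.sample(ponies, catch)
        total_tagged_seen += sum(1 for p in second_catch if p < tagged)
    return total_tagged_seen / trials

print(average_tagged_in_second_catch())  # close to 60 * 42 / 504 = 5
```

Individual round-ups vary (4, 5, 6 or 7 tagged ponies are all quite ordinary outcomes), which is why 504 is an estimate rather than an exact count.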

paulb203 said:
If so, I think I understand it better working backwards;

There are 504 ponies in a forest. Wyatt rounds up 42, tags them, then releases them. He then rounds up 60 to find 1/12 of them (5) are tagged. This is approximately what he expected as he had tagged 1/12 of the total population.

I like to solve this problem like this:
• If you round up 60 ponies and ## \frac{5}{60} = \frac{1}{12} ## of them are tagged you can guess that ## \frac{1}{12} ## of all the ponies in the forest are tagged.
• You know that exactly 42 of the ponies in the forest are tagged.
• Now if 42 is exactly ## \frac{1}{12} ## of the ponies in the forest then there are exactly ## 42 \times 12 = 504 ## ponies in the forest in all.
• But ## \frac{1}{12} ## is only an estimate, so our answer of 504 is only an estimate too.
• Because we are working with estimates the best answer might be "I estimate that there are about 500 ponies in the forest".
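
The steps above amount to equating two proportions and solving for the unknown total. A minimal sketch follows; the function and parameter names are made up for illustration. In ecology this ratio is usually called the Lincoln-Petersen estimate, a name the thread itself does not use.

```python
def estimate_population(tagged_first, caught_second, tagged_in_second):
    """Estimate the population by equating the two tagged proportions.

    tagged_in_second / caught_second  ~=  tagged_first / population
    """
    if tagged_in_second == 0:
        raise ValueError("no tagged animals recaptured; the ratio gives no estimate")
    return tagged_first * caught_second / tagged_in_second

print(estimate_population(42, 60, 5))  # 504.0
```

The guard for zero recaptures is there because the ratio method breaks down when the second catch contains no tagged animals at all.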

paulb203 said:
You said this is in the field of probability.
The distinctions between fields of mathematics are a bit blurry but I think most people would consider this as in the field of statistics rather than probability.

paulb203 said:
some of the tools are ratio and proportion?
I don't think most people would consider these particularly as tools of statistics (or probability).

One important concept (or tool if you like) that is used here is the (unstated) assumption that the sample of 60 ponies is an unbiased sample.

pbuk said:
I don't think most people would consider these particularly as tools of statistics (or probability).
Well, ratio and proportion are tools of all branches of mathematics, but they are elementary tools. I guess you would count things like the central limit theorem or the least squares method as tools of statistics.

Delta2 said:
Well ratio and proportion are tools of all branches of mathematics but they are elementary tools, I guess you consider as tools of statistics the central limit theorem or the least square method.
I would prefer to call it a statistical or probability problem for the reason that the issue of drawing an unbiased, independent sample should be addressed. Admittedly, the problem statement does not say anything about that so it probably is not intended to be a statistical question.

paulb203 said:
If I caught 100 and found 30 had black manes, and 70 had brown manes, I would guess 30% of the whole population had black manes.

How does this relate to my question..?
So change "black manes" to "tags".

pbuk said:
• Because we are working with estimates the best answer might be "I estimate that there are about 500 ponies in the forest".

One important concept (or tool if you like) that is used here is the (unstated) assumption that the sample of 60 ponies is an unbiased sample.
Why would we assume that? Perhaps ponies usually go around in groups of 40-60. And, in the second sample we got a new group, plus a few stragglers from the first group that was rounded up the first time.

Honestly, IMO, "at least 97" is a better answer than "about 500".

The problem statement does not say if the estimate should be a number. It might be an interval.
The "97" is an estimate of the lower end of the interval, but not necessarily the lower end: some tagged ponies could have died between the July and the August counts.
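
One way to see why an interval might be more honest than a single number is to ask how likely the observed result (5 tagged out of 60) would be for different forest sizes. The sketch below assumes the hypergeometric model, meaning every pony is equally likely to be caught and no tagged pony has died or left. The function name and the candidate sizes are illustrative, and this is not a formal confidence interval.

```python
from math import comb

def likelihood(population, tagged=42, caught=60, recaptured=5):
    """Probability of exactly `recaptured` tagged ponies in the second catch,
    if the forest holds `population` ponies (hypergeometric model, assumed)."""
    untagged_caught = caught - recaptured
    if population < tagged + untagged_caught:
        return 0.0  # fewer than 97 ponies cannot produce this data
    return (comb(tagged, recaptured)
            * comb(population - tagged, untagged_caught)
            / comb(population, caught))

# Compare a few candidate forest sizes.
for n in (96, 97, 200, 300, 504, 800, 1500):
    print(n, f"{likelihood(n):.3g}")
```

Under these assumptions the likelihood is zero below 97, peaks around 504, and falls away only gradually on either side, so a fairly wide range of forest sizes remains compatible with the data.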

Hill said:
The problem statement does not say if the estimate should be a number. It might be an interval.
The "97" is an estimate of the lower end of the interval, but not necessarily the lower end: some tagged ponies could have died between the July and the August counts.
Or left the forest!

Hill said:
The problem statement does not say if the estimate should be a number. It might be an interval.
The "97" is an estimate of the lower end of the interval, but not necessarily the lower end: some tagged ponies could have died between the July and the August counts.
I did say you can have some fun with these problems!

There are all sorts of real-world issues that would need to be addressed in a problem like this that are not mentioned in the problem statement. IMO, that is a good reason to assume that the expected answer is a simple ratio calculation. It's just an academic exercise. In such a situation, I am not inclined to go looking for trouble with real-world issues.

PeroK said:
I did say you can have some fun with these problems!

I would guess that the OP is at the early stages of learning some statistics: how does you having fun help him here?

pbuk said:
I would guess that the OP is at the early stages of learning some statistics: how does you having fun help him here?
The problem does not state any projected 'variations' in the sample or the population, so one has to assume that the pony catcher was relying on homogeneity rather than heterogeneity.

Question for you (and all).
Sampling: in this case all of the 60 ponies were collected, penned, and then counted, and all 60 released.
What if the pony catcher had collected one pony at a time, done his categorical counting, and then released it back into the wild population?
Would this have affected the statistical analysis?
Normally one assumes that the population is much larger than the sample. In this case the sample seems to be about 10% of the population.
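
A rough way to explore this question is to compare the two sampling schemes directly. The sketch below assumes a population of 504 (the thread's estimate, not a known value) with 42 tagged ponies. Catching all 60 before releasing any is sampling without replacement (hypergeometric), while catch-count-release one at a time is sampling with replacement (binomial). The function name is illustrative.

```python
def recapture_mean_and_variance(population, tagged, draws, with_replacement):
    """Mean and variance of the tagged count among `draws` catches."""
    p = tagged / population
    mean = draws * p
    # Binomial variance: each pony is released before the next catch.
    variance = draws * p * (1 - p)
    if not with_replacement:
        # Hypergeometric: the finite population correction shrinks the variance.
        variance *= (population - draws) / (population - 1)
    return mean, variance

# Assumed population of 504, 42 tagged, 60 catches.
print(recapture_mean_and_variance(504, 42, 60, with_replacement=False))
print(recapture_mean_and_variance(504, 42, 60, with_replacement=True))
```

Under these assumptions the expected number of tagged ponies is the same either way, so the ratio estimate is unchanged. Only the spread differs, and it is somewhat larger when ponies are released one at a time because the same pony can be counted twice.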

Hill said:
some tagged ponies could have died between the July and the August counts.
That was one of the reasons I wrote what I wrote in post #5. Although in my equivalent problem with the box and balls, some paint could have smeared off the freshly painted black balls and given rise to "blackenwhited" balls, haha.

pbuk said:
One important concept (or tool if you like) that is used here is the (unstated) assumption that the sample of 60 ponies is an unbiased sample.
I reckon that's indeed the main issue here: that the probability of finding a tagged pony in the sample is about the same as the probability of finding a tagged pony in the entire population. That equality would mean that the sample is indeed unbiased (with respect to the tagged property).

pbuk said:
I would guess that the OP is at the early stages of learning some statistics: how does you having fun help him here?
Because these assumptions are the heart of this whole subject. It's a bit like teaching programming and saying don't bother with a test plan. I see this sort of teaching as anti-thinking. Just take the numbers you're given and manipulate them in the simplest way.

That's not what statistics should be about. Maths teaching should be better than this.

PeroK said:
Have you enrolled in a statistical methods course without realising it?
Thanks, PeroK.

And no :), it was just one of a variety of questions in my latest GCSE (UK) assessments.

256bits said:
Problem does not state any projected 'variations' of the sample nor the population so one has to assume that the pony catcher was relying on homogeneity, rather than heterogeneity.

Question for you ( and all ).
Sampling - in this case all of the 60 ponies were collected, penned, and then counted, and all 60 released.
What if the pony catcher had collected one pony at a time, done his categorical counting, and then released to the wild population.
Would this have affected the statistical analysis?
Normally one assumes that the population is much larger than the sample. In this case it seems to be about 10%.
The Pony Catcher. Coming Soon, on Amazon Prime.

Delta2 said:
That was one of the reasons I wrote what i wrote at post #5. Although in my equivalent problem with the box and balls, some paint could have smeared off the freshly painted black balls and give rise to "blackenwhited "balls a haha.
Blackenwhited balls. Can you get cream for that?

Paulb203:
I hope you don't mind if I elaborate on the case of the German Tank Problem, first setting out some background.
In estimation theory, you consider estimators (for different statistics: mean, etc.) with different properties: biased/unbiased (whether the expected value of the estimator equals the quantity it estimates), minimum variance, maximum likelihood, etc.
In the German Tank Problem, the issue is to provide an estimate for the maximum of a discrete distribution. One method is based on standard frequentist theory, the other on Bayesian theory. In this problem, in WW2, the Allies wanted to estimate the number of tanks that the Germans were producing, based on the serial numbers of tanks that had been abandoned by the Germans.
Serial numbers were assumed to be assigned in numerical order, starting with 1. Suppose 4 tanks were found. The estimate of the maximum serial number (and thus of the number of tanks) was given by the largest serial number plus the average of the gaps between the serial numbers found, where a gap counts the unseen numbers strictly between two consecutive serial numbers. Say the numbers found were 13, 19, 42, 60. Then the maximum found was 60, the gaps were ##19-13-1=5##, ##42-19-1=22## and ##60-42-1=17##, and the average gap was ##\frac{5+22+17}{3}=\frac{44}{3}##

Thus the estimate of the maximum was ##60+\frac{44}{3} \approx 75## tanks. This was based on frequentist theory.
But your setup differs in ways that need to be considered before applying this approach.
Maybe @Dale can verify or knows about the Bayesian approach?
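
The gap arithmetic above can be checked with a short sketch. The first function follows the description in this post (largest serial plus the mean gap between the serials found). The second is the formula usually quoted for the frequentist estimate, ##m + m/k - 1##, which also counts the gap below the smallest serial. Both function names are illustrative, and the serial numbers are the made-up ones from this post.

```python
def gap_average_estimate(serials):
    """This post's version: largest serial plus the mean gap between sorted serials.

    A gap counts the unseen serial numbers strictly between two neighbours,
    e.g. between 13 and 19 there are 19 - 13 - 1 = 5.
    """
    s = sorted(serials)
    gaps = [b - a - 1 for a, b in zip(s, s[1:])]
    return s[-1] + sum(gaps) / len(gaps)

def textbook_estimate(serials):
    """Commonly quoted frequentist estimate: m + m/k - 1 (m = max, k = sample size)."""
    m, k = max(serials), len(serials)
    return m + m / k - 1

found = [13, 19, 42, 60]
print(gap_average_estimate(found))  # 60 + 44/3, about 74.67
print(textbook_estimate(found))     # 74.0
```

The two versions differ only in whether the 12 unseen serials below 13 are included in the average, and for this sample they agree to within one tank.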

One common real-world bias that the original problem might have is the "self-selection" bias. Suppose it is only the slowest horses that tend to be caught in both the first and second group of horses. Then there might be a lot of fast horses that were never subject to the tagging and the estimate might tend to underestimate the total number of horses. The "self-selection" bias is very common. To avoid this possibility in the problem statement, it should be stated that all horses are equally likely to be caught.
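
The direction of this bias can be illustrated with a small simulation. This is only a sketch under invented assumptions: a true population of 504, half of which are hypothetical "slow" animals that are five times easier to catch than the rest. Weighted catching without replacement is done with the random-key trick (keep the largest values of `u ** (1 / w)`), and all names, weights and the seed are illustrative.

```python
import random

def weighted_catch(weights, size, rng):
    """Catch `size` distinct animals; a higher weight means easier to catch.

    Each animal gets the key u ** (1 / w) with u uniform on [0, 1);
    the animals with the largest keys are the ones caught.
    """
    keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    keys.sort(reverse=True)
    return {i for _, i in keys[:size]}

def estimate_from_average_recaptures(weights, first=42, second=60,
                                     trials=1000, seed=1):
    """Repeat tag-and-recapture and apply the ratio estimate to the mean recapture count."""
    rng = random.Random(seed)
    total_recaptured = 0
    for _ in range(trials):
        tagged = weighted_catch(weights, first, rng)
        caught = weighted_catch(weights, second, rng)
        total_recaptured += len(tagged & caught)
    average_recaptured = total_recaptured / trials
    return first * second / average_recaptured

true_population = 504
equal = [1.0] * true_population            # every animal equally catchable
slow_and_fast = [5.0] * 252 + [1.0] * 252  # hypothetical: half are 5x easier to catch

print(estimate_from_average_recaptures(equal))          # near 504
print(estimate_from_average_recaptures(slow_and_fast))  # noticeably below 504
```

With unequal catchability the easy-to-catch animals turn up in both samples more often than chance would suggest, so the tagged fraction in the second catch is inflated and the ratio estimate comes out too low.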

FactChecker said:
To avoid this possibility in the problem statement, it should be stated that all horses are equally likely to be caught.
Rather than stating assumptions in the question, statistics is often examined by asking students to explain them in follow-on questions; see for example question 15 in this specimen GCSE Statistics paper published by AQA:

Kirstie is estimating the population of fish in a lake.
She catches some fish and marks them with an[sic] harmless dye.
She then returns them to the lake.
One week later she catches a smaller sample of 50 fish and sees that 6 of them are marked.
She correctly estimates there are 1125 fish in the lake

(a) How many fish did she originally mark?

(b) (i) State two assumptions Kirstie makes to ensure this process is valid.
(b) (ii) Evaluate one of these assumptions; stating clearly which one it is.
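
Part (a) runs the same proportion argument in reverse. The sketch below is not taken from any mark scheme; it simply assumes that the marked fraction in the second sample equals the marked fraction in the lake, and the function name is illustrative.

```python
def originally_marked(estimated_population, second_sample, marked_in_second):
    """Solve marked / estimated_population = marked_in_second / second_sample."""
    return estimated_population * marked_in_second / second_sample

print(originally_marked(1125, 50, 6))  # 135.0
```

Parts (b)(i) and (b)(ii) are about the assumptions discussed earlier in this thread, such as every fish being equally likely to be caught and no marked fish dying or leaving between the two catches.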

pbuk said:
Rather than stating assumptions in the question, statistics is often examined by asking students to explain them in follow-on questions; see for example question 15 in this specimen GCSE Statistics paper published by AQA:
I really like that. I think that they should be shown some good examples of how to state the problem before they are asked this.
