Calculating the Probability of Two Boys for Math-Obsessed Friend

JeffJo · Feb 14, 2011

Dadface said:

We know that, regardless of how the information was obtained, the son has one sibling. We further know that there is a fifty percent chance that the sibling is a boy and a fifty percent chance that it is a girl. The answer is 1/2. Where's the paradox?

If the family has two boys, which one is "the son" that I colored in red? If you can't identify him, you have to consider the chances of the two children together.

Essentially, you are treating the problem statement as "I know about one particular child of Mr. Smith's. That child is a boy." When that is the problem statement, the answer is indeed 1/2 because the un-pictured child has a 1/2 chance to be a boy, as you reasoned. But that is assuming information not included in the problem. You are assuming a specific child is identified.

Others treat it as "I know about both of Mr. Smith's children. The pair includes at least one boy." When that is the problem statement, the answer is 1/3 because 3/4 of all the possible families include a boy, and 1/4 include two, so the answer is (1/4)/(3/4)=1/3. But this also assumes information not included in the problem. It assumes you actually know both genders: you need to, in order to always find a boy when one is there. It is implicit that you can know about one boy, but not both. It also assumes the intent to look for a boy, something you can't deduce from the statement alone.
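As a rough check of the two readings above, here is a minimal Python sketch that enumerates the four equally likely families. Taking "the older child" as the identified one is an arbitrary assumption for illustration, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All equally likely (older, younger) gender pairs: BB, BG, GB, GG.
FAMILIES = list(product("BG", repeat=2))

def conditional(event, given):
    """P(event | given) over the four equally likely families."""
    kept = [f for f in FAMILIES if given(f)]
    return Fraction(sum(1 for f in kept if event(f)), len(kept))

def two_boys(f):
    return f == ("B", "B")

# Reading 1: a specific child (here, arbitrarily, the older one) is a boy.
p_specific = conditional(two_boys, lambda f: f[0] == "B")

# Reading 2: the pair includes at least one boy.
p_at_least_one = conditional(two_boys, lambda f: "B" in f)

print(p_specific, p_at_least_one)  # 1/2 1/3
```

Neither number says which reading the problem statement intends; the sketch only confirms the arithmetic for each.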

Most people who take the second approach accuse those who answer "1/2" of taking the first. That's why it is usually compared to the Mr. Jones question. While many do exactly that - as did you - that isn't the only way to get 1/2. I prefer to think of it as "I know a gender that exists in Mr. Smith's family." This assumes nothing, since all that is really implicit in the statement is that you know a gender. This is more complicated to solve, because you have to allow for the possibility that you might know "girl." In fact, the whole reason conditional probability is unintuitive is that you have to allow for things you know didn't happen. But the short of it is that in the 1/2 that have one boy and one girl, it is just as likely to know about the girl as the boy.

An example of "knowing a gender" might be if a new family moves into the three-bedroom house next door. You might deduce they have a boy by seeing a boy's bicycle in the driveway, but you can't associate it with a specific child.
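The "I know a gender" reading can be illustrated with a small simulation. This is only a sketch under one assumption that is not in the problem statement: the clue (the bicycle, say) is equally likely to come from either child, and you never learn which one. The function name and trial count are arbitrary.

```python
import random

def simulate(trials=200_000, seed=0):
    """Estimate P(two boys | the gender you happened to learn is 'boy')."""
    rng = random.Random(seed)
    learned_boy = 0
    learned_boy_and_two_boys = 0
    for _ in range(trials):
        family = [rng.choice("BG") for _ in range(2)]
        # Assumption: the clue comes from a child picked at random,
        # and we never find out which child it was.
        clue = rng.choice(family)
        if clue == "B":
            learned_boy += 1
            if family == ["B", "B"]:
                learned_boy_and_two_boys += 1
    return learned_boy_and_two_boys / learned_boy

estimate = simulate()
print(round(estimate, 3))
```

The estimate should land close to 1/2, because the mixed families produce a "girl" clue half the time and so drop out of the count half the time.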

+++++

But since you are interested in history, I have traced what may (it's speculation, but it seems plausible) be the history of the Tuesday Boy Problem, and how top-level experts can't always agree.

In his 1988 book "Innumeracy," Professor John Allen Paulos of Temple University included a variant of the Two Child Problem as an example, paraphrased: "If Myrtle is a girl from a family of two, what is the probability she has a brother?" His answer was 2/3 (note that he asks for the probability of a mixed family, not two girls).

J. L. Snell (Dartmouth) and R. Vanderbei (Princeton) pointed out in a 1995 article titled "Three Bewitching Paradoxes" that, by giving the girl an uncommon name, he inadvertently changed the problem. The probability Myrtle has a brother should be 2/(4-p), where p is the probability a girl is named Myrtle (this is the same formula that would be used for the Tuesday Boy question, if it asked for a mixed family). In his 2008 book "The Drunkard's Walk," Leonard Mlodinow (Stanford, and a frequent collaborator with no less than Stephen Hawking) used the same problem, about a girl named Florida.
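For anyone who wants to check where 2/(4-p) comes from, here is a hedged sketch of that counting in Python. It assumes what the formula assumes: each child is independently a girl with probability 1/2, each girl is independently named Myrtle with probability p, and two Myrtles in one family are allowed. The function name and the value p = 1/100 are hypothetical.

```python
from fractions import Fraction

def p_brother_given_myrtle(p):
    """P(mixed family | at least one girl named Myrtle), names independent."""
    p = Fraction(p)
    myrtle = p / 2                      # a given child is a girl named Myrtle
    at_least_one = 1 - (1 - myrtle) ** 2
    # Mixed family containing a Myrtle: (Myrtle, boy) or (boy, Myrtle).
    mixed_with_myrtle = 2 * myrtle * Fraction(1, 2)
    return mixed_with_myrtle / at_least_one

p = Fraction(1, 100)                    # hypothetical name frequency
assert p_brother_given_myrtle(p) == 2 / (4 - p)

# Tuesday Boy analogue: p = 1/7 gives 14/27 mixed, so 13/27 two boys.
tuesday = p_brother_given_myrtle(Fraction(1, 7))
print(tuesday, 1 - tuesday)  # 14/27 13/27
```

This reproduces the formula only under those independence assumptions, which are exactly the ones questioned in the rest of this post.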

Snell and Vanderbei ignored the illogic of having two girls named Myrtle in the same family, and included that case in their count. Mlodinow argued that the factor for it depends on p^2, which is negligibly small compared to the other kinds of families. Which is wrong: compared to the fraction of the families he is counting, it depends on p^1.

Giulio D'Agostini (Università “La Sapienza” and INFN, Rome, Italy) attempted to correct for that factor, but did it wrong by disallowing only girls named Myrtle/Florida. He allowed two girls of any other name. But at least he got the right answer, 1/2, which is quite trivial to prove by a different method!

Define the following events: M2 is the event where a family of 2 children includes a girl named Myrtle. MO is the event where she is the older sibling, and in MY she is the younger. Finally, MB is the event where she has a brother. Everybody will agree that the probability Myrtle has a brother, given that she is either the older or younger sibling, is 1/2. That is the first statement in this progression:

1. P(MB|MO)=1/2
2. P(MB|MY)=1/2
3. P(MO|M2)=Q (Most will say it is actually 1/2. I use a variable because I don't need to know it, and the error I'll demonstrate makes it a little more than 1/2.)
4. P(MY|M2)=1-Q (That is, MO and MY represent all possibilities in M2, and do not overlap)
5. P(MB|MO)*P(MO|M2) = P(MB and MO|M2) = Q/2
6. P(MB|MY)*P(MY|M2) = P(MB and MY|M2) = (1-Q)/2
7. P(MB and MO|M2) + P(MB and MY|M2) = P(MB|M2) = Q/2 + (1-Q)/2 = 1/2.
QED.

The error that Snell, Vanderbei, and Mlodinow make is that by allowing two girls named Myrtle/Florida in the same family, MO and MY overlap. Both P(MO|M2) and P(MY|M2) are equal to Q=2/(4-p), which is greater than 1/2. But they still use equation #7, which is invalid if MO and MY overlap. As a result, they get P(MB|M2)=2/(4-p), or the probability Myrtle has a sister is (2-p)/(4-p).
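The overlap can be made concrete with exact fractions. This is a sketch under the same assumption the overlap relies on (names are assigned independently, so two Myrtles can share a family); the function name and p = 1/100 are hypothetical choices for illustration.

```python
from fractions import Fraction

def overlap_check(p):
    """Return P(MO|M2), P(MY|M2) and P(MO and MY|M2) when two Myrtles are allowed."""
    p = Fraction(p)
    m = p / 2                       # P(a given child is a girl named Myrtle)
    m2 = 1 - (1 - m) ** 2           # P(M2): at least one Myrtle in the family
    q_older = m / m2                # P(MO | M2)
    q_younger = m / m2              # P(MY | M2)
    both = m * m / m2               # P(MO and MY | M2), the overlap
    return q_older, q_younger, both

q_older, q_younger, both = overlap_check(Fraction(1, 100))
print(q_older, q_younger, both)     # 200/399 200/399 1/399
print(q_older + q_younger - both)   # 1
```

With p = 1/100, each of P(MO|M2) and P(MY|M2) is 200/399 = 2/(4-p), and the overlap is p/(4-p), which shrinks like p rather than p^2 relative to the families being counted.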

And what is even worse, is that my derivation above is wrong. Specifically, equation #2 is wrong. #1 is right because the gender of a second child is independent of both the gender and name of the first child, but the name (not gender) of a second child will depend on the name of the first child, if the gender is the same. I won't go into details unless asked, but it turns out that P(Myrtle has a brother) is approximately equal to 1/2+[P(a girl receives the "average" name)-P(a girl gets named Myrtle)]/8. In other words, for uncommon names, the probability is greater than 1/2!

Dadface · Feb 14, 2011

JeffJo, thanks for your replies. Consider the following:

1. Mr Smith has a son.
2. Mr Smith has two children.

The two children must be siblings and therefore the Smith children must be either brother with brother or brother with sister. It follows that if there is a son the probability of the second child being a son is 1/2.

This is an attempt to structure my reasoning such that I don't need to identify any of the children.

JeffJo · Feb 14, 2011

Dadface said:

JeffJo, thanks for your replies. Consider the following:

1. Mr Smith has a son.
2. Mr Smith has two children.

The two children must be siblings and therefore the Smith children must be either brother with brother or brother with sister. It follows that if there is a son the probability of the second child being a son is 1/2.

This is an attempt to structure my reasoning such that I don't need to identify any of the children.

Correction: it must be brother with brother, older brother with younger sister, or younger brother with older sister. Each of those cases is equally likely to exist. The fact that you cannot specify the relative ages for the first group changes how you account for them. This is the point mathematicians are trying to demonstrate with this problem.

But they overlook that "girl" cannot apply to that case, but can apply to the others. That also changes how you account for them. The answer is 1/2, but for reasons different than you propose.

JeffJo · Feb 15, 2011

For some reason, I can't edit posts today. I misremembered where Leonard Mlodinow teaches. He's at Caltech, not Stanford (there is another professor who is part of the story - because he is one of the few who acknowledges the ambiguity in these problems - at Stanford).

Dadface · Feb 15, 2011

JeffJo said:

Correction: it must be brother with brother, older brother with younger sister, or younger brother with older sister. Each of those cases is equally likely to exist. The fact that you cannot specify the relative ages for the first group changes how you account for them. This is the point mathematicians are trying to demonstrate with this problem.

But they overlook that "girl" cannot apply to that case, but can apply to the others. That also changes how you account for them. The answer is 1/2, but for reasons different than you propose.

Thanks for your feedback, JeffJo. When I wrote post 62 I forgot that age differences need to be considered. What a dope I am. I wasn't familiar with these types of problems before, but now I'm hooked.

Dadface · Feb 16, 2011

Can someone confirm, or otherwise, that with problems of this type we must use the information contained in the problem statement only, even though this might be ambiguous and open to different interpretations, and that we are not allowed to bring our own extra knowledge to the task? Consider the boy girl paradox. We don't need to be told that the two siblings have different ages; this is knowledge we bring to the task. We don't need to be told that if there are two boys they can be identified, because they have numerous other differences in addition to having different ages, but this is knowledge it seems we are not allowed to bring to the task. The restrictions imposed on what we can and cannot use seem arbitrary and artificial. Consider this... we are required to suspend all knowledge that if there are two boys they will have different age-related bodily conditions, but retain the knowledge that they (the siblings) have different ages.

JeffJo · Feb 16, 2011

Dadface said:

Can someone confirm, or otherwise that with problems of this type we must use the information contained in the problem statement only, ...

Of course. But like Inigo Montoya said, I don’t think that means what you think it means.

We know the family has two distinct children. That means that they can be differentiated from one another by somebody, but not necessarily by us. And we do know this, whether or not it is stated in the problem.

Age is a very convenient tool to use to express the difference in terms of an order. But it isn't the only one we could use. I prefer to alphabetize the first name of each child's best friend, or order them clockwise (relative to mother) as they sit around the dinner table. As long as it is unique (which I suppose friends' names might not be, but let's assume they are) and independent of gender, all that matters is that the order exists.

We don’t even have to know how the order applies to any family; we use it only to calculate the proportions of the various groups of families we need to keep track of. Here are some simpler examples:

A six-sided die has two red sides, and four white sides. The red sides are opposite each other. Other than that, no side can be distinguished from another. What is the probability that a red side will come up on a roll? Is it 1/2, because there are two colors? Or 2/6=1/3, because there are two red sides out of six, even though we can't distinguish them?
Two normal dice are completely indistinguishable. What is the probability their sum is 7 when you roll them? Is it 1/11, because there are 11 different totals that could come up? Or 3/21=1/7, because there are three unordered pairs of numbers that add up to 7, out of 21 possible unordered pairs? Or 6/36=1/6, because there are 36 possible ordered pairs, and 6 of them total 7?

The last answer is right for each, of course. The different groupings exist, and we know they exist, even though it might be ambiguous what group a particular roll belongs to. The problem with how this Paradox is normally presented is that it makes it sound like we ultimately need to know the ages, and we don't.
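The three candidate counts in the two-dice example are easy to verify by brute force. A minimal sketch (variable names are just for illustration):

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))      # 36 ordered pairs
sevens = [r for r in rolls if sum(r) == 7]        # ordered pairs totalling 7
p_seven = Fraction(len(sevens), len(rolls))

unordered = {tuple(sorted(r)) for r in rolls}     # 21 unordered pairs
totals = {sum(r) for r in rolls}                  # 11 possible totals

print(len(rolls), len(unordered), len(totals), p_seven)  # 36 21 11 1/6
```

Only the 36 ordered pairs are equally likely. The 21 unordered pairs and the 11 totals are real groupings, but they carry unequal weights, which is why counting them as if they were equal gives 1/7 or 1/11 instead of 1/6.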

+++++
But you can't add information to a problem if it isn’t there, or clearly implied. That is why the Two Child Problem is generally ambiguous, but the best answer is 1/2.

When a man walks up to you and says "Mr. Smith has two children, and one is a boy," do you know, for a fact, how he decided to tell you that? In particular, do you have any reason to believe one of these possible reasons (and there are others) is preferable to another? (Note that you have to assume he never intended to tell you all the information that applies to Mr. Smith, since there would be no probability problem then. That makes all possibilities a little unrealistic, so don’t be surprised by it.)

1. He knows about only one child of Mr. Smith's.
2. He picked a gender at random from what is either one, or two, that exist in the family.
3. He is predisposed to mention boys, and so he tells you about a boy if he can, and a girl if he can't.
4. He is predisposed to mention girls, and so he tells you about a girl if he can, and a boy if he can't.

He could say "one is a boy" with any of these, so you can't pick one that must be true. So most wordings of the problem are ambiguous. But most of the reasons that you can put in the list require adding some information. Does the problem say he knows about both of Mr. Smith's children (as #2, #3, and #4 require)? Some do, but the Mr. Smith one doesn't. Does it say he is predisposed toward one gender or the other? Very few do. And the 1/3 answer requires that the answer to both questions be "yes." Without both, the answer is 1/2.
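To see how much the answer depends on the informant's motive, here is a hedged sketch that applies Bayes' rule to each of the four reasons listed above. The likelihoods are my reading of each reason, not something the problem states: for #1 the one child he knows is assumed equally likely to be either child, and the labels and function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

FAMILIES = list(product("BG", repeat=2))  # BB, BG, GB, GG, each with prior 1/4

# P(he says "one is a boy" | family) under each assumed motive.
STRATEGIES = {
    "1 knows one child": lambda f: Fraction(f.count("B"), 2),
    "2 random gender": lambda f: Fraction(1, len(set(f))) if "B" in f else Fraction(0),
    "3 prefers boys": lambda f: Fraction(int("B" in f)),
    "4 prefers girls": lambda f: Fraction(int("G" not in f)),
}

def p_two_boys(says_boy):
    """Bayes: P(BB | he said "one is a boy"); the uniform prior cancels."""
    total = sum(says_boy(f) for f in FAMILIES)
    return says_boy(("B", "B")) / total

results = {name: p_two_boys(s) for name, s in STRATEGIES.items()}
for name, p in results.items():
    print(name, p)
# 1 knows one child 1/2
# 2 random gender 1/2
# 3 prefers boys 1/3
# 4 prefers girls 1
```

Only the boy-preferring informant who knows both children yields 1/3; the readings that add no preference give 1/2.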
