There’s a google interview question that’s been discussed quite a bit in various forums recently.

The question is: There is a certain country where everybody has to have a son. Therefore each couple keeps having children until they have a boy, then they stop. In expectation, what fraction of births are female?

Apparently the “official google answer” is 0.5 due to the fact that the expectation of each birth being a girl (or boy) is always 0.5. The “stopping rule” does not change this expectation.

The controversy is that some say that the official google answer and argument are wrong.

 Blog Entries: 8 Recognitions: Gold Member Science Advisor Staff Emeritus Well, they ask "What fraction of births are female?" But a birth is always 50/50, so half the births are female. However, if you would ask "What fraction of the population is female", that's a more difficult question. But I guess that the answer would be 0.5 to...
 Mentor I assume you mean the recent imbroglio created by Steven Landsburg, including his \$15,000 bet (http://www.thebigquestions.com/2010/...dsburgs-money/). Two comments: 1. Landsburg is wrong. 2. He won't lose any money. There are essentially two ways to set up this simulation. The simulation could be set up to answer the question as raised, in which case the outcome will be 50-50. Landsburg will not agree to such a set up. The simulation could also be set up to produce a biased estimator, in which case the other party should not agree to the setup. It isn't answering the question as raised.

Micromass- yes "what fraction of the pollution are female is how it phrased". It was changed to female births to reflect the fact that most discussions of this problem, seem to ignore the mothers- who of course, are female.

DH- Why is Landsburg wrong? Can you put your finger on which part of his logic is faulty? Is it the use of the family as a model for a country?

 Mentor I don't know what his problem is (or rather, I don't know why he is being so persistent). I do see what his problem is. He has been told in multiple places, in multiple ways, and in no uncertain terms that he is wrong. His problem is far even more basic than a biased estimator. Hhe is doing statistics incorrectly. His 44% figure arises from a high school, maybe even grade school, algebra mistake. Suppose you drive 10 miles at 10 miles per hour and another 10 miles at 20 miles per hour. What is your average velocity? By his logic, the answer would be 15 miles per hour.

Mentor
 Quote by DeDunc Why is Landsburg wrong? Can you put your finger on which part of his logic is faulty?
He's calculating the wrong thing. The average fraction of girls per family is not the same as the fraction of girls in the overall population.

Consider the following case: there are 10 families in town. 9 have one girl and no boys, and 1 has 9 boys and no girls. In this case, the fraction of girls in the population is 50%: 9/(9+9). However, the average fraction of girls per family is 90%.

 Recognitions: Science Advisor Given that each couple have completed their job of getting one boy and stopping at that moment, the possibilities for each couple are B, GB,GGB,GGGB,... and so on, where B is boy and G is girl. The probability for each outcome is respectively 1/2,1/4,1/8, hence the expectation for fraction of boys in some family is 1/2*1+1/4*1/2+1/8*1/3+... = $\sum_{k=1}^{\infty} \frac{1}{2^k k}$, and the expectation for the fraction of girls in some family is the complement of this. Define $f(x) = \sum_{n=1}^{\infty} \frac{x^n}{2^n n}$. We have $f^{\prime}(x) = \sum_{n=1}^{\infty} \frac{x^{n-1}}{2^n} = \frac{1}{2}\sum_{n=0}^{\infty} (\frac{x}{2})^n = \frac{1}{2-x}$, hence $f(x) = C-\log(2-x)$. Since f(0) = 0, we have $C = \log 2$, giving "fraction of boys" = $f(1) = \log 2 - \log 1 = \log 2$. Thus the fraction of girls is $1-\log 2$. I am having second thoughts on this, it is confusing. I am almost certain my answer above is wrong, but I don't really see why.

Mentor
 Quote by Jarle I am having second thoughts on this, it is confusing. I am almost certain my answer above is wrong, but I don't really see why.
You are doing exactly what Vanadium talked about in post #6. Landsburg did something different, but also incorrect.

 Recognitions: Science Advisor Right, so 1-log(2) is the expected fraction of girls in some given family, not corresponding to the overall fraction. This is the algebraic mistake a/b + c/d = (a+c)/(b+d). I believe this is an illustration of Simpson's paradox.
 Mentor I think it the algebraic mistake of a/b + c/d = (a+c)/(b+d). Or worse.
 I have never heard of Lansburg before this thread, but I consider the proposition to be an inappropriate application of probability theory. Life experience, that is measurement, of the birth rate suggests that Pgirl = Pboy = 0.5 We should not apply the statistics of perfect unbiased coin tossing to a situation where we can never prove the event (birth) to be unbiased. We can define or even manufacture unbiased coins to any required degree of acuracy, but we cannot manufacture unbiased mothers. We cannot even show that birth events are not linked; birth statistics suggest otherwise, so while the overall probability approaches 0.5 girls or boys appear to run in families.

Mentor
 Quote by Jarle Given that each couple have completed their job of getting one boy and stopping at that moment, the possibilities for each couple are B, GB,GGB,GGGB,... and so on, where B is boy and G is girl. The probability for each outcome is respectively 1/2,1/4,1/8, hence the expectation for fraction of boys in some family is 1/2*1+1/4*1/2+1/8*1/3+... = $\sum_{k=1}^{\infty} \frac{1}{2^k k}$, and the expectation for the fraction of girls in some family is the complement of this. ... I am having second thoughts on this, it is confusing. I am almost certain my answer above is wrong, but I don't really see why.

I was thinking this same thing: Look at at those girls some families have!

But looking more closely, with this model, every family has exactly one Boy - even if they first have 12 Girls - or whatever number.

1/2 of the families have 0 Girls. → B
1/4 of the families have exactly 1 Girl. → BG
1/8 of the families have exactly 2 Girls. → BGG
1/16 of the families have exactly 3 Girls. → BGGG
...

1/2n of the families have exactly (n-1) Girls. → BGG...G

If we have 4096 families (A round number for computer scientists), 4096 Boys will be born.

The number of girls will be:
1024 +2×512 +3×256 +4×128 +5×64 +6×32 +7×16 +8×8 +9×4 +10×2 +11×1 +12×(1/2) +13×(1/4) +... (Well, that's getting to be ridiculous.)

= 1024 + 1024 + 768 + 512 + 320 + 192 + 112 + 64 + 36 + 20 + 11 + the terms with fractions of a family.

= 4083 Girls in 2047 of the 2048 families with girls. The remaining family has at least 11 Girls with .5 probability of having (exactly) 1 more, .25 prob. of 2 more, ...

That gives us at least 4094 Girls and 4096 Boys. Maybe we should take a Boy from that last family, since they're not finished having Girls!

At any rate, the number of each is essentially the same. If I had started with 1 Giga-Family we would be left with only that discrepancy of one or two.

 Mentor At some level, this problem reduces to "what is the name of the bus driver's mother?" The answer is given in the question: a 50% chance of giving birth to a boy means that in a sample of N births, as N gets large, the number of boys approaches N/2. I mention in passing that if it were possible to change this expectation by a stopping algorithm, that same algorithm could be used with lottery tickets.

Here is the response I just posted on Landsburg's blog. Seems to make sense. I apologize in advance for the crappy notation.

 I have found that the ratio of girls to the total population after child bearing is complete is pg/ptot = (g0 + pcouples) / (p0 + 2*pcouples), where pg => the total number of girls after child bearing process has been completed ptot => total population “ “ g0 => initial number of girls in the population pcouples => number of eligible child bearing couples in the population p0 => initial population Proof: Consider an initial population p0. There will be boys and girls, g0 and b0 such that p0 = g0 + b0 (for the sake of argument we’ll assume there are no transsexuals). In the initial population, there will be a certain number of couples able to bear children which we’ll refer to as pcouples such that p0 = pcouples + pother (0 < pcouples < p0 and 0 < pother < pcouples) We have a rule that says each couple should produce children until they have a boy. We would like to know the ratio of girls in the population to the entire population at the end of all child bearing, Rg such that Rg = pg / ptot where pg is the number of girls after all child bearing and ptot is the total population at the end of all child bearing. I am going to think of the child bearing process in the following way: round up all of the eligible child bearing couples (the number of which is pcouples). Round one: Have all of the couples produce a child. We expect half of the couples to produce a boy and the other half to produce a girl. The couples that produced a boy are out of the running. Round two: Take all of the couples that produced a girl in round one and have them produce another child. We expect half of these couples to produce a boy and the other half to produce a girl. Continue this process ad infinitum (I am of course using the approximation that the population is continuous). We need to calculate the total number of girls and the total population after this process. Total Population: The total population is the initial population + the children that were produced, ptot = p0 + pchildren To count up the children we look at the children the couples produced in the child bearing process. Half of the couples produced 1 child (a boy), a quarter of the couples produced 2 children (a girl then a boy), an eighth of the couples produced 3 children (two girls and a boy), …. pchildren = (1/2^1)*pcouples*1 + (1/2^2)*pcouples*2 + (1/2^3)*pcouples*3 + … = pcouples * sum from 1 to infty of n*(1/2)^n = 2*pcouples therefore ptot = p0 + 2pcouples Total Number of Girls: The total number of girls is the initial number of girls + children that were girls, pg = g0 + pgirl/child By a similar process we count the number of children that were girls. Half of the couples produced 0 girls (1 boy), a quarter of the couples produced 1 girls (a boy and a girl), an eighth produced 2 girls (a boy and 2 girls), …. pgirl/child = (1/2^1+0)*pcouples*0 + (1/2^1+1)*pcouples*1 + (1/2^1+2)*pcouples*2 + … = (1/2) * pcouples * sum from 0 to infty of n*(1/2)^n = (1/2) * pcouples * sum from 1 to infty of n*(1/2)^n = (1/2 * pcouples * 2 = pcouples therefore pg = g0 + pcouples Finally, Rg = pg / ptot = (g0 + pcouples) / (p0 + 2*pcouples) Consider an initial population of half men, half women partnered into couples, g0 = (1/2)*p0 pcouples = (1/2)*p0 Rg = [ (1/2)*p0 + (1/2)*p0 ] / [ p0 + 2*(1/2)*p0 ] = ½ Jackson Walters

 I think you can formulate an equivalent problem in the following way: You're given a binary string, x, of length N bits. Each bit in the string is chosen randomly from {0,1}, except the last bit of the string, whose value is always 1. Let z be the number of zeros occurring in x. What is the expectation value of z/N. The population problem can be cast in terms of the binary string problem by representing the birth sequence for each family as a binary string (so that for example GGB = 001, GGGB = 0001, etc) then concatenating the binary strings for each family into one big long string.
 Mentor There is an easy way to look at the problem: "births are always 50/50". The policy does nothing done to change that. Births are still 50/50, so the population will still be 50/50. If you compute a result that says otherwise says you need to double-check your assumptions and your calculations.