Is the Official Google Answer to this Controversial Interview Question Correct?

DeDunc · Jan 1, 2011

There’s a google interview question that’s been discussed quite a bit in various forums recently.

The question is: There is a certain country where everybody has to have a son. Therefore each couple keeps having children until they have a boy, then they stop. In expectation, what fraction of births are female?

Apparently the “official google answer” is 0.5 due to the fact that the expectation of each birth being a girl (or boy) is always 0.5. The “stopping rule” does not change this expectation.

The controversy is that some say that the official google answer and argument are wrong.

Any comments?

micromass · Jan 1, 2011

Well, they ask "What fraction of births are female?" But a birth is always 50/50, so half the births are female.

However, if you would ask "What fraction of the population is female", that's a more difficult question. But I guess that the answer would be 0.5 to...

D H · Jan 1, 2011

I assume you mean the recent imbroglio created by Steven Landsburg, including his $15,000 bet (http://www.thebigquestions.com/2010/12/27/win-landsburgs-money/).

Two comments:
1. Landsburg is wrong.
2. He won't lose any money.

There are essentially two ways to set up this simulation. The simulation could be set up to answer the question as raised, in which case the outcome will be 50-50. Landsburg will not agree to such a set up. The simulation could also be set up to produce a biased estimator, in which case the other party should not agree to the setup. It isn't answering the question as raised.

DeDunc · Jan 1, 2011

Micromass- yes "what fraction of the pollution are female is how it phrased". It was changed to female births to reflect the fact that most discussions of this problem, seem to ignore the mothers- who of course, are female.

DH- Why is Landsburg wrong? Can you put your finger on which part of his logic is faulty? Is it the use of the family as a model for a country?

D H · Jan 1, 2011

I don't know what his problem is (or rather, I don't know why he is being so persistent). I do see what his problem is. He has been told in multiple places, in multiple ways, and in no uncertain terms that he is wrong. His problem is far even more basic than a biased estimator. Hhe is doing statistics incorrectly. His 44% figure arises from a high school, maybe even grade school, algebra mistake.

Suppose you drive 10 miles at 10 miles per hour and another 10 miles at 20 miles per hour. What is your average velocity? By his logic, the answer would be 15 miles per hour.

Vanadium 50 · Jan 1, 2011

DeDunc said:

Why is Landsburg wrong? Can you put your finger on which part of his logic is faulty?

He's calculating the wrong thing. The average fraction of girls per family is not the same as the fraction of girls in the overall population.

Consider the following case: there are 10 families in town. 9 have one girl and no boys, and 1 has 9 boys and no girls. In this case, the fraction of girls in the population is 50%: 9/(9+9). However, the average fraction of girls per family is 90%.

disregardthat · Jan 1, 2011

Given that each couple have completed their job of getting one boy and stopping at that moment, the possibilities for each couple are B, GB,GGB,GGGB,... and so on, where B is boy and G is girl. The probability for each outcome is respectively 1/2,1/4,1/8, hence the expectation for fraction of boys in some family is 1/2*1+1/4*1/2+1/8*1/3+... = [itex]\sum_{k=1}^{\infty} \frac{1}{2^k k}[/itex], and the expectation for the fraction of girls in some family is the complement of this.

Define [itex]f(x) = \sum_{n=1}^{\infty} \frac{x^n}{2^n n}[/itex]. We have [itex]f^{\prime}(x) = \sum_{n=1}^{\infty} \frac{x^{n-1}}{2^n} = \frac{1}{2}\sum_{n=0}^{\infty} (\frac{x}{2})^n = \frac{1}{2-x}[/itex], hence [itex]f(x) = C-\log(2-x)[/itex]. Since f(0) = 0, we have [itex]C = \log 2[/itex], giving "fraction of boys" = [itex]f(1) = \log 2 - \log 1 = \log 2[/itex]. Thus the fraction of girls is [itex]1-\log 2[/itex].

I am having second thoughts on this, it is confusing. I am almost certain my answer above is wrong, but I don't really see why.

D H · Jan 1, 2011

Jarle said:

I am having second thoughts on this, it is confusing. I am almost certain my answer above is wrong, but I don't really see why.

You are doing exactly what Vanadium talked about in post #6. Landsburg did something different, but also incorrect.

disregardthat · Jan 1, 2011

Right, so 1-log(2) is the expected fraction of girls in some given family, not corresponding to the overall fraction. This is the algebraic mistake a/b + c/d = (a+c)/(b+d). I believe this is an illustration of Simpson's paradox[/url].

D H · Jan 1, 2011

I think it the algebraic mistake of a/b + c/d = (a+c)/(b+d). Or worse.

Borek · Jan 1, 2011

https://www.physicsforums.com/showthread.php?t=233255

Studiot · Jan 1, 2011

I have never heard of Lansburg before this thread, but I consider the proposition to be an inappropriate application of probability theory.

Life experience, that is measurement, of the birth rate suggests that P_girl = P_boy = 0.5

We should not apply the statistics of perfect unbiased coin tossing to a situation where we can never prove the event (birth) to be unbiased.

We can define or even manufacture unbiased coins to any required degree of acuracy, but we cannot manufacture unbiased mothers. We cannot even show that birth events are not linked; birth statistics suggest otherwise, so while the overall probability approaches 0.5 girls or boys appear to run in families.

SammyS · Jan 1, 2011

Jarle said:

Given that each couple have completed their job of getting one boy and stopping at that moment, the possibilities for each couple are B, GB,GGB,GGGB,... and so on, where B is boy and G is girl. The probability for each outcome is respectively 1/2,1/4,1/8, hence the expectation for fraction of boys in some family is 1/2*1+1/4*1/2+1/8*1/3+... = [itex]\sum_{k=1}^{\infty} \frac{1}{2^k k}[/itex], and the expectation for the fraction of girls in some family is the complement of this.
...

I am having second thoughts on this, it is confusing. I am almost certain my answer above is wrong, but I don't really see why.

I was thinking this same thing: Look at at those girls some families have!

But looking more closely, with this model, every family has exactly one Boy - even if they first have 12 Girls - or whatever number.

1/2 of the families have 0 Girls. → B
1/4 of the families have exactly 1 Girl. → BG
1/8 of the families have exactly 2 Girls. → BGG
1/16 of the families have exactly 3 Girls. → BGGG
...

1/2ⁿ of the families have exactly (n-1) Girls. → BGG...G

If we have 4096 families (A round number for computer scientists), 4096 Boys will be born.

The number of girls will be:
1024 +2×512 +3×256 +4×128 +5×64 +6×32 +7×16 +8×8 +9×4 +10×2 +11×1 +12×(1/2) +13×(1/4) +... (Well, that's getting to be ridiculous.)

= 1024 + 1024 + 768 + 512 + 320 + 192 + 112 + 64 + 36 + 20 + 11 + the terms with fractions of a family.

= 4083 Girls in 2047 of the 2048 families with girls. The remaining family has at least 11 Girls with .5 probability of having (exactly) 1 more, .25 prob. of 2 more, ...

That gives us at least 4094 Girls and 4096 Boys. Maybe we should take a Boy from that last family, since they're not finished having Girls!

At any rate, the number of each is essentially the same. If I had started with 1 Giga-Family we would be left with only that discrepancy of one or two.

Vanadium 50 · Jan 1, 2011

At some level, this problem reduces to "what is the name of the bus driver's mother?" The answer is given in the question: a 50% chance of giving birth to a boy means that in a sample of N births, as N gets large, the number of boys approaches N/2.

I mention in passing that if it were possible to change this expectation by a stopping algorithm, that same algorithm could be used with lottery tickets.

jacksonwalter · Jan 1, 2011

Here is the response I just posted on Landsburg's blog. Seems to make sense. I apologize in advance for the crappy notation.

I have found that the ratio of girls to the total population after child bearing is complete is

pg/ptot = (g0 + pcouples) / (p0 + 2*pcouples), where

pg => the total number of girls after child bearing process has been completed
ptot => total population “ “
g0 => initial number of girls in the population
pcouples => number of eligible child bearing couples in the population
p0 => initial population

Proof:

Consider an initial population p0. There will be boys and girls, g0 and b0 such that

p0 = g0 + b0

(for the sake of argument we’ll assume there are no transsexuals).

In the initial population, there will be a certain number of couples able to bear children which we’ll refer to as pcouples such that

p0 = pcouples + pother (0 < pcouples < p0 and 0 < pother < pcouples)

We have a rule that says each couple should produce children until they have a boy. We would like to know the ratio of girls in the population to the entire population at the end of all child bearing, Rg such that

Rg = pg / ptot

where pg is the number of girls after all child bearing and ptot is the total population at the end of all child bearing.

I am going to think of the child bearing process in the following way: round up all of the eligible child bearing couples (the number of which is pcouples).

Round one: Have all of the couples produce a child. We expect half of the couples to produce a boy and the other half to produce a girl. The couples that produced a boy are out of the running.

Round two: Take all of the couples that produced a girl in round one and have them produce another child. We expect half of these couples to produce a boy and the other half to produce a girl.

Continue this process ad infinitum (I am of course using the approximation that the population is continuous).

We need to calculate the total number of girls and the total population after this process.

Total Population:

The total population is the initial population + the children that were produced,

ptot = p0 + pchildren

To count up the children we look at the children the couples produced in the child bearing process. Half of the couples produced 1 child (a boy), a quarter of the couples produced 2 children (a girl then a boy), an eighth of the couples produced 3 children (two girls and a boy), ….

pchildren = (1/2^1)*pcouples*1 + (1/2^2)*pcouples*2 + (1/2^3)*pcouples*3 + …
= pcouples * sum from 1 to infty of n*(1/2)^n
= 2*pcouples

therefore ptot = p0 + 2pcouples

Total Number of Girls:

The total number of girls is the initial number of girls + children that were girls,

pg = g0 + pgirl/child

By a similar process we count the number of children that were girls. Half of the couples produced 0 girls (1 boy), a quarter of the couples produced 1 girls (a boy and a girl), an eighth produced 2 girls (a boy and 2 girls), ….

pgirl/child = (1/2^1+0)*pcouples*0 + (1/2^1+1)*pcouples*1 + (1/2^1+2)*pcouples*2 + …
= (1/2) * pcouples * sum from 0 to infty of n*(1/2)^n
= (1/2) * pcouples * sum from 1 to infty of n*(1/2)^n
= (1/2 * pcouples * 2
= pcouples

therefore pg = g0 + pcouples

Finally,
Rg = pg / ptot = (g0 + pcouples) / (p0 + 2*pcouples)

Consider an initial population of half men, half women partnered into couples,

g0 = (1/2)*p0
pcouples = (1/2)*p0

Rg = [ (1/2)*p0 + (1/2)*p0 ] / [ p0 + 2*(1/2)*p0 ] = ½

Jackson Walters

jpr0 · Jan 1, 2011

I think you can formulate an equivalent problem in the following way:

You're given a binary string, x, of length N bits. Each bit in the string is chosen randomly from {0,1}, except the last bit of the string, whose value is always 1. Let z be the number of zeros occurring in x. What is the expectation value of z/N.

The population problem can be cast in terms of the binary string problem by representing the birth sequence for each family as a binary string (so that for example GGB = 001, GGGB = 0001, etc) then concatenating the binary strings for each family into one big long string.

D H · Jan 1, 2011

There is an easy way to look at the problem: "births are always 50/50". The policy does nothing done to change that. Births are still 50/50, so the population will still be 50/50. If you compute a result that says otherwise says you need to double-check your assumptions and your calculations.

JonF · Jan 1, 2011

jpr0 said:

IYou're given a binary string, x, of length N bits. Each bit in the string is chosen randomly from {0,1}, except the last bit of the string, whose value is always 1. Let z be the number of zeros occurring in x. What is the expectation value of z/N.

This isn't the problem since this would allow 0,0,0,0,1,0,0,1 which isn't allowed in the orignal question. The only "1" in your string has to be the terminal one.

If you are trying to think of how to model this with a computer (which is what your language seems to suggest that you’re thinking of) it’s really simple.
Let r be your random variable, let b,g,t count the number of boys girls and total respectivly. Let i be a dummy variable and p be your population size. Initialize everything to 0. In C it would look something like this if you very litterally put it in code (my syntax is off I’m sure, I haven’t programed in a few years).

1. do while i < p;
2. {
3. r = rnd(0,1);
4. Do while r=0;
5. { g++;
6. t++;
7. r = rnd(0,1); }
8. b++;
9. t++;
10.i++}

think of adding one to i as starting a new family.

But this is obviously equivlent to:
1. do while i < p;
2. {i++
3. r = rnd(0,1);
4. if r=0 then g++, t++; go to line 1;
5. if r=1 then b++, t++ got to line 1;
6. }

Which trivially gives a 1:1 ratio statistically.

jpr0 · Jan 1, 2011

JonF said:

This isn't the problem since this would allow 0,0,0,0,1,0,0,1 which isn't allowed in the orignal question. The only "1" in your string has to be the terminal one.

0,0,0,0,1,0,0,1 is perfectly valid. This would correspond to sampling 2 families, the first one of which had 4 girls and one boy, and the second of which had 2 girls and 1 boy.

Edit: I wasn't thinking of how to calculate this computationally, which is a fairly easy task. I'm just trying to cut away the semantics of the problem. It's easy to see that the number of zeros in a random binary string will average to half the length of the string.

arpace · Jan 1, 2011

well, the bias is not actually 50/50, as it is usually 105 boys to 100 girls; therefore, it should be 51:50 or 0.51219...repeated to infinity.

Tell me if I am wrong.

D H · Jan 2, 2011

arpace said:

well, the bias is not actually 50/50, as it is usually 105 boys to 100 girls; therefore, it should be 51:50 or 0.51219...repeated to infinity.

Tell me if I am wrong.

You are wrong. This isn't a problem about the real world. Couples can't have one million girls in a row and finally have a boy. You are told up front that the probability is 50-50.

DeDunc · Jan 2, 2011

I understand that he has calculated the expected fraction of girls per family, but I’m not clear exactly why this does not answer the question. Is it because with low numbers of children i.e. small numbers of families, the expected fraction of girls per family is biased against the contribution of large families ( which will have more girls) . Similar to the example given by Vanadium.

In the same way, that averaging two speeds, doesn’t take into account the amount of time at which a vehicle will be traveling at each different speeds ( as per the example given by DH)

D H · Jan 2, 2011

DeDunc said:

I understand that he has calculated the expected fraction of girls per family, but I’m not clear exactly why this does not answer the question.

Landsburg did not calculate the expected fraction of girls per family. He calculated something different (more below), but also something different from the fraction of females in the population.

Regarding your question: Suppose Landsburg's four couples produced children as follows: two couples have 1 boy, no girls, one couple has a girl and a boy, and the third has three girls and a boy. By construction, this totals to four boys and four girls, so a 50-50 split. On the other hand, the average number of girls per family is (0+0+1/2+3/4)/4, or 5/16. The problem with this computation is that the average number of girls per family gives equal weight to the families with one child and the family with four children. That family with four children obviously has more weight than the single-child families when it comes to the fraction of girls in the population as a whole.

Regarding what Landsburg is computing: Instead of the average number of girls per family, he is calculating the average number of girls per simulation run. Landsburg postulates four couples that produce children per the rules of this hypothetical country. The above B,B,GB,GGB represents the result of one such run. Another possible outcome: B,B,B,B. Just as the different family sizes made the average number of girls per family differ from the fraction of girls in the population as a while, the different populations produced by each run will similarly make the average number of girls per run be a different statistic than the expected fraction of girls in the population as a whole.

There are a couple of other problems with Landsburg's computation. He is starting with a very small number of couples. If he started with a larger number of couples he would get something closer to 50% than his 44% figure. The average number of girls per simulation run increases with the number of couples, reaching 50% in the limit of an infinite number of couples.

Another problem is that his the population doesn't reproduce. He is only considering the outcome of the first generation. Note that his population sometimes can't reproduce. For example, that that B,B,B,B combination (a 1/16 chance) represents a population that has bred itself into extinction. Even if that couple of four does manage to have a girl or two, the odds are that subsequent generations will eventually breed themselves out of existence (no couples exist that have not produced a boy). An initial population of four couples will eventually breed itself into extinction about 87% of the time. This "breeding itself into extinction" probability remains about 50% even for an initial population of 40. It starts falling off around 100 (22%) and doesn't become small (~1%) unless the initial population is around 300 or more.

DeDunc · Jan 2, 2011

D H said:

Regarding what Landsburg is computing: Instead of the average number of girls per family, he is calculating the average number of girls per simulation run. .

Given what you have said above. Is Landsburg interpretation of the question reasonable? Ie could the average number of girls per simulation run be reasonably said to represent the fraction of girls in the population with this stopping rule.

D H · Jan 2, 2011

No. It is, as Lubos Motl put it, borderline cheating. (Motl can be rather blunt and politically incorrect.) As an estimator of the population distribution, the average number of girls per simulation run is a biased estimator. The bias is greatest when the number of couples is low and when the population doesn't reproduce. Landsburg's four couples is ridiculously low, and he is only concerned with the first generation. Since Landsburg has been told multiple times that he is wrong, I have to lean towards Motl's assessment of Landsburg's setup.

Vanadium 50 · Jan 2, 2011

D H said:

You are doing exactly what Vanadium talked about in post #6. Landsburg did something different, but also incorrect.

I think he made the same mistake I said he did. Here's his "http://www.thebigquestions.com/2010/12/22/a-big-answer-2/" ".

In it, he says

[*]To see why not, let me tell you about the families who live on my block. There are 3 families with four girls each (and no boys), and one family with 12 boys (and no girls). Altogether, that makes 12 girls and 12 boys — equal numbers! On average, each family has three girls and three boys. Nevertheless, the fraction of girls in the average family is not 50%. It’s 75% (the average of 100%, 100%, 100%, and 0%).
[*]In other words, if you were to choose a random family off my block, the expected number of girls would equal the expected number of boys — 3 in either case. But the expected fraction of girls in the family would be 75%. Moral: Just because two variables have an expected difference of zero, you can’t conclude they have an expected ratio of one. That needs to be computed separately.

I think it's clear from that that he is interpreting "What fraction of the population is female?" to mean "What is the average fraction of female children per family" rather than "What fraction of the entire population is female?"

DeDunc · Jan 3, 2011

Thanks to all contributors. I think the consensus is that the maths is right, but the interpretation is wrong and perhaps more worryingly, the interpretation has been tweaked to fit the maths. Just looking at the link Vanadium posted Landsburg says:

“More precisely: What fraction of the population should we expect to be female? That is, in a large number of similar countries, what would be the average proportion of females

Let’s say we ignore second generations, ridiculously small family sizes and mothers (after all this is not the real world), if he calculates the average number of girls per simulation, and in his simulation each family represents a country. Then has he (albeit in a highly restricted way) answered the question? Or is there still something else missing?

Vanadium 50 · Jan 3, 2011

There are at least three different things being discussed: the literal wording of the question, what Landsburg describes as the question (including his example), and Landsburg's calculation, expressed in words.

I don't think we have a consensus on which, if any, of these are identical.

DeDunc · Jan 3, 2011

Vanadium 50 said:

There are at least three different things being discussed: the literal wording of the question, what Landsburg describes as the question (including his example), and Landsburg's calculation, expressed in words.

I don't think we have a consensus on which, if any, of these are identical.

Which of the three things you mentioned above, is the most problematic?

D H · Jan 3, 2011

DeDunc said:

Let’s say we ignore second generations, ridiculously small family sizes and mothers (after all this is not the real world), if he calculates the average number of girls per simulation, and in his simulation each family represents a country. Then has he (albeit in a highly restricted way) answered the question?

No, he has not answered the question. He has answered a question, but not the question as posed. As an economist, Landsburg should know better.

There are ways to make the gender makeup of a country deviate from the nominal 50-50 split:

There are various reproductive schemes that purport to tilt the balance in favor of boys or in favor of girls. These schemes have to be employed at the moment of conception rather than after the birth of a baby. The scheme in question postulates that babies are born 50-50 male/female.
Another way to get something other than a 50-50 split is to kill newborns of the "wrong" gender. This apparently is exactly what is happening in China and India. The question doesn't say anything about infanticide, so we can leave that out is a possibility for the problem at hand.
Yet another way to get something other than a 50-50 split is to import or export people with a bias toward one gender or another. The wild west for example suffered a paucity of females because the initial push west was largely made by single men. Las Vegas on the other hand currently has a paucity of males. The job situation is more favorable for females than for males in Las Vegas, and hence more females than males move to Vegas.

The scheme in question does none of the above. The proposed technique falls exactly in the category that makes the optional stopping theorem applicable. Landsburg, being an economist, should be aware of the optional stopping theorem. Game theory was invented largely because of economics. This question can easily be recast as a question about martingales in economic game theory. Vanadium saw the connection way back in post #14:

Vanadium 50 said:

I mention in passing that if it were possible to change this expectation by a stopping algorithm, that same algorithm could be used with lottery tickets.

Is the Official Google Answer to this Controversial Interview Question Correct?

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Graduate Hypothesis testing: Defining H0, HA hypotheses so that ( H_A)_A' makes sense

Undergrad My basic understanding of set theory

Undergrad The problem of points

Graduate Expected numbers of cards of a last color remaining

Undergrad How does axiom of foundation prevent infinite sequence of elements?

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect