
#1
Jan111, 08:59 AM

P: 14

There’s a google interview question that’s been discussed quite a bit in various forums recently.
The question is: There is a certain country where everybody has to have a son. Therefore each couple keeps having children until they have a boy, then they stop. In expectation, what fraction of births are female? Apparently the “official google answer” is 0.5 due to the fact that the expectation of each birth being a girl (or boy) is always 0.5. The “stopping rule” does not change this expectation. The controversy is that some say that the official google answer and argument are wrong. Any comments? 



#2
Jan111, 09:06 AM

Mentor
P: 16,701

Well, they ask "What fraction of births are female?" But a birth is always 50/50, so half the births are female.
However, if you would ask "What fraction of the population is female", that's a more difficult question. But I guess that the answer would be 0.5 to... 



#3
Jan111, 11:10 AM

Mentor
P: 14,477

I assume you mean the recent imbroglio created by Steven Landsburg, including his $15,000 bet (http://www.thebigquestions.com/2010/...dsburgsmoney/).
Two comments: 1. Landsburg is wrong. 2. He won't lose any money. There are essentially two ways to set up this simulation. The simulation could be set up to answer the question as raised, in which case the outcome will be 5050. Landsburg will not agree to such a set up. The simulation could also be set up to produce a biased estimator, in which case the other party should not agree to the setup. It isn't answering the question as raised. 



#4
Jan111, 02:44 PM

P: 14

Google interview question
Micromass yes "what fraction of the pollution are female is how it phrased". It was changed to female births to reflect the fact that most discussions of this problem, seem to ignore the mothers who of course, are female.
DH Why is Landsburg wrong? Can you put your finger on which part of his logic is faulty? Is it the use of the family as a model for a country? 



#5
Jan111, 04:14 PM

Mentor
P: 14,477

I don't know what his problem is (or rather, I don't know why he is being so persistent). I do see what his problem is. He has been told in multiple places, in multiple ways, and in no uncertain terms that he is wrong. His problem is far even more basic than a biased estimator. Hhe is doing statistics incorrectly. His 44% figure arises from a high school, maybe even grade school, algebra mistake.
Suppose you drive 10 miles at 10 miles per hour and another 10 miles at 20 miles per hour. What is your average velocity? By his logic, the answer would be 15 miles per hour. 



#6
Jan111, 04:27 PM

Mentor
P: 15,625

Consider the following case: there are 10 families in town. 9 have one girl and no boys, and 1 has 9 boys and no girls. In this case, the fraction of girls in the population is 50%: 9/(9+9). However, the average fraction of girls per family is 90%. 



#7
Jan111, 04:36 PM

Sci Advisor
P: 1,707

Given that each couple have completed their job of getting one boy and stopping at that moment, the possibilities for each couple are B, GB,GGB,GGGB,... and so on, where B is boy and G is girl. The probability for each outcome is respectively 1/2,1/4,1/8, hence the expectation for fraction of boys in some family is 1/2*1+1/4*1/2+1/8*1/3+... = [itex]\sum_{k=1}^{\infty} \frac{1}{2^k k}[/itex], and the expectation for the fraction of girls in some family is the complement of this.
Define [itex]f(x) = \sum_{n=1}^{\infty} \frac{x^n}{2^n n}[/itex]. We have [itex]f^{\prime}(x) = \sum_{n=1}^{\infty} \frac{x^{n1}}{2^n} = \frac{1}{2}\sum_{n=0}^{\infty} (\frac{x}{2})^n = \frac{1}{2x}[/itex], hence [itex]f(x) = C\log(2x)[/itex]. Since f(0) = 0, we have [itex]C = \log 2[/itex], giving "fraction of boys" = [itex]f(1) = \log 2  \log 1 = \log 2[/itex]. Thus the fraction of girls is [itex]1\log 2[/itex]. I am having second thoughts on this, it is confusing. I am almost certain my answer above is wrong, but I don't really see why. 



#8
Jan111, 04:55 PM

Mentor
P: 14,477





#9
Jan111, 05:02 PM

Sci Advisor
P: 1,707

Right, so 1log(2) is the expected fraction of girls in some given family, not corresponding to the overall fraction. This is the algebraic mistake a/b + c/d = (a+c)/(b+d). I believe this is an illustration of Simpson's paradox.




#10
Jan111, 05:27 PM

Mentor
P: 14,477

I think it the algebraic mistake of a/b + c/d = (a+c)/(b+d). Or worse.




#11
Jan111, 05:31 PM

Admin
P: 22,710




#12
Jan111, 06:46 PM

P: 5,462

I have never heard of Lansburg before this thread, but I consider the proposition to be an inappropriate application of probability theory.
Life experience, that is measurement, of the birth rate suggests that P_{girl} = P_{boy} = 0.5 We should not apply the statistics of perfect unbiased coin tossing to a situation where we can never prove the event (birth) to be unbiased. We can define or even manufacture unbiased coins to any required degree of acuracy, but we cannot manufacture unbiased mothers. We cannot even show that birth events are not linked; birth statistics suggest otherwise, so while the overall probability approaches 0.5 girls or boys appear to run in families. 



#13
Jan111, 07:11 PM

Emeritus
Sci Advisor
HW Helper
PF Gold
P: 7,418

I was thinking this same thing: Look at at those girls some families have! But looking more closely, with this model, every family has exactly one Boy  even if they first have 12 Girls  or whatever number. 1/2 of the families have 0 Girls. → B 1/4 of the families have exactly 1 Girl. → BG 1/8 of the families have exactly 2 Girls. → BGG 1/16 of the families have exactly 3 Girls. → BGGG ... 1/2^{n} of the families have exactly (n1) Girls. → BGG...G If we have 4096 families (A round number for computer scientists), 4096 Boys will be born. The number of girls will be: 1024 +2×512 +3×256 +4×128 +5×64 +6×32 +7×16 +8×8 +9×4 +10×2 +11×1 +12×(1/2) +13×(1/4) +... (Well, that's getting to be ridiculous.) = 1024 + 1024 + 768 + 512 + 320 + 192 + 112 + 64 + 36 + 20 + 11 + the terms with fractions of a family. = 4083 Girls in 2047 of the 2048 families with girls. The remaining family has at least 11 Girls with .5 probability of having (exactly) 1 more, .25 prob. of 2 more, ... That gives us at least 4094 Girls and 4096 Boys. Maybe we should take a Boy from that last family, since they're not finished having Girls! At any rate, the number of each is essentially the same. If I had started with 1 GigaFamily we would be left with only that discrepancy of one or two. 



#14
Jan111, 07:25 PM

Mentor
P: 15,625

At some level, this problem reduces to "what is the name of the bus driver's mother?" The answer is given in the question: a 50% chance of giving birth to a boy means that in a sample of N births, as N gets large, the number of boys approaches N/2.
I mention in passing that if it were possible to change this expectation by a stopping algorithm, that same algorithm could be used with lottery tickets. 



#15
Jan111, 07:53 PM

P: 43

Here is the response I just posted on Landsburg's blog. Seems to make sense. I apologize in advance for the crappy notation.




#16
Jan111, 08:23 PM

P: 147

I think you can formulate an equivalent problem in the following way:
You're given a binary string, x, of length N bits. Each bit in the string is chosen randomly from {0,1}, except the last bit of the string, whose value is always 1. Let z be the number of zeros occurring in x. What is the expectation value of z/N. The population problem can be cast in terms of the binary string problem by representing the birth sequence for each family as a binary string (so that for example GGB = 001, GGGB = 0001, etc) then concatenating the binary strings for each family into one big long string. 



#17
Jan111, 08:48 PM

Mentor
P: 14,477

There is an easy way to look at the problem: "births are always 50/50". The policy does nothing done to change that. Births are still 50/50, so the population will still be 50/50. If you compute a result that says otherwise says you need to doublecheck your assumptions and your calculations.




#18
Jan111, 08:53 PM

P: 617

If you are trying to think of how to model this with a computer (which is what your langauge seems to suggest that you’re thinking of) it’s really simple. Let r be your random variable, let b,g,t count the number of boys girls and total respectivly. Let i be a dummy variable and p be your population size. Initialize everything to 0. In C it would look something like this if you very litterally put it in code (my syntax is off I’m sure, I haven’t programed in a few years). 1. do while i < p; 2. { 3. r = rnd(0,1); 4. Do while r=0; 5. { g++; 6. t++; 7. r = rnd(0,1); } 8. b++; 9. t++; 10.i++} think of adding one to i as starting a new family. But this is obviously equivlent to: 1. do while i < p; 2. {i++ 3. r = rnd(0,1); 4. if r=0 then g++, t++; go to line 1; 5. if r=1 then b++, t++ got to line 1; 6. } Which trivially gives a 1:1 ratio statistically. 


Register to reply 
Related Discussions  
Extract Locations from XML feed from private Google calendar to use on Google map.  Computers  0  
Interview Question  Career Guidance  4  
yoyo interview question  Introductory Physics Homework  3  
Interview Question  Classical Physics  2 