Probability of Getting 10 Pairs from a Box of 30 Socks

  • Thread starter juanitotruan77
  • Start date
  • Tags
    Probability
In summary, the conversation discusses a probability problem involving picking socks from a box containing 30 pairs. The question is to determine the most probable number of tries needed to get 10 pairs of socks. The conversation explores different approaches and equations, including the Gauss distribution, to solve the problem. It is suggested to lay out the sample space of all possible outcomes to better understand the problem. The conversation also discusses the conditions for an acceptable pair of socks and the difference between combinations and permutations. Ultimately, it is determined that finding the mode of the probability distribution will give the most probable number of tries needed to get 10 pairs of socks.
  • #1
juanitotruan77
44
0

Homework Statement


I have this problem that says that i have 30 pair of socks mixed in a box, and i have to make 10 pairs taking them one by one of the box. The question is "¿Wich is the most probable number of tries that you'll make before you get the 10 pairs?
(Sorry for my english, I'm not from a native english speaker country)

Homework Equations


I know it has something to do with the Gauss distribution, but that's it, my probability teacher is an A-hole.

The Attempt at a Solution


By doing a program in php that simulates the experiment, i get that the center of the gauss function graphic is 28 or 29.
 
Physics news on Phys.org
  • #2
It always helps to lay out the sample space . I assume the issue is socks of the same handedness, right, I mean you want to find 10 left and 10 right? Otherwise, what is the condition for an acceptable pair?
 
  • #3
Bacle2 said:
It always helps to lay out the sample space . I assume the issue is socks of the same handedness, right, I mean you want to find 10 left and 10 right? Otherwise, what is the condition for an acceptable pair?

Well they've to be the pair, same color, right and left. It can be shown as a number: 1 with 1, 2 with 2, etc.
 
  • #4
Why don't you try laying out the sample space, i.e., the set of all possible outcomes of pulling socks:

you can pull out : socks of different color, different numbers, etc.
 
  • #5
Bacle2 said:
Why don't you try laying out the sample space, i.e., the set of all possible outcomes of pulling socks:

you can pull out : socks of different color, different numbers, etc.

It's way simpler than that, there's just 2 events:1 the socks are a pair, 2 the socks aren't a pair.
 
  • #6
juanitotruan77 said:
It's way simpler than that, there's just 2 events:1 the socks are a pair, 2 the socks aren't a pair.

No. It's way more complicated than that. Try answering the question: what is the sample space?

If you are having trouble, start with a simpler case: say you have 4 pairs of socks (i.e., 8 socks) and you want to know the probability distribution of the number you have to pick until you get two good pairs.
 
  • #7
You could grind this out using a simpler case. Say there are 4 pairs and you're trying to get any 2 pairs.

1st draw - any sock is ok

2nd draw - 1/7 chance of getting the 1st sock's mate, 6/7 chance of getting a different sock.

3rd draw - if 2nd draw got a mate for 1st sock, any sock is ok, else 2/6 chance of getting 1st or 2nd sock's mate, 4/6 chance of getting a sock different than the first 2.

...
 
  • #8
rcgldr said:
You could grind this out using a simpler case. Say there are 4 pairs and you're trying to get any 2 pairs.

1st draw - any sock is ok

2nd draw - 1/7 chance of getting the 1st sock's mate, 6/7 chance of getting a different sock.

3rd draw - if 2nd draw got a mate for 1st sock, any sock is ok, else 2/6 chance of getting 1st or 2nd sock's mate, 4/6 chance of getting a sock different than the first 2.

...
The thing is that I'm not asking for the probability, I'm asking for the most probable number.
 
  • #9
My previous post was a bad example as it gets too complicated to follow that path. Using the same example, 4 pairs of socks and you want 2 pairs:

The probability is zero if you get 1, 2, or 3 socks, and 100% if you get 6, 7, or 8 socks. Then there's the probability of getting two pairs if you take 8 things 4 at a time, and the probability of getting two pairs if you take 8 things 5 at a time. This approach considers left socks to be different from right socks, but that issue is resolved as long as you only consider combinations (and not permutations) where the order does not matter.

For 8 things taken 4 at a time, you have 70 cases, of which 6 produce 2 pair.

For 8 things take 5 at a time, you have 56 cases, of which 12 produce 2 pair.

So the distribution and probablity is:

0 = 0/1
1 = 0/8
2 = 0/28
3 = 0/56
4 = 6/70
5 = 12/56
6 = 28/28
7 = 8/8
8 = 1/1

At this point, I'm not sure what the "most probable" number of socks taken to produce two pair is. The sum of the probabilties of taking 0 thorugh 5 socks to get two pair is .3, while the probabiliy for taking 6 or more is 1.0.
 
Last edited:
  • #10
juanitotruan77 said:
The thing is that I'm not asking for the probability, I'm asking for the most probable number.

In normal usage, this would mean the number with the highest probability; so, to find it, you need to find *all* the probabilities first, then pick out the largest one! You want what is called the mode of the probability distribution.
 
  • #11
rcgldr said:
My previous post was a bad example as it gets too complicated to follow that path. Using the same example, 4 pairs of socks and you want 2 pairs ... This approach considers left socks to be different from right socks, but that issue is resolved as long as you only consider combinations (and not permutations) where the order does not matter.
This is really treating socks like shoes (left and right), and trying to account for it afterwards, and I'm not sure this is a valid approach. Hopefully someone here will be able to answer your question properly.

If I grind through this using a program for 8 pairs of socks taken 4 at a time, out of the 1680 cases (8×7×6×5), you end up with this distribution (2 colors is the same as 2 pairs in this case). For the 3 colors case, the combination patterns a a b c, a b b c, and a b c c are produced and not permutations of each other.

Code:
2 colors =  144
3 colors = 1152
4 colors =  384
           ----
total      1680

This matches the results using combinations, 2 pair probability for 4 socks out of 8 = 6/70 = 144/1680.
 
Last edited:
  • #12
rcgldr said:
This is really treating socks like shoes (left and right), and trying to account for it afterwards, and I'm not sure this is a valid approach. Hopefully someone here will be able to answer your question properly.

If I grind through this using a program for 8 pairs of socks taken 4 at a time, out of the 1680 cases (8×7×6×5), you end up with this distribution (2 colors is the same as 2 pairs in this case). For the 3 colors case, the combination patterns a a b c, a b b c, and a b c c are produced and not permutations of each other.

Code:
2 colors =  144
3 colors = 1152
4 colors =  384
           ----
total      1680

This matches the results using combinations, 2 pair probability for 4 socks out of 8 = 6/70 = 144/1680.
Well, you take 1 sock at a time, there're 60 socks. I simulated it 400 times with my code and it gave me the following chart:
Code:
0|0
1|0
2|0
3|0
4|0
5|0
6|0
7|0
8|0
9|0
10|0
11|0
12|0
13|0
14|0
15|0
16|0
17|0
18|0
19|0
20|0
21|]]0.005
22|]]]0.0075
23|]]]]]]]]]0.0225
24|]]]]]]]]]]]]]]]]]]]]]0.0525
25|]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]0.08
26|]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]0.1175
27|]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]0.105
28|]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]0.16
29|]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]0.13
30|]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]0.1025
31|]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]0.09
32|]]]]]]]]]]]]]]]]]]]]]]0.055
33|]]]]]]]]]]]]]]]]]]0.045
34|]]]0.0075
35|]]]]0.01
36|]]0.005
37|]]0.005
38|0
39|0
40|0
41|0
42|0
43|0
44|0
45|0
46|0
47|0
48|0
49|0
50|0
51|0
52|0
53|0
54|0
55|0
56|0
57|0
58|0
59|0
60|0
Now i have to demonstrate that mathematically, that what i want to know, how to do it.
 
  • #13
Getting back to the original problem, there are 30 colors of socks, so after getting 30 socks, there's a tiny chance that you got all 30 colors and no pairs. You're guaranteed one pair when you get the 31st sock, and 10 pairs when you get the 40th sock. You'll have 11 pairs on the 41st sock, and all pairs on the 60th sock. So the smallest number of socks with 100% probability of getting 10 pairs is 40 socks.

60 things taken n at a time involves large numbers. For 60 things taken 30 at a time, I get 1.182645816 x 10^17.
 
Last edited:
  • #14
rcgldr said:
Getting back to the original problem, there are 30 colors of socks, so after getting 30 socks, there's a tiny chance that you got all 30 colors and no pairs. You're guaranteed one pair when you get the 31st sock, and 10 pairs when you get the 40th sock. You'll have 11 pairs on the 41st sock, and all pairs on the 60th sock. So the smallest number of socks with 100% probability of getting 10 pairs is 40 socks.
Yeah, actually that was the thing that originated the problem, my physics teacher told us that, but the he said that the most probable number was a lot less than that and told us to find out which one by doing a program and by math, but i don't know how to get it trough math.
 
  • #15
rcgldr said:
Getting back to the original problem, ... the smallest number of socks with 100% probability of getting 10 pairs is 40 socks.
That doesn't seem to me to get very far with the original problem.

Let's try some algebra: N pairs in the box, t chosen, looking for r pairs retrieved.
What is the probability of getting exactly r pairs in t socks chosen? Let this be P(r, t).
In terms of the P(r, t), what is the probability that it will take exactly t selections to achieve r pairs? Call this Q(r, t).
If tm maximises that, you would expect a certain relation between Q(r, tm-.5) and Q(r, tm+.5) to be approximately true. Putting all that together should give you a polynomial in tm, which luckily reduces to a quadratic. (Keeping N and r as variables makes it rather messy, though, so you might want to substitute the actual values at some point.)
Fwiw, I get 27.
 
  • #16
haruspex said:
What is the probability of getting exactly r pairs in t socks chosen?
So the key point here is to look for the most likely value of t socks to get exactly r pairs, as opposed to getting r or more than r pairs.
 
  • #17
rcgldr said:
So the key point here is to look for the most likely value of t socks to get exactly r pairs, as opposed to getting r or more than r pairs.

But how do i do it? like i said, i don't know a single thing of probability
 
  • #18
juanitotruan77 said:
But how do i do it? like i said, i don't know a single thing of probability
Then I don't understand how you can expect to solve this problem. I'll try to walk you through it:
N pairs in the box, t socks chosen, need to get r pairs. Notation: for ease I'll write xCy for the number of ways of choosing y things from x things. This evaluates to x!/(y!(x-y)!).

What is the probability of getting exactly r pairs in t socks chosen? Let this be P(r, t).
There are NCr possibilities for which r pairs you get. Having identified the pairs, you need to get both socks of each of those pairs, and there's only one way to do that. Then, you must choose one sock of each of t-2r of the remaining N-r pairs. There are (N-r)C(t-2r) ways to choose those pairs, and 2t-2r of choosing one of each of those pairs chosen. All this has to be compared with the total number of ways of choosing t socks from 2N:
P(r, t) = NCr * (N-r)C(t-2r) * 2t-2r / (2N)Ct

What is the probability that it will take exactly t selections to achieve r pairs? Call this Q(r, t).
Maybe you can figure this one out... Can you find a simple expression for Q(r, t) in terms of the P(r, t) set?

If tm maximises that, you would expect a certain relation between Q(r, tm-.5) and Q(r, tm+.5) to be approximately true. (It's like differentiation, but applied to a sequence.) Can you think what that relationship is?
 
  • #19
No dude, i can't, i don't quite follow you. It's useless, thanks for trying to help tho. I Appreciate it. I think i'll go to summer school to learn some probability
 
  • #20
I'm not an expert on probability, but I think I can help explain this.

Take the case where you draw 20 socks. The probability of getting 10 pairs is 30C10 / 60C20. It's 30C10 because it's the number of combinations of 10 colors out of a possible 30 colors (you're assuming the case where you got 10 pairs, so there are 10 colors, and the ordering of left / right sock doesn't matter). The 60C20 is the number of possible combinations by drawing 20 socks out of 60 (which is why (2N)C(t) was used in the denominator in haruspex's formula).

Next is the case where you draw 21 socks, get exactly 10 pairs, so the 1 extra sock can be any of the remaining 40 socks. In this case, you can think of this as left and right socks being unique when there's a sock being drawn that doesn't match any of the other colors in a set of t socks, since there are 2 of each sock of the same color and you could draw either one.

Next is the case where you draw 22 socks, get exactly 10 pairs, and the 2 extra socks have to be a different color than the 10 pair, and different from each other.

The 2^(...) component in haruspex's formula is due to the fact that there are two socks of each color.

If you follow this logic and compare the results with haruspex's formula it should work out. You may want to consider a smaller case, like getting exactly 2 pairs from 10 socks of 5 colors which is small enough that you could use a program to create a list of all the possible outcomes (252 max number of combinations with 10 socks).

You can use haruspex's formula in a program for 20 <= t <= 60, and find the t with the highest probability of getting exactly 10 pairs. If the relationship between t and probabilty is quadractic as mentioned by haruspex, you could use a quadratic least squares fit on t versus probablity to find t to a number of decimal places.

I have a documument with example code for doing polynomial least squre fit that doesn't involve inverting a matrix, which can be an isssue for higher degree polynomials. Link to that document (rtf fiile):

http://rcgldr.net/misc/opls.rtf
 
Last edited:
  • #21
rcgldr said:
I'm not an expert on probability, but I think I can help explain this.

Take the case where you draw 20 socks. The probability of getting 10 pairs is 30C10 / 60C20. It's 30C10 because it's the number of combinations of 10 colors out of a possible 30 colors (you're assuming the case where you got 10 pairs, so there are 10 colors, and the ordering of left / right sock doesn't matter). The 60C20 is the number of possible combinations by drawing 20 socks out of 60 (which is why (2N)C(t) was used in the denominator in haruspex's formula).

i think a get this. So i have to do this formula 30C10 / 60Cx to every value from 20 to 40 and see which one is bigger?
 
  • #22
juanitotruan77 said:
i think a get this. So i have to do this formula 30C10 / 60Cx to every value from 20 to 40 and see which one is bigger?
No, there is an algebraic way, as I posted earlier:
You have a formula for P(r, t), the probability that after t selections you will have exactly r pairs. But you might have had r pairs already after t-1 selections. Can you see how to obtain from P(r,t) the probability Q(r, t) that it will take exactly t selections to achieve r pairs? (It's a very simple relationship.)
Having got that, you have to find where the sequence peaks. Assuming it's unimodal, there's quite an easy way to do that. If it peaks at tm, you would expect a certain relation between Q(r, tm-.5) and Q(r, tm+.5) to be approximately true. (It's like differentiation, but applied to a sequence.) Can you think what that relationship is?
 
  • #23
Ok i think i get it a little bit more, so i do P(r, t) = NCr * (N-r)C(t-2r) * 2t-2r / (2N)Ct This is the probability of me getting r pairs after t selections, right? But now i don't get what q(r,t) is.
 
  • #24
Oh i got it, at least the first formula. I made a code that applied the formula of P(r,t) from 20 to 60 t. and i got this:
Code:
20 1.4334985450147E-8
21 2.2577602083981E-7
22 0
23 1.464649571089E-6
24 2.4226095608284E-5
25 6.2808396021476E-5
26 6.6653808022791E-5
27 0.00013232741298642
28 0.00036490286611408
29 0.00048501672620996
30 0.00037549682029158
31 0.00039683186689906
32 0.00051086401255971
33 0.00039698223895927
34 0.00019996142406837
35 0.00011664416403988
36 7.2678286824851E-5
37 2.7828341850692E-5
38 6.4529488349431E-6
39 1.2793974514256E-6
40 1.5768483995162E-7
41 1.6092727713677E-8
42 1.4060631661627E-9
43 1.5334239601992E-10
44 2.0474288170727E-11
45 2.3919480891763E-12
46 2.5077974268287E-13
47 3.2480841892898E-14
48 5.1398255303048E-15
49 7.4783476825028E-16
50 1.0234297903718E-16
51 1.74383947E-17
52 3.6914609470306E-18
53 7.6288547869225E-19
54 1.5825523495561E-19
55 4.2743043369707E-20
56 1.5312659308205E-20
57 6.0526389833106E-21
58 2.8425349071418E-21
59 2.2098481845764E-21
60 3.762714476441E-21
That's the answer I'm looking for, right? Does the (2N)Ct affect the whole NCr * (N-r)C(t-2r) * 2t-2r term or just
2^(t-2r)?
 
Last edited:
  • #25
Somethings wrong here. Say you get lucky and draw 30 socks, all different colors. The next 10 socks you pick will end up producing 10 pairs. Any sock you pick after that (41 through 60) produces more than 10 pairs. The probability of exactly 10 pairs for t = 41 thorugh 60 is zero.
 
  • #26
rcgldr said:
Somethings wrong here. Say you get lucky and draw 30 socks, all different colors. The next 10 socks you pick will end up producing 10 pairs. Any sock you pick after that (41 through 60) produces more than 10 pairs. The probability of exactly 10 pairs for t = 41 thorugh 60 is zero.

That's what i was thinking, you can't have 10 pairs in less than 20 socks either.
 
  • #27
juanitotruan77 said:
That's what i was thinking, you can't have 10 pairs in less than 20 socks either.
The formula is probably OK for t = 20 to t = 40, just don't use it outside that range.
 
  • #28
rcgldr said:
The formula is probably OK for t = 20 to t = 40, just don't use it outside that range.

The formula's ok. It produces zero for excess values of t. But the software that interprets nCr when r > n might do strange things.
 
  • #29
juanitotruan77 said:
This is the probability of me getting r pairs after t selections, right? But now i don't get what q(r,t) is.
P(r,t) is the probability that you will have r pairs after t selections. But you want the probability of needing t selections to get r pairs. You might have had r pairs already after t-1 selections.
 
  • #30
rcgldr said:
The formula is probably OK for t = 20 to t = 40, just don't use it outside that range.
but what about the 22=0?
 
  • #31
haruspex said:
P(r,t) is the probability that you will have r pairs after t selections. But you want the probability of needing t selections to get r pairs. You might have had r pairs already after t-1 selections.

So i how do i do that?
 
  • #32
juanitotruan77 said:
but what about the 22=0?
Check your program, 22 should not be 0.
 
  • #33
rcgldr said:
Check your program, 22 should not be 0.

i don't know if I am wrinting the formula correctly.
Prob= NCr * (N-r)C(t-2r) * ((2^t-2r) / (2N)Ct)
 
  • #34
P(r, t) = NCr × (N-r)C(t-2r) × 2^(t-2r) / (2N)Ct
 
  • #35
Does the division affect the whole term or just the 2^(t-2r) ?
 

Similar threads

  • Precalculus Mathematics Homework Help
Replies
24
Views
1K
  • Precalculus Mathematics Homework Help
Replies
16
Views
2K
  • Calculus and Beyond Homework Help
Replies
3
Views
737
  • Precalculus Mathematics Homework Help
Replies
11
Views
3K
  • Precalculus Mathematics Homework Help
Replies
12
Views
2K
  • Precalculus Mathematics Homework Help
Replies
5
Views
4K
  • Precalculus Mathematics Homework Help
Replies
2
Views
1K
  • Advanced Physics Homework Help
Replies
11
Views
1K
Replies
2
Views
1K
Back
Top