# Discrete probability distribution

1. Oct 18, 2015

### toothpaste666

1. The problem statement, all variables and given/known data
1. Consider selecting at random a student who is among the 15,000 registered for the current semester at a school. Let X be the number of courses for which the selected student is registered, and suppose that X has probability distribution

| x | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| f(x) | .01 | .03 | .13 | .25 | .39 | .19 |

(a) Find the cdf of X.

(b) Find the expected number of courses taken by a student in this semester.

(c) Find the standard deviation of X.

(d) Find the median of this distribution.

3. The attempt at a solution
a) I am confused about what they want in this part, since X is not specified. It seems like they already provided me with the cdf.

b) This is the sum of the x f(x)'s:
(1)(.01) + 2(.03) + 3(.13) + 4(.25) + 5(.39) + 6(.19) = 4.55

c) the variance is the sum of the (x-μ)^2f(x) 's
(1-4.55)^2(.01) + (2-4.55)^2(.03) + (3-4.55)^2(.13) + (4-4.55)^2(.25) + (5-4.55)^2(.39) + (6-4.55)^2(.19)
= 1.1875
and the standard deviation is the square root of that
= 1.09
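
As a quick check of the arithmetic in parts b) and c), here is a minimal Python sketch. The names `pmf`, `mean`, `variance`, and `std_dev` are illustrative choices, not part of the problem statement.

```python
import math

# Distribution from the problem statement: x -> f(x)
pmf = {1: 0.01, 2: 0.03, 3: 0.13, 4: 0.25, 5: 0.39, 6: 0.19}

# Expected value: sum of x * f(x)
mean = sum(x * p for x, p in pmf.items())

# Variance: sum of (x - mean)^2 * f(x); standard deviation is its square root
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
std_dev = math.sqrt(variance)

print(round(mean, 4), round(variance, 4), round(std_dev, 4))
# prints 4.55 1.1875 1.0897
```

Note that the 15,000 figure is not needed here: the mean and variance depend only on the proportions f(x).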

d) put them in order
.01 .03 .13 .19 .25 .39

the median is (.13 + .19)/2 = .16

unless I am trying to find the median number of courses taken?
in that case
1 2 3 6 4 5
(3+6)/2 = 9/2 = 4.5 courses

I am not that confident I did this right because I didn't use the number of registered students they gave me and I don't think I understood part a)

2. Oct 18, 2015

### Ray Vickson

In (a): of course X is specified. You wrote "Let X be the number of courses for which the selected student is registered and suppose that X has probability distribution..."

I suppose you may be a bit confused about who the "selected student" is, and how his/her course probability distribution is obtained. Basically, the problem is just specifying that 1% of the students take exactly 1 course, that 3% take exactly two courses, etc. And, of course, if you actually look at student John Smith he will be taking some specific number of courses, either 1 or 2 or 3 or 4 or 5 or 6, with no probabilities involved anywhere. However, prior to the selection, you will not know the actual number that is going to occur, only the chances of the various numbers. Is that what was throwing you off?

Just to be accurate: the standard deviation is not 1.09; it is approximately 1.089724736, which is, in turn, approximately 1.09. Saying "approximately" instead of "equals" will not hurt you, and it makes clear that you understand the difference.

The median of the distribution is not what you wrote (except, maybe, by accident): in probability and statistics, the median is the 50th percentile on the cdf. So, if you plot the graph y = F(x) of the cdf F (including vertical line segments at the jumps of F), you can think of the median as the point x where the line y = 1/2 cuts the graph y = F(x). (Sometimes, if F(x) = 1/2 on an interval [a,b), the whole segment a->b can be thought of as a median, but usually one would pick a point in the interval and use that as the median. There may be different conventions for how to do that.)

The "median" you obtained would be OK for the uniform distribution, where each of the six points has probability 1/6.

3. Oct 18, 2015

### toothpaste666

For part a), X is the number of courses he is registered for, so the cdf at x is the sum of the f(t)'s for all t <= x???

For the median, I need to find the number of courses that 50% of the students have less than or equal to. 42% have less than or equal to 4 and 81% have less than or equal to 5. So the median would be somewhere between 4 and 5, but since there are only integer numbers of courses, there is no number of courses between 4 and 5. I am sorry if I am just not getting it, but I still don't understand how this works in the discrete case.
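
For reference, the cumulative percentages quoted above (42% at 4 courses, 81% at 5) come from a running total of f(x). A minimal Python sketch, with `xs`, `fs`, and `cdf` as illustrative names:

```python
from itertools import accumulate

xs = [1, 2, 3, 4, 5, 6]
fs = [0.01, 0.03, 0.13, 0.25, 0.39, 0.19]

# F(x) = P(X <= x) is the running total of f over all values up to x.
# Rounding to 2 decimals only hides floating-point noise in the sums.
cdf = {x: round(F, 2) for x, F in zip(xs, accumulate(fs))}

print(cdf)
# prints {1: 0.01, 2: 0.04, 3: 0.17, 4: 0.42, 5: 0.81, 6: 1.0}
```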

4. Oct 19, 2015

### Ray Vickson

Have you tried what I suggested? Did you plot the graph of y = F(x) (with vertical lines inserted at the jump points) then find where that graph meets the line y = 1/2? That will give you the median. It will be a whole number of courses, not a fraction. How it works in the discrete case is exactly as I have described it.

You don't believe it? Think of it this way. There are N = 15,000 students. Of these, 1% are taking one course, so the number taking one course is N1 = 0.01*15,000 = 150. Next, 3% of them are taking 2 courses, so the number taking 2 courses is 0.03*15,000 = 450, etc., etc. Number the students from 1 to 15,000 in order of the number of courses they take, so students 1-150 are in 1 course, students 151-600 are in 2 courses, etc., etc. Half the students are below the median and half are above, so the median student number can be taken as number 7,500 or 7,501. How many courses are students 7,500 and 7,501 taking? That is the median of the course distribution, and that is exactly what you would get if you carried out the procedure I suggested.
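
The student-numbering argument can be sketched in Python as follows. This is only an illustration: `percent` and `courses_for_student` are hypothetical names, and the percentages are stored as whole numbers so the head counts stay exact integers.

```python
N = 15_000
# f(x) expressed as whole percentages, taken from the problem statement
percent = {1: 1, 2: 3, 3: 13, 4: 25, 5: 39, 6: 19}

def courses_for_student(k):
    """Courses taken by student number k (1-based) when the students
    are numbered in increasing order of course count."""
    last = 0  # highest student number assigned so far
    for x in sorted(percent):
        last += N * percent[x] // 100  # students taking exactly x courses
        if k <= last:
            return x
    raise ValueError("student number out of range")

# The two middle students of the 15,000
print(courses_for_student(7_500), courses_for_student(7_501))
```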

Another way to look at it is to graph the function G(x) = 15,000*F(x). The plot of y = G(x) looks like the cdf, but goes from y = 0 to y = 15,000 instead of from 0 to 1. Basically, it is plotting student numbers instead of probabilities. Now look at the two halves 0-7,500 and 7,501-15,000 on the y-axis, and see where they come out on the x-axis.

Last edited: Oct 19, 2015

5. Oct 19, 2015

### toothpaste666

so I plotted the cumulative distribution and the line y = 1/2 cuts both 5 and 6 courses. I think since 5 is the first one it cuts (the first time the cumulative probability is greater than or equal to .5) 5 courses would be the median

6. Oct 19, 2015

### Ray Vickson

You plot it once or twice in your life until you develop some intuition; then you can throw away the plots. The median $m$ is the value for which $F(m) \geq 1/2$ while $F(m-0) \leq 1/2$. Here, $F(m-0) =$ limit of $F(x)$ as $x \to m$ from below. For a discrete random variable $X$ we will typically have $F(m-0) =$ value of $F(x)$ at the preceding data point. Basically, the median $m$ is the value of $x$ where $F$ jumps up from below 1/2 to 1/2 or above.
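
The rule above translates into a short Python sketch. Two assumptions are worth flagging: the function name `discrete_median` is made up for illustration, and when $F$ equals exactly 1/2 on an interval the code returns the smallest qualifying value, which is only one of the possible conventions. Exact fractions are used so that the comparison with 1/2 is not affected by floating-point rounding.

```python
from fractions import Fraction

# f(x) from the problem statement, as exact fractions
pmf = {x: Fraction(p, 100)
       for x, p in {1: 1, 2: 3, 3: 13, 4: 25, 5: 39, 6: 19}.items()}

def discrete_median(pmf):
    """Smallest m with F(m) >= 1/2.

    Because every earlier value has F < 1/2, this m also satisfies
    F(m - 0) <= 1/2. Assumed convention: pick the smallest such m.
    """
    F = Fraction(0)
    for x in sorted(pmf):
        F += pmf[x]  # F(x), the cdf just after the jump at x
        if F >= Fraction(1, 2):
            return x
    raise ValueError("probabilities sum to less than 1/2")

print(discrete_median(pmf))
```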

7. Oct 19, 2015

### toothpaste666

So it is 5? I think I am starting to understand

8. Oct 19, 2015

### Ray Vickson

I get 5 as well.

9. Oct 19, 2015

### toothpaste666

thanks for all your help. Sorry I tend to get stuck at times =\