Discrete probability distribution

Homework Help Overview

This discussion revolves around a discrete probability distribution related to the number of courses students are registered for at a school. The problem involves calculating the cumulative distribution function (cdf), expected value, standard deviation, and median of the distribution based on given probabilities for different course enrollments.

Discussion Character

  • Exploratory, Conceptual clarification, Mathematical reasoning, Assumption checking

Approaches and Questions Raised

  • Participants explore the definition and calculation of the cdf, questioning what is required in part (a) of the problem. There are attempts to clarify the meaning of the median in the context of a discrete distribution, with some confusion about how to interpret cumulative probabilities and the role of the total number of students.

Discussion Status

Some participants have offered insights into the calculation of the cdf and median, suggesting methods to visualize the cumulative distribution. There is an ongoing exploration of how to determine the median based on cumulative probabilities, with differing interpretations of the results. No explicit consensus has been reached regarding the correct median value.

Contextual Notes

Participants express uncertainty about the implications of the total number of registered students and how it relates to the calculations. There is also mention of the need for clarity in understanding the definitions of statistical terms in the context of discrete distributions.

toothpaste666

Homework Statement


1. Consider selecting at random a student who is among the 15,000 registered for the current semester at a school. Let X be the number of courses for which the selected student is registered, and suppose that X has probability distribution

x:     1    2    3    4    5    6
f(x):  .01  .03  .13  .25  .39  .19

(a) Find the cdf of X.

(b) Find the expected number of courses taken by a student in this semester.

(c) Find the standard deviation of X.

(d) Find the median of this distribution.

The Attempt at a Solution


a) I am confused about what they want in this part, since X is not specified. It seems like they already provided me with the cdf.

b) this is the summation of the xf(x) 's
(1)(.01) + 2(.03) + 3(.13) + 4(.25) + 5(.39) + 6(.19) = 4.55

c) the variance is the sum of the (x-μ)^2f(x) 's
(1-4.55)^2(.01) + (2-4.55)^2(.03) + (3-4.55)^2(.13) + (4-4.55)^2(.25) + (5-4.55)^2(.39) + (6-4.55)^2(.19)
= 1.1875
and the standard deviation is the square root of that
= 1.09
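The arithmetic in parts (b) and (c) can be double-checked with a short script. This is only a sketch: the names `xs`, `fs`, `mean`, `variance` and `std_dev` are made up for illustration, and the numbers are copied from the table in the problem statement.

```python
import math

# Values and probabilities copied from the problem statement.
xs = [1, 2, 3, 4, 5, 6]
fs = [0.01, 0.03, 0.13, 0.25, 0.39, 0.19]

# Expected value: sum of x * f(x).
mean = sum(x * f for x, f in zip(xs, fs))

# Variance: sum of (x - mean)^2 * f(x); the standard deviation is its square root.
variance = sum((x - mean) ** 2 * f for x, f in zip(xs, fs))
std_dev = math.sqrt(variance)

print(round(mean, 4), round(variance, 4), round(std_dev, 4))
```

Rounded to four places this should agree with the hand calculation above: mean 4.55, variance 1.1875, standard deviation about 1.0897.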

d) put them in order
.01 .03 .13 .19 .25 .39

the median is (.13 + .19)/2 = .16

unless I am trying to find the median number of courses taken?
in that case
1 2 3 6 4 5
(3+6)/2 = 9/2 = 4.5 courses

I am not that confident I did this right because I didn't use the number of registered students they gave me and I don't think I understood part a)
 

In (a): of course X is specified. You wrote "Let X be the number of courses for which the selected student is registered and suppose that X has probability distribution..."

I suppose you may be a bit confused about who the "selected student" is, and how his/her course probability distribution is obtained. Basically, the problem is just specifying that 1% of the students take exactly 1 course, that 3% take exactly 2 courses, etc. And, of course, if you actually look at student John Smith he will be taking some specific number of courses, either 1 or 2 or 3 or 4 or 5 or 6, with no probabilities involved anywhere. However, prior to the selection, you will not know the actual number that is going to occur, only the chances of the various numbers. Is that what was throwing you off?

Just to be accurate: the standard deviation is not 1.09; it is approximately 1.089724736, which is, in turn, approximately 1.09. Saying "approximately" instead of "equals" will not hurt you, and it makes clear that you understand the difference.

The median of the distribution is not what you wrote (except, maybe, by accident): in probability and statistics, the median is the 50th percentile on the cdf. So, if you plot the graph y = F(x) of the cdf F (including vertical line segments at the jumps of F), you can think of the median as the point x where the line y = 1/2 cuts the graph y = F(x). (Sometimes, if F(x) = 1/2 on an interval [a,b), the whole segment a->b can be thought of as a median, but usually one would pick a point in the interval and use that as the median. There may be different conventions for how to do that.)
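Instead of drawing the graph by hand, the cdf values and the point where they first reach 1/2 can be tabulated. The following is a minimal sketch, assuming the usual convention that the median is the smallest x with F(x) >= 1/2; the names `cdf` and `median` are illustrative, and `Fraction` is used only to keep the running sums exact.

```python
from fractions import Fraction
from itertools import accumulate

xs = [1, 2, 3, 4, 5, 6]
# Probabilities from the problem statement, in hundredths, kept exact.
fs = [Fraction(n, 100) for n in (1, 3, 13, 25, 39, 19)]

# F(x) = P(X <= x): running total of f over the possible values.
cdf = dict(zip(xs, accumulate(fs)))

# Median (under the stated convention): smallest x whose cdf value is at least 1/2.
median = next(x for x in xs if cdf[x] >= Fraction(1, 2))

for x in xs:
    print(x, float(cdf[x]))
print("median:", median)
```

The printed table is the answer to part (a) at the six possible values; between those values F stays flat, and it is 0 below x = 1 and 1 from x = 6 onward.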

The kind of "median" you computed (averaging the two middle entries of a sorted list) would only be OK for a uniform distribution, where each of the six points had probability 1/6.
 
for part a) X is the number of courses he is registered for, so the cdf F(x) is the sum of all the f(t)'s with t <= x?

for the median I need to find the number of courses that 50% of the students have less than or equal to. 42% have less than or equal to 4 and 81% have less than or equal to 5. So the median would be somewhere in between 4 and 5, but since there are only integer numbers of courses there is no number of courses in between 4 and 5. I am sorry if I am just not getting it, but I still don't understand how this works in the discrete case
 
Have you tried what I suggested? Did you plot the graph of y = F(x) (with vertical lines inserted at the jump points) then find where that graph meets the line y = 1/2? That will give you the median. It will be a whole number of courses, not a fraction. How it works in the discrete case is exactly as I have described it.

You don't believe it? Think of it this way. There are N = 15,000 students. Of these, 1% are taking one course, so the number taking one course is N1 = 0.01*15,000 = 150. Next, 3% of them are taking 2 courses, so the number taking 2 courses is 0.03*15,000 = 450, etc., etc. Number the students from 1 to 15,000 in order of the number of courses they take, so students 1-150 are in 1 course, students 151-600 are in 2 courses, etc., etc. Half the students are below the median and half are above, so the median student number can be taken as number 7,500 or 7,501. How many courses are students 7,500 and 7,501 taking? That is the median of the course distribution, and that is exactly what you would get if you carried out the procedure I suggested.
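The student-numbering argument can be sketched in code as well. The helper `courses_of` and the other names below are hypothetical, introduced only for this illustration; the head counts are computed from whole percentages so everything stays in integers.

```python
from bisect import bisect_left
from itertools import accumulate

N = 15_000
xs = [1, 2, 3, 4, 5, 6]
percents = [1, 3, 13, 25, 39, 19]  # f(x) as whole percentages

# Head count for each course load, using integer arithmetic to avoid rounding.
counts = [N * p // 100 for p in percents]
# Number of the last student in each group once students are sorted by course load.
last_in_group = list(accumulate(counts))

def courses_of(student_number):
    """Course load of the student with the given 1-based position in the sorted list."""
    return xs[bisect_left(last_in_group, student_number)]

print(counts)
print(last_in_group)
print(courses_of(7_500), courses_of(7_501))
```

Students 6,301 through 12,150 all take 5 courses, so both middle students (7,500 and 7,501) land in that group.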

Another way to look at it is to graph the function G(x) = 15,000*F(x). The plot of y = G(x) looks like the cdf, but goes from y = 0 to y = 15,000 instead of from 0 to 1. Basically, it is plotting student numbers instead of probabilities. Now look at the two halves 0-7,500 and 7,501-15,000 on the y-axis, and see where they come out on the x-axis.
 
so I plotted the cumulative distribution and the line y = 1/2 cuts both 5 and 6 courses. I think since 5 is the first one it cuts (the first time the cumulative probability is greater than or equal to .5) 5 courses would be the median
 
You plot it once or twice in your life until you develop some intuition; then you can throw away the plots. The median ##m## is the value for which ##F(m) \geq 1/2## while ##F(m-0) \leq 1/2##. Here, ##F(m-0) = ## limit of ##F(x)## as ##x \to m## from below. For a discrete random variable ##X## we will typically have ##F(m-0) = ## value of ##F(x)## at the preceding data point. Basically, the median ##m## is the value of ##x## where ##F## jumps up from below 1/2 to 1/2 or above.
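As a worked check of that condition for this problem, using the running sums of the given ##f(x)## (so ##F(4) = .01 + .03 + .13 + .25 = .42## and ##F(5) = .42 + .39 = .81##):

$$F(5-0) = F(4) = 0.42 \;\leq\; \tfrac{1}{2} \;\leq\; 0.81 = F(5) \quad\Longrightarrow\quad m = 5.$$

No other value qualifies: at ##x = 4## we have ##F(4) = 0.42 < 1/2##, and at ##x = 6## we have ##F(6-0) = F(5) = 0.81 > 1/2##.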
 
So it is 5? I think I am starting to understand
 
I get 5 as well.
 
thanks for all your help. Sorry I tend to get stuck at times =\
 
