Discrete probability distribution

In summary: For a discrete distribution, the median is the value of x at which the cdf F(x) jumps from below 1/2 to 1/2 or above. On a plot, it is the point where the line y = 1/2 cuts the graph y = F(x), with vertical segments drawn at the jumps.
  • #1
toothpaste666

Homework Statement


1. Consider selecting at random a student who is among the 15,000 registered for the current semester at a school. Let X be the number of courses for which the selected student is registered and suppose that X has probability distribution

x: 1 2 3 4 5 6
f(x): .01 .03 .13 .25 .39 .19

(a) Find the cdf of X.

(b) Find the expected number of courses taken by a student in this semester.

(c) Find the standard deviation of X.

(d) Find the median of this distribution.

The Attempt at a Solution


a) I am confused about what they want in this part, since X is not specified. It seems like they already provided me with the cdf.

b) this is the summation of the xf(x) 's
(1)(.01) + 2(.03) + 3(.13) + 4(.25) + 5(.39) + 6(.19) = 4.55

c) the variance is the sum of the (x-μ)^2f(x) 's
(1-4.55)^2(.01) + (2-4.55)^2(.03) + (3-4.55)^2(.13) + (4-4.55)^2(.25) + (5-4.55)^2(.39) + (6-4.55)^2(.19)
= 1.1875
and the standard deviation is the square root of that
= 1.09

d) put them in order
.01 .03 .13 .19 .25 .39

the median is (.13 + .19)/2 = .16

unless I am trying to find the median number of courses taken?
in that case
1 2 3 6 4 5
(3+6)/2 = 9/2 = 4.5 courses

I am not that confident I did this right because I didn't use the number of registered students they gave me and I don't think I understood part a)
 
  • #2
In (a): of course X is specified. You wrote "Let X be the number of courses for which the selected student is registered and suppose that X has probability distribution..."

I suppose you may be a bit confused about who the "selected student" is, and how his/her course probability distribution is obtained. Basically, the problem is just specifying that 1% of the students take exactly 1 course, that 3% take exactly 2 courses, etc. And, of course, if you actually look at student John Smith he will be taking some specific number of courses, either 1 or 2 or 3 or 4 or 5 or 6, with no probabilities involved anywhere. However, prior to the selection, you will not know the actual number that is going to occur, only the chances of the various numbers. Is that what was throwing you off?

Just to be accurate: the standard deviation is not 1.09; it is approximately 1.089724736, which is, in turn, approximately 1.09. Saying "approximately" instead of "equals" will not hurt you, and it makes clear that you understand the difference.
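The arithmetic in parts (b) and (c) can be checked with a short script. This is only a sketch: the variable names (`pmf`, `mean`, `variance`, `std_dev`) are illustrative, and the probabilities are stored as exact fractions so that rounding appears only in the final square root.

```python
from fractions import Fraction
from math import sqrt

# pmf from the problem statement, stored as exact fractions
pmf = {1: Fraction(1, 100), 2: Fraction(3, 100), 3: Fraction(13, 100),
       4: Fraction(25, 100), 5: Fraction(39, 100), 6: Fraction(19, 100)}

assert sum(pmf.values()) == 1  # a valid pmf sums to 1

mean = sum(x * p for x, p in pmf.items())                    # E[X]
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # Var(X)
std_dev = sqrt(float(variance))                              # approximate

print(float(mean))        # 4.55
print(float(variance))    # 1.1875
print(round(std_dev, 4))  # 1.0897
```

The mean and variance come out exactly as in the attempt; only the standard deviation is an approximation.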

The median of the distribution is not what you wrote (except, maybe, by accident): in probability and statistics, the median is the 50th percentile on the cdf. So, if you plot the graph y = F(x) of the cdf F (including vertical line segments at the jumps of F), you can think of the median as the point x where the line y = 1/2 cuts the graph y = F(x). (Sometimes, if F(x) = 1/2 on an interval [a,b), the whole segment a->b can be thought of as a median, but usually one would pick a point in the interval and use that as the median. There may be different conventions for how to do that.)
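For part (a), the cdf is the running total of the pmf, F(x) = P(X ≤ x). A minimal sketch of that tabulation follows; the names `xs`, `probs` and `cdf` are illustrative, and exact fractions are used to avoid floating-point noise in the sums.

```python
from fractions import Fraction
from itertools import accumulate

xs = [1, 2, 3, 4, 5, 6]
probs = [Fraction(n, 100) for n in (1, 3, 13, 25, 39, 19)]

# F(x) = P(X <= x): running total of the pmf at each support point
cdf = dict(zip(xs, accumulate(probs)))

for x, F in cdf.items():
    print(x, float(F))
```

Between support points F stays flat, so the graph is a staircase that jumps at x = 1, 2, ..., 6. The level y = 1/2 falls between F(4) and F(5).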

The kind of "median" you computed (averaging the two middle values of a list) would be OK only for a uniform distribution, where each of the six points had probability 1/6.
 
  • #3
for part a) X is the number of courses he is registered for so the cdf for X is the sum of all the f(x)'s <= f(X) ?

for the median I need to find the number of courses that 50% of the students have less than or equal to. 42% have less than or equal to 4 and 81% have less than or equal to 5. So the median would be somewhere in between 4 and 5, but since there are only integer numbers of courses there is no number of courses in between 4 and 5. I am sorry if I am just not getting it, but I still don't understand how this works in the discrete case
 
  • #4
Have you tried what I suggested? Did you plot the graph of y = F(x) (with vertical lines inserted at the jump points) then find where that graph meets the line y = 1/2? That will give you the median. It will be a whole number of courses, not a fraction. How it works in the discrete case is exactly as I have described it.

You don't believe it? Think of it this way. There are N = 15,000 students. Of these, 1% are taking one course, so the number taking one course is N1 = 0.01*15,000 = 150. Next, 3% of them are taking 2 courses, so the number taking 2 courses is 0.03*15,000 = 450, etc., etc. Number the students from 1 to 15,000 in order of the number of courses they take, so students 1-150 are in 1 course, students 151-600 are in 2 courses, etc., etc. Half the students are below the median and half are above, so the median student number can be taken as number 7,500 or 7,501. How many courses are students 7,500 and 7,501 taking? That is the median of the course distribution, and that is exactly what you would get if you carried out the procedure I suggested.

Another way to look at it is to graph the function G(x) = 15,000*F(x). The plot of y = G(x) looks like the cdf, but goes from y = 0 to y = 15,000 instead of from 0 to 1. Basically, it is plotting student numbers instead of probabilities. Now look at the two halves 0-7,500 and 7,501-15,000 on the y-axis, and see where they come out on the x-axis.
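The student-numbering argument can be sketched in a few lines. The helper `courses_for` and the list names are hypothetical, introduced only for illustration; `last_student` holds the values of G(x) = 15,000*F(x) at x = 1, ..., 6.

```python
from bisect import bisect_left
from itertools import accumulate

N = 15_000
courses = [1, 2, 3, 4, 5, 6]
percent = [1, 3, 13, 25, 39, 19]           # f(x) in whole percent

counts = [N * p // 100 for p in percent]   # students per course load
last_student = list(accumulate(counts))    # G(x) = N * F(x)

def courses_for(student_number):
    """Course load of the k-th student when students are sorted by load."""
    return courses[bisect_left(last_student, student_number)]

print(counts)        # [150, 450, 1950, 3750, 5850, 2850]
print(last_student)  # [150, 600, 2550, 6300, 12150, 15000]
print(courses_for(7_500), courses_for(7_501))
```

Students 6,301 through 12,150 form one block with the same course load, and both 7,500 and 7,501 fall inside it.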
 
  • #5
so I plotted the cumulative distribution and the line y = 1/2 cuts both 5 and 6 courses. I think since 5 is the first one it cuts (the first time the cumulative probability is greater than or equal to .5) 5 courses would be the median
 
  • #6
You plot it once or twice in your life until you develop some intuition---then you can throw away the plots. The median ##m## is the value for which ##F(m) \geq 1/2## while ##F(m-0) \leq 1/2##. Here, ##F(m-0) = ## limit of ##F(x)## as ##x \to m## from below. For a discrete random variable ##X## we will typically have ##F(m-0) = ## value of ##F(x)## at the preceding data point. Basically, the median ##m## is the value of ##x## where ##F## jumps up from below 1/2 to 1/2 or above.
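The criterion above can be applied mechanically to the course distribution. The sketch below is one possible way to do it, not a standard library routine: `F_left` is an illustrative name for the left limits F(m-0), taken as the cdf value at the preceding support point (and 0 before the first one).

```python
from fractions import Fraction
from itertools import accumulate

xs = [1, 2, 3, 4, 5, 6]
probs = [Fraction(n, 100) for n in (1, 3, 13, 25, 39, 19)]

F = list(accumulate(probs))      # F(m) at each support point
F_left = [Fraction(0)] + F[:-1]  # F(m-0): cdf value just below each point

half = Fraction(1, 2)
# keep every support point where F(m) >= 1/2 and F(m-0) <= 1/2
medians = [x for x, hi, lo in zip(xs, F, F_left) if hi >= half and lo <= half]
print(medians)  # [5]
```

If F were exactly 1/2 on a whole interval, this search would return the support points at both ends of that interval, which matches the earlier remark that a convention is then needed to pick a single value.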
 
  • #7
So it is 5? I think I am starting to understand
 
  • #8
I get 5 as well.
 
  • #9
thanks for all your help. Sorry I tend to get stuck at times =\
 

What is a discrete probability distribution?

A discrete probability distribution is a statistical concept that describes the probability of each possible outcome of a discrete random variable. It is represented by a table, graph or formula and shows the probabilities associated with each possible value of the random variable.

How is a discrete probability distribution different from a continuous probability distribution?

A discrete probability distribution deals with discrete random variables, which have a finite or countably infinite set of possible outcomes, each with its own probability. In contrast, a continuous probability distribution deals with continuous random variables, which can take any value in an interval. There, probabilities are assigned to ranges of values through a density function, and any single value has probability zero.

What are some examples of discrete probability distributions?

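Some common examples of discrete probability distributions include the binomial distribution, Poisson distribution, and geometric distribution. These distributions model counts: the binomial gives the number of successes in a fixed number of trials (such as heads in a series of coin flips), the Poisson gives the number of events in a fixed interval (such as customers arriving at a store in a given time period), and the geometric gives the number of trials needed to get the first success. The Poisson and geometric distributions have infinitely many possible values.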
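As a rough illustration, the probability mass functions of these three distributions can be written with the Python standard library alone. The function names are hypothetical, and the geometric distribution is taken here in the "trial number of the first success" convention (k = 1, 2, 3, ...); some texts count failures before the first success instead.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k): k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k): k events in an interval when the mean count is lam."""
    return exp(-lam) * lam**k / factorial(k)

def geometric_pmf(k, p):
    """P(X = k): the first success occurs on trial k, for k = 1, 2, 3, ..."""
    return (1 - p)**(k - 1) * p

# Probability of exactly one head in two fair coin flips
print(binomial_pmf(1, 2, 0.5))  # 0.5
```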

How do you calculate probabilities using a discrete probability distribution?

To calculate probabilities using a discrete probability distribution, you first need the probability of each possible outcome, given either by a formula or by a probability distribution table. The probability of a single outcome is read off directly. The probability of an event made up of several outcomes, such as X ≤ 4, is the sum of the probabilities of those outcomes.
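As a small worked example, using the course-load distribution from the thread above, the probability of an event is found by adding the pmf over the outcomes that make up the event. The helper `prob` is a hypothetical name used only for this sketch.

```python
from fractions import Fraction

# pmf of the number of courses, x = 1..6, as exact fractions
pmf = {x: Fraction(n, 100) for x, n in zip(range(1, 7), (1, 3, 13, 25, 39, 19))}

def prob(event):
    """P(X in event): add the pmf over the outcomes that satisfy the event."""
    return sum(p for x, p in pmf.items() if event(x))

p_at_least_4 = prob(lambda x: x >= 4)  # .25 + .39 + .19
p_at_most_2 = prob(lambda x: x <= 2)   # .01 + .03

print(float(p_at_least_4), float(p_at_most_2))  # 0.83 0.04
```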

What is the importance of understanding discrete probability distributions?

Understanding discrete probability distributions is important in many fields, including statistics, finance, and engineering. It allows us to analyze and predict the likelihood of certain events occurring, which is useful for decision-making and risk management. It also serves as the foundation for more complex statistical concepts, such as hypothesis testing and regression analysis.
