# Normal distribution

## Homework Statement

Find P(X>130) where
P(X > 130) = 1 − Φ( (130 − µ)/σ )

## Homework Equations

Φ is the normal distribution density function:
http://en.wikipedia.org/wiki/Normal_distribution

## The Attempt at a Solution

This should be simple to compute in R if one knows how, but I get a wildly incorrect answer. Dividing the number of items above 130 by the total number of items yields 0.022, while the formula above yields something like 0.4e-5 if I use the mean and stdev from the data set. So which µ and σ are meant? They can't be the ones from the data set.

RUber
Homework Helper
Your Phi should be the cumulative distribution function (CDF) of the normal, not the density. What are your mean and stdev for this set?

Mean: 54.51222
Stddev: 19.16929

RUber
Homework Helper
You wrote that 22% of your data points were above 130. That does not jibe with the mean and stdev you posted.
130 is about 4 standard deviations above your stated mean. There is a very low statistical probability that those data came from a set with the mean and stdev you gave.
Where did those numbers come from? You clearly could not have calculated the mean and stdev from the same data set that has 22% above 130.
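
The figures in this exchange can be checked with a short sketch. It is written in Python using only the standard library (the thread itself uses R, where `pnorm` plays the same role), and it assumes the mean and stdev quoted above; the variable names are illustrative.

```python
from statistics import NormalDist

# Values quoted in the thread (computed by the poster from the data set)
mu = 54.51222
sigma = 19.16929
threshold = 130

# Number of standard deviations between the threshold and the mean
z = (threshold - mu) / sigma

# Upper-tail probability P(X > 130) = 1 - Phi((130 - mu) / sigma),
# where Phi is the normal CDF (not the density)
tail = 1 - NormalDist(mu, sigma).cdf(threshold)

print(f"z = {z:.2f}")
print(f"P(X > {threshold}) = {tail:.2e}")
```

With these inputs the threshold sits just under 4 standard deviations above the mean, so the normal tail probability is tiny compared with any observed fraction in the percent range.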

Oh, sorry, I meant 0.022.

RUber
Homework Helper
A fraction of 0.022 of the data points above 130 implies that 130 should be about 2 stdevs above the mean. My question is, did you calculate the mean and stdev from this data set or were they given to you? If you calculated them, double check your work. If they were given, then go with your initial answer, since observed data have nothing to do with a probability based on a given mean and stdev.

I was not given the values; I calculated them in R from the data set I was given.

RUber
Homework Helper
It feels like your stdev is about 1/2 of what it should be based on what you are saying. Can you verify the calculation in R?
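
As a rough check of that estimate, the sketch below inverts the normal CDF to find how many standard deviations correspond to an upper tail of 0.022, and then the stdev that would put 130 at that distance from the quoted mean. It is Python with the standard library only; it assumes the data really are normal with the quoted mean, and the variable names are illustrative.

```python
from statistics import NormalDist

mu = 54.51222              # sample mean quoted in the thread
sigma_reported = 19.16929  # sample stdev quoted in the thread
observed_fraction = 0.022  # share of data points above 130
threshold = 130

# z-value whose upper tail equals the observed fraction
z_implied = NormalDist().inv_cdf(1 - observed_fraction)

# Stdev that would place the threshold z_implied stdevs above the mean
sigma_implied = (threshold - mu) / z_implied

print(f"implied z     = {z_implied:.2f}")
print(f"implied stdev = {sigma_implied:.1f}")
print(f"reported / implied = {sigma_reported / sigma_implied:.2f}")
```

The implied z is close to 2 and the implied stdev is roughly twice the reported one, which is the mismatch being pointed out here.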

To the best of my ability, yes. I just call the `sd(x)` and `mean(x)` functions in R.

RUber
Homework Helper
Is there anything odd about your settings for the functions? `mean` can be set to trim off extreme values, and both functions may have other settings available.

It seems odd to me. The probability you computed is right for that mean and stdev, yet it doesn't match the observed data, unless there is some skewness in the data that breaks the normal assumption in the tails.
In that case, a lognormal distribution sometimes clears it up.

I think I've figured it out, and it is much like what you suggested. The sample was intentionally very inaccurate in one place, to emphasize the effect of chance on random selection. That place mattered for this exact question, but it didn't matter much for how well the data as a whole fit the normal distribution. Both the logarithmic and "regular" forms of the normal distribution gave the exact same value for P(X>130). That value is very different from the ratio of X>130 within the data set, because the small sample size skewed things quite a bit.

I think this is it at least.

RUber
Homework Helper
That makes sense. Normally, if you are making inferences about a data set, it is best to rely on empirical data, like the ratio you computed. However, this question seems to be asking for the probability under a normal model, assuming that your mean and stdev are correct.