# Normal distribution

1. Mar 12, 2015

### gummz

1. The problem statement, all variables and given/known data
Find P(X>130) where
P(X > 130) = 1 − Φ( (130 − µ)/σ )

2. Relevant equations
Φ is the normal distribution density function:
http://en.wikipedia.org/wiki/Normal_distribution

3. The attempt at a solution
This should be simple to compute in R if one knows how, but I get a wildly incorrect answer. Dividing the number of items above 130 by the total number of items yields 0.022, while the formula above yields something like 0.4e-5 when I use the mean and stdev from the data set. So which µ and σ are meant? They can't be the ones from the data set.

Last edited: Mar 12, 2015
2. Mar 12, 2015

### RUber

Your Phi should be the cumulative (CDF) function for the normal. What is your mean and stdev for this set?

3. Mar 12, 2015

### gummz

Mean: 54.51222
Stddev: 19.16929
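
For reference, the formula from the first post can be evaluated directly with these two numbers. The following is a minimal sketch in Python using only the standard library; it assumes the data are treated as normal with exactly the posted mean and stdev, and the variable names are illustrative. In R the corresponding call would be `pnorm(130, mean = 54.51222, sd = 19.16929, lower.tail = FALSE)`.

```python
from statistics import NormalDist

mu = 54.51222      # sample mean posted in the thread
sigma = 19.16929   # sample standard deviation posted in the thread
threshold = 130

# Number of standard deviations between the threshold and the mean.
z = (threshold - mu) / sigma

# Upper-tail probability P(X > 130) = 1 - Phi(z),
# where Phi is the standard normal CDF (not the density).
p_tail = 1 - NormalDist().cdf(z)

print(f"z = {z:.3f}")
print(f"P(X > 130) = {p_tail:.2e}")
```

The threshold sits almost 4 standard deviations above the mean, so the tail probability comes out on the order of 1e-5, far below the observed fraction of 0.022.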

4. Mar 12, 2015

### RUber

You wrote that 22% of your data points were above 130. That does not jibe with the mean and stdev you posted.
130 is about 4 standard deviations above your stated mean. There is a very low statistical probability that those data came from a set with the mean and stdev you gave.
Where did those numbers come from? You clearly could not have calculated the mean and stdev from the same data set that has 22% above 130.

5. Mar 12, 2015

### gummz

Oh, sorry I meant 0.022

6. Mar 12, 2015

### RUber

A fraction of 0.022 of the data points above 130 implies that 130 should be about 2 stdevs above the mean. My question is, did you calculate the mean and stdev from this data set or were they given to you? If you calculated them, double check your work. If they were given, then go with your initial answer, since observed data have nothing to do with a probability based on a given mean and stdev.

7. Mar 12, 2015

### gummz

I was not given the values, I calculated them in R with the data set I was given.

8. Mar 12, 2015

### RUber

It feels like your stdev is about 1/2 of what it should be based on what you are saying. Can you verify the calculation in R?
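
The "about 2 stdevs" and "about 1/2" estimates can be checked by inverting the normal CDF. This is a rough sketch in Python (standard library only), assuming the posted mean is correct and that the observed fraction 0.022 is the true tail probability; the variable names are illustrative.

```python
from statistics import NormalDist

mu = 54.51222              # sample mean posted in the thread
sigma_posted = 19.16929    # sample standard deviation posted in the thread
threshold = 130
observed_fraction = 0.022  # share of data points above 130 reported in the thread

# z-score whose upper-tail probability equals the observed fraction.
z_needed = NormalDist().inv_cdf(1 - observed_fraction)

# Standard deviation that would place 130 exactly z_needed deviations above the mean.
sigma_implied = (threshold - mu) / z_needed

print(f"z needed for a tail of {observed_fraction}: {z_needed:.3f}")
print(f"implied stdev: {sigma_implied:.2f}")
print(f"posted / implied stdev: {sigma_posted / sigma_implied:.2f}")
```

Under these assumptions the required z-score is close to 2, and the implied stdev is roughly twice the posted one.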

9. Mar 12, 2015

### gummz

To the best of my ability yes, I just call the sd(x) and mean(x) functions in R

10. Mar 12, 2015

### RUber

Is there anything odd about your settings for the functions? mean can be set to trim off extreme values, and both functions may have other settings available.

It seems odd to me. The probability you computed is right for that mean and stdev, but it doesn't match the observed data. That could happen if there is some skewness in the data that breaks the normal assumption in the tails.
In that case, sometimes a lognormal distribution clears it up.
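
One way to try the lognormal idea is to fit a normal distribution to the logarithms of the data and compare the two tail probabilities. The sketch below is in Python (standard library only); the function names and the sample are hypothetical, since the actual data set is not shown in this thread.

```python
import math
from statistics import NormalDist, mean, stdev

def normal_tail(data, threshold):
    """P(X > threshold) under a normal fit to the data."""
    return 1 - NormalDist(mean(data), stdev(data)).cdf(threshold)

def lognormal_tail(data, threshold):
    """P(X > threshold) under a lognormal fit, i.e. log(X) treated as normal."""
    logs = [math.log(x) for x in data]  # requires strictly positive values
    return 1 - NormalDist(mean(logs), stdev(logs)).cdf(math.log(threshold))

# Hypothetical right-skewed sample, NOT the data set from the thread.
sample = [22, 30, 35, 38, 41, 44, 47, 50, 52, 55, 58, 62, 67, 73, 80, 95, 140]

print(f"normal fit:    P(X > 130) = {normal_tail(sample, 130):.4f}")
print(f"lognormal fit: P(X > 130) = {lognormal_tail(sample, 130):.4f}")
```

For a right-skewed sample like this one, the lognormal fit puts more probability in the upper tail than the normal fit does. If the two fits agree closely, skewness is probably not the explanation.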

11. Mar 12, 2015

### gummz

I think I've figured it out, much like what you suggested. The sample was intentionally made very inaccurate in one place (the place that matters for this exact question, though it doesn't much affect how well the data fit a normal distribution overall) to emphasize the effect of chance in random sampling. Both the lognormal and "regular" forms of the normal distribution gave the exact same value for P(X>130). That value is very different from the ratio of items with X>130 in the data set, because the small sample size skewed things quite a bit.

I think this is it at least.
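
A rough way to gauge how much chance alone could contribute is to ask how likely a normal sample with the posted mean and stdev is to contain any value above 130 at all. This is a sketch in Python (standard library only); the sample sizes are hypothetical, because the thread does not state the size of the data set, and the helper name is illustrative.

```python
from statistics import NormalDist

mu, sigma, threshold = 54.51222, 19.16929, 130  # values posted in the thread
p_tail = 1 - NormalDist(mu, sigma).cdf(threshold)

def prob_at_least_one(n, p):
    """Probability that at least one of n independent draws exceeds the threshold."""
    return 1 - (1 - p) ** n

# Hypothetical sample sizes; the thread does not give the real one.
for n in (50, 100, 500, 1000):
    print(
        f"n = {n:4d}: expected count above 130 = {n * p_tail:.4f}, "
        f"P(at least one) = {prob_at_least_one(n, p_tail):.4f}"
    )
```

Comparing these numbers with the observed fraction of 0.022 gives a sense of how unusual the sample would be if the normal model with these parameters held exactly.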

12. Mar 12, 2015

### RUber

That makes sense. Normally, if you are making inferences about a data set, it is best to rely on empirical data, like the ratio you did. However, this question seems to be asking you what the normal probability of a set of observations is -- after assuming that your mean and stdev are correct.