Normal Distrib Probability: P(X>130)

SUMMARY

The discussion centers on calculating P(X>130) with the normal model P(X > 130) = 1 − Φ((130 − µ)/σ), where Φ is the standard normal cumulative distribution function (CDF). The mean (µ = 54.51222) and standard deviation (σ = 19.16929) computed from the data set put 130 about 3.9 standard deviations above the mean, so the model gives roughly 4 × 10⁻⁵. The observed fraction of data points above 130, however, is 0.022 (at first misread in the thread as 22%). The participants stress verifying the mean and standard deviation in R with the mean() and sd() functions and their options, and considering skewness in the data that could break the normality assumption in the tails. The original poster concludes that the sample was deliberately inaccurate in the tail, so the empirical ratio and the model probability are expected to differ.

PREREQUISITES
  • Understanding of normal distribution and its properties
  • Familiarity with R programming and statistical functions
  • Knowledge of cumulative distribution functions (CDF)
  • Ability to interpret statistical results and data skewness
NEXT STEPS
  • Learn how to use R for statistical analysis, focusing on the sd() and mean() functions
  • Study the implications of skewness in data and its effects on normal distribution assumptions
  • Explore lognormal distributions and their applications in statistical analysis
  • Investigate methods for validating statistical calculations and results in R
USEFUL FOR

Statisticians, data analysts, students studying probability and statistics, and anyone using R for statistical calculations will benefit from this discussion.

gummz

Homework Statement


Find P(X>130) where
P(X > 130) = 1 − Φ( (130 − µ)/σ )

Homework Equations


Φ is the normal distribution density function:
http://en.wikipedia.org/wiki/Normal_distribution

The Attempt at a Solution


This is simple to do in R if you know how, but I get a wildly incorrect answer. Dividing the number of items above 130 by the total number of items yields 0.022, whereas the formula above yields about 4e-5 if I use the mean and stdev from the data set. So which µ and σ are intended? They can't be the ones from the data set.
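The two numbers being compared here come from two different calculations: the empirical ratio and the normal-model tail probability. A minimal Python sketch of both follows (the thread itself uses R). The `data` list is a hypothetical stand-in, since the thread does not include the data set; `statistics.stdev` uses the n − 1 denominator, like R's `sd()`.

```python
from statistics import NormalDist, mean, stdev


def empirical_tail(data, threshold):
    """Fraction of observations strictly greater than threshold."""
    return sum(1 for x in data if x > threshold) / len(data)


def normal_model_tail(data, threshold):
    """1 - Phi((threshold - mu) / sigma), with mu and sigma estimated
    from the data (sample stdev, n - 1 denominator)."""
    mu, sigma = mean(data), stdev(data)
    return 1.0 - NormalDist(mu, sigma).cdf(threshold)


# Hypothetical data; replace with the real data set.
data = [31.0, 42.5, 48.0, 55.2, 57.9, 63.1, 70.4, 88.6, 131.5, 140.2]
print("empirical ratio:", empirical_tail(data, 130))
print("normal model:   ", normal_model_tail(data, 130))
```

The two agree only if the data really are close to normal in the tail; a large gap between them is the symptom discussed in the rest of the thread.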
 
Your Phi should be the cumulative distribution function (CDF) of the normal. What are your mean and stdev for this set?
 
Mean: 54.51222
Stddev: 19.16929
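Plugging these estimates into 1 − Φ((130 − µ)/σ) shows where the tiny model probability comes from. This is a minimal Python sketch; in R the same quantity should come from `pnorm(130, mean = 54.51222, sd = 19.16929, lower.tail = FALSE)`.

```python
from statistics import NormalDist

mu = 54.51222     # sample mean posted in the thread
sigma = 19.16929  # sample standard deviation posted in the thread

z = (130 - mu) / sigma               # standardized distance of 130 from the mean
p_upper = 1.0 - NormalDist().cdf(z)  # 1 - Phi(z), Phi = standard normal CDF

print(f"z = {z:.3f}")  # about 3.94
print(f"P(X > 130) = {p_upper:.2e}")
```

The result is roughly 4 × 10⁻⁵, far below the observed 0.022.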
 
You wrote that 22% of your data points were above 130. That does not jibe with the mean and stdev you posted.
130 is about 4 standard deviations (roughly 3.94) above your stated mean. There is a very low probability that those data came from a distribution with the mean and stdev you gave.
Where did those numbers come from? You could not have calculated the mean and stdev from the same data set that has 22% above 130.
 
Oh, sorry, I meant 0.022 (2.2%), not 22%.
 
A fraction of 0.022 above implies that 130 should be about 2 stdevs above the mean. My question is, did you calculate the mean and stdev from this data set or were they given to you? If you calculated them, double-check your work. If they were given, then go with your initial answer, since observed data have nothing to do with a probability computed from a given mean and stdev.
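The "about 2 stdevs" figure comes from inverting the normal CDF: find z with 1 − Φ(z) = 0.022. A minimal Python sketch; the R analogue would be `qnorm(0.022, lower.tail = FALSE)`.

```python
from statistics import NormalDist

observed_fraction = 0.022  # fraction of data points above 130, as reported

# Solve 1 - Phi(z) = observed_fraction with the inverse standard normal CDF.
z_implied = NormalDist().inv_cdf(1.0 - observed_fraction)
print(f"130 would sit about {z_implied:.2f} standard deviations above the mean")
```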
 
I was not given the values, I calculated them in R with the data set I was given.
 
It feels like your stdev is about 1/2 of what it should be based on what you are saying. Can you verify the calculation in R?
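The "about 1/2" estimate can be checked under one assumption: that the posted mean is right and only σ is off. Solving (130 − µ)/σ = z for σ, with z chosen so that 2.2% of a normal lies above 130, gives the σ the observed tail would imply. This is a minimal Python sketch of that check.

```python
from statistics import NormalDist

mu = 54.51222              # posted sample mean (assumed correct here)
sigma_reported = 19.16929  # posted sample standard deviation
observed_fraction = 0.022  # observed fraction above 130

# z such that a fraction 0.022 of a normal lies above mu + z * sigma
z_implied = NormalDist().inv_cdf(1.0 - observed_fraction)

# sigma that would place 130 exactly z_implied standard deviations above mu
sigma_implied = (130 - mu) / z_implied
ratio = sigma_reported / sigma_implied

print(f"implied sigma: {sigma_implied:.1f}")
print(f"reported / implied: {ratio:.2f}")
```

The implied σ is about 37.5, so the reported value is roughly half of it, consistent with the remark above.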
 
To the best of my ability, yes. I just call the sd(x) and mean(x) functions in R.
 
Is there anything odd about your settings for the functions? mean() can be set to trim off extreme values (its trim argument), and both functions have other arguments available.
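Two settings worth ruling out are trimming in the mean and the denominator of the standard deviation. R documents `trim` as the fraction of observations removed from each end before averaging, and `sd()` as using the n − 1 denominator. The Python sketch below mimics the trimming rule as an assumption (`trimmed_mean` is a hypothetical helper, not an R or stdlib function) and contrasts `statistics.stdev` (n − 1) with `statistics.pstdev` (n). The data are hypothetical.

```python
import math
from statistics import mean, pstdev, stdev


def trimmed_mean(xs, trim=0.0):
    """Mean after dropping floor(n * trim) values from each end of the
    sorted data (a sketch of the rule described for R's mean(x, trim = ...))."""
    if not 0.0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    xs = sorted(xs)
    k = math.floor(len(xs) * trim)  # number of values dropped from EACH end
    return mean(xs[k:len(xs) - k])


data = [10, 20, 30, 40, 500]  # hypothetical data with one extreme value
print(mean(data), trimmed_mean(data, trim=0.2))  # untrimmed vs trimmed mean
print(stdev(data), pstdev(data))                 # n - 1 vs n denominator
```

Trimming can move the mean a lot when there are extreme values, while the n versus n − 1 choice only rescales σ by sqrt(n / (n − 1)), which cannot explain a factor of two except for tiny samples.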

It seems odd to me. The probability you computed from that mean and stdev is correct; it just doesn't match the observed data. One explanation would be skewness in the data, which breaks the normal assumption in the tails.
In that case, sometimes a lognormal distribution clears it up.
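A lognormal check amounts to fitting the normal to log(x) instead of x and evaluating the tail at log(130). The Python sketch below is one way to do that, assuming strictly positive data. The function name `lognormal_tail` is hypothetical, and the method of fitting (the sample mean and stdev of the logs) is an assumption, not something stated in the thread.

```python
import math
from statistics import NormalDist, mean, stdev


def lognormal_tail(data, threshold):
    """P(X > threshold) under a lognormal model: fit a normal to log(x)
    and return 1 - Phi((log(threshold) - mu_log) / sigma_log)."""
    if any(x <= 0 for x in data):
        raise ValueError("a lognormal model needs strictly positive data")
    logs = [math.log(x) for x in data]
    mu_log, sigma_log = mean(logs), stdev(logs)
    return 1.0 - NormalDist(mu_log, sigma_log).cdf(math.log(threshold))


# Hypothetical, right-skewed data; replace with the real data set.
data = [22.0, 30.5, 38.1, 41.7, 49.9, 55.3, 61.0, 72.8, 95.4, 133.0]
print("lognormal P(X > 130):", lognormal_tail(data, 130))
```

Right skew makes the lognormal tail heavier than the normal one fitted to the same data, which is why it sometimes reconciles a model with an observed tail.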
 
I think I've figured it out, much as you suggested. The sample was intentionally made very inaccurate in one region (the region that matters for this exact question, but which doesn't matter much for the overall fit to the normal distribution) to emphasize the effect of chance on random selection. Both the lognormal and the ordinary normal distribution gave the same value for P(X>130). That value is very different from the ratio of X>130 within the data set, because the small sample size skewed things quite a bit.

I think this is it at least.
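One way to quantify how surprising the observed tail is under the fitted normal: if each point independently exceeds 130 with probability p ≈ 4.1 × 10⁻⁵, the count above 130 is Binomial(n, p). The thread does not give the sample size, so n = 500 (and hence k = 11 points, 2.2% of 500) is a hypothetical choice. `binom_upper_tail` is a hypothetical helper written for this sketch.

```python
from math import comb


def binom_upper_tail(n, k, p):
    """P(K >= k) for K ~ Binomial(n, p), with 0 < p < 1 and 0 <= k <= n.
    Starts from P(K = k) and walks up using the ratio of consecutive
    pmf terms, so only comb(n, k) is computed directly (fine for small k)."""
    if not (0.0 < p < 1.0 and 0 <= k <= n):
        raise ValueError("need 0 < p < 1 and 0 <= k <= n")
    term = comb(n, k) * p**k * (1.0 - p) ** (n - k)  # P(K = k)
    total = term
    for j in range(k, n):
        # P(K = j + 1) = P(K = j) * (n - j) / (j + 1) * p / (1 - p)
        term *= (n - j) / (j + 1) * p / (1.0 - p)
        total += term
    return total


n_hypothetical = 500                    # assumed sample size (not given in the thread)
k_observed = round(0.022 * n_hypothetical)  # 11 points above 130
p_model = 4.1e-5                        # normal-model P(X > 130) from the posted mu, sigma
print(binom_upper_tail(n_hypothetical, k_observed, p_model))
```

The result is vanishingly small, which supports the reading that the tail of this sample does not behave like draws from the fitted normal.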
 
That makes sense. Usually, if you are making inferences about a data set, it is best to rely on the empirical data, like the ratio you computed. However, this question seems to be asking for the probability under a normal model, after assuming that your mean and stdev are correct.
 
