Normal Distribution Probability: P(X>130)

Homework Help Overview

The discussion revolves around finding the probability P(X>130) using the normal distribution, specifically focusing on the parameters mean (µ) and standard deviation (σ) derived from a given data set. Participants are exploring the implications of these parameters on the calculated probability and the observed data distribution.

Discussion Character

  • Exploratory, Assumption checking, Problem interpretation

Approaches and Questions Raised

  • Participants discuss the use of the cumulative distribution function (CDF) for the normal distribution and question the validity of the calculated mean and standard deviation in relation to the observed data. There is an exploration of discrepancies between the calculated probability and the observed frequency of data points above 130.

Discussion Status

The discussion is active, with participants providing guidance on verifying calculations and questioning the assumptions made about the data set. There is acknowledgment of potential skewness in the data and the impact it may have on the normality assumption. Some participants suggest alternative distributions if the normal assumption does not hold.

Contextual Notes

Participants note that the sample may have been intentionally inaccurate in certain areas, affecting the probability calculations. There is also mention of the small sample size potentially skewing results, which raises questions about the reliability of the empirical data versus theoretical probabilities.

gummz

Homework Statement


Find P(X>130) where
P(X > 130) = 1 − Φ( (130 − µ)/σ )

Homework Equations


Φ is the normal distribution density function:
http://en.wikipedia.org/wiki/Normal_distribution

The Attempt at a Solution


This is pretty simple to use in R if one knew how to. I get a ghastly incorrect answer. Dividing the number of items above 130 by the total number of items yields 0.022, and the above yields something like 0.4e-5 if I use the mean and stdev from the data set. So, what µ and σ are they? They can't be the ones from the data set.
 
Your Phi should be the cumulative (CDF) function for the normal. What is your mean and stdev for this set?
 
Mean: 54.51222
Stddev: 19.16929
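
As a rough check of the formula in the homework statement, the tail probability for the posted mean and standard deviation can be computed directly. This is a minimal sketch in Python rather than R, using only the standard library; the variable names are illustrative and not from the thread.

```python
import math

mu = 54.51222      # sample mean posted in the thread
sigma = 19.16929   # sample standard deviation posted in the thread
threshold = 130.0

# Standardize: how many standard deviations above the mean is 130?
z = (threshold - mu) / sigma

# P(X > 130) = 1 - Phi(z), where Phi is the standard normal CDF.
# erfc is used instead of 1 - cdf to keep precision in the far tail.
tail = 0.5 * math.erfc(z / math.sqrt(2.0))

print(f"z = {z:.3f}")
print(f"P(X > 130) = {tail:.2e}")
```

Here z comes out just under 4, so the normal-model tail probability is tiny and far below the observed fraction of 0.022.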
 
You wrote that 22% of your data points were above 130. That does not jibe with the mean and stdev you posted.
130 is about 4 standard deviations above your stated mean. There is a very low probability that those data came from a set with the mean and stdev you gave.
Where did those numbers come from? You could not have calculated that mean and stdev from the same data set that has 22% above 130.
 
Oh, sorry I meant 0.022
 
If 0.022 of the data points are above 130, then 130 should be about 2 stdevs above the mean. My question is, did you calculate the mean and stdev from this data set or were they given to you? If you calculated them, double check your work. If they were given, then go with your initial answer, since observed data have nothing to do with a probability based on a given mean and stdev.
 
I was not given the values, I calculated them in R with the data set I was given.
 
It feels like your stdev is about 1/2 of what it should be based on what you are saying. Can you verify the calculation in R?
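
The "about 2 stdevs" and "about 1/2" estimates above can be checked by inverting the normal CDF. The sketch below is in Python with the standard library's `statistics.NormalDist`. It assumes the posted mean is correct and asks what standard deviation a normal model would need for 0.022 of the mass to lie above 130.

```python
from statistics import NormalDist

mu = 54.51222          # sample mean posted in the thread
sigma = 19.16929       # sample standard deviation posted in the thread
observed_tail = 0.022  # fraction of data points above 130
threshold = 130.0

# z-score whose upper tail equals the observed fraction
z_needed = NormalDist().inv_cdf(1.0 - observed_tail)

# Standard deviation a normal model would need, keeping the mean fixed
sigma_implied = (threshold - mu) / z_needed

ratio = sigma / sigma_implied

print(f"z needed      = {z_needed:.3f}")
print(f"implied sigma = {sigma_implied:.2f}")
print(f"posted / implied = {ratio:.2f}")
```

The required z is close to 2 and the posted stdev is roughly half the implied one, which matches the rough figures given in the replies.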
 
To the best of my ability yes, I just call the sd(x) and mean(x) functions in R
 
  • #10
Is there anything odd about your settings for the functions? mean can be set to trim off extreme values, both functions may have other settings available.

It seems odd to me. The probability you computed is right for that mean and stdev, yet it doesn't match the observed data. That can happen if there is some skewness in the data that breaks the normal assumption in the tails.
In that case, sometimes a lognormal distribution clears it up.
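
One way to try the lognormal suggestion is to fit a normal distribution to the logarithms of the data and evaluate the tail at log(130). The sketch below is in Python, not R, and the sample in it is hypothetical because the thread's data set is not shown. The function name `lognormal_tail` is also made up for illustration, and the approach only works when every value is positive.

```python
import math
from statistics import NormalDist, mean, stdev

def lognormal_tail(sample, threshold):
    """P(X > threshold) under a lognormal fitted via mean and stdev of log(x)."""
    logs = [math.log(x) for x in sample]
    fitted = NormalDist(mean(logs), stdev(logs))
    return 1.0 - fitted.cdf(math.log(threshold))

# Hypothetical right-skewed sample, NOT the data set from the thread
sample = [31.0, 38.5, 42.0, 47.2, 50.1, 55.8, 61.3, 70.4, 88.9, 135.0]

p_lognormal = lognormal_tail(sample, 130.0)
p_normal = 1.0 - NormalDist(mean(sample), stdev(sample)).cdf(130.0)

print(f"lognormal fit: {p_lognormal:.4f}")
print(f"normal fit:    {p_normal:.4f}")
```

Comparing the two fitted tails against the empirical fraction gives a quick sense of whether skewness is driving the mismatch.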
 
  • #11
I think I've figured it out, much like what you suggested. The sample was intentionally very inaccurate in one place to emphasize the effect of chance on random selection. That place mattered for this exact question, but it didn't matter much for how well the data matched the normal distribution overall. Both the logarithmic and "regular" forms of the normal distribution gave the exact same value for P(X>130). That value is very different from the ratio of X>130 within the data set, because the small sample size skewed things quite a bit.

I think this is it at least.
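
To get a feel for how much a single extreme point can move the empirical ratio in a small sample, here is a sketch in Python. The sample size n = 45 is purely an assumption, chosen only because 1/45 is close to 0.022; the thread never states the actual size.

```python
import math

mu, sigma, threshold = 54.51222, 19.16929, 130.0  # values posted in the thread
n = 45  # hypothetical sample size; not given in the thread

# Normal-model probability that a single draw exceeds 130
z = (threshold - mu) / sigma
p = 0.5 * math.erfc(z / math.sqrt(2.0))

# Chance that at least one of n independent draws exceeds 130
p_any = 1.0 - (1.0 - p) ** n

# Empirical fraction if exactly one of the n values exceeds 130
one_point_fraction = 1.0 / n

print(f"P(at least one of {n} above 130) = {p_any:.4f}")
print(f"fraction from a single point     = {one_point_fraction:.4f}")
```

Under these assumptions a single value above 130 is enough to produce an empirical ratio near 0.022, even though the normal model makes such a value unlikely in a sample of that size.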
 
  • #12
That makes sense. Normally, if you are making inferences about a data set, it is best to rely on empirical data, like the ratio you did. However, this question seems to be asking you what the normal probability of a set of observations is -- after assuming that your mean and stdev are correct.
 
