Understanding the probability density function

Summary:
The discussion focuses on understanding and interpreting the probability density function (PDF) based on a set of measurements. The user has calculated the mean and standard deviation from their data and is trying to plot the PDF correctly, ensuring the mean is centered and the x-values represent standard deviations. They inquire about the significance of the percentages derived from integrating the PDF, specifically how 68.269% indicates the likelihood of a measurement falling within one standard deviation of the mean in a normal distribution. Additionally, they seek clarification on the multipliers (1.96 and 2.935) used for determining confidence intervals in relation to standard deviations. The conversation emphasizes the distinction between probabilities derived from a theoretical distribution versus those from a small sample set.
tomtomtom1
Hi all

This is not a homework question but a work-related problem that I am having difficulty understanding, and I was hoping someone from the community could help me with it.

I am trying to understand how to create and interpret a probability density function (PDF) plot from a set of data.

For example:
  • Below are 10 measurements I made of the same table.
P1.JPG


  • As you can see, I have calculated the mean and the residuals, squared the residuals and summed the squared residuals.
  • In principle I could measure the table an unlimited number of times, but in practice I only measured it 10 times, so 10 is my sample size. I have been told that I need to subtract 1 from the sample size (i.e. divide by n − 1 = 9 instead of 10), which I have done.
  • I then calculated the variance and the standard deviation.
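The steps in the list above can be sketched in Python. The ten readings below are hypothetical placeholders (the real values are only in the attached P1.JPG), so the results will not match the mean of 1853.910 and SD of 1.829 quoted later in the thread. Dividing by n − 1 rather than n is usually called Bessel's correction; the standard library's `statistics.stdev` uses the same divisor, which gives a convenient cross-check.

```python
import math
import statistics

# Hypothetical readings for illustration only; the real values are in P1.JPG.
readings = [1852, 1855, 1853, 1856, 1854, 1851, 1855, 1854, 1857, 1852]

n = len(readings)
mean = sum(readings) / n

# Residual = reading minus the mean; squaring removes the sign.
residuals = [x - mean for x in readings]
sum_sq = sum(r * r for r in residuals)

# Sample variance divides by n - 1 (Bessel's correction), not n,
# because the mean was itself estimated from the same 10 readings.
variance = sum_sq / (n - 1)
sd = math.sqrt(variance)

# Cross-check against the standard library, which also divides by n - 1.
assert math.isclose(sd, statistics.stdev(readings))

print(f"mean={mean:.3f}  sum_sq={sum_sq:.3f}  var={variance:.4f}  sd={sd:.4f}")
```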
I then evaluated the probability density function at each measurement of my table, using the mean and standard deviation above. This is what I get:

P2.JPG
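Evaluating the normal PDF at each measurement, as described above, can be sketched like this. The mean and SD are the values quoted later in the thread; the readings are hypothetical placeholders, not the contents of P2.JPG. Note that the numbers produced are densities (probability per unit of length), not probabilities; only an integral of the PDF over an interval is a probability.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Gaussian probability density at x for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Mean and SD quoted later in the thread; the readings are hypothetical placeholders.
mu, sigma = 1853.910, 1.829
readings = [1852, 1855, 1853, 1856, 1854, 1851, 1855, 1854, 1857, 1852]

for x in sorted(readings):
    density = normal_pdf(x, mu, sigma)
    # statistics.NormalDist (Python 3.8+) evaluates the same formula.
    assert math.isclose(density, NormalDist(mu, sigma).pdf(x))
    print(f"x={x}  pdf={density:.5f}")
```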


Plotting the measurements of my table (x) against their PDF values (y) gives the following plot:

p3.JPG


I know that to find the probability that a measurement of my table falls between, for example, 1852 and 1855, I need to integrate the PDF from 1852 to 1855, i.e. take the integral of the PDF up to 1855 and subtract the integral of the PDF up to 1852.
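The interval probability described above can be computed two ways: as a difference of cumulative values (the integral of the normal PDF up to x has a closed form in terms of the error function `math.erf`), or by integrating the PDF numerically. This sketch assumes the mean of 1853.910 and SD of 1.829 quoted later in the thread, and uses composite Simpson's rule as an illustrative numerical integrator; both routes should agree.

```python
import math

MU, SIGMA = 1853.910, 1.829  # mean and SD quoted later in the thread

def normal_pdf(x):
    z = (x - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2 * math.pi))

def normal_cdf(x):
    """Integral of the PDF from -infinity up to x, via the error function."""
    return 0.5 * (1 + math.erf((x - MU) / (SIGMA * math.sqrt(2))))

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

lo, hi = 1852, 1855
p_cdf = normal_cdf(hi) - normal_cdf(lo)  # integral up to 1855 minus integral up to 1852
p_num = simpson(normal_pdf, lo, hi)      # direct numerical integral from 1852 to 1855
print(f"P({lo} < X < {hi}) = {p_cdf:.5f}  (Simpson: {p_num:.5f})")
```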

Hopefully I have got things correct so far.

My question is: how do I adjust this graph and data so that the mean is exactly in the middle and the x values are at 1, 2 and 3 standard deviations, as in the example plot below?

P4.JPG


I know this is a very long-winded question, but I would really appreciate your insight.

I have attached a Notepad file that contains this data.

Many thanks.
 

Is it just a plotting question?
Based on the mean and standard deviation estimated from your measurements, you can make a new table that uses the mean, the mean ± 1 standard deviation, the mean ± 2 standard deviations and so on as points.
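The table suggested above can be sketched as follows, assuming the mean of 1853.910 and SD of 1.829 quoted later in the thread. Each row sits at x = mean + k·SD for k = −3 … +3, so the mean lands exactly in the middle; labelling each point by k (the number of SDs) instead of by x gives the mean-centred axis described in the question.

```python
import math

mu, sigma = 1853.910, 1.829  # estimates quoted later in the thread

def normal_pdf(x):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# x positions at whole numbers of standard deviations from the mean,
# so the mean sits exactly in the middle of the table (and of the plot).
table = []
for k in range(-3, 4):
    x = mu + k * sigma
    table.append((k, x, normal_pdf(x)))

for k, x, y in table:
    print(f"{k:+d} SD  x={x:9.3f}  pdf={y:.5f}")
```

The ±1 SD rows reproduce the bounds 1852.081 and 1855.739 used later in the thread.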
 
If you have a good reason to assume a known distribution of the random variable that you are sampling, then you can just plot that equation using the parameter estimates from the sample. In this case, if you know that the data is from a normal distribution, then you have an equation that you can plot.

If you want to base a graph only on the data without assuming that the data came from a particular distribution, then you can do it this way: First plot points of the sample cumulative distribution. Then fit a smooth curve through the points making sure that it starts at 0 at the bottom and ends at 1 at the top. Finally, plot the slopes of the CDF curve to get a PDF.
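The distribution-free route described above can be sketched in a few lines. As a simplification, straight lines between the sample CDF points stand in for the smooth fitted curve, so the "slopes" come out as a step-like density; with a genuinely smooth fit you would differentiate that curve instead. The readings are hypothetical placeholders, and the one-unit padding that makes the curve start at 0 is an arbitrary assumption.

```python
# Hypothetical readings for illustration only (not the values from P1.JPG).
readings = [1852, 1855, 1853, 1856, 1854, 1851, 1855, 1854, 1857, 1852]

def ecdf_points(data):
    """(x, fraction of data <= x) at each distinct value, plus a leading 0 point."""
    n = len(data)
    xs = sorted(set(data))
    pts = [(xs[0] - 1, 0.0)]  # assumed anchor one unit below the minimum
    for x in xs:
        pts.append((x, sum(1 for d in data if d <= x) / n))
    return pts

def slopes(points):
    """Slope of each straight CDF segment = estimated density on that interval."""
    out = []
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        out.append(((x0 + x1) / 2, (p1 - p0) / (x1 - x0), x1 - x0))
    return out

pts = ecdf_points(readings)
density = slopes(pts)
for mid, d, width in density:
    print(f"around x={mid:.1f}: density={d:.3f}")
```

Because the CDF runs from 0 to 1, the slopes multiplied by their interval widths always add up to 1, as a PDF's area must.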
 
mfb said:
Is it just a plotting question?
Based on the mean and standard deviation estimated from your measurements, you can make a new table that uses the mean, the mean ± 1 standard deviation, the mean ± 2 standard deviations and so on as points.

Thanks, I managed to rearrange the data into a new table.
 
mfb

Thank you for your response. I was hoping you could explain two additional points I am having trouble with.

The first is this: my mean is 1853.910 and my SD is 1.829. I have integrated the probability density function over the following ranges:
  • -1SD to +1SD (1852.081 - 1855.739) and I get a value of 68.269%.
  • -2SD to +2SD (1850.252 - 1857.568) and I get a value of 95.44997%
  • -3SD to +3SD (1848.423 - 1859.397) and I get a value of 99.73707%
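The three integrals in the list above can be reproduced with the standard library's `statistics.NormalDist`, assuming the mean and SD from this post. The result does not depend on the mean or SD at all: the probability within ±k SD of any normal distribution is erf(k/√2). The exact values are about 68.269%, 95.450% and 99.730%; the ±1 and ±2 SD figures above agree, while the exact ±3 SD value is slightly lower than the 99.73707% quoted.

```python
import math
from statistics import NormalDist

mu, sigma = 1853.910, 1.829  # mean and SD from the post above
dist = NormalDist(mu, sigma)

for k in (1, 2, 3):
    lo, hi = mu - k * sigma, mu + k * sigma
    # Integral of the PDF from lo to hi, as a difference of CDF values.
    p = dist.cdf(hi) - dist.cdf(lo)
    # The same number for any mean and SD: P(|Z| < k) = erf(k / sqrt(2)).
    p_std = math.erf(k / math.sqrt(2))
    print(f"±{k} SD: [{lo:.3f}, {hi:.3f}]  {100 * p:.5f}%  (standard normal: {100 * p_std:.5f}%)")
```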

My question is: what do 68.269%, 95.44997% and 99.73707% actually mean?

What does it mean to say that the probability between ±1 SD is 68.269%?

I think (but am hoping you can confirm) that 68.269% means that if I randomly pick a measurement from my data set, there is a 68.269% chance that it will fall within ±1 SD.

Or can I say that, for the data set to be considered normally distributed, 68.269% of the measurements must fall within ±1 SD?

Have I got this completely wrong? How would you explain what 68.269% means?

The second question is about what people call multipliers, for example:
  • 95% = 1.96 * Standard Deviation
  • 99.7% = 2.935 * Standard Deviation
Where do 1.96 and 2.935 (which are referred to as multipliers) come from? And why does multiplying the standard deviation by 1.96 give 95%? I thought the percentages came from integrating the probability density function.

Can you help explain or clarify?

Thanks
 
tomtomtom1 said:
I think (but am hoping you can confirm) that 68.269% means that if I randomly pick a measurement from my data set, there is a 68.269% chance that it will fall within ±1 SD.
If you randomly pick a measurement from a Gaussian distribution, you get this probability. If you measure the length again, this is the probability that the new value will be within ±1 SD.
If you randomly pick from your small set of measurements, the probability will be something else.
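The difference between the two cases above can be seen in a short simulation. It assumes, purely for illustration, that the true length is normally distributed with the mean and SD from the thread, and uses `random.gauss` to stand in for re-measuring. Over many re-measurements the fraction within ±1 SD settles near 68.27%; a single batch of 10 can only produce a multiple of 10%.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable
mu, sigma = 1853.910, 1.829  # assumed "true" values for the simulation

def fraction_within_1sd(n_draws):
    """Fraction of n_draws Gaussian samples that land within mu ± 1 sigma."""
    hits = 0
    for _ in range(n_draws):
        x = random.gauss(mu, sigma)
        if abs(x - mu) <= sigma:
            hits += 1
    return hits / n_draws

big = fraction_within_1sd(200_000)  # many re-measurements: close to 0.6827
small = fraction_within_1sd(10)     # one batch of 10: always a multiple of 0.1
print(f"200000 draws: {big:.4f}")
print(f"10 draws:     {small:.1f}")
```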
tomtomtom1 said:
Where does 1.96 and 2.935 (which are referred to as multipliers) come from?
They are chosen so that the integral comes out to 95% or 99.7%, respectively. It doesn't make sense to write an equals sign there. They are just more entries in the table of "x% of the measurements will be within y SD of the mean", in the same way as the three you have already made.
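Working backwards from a percentage to a multiplier, as described above, means inverting the cumulative distribution of the standard normal. Each tail holds (1 − p)/2, so the multiplier is the (1 + p)/2 quantile, which `statistics.NormalDist.inv_cdf` computes. For 95% this gives about 1.96. For 99.7% it gives about 2.97, so the 2.935 quoted in the question corresponds to roughly 99.67% rather than 99.7%.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def multiplier(coverage):
    """k such that the integral of the PDF from -k SD to +k SD equals `coverage`."""
    # Each tail holds (1 - coverage) / 2, so k is the (1 + coverage) / 2 quantile.
    return std_normal.inv_cdf((1 + coverage) / 2)

def coverage(k):
    """Inverse of multiplier(): probability within ±k SD of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for p in (0.95, 0.997):
    print(f"{100 * p:.1f}% -> ±{multiplier(p):.4f} SD")
print(f"±2.935 SD -> {100 * coverage(2.935):.3f}%")
```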
 
mfb said:
If you randomly pick a measurement from a Gaussian distribution, you get this probability. If you measure the length again, this is the probability that the new value will be within ±1 SD.
If you randomly pick from your small set of measurements, the probability will be something else.
They are chosen so that the integral comes out to 95% or 99.7%, respectively. It doesn't make sense to write an equals sign there. They are just more entries in the table of "x% of the measurements will be within y SD of the mean", in the same way as the three you have already made.
Hi mfb,

Again, thank you for your insight. You said the following:

"If you measure the length again, this is the probability that the new value will be within ±1 SD." This makes a lot of sense to me.

However, about your comment "If you randomly pick from your small set of measurements, the probability will be something else": correct me if I am wrong, but I have 10 measurements, so if I randomly pick a measurement from this small data set, the probability of picking any particular measurement is the same, 1/10 or 10%. Is this what you were referring to when you said "the probability will be something else"?

If I randomly pick a measurement from this data set (where each measurement is equally likely to be picked, i.e. 10%), is it correct to say that the measurement picked has a 68.269% chance of being within ±1 SD?

Your thoughts?
 
tomtomtom1 said:
Correct me if I am wrong, but I have 10 measurements, so if I randomly pick a measurement from this small data set, the probability of picking any particular measurement is the same, 1/10 or 10%. Is this what you were referring to when you said "the probability will be something else"?
Right. Some whole number of your measurements lie within 1 standard deviation, but certainly not 6.8 measurements, because that doesn't make sense.
tomtomtom1 said:
If I randomly pick a measurement from this data set (where each measurement is equally likely to be picked, i.e. 10%), is it correct to say that the measurement picked has a 68.269% chance of being within ±1 SD?
No.

Think of rolling a die once: before you roll, you know you have a 1/6 chance to roll a 6. Afterwards, you either rolled it (100% of your rolls were 6) or you did not (0% were 6), but there is no way that 16.7% of your one roll was a 6.
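The point about a fixed set of 10 measurements can be made concrete by counting. The readings below are hypothetical placeholders, not the values from P1.JPG; with them, 7 of the 10 fall within the sample mean ± 1 SD, i.e. 70%. Picking one of these 10 at random therefore has a 70% chance of landing within ±1 SD, not 68.269%, and a different set of ten readings would give a different multiple of 10%.

```python
import statistics

# Hypothetical readings for illustration only (not the values from P1.JPG).
readings = [1852, 1855, 1853, 1856, 1854, 1851, 1855, 1854, 1857, 1852]

mean = statistics.mean(readings)
sd = statistics.stdev(readings)  # sample SD, divides by n - 1

# Count how many of *these* readings fall within mean ± 1 SD.
inside = [x for x in readings if mean - sd <= x <= mean + sd]
fraction = len(inside) / len(readings)

print(f"{len(inside)} of {len(readings)} readings within ±1 SD -> {fraction:.0%}")
```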
 