
I Understanding the probability density function

  1. Oct 9, 2017 #1
    Hi all

    This is not a homework question but something work-related that I am having difficulty understanding, and I was hoping someone from the community could help me with it.

    I am trying to understand how to interpret & create the probability density function plot from a set of data.

    For example:-
    • Below is a set of measurements of the same table which I measured 10 times.
    P1.JPG

    • As you can see, I have calculated the mean and the residuals, squared the residuals, and summed the squared residuals.
    • Because I cannot measure the table an infinite number of times, I only measured it 10 times, so 10 is my sample size. I have been told that I need to subtract 1 from the sample size (i.e. divide by n - 1) when calculating the variance, which I have done.
    • I have then calculated the variance and standard deviation.
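The steps in the list above can be sketched in a few lines of Python. The measurement values here are hypothetical stand-ins, since the poster's actual numbers are only in the attached image; the point is just the order of operations and the division by n - 1.

```python
import math
import statistics

# Hypothetical measurements (mm); the poster's real values are in P1.JPG.
measurements = [1851.2, 1853.0, 1854.5, 1855.1, 1852.7,
                1856.3, 1853.9, 1851.8, 1854.0, 1856.6]

n = len(measurements)
mean = sum(measurements) / n

# Residual = measurement minus mean.
residuals = [x - mean for x in measurements]
sum_squared_residuals = sum(r * r for r in residuals)

# Divide by n - 1 (not n) because the mean was estimated from the same sample.
variance = sum_squared_residuals / (n - 1)
std_dev = math.sqrt(variance)

print(f"mean = {mean:.3f}, variance = {variance:.3f}, SD = {std_dev:.3f}")

# Cross-check against the standard library, which also divides by n - 1.
print(f"statistics.stdev = {statistics.stdev(measurements):.3f}")
```

`statistics.stdev` uses the same n - 1 convention, so it is a convenient check on a hand-built spreadsheet.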
    I have then used each measurement of my table along with the mean and standard deviation and put them through the probability density function. This is what I get:-

    P2.JPG
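As a rough illustration of what "putting a measurement through the probability density function" means, here is a minimal sketch of the normal PDF. It assumes the data are normally distributed and uses the mean (1853.910) and SD (1.829) that the poster reports later in the thread; the function name `normal_pdf` is just a label chosen here.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and SD at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

mean, sd = 1853.910, 1.829  # values quoted later in the thread

for x in (1852.0, 1853.910, 1855.0):
    print(f"x = {x:.3f}  pdf = {normal_pdf(x, mean, sd):.5f}")
```

Note that the returned value is a density (per unit of x), not a probability. Probabilities only come from integrating the density over an interval.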

    By plotting the measurements of my table (x) against the PDF (y) I get the following plot.

    p3.JPG

    I know that to find the probability of a measurement of my table falling between 1852 and 1855, for example, I need to integrate the PDF from 1852 to 1855, i.e. take the integral of the PDF up to 1855 and subtract from it the integral of the PDF up to 1852.
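For a normal distribution this integral does not need numerical integration, because the cumulative distribution function (CDF) can be written with the error function. The sketch below assumes normality and uses the mean and SD quoted later in the thread; `normal_cdf` is a helper name introduced here.

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

mean, sd = 1853.910, 1.829  # values quoted later in the thread

# P(1852 <= X <= 1855) = CDF(1855) - CDF(1852)
p = normal_cdf(1855.0, mean, sd) - normal_cdf(1852.0, mean, sd)
print(f"P(1852 <= X <= 1855) = {p:.4f}")
```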

    Hopefully I have got things correct so far.

    The question is: how do I adjust this graph and data so that the mean is exactly in the middle and the x values are at 1, 2 and 3 standard deviations either side of the mean, as shown in the example plot below:-

    P4.JPG

    I know this is a very long-winded question but I would really appreciate your insight.

    I have attached a note pad file that contains this data.

    Many thanks.
     


  2. jcsd
  3. Oct 9, 2017 #2

    mfb

    2016 Award

    Staff: Mentor

    Is it just a plotting question?
    Based on the mean and standard deviation estimated from your measurements, you can make a new table where you use (mean), (mean +- 1 standard deviation), (mean +- 2 standard deviations) and so on as the x points.
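A minimal sketch of such a table, assuming a normal distribution and using the mean and SD the poster reports later in the thread. The x values sit at whole numbers of standard deviations from the mean, so the mean lands exactly in the middle of the plot.

```python
import math

mean, sd = 1853.910, 1.829  # values quoted later in the thread

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Rows of (k, x, density) for k = -3 ... +3 standard deviations.
table = [(k, mean + k * sd, normal_pdf(mean + k * sd, mean, sd))
         for k in range(-3, 4)]

for k, x, y in table:
    print(f"{k:+d} SD   x = {x:.3f}   pdf = {y:.5f}")
```

Seven points give a coarse curve; using fractional steps (for example k in steps of 0.1) gives a smoother plot while keeping the same axis labels.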
     
  4. Oct 9, 2017 #3

    FactChecker

    Science Advisor
    Gold Member

    If you have a good reason to assume a known distribution of the random variable that you are sampling, then you can just plot that equation using the parameter estimates from the sample. In this case, if you know that the data is from a normal distribution, then you have an equation that you can plot.

    If you want to base a graph only on the data without assuming that the data came from a particular distribution, then you can do it this way: First plot points of the sample cumulative distribution. Then fit a smooth curve through the points making sure that it starts at 0 at the bottom and ends at 1 at the top. Finally, plot the slopes of the CDF curve to get a PDF.
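The distribution-free route can be sketched as follows. This is a deliberately crude version: it uses hypothetical data, assumes all values are distinct, and replaces the "fit a smooth curve" step with straight lines between neighbouring points, so the resulting density estimate is noisy. The plotting position (i + 0.5) / n is one common convention, chosen here as an assumption.

```python
# Hypothetical measurements (mm), sorted for the empirical CDF.
data = sorted([1851.2, 1853.0, 1854.5, 1855.1, 1852.7,
               1856.3, 1853.9, 1851.8, 1854.0, 1856.6])
n = len(data)

# Empirical CDF points: (x, fraction of the sample at or below x, mid-step).
ecdf = [(x, (i + 0.5) / n) for i, x in enumerate(data)]

# Density estimate = slope of the CDF between neighbouring points,
# reported at the midpoint of each interval.
pdf_est = []
for (x0, f0), (x1, f1) in zip(ecdf, ecdf[1:]):
    slope = (f1 - f0) / (x1 - x0)  # requires x1 != x0
    pdf_est.append(((x0 + x1) / 2.0, slope))

for x_mid, density in pdf_est:
    print(f"x = {x_mid:.2f}   estimated pdf = {density:.4f}")
```

With only 10 points the slopes jump around a lot, which is why a smoothing step before differentiating matters in practice.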
     
  5. Oct 20, 2017 #4
    Thanks, I managed to re-arrange the data into a new table.
     
  6. Oct 20, 2017 #5
    mfb

    Thank you for your response. I was hoping you could help with two additional queries I am having trouble with.


    The first is this, my Mean is 1853.910 and SD is 1.829. I have integrated the probability density function from :-
    • -1SD to +1SD (1852.081 - 1855.739) and I get a value of 68.269%.
    • -2SD to +2SD (1850.252 - 1857.568) and I get a value of 95.44997%
    • -3SD to +3SD (1848.423 - 1859.397) and I get a value of 99.73707%
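These percentages depend only on the number of standard deviations, not on the particular mean or SD. For a normal distribution, the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)), which the sketch below evaluates. The function name `prob_within` is introduced here for illustration.

```python
import math

def prob_within(k):
    """P(|X - mean| <= k * SD) for a normal distribution."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within +/- {k} SD: {100.0 * prob_within(k):.5f}%")
```

The closed form gives roughly 68.269%, 95.450% and 99.730%, so a small difference in the last digits of a numerically integrated 3 SD value is most likely integration or rounding error.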

    My question is: what do 68.269%, 95.44997% and 99.73707% actually mean?

    What does it mean to say that between +/- 1 SD it is 68.269%?

    I think (but am hoping you can confirm) that 68.269% means that if I randomly pick a measurement from my data set, there is a 68.269% chance that the measurement will fall within +/- 1SD.

    Or can I say that for the data set to be considered a normal distribution, 68.269% of the measurements must fall within +/- 1SD?

    Have I got this completely incorrect and misinterpreted it? How would you explain what 68.269% means?


    The second question is about what people call multipliers, for example:-
    • 95% = 1.96 * Standard Deviation
    • 99.7% = 2.935 * Standard Deviation
    Where do 1.96 and 2.935 (which are referred to as multipliers) come from? And why does multiplying 1.96 by the standard deviation result in 95%? I thought the percentage values come from integrating the probability density function.

    Can you help explain or clarify?

    Thanks
     
  7. Oct 20, 2017 #6

    mfb

    2016 Award

    Staff: Mentor

    If you randomly pick a measurement from a Gaussian distribution, you get this probability. If you measure the length again, this is the probability that the new value will be within +-1 SD of the mean.
    If you randomly pick from your small set of measurements, the probability will be something else.
    The multipliers are chosen so that the integral comes out to 95% or 99.7%, respectively. It doesn't make sense to write an equals sign there. They are just more entries in the table of "x% of the measurements will be within y SD of the mean", in the same way as the three you made already.
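Going from a target percentage back to a multiplier is the inverse of the integration step: find the number of standard deviations k such that the central interval of +-k SD contains the desired probability. A minimal sketch using the standard library's `statistics.NormalDist` (Python 3.8+); the helper name `multiplier` is introduced here.

```python
from statistics import NormalDist

def multiplier(coverage):
    """Number of SDs k such that P(|X - mean| <= k * SD) = coverage."""
    # A central interval with this coverage leaves (1 - coverage) / 2 in each tail,
    # so the upper edge sits at the (0.5 + coverage / 2) quantile.
    return NormalDist().inv_cdf(0.5 + coverage / 2.0)

for coverage in (0.68269, 0.95, 0.997, 0.9973):
    print(f"{100.0 * coverage:.3f}% -> {multiplier(coverage):.3f} SD")
```

Under these assumptions 95% gives about 1.96, and exactly 99.7% gives about 2.97 rather than 3, which only matches 3 when the coverage is taken as roughly 99.73%.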
     
  8. Oct 30, 2017 #7


    Hi mfb


    Again thank you for your insight.


    You said the following:-


    If you re-measure the length again, you get this probability that the value will be within +-1 SD - This makes a lot of sense to me.


    However your comment about:-


    If you randomly pick from your small set of measurements, the probability will be something else.


    Correct me if I am wrong, but I have 10 measurements; if I randomly pick a measurement from this small data set, then the probability of picking any one of them is the same, 1/10 or 10%. Is this what you were referring to when you said "the probability will be something else"?


    If I randomly pick a measurement from this data set (where each measurement is equally likely to be picked, i.e. 10%), is it correct to say that the picked measurement has a 68.269% chance of being within +/- 1SD?


    Your thoughts?
     
  9. Oct 31, 2017 #8

    mfb

    2016 Award

    Staff: Mentor

    Right. You have some whole number of measurements within 1 standard deviation, but certainly not 6.8 measurements, because that doesn't make sense.
    As for the 68.269% question: no.

    Think of rolling a die once: before you roll, you know you have a 1/6 chance of rolling a 6. Afterwards you either rolled it (100% of your rolls were 6) or you did not (0% were 6), but there is no way that 16.7% of your one roll was a 6.
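The same point can be checked directly on a sample: count how many of the 10 measurements lie within one sample SD of the sample mean. The data below are hypothetical stand-ins for the poster's values. Whatever the data, the fraction can only be a multiple of 1/10, so it cannot equal 68.269%.

```python
import statistics

# Hypothetical measurements (mm); the poster's real values are in an image.
measurements = [1851.2, 1853.0, 1854.5, 1855.1, 1852.7,
                1856.3, 1853.9, 1851.8, 1854.0, 1856.6]

mean = statistics.mean(measurements)
sd = statistics.stdev(measurements)  # sample SD (divides by n - 1)

# Measurements within one SD of the mean.
inside = [x for x in measurements if abs(x - mean) <= sd]
fraction = len(inside) / len(measurements)

print(f"{len(inside)} of {len(measurements)} measurements within +/- 1 SD "
      f"({100.0 * fraction:.0f}%)")
```

The 68.269% figure describes the next, not yet taken, measurement from the assumed normal distribution, not a random draw from the 10 values already recorded.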
     