
Statistical significance in experimentally obtained data sets

  1. Jun 17, 2013 #1
    I have a set of data that was recorded from an engine that we are testing. We've noticed lately that a particular pressure value will sometimes spike with no apparent explanation, as seen in the attached graph. The pressure in question is passively regulated by a pump, but it is also dependent on operating factors in the engine. I'm trying to narrow down whether this spike is indicative of a problem with the pump or if there's something else wrong with the engine to cause this spike. Plotting the pressure against other measurements hasn't really helped, but I wanted to see if there's a way I can statistically measure the significance of changes that occur, or whether it's just typical variation.

    I recall from Stats that my professor used Student's t-distribution to test whether the difference between two data sets (in my case, before and after the pressure spike) was statistically significant, but when I tried it in Excel (using this tutorial), it gave me exactly a 100% probability that this pressure spike occurred by chance (we know from testing that's not the case). I've since read that the t-distribution is typically used when the population standard deviation isn't directly measurable, but I do know that from the data, so is there another way to go about this?

    EDIT: Just to clarify, to obtain my t value I set mu = average pressure before the spike, xbar = average pressure after the spike, s = standard deviation of the pressure after the spike, and n = the number of data points after the spike (201 in my case). I plugged those into the t-statistic formula, t = (xbar - mu)/(s/sqrt(n)), and got 213.4, and the t-distribution probability for that is 1.
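    The same computation is easy to check outside Excel. A minimal sketch (the values of mu, xbar, and s below are made-up stand-ins for the real data; with 200 degrees of freedom the t-distribution is so close to normal that the standard normal tail is a fair approximation):

    ```python
    import math
    from statistics import NormalDist

    # Made-up stand-ins for the real pressure data
    mu = 100.0    # mean pressure before the spike (hypothesized mean)
    xbar = 115.0  # mean pressure after the spike
    s = 1.0       # std dev of the post-spike samples
    n = 201       # number of post-spike samples

    t = (xbar - mu) / (s / math.sqrt(n))

    # With df = 200 the t-distribution is essentially normal, so the
    # standard normal is a fair stand-in for the tail areas.
    p_right = 1.0 - NormalDist().cdf(t)  # the tail the test actually needs: ~0
    p_left = NormalDist().cdf(t)         # the area Excel reported: ~1
    print(t, p_right, p_left)
    ```

    The point is that a huge t-statistic puts essentially all of the distribution's area to its left, so the left-tail number is ~1 while the p-value (the right tail) is ~0.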

    Attached Files: (graph of the pressure value vs. time)

    Last edited: Jun 17, 2013
  3. Jun 17, 2013 #2


    Staff Emeritus
    Science Advisor
    Gold Member

    You interpreted your T-test backwards. That 100% probability number says that if the data sampled after the spike was drawn from the same distribution as the data before the spike, then there is a 100% probability you would have seen a smaller T-value than 213.4. So the probability you would see 213.4 or greater is essentially 0, which means that this is NOT just random noise.
  4. Jun 17, 2013 #3


    Homework Helper

    "I plugged that into the T-value formula and got 213.4, and the T-dist probability for that is 1."

    Excel is a horrible tool for statistical analysis of any kind, for many reasons. Here you've run into an old "feature" of the program that is, in essence, a foolish way of reporting p-values. If you sketch the t-distribution and locate your t-value, you'll see it lies far to the right of the distribution. Depending on your alternative hypothesis, the p-value is either the area to the right of this t, or twice the area to the right of this t. The number Excel gives is the area to its *left*, which is why you saw 1; the actual p-value is essentially 0.

    But I'm not sure this is the analysis you should do. The measurements are clearly not independent, since they come from the same mechanism, and t-tests require independent values.
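    The dependence between successive measurements is easy to check numerically with a lag-1 autocorrelation. A minimal sketch (the series below are synthetic; the real check would use the logged pressure values):

    ```python
    def lag1_autocorr(x):
        """Lag-1 autocorrelation: near 0 for independent samples,
        near +/-1 when each sample strongly depends on the previous one."""
        n = len(x)
        m = sum(x) / n
        num = sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1))
        den = sum((xi - m) ** 2 for xi in x)
        return num / den

    # A slowly drifting trace (like a pressure reading sampled once per
    # second) is strongly autocorrelated; an alternating one is not.
    drifting = [0.1 * i for i in range(100)]
    alternating = [1.0, -1.0] * 50
    print(lag1_autocorr(drifting), lag1_autocorr(alternating))
    ```

    If the pressure trace shows strong autocorrelation, the effective sample size is much smaller than 201 and the t-test's independence assumption is violated.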
    What is graphed on the horizontal axis of the plot you attached?
  5. Jun 17, 2013 #4
    My mistake, I will keep that in mind!

    Time, measurements are recorded once per second.