Statistical significance in experimentally obtained data sets


Discussion Overview

The discussion revolves around the statistical analysis of pressure spikes observed in engine testing data. Participants explore methods to determine the significance of these spikes and whether they indicate a problem with the pump or with the engine itself. The focus is on applying statistical tests to experimental data, particularly the t-test based on Student's t-distribution.

Discussion Character

  • Technical explanation, Debate/contested, Experimental/applied

Main Points Raised

  • One participant describes observing unexplained pressure spikes in engine data and seeks statistical methods to assess their significance.
  • Another participant points out that the t-test result was read backwards: the reported probability of 1 is the chance of seeing a smaller t-value, so a t-value as large as the one observed is very unlikely to be random noise.
  • A different participant criticizes the use of Excel for statistical analysis, noting that the value it returned is the area to the left of the t-value rather than the p-value, and that the data may not meet the independence assumption required for t-tests.
  • There is a request for clarification regarding the horizontal axis of the attached graph, which is confirmed to represent time with measurements recorded once per second.

Areas of Agreement / Disagreement

Participants express differing views on the appropriate statistical methods to analyze the data, with some agreeing on the misinterpretation of the T-test while others raise concerns about the validity of the analysis due to potential dependencies in the data.

Contextual Notes

Participants note limitations related to the assumptions of independence in the data and the appropriateness of using Excel for statistical calculations.

Raddy13
I have a set of data that was recorded from an engine that we are testing. We've noticed lately that a particular pressure value will sometimes spike with no apparent explanation, as seen in the attached graph. The pressure in question is passively regulated by a pump, but it also depends on operating factors in the engine. I'm trying to narrow down whether this spike indicates a problem with the pump or whether something else in the engine is causing it. Plotting the pressure against other measurements hasn't really helped, so I wanted to see whether there's a way to statistically measure the significance of the changes that occur, or whether they're just typical variation.

I recall from Stats that my professor used the Student's T-Distribution to measure whether a change between two data sets (in my case, before and after the pressure spike) was statistically significant. When I tried it in Excel (using this tutorial), it gave me a probability of exactly 100% that this pressure spike occurred by chance (we know from testing that's not the case). I've since read that the T-distribution is typically used when the standard deviation for a sample isn't directly measurable, but I do know that from the data, so is there another way to go about this?

EDIT: Just to clarify, to obtain my T-dist value, I set mu = average pressure before the spike, xbar = average pressure after the spike, s = standard deviation of pressure after the spike, and n is the number of data points after the spike (201 in my case). I plugged that into the T-value formula and got 213.4, and the T-dist probability for that is 1.
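Written out with the symbols from the edit above, this is a one-sample t statistic that treats the pre-spike average as a fixed reference value μ (an assumption of this setup, since μ is itself estimated from data). The second line shows how the "T-dist probability" relates to a p-value, using F_ν for the cumulative distribution function of Student's t with ν degrees of freedom:

```latex
t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad \nu = n - 1 = 200, \qquad t = 213.4

F_{\nu}(t) = P(T_{\nu} \le t) \approx 1, \qquad
p_{\text{one-sided}} = P(T_{\nu} \ge t) = 1 - F_{\nu}(t) \approx 0, \qquad
p_{\text{two-sided}} = 2\,\bigl[1 - F_{\nu}(|t|)\bigr] \approx 0
```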
 

Attachment: pressure.png (graph of the pressure over time)
You interpreted your T-test backwards. That 100% probability number says that if the data sampled after the spike was drawn from the same distribution as the data before the spike, then there is a 100% probability you would have seen a smaller T-value than 213.4. So the probability you would see 213.4 or greater is essentially 0, which means that this is NOT just random noise.
 
"I plugged that into the T-value formula and got 213.4, and the T-dist probability for that is 1."

Excel is a horrible tool for statistical analysis of any kind, for many reasons. Here you've run into an old "feature" of the program: a foolish way of reporting p-values. If you sketch the t-distribution and locate your t-value, you see it is far to the right of the distribution. Depending on your alternative hypothesis, the p-value is either the area to the right of this t (one-sided) or twice the area to the right of this t (two-sided). The number Excel gives is the area to its left, so the p-value is 1 minus that number, which is essentially 0.
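As a sketch of the left-tail versus right-tail point above, the stdlib-only Python below computes both areas for the t-value and degrees of freedom reported in the thread. It uses the identity P(T ≥ t) = ½·I_x(ν/2, ½) with x = ν/(ν + t²) for t ≥ 0, evaluating the regularized incomplete beta function I_x by a continued fraction (the modified Lentz method popularized by Numerical Recipes). The function names are hypothetical; in practice a library routine such as `scipy.stats.t.sf` gives the same right-tail area.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-15) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300  # keeps the recurrence away from division by zero
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Evaluate the continued fraction on the side where it converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_upper_tail(t: float, df: float) -> float:
    """Right-tail area P(T >= t) of Student's t with df degrees of freedom."""
    half = 0.5 * reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return half if t >= 0.0 else 1.0 - half


def t_cdf(t: float, df: float) -> float:
    """Left-tail (cumulative) area P(T <= t)."""
    return 1.0 - t_upper_tail(t, df)


if __name__ == "__main__":
    t_stat, df = 213.4, 200  # values reported in the thread (n = 201)
    p_one = t_upper_tail(t_stat, df)
    print("left-tail area:", t_cdf(t_stat, df))
    print("one-sided p:   ", p_one)
    print("two-sided p:   ", min(1.0, 2.0 * p_one))
```

The left-tail area rounds to 1 in double precision, which matches the value reported in the thread, while the right-tail area (the one-sided p-value) is a tiny positive number that is essentially 0.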

But I'm not sure this is the analysis you should do. The measurements are clearly not independent, since consecutive readings come from the same mechanism, and t-tests require independent values.
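One rough way to gauge how much that dependence matters (an assumption here, not a method proposed in the thread) is to estimate the lag-1 autocorrelation r of the evenly spaced readings and shrink n to an effective sample size n_eff ≈ n(1 − r)/(1 + r), the usual AR(1) approximation for the variance of a mean, then recompute t with n_eff in place of n. The function names are hypothetical, and this is a crude correction: it assumes the post-spike segment behaves like an AR(1) process and does not adjust the degrees of freedom.

```python
import math
import random
import statistics


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of an evenly sampled series."""
    mean = statistics.fmean(xs)
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(len(xs) - 1))
    den = sum((x - mean) ** 2 for x in xs)  # zero for a constant series, so the division would raise
    return num / den


def effective_sample_size(xs):
    """AR(1) rule of thumb: n_eff = n (1 - r) / (1 + r), capped at n."""
    n = len(xs)
    r = lag1_autocorrelation(xs)
    if r <= 0.0:
        return float(n)  # conservative: never credit more than n independent samples
    return n * (1.0 - r) / (1.0 + r)


def autocorrelation_adjusted_t(after, mu):
    """One-sample t as described in the thread, with n replaced by n_eff."""
    xbar = statistics.fmean(after)
    s = statistics.stdev(after)
    n_eff = effective_sample_size(after)
    return (xbar - mu) / (s / math.sqrt(n_eff)), n_eff


if __name__ == "__main__":
    # Synthetic AR(1) series with coefficient 0.8; illustrative only, NOT the engine data.
    random.seed(0)
    level, series = 0.0, []
    for _ in range(201):
        level = 0.8 * level + random.gauss(0.0, 1.0)
        series.append(level)
    print("lag-1 r:", lag1_autocorrelation(series))
    print("n_eff:  ", effective_sample_size(series))
```

With strongly correlated readings n_eff can be far smaller than the raw count, which inflates s/√n_eff and shrinks t; how far the 213.4 value would drop depends on the actual pressure data.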
What is graphed on the horizontal axis of the plot you attached?
 
Office_Shredder said:
You interpreted your T-test backwards. That 100% probability number says that if the data sampled after the spike was drawn from the same distribution as the data before the spike, then there is a 100% probability you would have seen a smaller T-value than 213.4. So the probability you would see 213.4 or greater is essentially 0, which means that this is NOT just random noise.

My mistake, I will keep that in mind!

What is graphed on the horizontal axis of the plot you attached?

Time, measurements are recorded once per second.
 
