Statistical significance in experimentally obtained data sets

SUMMARY

The discussion focuses on analyzing a pressure spike in engine testing data using statistical significance methods. The user initially applied the Student's T-Distribution in Excel, resulting in a misleading 100% probability that the spike was due to chance. However, further clarification revealed that the T-value of 213.4 indicates a near-zero probability of the spike being random noise. The conversation highlights the inadequacies of Excel for statistical analysis and emphasizes the need for appropriate statistical methods when dealing with dependent measurements.

PREREQUISITES
  • Understanding of statistical concepts, specifically the Student's T-Distribution.
  • Familiarity with Excel for basic data analysis.
  • Knowledge of standard deviation and its calculation.
  • Basic principles of hypothesis testing.
NEXT STEPS
  • Learn about alternative statistical tests for dependent samples, such as the paired T-test.
  • Explore statistical software options like R or Python for more robust data analysis.
  • Study the concept of p-values and their interpretation in hypothesis testing.
  • Investigate the assumptions of statistical tests to ensure proper application in data analysis.
USEFUL FOR

Data analysts, engineers involved in testing and measurement, and anyone seeking to understand statistical significance in experimental data sets.

Raddy13
I have a set of data that was recorded from an engine that we are testing. We've noticed lately that a particular pressure value will sometimes spike with no apparent explanation, as seen in the attached graph. The pressure in question is passively regulated by a pump, but it is also dependent on operating factors in the engine. I'm trying to narrow down whether this spike is indicative of a problem with the pump or if there's something else wrong with the engine to cause this spike. Plotting the pressure against other measurements hasn't really helped, but I wanted to see if there's a way I can statistically measure the significance of changes that occur, or whether it's just typical variation.

I recall from Stats that my professor used the Student's T-Distribution to measure whether a change between two data sets (in my case, before and after the pressure spike) was statistically significant, but when I tried it in Excel (using this tutorial), it gave me exactly a 100% probability that this pressure spike occurred by chance (we know from testing that's not the case). I've since read that the T-distribution is typically used when the standard deviation for a sample isn't directly measurable, but I do know that from the data, so is there another way to go about this?

EDIT: Just to clarify, to obtain my T-dist value, I set mu = average pressure before the spike, xbar = average pressure after the spike, s = standard deviation of pressure after the spike, and n is the number of data points after the spike (201 in my case). I plugged that into the T-value formula and got 213.4, and the T-dist probability for that is 1.
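For reference, the calculation described in the EDIT can be sketched in a few lines of Python. This assumes the usual one-sample form t = (xbar - mu) / (s / sqrt(n)); the function name and the example numbers are hypothetical, since the thread gives only n = 201 and the resulting T-value.

```python
import math

def one_sample_t(mu, xbar, s, n):
    """One-sample t statistic: (xbar - mu) / (s / sqrt(n)).

    mu   -- reference mean (here: average pressure before the spike)
    xbar -- sample mean (average pressure after the spike)
    s    -- sample standard deviation after the spike
    n    -- number of data points after the spike
    """
    if n < 2 or s <= 0:
        raise ValueError("need n >= 2 and s > 0")
    return (xbar - mu) / (s / math.sqrt(n))

# Hypothetical values for illustration only; they are NOT the thread's data.
t_value = one_sample_t(mu=30.0, xbar=45.0, s=1.0, n=201)
print(f"t = {t_value:.1f} with {201 - 1} degrees of freedom")
```

Because the standard error s / sqrt(n) shrinks as n grows, even a modest shift in the mean produces a very large t when there are a couple of hundred readings with little scatter.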
 

Attachment: pressure.png
You interpreted your T-test backwards. That 100% probability number says that if the data sampled after the spike was drawn from the same distribution as the data before the spike, then there is a 100% probability you would have seen a smaller T-value than 213.4. So the probability you would see 213.4 or greater is essentially 0, which means that this is NOT just random noise.
 
"I plugged that into the T-value formula and got 213.4, and the T-dist probability for that is 1."

Excel is a horrible tool for statistical analysis of any kind, for many reasons. Here you've run into an old "feature" of the program that is, in essence, a foolish way it calculates p-values. If you sketch the t-distribution and locate your t-value you see it is far to the right of the distribution. Depending on your alternative hypothesis the p-value is either the area to the right of this t, or twice the area to the right of this t. The number Excel gives is the area to its left, which is essentially 1, so the p-value is essentially 0.
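The left-area versus right-area distinction can be checked numerically. The following is a minimal sketch using only the Python standard library; it assumes a one-sample test with n - 1 = 200 degrees of freedom, and the helper names `t_pdf` and `t_cdf` are made up for this illustration. The integration is plain Simpson's rule, so the tiny right-tail area is limited by floating-point precision and is clamped at zero.

```python
import math

def t_pdf(x, df):
    # Density of Student's t distribution with df degrees of freedom.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=20000):
    # Area to the LEFT of t: Simpson's rule on [0, |t|], then symmetry.
    # steps must be even for Simpson's rule.
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    left = min(0.5 + total * h / 3, 1.0)
    return left if t >= 0 else 1.0 - left

t_value, df = 213.4, 200                  # df = n - 1 is an assumption
left_area = t_cdf(t_value, df)            # the number reported as "1"
right_area = max(0.0, 1.0 - left_area)    # one-sided p-value
two_sided = min(1.0, 2.0 * right_area)    # two-sided p-value

print("area to the left :", left_area)
print("one-sided p-value:", right_area)
print("two-sided p-value:", two_sided)
```

The left area is indistinguishable from 1, and either version of the p-value is indistinguishable from 0 at double precision.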

But I'm not sure this is the analysis you should do. The measurements are clearly not independent, since they come from the same mechanism, and t-tests require independent values.
What is graphed on the horizontal axis of the plot you attached?
 
Office_Shredder said:
You interpreted your T-test backwards. That 100% probability number says that if the data sampled after the spike was drawn from the same distribution as the data before the spike, then there is a 100% probability you would have seen a smaller T-value than 213.4. So the probability you would see 213.4 or greater is essentially 0, which means that this is NOT just random noise.

My mistake, I will keep that in mind!

What is graphed on the horizontal axis of the plot you attached?

Time, measurements are recorded once per second.
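Since the readings form a once-per-second time series, the independence concern raised above can be checked before trusting a t-test. One rough way, sketched here under assumptions, is to estimate the lag-1 autocorrelation r and the corresponding effective sample size n(1 - r)/(1 + r), an approximation that holds for a first-order autoregressive series. The data below are synthetic (a seeded AR(1) series with a made-up coefficient of 0.8), not the engine measurements, and the function names are hypothetical.

```python
import random

def lag1_autocorr(xs):
    # Sample lag-1 autocorrelation of a sequence.
    n = len(xs)
    mean = sum(xs) / n
    den = sum((x - mean) ** 2 for x in xs)
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return num / den

def effective_sample_size(xs):
    # AR(1) approximation: n_eff = n * (1 - r) / (1 + r).
    r = lag1_autocorr(xs)
    return len(xs) * (1 - r) / (1 + r)

# Synthetic AR(1) series standing in for 201 consecutive pressure readings.
rng = random.Random(0)
series = [0.0]
for _ in range(200):
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0))

r = lag1_autocorr(series)
n_eff = effective_sample_size(series)
print(f"lag-1 autocorrelation: {r:.2f}")
print(f"effective sample size: {n_eff:.0f} of {len(series)} readings")
```

If r is well above zero, the 201 readings carry far less information than 201 independent samples would, and a t-value computed with n = 201 overstates the evidence.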
 
