Statistical significance in experimentally obtained data sets

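In summary, the thread concerns an unexplained spike in a pressure measurement recorded during engine testing. The original poster asks how to judge whether the change is statistically significant. Replies point out that the t-test result was read backwards (the p-value is essentially 0, not 1) and that measurements taken once per second are not independent, which a t-test assumes. The cause of the spike is not determined in the thread.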
  • #1
Raddy13
I have a set of data that was recorded from an engine that we are testing. We've noticed lately that a particular pressure value will sometimes spike with no apparent explanation, as seen in the attached graph. The pressure in question is passively regulated by a pump, but it is also dependent on operating factors in the engine. I'm trying to narrow down whether this spike is indicative of a problem with the pump or if there's something else wrong with the engine to cause this spike. Plotting the pressure against other measurements hasn't really helped, but I wanted to see if there's a way I can statistically measure the significance of changes that occur, or whether it's just typical variation.

I recall from Stats that my professor used the Student's T-Distribution to measure whether a change between two data sets (in my case, before and after the pressure spike) was statistically significant, but when I tried it in Excel (using this tutorial), it gave me exactly a 100% probability that this pressure spike occurred by chance (we know from testing that's not the case). I've since read that the T-distribution is typically used when the standard deviation for a sample isn't directly measurable, but I do know that from the data, so is there another way to go about this?

EDIT: Just to clarify, to obtain my T-dist value, I set mu = average pressure before the spike, xbar = average pressure after the spike, s = standard deviation of pressure after the spike, and n is the number of data points after the spike (201 in my case). I plugged that into the T-value formula and got 213.4, and the T-dist probability for that is 1.
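For reference, the calculation described in the edit above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `one_sample_t` and the sample values are made up for the example and are not the thread's data.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for testing whether the mean of `sample` equals mu0."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)            # sample SD (n - 1 in the denominator)
    t = (xbar - mu0) / (s / math.sqrt(n))   # t = (xbar - mu) / (s / sqrt(n))
    return t, n - 1

# Hypothetical readings after the spike; mu0 plays the role of the
# average pressure before the spike.
after_spike = [52.1, 51.8, 52.4, 52.0, 51.9, 52.3, 52.2, 51.7]
t, df = one_sample_t(after_spike, mu0=50.0)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

Note that this treats the pre-spike average as a known constant. Since that average is itself estimated from data, a two-sample test would arguably be the more natural setup.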
 

Attachments: pressure.png
  • #2
Office_Shredder
You interpreted your T-test backwards. That 100% probability number says that if the data sampled after the spike was drawn from the same distribution as the data before the spike, then there is a 100% probability you would have seen a smaller T-value than 213.4. So the probability you would see 213.4 or greater is essentially 0, which means that this is NOT just random noise.
 
  • #3
"I plugged that into the T-value formula and got 213.4, and the T-dist probability for that is 1."

Excel is a horrible tool for statistical analysis of any kind, for many reasons. Here you've run into an old "feature" of the program: the way it reports p-values is misleading. If you sketch the t-distribution and locate your t-value, you'll see it is far out in the right tail. Depending on your alternative hypothesis, the p-value is either the area to the right of this t or twice that area. The number Excel gave you is the area to its left, so the actual p-value is 1 minus that, which is essentially 0.
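To make the left-tail versus right-tail distinction concrete, here is a small sketch. It assumes a standard normal curve as a stand-in for the t-distribution, which is a reasonable approximation at around 200 degrees of freedom; the function name `normal_tail_areas` is made up for the example.

```python
import math

def normal_tail_areas(t):
    """Left, right, and two-sided tail areas under a standard normal curve."""
    right = 0.5 * math.erfc(t / math.sqrt(2))   # area to the right of t
    left = 1.0 - right                          # area to the left of t
    two_sided = 2.0 * min(left, right)
    return left, right, two_sided

left, right, two_sided = normal_tail_areas(213.4)
print(f"area to the left  = {left}")
print(f"area to the right = {right}")
print(f"two-sided p-value = {two_sided}")
```

The area to the left is the number that was reported as "100%"; the p-value for a right-tailed or two-sided test comes from the other tail.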

But I'm not sure this is the analysis you should do. The measurements are clearly not independent, since they come from the same mechanism, and t-tests require independent values.
What is graphed on the horizontal axis of the plot you attached?
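One way to check the independence concern is to look at the lag-1 autocorrelation of the readings. The sketch below uses a synthetic series, not the thread's data, and a rough effective-sample-size formula that assumes the series behaves like a first-order autoregressive (AR(1)) process; both function names are made up for the example.

```python
import random
import statistics

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    mean = statistics.fmean(x)
    dev = [v - mean for v in x]
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    den = sum(d * d for d in dev)
    return num / den

def effective_sample_size(x):
    """Rough effective n under an AR(1) assumption: n * (1 - r1) / (1 + r1)."""
    r1 = lag1_autocorr(x)
    return len(x) * (1 - r1) / (1 + r1)

# Synthetic, strongly autocorrelated series standing in for sensor readings
random.seed(0)
series = [0.0]
for _ in range(200):
    series.append(0.9 * series[-1] + random.gauss(0, 1))

r1 = lag1_autocorr(series)
n_eff = effective_sample_size(series)
print(f"lag-1 autocorrelation = {r1:.2f}")
print(f"n = {len(series)}, effective n is roughly {n_eff:.0f}")
```

When consecutive readings are strongly correlated, the effective number of independent observations is much smaller than the raw count, so a t-value computed with the raw n overstates the evidence.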
 
  • #4
Office_Shredder said:
You interpreted your T-test backwards. That 100% probability number says that if the data sampled after the spike was drawn from the same distribution as the data before the spike, then there is a 100% probability you would have seen a smaller T-value than 213.4. So the probability you would see 213.4 or greater is essentially 0, which means that this is NOT just random noise.

My mistake, I will keep that in mind!

"What is graphed on the horizontal axis of the plot you attached?"

Time. Measurements are recorded once per second.
 
  • #5

I would first suggest looking at the data more closely to determine if there are any outliers or unusual patterns that may be causing the spike in pressure. It's also important to consider any external factors that may be influencing the data, such as changes in temperature or humidity.

If there are no clear explanations for the spike, then it may be worth conducting further experiments to gather more data and see if the spike is reproducible. This can help to determine if the spike is a consistent issue or if it was just a one-time occurrence.

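In terms of statistical significance, the t-test as you set it up may not be the best approach in this scenario. Note that the t-distribution is used whenever the population standard deviation is unknown and has to be estimated from the sample, which is the case here even though you can compute a standard deviation from your data. A different test, such as ANOVA (Analysis of Variance), can compare the means of several groups, but with only two groups (before and after the spike) it is equivalent to a two-sample t-test and relies on the same independence assumption.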
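As a side note on the ANOVA suggestion: for exactly two groups, the one-way ANOVA F statistic equals the square of the pooled two-sample t statistic, so the two tests reach the same conclusion. A minimal sketch with made-up numbers (the function names and data are illustrative only):

```python
import math
import statistics

def pooled_t(a, b):
    """Two-sample t statistic with a pooled variance estimate."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff / math.sqrt(sp2 * (1 / na + 1 / nb))

def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA across the given groups."""
    all_values = [v for g in groups for v in g]
    grand = statistics.fmean(all_values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((v - statistics.fmean(g)) ** 2 for v in g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical before/after readings
before = [50.1, 49.8, 50.3, 50.0, 49.9]
after = [52.1, 51.8, 52.4, 52.0, 51.9]
t = pooled_t(before, after)
f = one_way_anova_f(before, after)
print(f"t squared = {t * t:.4f}, F = {f:.4f}")
```

Both statistics assume independent observations within each group, so neither addresses autocorrelation in time-series data.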

Additionally, it's important to consider the sample size and variability of the data when determining statistical significance. A larger sample size and lower variability can increase the power of the statistical test and make it more likely to detect significant differences.

In conclusion, further investigation and potentially using a different statistical test may provide more insight into the significance of the pressure spike in your data. It's important to carefully consider all factors and gather enough data to make informed conclusions.
 

What is statistical significance?

Statistical significance describes how unlikely the observed results would be if chance alone were at work. A statistically significant result is one that would rarely arise from random variation alone, which is taken as evidence that the difference or pattern in the data is real.

How is statistical significance determined?

Statistical significance is typically determined by conducting a hypothesis test, which involves comparing the observed data to what would be expected if there were no true difference or relationship between the variables being studied. The test yields a p-value, which is the probability of obtaining results at least as extreme as those observed if the null hypothesis (no difference or relationship) were true. A p-value of less than 0.05 is commonly used as the threshold for statistical significance.
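The idea of comparing observed data against what the null hypothesis would produce can be illustrated with a permutation test. This is a sketch of one simple approach, not the method used in the thread; the function name, data, and default parameters are assumptions made for the example.

```python
import random
import statistics

def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                 # relabel the groups at random
        perm_a = pooled[:len(a)]
        perm_b = pooled[len(a):]
        diff = abs(statistics.fmean(perm_a) - statistics.fmean(perm_b))
        if diff >= observed:
            count += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0
    return (count + 1) / (n_perm + 1)

# Hypothetical groups with clearly different means
group_a = [1, 2, 3, 4, 5]
group_b = [11, 12, 13, 14, 15]
p = permutation_p_value(group_a, group_b)
print(f"permutation p-value = {p:.4f}")
```

The p-value is the fraction of random relabelings that produce a difference at least as large as the observed one. Like the t-test, this assumes the observations are exchangeable, which does not hold for autocorrelated time series.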

Why is statistical significance important?

Statistical significance is important because it helps us judge whether the results of an experiment reflect more than random variation. If a study does not show statistical significance, the results are consistent with chance alone, although this does not prove that no effect exists. If a study does show statistical significance, it suggests that there is a real difference or relationship between the variables being studied.

What factors can affect statistical significance?

One of the most important factors is the sample size: when a true effect exists, larger samples are more likely to yield statistically significant results. Other factors include the strength of the effect being studied, the variability of the data, and the chosen significance level (usually 0.05). The type of statistical test used and the assumptions made in the analysis also influence the results.

What are the limitations of statistical significance?

It is important to note that statistical significance does not necessarily indicate the practical significance or importance of the results. Just because a study shows statistical significance does not mean that the effect being studied is large or meaningful in real-world terms. It is also possible for a study to show no statistical significance even when a true effect exists, especially if the sample size is small. Additionally, statistical significance does not prove causation; it only indicates a relationship between variables.
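The gap between statistical and practical significance can be seen by holding a tiny effect fixed and increasing the sample size. The sketch below assumes a known standard deviation and a normal (z) test for simplicity; the function name and all numbers are hypothetical.

```python
import math

def z_and_p(diff, sd, n):
    """z statistic and two-sided normal p-value for a mean shift `diff`."""
    z = diff / (sd / math.sqrt(n))
    p = math.erfc(abs(z) / math.sqrt(2))    # equals 2 * (1 - Phi(|z|))
    return z, p

# A shift of 0.01 standard deviations is negligible in practical terms
for n in (10, 100, 10_000, 1_000_000):
    z, p = z_and_p(diff=0.01, sd=1.0, n=n)
    print(f"n = {n:>9,}  z = {z:6.2f}  p = {p:.3g}")
```

The effect size never changes, yet the p-value shrinks as n grows, so a very small p-value by itself says nothing about how large the effect is.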
