Data repeatability (statistics question)

  • Thread starter: Melawrghk
  • Tags: Data
SUMMARY

This discussion focuses on determining whether two data sets represent the same values using statistical analysis. From means and standard deviations computed with MATLAB's mean() and std() functions, the user calculated a z-score of 63 and concluded that the two sets are statistically different. Because the means are close and the standard deviations are comparatively large, the user is confused by this result. The discussion emphasizes comparing the difference between the means against the statistical uncertainty of each mean, which shrinks with sample size, rather than against the standard deviations of the distributions.

PREREQUISITES
  • Understanding of z-scores and hypothesis testing
  • Familiarity with MATLAB functions mean() and std()
  • Knowledge of standard deviation and its implications in statistics
  • Concept of statistical uncertainty and its calculation
NEXT STEPS
  • Study the concept of statistical uncertainty and how it affects mean comparisons
  • Learn about hypothesis testing and the interpretation of z-scores
  • Explore MATLAB's statistical toolbox for advanced data analysis
  • Research the implications of sample size on statistical results
USEFUL FOR

Students in statistics, data analysts, and researchers in fields requiring data comparison and hypothesis testing, particularly those using MATLAB for statistical analysis.

Melawrghk

Homework Statement


I am trying to see if two sets of data represent the same values or not. I have:
Mean1 = 9.3155, stdev1 = 0.1334; mean2 = 9.3040, stdev2 = 0.1248;
N1 = N2 = 1000;
I got these values from my data using MATLAB (std() and mean());

Homework Equations



z = \frac{(mean1-mean2)}{\sqrt{stdev1^{2}/N1^{2}+stdev2^{2}/N2^{2}}}

The Attempt at a Solution



Null hypothesis: Sets are different.
Alternative: Sets are the same.

Using the formula above, I get a z-score of 63, which supports my null hypothesis that the two sets are different.

However, I don't really understand why they would be considered different, given the fairly large standard deviations and the close means. My thinking is roughly this: the second mean falls within mean1 +/- stdev1, so shouldn't the z-score be smaller?

Statistics isn't my strong suit, and this is for an electronics thing, but I'm curious where exactly my thinking goes wrong.
 
The standard deviation s of a distribution describes how the individual entries scatter. It does not describe the uncertainty u with which the mean is determined. The mean is determined from N numbers, not from a single one, so its statistical uncertainty u is smaller than s, namely u² = s²/N (*). What you essentially want to do is compare the difference between the means with the statistical uncertainties you expect for them, not with the widths of the distributions. Comparing with the widths would instead tell you to what extent you could take a single number and tell which of the two probability distributions it probably belongs to (assuming the distributions are different, of course).
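To make the relation u² = s²/N more tangible, here is a minimal simulation sketch in Python. It is not taken from the original data: it assumes Gaussian-distributed entries, reuses mean1 and stdev1 from the question, and the function name scatter_of_means, the seed, and the number of trials are all made up for illustration.

```python
import math
import random
import statistics


def scatter_of_means(mu, s, n, trials, seed=0):
    """Standard deviation of the means of `trials` simulated data sets of size n.

    Assumption: entries are Gaussian with mean mu and standard deviation s.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, s) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


# Values quoted in the question; the Gaussian model is an assumption.
mu1, s1, n1 = 9.3155, 0.1334, 1000

predicted_u = s1 / math.sqrt(n1)  # u = s / sqrt(N)
observed_u = scatter_of_means(mu1, s1, n1, trials=300)

print(f"width of the distribution, s:        {s1:.4f}")
print(f"predicted uncertainty of the mean:   {predicted_u:.5f}")
print(f"scatter of the simulated means:      {observed_u:.5f}")
```

The simulated means should scatter by roughly s/sqrt(N), which is far less than s itself. That is why a difference between means that looks tiny next to the width of the distributions can still be comparable to, or larger than, the uncertainty of the means.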

Btw.: Excellent question. It's nice to see when students question results that seem wrong to them.


(*): Note by comparison that your equation for the not-further-specified "z" is a bit fishy: it divides each squared standard deviation by N², whereas u² = s²/N calls for N. Also note that it is a good habit to define or explain the terms you use. Just because everyone in your class knows what your teacher means by "z" does not imply that everyone around the world does.
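As a rough check of the footnote, the following Python sketch evaluates the z expression from the question twice with the quoted numbers: once with N² in the denominator as posted, and once with N as u² = s²/N suggests. The function name z_score and its power parameter are hypothetical helpers introduced here, not part of the original post.

```python
import math


def z_score(mean1, s1, n1, mean2, s2, n2, power=1):
    """Difference of the means divided by their combined uncertainty.

    power=1 uses u^2 = s^2 / N for each set;
    power=2 reproduces the N^2 that appears in the posted formula.
    """
    combined_u = math.sqrt(s1**2 / n1**power + s2**2 / n2**power)
    return (mean1 - mean2) / combined_u


# Numbers quoted in the question.
data = dict(mean1=9.3155, s1=0.1334, n1=1000,
            mean2=9.3040, s2=0.1248, n2=1000)

z_posted = z_score(**data, power=2)    # formula as written in the question
z_with_n = z_score(**data, power=1)    # formula using u^2 = s^2 / N

print(f"z with N^2 in the denominator: {z_posted:.2f}")
print(f"z with N in the denominator:   {z_with_n:.2f}")
```

With N² the result lands near the 63 reported in the question. With N the difference between the means comes out at about two combined uncertainties, which is much closer to the intuition that the two sets are not wildly different.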
 
