Normalizing standard deviation between two data sets

In summary, normalizing standard deviation between two data sets involves converting both data sets to a "standard" normal scale by subtracting the mean and dividing by the standard deviation, and then using a t-test (a Welch-type variant if the variances differ) to compare the means of the two distributions. It is important to graph the data first to check for outliers and skewness and, if either is present, to also run a non-parametric test such as the Wilcoxon. When a t-test and a non-parametric test are used together as a cross-check, it is not necessary to adjust the significance levels.
  • #1
TheAnalogKid83
"Normalizing" standard deviation between two data sets

I have a baseline set of data collected which has a standard deviation of X, and then I collected another set of the data under a different condition (different temperature), which has a different standard deviation Y. How do I cancel out the standard deviations so that I see only the difference in the actual data as it varies with the condition change (temperature)? I'm trying to stay vague, but if it isn't clear what I'm looking for, I'll give my application example. Even just a general topic to point me to would help.
 
  • #2


Seems to me that you could convert both to a "standard" normal distribution by taking [itex]x'= (x- \mu_x)/\sigma_x[/itex] and [itex]y'= (y- \mu_y)/\sigma_y[/itex]. If you don't want to worry about the means, just dividing by the standard deviation of each should give you a distribution with standard deviation 1.
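A minimal sketch of that standardization in Python with NumPy; the arrays `baseline` and `heated` are made-up placeholder samples, not data from the thread:

[code]
import numpy as np

baseline = np.array([10.1, 9.8, 10.4, 10.0, 9.7])    # hypothetical baseline readings
heated   = np.array([12.3, 11.9, 12.8, 12.1, 12.5])  # hypothetical readings at the new temperature

# Full standardization: subtract the mean, then divide by the sample standard deviation.
baseline_z = (baseline - baseline.mean()) / baseline.std(ddof=1)
heated_z   = (heated - heated.mean()) / heated.std(ddof=1)

# To keep the means and equalize only the spread, divide by sigma alone.
baseline_scaled = baseline / baseline.std(ddof=1)
heated_scaled   = heated / heated.std(ddof=1)

print(baseline_z.std(ddof=1), heated_z.std(ddof=1))  # both 1.0 by construction
[/code]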
 
  • #3


The change in temperature may have affected both the mean and standard deviation of the "true" probability distribution. If by eye the two sample standard deviations look the same, just use a normal t-test to see if the means are different. If the standard deviations look very different, or if you suspect on theoretical grounds that the standard deviations are different, then use a t-test variant in which the standard deviations are not assumed equal.

See for example:
4.3.3 Unequal sample sizes, unequal variance
http://en.wikipedia.org/wiki/Student's_t-test
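A quick sketch of that unequal-variance (Welch) variant using scipy.stats; the samples here are simulated placeholders:

[code]
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
baseline = rng.normal(loc=10.0, scale=1.0, size=30)  # condition A
heated   = rng.normal(loc=10.8, scale=2.5, size=30)  # condition B, larger spread

# equal_var=False selects Welch's t-test, which does not assume equal variances.
t_stat, p_value = stats.ttest_ind(baseline, heated, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
[/code]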
 
  • #4


One more comment on this. Graph your data first - whether a simple dotplot, stemplot, boxplot, or histogram if the samples are large. Look for evidence of outliers and/or skewness. Both of these can cause problems with the classical procedures, as they are not robust in the face of departures from normality. If you see skewness (or even several outliers with overall symmetry), you should also do a non-parametric test (Wilcoxon or equivalent). (I would suggest always doing this, but my training is in non-parametrics.) Intuitively, if the two results are in agreement, the t-test results may be good enough. If the two results are in great disagreement, you should suspect the t-test results.
(DO NOT be tempted to throw away outliers in order to obtain a specific result: unless the outliers are due to recording error, that is not valid)
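A sketch of that graph-first workflow: boxplots to eyeball outliers and skewness, then a rank-based test. It uses SciPy's Mann-Whitney U, the two-independent-sample equivalent of the Wilcoxon test mentioned above; the samples are simulated placeholders:

[code]
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
baseline = rng.normal(10.0, 1.0, 40)
heated   = rng.normal(10.6, 1.0, 40)

# Side-by-side boxplots: check for asymmetric whiskers and flagged outliers.
plt.boxplot([baseline, heated])
plt.xticks([1, 2], ["baseline", "heated"])
plt.ylabel("measurement")
plt.show()

# Rank-based two-sample test (Mann-Whitney U / Wilcoxon rank-sum).
u_stat, p_value = stats.mannwhitneyu(baseline, heated, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
[/code]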
 
  • #5


statdad said:
One more comment on this. Graph your data first ... If the two results are in great disagreement, you should suspect the t-test results.

Yes, I agree with that. One thing I've never understood properly is: if you do two independent tests, say a t-test and a Wilcoxon, should you change the p-value threshold for what you accept as "significant" (i.e., analogous to Bonferroni and its ilk)? In which case, maybe do only the Wilcoxon if non-Gaussianity is suspected? I usually just distrust statistics and collect more data, unless I need the paper published immediately. :rolleyes:
 
  • #6


No data is truly normal (Gaussian), although the 'middle' of a distribution can quite closely resemble normally distributed data. If your initial graphs indicate severe non-normality, it's usually best to avoid normal-based inferences altogether.
If you do both a t-test and a non-parametric test for comparison, as a simple check, there isn't any real need to adjust significance levels: both tests address the same hypothesis on the same data, so the second acts as a robustness check rather than an additional independent comparison.
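A sketch of that run-both-as-a-check idea, comparing the Welch t-test and rank-sum p-values on the same (simulated placeholder) samples:

[code]
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
baseline = rng.normal(10.0, 1.0, 30)
heated   = rng.normal(10.7, 1.0, 30)

_, p_t = stats.ttest_ind(baseline, heated, equal_var=False)             # Welch t-test
_, p_w = stats.mannwhitneyu(baseline, heated, alternative="two-sided")  # rank-based check

# No multiple-comparison correction: the second test re-checks the same
# hypothesis on the same data rather than testing a new one.
print(f"Welch t-test p = {p_t:.4f}, rank-sum p = {p_w:.4f}")
if (p_t < 0.05) != (p_w < 0.05):
    print("Tests disagree - inspect the data for skewness or outliers.")
[/code]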
 

1. What is the purpose of normalizing standard deviation between two data sets?

The purpose of normalizing standard deviation between two data sets is to compare the variability or spread of the two data sets on a standardized scale. This allows for a more accurate and fair comparison between the two sets of data.

2. How is standard deviation normalized between two data sets?

Standard deviation is normalized between two data sets by calculating the z-score for each data point in the sets. The z-score is a measure of how many standard deviations a data point is away from the mean of the data set. By converting the data points to z-scores, the data sets can be compared on the same standardized scale.
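For example, a reading of 12 in a set with mean 10 and standard deviation 2 has z = (12 - 10)/2 = 1: it lies one standard deviation above its own set's mean, and that figure is directly comparable with a z-score computed from the other set.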

3. Is normalizing standard deviation the same as standardizing data?

Not exactly. Standardizing data converts each point to a z-score, which both centers the data (subtracting the mean) and normalizes its standard deviation to 1 (dividing by the standard deviation). Normalizing the standard deviation refers to the scaling step on its own: it puts the spread of the two data sets on the same scale without necessarily removing the difference between their means.

4. What are the benefits of normalizing standard deviation between two data sets?

The benefits of normalizing standard deviation between two data sets include being able to accurately compare the variability or spread of the data sets, identifying any outliers or extreme values, and reducing the impact of different units or scales of measurement on the comparison.

5. When should standard deviation be normalized between two data sets?

Standard deviation should be normalized between two data sets when there is a need to compare the variability or spread of the data sets, such as in scientific research or data analysis. It is also useful when the data sets have different units or scales of measurement, making direct comparison difficult.
