How to deal with averaging before calculating statistics

  • Thread starter Monique
  • Tags
    Statistics
In summary, the thread discusses how best to analyze data from a study with two conditions and four replicates recorded over three time periods. Repliers suggest ANOVA for comparing the means of several groups, or Hotelling's T² test for comparing the mean vectors of multivariate normal distributions. Ultimately, the original poster averages the repeated measurements, calculates the area under the curve for each replicate, and uses a t-test to compare the two experimental populations.
  • #1
Monique
I have the following data set: condition A and condition B, with 4 replicates recorded over 3 time periods.

Hypothetically you can think of it as measuring the height of the sun in the sky in winter (A) compared to the summer (B), in 4 nearby villages (independent observations) over 3 days (the assumption is that the height is stable over the consecutive days).

Since I want to know the true behavior of each replicate, I average the data of the three time periods (to take out any irrelevant fluctuations). This is the key, but it is also a problem (I think).

Now I calculate the average and the SD of condition A and condition B (based on the 3-day average of each replicate) and want to do a statistical test on the height at t=12:00.

My question: what is the best way to deal with the data? If I perform a t-test on the averages, each condition has 4 independent replicates, but underneath each of those averages there are 3 dependent measurements.

Am I over-inflating the significance by doing that and how can it be corrected? Any thoughts are welcome :smile:

Example of three days' worth of data, averaged into one:
[Figure not preserved: for each replicate (A1-A4 and B1-B4), three daily curves averaged into one curve]
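A minimal sketch of the concern raised above, using made-up numbers (the arrays `A` and `B` and the helper `pooled_t` are hypothetical, not data or code from the thread). It compares a Student's t-test on one averaged value per replicate (n = 4 per condition) with a test that wrongly treats all 12 day-level readings per condition as independent:

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic (equal variances) and its degrees of freedom."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical heights: 4 replicates (rows) x 3 days (columns) per condition.
A = [[20.1, 20.4, 19.8], [21.0, 21.3, 20.7], [19.5, 19.9, 19.6], [20.6, 20.2, 20.5]]
B = [[22.0, 22.3, 21.8], [21.2, 21.6, 21.4], [22.9, 22.5, 22.7], [21.9, 22.2, 21.7]]

# Unit of analysis = replicate: one averaged value per village, n = 4 per condition.
a_means = [mean(row) for row in A]
b_means = [mean(row) for row in B]
t_rep, df_rep = pooled_t(a_means, b_means)

# Pseudoreplication: every day-level reading counted as an independent observation.
a_flat = [x for row in A for x in row]
b_flat = [x for row in B for x in row]
t_flat, df_flat = pooled_t(a_flat, b_flat)

print(f"replicate means: t = {t_rep:.2f}, df = {df_rep}")
print(f"all readings:    t = {t_flat:.2f}, df = {df_flat}")
```

With these illustrative numbers the second test reports a larger |t| on 22 degrees of freedom instead of 6, which is the inflation the question worries about. Averaging within each replicate first and testing the 4 averages keeps the degrees of freedom tied to the number of independent units.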
 
  • #2
I think it's OK, but instead of averaging, couldn't you use ANOVA?
 
  • #3
What is the null hypothesis of the test? Is it that the means of variables A and B are equal?

Edit:
If yes, I think that averaging is OK. Also, I think you can use Hotelling's T² test for the equality of the means of random vectors.
 
  • #4
Hey Monique.

If you are trying to compare means for multiple groups (which is what it sounds like), I would recommend an ANOVA: this is what the technique was designed for.

Also, which assumptions about your data do you know to hold, and which are you unsure about?
 
  • #5
I would summarize it a bit...

1) If one needs to test the hypothesis that several random variables (two or more) with (nearly) normal distributions have the same mean value, one uses ANOVA (alternatively Welch's t-test or other tests, depending on the assumptions).
2) If one needs to test the hypothesis that two random vectors with multivariate normal distributions have the same mean vector, Hotelling's T² test suits well.

But it depends on what the test is about, which is not clear from the OP.
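As a small illustration of point 1 for the two-condition case: with exactly two groups, one-way ANOVA and the equal-variance (Student's) t-test are the same test, since F = t². The sketch below checks this numerically on made-up per-replicate averages; the helper names `one_way_anova_f` and `student_t` and the data are assumptions for illustration only.

```python
from math import isclose, sqrt
from statistics import mean, variance


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA plus (between, within) degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, (k - 1, n - k)


def student_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))


# Hypothetical per-replicate averages for conditions A and B.
a = [20.1, 21.0, 19.7, 20.4]
b = [22.0, 21.4, 22.7, 21.9]

f_stat, dfs = one_way_anova_f([a, b])
t_stat = student_t(a, b)

print(f"F = {f_stat:.4f} on {dfs} df, t^2 = {t_stat ** 2:.4f}")
print("F equals t^2:", isclose(f_stat, t_stat ** 2))
```

So for two conditions, choosing ANOVA over a t-test does not change the conclusion; ANOVA only adds something once there are three or more groups.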
 
  • #6
I hadn't thanked you for the replies yet, but I did take the comments into account in my evaluation, so: thanks! By formulating the question I had already come up with the answer without immediately realizing it.

I didn't use ANOVA, since the data needs to be plotted in a graph and there are only 2 conditions. For each time point I averaged the repeated measurements of each replicate, calculated the area under the curve over the time frame for each independent replicate (1-4), and used a t-test to compare the two experimental populations (A, B).

I hadn't heard of Hotelling's T² before, so I'll educate myself some more on that.
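The procedure described in this post can be sketched roughly as follows. Everything here is an assumption for illustration: the time points, the curves, and the helper names `trapezoid_auc` and `welch_t` are made up, and Welch's variant of the t-test is used only as one reasonable choice, since the post does not say which t-test was applied.

```python
from math import sqrt
from statistics import mean, variance


def trapezoid_auc(times, values):
    """Area under a sampled curve by the trapezoidal rule."""
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    qa, qb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(qa + qb)
    df = (qa + qb) ** 2 / (qa ** 2 / (len(a) - 1) + qb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical time points (hours) and one day-averaged curve per replicate.
times = [6, 9, 12, 15, 18]
curves_a = [[0, 10, 20, 10, 0], [0, 11, 21, 12, 0], [0, 9, 19, 10, 0], [0, 10, 22, 11, 0]]
curves_b = [[0, 25, 40, 26, 0], [0, 24, 42, 25, 0], [0, 27, 41, 27, 0], [0, 26, 39, 24, 0]]

# One AUC per independent replicate, so each condition contributes n = 4 values.
auc_a = [trapezoid_auc(times, c) for c in curves_a]
auc_b = [trapezoid_auc(times, c) for c in curves_b]

t_stat, df = welch_t(auc_a, auc_b)
print(f"AUC A: {auc_a}")
print(f"AUC B: {auc_b}")
print(f"Welch t = {t_stat:.2f}, df = {df:.1f}")
```

Reducing each replicate to a single summary number (here the area under the curve) before testing keeps the sample size equal to the number of independent replicates rather than the number of recorded points.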
 

1. What is the purpose of averaging before calculating statistics?

The purpose of averaging before calculating statistics is to obtain a single representative value that summarizes the data set. This can help to simplify the data and make it easier to interpret and compare with other data sets.

2. When should I use averaging before calculating statistics?

Averaging before calculating statistics is most useful when each independent unit (a replicate, subject, or site) has been measured several times and those repeated measurements are not independent of each other. Averaging them gives one value per independent unit, which can then be compared across conditions.

3. How do I average before calculating statistics?

To average before calculating statistics, simply add all of the values in the data set together and divide by the total number of values. This will give you the mean, or average, of the data set. Alternatively, you can use statistical software or a calculator to calculate the average.

4. What are the potential drawbacks of averaging before calculating statistics?

One potential drawback of averaging before calculating statistics is that it can obscure important information or outliers in the data set. It can also oversimplify the data and not accurately represent the full range of values. Additionally, if the data set contains extreme values, such as very high or very low numbers, the average may not be a good representation of the data.

5. Are there any alternatives to averaging before calculating statistics?

Yes, there are alternative methods for summarizing data, such as using the median or mode instead of the mean. These methods may be more appropriate if the data set contains extreme values or is not normally distributed. It is important to consider the nature of the data and the purpose of the analysis when deciding which method to use.
