How to deal with averaging before calculating statistics

Monique · Apr 12, 2012

I have the following data set: condition A and condition B, with 4 replicates recorded over 3 time periods.

Hypothetically you can think of it as measuring the height of the sun in the sky in winter (A) compared to the summer (B), in 4 nearby villages (independent observations) over 3 days (the assumption is that the height is stable over the consecutive days).

Since I want to know the true behavior of each replicate, I average the data of the three time periods (to take out any irrelevant fluctuations). This is the key, but it is also a problem (I think).

Now I calculate the average and the SD of condition A and condition B (based on the 3-day average of each replicate) and want to do a statistical test on the height at t=12:00.

My question: what is the best way to deal with the data? If I perform a t-test on the averages, each condition has 4 independent replicates, but underneath each of those there are in fact 3 dependent measurements.

Am I over-inflating the significance by doing that, and how can it be corrected? Any thoughts are welcome.

Example of three days' worth of data, averaged into one:
A1 ~~~ = ~
A2 ~~~ = ~
A3 ~~~ = ~
A4 ~~~ = ~

B1 ~~~ = ~
B2 ~~~ = ~
B3 ~~~ = ~
B4 ~~~ = ~
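
To make the setup concrete, here is a minimal sketch of the procedure described above: the 3 daily readings of each replicate are collapsed into one mean, and the pooled two-sample t statistic is computed on the 4 means per condition. All numbers are invented for illustration, and `replicate_means` and `pooled_t` are hypothetical helper names. Note that the degrees of freedom come from the 4 + 4 replicate means (df = 6), not from the 24 raw readings.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical readings, invented for illustration only:
# data[condition] = 4 replicates (villages), each with 3 daily values.
data = {
    "A": [[20.1, 19.8, 20.3], [21.0, 20.7, 21.2],
          [19.5, 19.9, 19.6], [20.4, 20.6, 20.2]],
    "B": [[60.2, 59.9, 60.4], [61.1, 60.8, 61.3],
          [59.4, 59.8, 59.7], [60.5, 60.3, 60.1]],
}


def replicate_means(replicates):
    """Collapse the dependent daily readings into one value per replicate."""
    return [mean(days) for days in replicates]


def pooled_t(x, y):
    """Two-sample t statistic with pooled variance, plus its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, df


a = replicate_means(data["A"])
b = replicate_means(data["B"])
t, df = pooled_t(a, b)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

If SciPy is available, `scipy.stats.ttest_ind(a, b)` computes the same pooled statistic and also returns a p-value.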

DrDu · Apr 12, 2012

I think it's OK, but instead of averaging, couldn't you use ANOVA?

camillio · Apr 12, 2012

What is the null hypothesis of the test? Is it that the means of variables A and B are equal?

Edit:
If yes, I think that averaging is OK. Also, I think you can use Hotelling's T2 test for equality of the means of random vectors.

chiro · Apr 12, 2012

Hey Monique.

If you are trying to compare means for multiple groups (which is what it sounds like), then I would recommend using an ANOVA: this is what the technique was designed for.

Also, are there any assumptions for your data that you either know or don't know?

camillio · Apr 13, 2012

I would summarize it a bit...

1) If one needs to test the hypothesis that several random variables (2 or more) with (nearly) normal distributions have the same mean value, one uses ANOVA (alternatively Welch's t-test or other tests, depending on the assumptions).
2) If one needs to test the hypothesis that two random vectors with multivariate normal distributions have the same mean value, Hotelling's T2 suits well.

But it depends on what the test is about, which is not clear from the OP.
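
As a rough illustration of option 1, the sketch below computes a one-way ANOVA F statistic and Welch's t statistic by hand. The per-replicate values are invented, and `one_way_anova_f` and `welch_t` are hypothetical helper names. With only two groups of equal size, F equals the square of the t statistic, so ANOVA and the t-test lead to the same conclusion in that case.

```python
from math import sqrt
from statistics import mean, variance


def one_way_anova_f(*groups):
    """One-way ANOVA: return F and its (between, within) degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def welch_t(x, y):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical per-replicate values (one number per independent replicate).
group_a = [20.07, 20.97, 19.67, 20.40]
group_b = [60.17, 61.07, 59.63, 60.30]

f_stat, df_b, df_w = one_way_anova_f(group_a, group_b)
t_stat, t_df = welch_t(group_a, group_b)
print(f"ANOVA: F = {f_stat:.1f} on ({df_b}, {df_w}) df")
print(f"Welch: t = {t_stat:.2f} on {t_df:.2f} df, t^2 = {t_stat ** 2:.1f}")
```

Hotelling's T2 (option 2) is not shown here; it needs the sample covariance matrices of the vectors and is easier to do with a linear algebra library.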

Monique · May 29, 2012

I hadn't thanked you yet for the replies, but I did take the comments into account in my evaluation, so: thanks! By formulating the question I had already come up with the answer without immediately realizing it.

I didn't use ANOVA, since the data needs to be plotted in a graph and there are only 2 conditions. For each time point I averaged the replicate measurements (~~~), calculated the area under the curve over the time frame for each independent measurement (1-4), and used a t-test to compare the two experimental populations (A, B).

I hadn't heard of Hotelling's T2 before, so I'll educate myself some more on that.
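
For readers who want to try the same approach, here is a minimal sketch of it under assumed data: each replicate's 3 daily curves are averaged point by point, the area under the averaged curve is taken with the trapezoid rule, and the 4 areas per condition are compared with a pooled t statistic. The time points, the curve shapes and the helper names (`make_replicate`, `day_averaged_curve`, `trapezoid_auc`) are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance

times = [10.0, 12.0, 14.0]  # hypothetical time points (hours)


def make_replicate(peak, day_offsets=(-0.3, 0.0, 0.3)):
    """Invented data: one curve per day, shifted by a small daily offset."""
    return [[peak - 5 + d, peak + d, peak - 4 + d] for d in day_offsets]


curves = {
    "A": [make_replicate(p) for p in (20.0, 21.0, 19.5, 20.5)],
    "B": [make_replicate(p) for p in (60.0, 61.0, 59.5, 60.5)],
}


def day_averaged_curve(daily_curves):
    """Average the daily curves of one replicate at each time point."""
    return [mean(points) for points in zip(*daily_curves)]


def trapezoid_auc(t, v):
    """Area under the curve by the trapezoid rule."""
    return sum((t1 - t0) * (v0 + v1) / 2
               for t0, t1, v0, v1 in zip(t, t[1:], v, v[1:]))


# One area per independent replicate: 4 values per condition.
auc = {cond: [trapezoid_auc(times, day_averaged_curve(rep)) for rep in reps]
       for cond, reps in curves.items()}

# Pooled two-sample t statistic on the 4 + 4 areas (df = 6).
x, y = auc["A"], auc["B"]
df = len(x) + len(y) - 2
pooled_var = ((len(x) - 1) * variance(x) + (len(y) - 1) * variance(y)) / df
t_stat = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / len(x) + 1 / len(y)))
print(f"AUC A: {x}\nAUC B: {y}\nt = {t_stat:.2f} on {df} df")
```

Because each replicate contributes a single area, the test still rests on 4 independent values per condition.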
