Comparing two datasets: methods and statistics

FrankDrebon · Jul 27, 2010

Hi all,

Just looking for some opinions on how to approach reporting the variability in some data I have acquired. I know Rutherford is quoted as saying “if your experiment needs statistics, you ought to have done a better experiment”, but unfortunately in biophysics we’re always at the mercy of the variability of living “things”!

Basically, I have a set of measurements ‘A’ and a set of measurements ‘B’. Both measure the same property of a system, but the state of the system is slightly different in the two sets. What I want to calculate is the change in this property between the two states of the system.

I could do this by taking the mean of set B and the mean of set A and dividing one by the other. However, this gives me the “change in the averages”. What I (think I) want is the “average of the changes”, so I divide each B value by each A value and take the average of those comparisons.

As an example, suppose I measured this property over and over with the system in state A and got the results 7, 8, 8, 7, 5. Then I measured it in state B and got 10, 7, 9, 9, 8. The average value in state A is 7.0, the average value in state B is 8.6. State B obviously has a larger "property" than state A.

To calculate the average change between the sets, I’d divide each result in ‘B’ by each result in ‘A’ (25 comparisons) and take the average, in this case 1.265. Simply dividing 8.6 by 7 gives 1.229.
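For anyone who wants to check the two figures, here is a minimal sketch in plain Python. The variable names are just illustrative; the data are the ten values quoted above.

```python
from statistics import mean

# Measurements quoted in the post
a = [7, 8, 8, 7, 5]
b = [10, 7, 9, 9, 8]

# "Change in the averages": one ratio of the two sample means
ratio_of_means = mean(b) / mean(a)

# "Average of the changes": every B value divided by every A value (5 x 5 = 25 ratios)
all_pair_ratios = [y / x for y in b for x in a]
mean_of_ratios = mean(all_pair_ratios)

print(f"ratio of means: {ratio_of_means:.3f}")  # 1.229
print(f"mean of ratios: {mean_of_ratios:.3f}")  # 1.265
```

The mean of the ratios comes out larger because small A values in the denominator pull individual ratios up more than large A values pull them down.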

I then wish to provide a standard error to calculate confidence intervals. However, this requires the use of the sample size N. There are 10 measurements in two lots of 5 samples, and 25 comparisons. I can't decide which to use as the sample size! Thoughts?

I appreciate that there is probably no definite answer here, but your opinions would be appreciated. Also, if you think my “averages of the changes” method is stupid then please say so. Perhaps I could just calculate the “change in the averages” and calculate an error based on the standard deviations of the individual data sets? The two sides of my brain have been arguing which is the best way to analyse this data for weeks, and they can’t come to a conclusion...!
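One possible reading of the "error based on the standard deviations of the individual data sets" idea is ordinary first-order error propagation for a ratio. The sketch below assumes the two sets are independent and that the relative errors are small; it is an approximation, not the only way to attach an uncertainty to the ratio.

```python
from math import sqrt
from statistics import mean, stdev

a = [7, 8, 8, 7, 5]
b = [10, 7, 9, 9, 8]

def standard_error(xs):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(xs) / sqrt(len(xs))

ratio = mean(b) / mean(a)

# Relative standard errors of the two means (n = 5 for each set)
rel_a = standard_error(a) / mean(a)
rel_b = standard_error(b) / mean(b)

# First-order propagation: relative errors of a quotient add in quadrature
se_ratio = ratio * sqrt(rel_a**2 + rel_b**2)

print(f"ratio = {ratio:.3f} +/- {se_ratio:.3f}")
```

Here the sample size that enters is the size of each set (5 and 5), not 10 or 25.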

EnumaElish · Aug 1, 2010

Have you considered using: http://en.wikipedia.org/wiki/Student's_t-test#Unequal_sample_sizes.2C_unequal_variance ?
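As a rough sketch of what the linked test computes, the unequal-variance (Welch) t statistic and its approximate degrees of freedom can be worked out by hand for the example data. The function name `welch_t` is made up for this illustration.

```python
from math import sqrt
from statistics import mean, variance

a = [7, 8, 8, 7, 5]
b = [10, 7, 9, 9, 8]

def welch_t(x, y):
    """Return (t, df) for the difference mean(y) - mean(x), unequal variances."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(y) - mean(x)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (vx + vy) ** 2 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))
    return t, df

t_stat, df = welch_t(a, b)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

Turning t and df into a p-value needs the t distribution; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and reports the p-value directly.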

adriank · Aug 1, 2010

There is no justification in comparing individual values in A to corresponding values in B if there is no particular connection between them; for instance, if the order of the values in A and B doesn't matter.

That is to say: Is there some connection between, say, the first value in A and the first value in B, that does not exist between the first value in A and the second value in B? If not, you need to treat all pairs (value in A, value in B) equally.
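A small check of what treating all pairs equally implies for the example numbers in the first post: the average over all 25 ratios B/A is exactly mean(B) times mean(1/A), so it depends only on the 10 original measurements. Each measurement is reused five times, which means the 25 ratios are not independent observations.

```python
from statistics import mean

a = [7, 8, 8, 7, 5]
b = [10, 7, 9, 9, 8]

# Average over every (A value, B value) pair
all_pairs = mean([y / x for y in b for x in a])

# The same quantity, factored into two one-sample averages
factored = mean(b) * mean([1 / x for x in a])

print(all_pairs, factored)
```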

EnumaElish · Aug 1, 2010

Good point, the average ratio is similar to a matched-pairs test, which may or may not be justified. On a related note, is there any reason to prefer the ratio A/B to the difference A-B (either pairwise, or between the two averages)? For example, if you believe that each of A and B is Lognormal, then you could test Log(A/B) = Log A - Log B, which would be Normal, for being equal to zero. Is that the case with your data?
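A minimal sketch of the log approach using the example numbers from the first post, under the assumption that both sets are roughly lognormal. It uses log B - log A to match the B/A direction used earlier in the thread; the sign simply flips for A/B. The unequal-variance t statistic is applied to the logged values.

```python
from math import exp, log, sqrt
from statistics import mean, variance

a = [7, 8, 8, 7, 5]
b = [10, 7, 9, 9, 8]

log_a = [log(x) for x in a]
log_b = [log(x) for x in b]

# Difference of mean logs; zero would mean no change between states
diff = mean(log_b) - mean(log_a)

# Standard error of that difference (independent samples, unequal variances)
se = sqrt(variance(log_a) / len(log_a) + variance(log_b) / len(log_b))
t_stat = diff / se

# Back on the original scale this is the ratio of geometric means
geo_ratio = exp(diff)

print(f"log difference = {diff:.3f}, t = {t_stat:.3f}, ratio = {geo_ratio:.3f}")
```

A confidence interval built as diff ± (critical t) × se on the log scale can be exponentiated to give an asymmetric interval for the ratio itself.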

blue_raver22 · Aug 8, 2010

I understand the challenges of dealing with variability in data, especially when working with living systems. It is important to approach the analysis of your data with a clear and objective mindset. In this case, it seems like you are trying to compare two datasets (A and B) and determine the change in a specific property between the two states of the system.

Your approach of calculating the average of the changes between the two sets (dividing each value in B by each value in A and taking the average) seems reasonable. However, it is important to also consider the standard error and confidence intervals in your analysis. Be careful with the sample size here: the 25 comparisons are built from only 10 measurements, so they are not independent, and using 25 as N would understate the standard error. The sizes of the two sets (5 and 5) are the safer basis.

Additionally, it may be helpful to also calculate the standard deviation of each dataset and use that in your analysis. This can provide a measure of the variability within each dataset and can help to determine if the observed change in the property is statistically significant.

Ultimately, the best approach will depend on the specific goals of your study and the nature of your data. It may be helpful to consult with a statistician or seek out additional literature on similar studies to see how they have approached similar analyses. Overall, it is important to carefully consider the methods and statistics used in your analysis to ensure that your results are accurate and meaningful.
