Comparing two datasets: methods and statistics

  • Context: Graduate 
  • Thread starter: FrankDrebon
  • Tags: Statistics

Discussion Overview

The discussion revolves around methods for reporting variability in two datasets measuring the same property of a system under slightly different conditions. Participants explore statistical approaches to analyze the change between these datasets, focusing on the calculation of averages and standard errors.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant proposes calculating the average change between two sets of measurements by dividing each value in set B by each value in set A (all 25 pairings) and averaging those ratios.
  • Another participant suggests using Student's t-test for unequal sample sizes and variances as a potential method for analysis.
  • A different participant questions the validity of comparing individual values in sets A and B without a specific connection between corresponding values, emphasizing the need for a justified pairing.
  • One participant notes that the average ratio resembles a matched-pairs test and raises the question of whether to prefer the ratio A/B over the difference A-B, suggesting a logarithmic approach if the data follows a Lognormal distribution.

Areas of Agreement / Disagreement

Participants express differing views on the appropriate methods for analyzing the datasets, with no consensus reached on the best approach. The discussion remains unresolved regarding the justification for comparing individual values and the preferred statistical method.

Contextual Notes

Participants highlight potential limitations in their approaches, including the need for justification in pairing values and the implications of using different statistical methods. The discussion reflects uncertainty regarding the appropriate sample size for calculating standard errors.

FrankDrebon
Hi all,

Just looking for some opinions on how to approach reporting the variability in some data I have acquired. I know Rutherford is quoted as saying “if your experiment needs statistics, you ought to have done a better experiment”, but unfortunately in biophysics we’re always at the mercy of the variability of living “things”!

Basically, I have a set of measurements ‘A’ and a set of measurements ‘B’. Both measure the same property of a system, but the state of the system is slightly different in the two sets. What I want to calculate is the change in this property between the two states of the system.

I could do this by taking the mean of set A and the mean of set B and dividing one by the other. However, this gives me the “change in the averages”. What I (think I) want is the “average of the changes”, so I divide each B value by each A value and take the average of those ratios.

As an example, suppose I measured this property over and over with the system in state A and got the results 7, 8, 8, 7, 5. Then I measured it in state B and got 10, 7, 9, 9, 8. The average value in state A is 7.0, the average value in state B is 8.6. State B obviously has a larger "property" than state A.

To calculate the average change between the sets, I’d divide each result in ‘B’ by each result in ‘A’ (25 comparisons) and take the average, in this case 1.265. Simply dividing 8.6 by 7 gives 1.229.
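For anyone wanting to check these two numbers, here is a minimal Python sketch using only the example values from the post. The variable names are just for illustration.

```python
from statistics import mean

# Example measurements from the post
a = [7, 8, 8, 7, 5]      # state A
b = [10, 7, 9, 9, 8]     # state B

# "Change in the averages": ratio of the two sample means
ratio_of_means = mean(b) / mean(a)

# "Average of the changes": every B value divided by every A value
all_pair_ratios = [y / x for y in b for x in a]
mean_of_ratios = mean(all_pair_ratios)

print(round(ratio_of_means, 3))                          # 1.229
print(len(all_pair_ratios), round(mean_of_ratios, 3))    # 25 1.265
```

Note that the all-pairs average factors exactly into mean(B) × mean(1/A), so it is built from the same 10 measurements as the ratio of means. The 25 ratios are not independent of one another, because each measurement is reused five times.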

I then wish to provide a standard error to calculate confidence intervals. However, this requires the use of the sample size N. There are 10 measurements in two lots of 5 samples, and 25 comparisons. I can't decide which to use as the sample size! Thoughts?

I appreciate that there is probably no definite answer here, but your opinions would be appreciated. Also, if you think my “averages of the changes” method is stupid then please say so. Perhaps I could just calculate the “change in the averages” and calculate an error based on the standard deviations of the individual data sets? The two sides of my brain have been arguing which is the best way to analyse this data for weeks, and they can’t come to a conclusion...!
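One way the “change in the averages” option could be sketched is with first-order error propagation for a ratio, SE(R) ≈ R·sqrt((SE_A/mean_A)² + (SE_B/mean_B)²). This assumes the two sets are independent and the relative errors are small. It is offered as an illustration of that option, not as the agreed answer to the question.

```python
from math import sqrt
from statistics import mean, stdev

a = [7, 8, 8, 7, 5]      # state A
b = [10, 7, 9, 9, 8]     # state B

# Standard error of each mean, using that set's own sample size
se_a = stdev(a) / sqrt(len(a))
se_b = stdev(b) / sqrt(len(b))

ratio = mean(b) / mean(a)

# First-order propagation for a ratio of two independent means:
# relative errors add in quadrature (an approximation, assumed here)
se_ratio = ratio * sqrt((se_a / mean(a)) ** 2 + (se_b / mean(b)) ** 2)

print(round(ratio, 3), round(se_ratio, 3))   # 1.229 0.121
```

With this approach the sample sizes enter separately (5 and 5) through each set's standard error, so the question of whether N is 10 or 25 does not arise.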
 
There is no justification in comparing individual values in A to corresponding values in B if there is no particular connection between them; for instance, if the order of the values in A and B doesn't matter.

That is to say: Is there some connection between, say, the first value in A and the first value in B, that does not exist between the first value in A and the second value in B? If not, you need to treat all pairs (value in A, value in B) equally.
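To illustrate the point about ordering, here is a small Python sketch with the example data from the first post. The helper names are hypothetical. If B is merely listed in a different order, the index-by-index average ratio changes, while the all-pairs average does not.

```python
from statistics import mean

a = [7, 8, 8, 7, 5]      # state A
b = [10, 7, 9, 9, 8]     # state B

def paired_mean_ratio(a_vals, b_vals):
    """Average of B[i]/A[i]: only meaningful if index i links the two values."""
    return mean([y / x for x, y in zip(a_vals, b_vals)])

def all_pairs_mean_ratio(a_vals, b_vals):
    """Average of B[j]/A[i] over every combination: treats all pairs equally."""
    return mean([y / x for y in b_vals for x in a_vals])

b_reordered = sorted(b)  # same measurements, different listing order

print(round(paired_mean_ratio(a, b), 3))               # 1.263
print(round(paired_mean_ratio(a, b_reordered), 3))     # 1.282
print(round(all_pairs_mean_ratio(a, b), 3))            # 1.265
print(round(all_pairs_mean_ratio(a, b_reordered), 3))  # 1.265
```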
 
Good point, the average ratio is similar to a matched-pairs test, which may or may not be justified. On a related note, is there any reason to prefer the ratio A/B to the difference A-B (either pairwise, or between the two averages)? For example, if you believe that each of A and B is Lognormal, then Log(A/B) = Log A - Log B would be Normal, and you could test whether its mean is equal to zero. Is that the case with your data?
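A rough sketch of the log approach in Python, treating the two sets as independent (unpaired) samples and assuming, as suggested, that each is roughly lognormal. It computes a Welch-style t statistic on the logs. The direction B/A is used here to match the original post, so the sign is flipped relative to Log A - Log B.

```python
from math import exp, log, sqrt
from statistics import mean, variance

a = [7, 8, 8, 7, 5]      # state A
b = [10, 7, 9, 9, 8]     # state B

log_a = [log(x) for x in a]
log_b = [log(x) for x in b]

# Difference of mean logs = log of the ratio of geometric means
diff = mean(log_b) - mean(log_a)

# Squared standard error of each mean log (sample variance / n)
var_a = variance(log_a) / len(a)
var_b = variance(log_b) / len(b)
se = sqrt(var_a + var_b)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat = diff / se
df = (var_a + var_b) ** 2 / (
    var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
)

geo_ratio = exp(diff)    # back-transformed estimate of B/A
print(geo_ratio, t_stat, df)
```

A p-value or confidence interval would then come from the t distribution with `df` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(log_b, log_a, equal_var=False)` runs the same unequal-variance test.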
 