Two-sample t-test in case of non-normal distribution

  • Context: Graduate 
  • Thread starter: uzi kiko
  • Tags: Distribution, Test

Discussion Overview

The discussion revolves around the analysis of experimental data on the change in impedance of a coil as an environmental parameter is varied. Participants explore statistical methods for comparing groups of samples, particularly when the data are not normally distributed.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant describes an experiment measuring impedance changes with ~20 samples at each of several settings of an environmental parameter, and asks how to test for significant differences with a t-test even though the data are not normally distributed.
  • Another participant questions whether a t-test is appropriate, suggesting that a mathematical model should guide the analysis of the data.
  • A participant asks whether the environmental parameter is numerical or categorical; the original poster clarifies that it is numerical, specifically the volume ratio between blood and brain tissue.
  • Some participants propose regression analysis instead of a t-test to understand the relationship between impedance and the blood/brain ratio.
  • One participant points out that the t-test approach depends heavily on sample size, which could lead to misleading conclusions about device sensitivity.
  • Another participant suggests acquiring a large number of samples to establish a confidence interval for the baseline condition, then using regression to determine a threshold for detecting the presence of blood.
  • A later reply proposes a test of difference of proportions as a non-parametric alternative that may suit the data.

Areas of Agreement / Disagreement

Participants express differing views on the appropriate statistical methods, with some advocating regression analysis while others consider the t-test or non-parametric alternatives. No consensus is reached on a single method of analysis.

Contextual Notes

Participants note the limitations of their approaches, including the dependence on sample size and the need for a clear mathematical model to guide analysis. The discussion reflects uncertainty regarding the best statistical method to apply given the non-normal distribution of the data.

Who May Find This Useful

Researchers and practitioners involved in experimental design and data analysis, particularly in fields related to physics, engineering, and biomedical applications.

uzi kiko
Hi all,

I ran an experiment in which I measured the change in the impedance of a coil while changing an environmental parameter X times.
For each setting I collected ~20 samples.
So I have a table with X rows, one per parameter setting, and 20 columns holding the repeated samples.
Now I would like to compare each row to the others and find out whether there is a significant difference between them.
Naturally I thought of a t-test. Unfortunately the measurements are not normally distributed, although the distribution is symmetric around the mean. (I ran a two-sample Kolmogorov-Smirnov test, and it rejected the hypothesis that my samples come from a normal distribution.) You can find a figure of the distribution here:
https://drive.google.com/open?id=1yp_Ufa4-N8kQD1twVCnHszL9RYLV-_3u
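For reference, the two-sample Kolmogorov-Smirnov statistic D is the largest vertical gap between the empirical CDFs of the two samples being compared. A minimal pure-Python sketch follows; the function name `ks_two_sample_statistic` is hypothetical (not from any library), and it computes only D, not a p-value.

```python
from bisect import bisect_right


def ks_two_sample_statistic(a, b):
    """Hypothetical helper: D = max |F_a(v) - F_b(v)| over all v.

    F_a and F_b are the empirical CDFs of samples a and b. Both are step
    functions that only change at observed values, so checking those suffices.
    """
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in a + b:
        fa = bisect_right(a, v) / len(a)  # fraction of a that is <= v
        fb = bisect_right(b, v) / len(b)  # fraction of b that is <= v
        d = max(d, abs(fa - fb))
    return d


if __name__ == "__main__":
    # Completely separated samples give the maximum possible gap.
    print(ks_two_sample_statistic([1, 2, 3], [4, 5, 6]))
```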

I know that with a larger number of samples I could average groups of samples until the averages are approximately normal, and then use a t-test, but I would like to avoid repeating the experiment.

Now (sorry about the long introduction...) my questions are:
1) Do you know of a transformation I can apply to my data so that I can use a t-test?
2) If there is no such transformation, which non-parametric test would you suggest?

Thanks a lot
Mosh
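For question 2, one common non-parametric choice for comparing two independent groups is the Mann-Whitney U test (also called the Wilcoxon rank-sum test). Below is a minimal pure-Python sketch, assuming the large-sample normal approximation with a tie correction and no continuity correction; the function names are hypothetical, and a tested library implementation would normally be preferable. Note that it tests whether values in one group tend to be larger than in the other, which is not exactly the same hypothesis as a t-test on means.

```python
import math


def average_ranks(values):
    """Hypothetical helper: 1-based ranks, tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Hypothetical helper: two-sided Mann-Whitney U test, normal approximation.

    Returns (U for sample x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))


if __name__ == "__main__":
    # Made-up illustrative numbers, not measurements.
    baseline = [10.1, 9.8, 10.3, 10.0, 9.9]
    shifted = [10.6, 10.9, 10.4, 10.8, 10.7]
    print(mann_whitney_u(baseline, shifted))
```

With ~20 samples per group the normal approximation is generally considered reasonable; for very small groups an exact test would be safer.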

uzi kiko said:
I ran an experiment in which I measured the change in the impedance of a coil while changing an environmental parameter X times.
For each setting I collected ~20 samples.
So I have a table with X rows, one per parameter setting, and 20 columns holding the repeated samples.
Now I would like to compare each row to the others and find out whether there is a significant difference between them.
This seems an odd way to analyze that sort of data. Presumably your intent in performing the experiment is to test some theory, and your theory includes a mathematical model that you can use to predict the relationship between your environment parameter and your measured impedance. If that is the case, then what parameters of the model do you want to estimate or verify using your data? If not, then why are you doing the experiment?
 
uzi kiko said:
I changed an environmental parameter X times.
For each setting I collected ~20 samples.
So I have a table with X rows, one per parameter setting, and 20 columns holding the repeated samples.
Is the environment parameter a numerical value or a categorical value?
 
Thanks a lot for your quick responses.

Regarding tnich's question:
I am now at the next stage of my research.
(You can find my paper here:
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0186381)

In the first stage we did build a mathematical model. Now I am at the stage of verifying and characterizing my experimental setup; specifically, I would like to understand the resolution of my experimental setup.

Regarding Dale's question:
My environmental parameter is a numerical value (the volume ratio between blood and brain tissue).
 
uzi kiko said:
My environmental parameter is a numerical value (the volume ratio between blood and brain tissue).
Then you want to use a regression approach, not a t-test.
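A minimal sketch of the regression approach, assuming a straight-line relationship between blood volume and impedance (if the model from the first stage is nonlinear, that model should be fitted instead). Every repeated sample enters as its own (volume, impedance) pair rather than being averaged per row. Pure Python; the function name `fit_line` is hypothetical.

```python
import math


def fit_line(xs, ys):
    """Hypothetical helper: ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, slope_se), where slope_se is the standard
    error of the slope from the residual variance with n - 2 degrees of freedom.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the slope's standard error")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    slope_se = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, slope_se


if __name__ == "__main__":
    # Made-up illustrative numbers: one (volume, impedance) pair per repeated sample.
    volumes = [0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 4.0]
    impedances = [1.00, 1.02, 0.98, 1.10, 1.12, 1.08, 1.21, 1.19, 1.20]
    print(fit_line(volumes, impedances))
```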
 
Thank you Dale,
I agree that if I want to understand the relation between the impedance and the blood/brain ratio, I should use regression.
But the most important parameter I have to identify is the minimum blood/brain ratio that my device can measure.

Let me explain how I was planning to use the t-test:
Let's say I have 6 measurements, each containing 20 samples.
The first measurement is my baseline, where there is only brain tissue without any blood.
Now I take the next measurement (let's say with 2 ml of blood). If there is a significant difference between the 2 groups, I can say that my device is sensitive enough to detect a change of 2 ml of blood.
But if there is no significant difference between the 2 groups, I will take the 4 ml sample and compare, and so on.

Do you think I can do this with regression?
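The sequential procedure described in this post can be sketched as follows. As a swapped-in non-parametric test, this uses a two-sample permutation test on the difference of means rather than the t-test the post proposes; the function names, the data layout, and the α = 0.05 level are hypothetical. As with any significance test, whether a given volume gets flagged depends on how many samples each group contains.

```python
import random


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Hypothetical helper: two-sided permutation test for a difference in means.

    Repeatedly shuffles the pooled data into groups of the original sizes and
    counts how often the shuffled |mean difference| is at least as large as
    the observed one. Returns (count + 1) / (n_perm + 1).
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(pa) / len(pa) - sum(pb) / len(pb))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            count += 1
    return (count + 1) / (n_perm + 1)


def first_detected_volume(baseline, groups, alpha=0.05):
    """Hypothetical helper: compare each group to the baseline by increasing volume.

    `groups` maps blood volume (ml) -> list of impedance samples. Returns the
    smallest volume whose samples differ significantly from the baseline,
    or None if none does.
    """
    for volume in sorted(groups):
        if permutation_p_value(baseline, groups[volume]) < alpha:
            return volume
    return None
```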
 
The problem is that this approach is strongly dependent on the number of samples. Suppose you use 20 samples and find a significant difference at 4 ml but not at 2 ml. Then, without changing your device you do the same experiment but with 200 samples. In that case you would be likely to find a significant difference at 2 ml also. Has your device become better? No. So this process doesn’t characterize the device.
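The dependence on sample size can be made explicit with the pooled two-sample t statistic for equal group sizes, where n is the number of samples per group and s is the pooled standard deviation (this assumes the equal-n, pooled-variance form of the test):

```latex
t = \frac{\bar{x}_1 - \bar{x}_0}{s\sqrt{2/n}}
  = \frac{\bar{x}_1 - \bar{x}_0}{s}\,\sqrt{\frac{n}{2}}
```

For the same mean difference and the same spread, going from 20 to 200 samples per group multiplies t by √10 ≈ 3.16, even though the device itself is unchanged.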
 
Here are my thoughts. First, you want to do the regression so that you get an idea about the relationship between mL of blood and impedance.

Then, you want to characterize the 0 mL blood condition very well. You should acquire as many samples as feasible, and calculate a 95% confidence interval.
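A sketch of this baseline step, assuming many 0 mL readings so that a normal approximation (z = 1.96) for the 95% confidence interval of the mean is reasonable. Because the readings in this thread are not normally distributed, the sketch also computes, for comparison, the empirical 2.5th to 97.5th percentile range of individual readings; unlike the confidence interval of the mean, that range does not narrow as more samples are added. Which one is appropriate depends on what the threshold is meant to bound. Function names are hypothetical.

```python
import math
import statistics


def mean_ci_95(samples):
    """Hypothetical helper: approximate 95% CI for the mean (normal approximation)."""
    n = len(samples)
    m = statistics.mean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(n)
    return m - half_width, m + half_width


def percentile(samples, q):
    """Hypothetical helper: empirical quantile (0 <= q <= 1), linear interpolation."""
    xs = sorted(samples)
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


if __name__ == "__main__":
    # Made-up illustrative baseline readings, not measurements.
    baseline = [1.00, 1.02, 0.98, 1.01, 0.99, 1.03, 0.97, 1.00]
    print(mean_ci_95(baseline))
    print(percentile(baseline, 0.025), percentile(baseline, 0.975))
```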

Using your regression you can convert that upper 95% limit into a blood volume in mL. That would likely be your best lower threshold. So use that volume (probably rounded up) as your candidate threshold, acquire a bunch of data at that volume as well, and maybe at one slightly larger volume too.
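Converting the upper baseline limit into a volume amounts to solving the fitted line for the volume. A sketch, assuming the linear fit impedance = intercept + slope × volume with a positive slope (if impedance decreases with blood volume, the lower limit would be the relevant one); the rounding step size is a hypothetical choice, as are the function names.

```python
import math


def limit_to_volume(limit, slope, intercept):
    """Hypothetical helper: solve limit = intercept + slope * volume for volume."""
    if slope == 0:
        raise ValueError("a zero slope cannot be inverted")
    return (limit - intercept) / slope


def round_up(volume, step):
    """Hypothetical helper: round a volume up to the next multiple of `step` (e.g. 0.5 ml)."""
    return math.ceil(volume / step) * step


if __name__ == "__main__":
    # Made-up illustrative values for the fit and the baseline limit.
    v = limit_to_volume(limit=1.04, slope=0.05, intercept=1.0)
    print(v, round_up(v, 0.5))
```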

Once you have that, you can do a ROC analysis to determine your best threshold for discriminating between blood and no blood as well as your sensitivity and specificity at that threshold.
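A minimal ROC sketch, assuming that larger readings indicate blood (flip the comparisons otherwise) and using Youden's J = sensitivity + specificity - 1 as one common, but not the only, criterion for the best threshold. It also reports the area under the ROC curve, computed as the probability that a random blood reading exceeds a random baseline reading, with ties counted as one half. Names and inputs are hypothetical.

```python
def roc_points(negatives, positives):
    """Hypothetical helper: (threshold, sensitivity, specificity) per candidate threshold.

    A reading >= threshold is classified as 'blood'. `negatives` are baseline
    (0 ml) readings; `positives` are readings taken with blood present.
    """
    points = []
    for t in sorted(set(negatives) | set(positives)):
        sens = sum(p >= t for p in positives) / len(positives)
        spec = sum(n < t for n in negatives) / len(negatives)
        points.append((t, sens, spec))
    return points


def best_threshold(negatives, positives):
    """Hypothetical helper: first threshold maximizing Youden's J = sens + spec - 1."""
    return max(roc_points(negatives, positives), key=lambda pt: pt[1] + pt[2] - 1)


def auc(negatives, positives):
    """Hypothetical helper: area under the ROC curve = P(positive > negative), ties = 1/2."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positives) * len(negatives))
```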
 
Thank you very much!
 
How about a test of difference of proportions (with the baseline being 0)? It requires simple random sampling, which I think you have, and independence, which I think you also have, and it is non-parametric. Wouldn't that work?
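The reply does not say which proportions would be compared. As a hypothetical illustration, suppose each reading is scored as above or below a chosen cutoff, and the fraction above the cutoff at a given blood volume is compared with the fraction at baseline (null hypothesis: a difference of 0). The sketch uses the standard pooled two-proportion z-test, which itself relies on a large-sample normal approximation, so with ~20 readings per group its p-value is only approximate. The function name is hypothetical.

```python
import math


def two_proportion_z_test(successes_1, n_1, successes_2, n_2):
    """Hypothetical helper: pooled two-proportion z-test of H0: p1 == p2 (two-sided).

    Returns (z, p_value). Uses the normal approximation, so it is only
    reliable when each group has enough successes and failures.
    """
    p1 = successes_1 / n_1
    p2 = successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    if se == 0:
        return 0.0, 1.0  # both groups all successes or all failures: no evidence of a difference
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Made-up counts: readings above the cutoff out of 20, with blood vs. baseline.
    print(two_proportion_z_test(14, 20, 5, 20))
```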
 
