A two-sample t-test in the case of a non-normal distribution

  • Thread starter: uzi kiko
  • Tags: Distribution, Test
AI Thread Summary
The discussion centers on analyzing impedance changes in a coil due to variations in a numerical environmental parameter, with the goal of determining significant differences between sample groups. The initial consideration was a two-sample t-test, but the data distribution is non-normal, prompting inquiries about potential transformations or non-parametric alternatives. Participants suggest using regression analysis to understand the relationship between impedance and the environmental parameter, emphasizing the importance of accurately characterizing the baseline measurement. Additionally, a test of difference of proportions is proposed as a non-parametric option for analysis. The conversation highlights the need for careful statistical methods to validate the sensitivity of the measurement device.
uzi kiko
Hi All

I ran an experiment in which I measured the change in the impedance of a coil when I changed an environment parameter X times.
For each change I collected ~20 samples.
So I have a table with X rows, one for each value of the parameter, and 20 columns that hold the repeated samples.
Now I would like to compare each row to the others and find out whether there is a significant difference between them.
Naturally I was thinking about a t-test. Unfortunately the measurement distribution is not normal, although it is symmetric around the mean. (I ran a two-sample Kolmogorov-Smirnov test, and it rejected the hypothesis that my sample came from a normal distribution.) You can find a figure of the distribution here:
https://drive.google.com/open?id=1yp_Ufa4-N8kQD1twVCnHszL9RYLV-_3u

I know that with a larger number of samples I could average within the sample groups until the averages approach a normal distribution and then use a t-test, but I would like to avoid repeating the experiment.

Now (sorry about the long introduction...) my questions are:
1) Do you know of a transformation that I can apply to my data so that I will be able to use a t-test?
2) If there is no such transformation, which non-parametric test would you suggest I use?

Thanks a lot
Mosh

uzi kiko said:
I ran an experiment in which I measured the change in the impedance of a coil when I changed an environment parameter X times.
For each change I collected ~20 samples.
So I have a table with X rows, one for each value of the parameter, and 20 columns that hold the repeated samples.
Now I would like to compare each row to the others and find out whether there is a significant difference between them.
This seems an odd way to analyze that sort of data. Presumably your intent in performing the experiment is to test some theory, and your theory includes a mathematical model that you can use to predict the relationship between your environment parameter and your measured impedance. If that is the case, then what parameters of the model do you want to estimate or verify using your data? If not, then why are you doing the experiment?
 
uzi kiko said:
I changed an environment parameter X times.
For each change I collected ~20 samples.
So I have a table with X rows, one for each value of the parameter, and 20 columns that hold the repeated samples.
Is the environment parameter a numerical value or a categorical value?
 
Thanks a lot for your quick responses.

Regarding tnich's question:
I am now at the next stage of my research.
(You can find my paper here:
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0186381)

In the first stage we did indeed build a mathematical model. Now I am at the stage of verifying and characterizing my experimental setup, and specifically I would like to understand its resolution.

Regarding Dale's question:
My environment parameter is a numerical value (the volume ratio between blood and brain tissue).
 
uzi kiko said:
My environment parameter is a numerical value (the volume ratio between blood and brain tissue).
Then you want to use a regression approach, not a t-test.
 
Thank you Dale,
I agree that if I want to understand the relation between the impedance and the blood/brain ratio I should use regression.
But the most important parameter that I have to identify is the minimum blood/brain ratio that my device can measure.

I would like to explain how I was thinking of using the t-test:
Let's say I have 6 measurements, each containing 20 samples.
The first measurement is my baseline, where there is only brain tissue without any blood.
Now I take the next measurement (let's say with 2 ml of blood). If there is a significant difference between the 2 groups, I can say that my device is sensitive enough to detect a change of 2 ml of blood.
But if there is no significant difference between the 2 groups, I will take the 4 ml blood sample and compare, and so on.

Do you think that I can do this with regression?
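
For illustration, the baseline-versus-volume comparison described in this post could be sketched as follows. Because the data are reported as non-normal, the sketch swaps the t-test for a Mann-Whitney U test, which is one common non-parametric choice and not something the thread itself settled on. The function `mann_whitney_u`, the uniform noise model, the baseline level of 100, and the 0.1 units-per-ml shift are all hypothetical. The p-value uses the normal approximation without tie or continuity correction, so it is only approximate at 20 samples per group.

```python
import random
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    No tie or continuity correction is applied, so the p-value is approximate.
    """
    n1, n2 = len(x), len(y)
    # U counts the pairs where a value of x exceeds a value of y; ties count one half.
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p_value


# Hypothetical data: symmetric, non-normal (uniform) noise around a level that
# rises by 0.1 units per ml of blood. None of these numbers come from the thread.
rng = random.Random(0)
baseline = [100 + rng.uniform(-1, 1) for _ in range(20)]

for volume_ml in (2, 4, 6, 8, 10):
    group = [100 + 0.1 * volume_ml + rng.uniform(-1, 1) for _ in range(20)]
    _, p = mann_whitney_u(baseline, group)
    print(f"{volume_ml} ml vs baseline: p = {p:.4f}")
```

Each baseline comparison is a separate test, so the more volumes are tested the more likely a false positive becomes unless the significance level is adjusted.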
 
The problem is that this approach is strongly dependent on the number of samples. Suppose you use 20 samples and find a significant difference at 4 ml but not at 2 ml. Then, without changing your device you do the same experiment but with 200 samples. In that case you would be likely to find a significant difference at 2 ml also. Has your device become better? No. So this process doesn’t characterize the device.
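
A small simulation can make this point concrete. The sketch below keeps a hypothetical device fixed (same true shift, same noise) and only changes the number of samples per group, then counts how often a two-sample test reports p < 0.05. The shift of 0.2, the uniform noise, and the helper names are assumptions for illustration, and the p-value comes from a large-sample normal approximation to the Welch statistic, which is slightly optimistic at 20 samples.

```python
import random
from statistics import NormalDist, mean, variance


def welch_p_value(x, y):
    """Two-sided p-value for a difference in means (normal approximation)."""
    standard_error = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    z = (mean(y) - mean(x)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def detection_rate(n, shift, trials=500, seed=0):
    """Fraction of simulated experiments with n samples per group that reach p < 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Hypothetical device: uniform noise, true shift identical in every trial.
        baseline = [rng.uniform(-1, 1) for _ in range(n)]
        shifted = [shift + rng.uniform(-1, 1) for _ in range(n)]
        hits += welch_p_value(baseline, shifted) < 0.05
    return hits / trials


for n in (20, 200):
    print(f"n = {n}: significant in {detection_rate(n, shift=0.2):.0%} of experiments")
```

The device is identical in both runs, yet the larger sample finds the same shift significant far more often, which is why significance alone does not characterize resolution.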
 
Here are my thoughts. First, you want to do the regression so that you get an idea about the relationship between mL of blood and impedance.

Then, you want to characterize the 0 mL blood condition very well. You should acquire as many samples as feasible, and calculate a 95% confidence interval.

Using your regression you can convert that upper 95% limit to a mL blood measurement. That would likely be your best lower threshold. So use that volume (probably round up) as your candidate threshold and acquire a bunch of data at that volume also, and maybe one more slightly larger volume too.
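
The first three steps might look roughly like this. It is a minimal sketch that assumes a straight-line relationship, assumes impedance rises with blood volume, and reads "95% confidence interval" as a normal-approximation interval for the mean of the 0 mL readings. All data values and function names are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def upper_confidence_limit(samples, level=0.95):
    """Upper end of a two-sided confidence interval for the mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean(samples) + z * stdev(samples) / len(samples) ** 0.5


def impedance_to_volume(impedance, slope, intercept):
    """Invert the fitted line to express an impedance reading as a blood volume."""
    return (impedance - intercept) / slope


# Hypothetical calibration data: blood volume (mL) and mean impedance reading.
volumes_ml = [0, 2, 4, 6, 8, 10]
mean_impedance = [100.02, 100.19, 100.41, 100.58, 100.83, 100.99]
slope, intercept = fit_line(volumes_ml, mean_impedance)

# Hypothetical repeated readings of the 0 mL (no blood) condition.
rng = random.Random(1)
baseline = [100 + rng.uniform(-1, 1) for _ in range(200)]

upper = upper_confidence_limit(baseline)
candidate_ml = impedance_to_volume(upper, slope, intercept)
print(f"slope = {slope:.4f} per mL, upper limit = {upper:.3f}, candidate = {candidate_ml:.2f} mL")
```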

Once you have that, you can do a ROC analysis to determine your best threshold for discriminating between blood and no blood as well as your sensitivity and specificity at that threshold.
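
A bare-bones version of that ROC step is sketched below. It assumes that blood raises the reading, so a reading at or above the threshold is called "blood", and it picks the threshold that maximizes Youden's J (sensitivity + specificity - 1). That criterion is one common choice and an assumption here, since the post does not say how "best" should be defined. The readings and function names are hypothetical.

```python
def roc_points(negatives, positives):
    """Return (threshold, sensitivity, specificity) for every observed reading.

    A reading >= threshold is classified as "blood present".
    """
    points = []
    for threshold in sorted(set(negatives) | set(positives)):
        sensitivity = sum(v >= threshold for v in positives) / len(positives)
        specificity = sum(v < threshold for v in negatives) / len(negatives)
        points.append((threshold, sensitivity, specificity))
    return points


def best_threshold(negatives, positives):
    """Pick the ROC point with the largest Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(negatives, positives), key=lambda p: p[1] + p[2] - 1)


# Hypothetical readings: 0 mL (no blood) versus the candidate volume.
no_blood = [99.6, 99.8, 99.9, 100.0, 100.1, 100.2, 100.3]
with_blood = [100.1, 100.3, 100.4, 100.5, 100.6, 100.8, 100.9]

threshold, sensitivity, specificity = best_threshold(no_blood, with_blood)
print(f"threshold = {threshold}, sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```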
 
Thank you very much!
 
How about a test of difference of proportions (with the baseline being 0)? It needs simple sampling, which I think you have, and independence, which I think you have as well, and it is non-parametric. Wouldn't that work?
 