# 2 samples T test in case of non-normal distribution

• uzi kiko
In summary, the conversation discusses an experiment where the impedance of a coil was measured while changing an environment parameter and collecting 20 samples for each change. The goal is to compare the different lines of data and determine if there is a significant difference using a T test. However, the distribution of the data is not normal, so the speaker is looking for alternative tests or transformations to use. The conversation also touches on using regression analysis and a test of difference of proportions to analyze the data and determine the minimum blood/brain ratio that the device can measure.

#### uzi kiko

Hi All

I ran an experiment in which I measured the change in the impedance of a coil while changing an environment parameter X times.
For each change I collected ~20 samples.
So I have a table with X lines, one for each change of the parameter, and 20 columns that hold the repeated samples.
Now I would like to compare each line to the others and find out whether there is a significant difference between them.
Naturally I was thinking about a T test. Unfortunately the measurement distribution is not normal, although it is symmetric around the mean. (I ran a two-sample Kolmogorov-Smirnov test, and it rejected the hypothesis that my sample came from a normal distribution.) You can find a figure of the distribution here:

I know that if I had a larger number of samples I could average groups of samples until the averages approached a normal distribution and then use a T test, but I would like to avoid repeating the experiment.

Now (sorry about the long introduction...) my questions are:
1) Do you know of a transformation that I can apply to my distribution so that I will be able to use a T test?
2) If there is no such transformation, which non-parametric test would you suggest I use?

Thanks a lot
Mosh
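
For question 2, one non-parametric option is a permutation test on the difference of group means. The thread itself does not name this test, so treat it as an illustrative sketch rather than the recommended answer. The function name `permutation_test`, the readings, and the number of resamples are all hypothetical.

```python
import random
from statistics import mean

def permutation_test(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Repeatedly shuffles the pooled readings, splits them back into two
    groups of the original sizes, and counts how often the shuffled
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # The +1 terms count the observed split itself, so p is never exactly 0.
    return (hits + 1) / (n_resamples + 1)

# Hypothetical impedance readings (arbitrary units), not data from the thread.
baseline = [100.0, 100.1, 99.9, 100.2, 99.8, 100.05, 99.95, 100.15]
with_blood = [100.5, 100.6, 100.4, 100.7, 100.3, 100.55, 100.45, 100.65]

p_value = permutation_test(baseline, with_blood)
print(f"permutation p-value: {p_value:.4f}")
```

The test makes no normality assumption; it only assumes that the readings would be exchangeable between the two groups if the parameter had no effect.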

uzi kiko said:
I ran an experiment in which I measured the change in the impedance of a coil while changing an environment parameter X times.
For each change I collected ~20 samples.
So I have a table with X lines, one for each change of the parameter, and 20 columns that hold the repeated samples.
Now I would like to compare each line to the others and find out whether there is a significant difference between them.
This seems an odd way to analyze that sort of data. Presumably your intent in performing the experiment is to test some theory, and your theory includes a mathematical model that you can use to predict the relationship between your environment parameter and your measured impedance. If that is the case, then what parameters of the model do you want to estimate or verify using your data? If not, then why are you doing the experiment?

uzi kiko said:
I changed an environment parameter X times.
For each change I collected ~20 samples.
So I have a table with X lines, one for each change of the parameter, and 20 columns that hold the repeated samples.
Is the environment parameter a numerical value or a categorical value?

Thanks a lot for your quick responses.

Regarding tnich's question:
I am now at the next stage of my research.
(You can find my paper here:
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0186381)

In the first stage we did indeed build a mathematical model. Now I am at the stage of verifying and characterizing my experimental setup, and specifically I would like to understand what its resolution is.

Regarding Dale's question:
My environment parameter is a numerical value (the volume ratio between blood and brain tissue).

uzi kiko said:
My environment parameter is a numerical value (the volume ratio between blood and brain tissue).
Then you want to use a regression approach, not a t test.

Thank you Dale,
I agree that if I want to understand the relation between the impedance and the blood/brain ratio I should use regression.
But the most important parameter that I have to identify is the minimum blood/brain ratio that my device can measure.

I would like to explain how I was thinking of using the T test:
Let's say I have 6 measurements, each containing 20 samples.
The first measurement is my baseline, where there is only brain tissue without any blood.
Now I take the next measurement (let's say with 2 ml of blood). If there is a significant difference between the 2 groups, I can say that my device is sensitive enough to detect a change of 2 ml of blood.
But if there is no significant difference between the 2 groups, I will take the sample with 4 ml of blood and compare, and so on.

Do you think that I can do this with regression?

The problem is that this approach is strongly dependent on the number of samples. Suppose you use 20 samples and find a significant difference at 4 ml but not at 2 ml. Then, without changing your device you do the same experiment but with 200 samples. In that case you would be likely to find a significant difference at 2 ml also. Has your device become better? No. So this process doesn’t characterize the device.
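
The dependence on sample size can be made concrete with a small power calculation. This is a rough sketch that uses a normal approximation to the two-sample test. The standardized effect size of 0.5 is a made-up value standing in for "the shift caused by 2 ml of blood", and the function name `power_two_sample` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    effect_size is the true difference in means divided by the common
    standard deviation. The normal approximation slightly overstates the
    power of a real t test when n_per_group is small.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing in either rejection tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Same hypothetical device and same hypothetical effect, two sample sizes.
power_20 = power_two_sample(0.5, 20)
power_200 = power_two_sample(0.5, 200)
print(f"n = 20:  power = {power_20:.2f}")
print(f"n = 200: power = {power_200:.2f}")
```

Under these assumptions the same physical change is detected only about a third of the time with 20 samples per group and almost always with 200, even though nothing about the device has changed.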

Here are my thoughts. First, you want to do the regression so that you get an idea about the relationship between mL of blood and impedance.

Then, you want to characterize the 0 mL blood condition very well. You should acquire as many samples as feasible, and calculate a 95% confidence interval.

Using your regression you can convert that upper 95% limit to a mL blood measurement. That would likely be your best lower threshold. So use that volume (probably round up) as your candidate threshold and acquire a bunch of data at that volume also, and maybe one more slightly larger volume too.
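
The conversion from the baseline limit to a volume can be sketched as follows. All numbers are invented for illustration, the helper names are hypothetical, and the sketch assumes that impedance rises linearly with blood volume. It uses a normal-approximation limit for the baseline mean; whether the limit should describe the mean or the spread of individual readings is a choice the thread leaves open.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def upper_limit(readings, level=0.95):
    """Upper end of a two-sided normal-approximation CI for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean(readings) + z * stdev(readings) / sqrt(len(readings))

# Hypothetical calibration data: blood volume (mL) vs mean impedance.
volumes = [0, 2, 4, 6, 8, 10]
impedances = [100.0, 100.4, 100.8, 101.2, 101.6, 102.0]
slope, intercept = fit_line(volumes, impedances)

# Hypothetical repeated readings at 0 mL of blood.
baseline = [99.9, 100.1, 100.0, 100.2, 99.8]
limit = upper_limit(baseline)

# Invert the fitted line to express the limit as a blood volume.
candidate_ml = (limit - intercept) / slope
print(f"slope = {slope:.3f} per mL, candidate threshold = {candidate_ml:.2f} mL")
```

With real data the candidate volume would then be rounded up to something that can actually be prepared in the experiment.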

Once you have that, you can do a ROC analysis to determine your best threshold for discriminating between blood and no blood as well as your sensitivity and specificity at that threshold.
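
A minimal ROC sketch along these lines is shown below. The readings are invented, the function names are hypothetical, and picking the cut-off that maximizes sensitivity + specificity - 1 (Youden's J) is one common criterion that the thread does not specify. The sketch also assumes that blood raises the impedance.

```python
def roc_points(negatives, positives):
    """Sensitivity and specificity for every observed value used as a cut-off.

    A reading is classified as "blood" when it is >= the cut-off.
    """
    points = []
    for cutoff in sorted(set(negatives) | set(positives)):
        sensitivity = sum(v >= cutoff for v in positives) / len(positives)
        specificity = sum(v < cutoff for v in negatives) / len(negatives)
        points.append((cutoff, sensitivity, specificity))
    return points

def best_threshold(negatives, positives):
    """Cut-off with the largest Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(negatives, positives), key=lambda p: p[1] + p[2] - 1)

# Hypothetical impedance readings (arbitrary units).
no_blood = [99.8, 99.9, 100.0, 100.1, 100.2]
blood = [100.1, 100.3, 100.4, 100.5, 100.6]

cutoff, sensitivity, specificity = best_threshold(no_blood, blood)
print(f"cut-off = {cutoff}, sensitivity = {sensitivity}, specificity = {specificity}")
```

For this toy data the chosen cut-off is 100.3, with a sensitivity of 0.8 and a specificity of 1.0. Unlike the p-value of a T test, these two numbers describe the device itself and do not improve merely because more samples were taken.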

Thank you very much!

How about a test of difference of proportions (with the baseline being 0)? It needs simple random sampling, which I think you have, and independence, which I think you also have, and it is non-parametric. Wouldn't that work?
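
One possible reading of this suggestion is to count how many readings in each group exceed a fixed cut-off and compare the two proportions with a pooled two-proportion z test. The post does not spell out how the proportions would be formed, so the counts, the cut-off idea, and the function name `two_proportion_z` are all assumptions.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z test; returns (z, two-sided p-value).

    x1 of n1 and x2 of n2 are the counts of readings above the cut-off
    in the baseline group and the blood group respectively.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 1 of 20 baseline readings and 9 of 20 blood
# readings fall above the chosen cut-off.
z, p_value = two_proportion_z(1, 20, 9, 20)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With only 20 readings per group and small counts the normal approximation is rough, and the objection raised earlier in the thread still applies: the outcome depends on the number of samples as well as on the device.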