- #1
ElijahRockers
Inexperienced data analyst here with a real-world example.
I have attached a zip-file with screenshots and p-values of the following data. The "reference regions" are Cerebellum White, Cerebellum Gray, and Temporal Cortex. The top-most graphs depict the curves in the indicated region for young and old subjects. The bottom-most graph has two curves, one for the averaged old values, and one for the averaged young values.
Say I have MRI data for 11 different human subjects which allows me to see the concentration of some chemical compound in specific areas of the brain over time. I have a total of 180 time points for each subject. The data are noisy, but you can clearly see the peak immediately after injection, and the steady slow concentration decay for a time afterward.
I separate them into two groups, 5 younger and 6 older subjects.
Our hypothesis: We expect the older subjects' curves to decay more slowly than the younger subjects' curves in some areas of the brain, but in the reference regions we would not expect much of a difference.
I use MATLAB to perform a two-sample t-test ('ttest2') on the averaged curve of the young subjects against the averaged curve of the old subjects, and get a p-value for each of the regions of the brain I am interested in.
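A minimal sketch of how this comparison can be set up, written in Python rather than MATLAB and run on synthetic curves. The decay rates, noise level, and helper names are all hypothetical and are not the data from this post. It assumes the averaged curves were passed to the test as two vectors of 180 values, so that each time point counts as a sample, and it uses the pooled-variance t statistic (the default behaviour of 'ttest2'). For contrast, it also computes the statistic from one summary number per subject.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def simulate_subject(rate, rng, n_points=180, noise=0.05):
    """Hypothetical noisy exponential decay curve for one subject."""
    return [math.exp(-rate * k) + rng.gauss(0, noise) for k in range(n_points)]


def average_curve(curves):
    """Average across subjects at each time point."""
    return [statistics.mean(vals) for vals in zip(*curves)]


rng = random.Random(0)
# Made-up decay rates: slightly slower decay in the older group.
young = [simulate_subject(rng.gauss(0.020, 0.003), rng) for _ in range(5)]
old = [simulate_subject(rng.gauss(0.017, 0.003), rng) for _ in range(6)]

# (a) Averaged curves compared directly: the 180 time points act as samples.
t_points, df_points = pooled_t(average_curve(young), average_curve(old))

# (b) One summary value per subject (here the mean concentration):
#     the 11 subjects act as samples.
t_subj, df_subj = pooled_t(
    [statistics.mean(c) for c in young],
    [statistics.mean(c) for c in old],
)

# Number of distinct ways to label 5 of the 11 subjects as "young".
n_labelings = math.comb(11, 5)

print("time points as samples: t = %.2f, df = %d" % (t_points, df_points))
print("subjects as samples:    t = %.2f, df = %d" % (t_subj, df_subj))
print("possible group labelings:", n_labelings)
```

In setup (a) the test has 180 + 180 - 2 = 358 degrees of freedom even though only 11 subjects were measured, while setup (b) has 5 + 6 - 2 = 9. With 462 possible labelings, an exact permutation test at the subject level could not return a p-value below 1/462, which gives a sense of scale for values near 10^-17.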
What happens is that my p-values seem to somewhat reflect what I was expecting, i.e. the p-values for the reference regions are much higher than those of the other regions.
However, all of the p-values are very low to begin with (they are all statistically significant, p < 0.05), which seems strange, and excluding a single subject from the analysis can change the p-values by several orders of magnitude.
Why are my p-values so low? In the regions where I would expect a significant difference, the p-values are on the order of 10^-17. This seems way lower than I was expecting; the curves are not THAT different, right?
Is this because of the signal to noise ratio? Because I have a small number of subjects? Or because I have a large number of time points? Or some combination of these, or something else I may not have considered?
Any suggestions as to what I should do at this point?
[EDIT] Attachment deleted by Mentor.