Neurologist: What P-values should I be expecting?

In summary, the conversation discusses a data analyst's real-world example involving MRI data for 11 subjects: the concentration of a chemical compound in specific areas of the brain over time. The data are noisy but show a clear peak after injection and a slow decay afterward. The analyst separates the subjects into two groups, younger and older, and uses a two-sample t-test to compare the group averages in different regions of the brain. However, the p-values obtained are much lower than expected and can change drastically when a single subject is excluded from the analysis. The reason is that a p-value is not a measure of the size of the difference between the two groups, but rather a measure of how surprising the data would be if the true difference were zero; with enough data points, even a tiny difference becomes statistically significant.
  • #1
ElijahRockers
Gold Member
Inexperienced data analyst here with a real-world example,

I have attached a zip-file with screenshots and p-values of the following data. The "reference regions" are Cerebellum White, Cerebellum Gray, and Temporal Cortex. The top-most graphs depict the curves in the indicated region for young and old subjects. The bottom-most graph has two curves, one for the averaged old values, and one for the averaged young values.

Say I have MRI data for 11 different human subjects which allows me to see the concentration of some chemical compound in specific areas of the brain over time. I have a total of 180 time points for each subject. The data are noisy, but you can clearly see the peak immediately after injection, and the steady slow concentration decay for a time afterward.

I separate them into two groups, 5 younger and 6 older subjects.

Our hypothesis: We expect the older subjects' curves to decay more slowly than young subjects in some areas of the brain, but in the reference regions we would not expect much of a difference.

I use MATLAB to perform a two-sample t-test ('ttest2') on the average of the young subjects against the average of the old subjects, and get p-values for each of the regions of the brain I am interested in.
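
Concretely, the call looks something like this (the variable names are made up; each input is one group's averaged 180-point curve):

Code:
% youngAvg, oldAvg: 1x180 averaged concentration curves (hypothetical names)
[h, p] = ttest2(youngAvg, oldAvg);   % h = 1 means reject the null at alpha = 0.05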

What happens is that my p-values seem to somewhat reflect what I was expecting, i.e. the p-values for the reference regions are much higher than those of the other regions.

However, all of the p-values are very low to begin with (they are all statistically significant, P < 0.05), which seems strange, and excluding a single subject from the analysis can drastically change the p-values by several orders of magnitude.

Why are my p-values so low? In the regions where I would expect a significant difference, the p-values are on the order of 10^-17. This seems wayyy lower than I was expecting; the curves are not THAT different, right?

Is this because of the signal-to-noise ratio? Because I have a small number of subjects? Or because I have a large number of time points? Or some combination of these, or something else I may not have considered?

Any suggestions as to what I should do at this point?

[EDIT] Attachment deleted by Mentor.
 
  • #2
I deleted your zip attachment as we don't allow those types of attachments.

Instead could you place the data in a new post in code tags?

[.code=data.]

your data goes here

[./code.]

Remember to remove the dots from the code tags.
 
  • #3
ElijahRockers said:
Why are my p-values so low? In the regions where I would expect a significant difference, the p-values are on the order of 10^-17. This seems wayyy lower than I was expecting; the curves are not THAT different, right?

Is this because of the signal-to-noise ratio? Because I have a small number of subjects? Or because I have a large number of time points? Or some combination of these, or something else I may not have considered?
So this is a pretty common misconception. A p-value is not a measure of how large a difference is. It is just a measure of how probable the data is if the difference were zero. If you collect a large enough sample, even the most minuscule differences can become enormously significant. So even in the reference regions it is unlikely that the true difference is zero.
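
For example, here is a quick synthetic illustration (made-up numbers, not your data): the true difference below is only 0.05 standard deviations, yet the p-value collapses once the sample is large enough.

Code:
% A tiny true difference becomes hugely "significant" given enough data.
rng(0);                         % reproducible random numbers
for n = [20 200 20000]
    x = randn(n, 1);            % group 1: mean 0, sd 1
    y = randn(n, 1) + 0.05;     % group 2: mean 0.05, sd 1
    [~, p] = ttest2(x, y);
    fprintf('n = %5d   p = %.3g\n', n, p);
end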

With that said, your statistical test sounds like it might be suboptimal. The data almost surely are not normally distributed. What might be normally distributed would be the residuals to some curve fit. It seems that it would be more reasonable to fit the time courses to some appropriate signal model and then do your statistics on the fit parameters. Also, since you have different N for your two groups I would tend to use an ANOVA or a linear model rather than a t-test.
 
  • #4
DaleSpam said:
It is just a measure of how probable the data is if the difference were zero.

Could you maybe clarify what you mean by 'if the difference were zero'? (Or perhaps suggest some light reading on the subject?)

I have very little experience with statistics, and a tad bit more with probability: My basic understanding is that our two sets of data are the results of two different probability distribution functions (PDF), and the p-value is the probability that two sets of data come from PDFs with the same mean.

DaleSpam said:
It seems that it would be more reasonable to fit the time courses to some appropriate signal model and then do your statistics on the fit parameters.

It has been suggested that I could, perhaps, use some combination of the gamma variate and some other function to model my data and analyze the parameters. If I do this, am I basically finding the parameters of the gamma variate that best fit each curve, and then performing a statistical analysis on the parameters?

I think that would be some kind of non-linear fitting, no? Is this just something I would have to do computationally, using some form of trial and error to close in on the best fit? Or is there some more 'elegant' way to do this?
 
  • #5
ElijahRockers said:
the p-value is the probability that two sets of data come from PDFs with the same mean.

NO! The p-value is not this probability AT ALL. This is a very popular misconception. If you want to interpret something as this probability, you need Bayesian statistics.

ElijahRockers said:
It has been suggested that I could, perhaps, use some combination of the gamma variate and some other function to model my data and analyze the parameters. If I do this, am I basically finding the parameters of the gamma variate that best fit each curve, and then performing a statistical analysis on the parameters?

I think that would be some kind of non-linear fitting, no? Is this just something I would have to do computationally, using some form of trial and error to close in on the best fit? Or is there some more 'elegant' way to do this?

But are you interested in whether the two curves are the same? They're clearly not for most of the data you linked. In a lot of cases I see one curve always lying below the other curve with a significant distance. This is not what I would expect for two curves that are supposed to be the same.

But in your OP your question is not "are the two curves the same" but rather "is the rate at which the curves decrease the same". This is a very different question!
 
  • #6
Attached a condensed version of the original attachment.

micromass said:
The p-value is not this probability AT ALL.

I only gleaned that information from the MATLAB help file on the t-test. I am kind of scrambling to learn this stuff.

micromass said:
In a lot of cases I see one curve always lying below the other curve with a significant distance.

This is what we are expecting: one group should differ from the other, that is, the older group should have a somewhat higher concentration than the younger group at all or most time points. I can see the difference with my eyeballs in some regions, but it is not clear from the statistical test. This is why I am looking for help. (That, and I have no prior experience with statistics.)
 
  • #7
Condensed version of the data attached to this post, in .png format.
 

Attachments

  • combined_MRI.png
  • #8
ElijahRockers said:
Could you maybe clarify what you mean by 'if the difference were zero'? (Or perhaps suggest some light reading on the subject?)
In statistics this is called the "null hypothesis". Basically, you assume that the means are in fact identical. Then using that assumption you calculate how likely it would be to get the data you measured. That is what a p value is, the probability of the data, given the null hypothesis.
 
  • #9
DaleSpam said:
That is what a p value is, the probability of the data, given the null hypothesis.

That is also not what a p-value is. Think about it, if the null-hypothesis is "the data is normally distributed with mean 0 and standard deviation 1", then the probability of the data (any data) is 0.

What the p-value actually is, is the probability of rejecting the null-hypothesis when the null-hypothesis is actually true. In statistics, the null-hypothesis is often something we want to reject. The p-value is the probability that we make errors when doing this rejection. We want this to be as small as possible: smaller p-value means smaller probabilities of making a wrong decision.
 
  • #10
micromass said:
That is also not what a p-value is. Think about it, if the null-hypothesis is "the data is normally distributed with mean 0 and standard deviation 1", then the probability of the data (any data) is 0.
Sure, I was keeping things as simple as I could. Technically the p value is the probability that a random sample produces a test statistic that is at least as extreme as the test statistic that was actually observed given the null hypothesis. I didn't think that the additional wordiness was helpful to the OP here.
micromass said:
What the p-value actually is, is the probability of rejecting the null-hypothesis when the null-hypothesis is actually true.
While that is true, that doesn't help the OP understand why he is getting the results he has found. He has some data which "are not THAT different" and is surprised by how small the p values are. He knows that the p value should lead him to reject the null hypothesis, but is surprised by that because he has the common idea that we reject the null hypothesis when the difference is large.

Instead, we reject the null hypothesis when the observed difference is unlikely under it. We can make an arbitrarily small difference unlikely (and therefore statistically significant) simply by including an arbitrarily large sample. So a low p value does not indicate a large effect, as many people expect.
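
If it helps to see that definition in code, here is a minimal sketch with synthetic data (ttest2 assumes equal variances by default):

Code:
% The p-value is the probability, under the null t distribution, of a test
% statistic at least as extreme as the one actually observed.
rng(1);
x = randn(10, 1);                % two synthetic samples with equal true means
y = randn(12, 1);
nx = numel(x);  ny = numel(y);
sp = sqrt(((nx-1)*var(x) + (ny-1)*var(y)) / (nx + ny - 2));   % pooled sd
tstat = (mean(x) - mean(y)) / (sp * sqrt(1/nx + 1/ny));       % t statistic
pManual = 2 * (1 - tcdf(abs(tstat), nx + ny - 2))  % two-sided tail probability
[~, pMatlab] = ttest2(x, y)                        % same number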
 
  • #11
ElijahRockers said:
I think that would be some kind of non-linear fitting, no? Is this just something I would have to do computationally, using some form of trial and error to close in on the best fit? Or is there some more 'elegant' way to do this?
Since you are already using MATLAB, there are nonlinear fitting routines already available. You would want to start with those.

Since you are interested in the rate of decay, your chosen fit function should have a parameter which adjusts the "height" and a parameter which adjusts the decay rate. From your plots it seems likely that all of the heights will be significantly different, but maybe not the decay rates.
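
Here is a rough sketch of that workflow, with synthetic curves standing in for the real data (the model is a placeholder single exponential; a gamma variate would be handled the same way, just with more parameters):

Code:
% Fit each subject's time course, then compare the fitted decay rates.
model = @(b, t) b(1) .* exp(-b(2) .* t);   % b(1) = height, b(2) = decay rate
t = (1:180)';
rng(2);
youngData = 2.0 .* exp(-0.030 .* t) + 0.05 .* randn(180, 5);  % 5 young subjects
oldData   = 2.2 .* exp(-0.025 .* t) + 0.05 .* randn(180, 6);  % 6 old subjects

rateY = zeros(5, 1);
for j = 1:5
    b = nlinfit(t, youngData(:, j), model, [1; 0.01]);   % fit one subject
    rateY(j) = b(2);
end
rateO = zeros(6, 1);
for j = 1:6
    b = nlinfit(t, oldData(:, j), model, [1; 0.01]);
    rateO(j) = b(2);
end
[~, p] = ttest2(rateY, rateO)   % now a 5-vs-6 comparison of fitted rates

Note that the final test now compares 5 fitted rates against 6, rather than two 180-point curves.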
 
  • #12
micromass said:
What the p-value actually is, is the probability of rejecting the null-hypothesis when the null-hypothesis is actually true.

This gives me a little better perspective, thanks.

micromass said:
if the null-hypothesis is "the data is normally distributed with mean 0 and standard deviation 1", then the probability of the data (any data) is 0.

I'm having a hard time wrapping my head around that. The PDF for a normal distribution is very well known; you can plug in any value x and, given the mean and sd, return the probability of observing that value, right?

DaleSpam said:
there are nonlinear fitting routines already available.

Thanks, this is a great start, thank you both very much for your help.
 
  • #13
ElijahRockers said:
I'm having a hard time wrapping my head around that. The PDF for a normal distribution is very well known; you can plug in any value x and, given the mean and sd, return the probability of observing that value, right?
For a continuous random variable the height of the PDF is the probability density, not the probability. So to get the probability you have to integrate over some range. If you integrate over a single point the integral is zero.
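
You can see this directly in MATLAB (standard normal, purely illustrative):

Code:
normpdf(0)                      % density at x = 0: about 0.3989, not a probability
normcdf(0.1) - normcdf(-0.1)    % P(-0.1 <= X <= 0.1): about 0.0797
normcdf(0) - normcdf(0)         % P(X = 0 exactly): 0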
 
  • #14
DaleSpam said:
For a continuous random variable the height of the PDF is the probability density, not the probability. So to get the probability you have to integrate over some range. If you integrate over a single point the integral is zero.

To reinforce that thought, think of a thin rod that has a "mass density per unit length". At any given point, the rod has zero mass. The mass density at the point is used in calculations that estimate the mass of small intervals of rod that contain that point, but the number you estimate for the mass depends on the size of the small interval.

In practice, measurements of a continuous variable in an experiment are not actually points, because there is uncertainty due to the measuring apparatus. So if you attempt to calculate the probability of a given observation you need to account for the uncertainty of the measurement to give yourself a finite interval. The probability of realizing a particular value (specified with complete certainty) from a normal distribution is zero even though the probability density function at that value is not zero.

Statistical tests are procedures, not proofs. When applied to continuous random values, such procedures are phrased in terms of intervals or "acceptance regions". For example, "...if the mean is less than..." or "if the mean is between the values...". If they were phrased in terms of single values (e.g. "...if the mean is exactly 1.3794..."), the probabilities involved would be zero.
 
  • #15
Stephen Tashi said:
To reinforce that thought, think of a thin rod that has a "mass density per unit length". At any given point, the rod has zero mass. The mass density at the point is used in calculations that estimate the mass of small intervals of rod that contain that point, but the number you estimate for the mass depends on the size of the small interval.

Interesting; how does this contrast with discrete examples?

To see if I am understanding: If I consider the PMF of the sum of two dice, can't I say I have exactly a 1/36 probability of rolling snake eyes? Now suppose I have two 'continuous' dice that can take any value from 1 to 6; they would have 0 probability of rolling snake eyes (or any particular value, for that matter), because there are infinitely many other values the dice could 'roll' instead?

Edit: I think it is dawning on me why we use probability mass vs. probability density
 
  • #16
Yes, that is exactly correct.

If you have a 6-sided die then the probability of rolling a 1 is 1/6. If you have a 20-sided die then the probability of rolling a 1 is 1/20. If you have an infinite-sided die then the probability of rolling a 1 is 0.
 
  • #17
DaleSpam said:
Yes, that is exactly correct.

If you have a 6-sided die then the probability of rolling a 1 is 1/6. If you have a 20-sided die then the probability of rolling a 1 is 1/20. If you have an infinite-sided die then the probability of rolling a 1 is 0.
If all faces of the infinite-sided die, as the limiting case of a die with a large number of faces (I think this can be made more precise, maybe with techniques/results from B.B.I's "Metric Geometry" book), are equal*, I think the limiting shape is a sphere. So we would then be "rolling a sphere".

*A theoretical die, I guess; physically difficult, if not impossible, to construct for a large number of (equal) sides.
 
  • #18
ElijahRockers said:
I use MATLAB to perform a two-sample t-test ('ttest2') on the average of the young subjects against the average of the old subjects, and get p-values for each of the regions of the brain I am interested in.

What happens is that my p-values seem to somewhat reflect what I was expecting, i.e. the p-values for the reference regions are much higher than those of the other regions.

However, all of the p-values are very low to begin with (they are all statistically significant, P < 0.05), which seems strange, and excluding a single subject from the analysis can drastically change the p-values by several orders of magnitude.

If I understood correctly: For each subject you have a time series (180 points). You then average the time series across subjects within each group.

Could you please clarify: what exactly did you feed into the ttest2 function? The entire averaged time series, 180 points each?
 
  • #19
PMRDK said:
If I understood correctly: For each subject you have a time series (180 points). You then average the time series across subjects within each group.

Could you please clarify: what exactly did you feed into the ttest2 function? The entire averaged time series, 180 points each?

Yep, I averaged across subjects within the groups, resulting in two curves of 180 time points each. One for the old subjects, and one for the young subjects. I compared these two curves using ttest2.

We have since hired a statistician, and she has done some modeling, some statistical analysis that probably makes more sense... I am hoping that she will not mind teaching me the process so I can recreate it on my own.
 
  • #20
micromass said:
That is also not what a p-value is. Think about it, if the null-hypothesis is "the data is normally distributed with mean 0 and standard deviation 1", then the probability of the data (any data) is 0.

What the p-value actually is, is the probability of rejecting the null-hypothesis when the null-hypothesis is actually true. In statistics, the null-hypothesis is often something we want to reject. The p-value is the probability that we make errors when doing this rejection. We want this to be as small as possible: smaller p-value means smaller probabilities of making a wrong decision.

The p-value is the probability of observing an effect greater than or equal to the observed effect, under the null hypothesis. The p-value cannot be interpreted as the probability of making a type-1 error: How then would you interpret a p-value of >0.05? If our significance threshold is 0.05, then the probability that we have made a type-1 error is zero, since we haven't even rejected the null hypothesis. The p-value is almost entirely unrelated to the probability of falsely rejecting the null hypothesis, which would require Bayesian methods to calculate.
 

1. What is a P-value and why is it important in neurology research?

A P-value is a statistical measure that indicates the likelihood of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis is true. In neurology research, P-values are important because they help determine whether a study's findings can be distinguished from chance. A low P-value (usually less than 0.05) is conventionally treated as statistically significant; note that it reflects how surprising the data would be under the null hypothesis, not the size of the effect.

2. What factors can influence the P-value in neurology research?

There are several factors that can influence the P-value in neurology research, including sample size, variability of the data, and the chosen statistical test. For a given true difference, a larger sample size tends to decrease the P-value, while a smaller sample size tends to increase it. Similarly, if the data have a lot of variability, the P-value may be higher, making it harder to establish statistical significance. The choice of statistical test can also impact the P-value, as different tests differ in their sensitivity to detect differences.

3. What is the ideal P-value in neurology research?

There is no specific value that can be considered as the ideal P-value in neurology research. However, a P-value of less than 0.05 is generally accepted as statistically significant in most studies. It is important to note that the interpretation of the P-value should be considered in the context of the research question and study design.

4. Can a P-value alone determine the validity of a neurology study?

No, a P-value alone cannot determine the validity of a neurology study. While a low P-value may suggest that the results are not due to chance, it does not necessarily mean that the study is valid. Other factors such as study design, methodology, and potential biases should also be considered when evaluating the validity of a study.

5. Are there other statistical measures that should be considered in addition to the P-value in neurology research?

Yes, there are other statistical measures that should be considered in addition to the P-value in neurology research. These include effect size, confidence intervals, and power analysis. These measures can provide additional information about the strength and magnitude of the observed results, and can help interpret the significance of the P-value.
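
For instance, here is a minimal sketch of reporting an effect size (Cohen's d with a pooled standard deviation) and a confidence interval alongside the P-value, using synthetic data:

Code:
% Report effect size and confidence interval, not just the p-value.
rng(3);
x = randn(20, 1);                % synthetic control group
y = randn(20, 1) + 0.3;          % synthetic comparison group, true shift 0.3
nx = numel(x);  ny = numel(y);
sp = sqrt(((nx-1)*var(x) + (ny-1)*var(y)) / (nx + ny - 2));   % pooled sd
d = (mean(y) - mean(x)) / sp     % Cohen's d: effect size in sd units
[~, p, ci] = ttest2(y, x)        % p-value and 95% CI for the mean difference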
