
Neurologist: What P-values should I be expecting?

  1. Nov 30, 2015 #1


    User Avatar
    Gold Member

    Inexperienced data analyst here with a real-world example,

    I have attached a zip-file with screenshots and p-values of the following data. The "reference regions" are Cerebellum White, Cerebellum Gray, and Temporal Cortex. The top-most graphs depict the curves in the indicated region for young and old subjects. The bottom-most graph has two curves, one for the averaged old values, and one for the averaged young values.

    Say I have MRI data for 11 different human subjects which allows me to see the concentration of some chemical compound in specific areas of the brain over time. I have a total of 180 time points for each subject. The data are noisy, but you can clearly see the peak immediately after injection, and the steady slow concentration decay for a time afterward.

    I separate them into two groups, 5 younger and 6 older subjects.

    Our hypothesis: We expect the older subjects' curves to decay more slowly than young subjects in some areas of the brain, but in the reference regions we would not expect much of a difference.

    I use MATLAB to perform a two-sample t-test ('ttest2') on the average of the young subjects, against the average of the old subjects, and get P-values for each of the regions of the brain I am interested in.

    What happens is that my p-values seem to somewhat reflect what I was expecting, i.e. the p-values for the reference regions are much higher than those of the other regions.

    However, all of the p-values are very low to begin with (they are all statistically significant, P<0.05 ), which seems strange, and excluding a single subject from the analysis can drastically change the p-values by several orders of magnitude.

Why are my p-values so low? In the regions where I would expect a significant difference, the p-values are on the order of 10^-17. This seems way lower than I was expecting; the curves are not THAT different, right?

Is this because of the signal-to-noise ratio? Because I have a small number of subjects? Or because I have a large number of time points? Or some combination of these, or something else I may not have considered?

    Any suggestions as to what I should do at this point?
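For concreteness, the procedure described above can be sketched in Python with NumPy/SciPy, where `scipy.stats.ttest_ind` plays the role of MATLAB's `ttest2`. The decay curves, noise level, and group sizes below are made-up stand-ins for the real MRI data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
t = np.linspace(0, 90, 180)          # 180 time points per subject

# Hypothetical curves: concentration decays after the peak; old decays more slowly
young = 5.0 * np.exp(-0.050 * t) + rng.normal(0, 0.05, (5, t.size))
old   = 5.0 * np.exp(-0.045 * t) + rng.normal(0, 0.05, (6, t.size))

# Average across subjects within each group, then run a two-sample t-test
# on the two 180-point average curves (the procedure described in the post)
young_avg = young.mean(axis=0)
old_avg = old.mean(axis=0)
result = stats.ttest_ind(young_avg, old_avg)
print(result.pvalue)
```

Note that this treats the 180 time points of each averaged curve as if they were independent samples, which is part of what the replies below question.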

    [EDIT] Attachment deleted by Mentor.
    Last edited by a moderator: Nov 30, 2015
  3. Nov 30, 2015 #2


    Staff: Mentor

    I deleted your zip attachment as we don't allow those types of attachments.

    Instead could you place the data in a new post in code tags?


[.code]
your data goes here
[./code]


    Remember to remove the dots from the code tags.
  4. Nov 30, 2015 #3


    Staff: Mentor

So this is a pretty common misconception. A p-value is not a measure of how large a difference is. It is just a measure of how probable the data would be if the difference were zero. If you collect a large enough sample, even the most minuscule differences can become enormously significant. So even in the reference regions it is unlikely that the true difference is zero.
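The sample-size effect described here is easy to demonstrate with a quick simulation (made-up normal data with a deliberately tiny true difference of 0.01 standard deviations, tested at two very different sample sizes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def p_for(n):
    # Same tiny true effect (a 0.01-SD mean shift) at sample size n
    a = rng.normal(0.00, 1.0, n)
    b = rng.normal(0.01, 1.0, n)
    return stats.ttest_ind(a, b).pvalue

p_small, p_large = p_for(100), p_for(1_000_000)
print(p_small, p_large)   # the huge sample makes the negligible effect "significant"
```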

With that said, your statistical test sounds like it might be suboptimal. The data almost surely are not normally distributed. What might be normally distributed are the residuals to some curve fit. It seems that it would be more reasonable to fit the time courses to some appropriate signal model and then do your statistics on the fit parameters. Also, since you have different N for your two groups, I would tend to use an ANOVA or a linear model rather than a t-test.
    Last edited: Dec 1, 2015
  5. Dec 1, 2015 #4


    User Avatar
    Gold Member

    Could you maybe clarify what you mean by 'if the difference were zero'? (Or perhaps suggest some light reading on the subject?)

I have very little experience with statistics, and a bit more with probability: my basic understanding is that our two sets of data are the results of two different probability distribution functions (PDFs), and the p-value is the probability that the two sets of data come from PDFs with the same mean.

    It has been suggested that I could, perhaps, use some combination of the gamma variate and some other function to model my data and analyze the parameters. If I do this, am I basically finding the parameters of the gamma variate that best fit each curve, and then performing a statistical analysis on the parameters?

    I think that would be some kind of non-linear fitting, no? Is this just something I would have to do computationally, using some form of trial and error to close in on the best fit? Or is there some more 'elegant' way to do this?
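For illustration, here is roughly what that computational fitting looks like in Python with `scipy.optimize.curve_fit`, a least-squares routine that iteratively "closes in" on the best parameters from an initial guess. The gamma-variate form and all numbers below are invented for the sketch:

```python
import numpy as np
from scipy.optimize import curve_fit

def gamma_variate(t, A, alpha, beta):
    # Classic gamma-variate shape: rapid rise after injection, slow decay
    return A * t**alpha * np.exp(-t / beta)

rng = np.random.default_rng(2)
t = np.linspace(0.1, 90, 180)                   # avoid t = 0 for t**alpha
true_curve = gamma_variate(t, 2.0, 1.5, 8.0)
noisy = true_curve + rng.normal(0, 0.05, t.size)

# curve_fit iteratively adjusts (A, alpha, beta) to minimize the
# squared residuals, starting from the initial guess p0
popt, pcov = curve_fit(gamma_variate, t, noisy, p0=[1.0, 1.0, 5.0])
print(popt)   # fitted (A, alpha, beta); one such triple per subject in practice
```

The group comparison would then be run on the fitted parameters rather than on the raw curves.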
  6. Dec 1, 2015 #5
    NO!!!! The p-value is not this probability AT ALL. This is a very popular misconception. If you want to interpret something as this probability, you need Bayesian statistics.

But are you interested in whether the two curves are the same? They're clearly not, for most of the data you linked. In a lot of cases I see one curve always lying below the other curve by a sizeable distance. This is not what I would expect for two curves that are supposed to be the same.

    But in your OP your question is not "the two curves are the same" but rather "the rate with which the curves decrease is the same". This is a very different question!
  7. Dec 1, 2015 #6


    User Avatar
    Gold Member

    Attached a condensed version of the original attachment.

    I only gleaned that information from the MATLAB help file on the t-test. I am kind of scrambling to learn this stuff.

This is what we are expecting. One group should differ from the other; that is, the older group should have a consistently higher concentration than the younger group at all or most time points. I can see the difference with my eyeballs in some regions, but it is not clear from the statistical test. This is why I am looking for help. (That, and I have no prior experience with statistics.)
  8. Dec 1, 2015 #7


    User Avatar
    Gold Member

    Condensed version of the data attached to this post, in .png format.


  9. Dec 1, 2015 #8


    Staff: Mentor

    In statistics this is called the "null hypothesis". Basically, you assume that the means are in fact identical. Then using that assumption you calculate how likely it would be to get the data you measured. That is what a p value is, the probability of the data, given the null hypothesis.
  10. Dec 2, 2015 #9
    That is also not what a p-value is. Think about it, if the null-hypothesis is "the data is normally distributed with mean 0 and standard deviation 1", then the probability of the data (any data) is 0.

    What the p-value actually is, is the probability of rejecting the null-hypothesis when the null-hypothesis is actually true. In statistics, the null-hypothesis is often something we want to reject. The p-value is the probability that we make errors when doing this rejection. We want this to be as small as possible: smaller p-value means smaller probabilities of making a wrong decision.
  11. Dec 2, 2015 #10


    Staff: Mentor

    Sure, I was keeping things as simple as I could. Technically the p value is the probability that a random sample produces a test statistic that is at least as extreme as the test statistic that was actually observed given the null hypothesis. I didn't think that the additional wordiness was helpful to the OP here.

While that is true, it doesn't help the OP understand why he is getting the results he has found. He has some data which "are not THAT different" and is surprised by how small the p-values are. He knows that a low p-value should lead him to reject the null hypothesis, but is surprised because he has the common idea that we reject the null hypothesis when the difference is large.

Instead, we reject the null hypothesis when the observed difference is unlikely under it. We can make an arbitrarily small difference unlikely (and therefore statistically significant) simply by including an arbitrarily large sample. So a low p-value does not indicate a large effect, contrary to what many people expect.
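The textbook definition quoted above can be checked by brute force (a sketch with made-up numbers): simulate many experiments under the null hypothesis and count how often the test statistic comes out at least as extreme as the one observed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# One "observed" experiment; the null (equal means) happens to be true
a, b = rng.normal(0, 1, 20), rng.normal(0, 1, 20)
obs = stats.ttest_ind(a, b)

# Re-run the experiment many times under the null, and count how often
# the t statistic is at least as extreme as the observed one
t_null = np.array([stats.ttest_ind(rng.normal(0, 1, 20),
                                   rng.normal(0, 1, 20)).statistic
                   for _ in range(10_000)])
p_sim = np.mean(np.abs(t_null) >= abs(obs.statistic))
print(obs.pvalue, p_sim)   # the two should roughly agree
```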
    Last edited: Dec 2, 2015
  12. Dec 2, 2015 #11


    Staff: Mentor

Since you are already using MATLAB, there are nonlinear fitting routines already available. You would want to start with those.

    Since you are interested in the rate of decay, your chosen fit function should have a parameter which adjusts the "height" and a parameter which adjusts the decay rate. From your plots it seems likely that all of the heights will be significantly different, but maybe not the decay rates.
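A sketch of that suggestion in Python/SciPy, with a plain exponential decay as a stand-in signal model (all subject counts and rates below are invented): fit each subject's curve individually, then compare the fitted decay rates between the two groups.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def decay(t, height, rate):
    # Minimal signal model: a "height" parameter and a decay-rate parameter
    return height * np.exp(-rate * t)

rng = np.random.default_rng(4)
t = np.linspace(0, 90, 180)

def fitted_rates(n_subjects, true_rate):
    rates = []
    for _ in range(n_subjects):
        y = decay(t, 5.0, true_rate) + rng.normal(0, 0.05, t.size)
        (_, rate), _ = curve_fit(decay, t, y, p0=[1.0, 0.1])
        rates.append(rate)
    return rates

# Invented truth: old subjects decay more slowly than young subjects
young_rates = fitted_rates(5, 0.050)
old_rates = fitted_rates(6, 0.040)

# The group test now compares 5 vs 6 fitted rates, not 180 vs 180 raw points
p = stats.ttest_ind(young_rates, old_rates).pvalue
print(p)
```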
  13. Dec 2, 2015 #12


    User Avatar
    Gold Member

    This gives me a little better perspective, thanks.

I'm having a hard time wrapping my head around that. The PDF for a normal distribution is very well known: you can plug in any value x and, given the mean and SD, get back the probability of observing that value, right?

    Thanks, this is a great start, thank you both very much for your help.
  14. Dec 2, 2015 #13


    Staff: Mentor

    For a continuous random variable the height of the PDF is the probability density, not the probability. So to get the probability you have to integrate over some range. If you integrate over a single point the integral is zero.
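The density-versus-probability distinction is easy to see numerically with `scipy.stats.norm`:

```python
from scipy.stats import norm

# Density at a point is not a probability; for a narrow normal it exceeds 1
narrow = norm(loc=0, scale=0.1)
density = narrow.pdf(0.0)
print(density)                     # about 3.99 -- a density, not a probability

# A probability comes from integrating the density over an interval
p_interval = narrow.cdf(0.05) - narrow.cdf(-0.05)
print(p_interval)                  # P(-0.05 < X < 0.05), about 0.38

# Shrinking the interval to a single point gives probability 0
p_point = narrow.cdf(0.0) - narrow.cdf(0.0)
print(p_point)
```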
  15. Dec 2, 2015 #14

    Stephen Tashi

    User Avatar
    Science Advisor

To reinforce that thought, think of a thin rod that has a "mass density per unit length". At any given point, the rod has zero mass. The mass density at the point is used in calculations that estimate the mass of small intervals of rod containing that point, but the mass you estimate depends on the size of the small interval.

In practice, measurements of a continuous variable in an experiment are not actually points because there is uncertainty due to the measuring apparatus. So if you attempt to calculate the probability of a given observation, you need to account for the uncertainty of the measurement to give yourself a finite interval. The probability of realizing a particular value (specified with complete certainty) from a normal distribution is zero, even though the probability density function at that value is not zero.

    Statistical tests are procedures, not proofs. When applied to continuous random values, such procedures are phrased in terms of intervals or "acceptance regions". For example, "...if the mean is less than..." or "if the mean is between the values...". If they were phrased in terms of single values (e.g. "...if the mean is exactly 1.3794..."), the probabilities involved would be zero.
  16. Dec 2, 2015 #15


    User Avatar
    Gold Member

    Interesting, how does this contrast with discrete examples?

To see if I am understanding: If I consider the PMF of the sum of two dice, can't I say I have exactly a 1/36 probability to roll snake eyes? Now suppose I have two 'continuous' dice, that can take any value from 1 to 6: they would have 0 probability to roll snake eyes (or any particular value, for that matter), because there are infinitely many other values the dice could 'roll' instead?

    Edit: I think it is dawning on me why we use probability mass vs. probability density
  17. Dec 2, 2015 #16


    Staff: Mentor

    Yes, that is exactly correct.

If you have a 6-sided die then the probability of rolling a 1 is 1/6. If you have a 20-sided die then the probability of rolling a 1 is 1/20. If you have an infinite-sided die then the probability of rolling a 1 is 0.
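The two cases can be checked by simulation, rolling a million of each kind of "die":

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000

# Discrete die: an exact face comes up with probability 1/6
discrete = rng.integers(1, 7, n)
frac_discrete = np.mean(discrete == 1)
print(frac_discrete)        # close to 1/6, about 0.1667

# "Continuous die": uniform on [1, 6]; an exact value is essentially never hit
continuous = rng.uniform(1.0, 6.0, n)
frac_exact = np.mean(continuous == 1.0)
print(frac_exact)           # 0.0 in practice
```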
  18. Dec 2, 2015 #17


    User Avatar
    Science Advisor
    Gold Member

If all faces of the infinite-sided die, taken as the limiting case of a die with a large number of faces, are equal*, I think the limiting shape is a sphere (I think this can be made more precise, maybe with techniques/results from B.B.I.'s "Metric Geometry" book). So we would then be "rolling a sphere".

* A theoretical object, I guess; physically difficult, if not impossible, to construct for a large number of (equal) sides.
  19. Dec 8, 2015 #18
    If I understood correctly: For each subject you have a time series (180 points). You then average the time series across subjects within each group.

    Could you please clarify, what exactly did you feed into the ttest2 function? The entire average time series of 180 point each?
  20. Dec 8, 2015 #19


    User Avatar
    Gold Member

    Yep, I averaged across subjects within the groups, resulting in two curves of 180 time points each. One for the old subjects, and one for the young subjects. I compared these two curves using ttest2.

    We have since hired a statistician, and she has done some modeling, some statistical analysis that probably makes more sense... I am hoping that she will not mind teaching me the process so I can recreate it on my own.
  21. Dec 15, 2015 #20
    The p-value is the probability of observing an effect greater than or equal to the observed effect, under the null hypothesis. The p-value cannot be interpreted as the probability of making a type-1 error: How then would you interpret a p-value of >0.05? If our significance threshold is 0.05, then the probability that we have made a type-1 error is zero, since we haven't even rejected the null hypothesis. The p-value is almost entirely unrelated to the probability of falsely rejecting the null hypothesis, which would require Bayesian methods to calculate.
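The point about type-1 error rates can be illustrated with a quick simulation: under a true null hypothesis, p-values are uniform on [0, 1], so the long-run false-rejection rate is set by the chosen threshold α, not by any individual p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Simulate 10,000 experiments in which the null hypothesis is TRUE
pvals = np.array([stats.ttest_ind(rng.normal(0, 1, 30),
                                  rng.normal(0, 1, 30)).pvalue
                  for _ in range(10_000)])

# The fraction of (false) rejections tracks the threshold alpha
print(np.mean(pvals < 0.05))   # close to 0.05
print(np.mean(pvals < 0.01))   # close to 0.01
```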