How to Determine Statistical Significance Between Two Image Regions?

Wo0p · Jan 2, 2010

Hi all,

I have two images, without any exact spatial correspondence

I want to find if the difference between two regions is statistically significant

How would I do this?

EnumaElish · Jan 2, 2010

What do you mean by "exact spatial correspondence"?

Wo0p · Jan 2, 2010

Hi EnumaElish,

The images are of the human brain

They would be very similar except one is (slightly) distorted

Otherwise they would be the same

Hence no exact spatial correspondence

Wo0p · Jan 3, 2010

Anyone?

To clarify my question... I would be selecting regions which look as though they belong to the same anatomy

Then I need to tell if intensities in the two regions are significantly different

If it matters, I can select regions which contain the same no. of voxels

Please let me know if I'm not making sense, statistics in general confuse me :-/

EnumaElish · Jan 3, 2010

Okay, this makes it a lot easier. On each image you could set up an N-by-M grid, indexed by i = 1, ..., NM. For example, each i could be a pixel (I don't know what a voxel is). Then you could define a distance function d between pixel i of the first image and pixel i of the second image; e.g. d = 1 if both pixels are the same color (or intensity), d = 0 otherwise. This would give you a string of NM numbers (ones and zeros). This would be your data, and you could test whether the mean distance is statistically different from zero using a t test.

Or you can define a more complicated distance function, e.g., d_i = |y(1_i) - y(2_i)|, where y is a measure of intensity (you need to define this), 1_i is the i'th cell in the first image, 2_i is the i'th cell of the second image, and d_i is the distance between cells 1_i and 2_i. Again, this will give you a string of NM numbers (d_i's), and you can test whether the mean distance is statistically different from zero using a t test.
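To make the absolute-difference idea concrete, here is a minimal sketch in plain Python. The intensity values and the names `one_sample_t`, `region_1` and `region_2` are made up for illustration, and it only computes the t statistic; turning that into a p-value needs a t distribution with n - 1 degrees of freedom from a statistics package (for example `scipy.stats.ttest_1samp`).

```python
import math
import statistics

def one_sample_t(differences):
    """Return (mean, t statistic, degrees of freedom) for H0: mean == 0."""
    n = len(differences)
    mean = statistics.fmean(differences)
    sd = statistics.stdev(differences)  # sample standard deviation (n - 1)
    t = mean / (sd / math.sqrt(n))
    return mean, t, n - 1

# Hypothetical intensities for the same 8 grid cells in each image.
region_1 = [0.52, 0.61, 0.48, 0.70, 0.55, 0.66, 0.59, 0.63]
region_2 = [0.50, 0.65, 0.41, 0.78, 0.49, 0.60, 0.66, 0.58]

# d_i = |y(1_i) - y(2_i)| for each cell i
d = [abs(a - b) for a, b in zip(region_1, region_2)]
mean_d, t_stat, dof = one_sample_t(d)
print(f"mean d = {mean_d:.4f}, t = {t_stat:.3f}, dof = {dof}")
```

Keep in mind that |d_i| can never be negative, so its mean is zero only when the two regions are identical cell by cell.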

A characteristic of statistical tests, such as the t test, is that you can almost always find a statistically significant difference by increasing the sample size arbitrarily. So, even if you do not find a statistically significant difference with N = M = 10, you are much more likely to find one with N = M = 10,000. This is less of a problem if you are looking for a uniform rule to compare several pairs of images and to make relative statements such as "A is different from B more than A is different from C," but it's more problematic if you are looking for "the" statistical difference between two "canonical" images. In the latter case, you need to make a judgment call and come up with a "justifiable" grid size (based on the expertise in your field). Knowing next to nothing about what you are trying to accomplish with these images and the difference between them, one idea may be to zoom in on the region of interest (e.g., "the frontal lobe") and discard the rest of the images; that way your grid will not be "too general."
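The sample-size effect can be seen directly from the formula for the t statistic, t = mean / (sd / sqrt(n)). The sketch below holds a hypothetical mean difference and standard deviation fixed (both values are assumptions chosen for illustration) and only changes the grid size.

```python
import math

def t_statistic(mean_diff, sd, n):
    """t statistic for a sample of size n with the given mean and sd."""
    return mean_diff / (sd / math.sqrt(n))

mean_diff, sd = 0.05, 1.0  # hypothetical, identical for every grid size
for side in (10, 100, 1000, 10000):
    n = side * side  # N = M = side, so the grid has side * side cells
    print(f"N = M = {side:>5}: n = {n:>9}, t = {t_statistic(mean_diff, sd, n):.1f}")
```

With everything else unchanged, t grows in proportion to sqrt(n), so the same small mean difference goes from unremarkable to overwhelmingly "significant" purely because the grid got finer.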

Wo0p · Jan 3, 2010

EnumaElish,

Whoah... thanks for that reply. Great read and very insightful :)

However I now have several new questions:

1. You mention two distance metrics: the first, di = 1 if the two intensities at a pixel are equal and di = 0 otherwise, and the second, di = |y(1i) - y(2i)|, where y is "some measure of intensity." What about the Euclidean distance (i.e. the square root of the sum of squared intensity differences)? How do I tell which metric is most appropriate?

2. Consider if I have two copies of the same image, call it image A. Now I distort A by exchanging pairs of intensities at random, creating image B. Image A and B will have the same histogram, but will look quite different. Will a comparison between A and B find a statistically significant difference?

3. If I select two regions for comparison, according to your post they should have the same number of pixels and also the same dimensions. I assume this is a requirement of the t-test. Are there any statistical tests which do not have these requirements?

BTW a voxel is just a discrete volume element, in the same way a pixel is a discrete surface element

EnumaElish · Jan 5, 2010

1. Euclidean distance is nonlinear, and can be costly to implement for that reason. However, see this article.

2. This reminds me that you can use the Chi-squared test to test the difference between histograms -- again, not knowing the purpose of your testing, this is just another way to approach the problem. But if you were using any of the distance metrics noted above (discrete, absolute difference, or Euclidean) then almost surely you will find an arithmetic difference between the two means; and you can use a t-test to determine its statistical significance.
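As a rough illustration of the histogram comparison, the sketch below bins two sets of intensities and computes the usual two-sample chi-squared statistic for binned data. The function names, the bin count and the intensity values are all assumptions for illustration, and intensities are assumed to lie in [0, 1]. The statistic still has to be compared against a chi-squared distribution (commonly with the number of non-empty bins as degrees of freedom, minus one when both samples have the same size) to get a p-value.

```python
def histogram(values, bins, lo=0.0, hi=1.0):
    """Count values into `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # v == hi goes in last bin
        counts[idx] += 1
    return counts

def chi2_two_histograms(r, s):
    """Return (chi-squared statistic, number of non-empty bins)."""
    total_r, total_s = sum(r), sum(s)
    k1 = (total_s / total_r) ** 0.5  # scaling for unequal sample sizes
    k2 = (total_r / total_s) ** 0.5
    stat, used_bins = 0.0, 0
    for ri, si in zip(r, s):
        if ri + si == 0:
            continue  # bin empty in both histograms: carries no information
        stat += (k1 * ri - k2 * si) ** 2 / (ri + si)
        used_bins += 1
    return stat, used_bins

image_a = [0.05, 0.15, 0.15, 0.35, 0.55, 0.55, 0.75, 0.95]  # hypothetical
image_b = list(reversed(image_a))                 # same values, re-ordered
image_c = [min(v + 0.3, 1.0) for v in image_a]    # brightened copy

hist_a, hist_b, hist_c = (histogram(img, bins=5) for img in (image_a, image_b, image_c))
stat_ab, _ = chi2_two_histograms(hist_a, hist_b)
stat_ac, _ = chi2_two_histograms(hist_a, hist_c)
print(stat_ab, stat_ac)
```

Because a re-ordered image has exactly the same histogram as the original, the statistic for A versus B is zero: the histogram test cannot see a pure permutation of intensities, while it does pick up the brightness shift in C.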

3. Yes; I suggest a regression approach. Let y be a vector of intensity measurements from either of the two images (say, K1 measurements from the first image and K2 measurements from the second image). Define x such that x(k) = 0 if y(k) belongs to the first image, and x(k) = 1 if y(k) belongs to the second image, for k = 1, ..., K1+K2. You can estimate the regression y = a + b x + u, using least squares. Coefficient b equals "mean y from the second image minus mean y of the first image." If b is statistically significant, then the difference between the two means is statistically significant. [Note that the distance function approach depends on a geometrical correspondence between the two images that the regression approach ignores -- to follow the permutation example in your post, in a case where the distance approach produces an arithmetic difference between the original and the permuted images, the regression approach will not.]
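A small sketch of the dummy-variable regression, using only the Python standard library. The two samples are hypothetical and deliberately have different sizes (K1 = 5, K2 = 7); `dummy_regression` is an illustrative name, not a library function.

```python
import statistics

def dummy_regression(first, second):
    """Least-squares fit of y = a + b*x with x = 0 (first) or 1 (second)."""
    y = list(first) + list(second)
    x = [0.0] * len(first) + [1.0] * len(second)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b

first = [0.52, 0.61, 0.48, 0.70, 0.55]                # K1 = 5 intensities
second = [0.58, 0.66, 0.73, 0.69, 0.62, 0.71, 0.65]   # K2 = 7 intensities

a, b = dummy_regression(first, second)
print(f"a = {a:.4f}  (mean of first image:  {statistics.fmean(first):.4f})")
print(f"b = {b:.4f}  (difference of means: {statistics.fmean(second) - statistics.fmean(first):.4f})")
```

The intercept a reproduces the mean of the first sample and b reproduces the difference of the two means, regardless of how many measurements each image contributes.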

Wo0p · Jan 6, 2010

EnumaElish said:

2. This reminds me that you can use the Chi-squared test to test the difference between histograms

Really? This might be exactly what I need

EnumaElish said:

3. Yes; I suggest a regression approach. Let y be a vector of intensity measurements from either of the two images (say, K1 measurements from the first image and K2 measurements from the second image). Define x such that x(k) = 0 if y(k) belongs to the first image, and x(k) = 1 if y(k) belongs to the second image, for k = 1, ..., K1+K2. You can estimate the regression y = a + b x + u, using least squares. Coefficient b equals "mean y from the second image minus mean y of the first image." If b is statistically significant, then the difference between the two means is statistically significant. [Note that the distance function approach depends on a geometrical correspondence between the two images that the regression approach ignores -- to follow the permutation example in your post, in a case where the distance approach produces an arithmetic difference between the original and the permuted images, the regression approach will not.]

OK you lost me here. What is the vector u? Is it something I already know, or is it part of the regression?

"If b is statistically significant"

You mean significantly different from 0?

I have never ever come across these methods, so I'm probably missing something...

BTW thanks for the advice ! =]

EnumaElish · Jan 6, 2010

Wo0p said:

Really? This might be exactly what I need

See http://www.physics.csbsju.edu/stats/chi-square.html, http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm and http://en.wikipedia.org/wiki/Pearson's_chi-square_test

OK you lost me here. What is the vector u? Is it something I already know, or is it part of the regression?

Sorry, you only provide the y's and the x's, u is the random error term (a.k.a. the residual), or y - a - b x. For instance, if y (intensity measurements) contain a random measurement error ("noise") then u captures this "noise." More generally, any variation among the y's other than the variation that can be explained by the right-hand side variable(s) becomes residual variation. Standard statistical packages (including Excel) estimate and can print out the u terms in addition to the coefficients a and b, their statistical significance, and an array of additional statistics.
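To show where u comes from, the sketch below fits the same kind of 0/1 regression on hypothetical data, computes the residuals u = y - a - b x, and uses them to form the standard error and t statistic of b. All names and numbers are illustrative assumptions; a statistics package would report the same quantities along with a p-value.

```python
import math
import statistics

def fit_with_residuals(first, second):
    """Fit y = a + b*x (x = 0 or 1); return a, b, residuals u, and t for b."""
    y = list(first) + list(second)
    x = [0.0] * len(first) + [1.0] * len(second)
    n = len(y)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    u = [yi - a - b * xi for xi, yi in zip(x, y)]  # residuals
    s2 = sum(ui ** 2 for ui in u) / (n - 2)        # residual variance
    se_b = math.sqrt(s2 / sxx)                     # standard error of b
    return a, b, u, b / se_b

first = [0.52, 0.61, 0.48, 0.70, 0.55]               # hypothetical
second = [0.58, 0.66, 0.73, 0.69, 0.62, 0.71, 0.65]  # hypothetical

a, b, u, t_b = fit_with_residuals(first, second)
print(f"b = {b:.4f}, t = {t_b:.3f}, dof = {len(u) - 2}")
```

You never supply u yourself: it is whatever is left of each y after subtracting the fitted group mean, and its spread determines how precisely b is estimated.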

"If b is statistically significant"

You mean significantly different from 0?

Yes, that's what I meant -- that's the standard usage in regression analysis lingo.

I have never ever come across these methods, so I'm probably missing something...

See http://onlinestatbook.com/ (At the top click on Contents, then "Introduction to Simple Linear Regression" under "XII. Prediction" -- notice Prerequisites listed at the top of the page.)

Also see:
http://en.wikipedia.org/wiki/Regression_analysis
http://www.law.uchicago.edu/files/files/20.Sykes_.Regression.pdf
http://www.nlreg.com/intro.htm
http://www.statsoft.com/textbook/multiple-regression/

EnumaElish · Jan 7, 2010

Above, I wrote:

EnumaElish said:

[Note that the distance function approach depends on a geometrical correspondence between the two images that the regression approach ignores -- to follow the permutation example in your post, in a case where the distance approach produces an arithmetic difference between the original and the permuted images, the regression approach will not.]

I did not mean to imply that the distance approach will always produce a nonzero mean difference -- in fact, if the permutation is random then the expected (true mean) difference is zero, as the following example illustrates:

y , RandomRank , y* , d
0.019433061 , 7 , 0.290617864 , -0.271184804
0.114136996 , 14 , 0.264838615 , -0.150701619
0.136433932 , 8 , 0.333835024 , -0.197401092
0.138333371 , 4 , 0.138333371 , 0
0.19700036 , 13 , 0.312202404 , -0.115202044
0.264838615 , 2 , 0.705108738 , -0.440270123
0.290617864 , 1 , 0.019433061 , 0.271184804
0.312202404 , 5 , 0.136433932 , 0.175768472
0.317085208 , 16 , 0.99010499 , -0.673019782
0.333835024 , 3 , 0.662772917 , -0.328937893
0.567207525 , 20 , 0.974873342 , -0.407665817
0.650723461 , 17 , 0.834248815 , -0.183525355
0.662772917 , 10 , 0.19700036 , 0.465772557
0.679963742 , 15 , 0.114136996 , 0.565826746
0.705108738 , 6 , 0.679963742 , 0.025144996
0.834248815 , 12 , 0.317085208 , 0.517163608
0.850628183 , 19 , 0.650723461 , 0.199904722
0.960884675 , 18 , 0.960884675 , 0
0.974873342 , 11 , 0.850628183 , 0.124245159
0.99010499 , 9 , 0.567207525 , 0.422897465

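Above, y is the original measurement, y* is the re-ordered measurement (y values re-ordered according to a random rank assigned to the original value), and d = y - y*. It can be verified that the d values sum to zero (up to rounding): y* is just a re-ordering of y, so both columns have the same total and the mean signed difference vanishes. The same is not true of the absolute difference |y - y*|, whose mean is positive whenever any value moves.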
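The same check can be reproduced with freshly generated numbers. This sketch uses a fixed random seed and 20 made-up values rather than the figures in the table, and compares the mean signed difference with the mean absolute difference.

```python
import random
import statistics

rng = random.Random(0)                 # fixed seed so the run is repeatable
y = [rng.random() for _ in range(20)]  # hypothetical "original" intensities
y_star = rng.sample(y, k=len(y))       # the same values in a random order

d = [orig - perm for orig, perm in zip(y, y_star)]
mean_signed = statistics.fmean(d)
mean_abs = statistics.fmean([abs(v) for v in d])

print(f"mean of d   = {mean_signed:.2e}")
print(f"mean of |d| = {mean_abs:.4f}")
```

The signed differences cancel because both lists contain the same values, while the absolute differences do not cancel, which is why the choice between d and |d| matters for a test against zero.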

Wo0p · Jan 7, 2010

EnumaElish said:

I did not mean to imply that the distance approach will always produce a nonzero mean difference -- in fact, if the permutation is random then the expected (true mean) difference is zero.

Gotcha. All I wanted to say here was that there was no geometrical correspondence, as you put it, between the two images. As far as I understand (which isn't very far), the t-test assumes this kind of relationship.

I will try out both the Chi-squared and regression approach and post my results.

Wo0p · Jan 10, 2010

As per Enuma's initial suggestion, and after some sanity checking, I decided to go with a vanilla t-test using absolute difference between the two images d=|y(1i)-y(2i)| as my distance measure...

In Matlab it's implemented as ttest2(A,B,pvalue)
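For readers without Matlab, here is a rough plain-Python sketch of what a two-sample t test such as ttest2 computes, assuming the equal-variance (pooled) form. The function name and the data are illustrative assumptions. Note that this treats A and B as two independent samples and compares their means, which is a different calculation from a one-sample test on the per-pixel differences d; it is also worth checking the Matlab documentation for whether the third argument is a significance level rather than a p-value.

```python
import math
import statistics

def pooled_two_sample_t(a, b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    dof = na + nb - 2
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / dof
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    return t, dof

region_a = [0.52, 0.61, 0.48, 0.70, 0.55]               # hypothetical
region_b = [0.58, 0.66, 0.73, 0.69, 0.62, 0.71, 0.65]   # hypothetical

t_stat, dof = pooled_two_sample_t(region_a, region_b)
print(f"t = {t_stat:.3f}, dof = {dof}")
```

The two regions do not need the same number of voxels here, since only the group means and variances enter the statistic.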
