Normalized correlation with a constant vector

daviddoria · Feb 24, 2012

I am confused about how to interpret the result of performing a normalized correlation with a constant vector. Since you have to divide by the standard deviation of both vectors (reference: http://en.wikipedia.org/wiki/Cross-correlation#Normalized_cross-correlation), if one of them is constant (say a vector of all 5's, which has standard deviation = 0), then the correlation is infinity, but in fact the correlation should be zero, right? This isn't just a corner case: in general, if the standard deviation of one of the vectors is small, the correlation to any other vector is very high. Can anyone explain my misinterpretation?

Thanks,

David

Stephen Tashi · Feb 26, 2012

daviddoria said:

if one of them is constant (say a vector of all 5's, which has standard deviation=0), then the correlation is infinity, but in fact the correlation should be zero right?

In that case, the expression for correlation takes the form 0/0, so you can't say it is infinity.
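To spell out the 0/0, here is the usual sample form of the normalized correlation, written with symbols that are my own choice rather than anything from the thread (x_i and y_i are the entries of the two vectors, bars denote means, c is the constant value):

```latex
r(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}
               {\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}

% If y_i = c for every i, then \bar{y} = c and each factor (y_i - \bar{y}) is 0, so
\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) = 0
\quad\text{and}\quad
\sum_{i=1}^{N} (y_i - \bar{y})^2 = 0
\quad\Longrightarrow\quad
r(x, y) = \frac{0}{0}.
```

The numerator vanishes together with the denominator, which is why the result cannot be called infinite.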

You raise an interesting question, and it is important in practical applications of image processing. It's also a question about pure mathematics, but in that respect it's more of a nitpicking detail.

In pure mathematics, perhaps some statistics texts define a value for the correlation in this case, but unless a special definition is given, all you can say about the mathematical expression is that it is undefined.
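As a minimal sketch of treating the value as undefined in code, assuming plain Python lists and a function name (`normalized_correlation`) that I made up for illustration, one option is to return `None` instead of dividing by zero:

```python
import math


def normalized_correlation(x, y):
    """Normalized correlation of x and y, or None when it is undefined.

    Hypothetical helper for illustration; not taken from any library.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and the same length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0.0:
        # At least one vector has zero variance: the expression is 0/0.
        return None
    return numerator / denominator


signal = [1.0, 4.0, 2.0, 8.0]
constant = [5.0, 5.0, 5.0, 5.0]
print(normalized_correlation(signal, constant))  # undefined case
print(normalized_correlation(signal, signal))    # a vector against itself
```

Returning `None` is only one possible convention; the caller still has to decide what an undefined score means for the application.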

(If anyone wishes to delve into this technicality, we should begin by making a distinction among three distinct topics: covariance of two random variables, sample covariance, and estimator(s) of covariance. Things that are properties of samples (e.g. their variance) have somewhat arbitrary definitions (e.g. do we compute variance by dividing by N or N-1? ) and different books define them differently. Things that are properties of random variables and estimators of parameters have standard definitions, but I don't know if they are standardized in dealing with all the exceptional situations.)

As a practical concern, I think you are worried that if you have image patch A and are trying to match it to other image patches in a photo, that it may have a large correlation to a nearly constant patch B, which it does not resemble. As far as I know, that might happen. Expressions that approach 0/0 can take large or small values depending on how they approach it.
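A small experiment makes this concrete. The sketch below uses made-up data and hypothetical helper names; it builds nearly constant vectors of the form 5 + eps * shape and correlates them with a fixed vector. Because the normalized correlation ignores both the offset and the positive scale factor, the score depends only on the pattern of the tiny fluctuations, not on how small they are. (Whenever it is defined, the value stays between -1 and 1, so "large" here means close to 1 in magnitude.)

```python
import math


def correlation(x, y):
    """Normalized correlation; assumes neither vector is exactly constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def nearly_constant(shape, eps):
    """A vector that is 5 everywhere plus a fluctuation of size eps."""
    return [5.0 + eps * s for s in shape]


patch_a = [1.0, 4.0, 2.0, 8.0]       # illustrative data, not from the thread
other_shape = [3.0, 1.0, 4.0, 2.0]   # a different fluctuation pattern

for eps in (1.0, 1e-3, 1e-6):
    same = correlation(patch_a, nearly_constant(patch_a, eps))
    different = correlation(patch_a, nearly_constant(other_shape, eps))
    print(eps, round(same, 6), round(different, 6))
```

Shrinking eps leaves both scores essentially unchanged, so a patch that looks flat to the eye can still score very high if its residual noise happens to line up with patch A.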

I'm sure your next question is whether there is some modification of the correlation formula that would produce a function that would avoid this problem. Off hand, I don't know of one. I'll have to think about it.

daviddoria · Feb 26, 2012

Stephen Tashi said:

As a practical concern, I think you are worried that if you have image patch A and are trying to match it to other image patches in a photo, that it may have a large correlation to a nearly constant patch B, which it does not resemble. As far as I know, that might happen.

Yes, that is exactly what I am observing.

I'm sure your next question is whether there is some modification of the correlation formula that would produce a function that would avoid this problem. Off hand, I don't know of one. I'll have to think about it.

You got it :)

Stephen Tashi · Feb 27, 2012

For each interior pixel in a patch at location (i,j), you can compute the difference (in each of the RGB intensities) between it and the 8 adjacent pixels. You could treat these differences as the data to be matched and match it with the cross-correlation function. That way a uniform patch wouldn't match well with a patch that had more variation. (This can be regarded as a special case of my suggestion in the other thread about using transformations. In this case the transformations are displacement of the patch by 1 pixel.)
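Here is a rough sketch of that idea for a single channel, assuming the patch is a list of rows of numbers (for RGB you would repeat it per channel). The function names are hypothetical, and I have assumed the plain, unnormalized cross-correlation (a dot product) for the matching step, since the normalized form would again be 0/0 for an exactly uniform patch whose differences are all zero.

```python
# Offsets to the 8 pixels adjacent to (i, j).
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                    (0, -1),           (0, 1),
                    (1, -1),  (1, 0),  (1, 1)]


def neighbor_differences(patch):
    """Differences between each interior pixel and its 8 neighbors, flattened."""
    rows, cols = len(patch), len(patch[0])
    features = []
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            for di, dj in NEIGHBOR_OFFSETS:
                features.append(patch[i][j] - patch[i + di][j + dj])
    return features


def cross_correlation(u, v):
    """Unnormalized cross-correlation at zero shift (a dot product)."""
    return sum(a * b for a, b in zip(u, v))


textured = [[1, 2, 1],
            [4, 9, 3],
            [2, 5, 1]]
uniform = [[5, 5, 5],
           [5, 5, 5],
           [5, 5, 5]]

print(cross_correlation(neighbor_differences(textured), neighbor_differences(uniform)))
print(cross_correlation(neighbor_differences(textured), neighbor_differences(textured)))
```

The uniform patch produces all-zero differences, so its score against the textured patch is zero, while the textured patch scores well against itself.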

If you want to cut the computational costs, you could only use the differences from a sample of pixels in the patch. Perhaps it wouldn't even have to be a truly random sample. You might be able to pick a subset of pixels that were in a deterministic but "random looking" pattern.
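One way such a pattern might be generated, purely as an illustration: walk through the interior pixels with a stride that shares no common factor with their count, so the picks are scattered but fully reproducible. The function name, the default stride of 7, and the patch size below are all assumptions of mine.

```python
import math


def scattered_pixel_subset(rows, cols, count, step=7):
    """Pick `count` distinct interior pixels in a fixed, scattered order."""
    interior = [(i, j) for i in range(1, rows - 1) for j in range(1, cols - 1)]
    n = len(interior)
    if count > n:
        raise ValueError("count exceeds the number of interior pixels")
    if count == 0:
        return []
    # A stride coprime with n visits every index once before repeating,
    # so the first `count` picks are guaranteed to be distinct.
    while math.gcd(step, n) != 1:
        step += 1
    return [interior[(k * step) % n] for k in range(count)]


subset = scattered_pixel_subset(8, 8, 10)
print(subset)
```

The same subset comes out on every run, so the reference patch and each candidate patch are compared at identical pixel positions.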

CellCoree · Mar 5, 2012

The result of performing a normalized correlation with a constant vector can be confusing because, as you mentioned, the formula divides by the standard deviation of each vector. For a constant vector that standard deviation is 0, but the numerator is 0 as well, because every value equals the mean. The expression is therefore 0/0, which is undefined, so the correlation is neither infinite nor zero. Note that a standard deviation of 0 means that all of the values are the same, not that those values are 0.

In general, when calculating correlation, it is important to consider the variability of the data. A constant vector has no variability, so its correlation with any other vector is undefined, and a nearly constant vector can produce a correlation near 1 or -1 that reflects only the pattern of its tiny fluctuations. A large value in that situation does not necessarily mean that there is a strong relationship between the two vectors. It is important to also consider the context and the data itself when interpreting correlation results.

Additionally, it may be helpful to consider other measures of correlation, such as Spearman's rank correlation or Kendall's tau. These measures rank the data rather than using the actual values, making them more robust to extreme values. They do not resolve the constant case, though: every entry of a constant vector ties, so Spearman's coefficient, for example, is undefined for the same reason.

Overall, the interpretation of correlation results should not solely rely on the standard deviation of the vectors involved. It is important to consider the data and the context in order to accurately interpret the relationship between the variables being compared.
