Vector difference metric that considers the variance of the components

1. Feb 21, 2012

daviddoria

I am trying to match little square patches in an image. You can imagine that these patches have been "vectorized" in that the values are reordered consistently into a 1D array. At first glance, it seems reasonable to simply do a Euclidean distance style comparison of two of these arrays to get a "similarity" measure. This works fine in many cases (the "best" patch according to this metric looks very much like the query patch). However, there is a degenerate case (that actually happens quite often in practice) where a patch matches extremely well to a near-constant patch that has values near the average of the query patch. For example, say I have a picture of a house - I would expect a patch of grass to match really well to other patches of grass. But what I see happening is that ocassionally a patch of grass will match to a "smooth/solid" patch say of the side of the house (if the smooth patch happens to have values near the average of the grass patch). This match is *obviously* wrong to a human observer, so I want my distance metric to account for this. These cases can be detected by comparing the variance of the query patch to the variance of the best matching patch. If the variances are very different, but the Euclidean distance is very low, then we probably have this case. What I am looking for is some "real/official" metric that takes this into account inside the metric. Of course I do a simple:

Difference = EuclideanDistance + lambda * VarianceDistance

but then I would have to learn lambda, which would require me to have training data (known good pairs of patches, etc) - which is very annoying :)

Thanks!

David

2. Feb 21, 2012

Stephen Tashi

What do you mean by "the values"? Are the values just the RBG color values of the pixels in the patch? Or is some sort of statistical analysis or edge detection etc. done and the "the values" are a summary of these statistics?

If the values are simply the RGB data for the pixels, I find it surprising that two patches of an image of grass would match well in Euclidean distance.

Since there are all sorts of texture matching techniques, you should explain what general types of methods you are willing to use to accomplish this patch matching. Must the method be simplistic and extremely fast?

You made a remark about "learning lambda" and needing "training data". Are you working on a technique that learns from exemplars or are you trying to avoid having to use such a technique?

3. Feb 21, 2012

daviddoria

Hi Stephen,

The values are just the RGB pixel values.

It does very well in most cases, but when it fails (like the case I described), it fails miserably.

Yes, simple and fast is definitely a requirement.

I would like to avoid this at all costs :)

I hope this clarifies some things - I look forward to your replies!

David

4. Feb 21, 2012

Stephen Tashi

It sounds like what you want is a method of matching patches that is based on some fundamental mathematical principle and which will work on matching any sort of patch to another patch representing the same physical phenomena. I don't think there is any such mathematical prinicple. If there were, writing image recognition software would be no problem.

I don't think there is any way of avoiding studying the properties of the particular phenomena ( grass, sides of houses, etc.) that your are trying to match.

For example, I'd think that matching techniques must depend on the image resolution. One way I can visualize an image of a patch of grass is a high resolution picture, sharp enough to show the individual stems of grass and the very dark shadows that surround them at their base. Another way I can visualize an image of a patch of grass is at a low resolution where a patch of grass looks like blobs of colors and the pixels represent an averaging together of the colors of the stems of grass and the shadows at their base. It would surprise me if two high resolution pictures of different patches of grass matched in a Euclidean metric since one might have a dark shadow in a pixel where the other had a stem. It is plausible that two low resolution images would match since each pixel is an average of colors representing different things.

Based on my own experience with images (they were most often grayscale images), a physical object like a patch of grass can produce a wide range of pixel statistics depending on the resolution of the image, the direction of the sun when the picture is taken, whether the patch of grass is in the shade of a tree or not, etc. So I hope you are only trying to match patches of grass within the same picture, not patches between two pictures taken on different days, in different weather etc.

Motivated only by my own interest in groups and transformations, I have the following thoughts. An image of a small portion of the side of a house could be turned upside down or sideways and it still might look like the side of a house. Is this true of patches of grass, to the resolution that you are dealing with them? Let's suppose it isn't.

In general, suppose that I define a small set of different transformations of a patch of pixels, T1, T2, T3, ..Tn They could be simple like "flip upside down", "rotate 90 deg clockwise" or they could be more complicated algorithms.

For a given patch A, I compute the Euclidean (or other metric) between A and itself after it is transformed by this set of transformations and a get a set of n "distances" (D1,D2,D3,D4,..Dn), I think of this as a vector or function $f_A(n)$ that has n values. View the problem of comparing patch A to patch B as the problem of comparing their respective "self-match" functions, $f_A$ and $f_B$. There are various way to compare functions. I don't see any way to say a priori which one's might work. This process makes no direction comparison between the pixels of A and the pixels of B.

5. Feb 22, 2012

daviddoria

Yep, that's what I want :) Of course there isn't a perfect one, but I thought someone here might have an idea of a better one than a simple Euclidean distance.

Assume everything is quite low resolution. For instance, this picture of the entire house/grass/road is ~500x500.

Yes, only grass from the same image at the same time of day should match with itself (this is not a symantic labeling problem, just a straight patch matching problem).

Some patches (e.g. grass in the middle of the yard) should match to other patches (more grass from the middle of the yard) no matter which way they are turned - grass is grass - we can call these "uniform patches" (the full patch is the same texture). However, if you take a patch that straddles the grass and the sidewalk for example, it very much has an orientation (it should only match patches with the grass on the same side of both patches).

I see what you're saying, but doesn't this still have pretty much the same problem? If you flip a patch vertically and compare it pixel-wise to its old self, won't that have just as much chance for "statistical accidents" as comparing two different patches pixel-wise? It still seems to be that there should be some distance function that doesn't treat the whole set of pixels as a very rigid/fixed set of values, but rather has some looser interpretation in that the pixels are random variables with some distribution. I.e. a black pixel really shouldn't match at all to a white pixel, but a green pixel should pretty much match to a slightly different green-ish pixel. Of course the Euclidean distance captures that last bit, but is very unforgiving. I could imagine something like "for each pixel, compute the difference as the min(difference to each of the pixels in a 3x3 region around the corresponding pixel in the other image)". This makes it spatially more forgiving, but this massively compounds the computational cost of computing such a value.

Or maybe just a min(EuclideanDistance, Constant) type of function? (A "truncated quadratic" type of idea)?

Again, I'm just rambling, hoping that we can scrounge up some keywords that will get me thinking in a different direction :)

David

6. Feb 22, 2012

Stephen Tashi

Every statistical method has the potential for statistical accidents. In the transform method, a bad match is as informative as a good match. There would have to be a series of statistical accidents to make the matching functions look deceptively the same.

You could histogram the distribution of the pixel values in the patches and compare the histograms. That method is completely spatially forgiving. Probably too forgiving!

How useful is the general idea of creating a vector of properties of some kind for each patch and then comparing the vectors? There would be computation cost for each patch, but to compare a pair of patches, you wouldn't be doing an operation on each pair of pixels.

Some keywords I've found useful: "texture matching", "steerable filters", "redundant wavelet bases".

7. Feb 22, 2012

chiro

Have you considered getting frequency information from the image and then using statistical techniques based on the frequency information to gauge different measures of similarity?

8. Feb 22, 2012

daviddoria

@chiro - what kind of statistical techniques? Are you suggesting just computing a Euclidean style distance between local FFT components or something like that? Doesn't this ONLY contain frequency information, and no information in the spatial domain (i.e. the colors don't have to match at all)?

@Stephen - yea, histograms are too forgiving as you suggested. The problem is that in my real problem I am not always matching full square patches, but rather odd shaped regions, and these regions change shape at each iteration (without getting into too much detail, at each iteration one of these funny shaped regions is compared to all other regions of the same shape in the image). I'll read up on those filtering terms you mentioned and see what I find.

9. Feb 22, 2012

chiro

I don't know the specifics of the relationship with the color information for image processing, but yes essentially calculate the frequency information, transform it or filter it if you have to and then use statistical techniques to guage some kind of confidence or credibility that a particular part of the image is the same as a part on another image.

Once you have identified these so called parts of the image that are statistically the same in terms of transformed frequency information, then you do whatever spatial classification you have to do.

You could use a metric of some sort and add in some uncertainty via statistical modelling and then use the metric as a basis for your density function.

The reason for using frequency information is that it is suited for statistical techniques. You can use continuous representations of frequency information and directly relate them to a pdf in an inuitive way.

It gives a way to have some kind of analytic and general methodic way to classify how 'different' an image really is at some kind of 'atomic' level. The transforms that you do with the frequency information you obtain is going to be determined on your specific application.

I haven't done any kind of image processing for a very long time now, but basically what I would be looking for is a statistical distribution for the frequency information and using that to say whether a 'part of an image' is the same as 'a part of a second image' under these conditions.

So yeah to finally answer your question, you would need to use geometric information on top of a frequency decomposition of sorts. Also it might help to do some kind of 'mip-mapping' technique with an FFT in certain ways to get the kind of spatial information you need:

http://en.wikipedia.org/wiki/Mipmap