
Practical detection of "similarity"?

The world needs new mathematical ideas in order to solve the problem of making general-purpose image recognition algorithms. What are some new ideas for approaching the problem of determining if two objects are "similar"?

There are techniques for detecting precise forms of "similarity" between mathematical objects (for example, "similar triangles" or "isomorphic groups"). None of these are very useful for detecting the type of similarity that we see between objects in nature, such as two leaves on the same tree or the grain on one area of a board vs the grain in another area.

People interested in image recognition have developed techniques for texture detection and recognition. Most are statistical. I have never seen one with broad applicability.

In the case of two leaves from the same tree, if we consider a 2D outline of each leaf as a curve, I suppose there are conformal mappings that take one to the other. However, from a practical point of view, this approach begins with a fallacy - namely, it assumes that it will be possible to process a typical image to compute bounding curves for such objects. Actual images have occlusions where one object obscures part of another. Edge detection methods often fail to detect portions of edges, and the subjective ways to fill in missing edges must be tweaked for particular collections of images.

It would please me if effective image recognition algorithms depended on fairly low-level ideas. In the total problem of image recognition there must be some dependence on relatively sophisticated knowledge. For example, depth perception (in the sense of the eyes perceiving distance due to seeing different images in each eye) is only effective to about 20 ft or so. So when you "see" a car parked in the distance in front of a telephone pole, it is higher-level knowledge about the world that tells you the car is probably in front of the pole instead of the pole being something that sprouts out of the roof of the car. I wonder if the problem of recognizing the similarity between two leaves or two patches of wood grain also requires higher-level knowledge.
 Recognitions: Gold Member I think you are looking for a 3D fingerprint. Fingerprint recognition is based on similarity, rather than precise measurements.
 Recognitions: Science Advisor How about using fractal dimension to compare, say, leaves from the same tree, or different coastlines? Or maybe we need more than one number for the comparison.
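
As a sketch of the fractal-dimension suggestion, here is a minimal box-counting estimate in Python. Everything in it (the point set, the box sizes, the straight-line "curve") is an illustrative assumption, not anything from this thread:

```python
import math

def box_counting_dimension(points, box_sizes):
    """Estimate the box-counting (Minkowski) dimension of a point set.

    points: list of (x, y) tuples tracing a curve (e.g. a digitized leaf outline).
    box_sizes: decreasing grid sizes; the slope of log N(s) vs log(1/s)
    estimates the dimension.
    """
    logs = []
    for s in box_sizes:
        # Count how many grid boxes of side s contain at least one point.
        occupied = {(math.floor(x / s), math.floor(y / s)) for x, y in points}
        logs.append((math.log(1.0 / s), math.log(len(occupied))))
    # Least-squares slope of log N against log(1/s).
    n = len(logs)
    mx = sum(a for a, _ in logs) / n
    my = sum(b for _, b in logs) / n
    slope = sum((a - mx) * (b - my) for a, b in logs) / \
            sum((a - mx) ** 2 for a, _ in logs)
    return slope

# A straight segment should come out with dimension close to 1.
line = [(i / 1000.0, i / 1000.0) for i in range(1001)]
d = box_counting_dimension(line, [0.1, 0.05, 0.025, 0.0125])
print(round(d, 2))  # ~0.96, close to 1 for a smooth curve
```

For a genuinely fractal outline, such as a coastline, the slope would come out strictly between 1 and 2, so even this single number gives a crude basis for comparing two shapes.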


Well, the basic way to describe similarity mathematically is through a norm.

The real challenge is figuring out what norm to use and, in a statistical context, how to tie these norms to estimators for the difference: particularly in relation to the "variance" of the actual differences (it is usually easier to look at the sum of the squares of these residual terms).
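
To make the norm idea concrete, here is a toy sketch (the "leaf" feature vectors are invented purely for illustration): similarity is just the 2-norm of the residual between two feature vectors.

```python
import math

def two_norm_distance(a, b):
    """Similarity via the 2-norm of the residuals: smaller means more alike."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

leaf_a = [1.0, 2.0, 3.0, 2.0, 1.0]   # toy feature vectors for three "leaves"
leaf_b = [1.1, 2.1, 2.9, 2.0, 1.0]
leaf_c = [5.0, 0.0, 5.0, 0.0, 5.0]

# The first pair is far closer under this norm than the second pair.
print(two_norm_distance(leaf_a, leaf_b) < two_norm_distance(leaf_a, leaf_c))  # True
```

Which norm to use, and what the feature vectors should contain, is exactly the open question the rest of this discussion is about.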

The issue then becomes decomposition, not so much the analysis of general norms, and with regard to this we shift to signal analysis.

The actual nature of decompositions is found in the study of harmonic and Fourier analysis, and things like fingerprinting and detection are found in the integral transforms of wavelets.

Comparing frequencies and other norm decompositions, as opposed to simply checking the differences between raw data values (like color changes in a texel: i.e. a texture element, as it's called), is where a lot of the new thinking is coming in.
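
A small illustration of why comparing frequency content can beat comparing raw values (the toy signals below are my own assumptions): two signals that are merely shifted copies of each other disagree badly sample-by-sample, yet have identical magnitude spectra.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitude spectrum via a direct DFT (fine for short toy signals)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

n = 32
sig = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
shifted = sig[5:] + sig[:5]          # same "texture", different alignment

raw_diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(sig, shifted)))
spec_diff = math.sqrt(sum((a - b) ** 2
                          for a, b in zip(dft_magnitudes(sig),
                                          dft_magnitudes(shifted))))

print(raw_diff > 1.0)     # True: raw samples disagree badly
print(spec_diff < 1e-6)   # True: magnitude spectra agree (same frequency content)
```

A circular shift only changes the phase of each DFT coefficient, so the magnitudes are untouched; that is the sense in which a frequency decomposition "sees" the shared structure that a raw pixel comparison misses.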

But the reality is that many, many valid decompositions exist, and it's important to look at what characteristic each basis vector of the decomposition actually encapsulates intuitively, as opposed to simply mathematically.

In signal analysis, we apply frequency bands and pre-defined signal models to get the real information from a noisy signal with some known noise model.

In terms of your question, the next thing will be to build the model of a particular representation under a specific decomposition and compare the two models under the same decomposition.

You use both approaches: you may specify extra knowledge about a particular representation and then compare the other representation in terms of its model fit with the parameters of the decomposition in the first model, or you can use a general decomposition and look at how the residuals change between the two representations (observed and expected, in the same analog as a chi-square goodness of fit).
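
A minimal sketch of the second approach described above: fit both signals under the same decomposition and compare coefficients and residuals. The cosine (DCT-like) basis and the toy signals are illustrative choices of mine, not anything prescribed in the thread.

```python
import math

def cosine_coeffs(x, n_terms):
    """Project a signal onto a small orthogonal cosine basis (DCT-II style)."""
    n = len(x)
    coeffs = []
    for k in range(n_terms):
        basis = [math.cos(math.pi * k * (t + 0.5) / n) for t in range(n)]
        norm = sum(b * b for b in basis)
        coeffs.append(sum(xi * bi for xi, bi in zip(x, basis)) / norm)
    return coeffs

def residual_ss(x, coeffs):
    """Sum of squared residuals after reconstructing from the fitted coefficients."""
    n = len(x)
    recon = [sum(c * math.cos(math.pi * k * (t + 0.5) / n)
                 for k, c in enumerate(coeffs)) for t in range(n)]
    return sum((xi - ri) ** 2 for xi, ri in zip(x, recon))

n = 64
sig_a = [1.0 + math.cos(math.pi * 2 * (t + 0.5) / n) for t in range(n)]
sig_b = [1.05 + 0.95 * math.cos(math.pi * 2 * (t + 0.5) / n) for t in range(n)]

ca, cb = cosine_coeffs(sig_a, 4), cosine_coeffs(sig_b, 4)
coeff_gap = max(abs(a - b) for a, b in zip(ca, cb))
print(coeff_gap < 0.1)                      # True: the two models nearly coincide
print(residual_ss(sig_a, ca) < 1e-9)        # True: the basis captures sig_a exactly
```

The residuals here play the role of the observed-vs-expected discrepancy in the chi-square analogy: two representations are "similar" when their coefficients under the shared decomposition are close and neither leaves large residuals behind.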

The other thing to look at is the generalization of any dependencies between coefficients of the decomposition of a representation, and how this affects the estimator and subsequently the hypothesis test of similarity.

For example, we might allow two coefficients in our model to have a dependency that is statistically very significant and impose further tests of this as a hypothesis test before moving on to accept other tests of similarity (like the residual test boundaries of the sums of all residuals of coefficients of the integral transform).

We actually do this anyway for model fitting via F-tests in regressions: the exact same analog in terms of a general integral transform for an arbitrary signal has the same interpretation and use. You simply come up with a way to compare residuals for each component and see based on these whether you have enough similarity.

And of course in the F-test, we are just comparing residuals and these are simply a special kind of norm (the 2-norm as it's commonly called), but you can always transform this norm just like you transform x to be f(x).
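
As a sketch, the F statistic used for model comparison is just a ratio of scaled residual norms; the residual sums of squares and parameter counts below are made-up numbers for illustration.

```python
def f_statistic(resid_full, resid_reduced, params_full, params_reduced, n):
    """F statistic for comparing a reduced model against a fuller one.

    resid_* are residual sums of squares (squared 2-norms); the statistic
    is large when the extra parameters buy a real drop in residuals.
    """
    df_num = params_full - params_reduced
    df_den = n - params_full
    return ((resid_reduced - resid_full) / df_num) / (resid_full / df_den)

# Toy numbers: adding 2 parameters drops the RSS from 30.0 to 12.0 with n = 20.
F = f_statistic(resid_full=12.0, resid_reduced=30.0,
                params_full=5, params_reduced=3, n=20)
print(round(F, 2))  # 11.25 -> well above the F(2, 15) 5% critical value (~3.68)
```

In practice the statistic is compared against an F distribution (via tables or a statistics library) to decide whether the residual drop is significant; the ratio itself is the "comparison of residuals" described above.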
 Recognitions: Science Advisor For image recognition, the problem with many "old ideas" in mathematics is that they are based on additive composition and decomposition of functions or "signals". In images, it is relatively rare that objects superimpose to present anything like an additive result. The most common phenomenon in images is occlusion, where one opaque object partly obscures another. Some people call this occlusion "noise", but it is not noise in the additive sense. From what I know about fractal dimensions, the use of that kind of analysis would assume the ability to determine what part of an image was "the object", and this is the primary unsolved problem! For example, it's possible to investigate the fractal nature of the 2D outline of a leaf, but that assumes you have the outline.
 Decomposition for anything is always linearly additive if the basis vectors are orthogonal and the information corresponding to the coefficients is mutually exclusive. If you have dependencies, then you incorporate them in your basis, or you transform your image so that you go to a new space that offers a basis with a more natural set of properties to be "additive". This happens all the time: for example, in support vector machines, data is transformed by taking it to a higher space so that it can be linearly separated. This happens when you are trying to classify sets of data, particularly when the classification is complex: so what you do is transform it to a higher-dimensional space and then use this to create a linear classifier (i.e. a plane). The thing is looking at the choices for bases, the original structure of the data you have, and the transforms that make sense for the context of the application. Data mining deals with this issue, and the literature provides many ways of doing this for different types of data, of which "image data" is just one of infinitely many.
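
A bare-bones illustration of the lifting idea just described (the feature map phi(x) = (x, x^2) and the data are my own toy choices, not a real SVM implementation): two classes that no single threshold can split in 1D become linearly separable after the lift.

```python
def lift(x):
    """Feature map to a higher-dimensional space, as in kernel methods:
    phi(x) = (x, x^2)."""
    return (x, x * x)

# In 1D these two classes cannot be split by a single threshold:
inner = [-0.5, 0.0, 0.5]        # class 0, sandwiched between...
outer = [-2.0, -1.5, 1.5, 2.0]  # ...class 1 on both sides

# After lifting, the second coordinate (x^2) separates them with the
# linear rule x^2 > 1 -- a plane in the lifted space.
threshold = 1.0
print(all(lift(x)[1] < threshold for x in inner))   # True
print(all(lift(x)[1] > threshold for x in outer))   # True
```

Real kernel methods never build the lifted coordinates explicitly; they evaluate inner products in the lifted space directly, but the geometric picture is exactly this one.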

 Quote by chiro Decomposition for anything is always linearly additive if the basis vectors are orthogonal and the information corresponding to the coefficients is mutually exclusive.
To say you have a vector space presupposes you have a phenomenon that is represented by superpositions, so (of course!) additive decomposition makes sense in such a case. The problem with applying that thought to practical image processing is that common images are generated by phenomena that are not superpositions. If a car is parked in front of a bush, the car may cover up part of the bush. The part of the bush that is covered up isn't going to contribute anything to your picture no matter how you transform the image to new data structures.

If vector spaces were the key to image recognition, twenty plus years of people attempting to write general purpose image recognition algorithms would have succeeded by now.

 Quote by Stephen Tashi To say you have a vector space presupposes you have a phenomenon that is represented by superpositions, so (of course!) additive decomposition makes sense in such a case. The problem with applying that thought to practical image processing is that common images are generated by phenomena that are not superpositions. If a car is parked in front of a bush, the car may cover up part of the bush. The part of the bush that is covered up isn't going to contribute anything to your picture no matter how you transform the image to new data structures. If vector spaces were the key to image recognition, twenty-plus years of people attempting to write general purpose image recognition algorithms would have succeeded by now.
It's not just vector spaces: these form the abstract ideas. The real idea is in the actual choice of basis as mentioned above.

Also, you are trying to decompose things based on your own intuition and trying to divide things up into mutually exclusive parts that are spatial: frequency decompositions are one way to deal with the issues of having a lot of entangled information in one region and "un-entangling" it with regard to how the frequency information is treated.

The other thing that you are forgetting is that we can process things relative to a lot of other information, whereas the data for an individual picture is more or less in a vacuum.

Also everything is a superposition in some sense: a superposition admits a decomposition and everything can be decomposed (even the trivial decomposition).

Again, the point doesn't have to do with the abstract notion of a vector space, although the framework allows people to construct orthogonal bases, do projections and so on, which is very powerful: the thing of interest is the basis itself and any transformations that are used. You also need to use a relative context with your pictures, just like we use a relative context to infer things that are incomplete (like the bush that barely exists).
 Recognitions: Science Advisor I suppose a subjective debate might inspire some productive thoughts, so I won't duck it. Mentioning the need for new mathematics to solve any extremely sophisticated problem reveals a spectrum of attitudes. For example:

1. The problem is already solved. Don't you read the existing literature? I think there's an Excel function that does that.
2. To solve the problem, we don't need any new mathematics or new computers. It's just a matter of detail. All we need is billions of dollars, hundreds of programmers and an intense effort like the Manhattan Project.
3. We don't need any new mathematics. All we need are new computers. We need massively parallel processing, quantum computers, stuff like that. (Perhaps we'll need the billion dollars and the Manhattan Project too, but I'll let you know about that after I get the computers.)
4. We don't need any new computers. If we discover the right math, I could solve this on a 286.

In the matter of image recognition, my stance is: for the mathematical hobbyist, outlook 4 is obviously the most interesting choice. It isn't necessarily the correct choice, but neither are the others. Outlook 4 is something an individual might work on without a billion-dollar budget or a new computer. Outlook 3 is obviously most interesting to people interested in new computers. Outlook 2 is nice for math people who don't really want to solve the problem. You just say there's no interesting work for them to do. Outlook 1 is good for people who measure progress in terms of academic papers or academic projects. There must be hundreds of papers published on the subject of computer vision, and they report hundreds of successes in limited environments. In limited environments, outlook 1 is correct - not for Excel (as far as I know), but some robots have a form of "computer vision". To me, the talk of using vector spaces and frequency domain analysis is just huff-and-puff since it hasn't panned out.
I understand how a person with a different outlook might have faith in such things. They could see current literature as a record of steady progress instead of a record of marking time.
 Friend 1: "Look at those two birds."
Friend 2: "What about them?"
Friend 1: "They are similar."
Friend 2: "No they are not. I took a picture of the two on my iPhone and I used Stephen_Tashi's app to mathematically show that they are not similar."
Friend 1: "I just meant they are similar in color, not in size or shape."
Friend 2 makes adjustments on his phone. After a minute or two: "Yes, I show with 97.4% certainty that they are similar in color."

 Quote by Stephen Tashi To me, the talk of using vector spaces and frequency domain analysis is just huff-and-puff since it hasn't panned out. I understand how a person with a different outlook might have faith in such things. They could see current literature as a record of steady progress instead of a record of marking time.
You've missed the point (and I have said this explicitly).

Frequency analysis is but one decomposition: there are lots of decompositions depending on the kind of information you are looking for and how that basis ends up giving that information.

The mathematicians have already done a lot of the hard work theoretically by proving all the inner product, projection and other results for signals and they have given these to the engineers and scientists of the world.

Also, frequency analysis is a powerful tool for general signals in specific ways, so what you're saying is completely untrue and misleading.

With the theorems that we have been given by the mathematicians, our job is to find the right basis or transformation: not to prove Hilbert-Space theory.

The technique (recognize its relationship with the word "technology"?) is always the important thing and a particular decomposition method or classification method is always where the solution will lie.

Frequency analysis is one way of looking at data that has entangled features (i.e. entangled characteristics that are located in the same signal region, like a sound or an image) and un-entangling those through frequencies: it's one way, not the only way.

The real trick is how you represent something and this is always the case in any kind of analysis.

You should already be aware of this for example with proofs: transforming an object to something else for proving something is commonly used. We transform things to something else to get it to the point where the property or thing we are trying to prove is closer to the structure of the object itself.

It's not just mathematics: this is how humans think. They take things and transform them until they get the characteristics that they want and then use that.

For example: when a tourist wants to understand a language, they have their native language (say English) and the things around them are in a foreign language (say German). When the tourist reads a sign (say "Nicht Abdecken", or "Don't cover"), the first thing they try to do is transform the German phrase into an English one and then interpret the English phrase. So we have an attempt to transform one representation into another, more familiar one before interpreting it.

This is how analysis works on all levels and trying to analyze image features in some context is no different.

 Quote by chiro With the theorems that we have been given by the mathematicians, our job is to find the right basis or transformation: not to prove Hilbert-Space theory.
I agree that further developments in Hilbert space theory are unlikely to aid computer vision. I disagree that the problem of computer vision will be solved by people who are hunting for the right vector space. However, that's just my intuitive forecast. I don't claim certainty about what the mathematics of effective computer vision will look like. Let's assume we are looking for a "basis or transformation" - for the purpose of detecting similarity (as mentioned in the original post) or, if you like, for the more complicated tasks of computer vision. What do we do?

There is the statistical approach where you prepare a large library of "training images", each with the desired output you want the algorithm to produce. Then you apply statistical methods. I think this is effective for creating algorithms that work on the large library of training images and don't work on much else. The same can be said for other training approaches.

I'm actually enthusiastic about training approaches being part of the solution, but I don't think they will work unless we understand what structure to build into the thing that is being trained. Neural nets, radial basis networks, and other structures can be trained to do remarkable things, but if you develop an algorithm with a large number of parameters, it isn't surprising that you can do a dandy curve fit. We need insight about how to decompose the process into individual functions. It's interesting that very little is written about function decomposition in a statistical setting. To be slightly provocative, I don't see decomposing a general (i.e. non-linear) function into a composition of other functions as being a vector space problem.

 Quote by Stephen Tashi I agree that further developments in Hilbert space theory are unlikely to aid computer vision. I disagree that the problem of computer vision will be solved by people who are hunting for the right vector space. However, that's just my intuitive forecast. I don't claim certainty about what the mathematics of effective computer vision will look like. Let's assume we are looking for a "basis or transformation" - for the purpose of detecting similarity (as mentioned in the original post) or, if you like, for the more complicated tasks of computer vision. What do we do? There is the statistical approach where you prepare a large library of "training images", each with the desired output you want the algorithm to produce. Then you apply statistical methods. I think this is effective for creating algorithms that work on the large library of training images and don't work on much else. The same can be said for other training approaches. I'm actually enthusiastic about training approaches being part of the solution, but I don't think they will work unless we understand what structure to build into the thing that is being trained. Neural nets, radial basis networks, and other structures can be trained to do remarkable things, but if you develop an algorithm with a large number of parameters, it isn't surprising that you can do a dandy curve fit. We need insight about how to decompose the process into individual functions. It's interesting that very little is written about function decomposition in a statistical setting. To be slightly provocative, I don't see decomposing a general (i.e. non-linear) function into a composition of other functions as being a vector space problem.
Well, what you do to incorporate statistics is have estimators for the difference of components of a particular projection and use these for a hypothesis test, in the same way you check for the difference of means with a t-test.
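
As a sketch of that idea, here is a Welch two-sample t statistic applied to repeated measurements of a single projection coefficient; all the numbers below are invented for illustration.

```python
import math

def welch_t(a, b):
    """Welch two-sample t statistic for the difference in mean coefficient value."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# Pretend these are repeated measurements of one projection coefficient
# (say, the energy in one frequency band) for three texture samples.
texture_1 = [1.9, 2.1, 2.0, 2.05, 1.95, 2.0]
texture_2 = [2.0, 2.02, 1.98, 2.08, 1.92, 2.0]   # same texture, new patch
texture_3 = [3.0, 3.1, 2.9, 3.05, 2.95, 3.0]     # a genuinely different texture

print(abs(welch_t(texture_1, texture_2)) < 2.0)  # True: no evidence of a difference
print(abs(welch_t(texture_1, texture_3)) > 2.0)  # True: clearly different
```

In a full procedure the statistic would be compared against the appropriate t distribution per coefficient, with a correction for testing many coefficients at once.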

The idea of training something is assessing similarity, but in a statistical sense: this is what a lot of these data mining algorithms do anyway. They use distance and other analogues to measure differences, but then statistical theory is brought in to manage the uncertainty.

With this you basically just throw a computer tonnes of signals and the algorithms used just determine with some decomposition and statistical procedures if something is "significantly similar".

I think I should have made this clear and I apologize for not doing so: I did not mean that you simply use norms in a deterministic sense, but use them in the context of probability and statistics: You can have estimators that are applied to give a probabilistic interpretation.

Not using statistics for analysis nowadays is not a smart decision. Thankfully, the movement from deterministic to probabilistic methods has given us a way to deal not only with variation but also with noise, and it's not wise to assume that there is no noise or other uncertainty in the model itself (it is important but subtle to consider the noise in the model as opposed to the noise in the signal).

 Quote by chiro Well, what you do to incorporate statistics is have estimators for the difference of components of a particular projection and use these for a hypothesis test, in the same way you check for the difference of means with a t-test.
That's a very general description and the only way I can visualize it is as the usual sort of experiment in a very limited environment where you are trying to accomplish a specialized task like finding a dog in a picture. Is there some more fundamental way to approach it? - something that has some hope of being general purpose?

 The idea of not using statistics for analysis nowadays is not a smart decision since (thankfully) the movement from deterministic to probabilistic has given us a way to not only deal with variation, but also with noise and it's not wise to assume that there is no noise or some kind of uncertainty in the model itself (important but subtle to consider the noise in the model as opposed to the noise in the signal).
I agree with the importance of statistics. There are many academic papers relating Fourier analysis to problems of computer vision. However, I'm skeptical of the applicability of the signal-and-noise metaphor. I'm particularly skeptical of frequency domain analysis. The way I look at frequency domain analysis is that it's wonderful when you have a phenomenon where superposition works (like electromagnetic signals). You pick a basis based on frequency (it can be frequency in time or in space or some other dimension of the data) and you represent things in that basis. But in a typical scene of daily life, objects do not obey the superposition principle. Objects hide each other. Also, I don't think interpreting an average-quality image from a modern digital camera is really a problem of dealing with "noise" - not noise in the sense of some additive disturbance that is superimposed on the image. (The type of image where "noise" is a believable problem would be an image of a store display window where there are reflections on the window from things outside the store.)

Perhaps this is too simplistic, but I think of it this way: suppose the picture shows a car parked in front of a bush so it obscures part of the bush. If you represent the pixels in the image as some sort of superposition of pixels in "component pictures", then the part of the picture where the car hides the bush has the "signal" of being a car, not the signal of being part car and part bush. So the car isn't additive noise when you're trying to identify bushes.

Suppose we go to the frequency domain and do spatial Fourier transforms. Then the part of the image that is car-hiding-bush, in a manner of speaking, does have a combined effect with the part of the image that is all bush and the part of the image that is car-not-hiding-bush. But everything in the whole image is part of this combined effect. So it's hard to believe that this is a good general-purpose method of isolating particular things.
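
The "combined effect" can be made concrete: in a direct 2D DFT, editing a single pixel perturbs every frequency coefficient (the tiny 8x8 image below is a toy assumption).

```python
import cmath
import math

def dft2(img):
    """Direct 2D DFT of a small square grayscale image (list of rows)."""
    n = len(img)
    out = []
    for u in range(n):
        row = []
        for v in range(n):
            s = sum(img[y][x] * cmath.exp(-2j * math.pi * (u * y + v * x) / n)
                    for y in range(n) for x in range(n))
            row.append(s)
        out.append(row)
    return out

n = 8
flat = [[0.0] * n for _ in range(n)]
bumped = [row[:] for row in flat]
bumped[3][4] = 1.0                      # "move the car": change a single pixel

fa, fb = dft2(flat), dft2(bumped)
changed = sum(1 for u in range(n) for v in range(n)
              if abs(fa[u][v] - fb[u][v]) > 1e-9)
print(changed)  # 64 -- every frequency coefficient feels the local edit
```

A one-pixel change is a delta function, whose transform has the same magnitude in every bin, so the local edit is smeared across the entire spectrum: exactly the global entanglement being discussed.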

Wavelets are a more plausible approach since they limit the spatial influence of a given area of the image. However, the garden-variety wavelet is sensitive to changes in images that human vision regards as minor. For example, if you move the car a foot forward, you change a boundary between car and bush. Fourier analysis is also sensitive to such changes. In an effort to combat sensitivity, one can resort to "steerable filters" where the basis, in a manner of speaking, tries to adjust its position to fit the image. (At least this involves somewhat new mathematics.) I'm not aware of any breakthroughs in the applicability of the above techniques to image recognition - or even to the problem of similarity detection mentioned in the original post.

 Quote by Stephen Tashi That's a very general description and the only way I can visualize it is as the usual sort of experiment in a very limited environment where you are trying to accomplish a specialized task like finding a dog in a picture. Is there some more fundamental way to approach it? - something that has some hope of being general purpose?
Well the problem really is classification and data mining looks at this in general ways.

Because there are so many ways to classify something and because humans are used to optimizing classifications for particular purposes, the actual thinking required can be foreign for this kind of purpose and thus it's made harder.

You highlighted this yourself: you automatically constrained the criteria to a specific thing like a dog. You also talked about specific concepts like occlusion and a bush.

This is not an attack on you: this is just how humans are geared to think, and it's basically the paradox of analysis in general.

Analyzing something means you have to not only structure it, but break it down. It also means that you need to create a finite-description of something that is otherwise potentially infinite. This means that when you have information, you take something and break it down so that it's manageable for you to work with.

All humans do it and, in fact, all the standard models of computation require this, with the idea that you have memory which is divisible and, in the CPU, registers that are like a form of "working memory" (I know this isn't 100% accurate, but bear with me). We do computations on things that have a finite amount of information, and we store the results in something that has a finite amount of information.

If you want to look at general approaches, you have to resist the need to instantly classify and think about ways to "think more like an autistic person" than "think like a non-autistic person". Autistic people have a propensity to take in tonnes of information, while non-autistic people have a tendency to filter most of the information out and get something that has been stripped of a lot of the detail.

Now, to answer your question, the thing must be that the decomposition is determined by the algorithm itself, and that the statistical framework is used to determine how to accept or reject notions of "similarity" through general estimators regarding whether there is a difference between coefficients, or whether functions of coefficients are statistically significantly different from 0.

The thing is not to explicitly state the decomposition, but to let the algorithm derive and define it within statistical limits.

The estimators would be non-parametric (like, say, a median over a mean), and one approach to this is via what you termed before as a "neural network". The network changes the structure of the data, and even how it uses the data for analysis, as the network itself changes.
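
One concrete instance of a decomposition derived from the data itself, rather than stated in advance, is principal component analysis. Here is a minimal power-iteration sketch (the 2D point cloud is an invented example):

```python
import math

def leading_component(data):
    """Derive the dominant basis direction from the data itself (PCA by
    power iteration on the 2x2 covariance matrix) rather than fixing it a priori."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    cxx = sum((x - mx) ** 2 for x, _ in data) / n
    cyy = sum((y - my) ** 2 for _, y in data) / n
    cxy = sum((x - mx) * (y - my) for x, y in data) / n
    v = (1.0, 1.0)
    for _ in range(100):
        # Repeatedly apply the covariance matrix and renormalize.
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    return v

# Points scattered along the line y = 2x: the learned axis should point that way.
pts = [(t, 2.0 * t + 0.01 * ((-1) ** i))
       for i, t in enumerate([-1.0, -0.5, -0.1, 0.2, 0.6, 1.0])]
vx, vy = leading_component(pts)
print(abs(vy / vx - 2.0) < 0.1)  # True: slope of the derived axis is roughly 2
```

Nothing about the direction (1, 2) was given to the algorithm; the basis came out of the data, which is the spirit of letting the decomposition be derived within statistical limits.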

Remember, that you have to take away the idea of trying to force your own constraints and instead just lose them almost like how an autistic person can do this (but unfortunately have a lot of trouble with scenarios of extreme stimulation resulting in situations where they are overwhelmed with stimuli that others would simply filter out early).

 I agree with the importance of statistics. There are many academic papers relating Fourier analysis to problems of computer vision. However, I'm skeptical of the applicability of the signal-and-noise metaphor. I'm particularly skeptical of frequency domain analysis. The way I look at frequency domain analysis is that it's wonderful when you have a phenomenon where superposition works (like electromagnetic signals). You pick a basis based on frequency (it can be frequency in time or in space or some other dimension of the data) and you represent things in that basis. But in a typical scene of daily life, objects do not obey the superposition principle. Objects hide each other. Also, I don't think interpreting an average-quality image from a modern digital camera is really a problem of dealing with "noise" - not noise in the sense of some additive disturbance that is superimposed on the image. (The type of image where "noise" is a believable problem would be an image of a store display window where there are reflections on the window from things outside the store.)
The reason why I mentioned frequency analysis is that this analysis is able to take apart information that is entangled, like things that are spatially entangled (i.e. multiple things that are not topologically simple but overlapping in the same region). This is one of the main points of using frequency decompositions and it's the reason why for example you can do amazing things with audio like remove noise and clicks and still get a nice result.

It's not solely for noise: it's a general purpose way to handle decompositions of signal data where you don't have simply connected pieces of information in a constrained spatial location.
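
A toy version of the click-removal example mentioned above (the signal, the click, and the threshold are all illustrative assumptions): a single corrupted sample spreads its energy thinly across every frequency bin, so thresholding the spectrum removes most of it while keeping the dominant tone.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

n = 64
clean = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
clicked = clean[:]
clicked[20] += 3.0                      # a "click": one corrupted sample

# Keep only the coefficients that carry real energy; the click's energy is
# spread thinly over every bin, so thresholding mostly discards it.
X = dft(clicked)
cleaned = idft([c if abs(c) > 10.0 else 0 for c in X])

err_before = max(abs(a - b) for a, b in zip(clean, clicked))
err_after = max(abs(a - b) for a, b in zip(clean, cleaned))
print(err_before)          # 3.0 at the click
print(err_after < 0.5)     # True: the click is largely gone
```

The tone's two DFT bins have magnitude 32 while the click contributes only magnitude 3 to each bin, so a threshold between those levels separates them cleanly; real audio restoration uses more careful (often wavelet or model-based) variants of the same idea.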

Unfortunately, though, this is not intuitive for a lot of sensory work, especially when it comes to "forced analysis": while intuition is able to discern these details, forced analysis ends up looking at simple approaches like "dividing things spatially" or "hierarchical division" instead of a more "natural" way such as frequency analysis (or an analog of it).

It's not intuitive to use frequency analysis when you are trying to do a forced analysis, and it doesn't help when the brain is filtering out all these details so that you are able to actually analyze (again: the example of the computer).

 Perhaps this is too simplistic, but I think of it this way: suppose the picture shows a car parked in front of a bush so it obscures part of the bush. If you represent the pixels in the image as some sort of superposition of pixels in "component pictures", then the part of the picture where the car hides the bush has the "signal" of being a car, not the signal of being part car and part bush. So the car isn't additive noise when you're trying to identify bushes.
Again, you have filtered out most of the actual data in order to make sense of what is going on: it's not your fault, it's just how a lot of people think. You are dividing the image into pixels, and this automatically constrains the analysis so that you miss things you would otherwise catch: it's the product of a particular choice of analysis.

Also the other thing that you take for granted is what I would call "relative memory". This is something that we take for granted, but a computer may not (unless it has been trained extensively).

Humans have relative memory in many instances: not all representations are the same. Relative memory means that you can use the relationships between all things in order to classify.

This idea that information is in a vacuum is not right: All information is relative. In fact all of language is relative: for example in order to define something, you need to define its complement. If something doesn't have a complement, it can't be defined because there is nothing to compare it to. In order for mathematics to even make sense you need variation: without variation and the ability to compare, you can't have mathematics.

You have all this information that is completely relative to something else and we just take for granted that all of this even exists so that we are able to differentiate between a "sloppy A" and a "printed A". The point is not what the A is, or even means: the point is that the A has relativity to everything else and what that relativity actually is in terms of analysis.

I think you're making the mistake of assuming that the decomposition itself has to split the image into spatially separate pieces (i.e. you look at the superposition in terms of completely isolated areas, like groups of pixels, rather than the entirety of the image itself as an undivided entity). The decomposition does produce information that is mutually exclusive in its final form (i.e. basis vectors are orthonormal and thus independent), but that does not mean it deals with information that is spatially simple.

 Suppose we go to the frequency domain and do spatial Fourier transforms. Then the part of the image that is car-hiding-bush does, in a manner of speaking, have a combined effect with the part of the image that is all bush and the part of the image that is car-not-hiding-bush. But everything in the whole image is part of this combined effect. So it's hard to believe that this is a good general-purpose method of isolating particular things.
Again, you are analyzing by looking at things as if the information were topologically a simple region with all parts mutually exclusive (i.e. the way we "atomize" information and structures): the Fourier transform takes an entangled form of the entire signal and extracts information that describes the nature of those entanglements.
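A minimal pure-Python sketch of this global entanglement (a naive DFT on a toy 8-sample "image row"; all names and values here are illustrative): moving a local feature by one sample changes every Fourier coefficient except the DC term, even though the two signals differ in only two samples.

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [sum(signal[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                for k in range(n))
            for j in range(n)]

# A toy 1-D "image row": flat background with one small local feature.
base = [0.0] * 8
moved = [0.0] * 8
base[2] = 1.0    # feature at position 2
moved[3] = 1.0   # the same feature shifted by one sample

# Every coefficient except the DC term (bin 0) changes.
changed = [j for j, (a, b) in enumerate(zip(dft(base), dft(moved)))
           if abs(a - b) > 1e-9]
print(changed)  # [1, 2, 3, 4, 5, 6, 7]
```

This is the "entangled" character being described: a purely local edit is spread across the whole coefficient set.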

The best way to think about this is to consider what a hologram looks like as a representation, and how it is possible to analyze such a representation.

You can't look at things in a divided way: the underlying fundamental structure is completely one and indivisible.

 Wavelets are a more plausible approach since they limit the spatial influence of a given area of the image. However, the garden-variety wavelet is sensitive to changes in images that human vision regards as minor. For example, if you move the car a foot forward you change a boundary between car and bush. Fourier analysis is also sensitive to such changes. In an effort to combat sensitivity, one can resort to "steerable filters", where the basis, in a manner of speaking, tries to adjust its position to fit the image. (At least this involves somewhat new mathematics.) I'm not aware of any breakthroughs in the applicability of the above techniques to image recognition - or even to the problem of similarity detection mentioned in the original post.
The point, again, is to consider the relativity of things rather than some fixed criterion.

The whole business of comparison is deciding what the language is that describes the comparison. Once a language is found, and an actual sentence is formed corresponding to the two (or more) things to be compared, then one looks at what the language corresponds to in terms of the representation of the structure stored in the neural network (or some other analog).

The language itself is created, and the general approach is to consider how you would define the norm between two "representations" and how a non-parametric estimator can be used to ascertain a boundary for rejection or acceptance (i.e. a stimulation or the absence thereof).
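As a toy illustration of the norm-plus-boundary idea (every vector here is hypothetical, and the "non-parametric" boundary is simply the largest distance observed between known-similar pairs, with no distributional assumption):

```python
import math

def norm(a, b):
    """Euclidean distance between two equal-length representations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical "representations" of leaves known to be from the same tree.
same_tree = [[1.0, 2.0, 0.5], [1.1, 1.9, 0.6], [0.9, 2.1, 0.4]]

# Non-parametric acceptance boundary: the largest within-class distance.
within = [norm(a, b) for i, a in enumerate(same_tree)
          for b in same_tree[i + 1:]]
boundary = max(within)

def similar(a, b):
    """Accept if the two representations fall inside the boundary."""
    return norm(a, b) <= boundary

print(similar([1.0, 2.0, 0.5], [1.05, 2.0, 0.5]))  # True
print(similar([1.0, 2.0, 0.5], [5.0, 0.0, 3.0]))   # False
```

The hard part, of course, is everything this sketch takes as given: the representation and the norm.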
Chiro, I can't translate any of your remarks into a practical method, but I appreciate hearing them. Sometimes debating vague abstract viewpoints can spark ideas. However, in this post I'm only interested in introducing more of them.

When the idea of "symmetry" comes up in mathematics, it is related to ideas about groups. The idea of "similarity" can be viewed (at least, by me) as a sort of generalized symmetry. An utterly simplistic viewpoint is that two things are "similar with respect to the group G" if they are in the same orbit. (I.e. G is a set of transformations, and if some transformation in G moves thing A to thing B then the two things are "similar".)

"Frequency analysis" is also related to groups. It is possible to generalize harmonic analysis by using basis functions other than sines and cosines. These other types of bases can be related to particular groups. (When I concentrate hard, I can actually understand how this is done! It has an unnatural feel to it. The functions that you decompose are functions whose domain is the group. (A group element is a transformation defined by a set of parameters, and the functions will be functions of those parameters.) But in practical applications, you have some sort of set of points and the group is a set of transformations that move these points around. Your interest is in functions on the set of points, not functions of the group elements. So you must define some association between a point that is moved and the transformation that moves it.)

If I reason just by manipulating words, the practical detection of similarity should have something to do with groups. In the type of similarity that I'm asking about (such as the similarity of two grain patterns in wood) there need not be a precise 1-to-1 way to transform one of the patterns into the other. For the spirit of group theory and harmonic analysis to prevail in this problem, something must give.
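The orbit viewpoint can be sketched in a few lines of Python (the cyclic shifts of a 4-tuple stand in for a group of image transformations; everything here is an illustrative toy, not a proposed method):

```python
def orbit(x, transformations):
    """All images of x under a finite set of transformations."""
    return {t(x) for t in transformations}

# G: the cyclic group of rotations of a 4-tuple, a toy stand-in
# for a group acting on images.
def rotate(k):
    return lambda x: x[k:] + x[:k]

G = [rotate(k) for k in range(4)]

def similar(a, b):
    """a and b are 'similar with respect to G' if b lies in a's orbit."""
    return b in orbit(a, G)

print(similar((1, 2, 3, 4), (3, 4, 1, 2)))  # True: one is a rotation of the other
print(similar((1, 2, 3, 4), (1, 3, 2, 4)))  # False: not in the same orbit
```

Because G is a group, this notion of similarity is automatically an equivalence relation, which is exactly what "being in the same orbit" buys you.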
The most conventional approach is to leave group theory and generalized harmonic analysis as they are and fiddle with the data. This approach would define some functions (perhaps statistics) that map each grain pattern to a vector of numbers summarizing it. Then two different-but-similar grain patterns might always get mapped to the same vector, and you don't need the group theory and harmonic analysis for anything! If two similar-but-different grain patterns get mapped to different vectors, then you could try to find a group that puts these vectors in the same orbit.

I'm not optimistic about the conventional approach. That's just my intuition; I think it's been tried hundreds of times. The other alternative is to seek some further generalization of the idea of a group, or of harmonic analysis on a group. There are lesser mathematical objects than groups (such as semigroups), but that sort of generalization seems to lose more than we gain by making it. I don't understand whether wavelets are closely related to group theory or whether they are a more generalized form of harmonic analysis on groups.
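The "map each pattern to a summary vector" approach can be sketched as follows (pure Python; the grain samples and the choice of statistics are entirely hypothetical, chosen only to show how two different patterns can collapse to the same summary):

```python
import statistics

def summarize(pattern):
    """Map a 'grain pattern' (here just a list of intensities) to a
    small vector of statistics - a crude stand-in for texture features."""
    return (round(statistics.mean(pattern), 2),
            round(statistics.pstdev(pattern), 2))

grain_a = [3, 5, 4, 6, 3, 5, 4, 6]   # hypothetical grain samples
grain_b = [5, 3, 6, 4, 5, 3, 6, 4]   # different layout, same values overall

# Similar-but-different patterns collapse to the same summary vector,
# so no group machinery is needed to call them "similar"...
print(summarize(grain_a) == summarize(grain_b))  # True
```

...and the weakness is the converse: statistics this crude also collapse genuinely different textures, which is why the choice of summary functions carries all the difficulty.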
 I'm not saying the way we do it is optimal either: I'm only saying that the way we classify and deconstruct information is the key thing. Hilbert-space theory tells us how to construct orthogonal bases and use them for decomposition and recomposition, but it doesn't tell us which bases we should choose: that is our job.

The relativity statement simply says that information is never in a vacuum: it is all relative. All descriptive characteristics are relative to some universal descriptive characteristic: you can't treat any bit of information in a vacuum, not even a whole picture. We take for granted the great context we have: we have a lot of redundant information that we call upon without noticing. The idea of always reducing information to get rid of redundancy can sometimes be counter-productive, because the redundancy provides not only context but a means of relating seemingly different things that would otherwise be missed under a reductionist approach to information.

With regard to harmonic analysis, the reason for its importance is that it can take a signal and decompose it into a representation of spatially or temporally intertwined characteristics. A contrast is the Haar wavelet: this wavelet is entirely spatially and temporally localized. You can get rid of specific coefficients without changing the signal outside the immediate neighbourhood, but you can't do this with a Fourier basis: if you take away one coefficient it changes the whole thing globally, and personally I see this as an advantage, not a disadvantage. This is why things like noise removal and powerful filtering techniques are possible, and to me it really is an impressive manner of design: the idea of creating structures that are completely intertwined with each other rather than spatially or temporally isolated (the way most analysis is done).
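The Haar locality claim is easy to verify concretely. A minimal sketch (one level of the unnormalized averaging/differencing Haar transform on a toy signal; discarding one detail coefficient touches only one pair of samples, whereas the Fourier example earlier in the thread showed the opposite behaviour):

```python
def haar_forward(x):
    """One level of the (unnormalized) Haar transform: pairwise
    averages followed by pairwise half-differences."""
    avgs = [(a + b) / 2 for a, b in zip(x[::2], x[1::2])]
    difs = [(a - b) / 2 for a, b in zip(x[::2], x[1::2])]
    return avgs, difs

def haar_inverse(avgs, difs):
    """Reconstruct the signal from averages and differences."""
    out = []
    for a, d in zip(avgs, difs):
        out += [a + d, a - d]
    return out

signal = [4.0, 2.0, 5.0, 5.0, 9.0, 7.0, 1.0, 3.0]
avgs, difs = haar_forward(signal)

# Discard one detail coefficient and reconstruct.
difs[0] = 0.0
smoothed = haar_inverse(avgs, difs)

# Only the first pair of samples is altered; the rest are untouched.
changed = [i for i, (a, b) in enumerate(zip(signal, smoothed)) if a != b]
print(changed)  # [0, 1]
```

Zeroing a Fourier coefficient of the same signal would, by contrast, perturb every sample at once - which is precisely the trade-off between localized and intertwined representations being discussed.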
But it's not intuitive for a lot of people, because we always seek to classify, and that usually translates to dividing things orthogonally so that we break up the object in a kind of "Haar-wavelet" categorization rather than a "Fourier" one. So the two main suggestions are: information is not in a vacuum, and decomposing things in the "Haar way" can be detrimental.

In terms of the comparison, once a basis and a proper set of transformations are found, the methods of analysis and statistics can be used. It can be statistical, where you have random variables and use the same sorts of techniques used in linear models to statistically reject transformations of coefficients lying in some region (i.e. a hypothesis test), or you can do it the deterministic way (a simple norm comparison; the special case where the variance of these coefficients is zero).

However, I mentioned another way that is harder to grasp but a lot more powerful, one where the actual decomposition is completely implicit: a neural network. A neural network builds its own decomposition of the data based on what it is trained on. It's also why the relativity of information is important: when we do analysis, we often forget that we have very highly trained and refined ways of looking at things, the result of a lot of prior analysis of many things. The decomposition of the structures and the process of comparison are encoded in the network itself, and since the network is dynamic and can change not only the values of the weights but also the topology, the decomposition itself, the transformations of weights, and the criteria for comparison and classification are all completely dynamic.
These things are often overlooked when doing analysis, because we are projecting a tonne of knowledge that has been trained and refined into a few concepts, the way you might project a hundred-thousand-dimensional vector onto, say, a four-dimensional vector: that's not to say the four-dimensional vector isn't useful, but you are going to lose a lot of information in the process.