# Using entropy to discover visually salient features

elpidiovaldez5
I am trying to understand how visually salient features could be discovered by unsupervised learning. I do not want to assume that we already have edge detectors or convolutional neural networks; rather, I am trying to imagine how these could be discovered by observing the world.

Imagine that a camera captures greyscale images, and that these are thresholded so that half the pixels are 0 (black) and half the pixels are 1 (white). Now look at each pixel position and extract the 9 pixels in the 3x3 grid centred at that position. There are 2^9 (512) possible patterns. So if all pixels were independent, the probability of observing a particular pattern would be 1/512. We could easily work out the real-world probability by counting occurrences of each pattern in a large number of real-world images.
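The counting step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: Python with NumPy, images supplied as 2-D arrays, and thresholding at the median grey level so that roughly half the pixels come out white. The helper names (`threshold_at_median`, `patch_codes`, `pattern_probabilities`) are made up for this example.

```python
import numpy as np

def threshold_at_median(grey):
    """Binarise a greyscale image: 1 where the pixel is above the median, else 0.
    Assumption: with many tied grey levels the split will not be exactly 50/50."""
    grey = np.asarray(grey)
    return (grey > np.median(grey)).astype(np.int64)

def patch_codes(binary_img):
    """Encode every 3x3 neighbourhood of a 0/1 image as an integer in [0, 511].
    Each of the 9 pixels contributes one bit; border pixels without a full
    3x3 neighbourhood are skipped."""
    img = np.asarray(binary_img, dtype=np.int64)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.int64)
    bit = 0
    for dy in range(3):
        for dx in range(3):
            # Shifted view of the image supplies one pixel of every patch.
            codes += img[dy:dy + h - 2, dx:dx + w - 2] << bit
            bit += 1
    return codes

def pattern_probabilities(binary_images):
    """Estimate the probability of each of the 512 patterns by counting."""
    counts = np.zeros(512, dtype=np.int64)
    for img in binary_images:
        counts += np.bincount(patch_codes(img).ravel(), minlength=512)
    return counts / counts.sum()
```

For images made of independent fair-coin pixels every entry of the result should come out close to 1/512; the interesting question is how far the estimate from real photographs departs from that baseline.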

Now consider a single image in which a horizontal line comprising 20 consecutive white pixels is present. We can calculate the probability of this occurring, either assuming the pixels are independent or using real-world data. Either way it seems very unlikely that this would occur by chance, but we have to be quite careful, because ANY specific pattern of 20 pixels is very unlikely. What is special about a LINE of pixels? (It is analogous to the unintuitive fact that a winning lottery combination of 9,999,999,999 seems very unlikely, but is exactly as likely as 6,936,125,674.) It seems to me that although this arrangement is just as unlikely as many other random arrangements, the pixels in a line have lower entropy. The size of the image is also significant, because it might be unlikely to find a line of 20 pixels in a 640x480 image, but much less unlikely in a 4096x2048 image.
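The image-size point can be made concrete with a back-of-envelope sketch. It assumes independent pixels that are white with probability 0.5, considers horizontal runs only, and counts overlapping windows separately; the function name `expected_white_windows` is invented for this illustration. By linearity of expectation the expected number of all-white windows is (number of window positions) x 0.5^20, and that expectation is also an upper bound on the probability of seeing at least one such run.

```python
def expected_white_windows(width, height, run_length=20, p_white=0.5):
    """Expected number of horizontal windows of `run_length` pixels that are
    entirely white, assuming independent pixels (an assumption, not real data)."""
    # Number of places a horizontal window of this length can start.
    positions = max(width - run_length + 1, 0) * height
    # Each window is all white with probability p_white ** run_length.
    return positions * p_white ** run_length

small = expected_white_windows(640, 480)     # about 0.28
large = expected_white_windows(4096, 2048)   # about 7.96
```

So under the independence assumption a 20-pixel white run would be a mild surprise in the smaller image but entirely expected in the larger one, which is why any test for "surprising" patterns has to account for how many places the pattern could have appeared.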

This is where my poorly remembered maths and physics let me down. Is there any mathematical technique which would detect the line as a surprising anomaly? I am thinking of hypothesis testing, or some kind of entropy calculation. In entropic considerations it seems like I need to distinguish 'microstates' (the pixel patterns) from 'macrostates' (a CONTIGUOUS sequence of the same state extended over a certain range). So a macrostate would define exactly which patterns of pixels are being predicted.

Am I missing something here? I'd like to see some mathematical rigour brought to the idea. If it makes sense for lines, the same principles could be brought to bear to discover other visual features.

Staff Emeritus
I am trying to understand how visually salient features could be discovered by unsupervised learning
Do you mean so-called deep learning methods used to train neural networks? If so, then I recommend this free online book. http://neuralnetworksanddeeplearning.com/

The book uses handwritten character recognition as its working example, which relates to visually salient features.

elpidiovaldez5
The book looks great, but here I am actually interested in a more physics-based view of saliency. Neural nets, particularly convolutional neural nets (and more recently Transformers), are well suited to detecting visual features like edges, but I am trying to get at the reason for their success, which I think may be based on entropy.

Staff Emeritus
but I am trying to get at the reason for their success, which I think may be based on entropy.
I would characterize them more as curve fitting. Cousins of this:
https://en.wikipedia.org/wiki/Least_squares

Edit: They can also be viewed as an optimization problem. Define a function that says how well the neural net performs, then tweak all the knobs to get the best that you can.

Homework Helper
If it makes sense for lines, the same principles could be brought to bear to discover other visual features.
Are you looking for a particular (known) figure embedded in the picture (Where's Waldo?), or asking for automatic identification of unknown but unusual patterns? Please elaborate.

elpidiovaldez5
I used a line as an example, but I am really interested in detecting any unusual/unlikely patterns. I am not even sure it is possible without pre-judging the types of interesting patterns, but I think it may be. e.g. the components that make up a line (edge segments) probably occur more frequently in real scenes than would be expected from a scene of random pixels. If these edge segments also occur in a continuous run of, say, 20, then this is also unlikely. Any specific pattern of 20 edge segments would be equally surprising, but continuous, directed runs occur frequently in natural scenes, whereas other specific patterns would be observed rarely. How could I design an algorithm to discover these anomalies?

Homework Helper
These are all very good questions but not one of them is simple. I have seen estimates that 20% of our "brainpower" is devoted to these kinds of questions so that should be an indicator.
The adjudication of what is "unusual" is possibly very difficult. My suggestion would be to first concentrate on straight lines (edges?) of a minimum length. This seems like a reasonable research project. What is the most efficient way to flag very linear features? One immediate plus is that a linear feature viewed at an angle (without optical aberration) is still linear.

You do understand that this is more than a lifetime of work?

I used a line as an example, but I am really interested in detecting any unusual/unlikely patterns. I am not even sure it is possible without pre-judging the types of interesting patterns, but I think it may be. e.g. the components that make up a line (edge segments) probably occur more frequently in real scenes than would be expected from a scene of random pixels.

I think you must judge what patterns show lines in order to implement that idea. And you have pre-judged that such patterns are significant and that they should be lumped together as the event "image contains a line".

Regarding the question in the title of your post, Shannon entropy is a property of probability distributions. In your example, a single image is one outcome of a probability distribution. The outcome has a probability but it doesn't have a defined entropy. Similarly, the occurrence of some event defined by a set of outcomes (e.g. outcomes that show a line) is an event that has a probability. If we want to say the event "has an entropy", we need to extend the definition of "entropy".
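To make the distinction concrete, here is a small sketch (Python, standard library only; the function names are invented for illustration). Shannon entropy is computed from a whole distribution, whereas a single outcome only has a probability ##p##. The quantity usually attached to a single outcome is its surprisal (self-information), ##-\ln(p)##, and the entropy of the distribution is the expected surprisal.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in nats of a discrete distribution given as a list
    of probabilities. Zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def surprisal(p):
    """Surprisal (self-information) in nats of one outcome with probability p."""
    return -math.log(p)

# The 512 equally likely 3x3 patterns from the independent-pixel case.
uniform = [1 / 512] * 512
h = shannon_entropy(uniform)   # ln(512), about 6.24 nats
s = surprisal(1 / 512)         # the same value for every pattern
```

In the uniform case every outcome has the same surprisal, equal to the entropy, so neither quantity singles out a line; any distinction has to come from a non-uniform distribution or from grouping outcomes into events.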

If an event has probability ##p##, we could assign it the number ##p \ln(p)## and call that number the entropy of the event. That would not be conventional terminology. However, I've seen such a thing done. In the publication

### The maximum entropy formalism : a conference held at the Massachusetts Institute of Technology on May 2-4, 1978

(which I haven't found online and I don't have my copy handy) there is a paper about finding where to drill for oil. The author used the distribution of various types of strata to define an entropy value for wells that had already been drilled. He then plotted "entropy contours" as a guide where to drill next.
