I wanted to find the information entropy (Shannon entropy) of a given image. Specifically, I want to study the effects of image compression on the Shannon entropy of a given image. I am not sure how to go about it.

For this purpose, the message space is the image itself. Entropy is the uncertainty associated with a random variable. In my case, the random variable would be some quantitative measure of each pixel. The measure could be:

For now, let us consider luminance. I plan to write a small program for this and need help with the logic. First, I make a list of all the luminance values the pixels take and associate each value with the number of times it occurs in the image. In effect, luminance is the random variable X. Let's say the list is something like:

Once I've done that, I find the self-information set for X, which I'll call L. Each element of L is the negative log (to some base b) of the probability of one item in X. The luminance set would now look something like:

[tex]
H = -\sum_{i=1}^n {p(x_i) \log_b p(x_i)}
[/tex]

where [itex]x_i[/itex] are the distinct values in the set X, [itex]p(x_i)[/itex] is the fraction of pixels that take the value [itex]x_i[/itex], and [itex]-\log_b p(x_i)[/itex] are the elements of the set L described above. This should give me H, the information entropy associated with this image. Am I doing this right? Do you have any other suggestions?
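As a rough sketch of the procedure above in Python (the function name `luminance_entropy` and the toy values are hypothetical, and the luminance values are assumed to be already quantized to a fixed set of levels), the counting, self-information and weighted-sum steps might look like this:

```python
import math
from collections import Counter

def luminance_entropy(values, base=2.0):
    """Estimate H(X) from a list of already-quantized luminance values.

    Each distinct value x_i gets probability p(x_i) = count / total,
    its self-information is -log_b p(x_i) (an element of the set L),
    and H is the probability-weighted sum of those self-information terms.
    """
    counts = Counter(values)              # x_i -> number of pixels with that value
    total = sum(counts.values())
    h = 0.0
    for count in counts.values():
        p = count / total                 # p(x_i)
        self_info = -math.log(p, base)    # -log_b p(x_i)
        h += p * self_info
    return h

# Toy example with hypothetical luminance values
pixels = [0.25, 0.25, 0.50, 0.75, 0.75, 0.75, 0.75, 1.00]
print(luminance_entropy(pixels))  # approximately 1.75 (bits per pixel)
```

With `base=2.0` the result is in bits per pixel; any other base just rescales it by a constant factor.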

What you've written seems correct for estimating the entropy of the luminance of each pixel. However, if you multiply this by the number of pixels in the image, I think the total you get will be quite large compared to the file sizes produced by modern image formats. There are two reasons for this. The first is that efficient image coders do not code luminances individually but in blocks (in entropy terms, what counts is not the entropy of individual pixels but the entropy of whole collections of luminances in a nearby region).

The other reason is that entropy measures how many bits it takes to describe the image losslessly. Using perceptual methods, however, it is possible to build lower-rate lossy image coders whose loss is not perceptible. There should exist some underlying "perceptual entropy" of a given image, but unfortunately it is not the sort of thing you can calculate directly from the data statistics.
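To illustrate the first reason above, here is a minimal Python sketch (hypothetical helper names; it assumes a small 2-D list of already-quantized luminances whose rows have an even length) comparing the per-pixel entropy when each pixel is modelled on its own against the per-pixel entropy when non-overlapping horizontal pairs of pixels are treated as single symbols:

```python
import math
from collections import Counter

def entropy_of(counts):
    """Entropy in bits of the empirical distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def per_pixel_entropy_single(rows):
    """Bits per pixel when every luminance is treated as an independent symbol."""
    return entropy_of(Counter(v for row in rows for v in row))

def per_pixel_entropy_pairs(rows):
    """Bits per pixel when non-overlapping horizontal pairs are coded as one symbol.

    Assumes every row has an even number of pixels.
    """
    pairs = Counter((row[i], row[i + 1]) for row in rows for i in range(0, len(row), 2))
    return entropy_of(pairs) / 2  # each symbol covers two pixels

# Toy 4x4 "image" in which each non-overlapping pair holds two equal values
image = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(per_pixel_entropy_single(image))  # 1.0
print(per_pixel_entropy_pairs(image))   # 0.5
```

With empirical counts like these, the pair-based figure can never exceed the single-pixel figure, and it drops whenever neighbouring pixels are correlated. Real coders go further (larger blocks, transforms, context modelling), but the direction of the effect is the same.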

Also, I'm assuming that you use more than 4 luminance values in your experiments and just shortened the list for the example you included? If you quantize luminance that coarsely before entropy coding, the result is going to look pretty bad.

I will be using full-colour photographs, which will cover nearly all of the luminance values on the Lab scale. I am using about 1000 luminance levels on my scale (from 0.000 to 1.000), and a 1024x768 photograph has 786,432 pixels, so I'll have a huge number of luminance samples.
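A quick check of these numbers as a Python sketch. It assumes CIELAB L* values in the range 0 to 100 are rescaled to [0, 1] and split into 1000 equal-width bins; this binning and the `quantize` helper are hypothetical illustrations. However many pixels the image has, the number of distinct symbols stays at 1000, so the single-pixel entropy can be at most log2(1000), about 9.97 bits.

```python
import math

WIDTH, HEIGHT = 1024, 768
LEVELS = 1000  # number of luminance bins (hypothetical equal-width binning)

def quantize(l_star, levels=LEVELS):
    """Map a CIELAB L* value (0 to 100) to an integer bin 0 .. levels-1."""
    x = min(max(l_star / 100.0, 0.0), 1.0)     # rescale to [0, 1] and clamp
    return min(int(x * levels), levels - 1)    # L* = 100 falls into the top bin

num_pixels = WIDTH * HEIGHT
max_bits_per_pixel = math.log2(LEVELS)  # reached only if all bins are equally likely

print(num_pixels)                    # 786432
print(round(max_bits_per_pixel, 2))  # 9.97
```

So the sample is large (about 0.79 million values), while the alphabet of distinct luminance values stays small.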

Also, could you please explain why you said "if you multiply this by the number of pixels in the image"? What is the significance of multiplying the entropy by the number of pixels?

Also, could you explain what the 'entropy rate' is in information theory?

Well, the entropy you've calculated gives the average number of bits required to represent the luminance value of a single pixel. To encode an entire image, however, you need to encode ALL of the pixels. So if you design, say, a Huffman code that represents each pixel's luminance, the total number of bits will be about the single-pixel entropy times the number of pixels (a Huffman code's average length is within one bit per pixel of the entropy). Schemes that encode multiple pixels together can require fewer total bits, but need more complicated probability estimation and code design.
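A minimal Python sketch of that Huffman example (helper names are hypothetical; this is a textbook heap-based Huffman construction, not any particular image codec). It builds a per-pixel code from the luminance counts and compares the total Huffman bit count with entropy times the number of pixels:

```python
import heapq
import math
from collections import Counter

def huffman_code_lengths(counts):
    """Return {symbol: codeword length in bits} for a Huffman code built from counts."""
    if len(counts) == 1:
        # Degenerate case: a single symbol still needs a 1-bit codeword.
        return {next(iter(counts)): 1}
    # Heap entries: (total weight, tie-breaker, {symbol: depth in the tree so far}).
    # The integer tie-breaker stops heapq from ever comparing the dicts.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # the two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        # Merging puts every symbol in both subtrees one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def total_bits(values):
    """Return (entropy x number of pixels, total bits of a per-pixel Huffman code)."""
    counts = Counter(values)
    n = len(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    lengths = huffman_code_lengths(counts)
    huffman = sum(counts[s] * lengths[s] for s in counts)
    return entropy * n, huffman

pixels = [0.25, 0.25, 0.50, 0.75, 0.75, 0.75, 0.75, 1.00]
print(total_bits(pixels))  # (14.0, 14)
```

For these toy values the probabilities are powers of 1/2, so both totals come out to 14 bits. For other distributions the Huffman total is at least the entropy bound and less than one extra bit per pixel above it.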

The entropy rate is an extension of the concept of entropy from a random variable to a random process. The regular entropy is defined for a single random variable and tells you the number of bits needed, on average, to represent the value of the r.v. The entropy rate applies when you have an entire random process (i.e., a sequence of r.v.s that goes on forever). In this case, the entropy rate is the average number of bits you need to encode each new element in the sequence, given all the previous ones. Another way to think of this is as the average amount of new information contained in each new element of the random process.
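For reference, here are the usual textbook definitions, written for a process [itex]X_1, X_2, \ldots[/itex] (the notation is mine, not from the posts above). The first is the average number of bits per element over a long sequence:

[tex]
H(\mathcal{X}) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \ldots, X_n)
[/tex]

For a stationary process this limit exists and equals the "new information per element" version:

[tex]
H(\mathcal{X}) = \lim_{n \to \infty} H(X_n \mid X_{n-1}, \ldots, X_1)
[/tex]

If the [itex]X_i[/itex] are i.i.d., both reduce to the single-variable entropy [itex]H(X_1)[/itex].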

So you mean that the entropy rate would be an ideal measure for, say, a video? As in, I calculate the Shannon entropy for each frame, and then the instantaneous entropy rate would be [itex]ER = H_i \times \frac{1}{fps}[/itex], where [itex]H_i[/itex] is the entropy of frame i and fps is the frame rate of the video in frames per second.

However, I don't understand one particular application of the entropy rate. From Wikipedia:

Yeah, any signal that goes on for a long (or even indefinite) time: video, a radio broadcast, whatever. Even finite sequences can be thought of this way if they're really long; for example, you could view an image as a really long (one-dimensional) array of values.

Not quite. First, you don't normally bother normalizing the entropy rate to be in units of seconds, so you wouldn't include the fps term. Second, the entropy rate is a property of the process as a whole: you don't normally consider the "instantaneous" rate, but rather the average over the entire process.

They take long strings of English text, build probabilistic models of it, and then use those models to estimate the entropy rate (just like you're doing with images). In the simplest case, they model the text as an i.i.d. process (as you're doing with luminance values) and just estimate the probability of each letter occurring. The entropy rate is then just the entropy of a single letter (which is substantially higher than the 1-1.5 bit figure). More sophisticated methods model the text as a process with stronger dependencies, and so estimate things like the probability of pairs of letters, or of entire words. The entropy rate corresponding to these more accurate models is lower. As your model gets more complicated, it also gets more accurate, and the associated entropy rate decreases towards the "true" entropy rate of the underlying process.
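A small Python sketch of the first two steps of that progression at the character level (helper names are hypothetical, and this is not necessarily what the cited studies did): an i.i.d. estimate, which is just the single-character entropy, and a first-order Markov estimate [itex]H(X_2 \mid X_1) = H(X_1, X_2) - H(X_1)[/itex] computed from adjacent character pairs.

```python
import math
from collections import Counter

def entropy_of(counts):
    """Entropy in bits of the empirical distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def iid_rate(text):
    """i.i.d. model: the entropy rate is just the entropy of a single character."""
    return entropy_of(Counter(text))

def bigram_rate(text):
    """First-order Markov model: H(X2 | X1) = H(X1, X2) - H(X1), from adjacent pairs."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])  # marginal of the first character of each pair
    return entropy_of(pairs) - entropy_of(firsts)

sample = "the quick brown fox jumps over the lazy dog"
print(iid_rate(sample))     # bits per character under the i.i.d. model
print(bigram_rate(sample))  # bits per character under the pair model
```

On short samples like this one the pair-based estimate is biased low, because most pairs are seen only once; a meaningful estimate needs a lot of text. The same functions work on an image flattened into a one-dimensional sequence of quantized luminances.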