# Information in a metric question

1. Aug 26, 2009

### wvguy8258

Hi,

I have a square grid that represents a landscape; each grid cell is forested or non-forested. I am calculating two different forest fragmentation metrics. Because there is a finite number of combinations of forest and non-forest cells, there is a finite number of possible values for each metric. It is likely that for one or both metrics, more than one combination of forest/non-forest cells will have the same metric value, and any such redundancy decreases the amount of information encoded in a metric.

If on a small landscape (few cells) I calculated each pattern metric on all possible landscapes (2^number of cells), I could produce a discrete probability distribution for each metric and calculate its entropy. Does it make sense to do this and, with the result, say "the metric with greater entropy contains more information about the landscape"? It seems that if each possible landscape had a unique metric value, that metric would carry the maximum amount of information for a landscape of that size. If a metric always gave the same value, it would have an entropy of zero.

If this makes sense (and please be brutal if it doesn't), how could one handle the situation where it is not possible to evaluate every combination of forest/non-forest cells (every possible landscape)? This will happen quite quickly as the number of cells increases. Is it possible to estimate the entropy of a discrete variable using some math or Monte Carlo simulation? I've been reading about information-theory applications for imaging, but usually they calculate entropy within a single image based on gray-scale values. Is there pertinent literature I'm missing? Also, would it be better to analyze my metric distributions using the usual variance measures instead?
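For the Monte Carlo question, here is a minimal sketch of both approaches (function names are hypothetical, and proportion forested is used as a deliberately simple stand-in metric). The exact version enumerates all 2^n landscapes; the sampling version estimates the same distribution, with the caveat that the plug-in entropy estimate is biased low when many metric values go unobserved.

```python
import itertools
import math
import random
from collections import Counter

def proportion_forested(grid):
    """A deliberately simple metric: fraction of cells that are forested."""
    return sum(grid) / len(grid)

def exact_metric_entropy(metric, n_cells):
    """Enumerate all 2**n_cells landscapes and return the metric's entropy in bits."""
    counts = Counter(metric(g) for g in itertools.product((0, 1), repeat=n_cells))
    total = 2 ** n_cells
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def monte_carlo_metric_entropy(metric, n_cells, samples=20000, seed=0):
    """Estimate the same entropy by sampling random landscapes (plug-in
    estimate; biased low if many metric values are never observed)."""
    rng = random.Random(seed)
    counts = Counter(
        metric(tuple(rng.randint(0, 1) for _ in range(n_cells)))
        for _ in range(samples)
    )
    return -sum((c / samples) * math.log2(c / samples) for c in counts.values())
```

On a 3×3 landscape the identity "metric" (the pattern itself) attains the maximum of 9 bits, while proportion forested, with only 10 possible values, comes out well under 4 bits; the Monte Carlo estimate tracks the exact value closely when the metric takes few values.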

2. Aug 26, 2009

### mXSCNT

Well, you could calculate the entropy of your metric, and it would tell you how much information your metric contains about your original grid. But maximizing that information is not your primary concern. If your grid is n×n and your metric is the n^2-bit number formed by listing the squares of your grid as binary digits, that metric would have maximum entropy. But it would put you right back where you started, telling you no more about forest fragmentation than the original grid.

Instead of just maximizing entropy, what you really want to do is maximize the information about the degree of forest fragmentation contained in your metric, while at the same time minimizing the entropy of the metric, so that your metric contains as little extraneous information about the irrelevant features of the landscape (besides the degree of forest fragmentation) as possible.

Information theoretically you could write that like the following: let F be a random variable reflecting the "true" degree of forest fragmentation, and let G be a random variable representing your grid. You seek a random variable X = f(G), such that the entropy H(X) is minimized, and the mutual information I(X;F) is maximized.

However, since F is an abstract concept without a clear definition, I(X;F) really can't be calculated (there's no way to even approximate it).

You are better off using some intuitive idea of fragmentation, and making sure it satisfies certain reasonable properties. For example, your measure of fragmentation shouldn't change much if you increase the resolution of your grid. It should be invariant under rotations and reflections of the grid. Perhaps it should increase if you poke a non-forested hole in a forested area, or if you put a bit of forest in a non-forested area. Perhaps there are other desirable properties.
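A property check like the rotation/reflection invariance suggested above can be automated. Here is a minimal sketch (function names hypothetical) that runs a candidate metric over the eight symmetries of a square grid and flags any disagreement:

```python
def symmetries(grid):
    """Yield all 8 rotations/reflections of a square grid (list of rows)."""
    g = [list(row) for row in grid]
    for _ in range(4):
        yield g
        yield [row[::-1] for row in g]            # horizontal reflection
        g = [list(row) for row in zip(*g[::-1])]  # 90-degree rotation

def is_symmetry_invariant(metric, grid, tol=1e-9):
    """True if the metric gives the same value on every symmetry of the grid."""
    base = metric(grid)
    return all(abs(metric(s) - base) <= tol for s in symmetries(grid))
```

For instance, proportion forested passes this check trivially, while a metric that only looks at the top row of the grid would fail it.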

3. Aug 26, 2009

### mXSCNT

You might want to think about fragmentation at a given resolution rather than trying to give a single number for fragmentation at any scale. For example, you could have a value for fragmentation at 500 ft as the average number of connected components per grid square, calculated when the grid squares are 500 ft on a side, and have a different value for fragmentation at 20 ft or 100 ft. You could calculate the lower resolution fragmentations from your original grid by averaging groups of squares to get the lower resolution grid.
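As a sketch of that idea (function names hypothetical, with majority vote as the aggregation rule), one can coarsen the grid block by block and count 4-connected forest components at each resolution:

```python
def coarsen(grid, b):
    """Aggregate b-by-b blocks: a block is forest (1) if at least half its cells are."""
    n = len(grid)
    return [
        [
            1 if sum(grid[i + di][j + dj] for di in range(b) for dj in range(b))
                 * 2 >= b * b else 0
            for j in range(0, n, b)
        ]
        for i in range(0, n, b)
    ]

def count_components(grid):
    """Count 4-connected components of forested (1) cells by flood fill."""
    n, m = len(grid), len(grid[0])
    seen = [[False] * m for _ in range(n)]
    comps = 0
    for i in range(n):
        for j in range(m):
            if grid[i][j] == 1 and not seen[i][j]:
                comps += 1
                stack = [(i, j)]
                seen[i][j] = True
                while stack:
                    x, y = stack.pop()
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < n and 0 <= ny < m and grid[nx][ny] == 1 and not seen[nx][ny]:
                            seen[nx][ny] = True
                            stack.append((nx, ny))
    return comps
```

A 4×4 checkerboard illustrates the scale dependence: at full resolution it has 8 separate forest patches, but after 2×2 majority-vote coarsening it becomes a solid forest block, a single component.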

4. Aug 27, 2009

### wvguy8258

Thanks mXSCNT. Landscape ecologists talk about fragmentation varying by scale, so I'm impressed by your insight and intuition, especially if you aren't familiar with that literature.

I think I need to state my objectives more clearly this time. I'm not interested in creating a metric now, although using information theory to do that is very interesting. I'm interested more in comparing established metrics. I thought that by calculating the entropy of each I could get at the amount of information they capture; so, for example, the percentage of the landscape forested would be low-entropy and therefore not very informative. Your point, though, is well taken, if I understand it: maximizing entropy alone isn't a worthy goal. Since there is information about a landscape along many dimensions, we humans are still left with determining which information we want.

So this does seem to entail defining what is wanted (what information) and then testing metrics for it, or designing them to capture it. One could create landscapes according to some process and calculate a metric on each, but then you have to generate the landscapes according to some rule and have some independent measure of them against which to compare your metric. One approach to this is an organismal or process-based view of fragmentation: what forest disturbance/fragmentation affects a beetle of species X? That makes the idea of fragmentation a little less abstract, I think. Thanks.

Seth

5. Aug 27, 2009

### mXSCNT

Well, in that case you need to attempt to show causation or correlation between the various measures of fragmentation and the beetle species. The measure that bears the strongest relation is the most useful for that purpose. You have to work from real data; you can't cheat by generating random forest grids.

6. Aug 27, 2009

### SW VandeCarr

If you just wanted to measure fragmentation, all you need is a boundary-length measure between forested and clear areas. Maximum fragmentation (the longest boundary) is a checkerboard pattern, and minimum fragmentation is a field divided in half by the shortest boundary line, assuming there's an equal number of forested and clear grid squares.

Last edited: Aug 27, 2009
7. Aug 28, 2009

### SW VandeCarr

I think you're misunderstanding the concept of entropy/information. A grid field of n grid squares, where each grid square can independently exist in one of two states with p = 0.5, has 2^n possible states. Entropy is a logarithmic function of the probability of any one grid state, p = 1/2^n: entropy = -k(log(p)), where k and the log base depend on the application.

The point is that given 2^n grid field states, there is only one value for the entropy/information measure, assuming statistical independence under a uniform distribution. It has nothing to do with any particular state, and any measure based on entropy will reflect this. However, the total length of the boundary between the two states in the grid field will vary as a measure of fragmentation. That is, the specific length of the boundary is a single observed state, not an ensemble.
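A quick numeric check of the arithmetic above (taking k = 1 and log base 2, so entropy is in bits; the function name is hypothetical):

```python
import math

def uniform_entropy_bits(num_states):
    """Entropy -sum(p * log2(p)) of a uniform distribution over num_states outcomes."""
    p = 1.0 / num_states
    return -sum(p * math.log2(p) for _ in range(num_states))
```

For a grid of n independent cells with p = 0.5 each, the ensemble of 2^n states is uniform and the entropy is exactly n bits, a single number for the whole ensemble: it attaches to the distribution, not to any particular pattern.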

Last edited: Aug 28, 2009
8. Aug 28, 2009

### wvguy8258

I was differentiating between the entropy of a particular image, with each cell being in one of two states with probability p, and the entropy of a measure taken on a series of such images. Does this make sense? You can look at an image created with each cell having probability 0.5 of being in state 1 or state 2. You could also look at a measure, like the proportion in state 1. The proportion measure would have less entropy than the image, as it will always be close to 0.5.

9. Aug 28, 2009

### SW VandeCarr

I'm not sure what you mean by proportion. Entropy refers to an ensemble of possible states. How are you defining your ensemble, and how do you assign probabilities to each possible state of the ensemble?

I assumed the total field area was half forest and half clear, but degree of fragmentation could vary greatly over the field as I described. Each pattern of fragmentation is one of an ensemble of possible states.

A simple proportion by itself only describes two overall states without regard to the distribution of clear grid squares and forested grid squares. Then the question is simply what the probability is that a given grid square is forested, and you've already specified that. The entropy will then simply be a function of the specified proportion. That's not very interesting.

Last edited: Aug 28, 2009
10. Aug 28, 2009

### wvguy8258

I was using proportion as an example of a measure with low entropy. I'm suggesting that the entropy of measures on a grid would range from low (like proportion) to the maximum (treating each landscape pattern as distinct). Other, more common measures of the pattern would be somewhere in between, no? Perimeter, for instance, like you were suggesting, would have an intermediate entropy, as several different patterns could produce the same perimeter value.
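That can be checked directly on a small landscape by enumeration (a sketch; function names are hypothetical, and "perimeter" is taken here as the internal boundary length between unlike 4-neighbours):

```python
import itertools
import math
from collections import Counter

def entropy_bits(values):
    """Plug-in entropy (bits) of the empirical distribution of an iterable of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def boundary_length(flat, n):
    """Count adjacent (4-neighbour) cell pairs in different states on an n-by-n grid."""
    grid = [flat[i * n:(i + 1) * n] for i in range(n)]
    edges = 0
    for i in range(n):
        for j in range(n):
            if j + 1 < n and grid[i][j] != grid[i][j + 1]:
                edges += 1
            if i + 1 < n and grid[i][j] != grid[i + 1][j]:
                edges += 1
    return edges

n = 3
landscapes = list(itertools.product((0, 1), repeat=n * n))
h_pattern = entropy_bits(landscapes)                                  # each pattern distinct
h_boundary = entropy_bits(boundary_length(g, n) for g in landscapes)
h_proportion = entropy_bits(sum(g) for g in landscapes)
```

On the 3×3 grid the full pattern carries 9 bits, while proportion (10 possible values) and boundary length (values 0 to 12) are each capped below 4 bits, so both sit well under the maximum, consistent with many patterns sharing one metric value.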

11. Aug 28, 2009

### SW VandeCarr

Yes, you can get an entropy measure for subsets of all possible patterns, and some subsets have lower or higher entropy as ensembles with different numbers of possible states.

Again, assuming two equally probable states for one grid square, the entropy is -k(log(0.5)) or 1 bit if k=1 and log base 2. For 16 equiprobable states the entropy is 4 bits, etc. Note however, that we are talking about an ensemble of 16 states (four grid squares) and the entropy value applies only to a random variable over the ensemble, not to any individual pattern. Once a single state is identified, all you can properly say is that it is the output of a random variable with an entropy value of 4 bits.

Regarding boundary length as a measure of fragmentation: the length is a measure of fragmentation, but it is not a scalar measure of its information value. The longest length is the checkerboard pattern, which is only one state with respect to boundary length. Boundary lengths follow a binomial-like distribution, with the middle lengths corresponding to the most states. Moreover, a particular outcome, once realized, has no entropy/information value, since it exists with p=1 and all other outcomes exist with p=0. It's true that informal usage often overlooks this distinction, and many talk about the entropy/information value of a particular outcome when they really mean the entropy value of a random variable over an ensemble.

All of this only applies for defined subsets. The entropy value for all possible states of the whole field is a single value.

Last edited: Aug 29, 2009