Entropy of a histogram with different bin sizes

  • Context: Undergrad
  • Thread starter: obesogen
  • Tags: Bin, Entropy, Histogram

Discussion Overview

The discussion revolves around the entropy of a histogram when using different bin sizes, particularly in the context of approximating an unknown probability distribution in a dynamical system. Participants explore how the choice of bin size affects the calculated entropy and seek an expression or correction for this relationship.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants propose that as the number of bins increases, entropy will also increase, but they seek a relationship that accounts for this effect.
  • One participant clarifies that they are approximating an unknown probability distribution using a histogram and are interested in determining the number of bins needed to effectively decrease entropy.
  • Another participant questions the criteria for "sufficiently" decreasing entropy, suggesting that continuous distributions have a fixed entropy that should not be influenced by bin size.
  • One participant discusses the implications of using a Markov process and how discretizing the state space could lead to sparser transition probabilities, thereby affecting entropy.
  • Another participant raises concerns about the clarity of the probability distributions involved, questioning how to define and vary the distributions to analyze entropy correctly.
  • Some participants note that the relationship between bin size and entropy may depend on the underlying distribution, with uniform distributions showing a more pronounced increase in entropy with more bins.

Areas of Agreement / Disagreement

Participants express differing views on the relationship between bin size and entropy, with no consensus on how to define or calculate the entropy in this context. The discussion remains unresolved regarding the optimal approach to analyzing entropy with varying bin sizes.

Contextual Notes

Limitations include the ambiguity in defining the probability distributions involved and the assumptions about how entropy behaves with different bin sizes. The discussion also highlights the complexity of applying concepts from continuous distributions to discrete approximations.

obesogen
I'm wondering if there's an expression/correction for finding the entropy of a density using histograms of different bin sizes. I know that as the number of bins increases, entropy will also increase (given a sufficient number of data points), but is there an expression relating the two? All I can find is work on finding the optimal bin size given some data.
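There is in fact a standard relationship of this kind: for a smooth density with differential entropy h, the entropy of the binned (discrete) distribution behaves like H ≈ h − log Δ, where Δ is the bin width, so H + log Δ stabilizes as the bins get finer. A minimal NumPy sketch of this correction (editor's illustration, not from the thread; the Gaussian data is made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=200_000)       # illustrative data: N(0, 1) draws

def hist_entropy_nats(x, n_bins):
    """Plug-in entropy of the binned distribution, raw and width-corrected."""
    counts, edges = np.histogram(x, bins=n_bins)
    p = counts / counts.sum()
    p = p[p > 0]                         # drop empty bins (0 log 0 = 0)
    H = -np.sum(p * np.log(p))           # raw entropy: grows as bins get finer
    width = edges[1] - edges[0]          # equal-width bins
    return H, H + np.log(width)          # corrected: ~ differential entropy

true_h = 0.5 * np.log(2 * np.pi * np.e)  # differential entropy of N(0,1), ~1.419

for n in (16, 64, 256, 1024):
    H, corrected = hist_entropy_nats(samples, n)
    print(n, round(H, 3), round(corrected, 3))
# The raw entropy keeps climbing with the bin count, while the
# width-corrected value settles near the true differential entropy.
```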
 
Any ideas at all?
 
Your question is slightly ambiguous. On the one hand, you might have a formula for a probability density f(x) and be trying to approximate -\int f(x) \ln f(x) dx using the areas of a finite number of rectangles. Or you might have a set of data with an unknown probability distribution and be trying to estimate the entropy of that unknown distribution. Or you might declare that the data defines an empirical distribution (i.e., a random sample from the distribution is one of the data points selected at random) and want to know the entropy of the empirical distribution.
 
Thanks for the reply. The case is approximating an unknown probability distribution with a histogram. I'm actually trying to figure out how many 'bins' I need to discretize the state space in a dynamical system, so I am looking for the number of bins that sufficiently decreases the entropy (i.e., where the 'elbow' in a # of states vs. entropy plot would be). The problem is that adding more bins always increases the entropy, so I'm trying to find an expression to correct for this.
 
obesogen said:
I need to discretize the state space in a dynamical system, so I am looking for the number of bins that sufficiently decreases the entropy (i.e., where the 'elbow' in a # of states vs. entropy plot would be).

I don't know much about dynamical systems, so I don't know what the criterion is for "sufficiently" decreasing the entropy. If we are talking about a continuous distribution, doesn't it have a definite entropy - not one that can be adjusted by the number of bins?

Are you trying to find the (in some sense) optimum estimate for the entropy of a continuous distribution from a given set of data?
 
Sorry for being vague. I'm looking at the entropy of P(state j | state i,action a), so for a system obeying the Markov property (which this one does), the probability distribution should become sparser and sparser as the state space is better represented, ideally being 1 for one state and 0 for the rest. The way I thought to represent this sparseness is entropy, although I am definitely open to other suggestions. So in this way, I'm really not looking at a fixed density, but I thought I could apply some ideas on histograms to this problem. As for the "sufficient" part, that was probably an abuse of terminology; I would be satisfied with finding an 'elbow' of an entropy vs. number of bins graph.
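A concrete illustration of using entropy to measure the sparseness of one transition row (editor's sketch; the two rows below are invented for the example):

```python
import numpy as np

def row_entropy(p):
    """Shannon entropy (nats) of one transition distribution P(. | i, a)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # 0 log 0 is taken as 0
    return -np.sum(p * np.log(p))

# A near-deterministic row (good discretization) vs a smeared one:
sharp   = [0.0, 1.0, 0.0, 0.0]        # transition is certain: zero entropy
smeared = [0.25, 0.25, 0.25, 0.25]    # maximally uncertain: entropy log(4)

print(row_entropy(sharp))
print(row_entropy(smeared))           # log(4) ≈ 1.386 nats
```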
 
obesogen said:
I'm looking at the entropy of P(state j | state i,action a).

Let me see if I can understand that. I'll speculate.

We have a physical process that (in reality) has states defined by continuous variables. For example, it might be defined by (temperature, concentration of substance A, concentration of substance B). I don't know how the "actions" are parameterized. Are they also continuous? - like add heat at the rate of 0.342 BTUs per second? The physical process is a continuous-time Markov process.

I don't know how such a process is described using continuous variables. Perhaps there is a function that gives the "instantaneous" probability of a transition to state (t1,A1,B1) given you are in state (t0,A0,B0) and take action F. I can see that for a fixed (t0,A0,B0) and F there is a probability distribution over all possible states (t1,A1,B1). That distribution would have an entropy. Do you want entropy as a function of (t0,A0,B0,F)?

I don't know what is meant by the "elbow".
 
Yes, you're right--my states are actually continuous. I am trying to discretize them so I can use learning algorithms that operate on tables. That being the case, the set of actions I make available to the agent is also discrete. My system is in discrete time, also.

My idea behind calculating entropy of P(state j | state i,action a) is that, since the system is a Markov process, if I have represented the state space well enough (i.e., have enough vectors in the discretization), the transition probabilities (which are in reality deterministic for the 'real' states) will become very sparse. If I underrepresent the states, I will classify two different states as the same, and consequently, my transition probabilities will not be as sparse. i.e., transition probabilities will be more 'uncertain'--have more entropy.

Ideally, the entropy would then be monotonically decreasing as more vectors are added to the discretization. By 'elbow', I mean the point in a monotonically decreasing graph right before the curve begins to flatten out, i.e., the point of highest curvature. Analogous to identifying the number of labels in a clustering problem by looking at # of labels vs. inter-cluster error.
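One simple way to locate such an elbow numerically is to take the point of largest discrete second difference (a rough curvature heuristic; editor's sketch, with invented entropy values):

```python
import numpy as np

def elbow_index(y):
    """Index of the largest discrete curvature in a monotonically
    decreasing curve y (a simple second-difference heuristic)."""
    y = np.asarray(y, dtype=float)
    curvature = np.abs(np.diff(y, 2))   # |y[i-1] - 2*y[i] + y[i+1]|
    return int(np.argmax(curvature)) + 1

# Hypothetical entropy-vs-number-of-discretization-vectors curve:
entropies = [5.0, 3.0, 1.6, 0.9, 0.8, 0.75, 0.72]
print(elbow_index(entropies))           # → 2: the curve flattens after that point
```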

The problem, though, is that the graph looks like an upside-down elbow, because the entropy calculated on more bins keeps increasing. I don't think all is lost, though, since the distribution itself determines how much entropy increases when bins are added. For example, a uniform distribution shows the largest increase--its binned entropy grows like the logarithm of the number of bins. The more concentrated the density is, the less adding bins affects the entropy.
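For the uniform case this is exact: binning a uniform density on [0, 1) into n equal bins gives entropy log(n), the largest possible for n bins. A quick check (assuming NumPy; editor's illustration):

```python
import numpy as np

def binned_entropy_uniform(n_bins):
    """Entropy (nats) of a uniform density on [0, 1) split into equal bins."""
    p = np.full(n_bins, 1.0 / n_bins)   # each bin carries equal mass
    return -np.sum(p * np.log(p))

for n in (2, 8, 32):
    print(n, binned_entropy_uniform(n), np.log(n))
# The binned entropy equals log(n) exactly, so refining the bins
# raises the entropy faster than for any concentrated density.
```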
 
obesogen said:
My idea behind calculating entropy of P(state j | state i,action a) is that, since the system is a Markov process, if I have represented the state space well enough (i.e., have enough vectors in the discretization), the transition probabilities (which are in reality deterministic for the 'real' states) will become very sparse.

Whoa! We can't talk about concepts involving probability unless we keep a clear picture.

In the first place, let's clarify what probability distribution (or distributions) we are talking about. P( state j | state i, action a) is a single number for a given j,i,a. So which of these are we varying in order to create a distribution? It would seem most natural to keep i and 'a' constant while varying j. But this means you get a possibly different distribution for each i and 'a'. In that case what entropy are we concerned with? The max or min entropy taken over the whole collection of distributions defined by various i and 'a' ?

If the physical process is really deterministic then how are you introducing probability into the model? If a bin i is defined as all physical states within some volume of the (actual) state space (for example, all (t,A,B) such that t0 - dt < t < t0 + dt; A0 - dA < A < A0 + dA; B0 - dB < B < B0 + dB) then are you introducing probability by saying that, given the system is in bin i, the probability of the actual state is uniformly spread over the volume defined by the bin?

I don't understand your assertion that the entropy calculated with more bins must be increasing - partly because I don't understand what distribution(s) are involved and I don't understand how probability enters into the model. Can you give a simple example of a scenario (not necessarily from your model) where finer binning increases the entropy of a distribution?
 
