Entropy of a histogram with different bin sizes

In summary: the poster is calculating the entropy of the transition distribution P(state j | state i, action a) for a discretized dynamical system. Ideally this entropy would fall toward zero as the discretization of the state space improves, but refining the bins by itself pushes the measured histogram entropy up, so the poster is looking for an expression or correction relating a histogram's entropy to its number of bins.
  • #1
obesogen
I'm wondering if there's an expression/correction for finding the entropy of a density using histograms of different bin sizes. I know that as the number of bins increases, entropy will also increase (given a sufficient number of data points), but is there an expression relating the two? All I can find is work on finding the optimal bin size given some data.
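
For concreteness, here is a minimal sketch of the effect (my own toy example in Python with numpy and made-up data, nothing from a reference): for a fixed sample, the plug-in Shannon entropy of the histogram grows as the binning gets finer.

[code]
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=10_000)  # stand-in data; the real density is unknown

def histogram_entropy(data, n_bins):
    """Shannon entropy (in nats) of the empirical bin probabilities."""
    counts, _ = np.histogram(data, bins=n_bins)
    p = counts[counts > 0] / counts.sum()  # drop empty bins; 0*log(0) is taken as 0
    return -np.sum(p * np.log(p))

for n_bins in (10, 100, 1000):
    print(n_bins, histogram_entropy(samples, n_bins))
# the printed entropy keeps rising with the number of bins
[/code]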
 
  • #2
Any ideas at all?
 
  • #3
Your question is slightly ambiguous. On the one hand, you might have a formula for a probability density [itex] f(x) [/itex] and be trying to approximate [itex] -\int f(x) \ln{f(x)}\,dx [/itex] by using the area of a finite number of rectangles. Or you might have a set of data with an unknown probability distribution and you might be trying to estimate the entropy of that unknown distribution. Or you might declare that the data defines an empirical distribution (i.e. a random sample from the distribution will be one of the data points selected at random) and you want to know the entropy of the empirical distribution.
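
To make the three readings concrete, here is a rough numerical sketch (my own toy example using a standard normal density; its exact differential entropy, 0.5*ln(2*pi*e), is used only as a sanity check):

[code]
import numpy as np

f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # example density

# (1) Approximate the differential entropy -∫ f ln f dx with rectangles.
x = np.linspace(-8, 8, 4001)
dx = x[1] - x[0]
fx = f(x)
diff_entropy = -np.sum(fx * np.log(fx)) * dx          # ≈ 0.5*ln(2*pi*e) ≈ 1.419

# (2) Estimate the same quantity from data: with bin probabilities p_k and
#     bin width w, -sum p_k ln p_k + ln w is a crude plug-in estimator.
rng = np.random.default_rng(1)
data = rng.normal(size=50_000)
counts, edges = np.histogram(data, bins=100)
w = edges[1] - edges[0]
p = counts[counts > 0] / counts.sum()
plugin_estimate = -np.sum(p * np.log(p)) + np.log(w)

# (3) Entropy of the empirical distribution itself: each of the n (distinct)
#     data points is equally likely, so the entropy is just ln n.
empirical_entropy = np.log(len(np.unique(data)))

print(diff_entropy, plugin_estimate, empirical_entropy)
[/code]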
 
  • #4
Thanks for the reply. The case is approximating an unknown probability distribution with a histogram. I'm actually trying to figure out how many 'bins' I need to discretize the state space in a dynamical system, so I am looking for the number of bins that sufficiently decreases the entropy (i.e., where the 'elbow' in a # of states vs. entropy plot would be). The problem is that adding more bins always increases the entropy, so I'm trying to find an expression to correct for this.
 
  • #5
obesogen said:
I need to discretize the state space in a dynamical system, so I am looking for the number of bins that sufficiently decreases the entropy (i.e., where the 'elbow' in a # of states vs. entropy plot would be).

I don't know much about dynamical systems, so I don't know what the criterion is for "sufficiently" decreasing the entropy. If we are talking about a continuous distribution, doesn't it have a definite entropy? - not one that can be adjusted by the number of bins?

Are you trying to find the (in some sense) optimum estimate for the entropy of a continuous distribution from a given set of data?
 
  • #6
Sorry for being vague. I'm looking at the entropy of P(state j | state i,action a), so for a system obeying the Markov property (which this one does), the probability distribution should become sparser and sparser as the state space is better represented, ideally being 1 for one state and 0 for the rest. The way I thought to represent this sparseness is entropy, although I am definitely open to other suggestions. So in this way, I'm really not looking at a fixed density, but I thought I could apply some ideas on histograms to this problem. As for the "sufficient" part, that was probably an abuse of terminology; I would be satisfied with finding an 'elbow' of an entropy vs. number of bins graph.
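
In case it helps, here is a hypothetical sketch of what I am computing (all names and data below are made up for illustration): the Shannon entropy of each row P( . | i, a) of an empirical transition table built from observed (state i, action a, next state j) triples.

[code]
import numpy as np
from collections import defaultdict

def transition_entropies(transitions):
    """transitions: iterable of (i, a, j) index triples.
    Returns {(i, a): Shannon entropy in nats of the empirical P(j | i, a)}."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, a, j in transitions:
        counts[(i, a)][j] += 1
    result = {}
    for key, next_counts in counts.items():
        c = np.array(list(next_counts.values()), dtype=float)
        p = c / c.sum()
        result[key] = -np.sum(p * np.log(p))  # 0 when a single j gets all the mass
    return result

# A (state, action) pair whose successor is always the same has entropy 0:
print(transition_entropies([(0, 0, 1), (0, 0, 1), (1, 0, 2), (1, 0, 3)]))
[/code]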
 
  • #7
obesogen said:
I'm looking at the entropy of P(state j | state i,action a).

Let me see if I can understand that. I'll speculate.

We have a physical process that (in reality) has states defined by continuous variables. For example, it might be defined by (temperature, concentration of substance A, concentration of substance B). I don't know how the "actions" are parameterized. Are they also continuous? - like add heat at the rate of 0.342 BTUs per second? The physical process is a continuous time Markov process.

I don't know how such a process is described using continuous variables. Perhaps there is a function that gives the "instantaneous" probability of a transition to state (t1,A1,B1) given that you are in state (t0,A0,B0) and take action F. I can see that for a fixed (t0,A0,B0) and F there is a probability distribution over all possible states (t1,A1,B1). That distribution would have an entropy. Do you want entropy as a function of (t0,A0,B0,F)?

I don't know what is meant by the "elbow".
 
  • #8
Yes, you're right--my states are actually continuous. I am trying to discretize them so I can use learning algorithms that operate on tables. That being the case, the actions I make available to the agent are also a discrete set. My system is in discrete time, also.

My idea behind calculating entropy of P(state j | state i,action a) is that, since the system is a Markov process, if I have represented the state space well enough (i.e., have enough vectors in the discretization), the transition probabilities (which are in reality deterministic for the 'real' states) will become very sparse. If I underrepresent the states, I will classify two different states as the same, and consequently, my transition probabilities will not be as sparse. i.e., transition probabilities will be more 'uncertain'--have more entropy.

Ideally, the entropy would then be monotonically decreasing for more vectors added to the discretization. By 'elbow', I mean the point in a monotonically decreasing graph right before the curve begins to flatten out, i.e., point of highest curvature. Analogous to identifying the number of labels in a clustering problem by looking at # of labels vs. inter-cluster error.
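
To be concrete about the 'elbow', here is a rough sketch of one way to pick it out numerically--a max-curvature heuristic with my own toy numbers, not anything prescribed:

[code]
import numpy as np

def elbow_index(y):
    """Index of maximum discrete curvature |y''| / (1 + y'^2)^(3/2),
    assuming unit spacing on the x axis."""
    y = np.asarray(y, dtype=float)
    d1 = np.gradient(y)
    d2 = np.gradient(d1)
    curvature = np.abs(d2) / (1.0 + d1**2) ** 1.5
    return int(np.argmax(curvature))

# Toy decreasing entropy-vs-bins curve that flattens out:
entropies = [3.0, 1.8, 1.1, 0.7, 0.55, 0.5, 0.48, 0.47]
print(elbow_index(entropies))  # picks a point near where the curve starts to flatten
[/code]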

The problem, though, is that the graph looks like an upside-down elbow, due to the entropy calculated on more bins increasing. I don't think all is lost, though, since the distribution itself determines how much entropy increases by adding more bins. For example, a uniform distribution has its entropy increase the most when more bins are added--with n equally likely bins the entropy is log(n), so it keeps growing as the binning is refined. The more concentrated the density is, the less adding bins will affect the entropy.
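
A quick toy illustration of that last point (made-up data, not my actual system): over a fixed range, near-uniform samples have a histogram entropy that tracks log(number of bins), while a tightly concentrated density gains far less entropy from extra bins until the bins become finer than its width.

[code]
import numpy as np

rng = np.random.default_rng(2)
uniform_data = rng.uniform(0, 1, size=100_000)
concentrated_data = rng.normal(0.5, 0.01, size=100_000)  # nearly all mass in a small region

def histogram_entropy(data, n_bins):
    counts, _ = np.histogram(data, bins=n_bins, range=(0.0, 1.0))
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

for n_bins in (4, 16, 64, 256):
    print(n_bins,
          histogram_entropy(uniform_data, n_bins),       # ~ log(n_bins)
          histogram_entropy(concentrated_data, n_bins))  # grows much more slowly
[/code]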
 
  • #9
obesogen said:
My idea behind calculating entropy of P(state j | state i,action a) is that, since the system is a Markov process, if I have represented the state space well enough (i.e., have enough vectors in the discretization), the transition probabilities (which are in reality deterministic for the 'real' states) will become very sparse.

Whoa! We can't talk about concepts involving probability unless we keep a clear picture.

In the first place, let's clarify what probability distribution (or distributions) we are talking about. P( state j | state i, action a) is a single number for a given j,i,a. So which of these are we varying in order to create a distribution? It would seem most natural to keep i and 'a' constant while varying j. But this means you get a possibly different distribution for each i and 'a'. In that case what entropy are we concerned with? The max or min entropy taken over the whole collection of distributions defined by various i and 'a' ?

If the physical process is really deterministic then how are you introducing probability in the model? If a bin i is defined as all physical states within some volume of the (actual) state space (for example, all (t,A,B) such that t0 - dt < t < t0 + dt; A0 - dA < A < A0 + dA; B0 - dB < B < B0 + dB) then are you introducing probability by saying that given the system is in bin i, the probability of the actual state is uniformly spread over the volume defined by the bin?

I don't understand your assertion that the entropy calculated with more bins must be increasing - partly because I don't understand what distribution(s) are involved and I don't understand how probability enters into the model. Can you give a simple example of a scenario (not necessarily from your model) where finer binning increases the entropy of a distribution?
 

1. What is entropy in the context of a histogram with different bin sizes?

Entropy in this context refers to the measure of randomness or disorder in the distribution of data within a histogram. It is a statistical concept that quantifies the uncertainty or unpredictability of the data.

2. How does the bin size affect the entropy of a histogram?

The bin size directly affects the entropy of a histogram. For a fixed data set, a smaller bin size (more bins) produces a more detailed and granular histogram with higher entropy, because the probability mass is spread across more bins. A larger bin size gives a smoother, less detailed histogram with lower entropy.

3. Can the entropy of a histogram be negative?

No, the entropy of a histogram cannot be negative. It is always non-negative, as it measures the amount of randomness or disorder in the data distribution. A value of 0 indicates no randomness (all of the data in a single bin), while higher values indicate more randomness.

4. How is entropy calculated for a histogram with different bin sizes?

Entropy is typically calculated using the Shannon entropy formula, which takes into account the probability of each bin in the histogram. It is computed by summing, over the bins, the product of each bin's probability and the logarithm of that probability, and taking the negative of the sum.
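
Written out: if the bins have probabilities [itex]p_1, \dots, p_n[/itex] (each bin's count divided by the total count), the Shannon entropy is [itex]H = -\sum_{k=1}^{n} p_k \log p_k[/itex], with the convention that a bin with [itex]p_k = 0[/itex] contributes zero.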

5. What insights can be gained from analyzing the entropy of a histogram with different bin sizes?

Analyzing the entropy of a histogram with different bin sizes can provide insights into the level of randomness or predictability in the data distribution. It can also help identify any outliers or unusual patterns in the data. Additionally, comparing the entropy of different histograms can aid in understanding the impact of bin size on the distribution of data.
