Logarithmic Binning: Guide & Reference | Physics Forums

In summary, the author is trying to use logarithmic binning to estimate the exponent of a power law from a histogram, but finds that the log-binned histogram overestimates the frequency at small x-values. The data turn out to be integer-valued, and he eventually solves the problem by using linear binning with unit-width bins at small x and logarithmic binning in the tail.
  • #1
grquanti
Hello everybody,

I have a problem with the logarithmic binning of some data (which are expected to follow a power law). I found this thread: https://www.physicsforums.com/threads/exponential-binning.691834/
What "mute" says there is exactly what I need: bins equally spaced on a log scale, to estimate the exponent of a power law from the histogram. However, following his recipe does not give me equally spaced bins.
The real problem, though, is that I can't find a reference anywhere that explains how to build a histogram with logarithmic binning.
Can someone help me with a suggestion or a reference?

thanks!
 
  • #2
In the linked thread, the right side of the equation should have k instead of the logarithm of k.
 
  • #3
OK, now it makes sense, but it is still not perfect. The content of the first bins, when divided by the bin length, becomes higher than the correct value.

https://arxiv.org/pdf/1011.1533.pdf Here the author says that it is convenient to use normal (linear) binning until the noise becomes relevant (but doesn't explain why). Done this way, it works.
Can someone explain why, or give me a reference on all this?
Thanks.
 
  • #4
grquanti said:
The content of the first bins, when divided by the bin length, becomes higher than the correct value.
What are you comparing with what? I would use an unbinned fit, but it is hard to say more without a better description of your problem.
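For readers wondering what an unbinned fit might look like, here is a minimal sketch. It assumes continuous data that follow a pure power law above a known lower cutoff, and uses the standard maximum-likelihood estimator alpha = 1 + n / sum(ln(x_i / x_min)). The function name `powerlaw_mle`, the cutoff and the synthetic samples are illustrative assumptions, not something taken from the thread; integer-valued data would need the discrete version of the estimator.

```python
import numpy as np

def powerlaw_mle(x, x_min):
    """Unbinned estimate of alpha for p(x) ~ x**(-alpha), x >= x_min (continuous case)."""
    x = np.asarray(x, dtype=float)
    tail = x[x >= x_min]                      # only data above the cutoff enter the fit
    n = tail.size
    alpha = 1.0 + n / np.sum(np.log(tail / x_min))
    sigma = (alpha - 1.0) / np.sqrt(n)        # leading-order standard error
    return alpha, sigma

# Synthetic test data (assumed values): inverse-transform sampling of a power law.
rng = np.random.default_rng(0)
true_alpha, x_min = 2.5, 1.0
u = rng.random(100_000)
samples = x_min * (1.0 - u) ** (-1.0 / (true_alpha - 1.0))

alpha_hat, sigma = powerlaw_mle(samples, x_min)
print(f"alpha = {alpha_hat:.3f} +/- {sigma:.3f}")
```

No bins appear anywhere in this estimate, so the bin-width question does not arise; the price is that the cutoff x_min has to be chosen separately.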
 
  • #5
I compare it with the histogram I obtain when the bins are equally spaced (really equally spaced, not on a log scale).
The problem is more or less this: when I plot the histogram with equally spaced bins on a log scale, I get a straight line up to a certain value of x. Beyond that value the fluctuations become relevant and the straight line is lost: the data cover many y-values within a small x-interval (this is due to the fluctuations and to the log scale: in the interval 1000-10000 I have a very large number of fluctuating points in a unit-length segment of the x-axis).
To avoid this problem I use logarithmic binning: this way I always have the same number of points in a unit-length segment of the x-axis, because the bins get longer and longer. This lets me keep the straight line in the tail of the histogram as well. But, I repeat, for x too small I get a value that is not consistent with the value of the histogram without log-binning.
I'm new to the field, but as far as I know this is a standard way of working (you will surely understand if you take a look at the paper I posted: my non-log-binned histogram is like the one they present in figure 3).
Thanks.
 
  • #6
grquanti said:
But, I repeat, for x too small I get a value that is not consistent with the value of the histogram without log-binning.
What do you mean by "consistent"? What is inconsistent?
grquanti said:
(you will surely understand if you take a look at the paper I posted: my non-log-binned histogram is like the one they present in figure 3).
I understand figure 3, but I don't understand what you get if you don't show it, or at least describe it clearly. But showing it is much better.
 
  • #7
[Attached image: upload_2017-4-14_17-10-37.png]

As you can see, in the log-binned case I get an overestimation of the frequency at small x.
The article I posted makes a very general statement, without explanation: "data are best left unbinned for small x".
I think the behaviour I see arises because the bin size I divide by is smaller than one. However, I don't think that is a good justification, because a small bin should contain only a few data points, so these two effects should balance each other. Do you know anything that could help me?
Thanks.
 
  • #8
Are your x-values integers? Then you can run into the problem where you have a bin "between 0.8 and 1.1", for example, which gets all the content of the "1"-bin, but a bin width that is too small. Keeping the linear bin width avoids this problem.
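The effect described here can be made concrete with a small sketch. The bin layout (10 logarithmic bins per decade between 1 and 100) and the variable names are assumptions chosen for illustration; the thread does not say which edges were actually used.

```python
import numpy as np

# Illustrative choice: 10 log-spaced bins per decade between 1 and 100.
edges = np.logspace(0, 2, 21)
widths = np.diff(edges)

# How many integers fall inside each bin?
ints_per_bin = np.histogram(np.arange(1, 100), bins=edges)[0]

for lo, hi, w, k in zip(edges[:5], edges[1:6], widths[:5], ints_per_bin[:5]):
    print(f"[{lo:6.3f}, {hi:6.3f})  width = {w:5.3f}  integers inside = {k}")

# The first bin holds every observation equal to 1 but is narrower than 1,
# so counts / width is inflated by this factor relative to a unit-width bin.
inflation_first_bin = 1.0 / widths[0]
print(f"inflation of the first bin: {inflation_first_bin:.2f}")
```

With these assumed edges the first bin is about 0.26 wide, so dividing its count by the width inflates the value by a factor of roughly 3.9, while the next two bins contain no integer at all and stay empty. Both symptoms disappear once the bins are at least one unit wide.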
 
  • #9
Yes, this is the case: I have integer variables.
In fact, I found that the best way to build the histogram is to use linear binning with unit bin length until the fluctuations become relevant, and then to smooth them via logarithmic binning. I found the same for a non-integer variable, but in that case I also needed to divide the data in the constant-width bins by the bin length. This way the area under my histogram is one.
Do you think there is anything wrong with my method?
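A minimal sketch of such a hybrid scheme for integer data follows. The helper names `hybrid_edges` and `density`, the switch point of 20, the 10 bins per decade and the Zipf-distributed test data are all assumptions made for illustration, not the poster's actual code.

```python
import numpy as np

def hybrid_edges(x_switch, x_max, bins_per_decade=10):
    """Unit-width bins centred on 1..x_switch, then log-spaced bins up to x_max."""
    linear = np.arange(0.5, x_switch + 1.0)   # 0.5, 1.5, ..., x_switch + 0.5
    start = linear[-1]
    n_log = int(np.ceil(bins_per_decade * np.log10(x_max / start)))
    log_part = start * 10.0 ** (np.arange(1, n_log + 1) / bins_per_decade)
    return np.concatenate([linear, log_part])

def density(x, edges):
    """Counts divided by (total count * bin width), so the histogram has unit area."""
    counts, _ = np.histogram(x, bins=edges)
    return counts / (counts.sum() * np.diff(edges))

# Synthetic integer data (assumed): Zipf-distributed samples with exponent 2.5.
rng = np.random.default_rng(0)
x = rng.zipf(2.5, size=50_000)

edges = hybrid_edges(x_switch=20, x_max=x.max() + 0.5)
d = density(x, edges)
area = np.sum(d * np.diff(edges))
print(f"{edges.size - 1} bins, area under histogram = {area:.6f}")
```

In the unit-width part the division by the bin width changes nothing, which matches the observation that integer data needed no extra normalisation there. In this sketch the logarithmic part starts at 20.5, where every bin is already several units wide, so each one contains at least one integer.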
 
  • #11
Thanks for everything!
 

1. What is logarithmic binning?

Logarithmic binning is a way of grouping data into histogram bins whose edges are equally spaced on a logarithmic scale, so that each bin is wider than the previous one by a constant factor. It is useful for visualizing and analyzing data that span several orders of magnitude: the widening bins collect more counts in the sparsely populated tail, which reduces the statistical noise there and makes patterns more apparent.

2. How is logarithmic binning different from linear binning?

Linear binning groups data points into bins of equal width, while logarithmic binning uses bins of equal width in log x, so the ratio between consecutive bin edges is constant. Logarithmic binning therefore copes better with data that vary greatly in magnitude, giving narrow bins for small values and wide bins for large values.
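The difference between the two layouts can be sketched in a few lines of NumPy. The range (1 to 10^4) and the number of bins are arbitrary illustrative values.

```python
import numpy as np

x_min, x_max, n_bins = 1.0, 1.0e4, 8          # illustrative values

linear_edges = np.linspace(x_min, x_max, n_bins + 1)   # equal steps in x
log_edges = np.geomspace(x_min, x_max, n_bins + 1)     # equal steps in log x

linear_widths = np.diff(linear_edges)          # all widths identical
log_widths = np.diff(log_edges)                # widths grow from bin to bin
log_ratios = log_edges[1:] / log_edges[:-1]    # constant edge ratio

print("linear widths:", np.round(linear_widths, 2))
print("log widths:   ", np.round(log_widths, 2))
print("log ratios:   ", np.round(log_ratios, 4))
```

Because the logarithmic bins have different widths, raw counts from the two layouts are not directly comparable; dividing each count by its bin width puts both on the same density scale.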

3. What are the advantages of using logarithmic binning?

Logarithmic binning reduces the noise in sparsely populated regions and makes patterns more apparent in data that cover a wide range of values. It also allows for better visualization and analysis of heavy-tailed data such as power laws, which appear as a straight line on log-log axes once the counts are divided by the bin widths.
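As a rough illustration of the power-law case, the sketch below log-bins synthetic continuous data and reads the exponent off the slope on log-log axes. The sample size, the exponent 2.5, the binning range and the use of geometric bin centres are all assumptions made for the example; a least-squares fit to a binned histogram is only a quick check, not a substitute for a proper fit.

```python
import numpy as np

# Synthetic continuous power-law samples (assumed): p(x) ~ x**(-2.5) for x >= 1.
rng = np.random.default_rng(1)
alpha, x_min = 2.5, 1.0
x = x_min * (1.0 - rng.random(500_000)) ** (-1.0 / (alpha - 1.0))

# 10 log-spaced bins per decade from 1 to 100; samples above 100 are not binned.
edges = np.geomspace(x_min, 1.0e2, 21)
counts, _ = np.histogram(x, bins=edges)

# Divide by bin width and by the total sample size to get a density.
density = counts / (x.size * np.diff(edges))
centers = np.sqrt(edges[:-1] * edges[1:])      # geometric bin centres

# Straight-line fit in log-log space; the slope estimates -alpha.
mask = counts > 0
slope, intercept = np.polyfit(np.log10(centers[mask]), np.log10(density[mask]), 1)
print(f"fitted slope = {slope:.3f}")
```

Skipping the division by the bin width would make the fitted slope shallower by one, because the bin widths themselves grow in proportion to x.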

4. How do you choose the number of bins for logarithmic binning?

The number of bins for logarithmic binning can be chosen based on the desired level of granularity in the data. It is important to ensure that there are a sufficient number of data points in each bin for accurate analysis. It may also be helpful to use a binning algorithm that automatically determines the optimal number of bins.

5. Can logarithmic binning be applied to all types of data?

Logarithmic binning is most commonly used for data that follow a power-law or other heavy-tailed distribution. It can also be applied to other data that span a wide range of values, provided the values are strictly positive. For integer-valued data, logarithmic bins narrower than one unit can misrepresent the density at small values, as discussed in the thread above. It is important to consider the nature of the data and whether logarithmic binning is appropriate for the specific analysis being conducted.
