Normalizing histograms and finding the best fit distribution

In summary, normalizing a histogram makes it possible to compare the distributions of datasets with different sample sizes. This is done by dividing each bin count by the total number of observations, so that the relative frequencies sum to 1. A common starting point for the number of bins is the square root of the number of data points, adjusted as needed after visual inspection. To find the best fit distribution, parameters can be estimated with the method of moments and candidate fits can be checked with a goodness-of-fit test such as the Kolmogorov-Smirnov test. Common distributions used for fitting histograms include the normal, exponential, log-normal, and gamma distributions. However, it is important to consider the assumptions and limitations of the chosen distribution, carefully select the number of bins, and account for outliers and
  • #1
shegal
I have recorded the number of packets arriving at a router in each one-second time window. I then made histograms showing how often each packet count occurs. I want to normalize these histograms to get probability mass functions. How can I do this? And how do I then check which distributions give the best fit to these histograms?
 

Attachments

  • LL_DDoS_V2_Inside.JPG
  • LL_DDoS_V2_Outside.JPG
  • LL_DDoS_V1_Inside.JPG
  • #2
To normalize, express the number of occurrences in each bin as a percentage of the total, so that in the end they add up to 100%.

I'd start with the lognormal distribution.
 
  • #3


Normalizing a histogram means dividing the frequency of each bin (or bar) by the total number of data points, which yields an empirical probability mass function (PMF). The bin heights then sum to 1, which makes histograms built from different numbers of observations directly comparable. If the bins are wider than one unit and you want the total area to equal 1, as for a density, divide by the bin width as well.
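As a minimal sketch of this normalization, the snippet below counts how often each packet count occurs and divides by the number of one-second windows. The function name `empirical_pmf` and the sample values are made up for illustration; they are not from the original data.

```python
from collections import Counter

def empirical_pmf(counts_per_second):
    """Map each distinct packet count to its relative frequency."""
    n = len(counts_per_second)
    occurrences = Counter(counts_per_second)
    # Divide each occurrence count by the total number of windows.
    return {value: occ / n for value, occ in sorted(occurrences.items())}

# Hypothetical sample: packets seen in eight one-second windows.
sample = [3, 5, 3, 4, 5, 5, 2, 3]
pmf = empirical_pmf(sample)
for value, prob in pmf.items():
    print(value, prob)
```

The resulting probabilities sum to 1, so PMFs from traces of different lengths can be plotted on the same axes.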

To determine the best fit distribution for your histograms, there are a few methods you can use. One approach is to visually compare the shape of your normalized histogram to the probability density function (PDF), or for discrete data the PMF, of various distributions. The distribution whose shape is closest to your histogram may be a good fit. However, this method is subjective and can be misleading, so treat it as a first screening step.

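Another approach is to use a statistical goodness-of-fit test, such as the Kolmogorov-Smirnov test or the chi-square test, to compare your data to a candidate distribution. The Kolmogorov-Smirnov test compares the empirical cumulative distribution function to the theoretical one and is designed for continuous distributions; the chi-square test compares observed and expected bin counts and is a natural choice for count data such as packets per second. Each test provides a p-value. A low p-value is evidence against the candidate distribution, so a higher p-value indicates that the data are more consistent with it. A high p-value does not prove that the distribution is correct.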
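A chi-square goodness-of-fit test for count data might be sketched as follows, using `scipy.stats.chisquare`. The Poisson candidate, the helper name `poisson_chi_square`, the `max_value` cutoff, and the synthetic data are all assumptions for illustration; the thread does not say which distribution the packet counts follow.

```python
import numpy as np
from scipy import stats

def poisson_chi_square(counts_per_second, max_value):
    """Chi-square test of the counts against a Poisson with fitted rate.

    Values 0..max_value-1 get one bin each; all values >= max_value
    share a single tail bin so that no expected count is tiny.
    """
    counts = np.asarray(counts_per_second)
    n = counts.size
    lam = counts.mean()  # maximum-likelihood estimate of the Poisson rate

    # Observed frequencies per bin, tail bin last.
    observed = np.array(
        [(counts == k).sum() for k in range(max_value)]
        + [(counts >= max_value).sum()]
    )
    # Expected frequencies; sf(max_value - 1) is P(X >= max_value).
    probs = np.append(
        stats.poisson.pmf(np.arange(max_value), lam),
        stats.poisson.sf(max_value - 1, lam),
    )
    expected = n * probs

    # ddof=1 because one parameter (lam) was estimated from the data.
    return stats.chisquare(observed, expected, ddof=1)

# Synthetic stand-in for real packet counts (not the poster's data).
rng = np.random.default_rng(0)
sample = rng.poisson(lam=4.0, size=2000)
stat, p_value = poisson_chi_square(sample, max_value=10)
print(stat, p_value)
```

A small p-value here would be evidence against the Poisson candidate. When comparing several candidate distributions, the same binning should be used for each so the results are comparable.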

Ultimately, the best fit distribution will depend on the specific characteristics of your data. It may be helpful to consult with a statistician or use statistical software to aid in this process. Also, keep in mind that the best fit distribution may not always be a perfect fit, so it's important to consider the context and limitations of your data when selecting a distribution.
 

What is the purpose of normalizing a histogram?

Normalizing a histogram makes it possible to compare the distributions of datasets that have different sample sizes. It allows for a fair comparison by converting raw counts into relative frequencies that sum to 1.

What is the best way to determine the appropriate number of bins for a histogram?

The number of bins for a histogram can vary depending on the data and the desired level of detail. One common method is to use the square root of the number of data points as a starting point, and then adjust as needed based on visual inspection of the histogram. For integer-valued data such as packet counts, one bin per integer value is often the natural choice.
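The square-root rule is simple enough to sketch in a few lines. The helper names `sqrt_bin_count` and `integer_bin_edges` are hypothetical; the second one shows an alternative for integer-valued data, where each integer gets its own bin centered on the value.

```python
import math

def sqrt_bin_count(n_points):
    """Square-root rule: about sqrt(n) bins, rounded up, at least 1."""
    return max(1, math.ceil(math.sqrt(n_points)))

def integer_bin_edges(counts):
    """Bin edges at k - 0.5 so each integer value falls in its own bin."""
    lo, hi = min(counts), max(counts)
    return [k - 0.5 for k in range(lo, hi + 2)]

print(sqrt_bin_count(600))           # starting point for 600 observations
print(integer_bin_edges([2, 3, 5]))  # one bin each for 2, 3, 4, 5
```

Either result is only a starting point; the bin count should still be adjusted after looking at the plot.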

How do you find the best fit distribution for a given histogram?

There are various methods for finding the best fit distribution for a histogram. One approach is the method of moments, in which the parameters of a candidate distribution are chosen so that its theoretical moments (such as the mean and variance) match the sample moments of the data. Another approach is to use the Kolmogorov-Smirnov test to compare the empirical distribution of the data to the fitted theoretical distributions.
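As an illustration of the method of moments, the sketch below fits a gamma distribution by matching its mean (shape × scale) and variance (shape × scale²) to the sample mean and variance. The gamma candidate, the function name `gamma_method_of_moments`, and the data are chosen only for illustration.

```python
import statistics

def gamma_method_of_moments(data):
    """Estimate gamma shape and scale by matching mean and variance.

    For a gamma distribution: mean = shape * scale and
    variance = shape * scale**2, which gives the formulas below.
    """
    mean = statistics.fmean(data)
    var = statistics.variance(data)  # sample variance (n - 1 denominator)
    shape = mean ** 2 / var
    scale = var / mean
    return shape, scale

# Hypothetical measurements, used only to exercise the function.
shape, scale = gamma_method_of_moments([1, 2, 3, 4, 5])
print(shape, scale)
```

The fitted parameters can then be passed to a goodness-of-fit test to check how well the candidate describes the data.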

What are some common distributions used for fitting histograms?

Some commonly used distributions for fitting histograms include the normal (Gaussian) distribution, the exponential distribution, the log-normal distribution, and the gamma distribution. These distributions are often used due to their flexibility and ability to fit a wide range of data distributions.

What are some potential issues to consider when normalizing a histogram and finding the best fit distribution?

When normalizing a histogram and finding the best fit distribution, it is important to consider the assumptions and limitations of the chosen distribution. It is also important to carefully select the number of bins and the method for determining the best fit, as these can greatly affect the results. Outliers and skewness in the data can also distort both the histogram and the fitted parameters.
