Choosing a Probability Distribution for Visualizing Discrete Data Sets

  • Context: Graduate
  • Thread starter: Somefantastik
  • Tags: Data, Discrete, Set
SUMMARY

This discussion covers visualizing a discrete data set probabilistically, using histograms and cumulative distribution function (CDF) plots. The original poster asks how to normalize the data and how to fit a probability density function (PDF) to a discrete data set. The reply confirms that the data can be normalized before the histogram is built, so that the y-axis shows estimated probabilities: count the frequency of each distinct value, then divide each frequency by the total number of measurements.

PREREQUISITES
  • Understanding of discrete probability distributions
  • Familiarity with histogram and cumulative distribution function (CDF) concepts
  • Basic knowledge of data normalization techniques
  • Experience with data visualization tools (e.g., Matplotlib or Seaborn)
NEXT STEPS
  • Learn how to create histograms using Matplotlib in Python
  • Explore normalization techniques for discrete datasets
  • Study how to fit probability density functions to discrete data
  • Investigate the differences between discrete and continuous probability distributions
USEFUL FOR

Data analysts, statisticians, and anyone involved in visualizing discrete datasets who seeks to enhance their understanding of probability distributions and data normalization techniques.

Somefantastik
I have a discrete set of data, and I'd like to visualize it probabilistically. Unfortunately, I focused on numerical methods in grad school and am very weak in probability. Where is a good place to start to visualize this data set using a discrete pdf?

I know a histogram is good for showing the number of occurrences of each outcome. I also know a cdf plot shows the probability of the outcome being less than some number. But when I start looking at plotting pdfs, there are many functions to choose from, and I'm not sure how to go about choosing one, or how to translate that to a discrete data set rather than a continuous one.
 
OK, so I know that I should make a histogram, normalize it, and then fit a curve to the resulting distribution.

Now my question is, can I normalize my data before making a histogram, and will that process give me the probabilities on the y-axis?
 
Yes, you can normalize before making a histogram. Suppose, for instance, that you have N measurements, which come as n distinct values x_1, x_2, ..., x_n with frequencies f_1, f_2, ..., f_n. The frequencies are positive integers that add up to N. If you divide each frequency by N, you now have (estimated) probabilities p_i for each x_i that add up to 1. When you make your histogram you'll be binning the x_i, and you get the probability of that bin by adding up all the p_i that go in it. That's the estimated probability of falling into that bin.
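For what it's worth, here is a minimal sketch of that recipe in plain Python. The sample values, the bin edges, and the helper names `empirical_pmf` and `bin_probabilities` are all made up for illustration; they are not from the original poster's data. The running sum at the end is one way to get an empirical CDF from the same p_i.

```python
from collections import Counter
from itertools import accumulate


def empirical_pmf(samples):
    """Return {x_i: f_i / N} for a list of discrete samples, sorted by x_i."""
    n_total = len(samples)          # N, the number of measurements
    counts = Counter(samples)       # f_i for each distinct value x_i
    return {x: counts[x] / n_total for x in sorted(counts)}


def bin_probabilities(pmf, edges):
    """Add up the p_i of every x_i that falls in each half-open bin [lo, hi)."""
    probs = [0.0] * (len(edges) - 1)
    for x, p in pmf.items():
        for k in range(len(edges) - 1):
            if edges[k] <= x < edges[k + 1]:
                probs[k] += p
                break
    return probs


# Hypothetical data set with N = 10 measurements.
samples = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7]

pmf = empirical_pmf(samples)                         # estimated p_i per x_i
bins = bin_probabilities(pmf, edges=[1, 3, 5, 8])    # estimated bin probabilities
cdf = list(accumulate(pmf.values()))                 # running sum: P(X <= x_i)

for x, p, c in zip(pmf, pmf.values(), cdf):
    print(f"x = {x}: p = {p:.2f}, cdf = {c:.2f}")
```

With these made-up edges, the bin probabilities still add up to 1 because every x_i lands in exactly one bin. If the edges do not cover all the data, the total will come out below 1.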
 
