Probability Density Function: Converting Experimental Observations to PDF


Discussion Overview

The discussion revolves around the concept of converting experimental observations of a continuous random variable into a probability density function (PDF). Participants explore methods for creating a PDF from data, particularly through the use of histograms and normalization techniques. The context includes practical applications in data science and statistics.

Discussion Character

  • Exploratory
  • Technical explanation
  • Homework-related

Main Points Raised

  • One participant seeks clarification on how to convert experimental observations into a PDF, specifically asking for a small example.
  • Another participant mentions the normal distribution as a relevant concept.
  • A suggestion is made to create a histogram and normalize the frequencies by the total sample size to approximate the PDF, emphasizing the importance of appropriate cell range selection.
  • A later reply reiterates the histogram method and provides a link to a dataset, asking for guidance on creating a PDF and understanding the meaning of the y-axis in the context of probability density.
  • One participant outlines a rough process for creating a PDF using statistical software, mentioning specific steps such as determining the range of data, dividing it into sub-ranges, counting data points, and plotting results.
  • There is a question about the representation of the y-axis in the PDF, indicating a need for further clarification on this aspect.

Areas of Agreement / Disagreement

Participants generally agree on the method of using histograms to approximate a PDF, but there is no consensus on the specific software tools or the interpretation of the y-axis in the PDF.

Contextual Notes

Limitations include the dependence on the choice of software for creating the PDF and the need for appropriate bin sizes in histograms, which may affect the approximation of the continuous PDF.

naveendeveloper
Hi All,
I am currently doing a Master's in data science. I came across the probability density function (PDF), which is used to find the probability that a continuous random variable falls within a range.
The PDF is plotted with probability density on the y-axis and the random variable on the x-axis.
I am not able to understand how to convert an experiment's observations of a continuous random variable into a probability density function.
Kindly help me understand with a small example.
Thank you
 
Do you know about the normal distribution, for example?
 
Make a histogram and divide every frequency by the total sample size; dividing that fraction by the cell width as well makes the total area equal 1. That will approximate the PDF. The histogram cell width should be chosen so that each cell holds enough samples that the heights do not jump up and down too much, but not so wide that there are too few cells to approximate the continuous PDF.
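To illustrate the trade-off in cell width, here is a minimal sketch in Python with NumPy. The data are synthetic (1000 normally distributed values with a hypothetical mean of 170 and standard deviation of 10), not the dataset from this thread, and the cell counts 3, 20 and 2000 are arbitrary choices for comparison.

```python
import numpy as np

# Synthetic stand-in data (assumption): 1000 heights in cm.
rng = np.random.default_rng(1)
samples = rng.normal(loc=170.0, scale=10.0, size=1000)

summary = {}
for num_cells in (3, 20, 2000):
    # density=True divides each count by (sample size * cell width),
    # so the bar areas add up to 1.
    density, edges = np.histogram(samples, bins=num_cells, density=True)
    area = float(np.sum(density * np.diff(edges)))
    empty_cells = int(np.sum(density == 0))
    summary[num_cells] = (area, empty_cells)
    print(f"{num_cells:5d} cells: area = {area:.3f}, empty cells = {empty_cells}")
```

Every choice gives an area of 1, but with 2000 cells most of them are empty, so the estimate jumps between zero and isolated spikes, while 3 cells are too coarse to show the shape of the distribution.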
 
FactChecker said:
Make a histogram and divide every frequency by the total sample size; dividing that fraction by the cell width as well makes the total area equal 1. That will approximate the PDF. The histogram cell width should be chosen so that each cell holds enough samples that the heights do not jump up and down too much, but not so wide that there are too few cells to approximate the continuous PDF.
Hi,
Thank you so much for your explanation. I have attached an Excel sheet with the heights of 100k employees at the following link: https://docs.google.com/spreadsheets/d/142Ay2BOh5rOd1weO4f7Jbe2-roYoTDRo/edit?usp=sharing&ouid=116301201506347494587&rtpof=true&sd=true
Could you kindly help me understand how to create the PDF by creating a histogram and normalising its area to 1? (Just the logic to do that would be really helpful.)

One other query: after creating the PDF, what does the y-axis (probability density) represent?

Thanks
Naveen
 
The steps would depend a lot on what statistics software package you are using. I like R, which is free, well respected, and well documented. R has a function, densityplot (in the lattice package), that does it. I don't know what is available in Excel.
If you are doing it all yourself, this is a rough description of the process.
1) Get the range of the height data, heightMin and heightMax.
2) Divide the range evenly into some number of sub-range cells (with 1000 data points, try 20 cells as a first attempt and adjust if desired).
3) Count the number of data points in each cell.
4) Convert the cell counts into fractions by dividing by the total number of data points (1000 in your example). To get a density whose total area is 1, also divide each fraction by the cell width.
5) Plot the results.
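The steps above can be sketched in Python with NumPy as follows. This is only an illustration: the function name `histogram_density`, the synthetic heights (1000 normally distributed values with a hypothetical mean of 170 cm and standard deviation of 10 cm), and the default of 20 cells are assumptions, not the dataset linked earlier in the thread.

```python
import numpy as np

def histogram_density(samples, num_cells=20):
    """Approximate a PDF from observations with an equal-width histogram."""
    samples = np.asarray(samples, dtype=float)
    # 1) range of the data
    height_min, height_max = samples.min(), samples.max()
    # 2) divide the range evenly into num_cells cells
    edges = np.linspace(height_min, height_max, num_cells + 1)
    # 3) count the data points in each cell
    counts, _ = np.histogram(samples, bins=edges)
    # 4) fractions of the sample, then density = fraction / cell width
    fractions = counts / samples.size
    cell_width = edges[1] - edges[0]
    density = fractions / cell_width
    return edges, fractions, density

# Synthetic stand-in data (assumption), not the real spreadsheet.
rng = np.random.default_rng(0)
heights = rng.normal(loc=170.0, scale=10.0, size=1000)

edges, fractions, density = histogram_density(heights)
print("sum of fractions:", fractions.sum())
print("total area under density:", (density * np.diff(edges)).sum())
```

For step 5, the `density` values can be drawn as bars starting at `edges[:-1]` with the cell width as bar width in any plotting tool. The fractions sum to 1, and so does the area under the density bars, which is what "normalising the area to 1" means.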

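Have you had any classes in probability and statistics? The probability density function shows how the results are spread over the possible values: the area under the curve between two values is the fraction of results expected to fall in that range.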
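As a small worked example of what the y-axis means, the sketch below uses the normal distribution mentioned earlier in the thread, with a hypothetical mean of 1.70 m and standard deviation of 0.10 m (assumed values, not fitted to any data). It shows that a density value is not itself a probability: it can exceed 1, and only density multiplied by an interval width (an area) gives a probability.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mean, std):
    """Probability that a normal variable is <= x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

mean, std = 1.70, 0.10  # hypothetical heights in metres

# The density at the peak is larger than 1, so it cannot be a probability.
peak_density = normal_pdf(mean, mean, std)

# Probability of a height between 1.70 m and 1.71 m, computed exactly ...
exact = normal_cdf(1.71, mean, std) - normal_cdf(1.70, mean, std)
# ... and approximated as density at the midpoint times interval width.
approx = normal_pdf(1.705, mean, std) * 0.01

print(f"peak density: {peak_density:.3f} per metre")
print(f"P(1.70 <= height <= 1.71): exact {exact:.5f}, area approx {approx:.5f}")
```

The units of the y-axis are "probability per unit of x" (here, per metre), which is why the same data measured in centimetres would give density values 100 times smaller while the probabilities stay the same.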
 