Understanding Probability Density Functions and Their Properties


Homework Help Overview

The discussion revolves around understanding probability density functions (PDFs) and their properties, particularly in the context of continuous random variables. Participants explore the definitions, significance, and mathematical relationships involving PDFs, means, and medians.

Discussion Character

  • Conceptual clarification, Mathematical reasoning, Assumption checking

Approaches and Questions Raised

  • Participants raise questions about how to find PDFs and their necessity. They discuss the meanings of mean and median in relation to PDFs and question the correctness of provided formulae. Some participants offer insights into the relationship between PDFs and cumulative distribution functions (CDFs), while others express uncertainty about the formal definitions and operational understanding of these concepts.

Discussion Status

The discussion is active, with participants providing detailed explanations and questioning assumptions. Some guidance has been offered regarding the mathematical relationships between PDFs and CDFs, but there is no explicit consensus on all points raised. Multiple interpretations and approaches are being explored.

Contextual Notes

Participants acknowledge limitations in their understanding of formal definitions related to PDFs and CDFs, and there is a recognition of the operational approach some are taking. The discussion includes references to external resources for further exploration.

Leo Liu
Homework Statement
N/A
Relevant Equations
##\int f(x) \cdot x \,\mathrm{d} x##
My questions are as follows:
1. How do we find them and why do we need them?
2. What are the meanings of the mean and the median of a PDF? Are the formulae below correct?
$$\int_{a}^{median} f(x) \mathrm{d}x = \int_{median}^{b} f(x) \mathrm{d}x$$
$$\int_{a}^{mean} f(x) \cdot x \mathrm{d}x = \int_{mean}^{b} f(x) \cdot x \mathrm{d}x = \frac 1 2$$

Thank you.
 
The idea is that the probability density function of a continuous random variable ##X##, ##f_X(x)##, is a probability per unit increment of ##x##. If ##f_X(x)## is constant within a certain interval, then the probability of the result being in that interval is just the probability per unit increment multiplied by the width of the interval, ##P = f_X(x) \Delta x##

If ##f_X(x)## is now a continuously varying function, you can imagine making ##\Delta x## small and summing lots of incremental probabilities in order to approximate the total probability of the result being within a certain interval $$P(x_1 \leq X \leq x_2) \approx \sum_{i=1}^n f_X(x_i) \Delta x$$ where ##x_i = x_1 + (i-1)\Delta x## and ##\Delta x = \frac{x_2-x_1}{n}## for ##n## strips, as in a Riemann sum. If you make the increments very small, i.e. ##\Delta x \rightarrow dx##, this becomes an integral $$P(x_1 \leq X \leq x_2) = \int_{x_1}^{x_2} f_X(x) dx$$ Now for your question. The median of a continuous random variable is the value of ##x## at which the cumulative probability is 0.5, so if ##X## takes values in ##[a,b]##, $$\int_a^{\text{median}} f_X(x) dx = \int_{\text{median}}^{b} f_X(x) dx = 0.5$$ The expectation (similar to the mean) of a continuous random variable ##X## also follows from the discrete case, which is $$E(X) = \sum_i p_i x_i$$ and which translates into the continuous setting as $$E(X) = \int_a^b xf_X(x) dx$$ where ##[a,b]## is the whole range of possible values the c.r.v. can take.

Expectations are evaluated over the whole range of possible values of ##X##, and the cumulative probability up to the expectation is not (necessarily) 50%. Moreover, ##\int x f_X(x)\, dx## over part of the range is a contribution to the expectation, not a probability, so there is no reason for it to equal ##\frac{1}{2}##. So the second line of formulae you quote is incorrect.
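To make this concrete, here is a quick numerical sketch in Python of the Riemann-sum picture above. The density ##f(x) = 2x## on ##[0,1]## is my own illustrative choice, not something from the thread; its CDF is ##F(x) = x^2##, so the exact median is ##\sqrt{0.5}## and the exact expectation is ##2/3##.

```python
import math

# Illustrative PDF (my choice): f(x) = 2x on [0, 1], which integrates to 1.
def f(x):
    return 2.0 * x

def riemann_prob(x1, x2, n=100_000):
    """Approximate P(x1 <= X <= x2) as sum of f(x_i) * dx, as in the post."""
    dx = (x2 - x1) / n
    return sum(f(x1 + i * dx) * dx for i in range(n))

# Exact CDF is F(x) = x^2, so P(0.2 <= X <= 0.8) = 0.64 - 0.04 = 0.6 exactly.
p = riemann_prob(0.2, 0.8)

# Median m solves F(m) = 0.5, i.e. m^2 = 0.5, so m = sqrt(0.5).
median = math.sqrt(0.5)

# Expectation E(X) = integral of x f(x) dx over [0, 1] = 2/3,
# again approximated by a left-endpoint Riemann sum.
n = 100_000
dx = 1.0 / n
expectation = sum((i * dx) * f(i * dx) * dx for i in range(n))

print(p, median, expectation)
```

Increasing ##n## shrinks the gap between the left-endpoint sum and the exact values, which is exactly the ##\Delta x \rightarrow dx## limit described above.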
 
etotheipi said:
The idea is that the probability density function of a continuous random variable ##X##, ##f_X(x)##, is a probability per unit increment of ##x##. [...]

I think this is good intuition, but the context in which this works is larger (for example, ##f_X## need not be continuous, merely measurable).
 
Math_QED said:
I think this is good intuition, but the context in which this works is larger (for example, ##f_X## need not be continuous, merely measurable).

I admit I know little about how such objects are formally defined, my understanding on this topic is limited to quite an operational approach. In my (limited) experience, thinking of ##f_X(x) dx## as a probability and relating this to the discrete case helps me to get a hold on what's going on.

Apologies if I butchered the maths!
 
Aha, thank you for your detailed answer!

etotheipi said:
The idea is that the probability density function of a continuous random variable ##X##, ##f_X(x)##, is a probability per unit increment of ##x##. If ##f_X(x)## is constant within a certain interval, then the probability of the result being in that interval is just the probability per unit increment multiplied by the width of the interval, ##P = f_X(x) \Delta x##
I would like to know if this explains why the PDF is the derivative of the CDF, because if ##\Delta P = f_X(x) \Delta x##, then ##f_X(x) = \lim_{\Delta x \to 0} \frac {\Delta P(x)} {\Delta x}##.
Also, if we already knew the CDF, why would one want to find its PDF, since we can calculate the probability by subtracting the CDF value ##y_1## from ##y_2##?

etotheipi said:
The expectation (similar to mean) of a continuous random variable X also follows from the discrete case
I see - it is just the expectation. Would you mind telling me why mathematicians don't just use expectation in this case?
 
Leo Liu said:
I would like to know if this explains why the PDF is the derivative of the CDF, because if ##\Delta P = f_X(x) \Delta x##, then ##f_X(x) = \lim_{\Delta x \to 0} \frac {\Delta P(x)} {\Delta x}##.

The CDF of a c.r.v. which takes values in the interval ##[a,b]## is defined as $$F_X(x) = \int_a^x f_X(x') dx'$$ If we take the derivative of this function w.r.t. ##x##, the fundamental theorem of calculus gives us
$$\frac{dF_X(x)}{dx} = \frac{d}{dx} \int_a^x f_X(x') dx' = f_X(x)$$ which is the desired result. Your intuition is correct, since the PDF is really just the rate of change of the cumulative probability w.r.t ##x##.
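As a quick numerical sanity check of this (again using the made-up density ##f(x) = 2x## with CDF ##F(x) = x^2## on ##[0,1]##, my choice rather than anything from the thread), a central-difference derivative of the CDF should reproduce the PDF:

```python
# Numerical check that dF/dx recovers f, for the illustrative
# density f(x) = 2x on [0, 1] with CDF F(x) = x^2.
def F(x):
    return x * x      # CDF

def f(x):
    return 2.0 * x    # PDF

h = 1e-6
x = 0.4
# Central difference approximation to F'(x).
derivative = (F(x + h) - F(x - h)) / (2 * h)
print(derivative, f(x))  # both are 0.8 (up to floating-point error)
```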

Leo Liu said:
Also, if we already knew the CDF, why would one want to find its PDF since we can calculate the probability by subtracting ##y_1## from ##y_2##?

For finding probabilities, yes, it's sufficient to have the CDF. But there is a lot more you can do with c.r.v.'s, a lot of which is formulated in terms of the PDF. For instance, to find the expectation or variance, you need to use the PDF in the various integrals.

Being able to switch between them is also important. If you have a c.r.v. e.g. ##X##, and want to find the distribution of ##Z = X^2##, a common approach is to go via the CDF. In this case, (for simplicity, let's suppose ##X## takes only positive values): $$F_Z(z) = P(Z<z) = P(X^2 < z) = P(X < \sqrt{z}) = F_X(\sqrt{z})$$ and then you can differentiate w.r.t. ##z## to find the PDF of ##Z##.
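A small sketch of this CDF method, taking for simplicity ##X## uniform on ##(0,1)## (my own illustrative choice): then ##F_Z(z) = P(X^2 < z) = P(X < \sqrt z) = \sqrt z##, so ##f_Z(z) = \frac{1}{2\sqrt z}##, and a Monte Carlo estimate of ##P(Z < z)## should agree with ##F_Z(z)##.

```python
import random

random.seed(0)

# For X uniform on (0, 1) and Z = X^2, the CDF method gives
# F_Z(z) = P(X^2 < z) = P(X < sqrt(z)) = sqrt(z).
def F_Z(z):
    return z ** 0.5

# Monte Carlo check: the fraction of X^2 samples falling below z
# should be close to F_Z(z). With z = 0.25, F_Z(z) = 0.5.
N = 200_000
z = 0.25
samples_below = sum(1 for _ in range(N) if random.random() ** 2 < z)
frac = samples_below / N
print(frac, F_Z(z))  # both close to 0.5
```

Differentiating ##F_Z## then yields the PDF of ##Z##, exactly as described above.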

Leo Liu said:
I see - it is just the expectation. Would you mind telling me why mathematicians don't just use expectation in this case?

I'm not sure what you mean by this. The easiest way to think about it is that for a given set of numerical data (i.e. a sample you have already drawn from the distribution) you can calculate a mean, whilst if you haven't taken any measurements yet you are instead calculating the expectation of the c.r.v. The two concepts are related, but distinct. For a probability distribution, it is the expectation that we usually talk about in this context.
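A quick illustration of that distinction (using the same made-up density ##f(x) = 2x## on ##[0,1]##, for which ##E(X) = 2/3##): the expectation is a fixed number belonging to the distribution, while a sample mean computed from draws fluctuates around it.

```python
import random

random.seed(1)

# Illustrative density f(x) = 2x on [0, 1], with E(X) = 2/3.
# Its CDF is F(x) = x^2, so inverse-CDF sampling gives X = sqrt(U)
# for U uniform on (0, 1).
N = 100_000
sample_mean = sum(random.random() ** 0.5 for _ in range(N)) / N

expectation = 2 / 3  # fixed property of the distribution

print(sample_mean, expectation)  # sample mean fluctuates around E(X)
```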
 
Re expectation: it is considered a measure of center, describing where the values tend to aggregate*. In symmetric distributions such as the normal, the mean and median (and mode) coincide. If/when they don't, the distribution is skewed.

*The mean may not be reflective of the center if there are outliers; in that case, the median is used instead.
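As a small numerical illustration of a skewed case (using an exponential distribution with rate 1, my own choice of example): the mean is ##1## but the median is ##\ln 2 \approx 0.69##, and sample estimates reflect that gap.

```python
import math
import random

random.seed(2)

# Exponential distribution with rate 1 (right-skewed):
# mean = 1 / rate = 1, and the median m solves 1 - e^{-m} = 0.5, so m = ln 2.
mean_exp = 1.0
median_exp = math.log(2)

# Empirical check via inverse-CDF sampling: X = -ln(1 - U) for U uniform.
N = 100_000
samples = sorted(-math.log(1 - random.random()) for _ in range(N))
emp_mean = sum(samples) / N
emp_median = samples[N // 2]

print(emp_mean, mean_exp, emp_median, median_exp)  # mean > median here
```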
 
