Proving the product rule using probability


Discussion Overview

The discussion revolves around proving the product rule using concepts from probability theory, specifically through the analysis of cumulative distribution functions (CDFs) and probability density functions (PDFs) of independent random variables. Participants explore various approaches to the proof, including direct derivations and intuitive arguments, while also questioning the implications of using probabilistic functions.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant presents a proof of the product rule using the CDFs of independent random variables, leading to the conclusion that the PDF of the maximum of two variables can be expressed as a combination of their individual PDFs.
  • Another participant questions whether the proof's reliance on probability limits its applicability to only probabilistic functions.
  • A different participant provides a direct proof of the product rule, emphasizing the independence of the random variables involved.
  • Concerns are raised about deriving the PDF without first establishing the CDF, with requests for clarification on the approach taken in the initial proof.
  • Some participants discuss the cleverness of calculating the PDF by considering probabilities in small regions around a point, while acknowledging potential double counting issues that diminish in significance.
  • One participant draws a parallel between the probabilistic argument and a standard intuitive argument for the product rule using geometric reasoning with rectangles.
  • Another participant elaborates on the intuitive argument, providing a mathematical expression for the change in the product of two functions and discussing the neglect of higher-order terms.

Areas of Agreement / Disagreement

Participants express differing views on the implications of using probabilistic functions in the proof, with some agreeing on the validity of the approach while others raise concerns about its limitations. The discussion remains unresolved regarding the broader applicability of the proof and the necessity of deriving the PDF from the CDF.

Contextual Notes

Some participants note that the functions involved are increasing functions between 0 and 1, and discuss the implications of shifting and scaling these functions. There are also mentions of potential double counting in the probability calculations, which could affect the accuracy of the PDF derivation.

Office_Shredder
I thought this was kind of a cool proof of the product rule.

Let ##F(x)## and ##G(x)## be the cumulative distribution functions of independent random variables ##A## and ##B##, with probability density functions ##f(x)=F'(x)## and ##g(x)=G'(x)##. Consider the random variable ##C=\max(A,B)##, and let ##H(x)## be its cumulative distribution function, with pdf ##h(x)=H'(x)##. On one hand, ##h(x) = f(x)G(x) + F(x)g(x)##: if ##C=x##, it's because either ##A=x## and ##B\leq x##, or ##B=x## and ##A\leq x##. On the other hand, since ##A## and ##B## are independent, the probability that both are at most ##x## is just the product of the individual probabilities, so the cdf is simply ##H(x)=F(x)G(x)##. Therefore ##\frac{d}{dx} \left(F(x)G(x)\right) = h(x) = F'(x)G(x) + F(x)G'(x)##.

That's pretty much it! I thought it was kind of neat and wanted to share it.
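As a quick sanity check (my addition, not part of the post), the identity ##H(x)=F(x)G(x)## for ##C=\max(A,B)## can be tested by Monte Carlo with two independent ##\mathrm{Uniform}(0,1)## variables, an illustrative choice where ##F(x)=G(x)=x## and ##H(x)## should come out as ##x^2##:

```python
import random

# Sketch: for independent A, B ~ Uniform(0, 1) we have F(x) = G(x) = x,
# so the CDF of C = max(A, B) should be H(x) = F(x) * G(x) = x**2.
# Estimate P(C <= x) empirically and compare.
random.seed(0)
N = 200_000
x = 0.7
hits = sum(1 for _ in range(N)
           if max(random.random(), random.random()) <= x)
empirical_H = hits / N
print(empirical_H, x**2)  # the two values should be close
```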
 
Having done the proof via probability, does that limit its scope to only probabilistic functions?
 
The direct proof is simple enough: ##H(x)=P(C\le x)=P(A\le x, B\le x)=P(A\le x)P(B\le x)=F(x)G(x)##. The last step uses the independence of ##A## and ##B##.
 
I wondered how you deduced ##h(x)## without going via the CDF first? Usually, to derive the PDF of the ##\text{max}## function, I would have thought you say, assuming independence, ##H(x) = P(C \leq x) = P(A \leq x) P(B \leq x) = F(x)G(x)##, and then make use of the product rule to find ##h(x)##.

But for this proof we need to do it backward, i.e. already know what ##h(x)## is. So is there another way of obtaining ##h(x)##? Thanks... sorry if I missed the point!
 
jedishrfu said:
Having done the proof via probability, does that limit its scope to only probabilistic functions?

Technically, here ##F## and ##G## are increasing functions between 0 and 1. But the derivative scales linearly under multiplication by a constant, is invariant under constant shifts, and is a purely local property of the function. So I think it's easy to show that given any ##F## and ##G##, you can shift them, possibly multiply them by ##-1##, and then redefine them everywhere except near the point where you want to calculate the derivative, so that they satisfy the requirements.
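The two facts this reduction leans on can be checked by finite differences. This is a minimal sketch with a hypothetical ##f(x)=x^3-x## and arbitrary constants of my own choosing, not from the thread:

```python
# Sketch: the derivative is invariant under a constant shift and scales
# linearly under multiplication by a constant, so g = a*f + b has
# g'(x) = a*f'(x).  Checked by central differences for an assumed
# example f(x) = x**3 - x.
def deriv(fn, x, h=1e-6):
    # central-difference approximation of fn'(x)
    return (fn(x + h) - fn(x - h)) / (2 * h)

def f(x):
    return x**3 - x

a, b = 0.25, 0.5                 # arbitrary scale and shift (assumptions)

def g(x):
    return a * f(x) + b          # shifted and rescaled copy of f

x0 = 1.3
lhs = deriv(g, x0)               # derivative of the remapped function
rhs = a * deriv(f, x0)           # scale times derivative of the original
print(lhs, rhs)                  # the shift b drops out entirely
```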

mathman said:
The direct proof is simple enough: ##H(x)=P(C\le x)=P(A\le x, B\le x)=P(A\le x)P(B\le x)=F(x)G(x)##. The last step uses the independence of ##A## and ##B##.

Where do you prove the product rule for calculating derivatives here? I think I'm using the fact you posted here in my proof.

etotheipi said:
I wondered how you deduced ##h(x)## without going via the CDF first? Usually, to derive the PDF of the ##\text{max}## function, I would have thought you say, assuming independence, ##H(x) = P(C \leq x) = P(A \leq x) P(B \leq x) = F(x)G(x)##, and then make use of the product rule to find ##h(x)##.

I think the point here is basically that I calculate ##h## by being clever. The pdf can be interpreted as follows: in a small interval ##\Delta x## around ##x##, the probability that ##A## is in that region is ##f(x)\Delta x##, and similarly for ##B## and ##g(x)##. Then if ##C## is in that ##\Delta x## region, the probability that it's because ##A## is in that region and ##B## is smaller is ##G(x)f(x) \Delta x##, and there's a similar formula for the other way around. Adding them up gives the probability that ##C## is in the region. There's a small issue where I have double counted some of the times when ##A## and ##B## are both in the region, but the chance of that is proportional to ##(\Delta x)^2##, so it goes to zero fast enough that it doesn't contribute to the probability density function.
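For two independent ##\mathrm{Uniform}(0,1)## variables (an illustrative choice of mine, not from the post), the double-count term can be written down exactly: ##H(x)=x^2##, so ##P(C\in[x,x+\Delta x]) = 2x\,\Delta x + (\Delta x)^2##, and the ##(\Delta x)^2## piece is precisely the both-in-the-interval event. A minimal sketch:

```python
# Sketch for A, B ~ Uniform(0, 1): H(x) = x**2 exactly, so
#   P(C in [x, x+dx]) = (x+dx)**2 - x**2 = 2*x*dx + dx**2.
# The linear term matches (f(x)*G(x) + F(x)*g(x)) * dx = 2*x*dx, and
# the dx**2 leftover is the double-counted "both in the interval" event.
x = 0.5
dxs = (1e-1, 1e-2, 1e-3)
gaps = []
for dx in dxs:
    exact = (x + dx)**2 - x**2      # true probability mass in the interval
    linear = 2 * x * dx             # the (f*G + F*g) * dx prediction
    gaps.append(exact - linear)     # leftover should equal dx**2
print(gaps)                         # shrinks quadratically in dx
```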
 
Office_Shredder said:
I think the point here is basically that I calculate ##h## by being clever. The pdf can be interpreted as follows: in a small interval ##\Delta x## around ##x##, the probability that ##A## is in that region is ##f(x)\Delta x##, and similarly for ##B## and ##g(x)##. Then if ##C## is in that ##\Delta x## region, the probability that it's because ##A## is in that region and ##B## is smaller is ##G(x)f(x) \Delta x##, and there's a similar formula for the other way around. Adding them up gives the probability that ##C## is in the region. There's a small issue where I have double counted some of the times when ##A## and ##B## are both in the region, but the chance of that is proportional to ##(\Delta x)^2##, so it goes to zero fast enough that it doesn't contribute to the probability density function.

I think this is pretty similar to a standard intuitive argument for the product rule (not using probabilistic language). Consider a rectangle with side lengths ##f(x)## and ##g(x)##, so its area is ##f(x)g(x).## Change ##x## by a small amount ##\Delta x## and apply the same reasoning.
 
Yep, totally agree with that.
 
Office_Shredder said:
I think the point here is basically that I calculate ##h## by being clever. The pdf can be interpreted as follows: in a small interval ##\Delta x## around ##x##, the probability that ##A## is in that region is ##f(x)\Delta x##, and similarly for ##B## and ##g(x)##. Then if ##C## is in that ##\Delta x## region, the probability that it's because ##A## is in that region and ##B## is smaller is ##G(x)f(x) \Delta x##, and there's a similar formula for the other way around. Adding them up gives the probability that ##C## is in the region. There's a small issue where I have double counted some of the times when ##A## and ##B## are both in the region, but the chance of that is proportional to ##(\Delta x)^2##, so it goes to zero fast enough that it doesn't contribute to the probability density function.

Cool, thanks. That makes sense.

Infrared said:
I think this is pretty similar to a standard intuitive argument for the product rule (not using probabilistic language). Consider a rectangle with side lengths ##f(x)## and ##g(x)##, so its area is ##f(x)g(x).## Change ##x## by a small amount ##\Delta x## and apply the same reasoning.
Do you mean like
$$\begin{align*}
\Delta(f(x)g(x)) = f(x+\Delta x)g(x + \Delta x) - f(x)g(x) &\approx (f(x) + \Delta x f'(x))(g(x) + \Delta x g'(x)) - f(x)g(x)\\
&\approx \Delta x \left(f'(x) g(x) + f(x) g'(x) \right)
\end{align*}$$
where we dropped the cross term of order ##(\Delta x)^2##, so
$$\frac{d(f(x)g(x))}{dx} = \lim_{\Delta x \rightarrow 0}\frac{\Delta(f(x)g(x))}{\Delta x} = f'(x) g(x) + f(x) g'(x)$$
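The expansion above can be checked numerically. This is a minimal sketch with a hypothetical pair ##f(x)=\sin x## and ##g(x)=e^x## (my own illustrative choice): the forward-difference quotient of ##fg## should approach ##f'g+fg'## with error of order ##\Delta x##.

```python
import math

# Sketch with an assumed example f(x) = sin(x), g(x) = exp(x):
# the forward-difference quotient of f*g converges to
# f'(x)*g(x) + f(x)*g'(x) with error O(dx), matching the dropped
# (dx)**2 cross term in the expansion above.
f, fp = math.sin, math.cos
g = gp = math.exp              # exp is its own derivative

x = 0.8
exact = fp(x) * g(x) + f(x) * gp(x)   # product-rule prediction
errors = []
for dx in (1e-2, 1e-4, 1e-6):
    fd = (f(x + dx) * g(x + dx) - f(x) * g(x)) / dx
    errors.append(abs(fd - exact))
print(errors)                  # shrinks roughly linearly in dx
```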
 
