# Proving the product rule using probability

Staff Emeritus
Gold Member
I thought this was kind of a cool proof of the product rule.

Let ##F(x)## and ##G(x)## be cumulative distribution functions for independent random variables ##A## and ##B## respectively, with probability density functions ##f(x)=F'(x)## and ##g(x)=G'(x)##. Consider the random variable ##C=\max(A,B)##, with cumulative distribution function ##H(x)## and pdf ##h(x)=H'(x)##. Then ##h(x) = f(x)G(x) + F(x)g(x)##: if ##C=x##, it's because either ##A=x## and ##B\leq x##, or ##B=x## and ##A\leq x##. On the other hand, since ##A## and ##B## are independent, the probability that both are at most ##x## is just the product of the individual probabilities, so the cdf is simply ##H(x)=F(x)G(x)##. Differentiating gives ##\frac{d}{dx} \left(F(x)G(x)\right) = F'(x)G(x) + F(x)G'(x)##.
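Here's a quick Monte Carlo sanity check of the density formula (just a sketch: ##A, B \sim \text{Uniform}(0,1)## so that ##F(x)=G(x)=x##, ##f=g=1##, and the claimed density of ##C## is ##h(x)=2x##; the seed, sample size, and evaluation point are arbitrary):

```python
import numpy as np

# Check h(x) = f(x)G(x) + F(x)g(x) empirically for A, B ~ Uniform(0, 1),
# where the formula predicts the density of C = max(A, B) is h(x) = 2x.
rng = np.random.default_rng(0)
n = 1_000_000
a = rng.random(n)
b = rng.random(n)
c = np.maximum(a, b)

x, dx = 0.6, 0.01
# Empirical density: fraction of samples of C in a small window around x.
empirical = np.mean((c > x - dx / 2) & (c < x + dx / 2)) / dx
predicted = 2 * x
print(empirical, predicted)
```

The empirical value should land close to ##1.2## for this choice of ##x##, up to Monte Carlo noise.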

That's pretty much it! I thought it was kind of neat and wanted to share it.


Mentor
Having done the proof via probability, does that limit its scope to only probabilistic functions?

The direct proof is simple enough: ##H(x)=P(C\le x)=P(A\le x, B\le x)=P(A\le x)P(B\le x)=F(x)G(x)##. The last step uses independence of ##A## and ##B##.
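That factorization is easy to check empirically too (a sketch with arbitrarily chosen standard normal variables and evaluation point):

```python
import numpy as np

# Check P(max(A, B) <= x) = P(A <= x) * P(B <= x) for independent A, B.
rng = np.random.default_rng(1)
n = 1_000_000
a = rng.normal(size=n)
b = rng.normal(size=n)
c = np.maximum(a, b)

x = 0.5
lhs = np.mean(c <= x)                    # empirical H(x)
rhs = np.mean(a <= x) * np.mean(b <= x)  # empirical F(x) * G(x)
print(lhs, rhs)
```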

I wondered how you deduced ##h(x)## without going via the CDF first. Usually, to derive the PDF of the ##\max##, I would have thought you'd say, assuming independence, ##H(x) = P(C \leq x) = P(A \leq x) P(B \leq x) = F(x)G(x)##, and then use the product rule to find ##h(x)##.

But for this proof we need to go the other way round, i.e. we already need to know what ##h(x)## is. So is there another way of obtaining ##h(x)##? Thanks... sorry if I missed the point!

Staff Emeritus
Gold Member
> Having done the proof via probability, does that limit its scope to only probabilistic functions?

Technically, here ##F## and ##G## are increasing functions taking values between 0 and 1. But the derivative scales under multiplication by a constant, is invariant under constant shifts, and is a purely local property of the function. So given any ##F## and ##G##, you can shift them, possibly multiply by ##-1##, and redefine them away from the neighborhood where you want to calculate the derivative, so that the modified functions satisfy those requirements.
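That reduction can be sketched numerically (the function ##F(x)=x^3+5##, the window, and the name `F_cdf_like` are all arbitrary illustrative choices): affinely rescale a window of an arbitrary increasing function onto ##[0,1]##, and recover the original derivative from the rescaled copy.

```python
# The derivative is invariant under constant shifts and scales linearly
# under multiplication by a constant, so any smooth increasing F can be
# locally remapped to look like a CDF without losing derivative information.
def F(x):
    return x**3 + 5.0          # arbitrary smooth increasing function, not a CDF

lo, hi = 0.5, 1.5              # window where we want the derivative
scale = F(hi) - F(lo)          # F is increasing here, so scale > 0

def F_cdf_like(x):
    # shifted/rescaled copy mapping [lo, hi] onto [0, 1]
    return (F(x) - F(lo)) / scale

x, dx = 1.0, 1e-6
d_local = (F_cdf_like(x + dx) - F_cdf_like(x)) / dx
print(scale * d_local)         # should be close to F'(1) = 3
```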

> The direct proof is simple enough: ##H(x)=P(C\le x)=P(A\le x, B\le x)=P(A\le x)P(B\le x)=F(x)G(x)##. The last step uses independence of ##A## and ##B##.

Where do you prove the product rule for derivatives here? I think the fact you posted is exactly the one I'm using in my proof.

> I wondered how you deduced ##h(x)## without going via the CDF first. Usually, to derive the PDF of the ##\max##, I would have thought you'd say, assuming independence, ##H(x) = P(C \leq x) = P(A \leq x) P(B \leq x) = F(x)G(x)##, and then use the product rule to find ##h(x)##.

I think the point here is that I calculate ##h## directly, by being a bit clever. The pdf can be interpreted as follows: in a small interval ##\Delta x## around ##x##, the probability that ##A## lands in that interval is ##f(x)\Delta x##, and similarly for ##B## with ##g(x)##. So if ##C## lands in that interval, the probability that it's because ##A## is in the interval and ##B## is smaller is ##G(x)f(x)\Delta x##, and similarly the other way around. Adding these gives the probability that ##C## is in the interval. There's a small issue in that I've double counted the cases where ##A## and ##B## are both in the interval, but that probability is proportional to ##(\Delta x)^2##, so it goes to zero fast enough that it doesn't contribute to the probability density function.
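For ##A, B \sim \text{Uniform}(0,1)## the double-counting term can be seen exactly, since ##H(x)=x^2## in closed form (the evaluation point is arbitrary): the exact probability that ##C## lands in ##(x, x+\Delta x]## exceeds the heuristic ##\left[f(x)G(x)+F(x)g(x)\right]\Delta x = 2x\,\Delta x## by exactly ##(\Delta x)^2##.

```python
# For A, B ~ Uniform(0, 1): H(x) = x**2, f = g = 1, F(x) = G(x) = x.
def H(x):
    return x * x

x = 0.6
for dx in (0.1, 0.01, 0.001):
    exact = H(x + dx) - H(x)      # P(x < C <= x + dx), exactly
    heuristic = 2 * x * dx        # [f(x)G(x) + F(x)g(x)] * dx
    print(dx, exact - heuristic)  # leftover shrinks like dx**2
```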

Gold Member
> I think the point here is that I calculate ##h## directly, by being a bit clever. The pdf can be interpreted as follows: in a small interval ##\Delta x## around ##x##, the probability that ##A## lands in that interval is ##f(x)\Delta x##, and similarly for ##B## with ##g(x)##. So if ##C## lands in that interval, the probability that it's because ##A## is in the interval and ##B## is smaller is ##G(x)f(x)\Delta x##, and similarly the other way around. Adding these gives the probability that ##C## is in the interval. There's a small issue in that I've double counted the cases where ##A## and ##B## are both in the interval, but that probability is proportional to ##(\Delta x)^2##, so it goes to zero fast enough that it doesn't contribute to the probability density function.

I think this is pretty similar to a standard intuitive argument for the product rule (not using probabilistic language). Consider a rectangle with side lengths ##f(x)## and ##g(x)##, so its area is ##f(x)g(x)##. Change ##x## by a small amount ##\Delta x## and apply the same reasoning.

Staff Emeritus
Do you mean like
\begin{align*} \Delta(f(x)g(x)) = f(x+\Delta x)g(x + \Delta x) - f(x)g(x) &\approx \left(f(x) + \Delta x\, f'(x)\right)\left(g(x) + \Delta x\, g'(x)\right) - f(x)g(x)\\ &\approx \Delta x \left(f'(x) g(x) + f(x) g'(x) \right) \end{align*}
where we dropped the cross term in ##(\Delta x)^2##, so$$\frac{d(f(x)g(x))}{dx} = \lim_{\Delta x \rightarrow 0}\frac{\Delta(f(x)g(x))}{\Delta x} = f'(x) g(x) + f(x) g'(x).$$
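That limit is easy to eyeball with a finite difference (a sketch with an arbitrarily chosen pair of functions, ##\sin## and ##\exp##, and an arbitrary evaluation point):

```python
import math

# Finite-difference check of the product rule, mirroring the limit above:
# [f(x+dx)g(x+dx) - f(x)g(x)] / dx should approach f'(x)g(x) + f(x)g'(x).
f, g = math.sin, math.exp
fp, gp = math.cos, math.exp   # their derivatives

x, dx = 1.3, 1e-6
lhs = (f(x + dx) * g(x + dx) - f(x) * g(x)) / dx
rhs = fp(x) * g(x) + f(x) * gp(x)
print(lhs, rhs)
```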