Derive Convolution Expression for Z_PDF(z)

The discussion centers on deriving the expression for the probability density function (PDF) of the sum of two independent random variables, Z = X + Y. Initial attempts to express Z_PDF(z) using the joint PDFs of X and Y are critiqued for lacking proper justification, particularly regarding the independence of X and Y and the need to account for all combinations of values that sum to z. The conversation emphasizes that the correct approach involves integrating over all possible pairs (x, y) such that x + y = z, leading to the convolution formula. Participants suggest that a rigorous derivation should start from the cumulative distribution function and differentiate it, rather than relying on informal algebraic manipulations. The importance of defining integration limits and relationships between variables is highlighted as crucial for accurate derivation.
  • #31
rabbed said:
I'm looking at something called a "line integral". Just need to understand it first :)
Do you think it looks like the right way to go?

The problem with that idea is that the joint density f(x,y) is a density per unit area, and when you integrate a scalar function f(x,y) over a curve, f(x,y) needs to be interpreted as a density per unit length.

That leads to the question: Given a density f(x,y) per unit area, can we use it to define a density per unit length on some lines?
 
  • #32
Stephen Tashi said:
The problem with that idea is that the joint density f(x,y) is a density per unit area, and when you integrate a scalar function f(x,y) over a curve, f(x,y) needs to be interpreted as a density per unit length.

That leads to the question: Given a density f(x,y) per unit area, can we use it to define a density per unit length on some lines?

What if the densities are not joined, even though they can be?
 
  • #33
rabbed said:
What if the densities are not joined, even though they can be?

I don't know what you mean by that.
 
  • #34
X_PDF(x) and Y_PDF(y) both give density per unit length, right?
So we have the information..

But let me get it clear what we're doing.
Let's say we want the line on the plane where z = c and we call it L = (x(t), y(t)) = (t, c-t)

x'(t) = 1
y'(t) = -1

We want to integrate X_PDF(x) * Y_PDF(y) along the line L..

Z_PDF(c)*|dz| = integral wrt t from -inf to inf of X_PDF(x(t)) * Y_PDF(y(t)) * sqrt( x'(t)^2 + y'(t)^2 ) * dt

Ok?

We're integrating along the hypotenuse and we need the catheters?
 
  • #35
rabbed said:
X_PDF(x) and Y_PDF(y) both give density per unit length, right?

Yes
So we have the information..

But let me get it clear what we're doing.
Let's say we want the line on the plane where z = c and we call it L = (x(t), y(t)) = (t, c-t)

x'(t) = 1
y'(t) = -1

We want to integrate X_PDF(x) * Y_PDF(y) along the line L..

Z_PDF(c)*|dz| = integral wrt t from -inf to inf of X_PDF(x(t)) * Y_PDF(y(t)) * sqrt( x'(t)^2 + y'(t)^2 ) * dt

Ok?

No. Why do you have a "dz" on the left hand side of the equation?

On the right hand side, we get the intriguing result ##\int_{-\infty}^{\infty} X_{PDF}(t) Y_{PDF}(c-t) \sqrt{2} dt ##
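
A quick numerical check of that factor, as a minimal sketch. It assumes X and Y are independent standard normals (so Z = X + Y is normal with variance 2); the helper names `normal_pdf`, `integrate`, `line_integral` and `convolution` are made up for illustration.

```python
import math

def normal_pdf(x, sigma=1.0):
    """Density of a zero-mean normal distribution (assumed for both X and Y)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, n=20000):
    """Midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def line_integral(c, lo=-10.0, hi=10.0):
    """Integrate X_PDF(x) * Y_PDF(y) with respect to arc length along (t, c - t)."""
    speed = math.sqrt(1.0 ** 2 + (-1.0) ** 2)  # sqrt(x'(t)^2 + y'(t)^2) = sqrt(2)
    return integrate(lambda t: normal_pdf(t) * normal_pdf(c - t) * speed, lo, hi)

def convolution(c, lo=-10.0, hi=10.0):
    """The same integral without the arc-length factor."""
    return integrate(lambda t: normal_pdf(t) * normal_pdf(c - t), lo, hi)

c = 0.7
exact = normal_pdf(c, sigma=math.sqrt(2.0))  # known density of X + Y at c
print(line_integral(c) / exact)  # close to sqrt(2)
print(convolution(c) / exact)    # close to 1
```

Under these assumptions the arc-length integral overshoots the known density by a factor of sqrt(2), while the integral without the arc-length factor matches it.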

We're integrating along the hypotenuse and we need the catheters?

"catheters"? Do you mean "sides" ?
 
  • #36
Stephen Tashi said:
No. Why do you have a "dz" on the left hand side of the equation?
I thought we were approximating the probability, so that we can then divide by |dz| to get Z_PDF(c)

Stephen Tashi said:
"catheters"? Do you mean "sides" ?
yep.. https://en.wikipedia.org/wiki/Cathetus
"catheters" is the plural form from Swedish according to Google Translate :)
 
  • #37
rabbed said:
I thought we were approximating the probability, so that we can then divide by |dz| to get Z_PDF(c)

Yes, but there is nothing on the right hand side of the equation that depends upon "dz". If you divide the right hand side of the equation by "dz" and take the limit of the right hand side as ##dz \rightarrow 0##, you get an infinite result.

Since c is some constant, the integral on the right hand side of the equation (if the integral exists) is some constant.

yep.. https://en.wikipedia.org/wiki/Cathetus
"catheters" is the plural form from Swedish according to Google Translate :)
Is your idea that we can fix the factor of ##\sqrt{2}## by some argument about "dz" being the hypotenuse of an infinitesimal triangle whose sides are "dx" and "dy"? That's an interesting idea, but I don't see how to formulate it as plausible reasoning.
 
  • #38
Stephen Tashi said:
Yes, but there is nothing on the right hand side of the equation that depends upon "dz". If you divide the right hand side of the equation by "dz" and take the limit of the right hand side as ##dz \rightarrow 0##, you get an infinite result.

Since c is some constant, the integral on the right hand side of the equation (if the integral exists) is some constant.

Best would be to use z instead of c all the way, but I wanted to clarify.

Stephen Tashi said:
Is your idea that we can fix the factor of ##\sqrt{2}## by some argument about "dz" being the hypotenuse of an infinitesimal triangle whose sides are "dx" and "dy"? That's an interesting idea, but I don't see how to formulate it as plausible reasoning.

Something like that. Can we use the direction vector ("aim-vector") of L for that, so that the rate of growth of t has dx and dy in it?
 
  • #39
L = (x(t), y(t)) = (t*|dy|/sqrt(2), z-t*|dy|/sqrt(2))

x'(t) = |dy|/sqrt(2)
y'(t) = -|dy|/sqrt(2)

Z_PDF(z)*|dz| = integral wrt t from -inf to inf of X_PDF(x(t)) * Y_PDF(y(t)) * sqrt( x'(t)^2 + y'(t)^2 ) * dt
= integral wrt t from -inf to inf of X_PDF(t*|dy|/sqrt(2)) * Y_PDF(z-t*|dy|/sqrt(2)) * |dy| * dt

Make any sense?
 
  • #40
Z_PDF(z) dz is approximately the integral of the joint density over the thin strip bounded by the lines x + y = z and x + y = z + dz. Approximate this integral by dividing the strip into parallelograms whose vertices in the XY-plane are (k dx, z - k dx), (k dx, z + dz - k dx), ((k+1)dx, z + dz - (k+1)dx) and ((k+1)dx, z - (k+1)dx). The area of each parallelogram is (dx)(dz). The probability mass in a parallelogram is approximately X_PDF(k dx) Y_PDF(z - k dx) dx dz. This gives an approximation with "dz" on both sides of the equation.
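
The parallelogram sum can be tried numerically. This is only a sketch under stated assumptions: X and Y are independent standard normals, the strip has total width `w` in the z direction starting at z, and the function names are hypothetical.

```python
import math

def normal_pdf(x, sigma=1.0):
    """Density of a zero-mean normal distribution."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, sigma=1.0):
    """CDF of a zero-mean normal distribution."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def strip_mass_by_parallelograms(z, w, dx=0.01, x_max=10.0):
    """Sum X_PDF(k dx) * Y_PDF(z - k dx) * dx * w over parallelograms tiling the strip."""
    k_max = int(round(x_max / dx))
    return sum(normal_pdf(k * dx) * normal_pdf(z - k * dx) * dx * w
               for k in range(-k_max, k_max + 1))

z, w = 0.5, 1e-3
sigma_z = math.sqrt(2.0)  # X + Y has variance 2 for independent standard normals
exact_mass = normal_cdf(z + w, sigma_z) - normal_cdf(z, sigma_z)
approx_mass = strip_mass_by_parallelograms(z, w)
print(approx_mass / w)  # estimate of Z_PDF(z) from the parallelogram sum
print(exact_mass / w)   # average density of Z over the strip
```

Because the strip width multiplies both the parallelogram sum and the exact strip probability, dividing by `w` leaves a finite estimate of Z_PDF(z).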
 
  • #41
Stephen Tashi said:
Z_PDF(z) dz is approximately the integral of the joint density over the thin strip bounded by the lines x + y = z and x + y = z + dz. Approximate this integral by dividing the strip into parallelograms whose vertices in the XY-plane are (k dx, z - k dx), (k dx, z + dz - k dx), ((k+1)dx, z + dz - (k+1)dx) and ((k+1)dx, z - (k+1)dx). The area of each parallelogram is (dx)(dz). The probability mass in a parallelogram is approximately X_PDF(k dx) Y_PDF(z - k dx) dx dz. This gives an approximation with "dz" on both sides of the equation.

Okay, sounds good.. I'll take a closer look when I get some more time. Thanks for now!
 
  • #42
Hm, a density value of an outcome equals the probability of that outcome per the length/area/volume of that outcome.

I would like to see how the outcome area of P(x<X<x+dx AND z-x<Y<z-x+dy) can be modified into the outcome area of P(z < Z < z+dz), since these should be equal.

Then it's desirable to take the quotient of these areas (even though I think this will be 1?).
When the number of source RV's equal the number of destination RV's, this will become the absolute value of the Jacobian determinant, I think?

I'll try to formulate that, but feel free to help out! :)
 
  • #43
rabbed said:
I would like to see how the outcome area of P(x<X<x+dx AND z-x<Y<z-x+dy) can be transformed into the outcome area of P(z < Z < z+dz), since these should be equal.

Why should they be equal? For example, suppose z = 10, dz = 0.01, x = 5, dx = 0.5, dy = 0.25. Then you aren't accounting for cases like x = 9 and y = 1. And cases like x = 5.45, y = 5.20 aren't cases where x + y is in (z, z + dz).
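
The two counterexamples can be checked mechanically. In this sketch the function names are hypothetical, and the first point uses y = 1.005 rather than y = 1 (an assumption made here) so that x + y falls strictly inside (z, z + dz).

```python
z, dz = 10.0, 0.01
x, dx, dy = 5.0, 0.5, 0.25

def in_rectangle(px, py):
    """True if (px, py) satisfies x < X < x + dx and z - x < Y < z - x + dy."""
    return x < px < x + dx and (z - x) < py < (z - x) + dy

def in_strip(px, py):
    """True if the point belongs to the event z < X + Y < z + dz."""
    return z < px + py < z + dz

# First point: in the strip but missed by the rectangle.
# Second point: inside the rectangle but not in the strip.
for point in [(9.0, 1.005), (5.45, 5.20)]:
    print(point, in_rectangle(*point), in_strip(*point))
```

The rectangle and the strip overlap only partially, so their probabilities have no reason to agree.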
 
  • #44
rabbed said:
Hm, a density value of an outcome equals the probability of that outcome per the length/area/volume of that outcome.

That would do for a definition of average density. For a probability density function (defined on points in a length, area, or volume) you need a definition for "density at a point", which means you must define it in terms of a limit of average densities.
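
As an illustration of "density at a point" as a limit of average densities, here is a minimal sketch. It assumes Z = X + Y with X and Y independent standard normals, so the CDF of Z is known in closed form; `average_density` is a name introduced here.

```python
import math

def normal_cdf(x, sigma=1.0):
    """CDF of a zero-mean normal distribution."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def normal_pdf(x, sigma=1.0):
    """Density of a zero-mean normal distribution."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def average_density(z, h, sigma):
    """P(z < Z < z + h) / h: probability per unit length over an interval of length h."""
    return (normal_cdf(z + h, sigma) - normal_cdf(z, sigma)) / h

sigma_z = math.sqrt(2.0)
z = 1.0
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, average_density(z, h, sigma_z))
print("limit:", normal_pdf(z, sigma_z))
```

The average densities approach the pointwise density as the interval length h shrinks.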
 
  • #45
Stephen Tashi said:
Why should they be equal? For example, suppose z = 10, dz = 0.01, x = 5, dx = 0.5, dy = 0.25. Then you aren't accounting for cases like x = 9 and y = 1. And cases like x = 5.45, y = 5.20 aren't cases where x + y is in (z, z + dz).

Right, already forgot :) Is it possible to express that as probabilities?
Something like P(OR wrt x from -inf to inf of x<X<x+dx AND z-x<Y<z-x+dy)? :)
Or maybe integral wrt x from -inf to inf of P(x<X<x+dx AND z-x<Y<z-x+dy)
Or integral wrt x from -inf to inf of P(X=x AND Y=z-x)

Also, should dy be expressed wrt x?
 
  • #46
rabbed said:
Right, already forgot :) Is it possible to express that as probabilities?

Express what ? Are you asking for ways to describe the event ##Z :\{ z < Z < z + dz\}## ? If you describe it in terms of variables like x,y,dx,dy, then those variables must have some relation to z and dz.

In the XY-plane the event ##Z :\{ z < Z < z + dz\}## is ## (x,y): \{ z < x + y < z + dz \} ##. If you want to write a description that includes "dx" and "dy", you have to specify how they are related to "z" and "dz".
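
Describing the event as a region of the XY-plane leads naturally to a derivation through the cumulative distribution function. A sketch, assuming X and Y are independent (so the joint density is the product of the marginals) and that differentiating under the integral sign is justified; ##f_X, f_Y, f_Z## and ##F_Z## are shorthand introduced here for X_PDF, Y_PDF, Z_PDF and the CDF of Z:

```latex
\begin{aligned}
F_Z(z) &= P(X + Y \le z)
        = \int_{-\infty}^{\infty} \int_{-\infty}^{z - x} f_X(x)\, f_Y(y)\, dy\, dx, \\
f_Z(z) &= \frac{d}{dz} F_Z(z)
        = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\, dx .
\end{aligned}
```

Here the integration limits encode the relation between x, y and z explicitly, so no separate bookkeeping of "dx", "dy" and "dz" is needed.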
 
  • #47
Basically, there should be a directional derivative or gradient (dz/dx, dz/dy) when you have many source RV's and one destination RV.

I want Z_PDF(z) to be expressed in terms of the probability of the points on that infinite line, divided by the absolute value of the derivative (length?) of the point where Z=z, like when there is a Jacobian..

Something like:
Z_PDF(z)/|dz| = integral wrt x from -inf to inf of X_PDF(x) / |dz/dx| * Y_PDF(z-x) / |dz/dy| * |dz|

I realize |dz/dx| and |dz/dy| are both 1.. so maybe it would be better to try something like Z = 2*X + 3*Y

Maybe it will give a definition of what a determinant looks like for a non-square matrix, because that has no definition now, right? :)
 
  • #48
rabbed said:
I want Z_PDF(z) to be expressed in terms of the probability of the points on that infinite line, divided by the absolute value of the derivative (length?) of the point where Z=z, like when there is a Jacobian..

For continuous random variables, the probability of each point on the infinite line is zero and the probability that (X,Y) will be some point on the infinite line is also zero. We have to think about probability densities, not probabilities.

The line x + y = z is a level curve of the surface f(x,y) = x + y. The gradient of f(x,y) defines a vector field that is perpendicular to that level curve. If we imagine Z varying from z to z + dz, this corresponds to the level curve x+y = Z sweeping out an area approximated by moving each point (x,y) on the level curve in the direction specified by the gradient.

I don't think you can approximate the area swept out by the level curve only by considering the gradient, because the area swept out depends on the shape of the curve. If I imagine the curve approximated by a series of small straight line segments, then the area swept out is approximated by a sum of areas of parallelograms. One side of a parallelogram is a line segment. The adjacent side is a vector defined by the direction of the gradient at one end of the line segment.

Perhaps this topic has been worked out by people who study "level set" methods. https://en.wikipedia.org/wiki/Level_set_method

Or page 449 eq 39 b) of https://books.google.com/books?id=Q...ge&q=area swept out by a moving curve&f=false
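
For a straight level curve the swept-out strip is easy to handle: with g(x,y) = A*x + B*y, moving from level z to z + dz shifts the line a perpendicular distance dz/|grad g|, so the density of Z is the arc-length integral of the joint density divided by |grad g|. A minimal sketch of that special case, assuming X and Y are independent standard normals and using the hypothetical coefficients A = 2, B = 3 (the example Z = 2*X + 3*Y suggested earlier in the thread); the function names are made up:

```python
import math

def normal_pdf(x, sigma=1.0):
    """Density of a zero-mean normal distribution."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, n=20000):
    """Midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

A, B = 2.0, 3.0  # hypothetical coefficients: Z = A*X + B*Y

def z_pdf_level_curve(z, lo=-10.0, hi=10.0):
    """Arc-length integral of the joint density over A*x + B*y = z, divided by |grad g|."""
    grad_norm = math.hypot(A, B)           # |grad g| for g(x, y) = A*x + B*y
    speed = math.sqrt(1.0 + (A / B) ** 2)  # arc-length factor for the path (t, (z - A*t)/B)
    integrand = lambda t: normal_pdf(t) * normal_pdf((z - A * t) / B) * speed
    return integrate(integrand, lo, hi) / grad_norm

z = 1.5
print(z_pdf_level_curve(z))
print(normal_pdf(z, sigma=math.sqrt(A * A + B * B)))  # Z is normal with variance A^2 + B^2
```

This only covers linear g, where the level curves are parallel lines; for curved level sets the perpendicular width of the strip varies along the curve, which is the complication described above.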
 
  • #49
Crystal clear explanation! Thank you :)
It doesn't seem impossible to do, but I guess if it can/has been solved they would already be teaching it as part of probability theory.

So people get by with only having the n-to-1 and n-to-n dimensional cases or are n-to-m calculations done graphically instead of with a formula?
 
  • #50
rabbed said:
So people get by with only having the n-to-1 and n-to-n dimensional cases or are n-to-m calculations done graphically instead of with a formula?

Even the 1-dim to 1-dim case isn't that simple in practice. Things are simple if Y = F(X) is a monotone function of the random variable X, but if it has peaks and valleys, you must consider various cases.
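
A small example of a non-monotone case, as a sketch: take Y = X**2 with X standard normal (an assumption chosen here because the answer is known). Each y > 0 has two preimages, so the change-of-variables formula has to sum over both branches. The function names are hypothetical.

```python
import math

def normal_pdf(x):
    """Density of a standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def y_pdf(y):
    """Density of Y = X**2, summing over the branches x = +sqrt(y) and x = -sqrt(y)."""
    if y <= 0.0:
        return 0.0
    root = math.sqrt(y)
    jacobian = 1.0 / (2.0 * root)  # |dx/dy| on each branch
    return (normal_pdf(root) + normal_pdf(-root)) * jacobian

def chi2_1_pdf(y):
    """Known density of a chi-square variable with one degree of freedom, for y > 0."""
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)

print(y_pdf(0.8), chi2_1_pdf(0.8))
```

Using only one branch would give half the correct density in this example.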

A complication with an n-dimensional function of a random variable (or variables) is that the components of an n-dimensional random vector might be dependent, even if the variables in the domain of the functions are independent. For example if X1,X2,X3 are independent random variables and Y1 = X1 + X3, Y2 = X2 + X3 then the joint density of (Y1,Y2) isn't necessarily given by the product Y1_PDF(y1) Y2_PDF(y2).
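
The dependence in that example shows up in a quick simulation. This sketch assumes X1, X2, X3 are independent standard normals, in which case the covariance of Y1 and Y2 equals Var(X3) = 1; the function name and the fixed seed are choices made here.

```python
import random

def sample_covariance(n=100_000, seed=0):
    """Estimate Cov(Y1, Y2) for Y1 = X1 + X3, Y2 = X2 + X3 by simulation."""
    rng = random.Random(seed)
    sum1 = sum2 = sum12 = 0.0
    for _ in range(n):
        x1, x2, x3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        y1, y2 = x1 + x3, x2 + x3
        sum1 += y1
        sum2 += y2
        sum12 += y1 * y2
    return sum12 / n - (sum1 / n) * (sum2 / n)

print(sample_covariance())  # near Var(X3) = 1, not 0
```

A nonzero covariance rules out independence, so the joint density of (Y1, Y2) cannot be the product of the marginal densities here.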
 
  • #51
Hello again, a bit late (as usual after doing some thinking).

Does this make sense?

The line is (x(t), y(t)) = (t, z-t)

Z_PDF(z) = integral wrt t from -inf to inf of X_PDF(x(t)) * Y_PDF(y(t)) * sqrt( x'(t)^2 + y'(t)^2 ) * dt / sqrt(det( [ z'(x(t)) z'(y(t)) ] * [ z'(x(t)) z'(y(t)) ]^T ))

The line integral divided by the square root of det(J * J^T), where J is the gradient [ z'(x(t)) z'(y(t)) ]
(the square root which is also the length element of z divided by the area element of x and y at a point)
It would be nice to convert the line integral into a double integral with dx and dy multiplying area elements into length elements of z, but maybe that's not possible..
 
  • #52
rabbed said:
The line is (x(t), y(t)) = (t, z-t)

Z_PDF(z) = integral wrt t from -inf to inf of X_PDF(x(t)) * Y_PDF(y(t)) * sqrt( x'(t)^2 + y'(t)^2 ) * dt / sqrt(det( [ z'(x(t)) z'(y(t)) ] * [ z'(x(t)) z'(y(t)) ]^T ))

The line integral divided by the square root of det(J * J^T), where J is the gradient [ z'(x(t)) z'(y(t)) ]
(the square root which is also the length element of z divided by the area element of x and y at a point)
It would be nice to convert the line integral into a double integral with dx and dy multiplying area elements into length elements of z, but maybe that's not possible..

I don't see any definition for z(t).

Does your formula work for the case where X and Y are each uniformly distributed on [0,1]?
 
  • #53
Stephen Tashi said:
I don't see any definition for z(t).
Right, but z(t) isn't used anywhere. z = f(x,y) = x + y, does it need to be a function of t?

Stephen Tashi said:
Does your formula work for the case where X and Y are each uniformly distributed on [0,1]?
Since both square roots become sqrt(2) we end up with the convolution formula, so it should be ok?
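
The uniform case can be checked directly against the convolution formula. A minimal sketch, assuming X and Y are independent and uniform on [0,1], for which the density of the sum is the well-known triangular density on [0,2]; the function names are hypothetical.

```python
def uniform_pdf(x):
    """Density of a uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def z_pdf_convolution(z, n=10000):
    """Midpoint-rule estimate of the integral over x in [0, 1] of X_PDF(x) * Y_PDF(z - x)."""
    h = 1.0 / n
    return h * sum(uniform_pdf((i + 0.5) * h) * uniform_pdf(z - (i + 0.5) * h)
                   for i in range(n))

def triangular_pdf(z):
    """Density of the sum of two independent uniform [0, 1] variables."""
    if 0.0 <= z <= 1.0:
        return z
    if 1.0 < z <= 2.0:
        return 2.0 - z
    return 0.0

for z in (0.25, 1.0, 1.6, 2.5):
    print(z, z_pdf_convolution(z), triangular_pdf(z))
```

Note that the effective limits of integration change with z because the integrand vanishes wherever x or z - x leaves [0,1], which is where careless limits usually go wrong.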
 
  • #54
By the way,
since (x(t), y(t)) = (t, z-t)
is
integral wrt t from -inf to inf of X_PDF(x(t)) * Y_PDF(y(t)) * sqrt( x'(t)^2 + y'(t)^2 ) * dt
equal to
integral wrt x from -inf to inf of X_PDF(x) * Y_PDF(z-x) * sqrt( (dx/dx)^2 + (dy/dx)^2 ) * dx?

In that case:
Z_PDF(z) = integral wrt x from -inf to inf of X_PDF(x) * Y_PDF(z-x) * sqrt( (dx/dx)^2 + (dy/dx)^2 ) * dx / sqrt(det( [ (dz/dx) (dz/dy) ] * [ (dz/dx) (dz/dy) ]^T ))
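
For z = x + y the two square-root factors can be evaluated explicitly. A worked version written with partial derivatives, where ##g(x,y) = x + y##, ##J## for its gradient row vector, and ##f_X, f_Y, f_Z## for X_PDF, Y_PDF, Z_PDF are notation assumed here:

```latex
\begin{aligned}
\sqrt{1 + \left(\tfrac{dy}{dx}\right)^{2}} &= \sqrt{1 + (-1)^{2}} = \sqrt{2}
  \quad \text{along } y = z - x, \\
\sqrt{\det\!\left(J J^{T}\right)}
  &= \sqrt{\left(\tfrac{\partial g}{\partial x}\right)^{2} + \left(\tfrac{\partial g}{\partial y}\right)^{2}}
   = \sqrt{2},
  \qquad J = \begin{bmatrix} \tfrac{\partial g}{\partial x} & \tfrac{\partial g}{\partial y} \end{bmatrix}, \\
f_Z(z) &= \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\, \frac{\sqrt{2}}{\sqrt{2}}\, dx
        = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\, dx .
\end{aligned}
```

The factors cancel, so for this particular g the expression reduces to the usual convolution integral.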
 
  • #55
rabbed said:
Right, but z(t) isn't used anywhere. z = f(x,y) = x + y, does it need to be a function of t?

If z is a function of the two variables (x,y) then what do you mean by z' ?
 
  • #56
Stephen Tashi said:
If z is a function of the two variables (x,y) then what do you mean by z' ?
depends on what variable it's differentiated with respect to
z = x + y
dz/dx = 1 (z'(x) = 1)
dz/dy = 1 (z'(y) = 1)
 
  • #58
Stephen Tashi said:
You should be using the notation for partial derivatives. (The Insight: https://www.physicsforums.com/insights/partial-differentiation-without-tears/ is relevant to the question "Is z a function of t ?")
Yep, I know the other notation is a bit flawed, but it has its uses. (but tell me if my logic was wrong somewhere!)

So is this better?
rabbed said:
By the way,
since (x(t), y(t)) = (t, z-t)
is
integral wrt t from -inf to inf of X_PDF(x(t)) * Y_PDF(y(t)) * sqrt( x'(t)^2 + y'(t)^2 ) * dt
equal to
integral wrt x from -inf to inf of X_PDF(x) * Y_PDF(z-x) * sqrt( (dx/dx)^2 + (dy/dx)^2 ) * dx?

In that case:
Z_PDF(z) = integral wrt x from -inf to inf of X_PDF(x) * Y_PDF(z-x) * sqrt( (dx/dx)^2 + (dy/dx)^2 ) * dx / sqrt(det( [ (dz/dx) (dz/dy) ] * [ (dz/dx) (dz/dy) ]^T ))
 
  • #59
rabbed said:
So is this better?

The problem is that your notation can't be interpreted. For example, what function is "dy/dx" ?

A mathematical function has a domain and a co-domain. What is the domain of "dy/dx" and what is its co-domain?
 
  • #60
Since z = x + y,
y = z - x
and
dy/dx = -1?
 
