# I Deriving convolution

1. Sep 24, 2016

### rabbed

Hi

Can I derive the expression for Z_PDF(z) where:
Z = t(X,Y) = X + Y
By starting with:
Z_PDF(z)*|dz| = X_PDF(x)*|dx| * Y_PDF(y)*|dy|
Z_PDF(z) = X_PDF(x) * Y_PDF(y) * |dx|*|dy|/|dz|

and then substitute the deltas with derivatives and x and y with expressions of z?

Last edited: Sep 24, 2016
2. Sep 24, 2016

### Stephen Tashi

Are you asking whether you can "derive" in the sense of giving a proof? Or are you just asking whether you can get a result by doing some formal algebraic manipulations ?

I see no reason why that would be true - even using "infinitesimal reasoning" upon "dx,dy,dz".

Inutuitively, the value of Z_PDF(z) dz is the probability that Z is in the interval (z - dz, z + dz). For that to happen, the probabilities of X and Y can't be chosen arbitrarily. You have incorporate some relationship between x and y and the value of z.

I think the basic idea is that to approximate Z_PDF(z) dz you must integrate the joint PDF of x and y over the region where x + y is in (z - dz, z + dz). (If you are assuming X and Y are independent, you need to say so.)

3. Sep 25, 2016

### rabbed

The latter.

Ok, I have to do the substitution of x and y before I state the equality of probabilities (otherwise it's not true), and since X and Y are just any numbers picked at random (either does not affect the probability of the other), they are independent giving P(X=z-y AND Y=z-x) = P(X=z-y) * P(Y=z-x | X=z-y) = P(X=z-y) * P(Y=z-x), so:
Z_PDF(z)*|dz| = X_PDF(z-y)*|dx| * Y_PDF(z-x)*|dy|

Is this first step OK?

4. Sep 25, 2016

### Stephen Tashi

No. For example, suppose x = 100 and y = 1. You're claiming the value of Z_PDF(101) is approximately the probability that X is near 100 and Y is near 1. But Z_PDF(101) must also account for other possibilities. For example X might be near 50 and Y might be near 51.

5. Sep 25, 2016

### rabbed

Z_PDF(z)*|dz| = X_PDF(x)*|dx| * Y_PDF(z-x)*|dy|

Better? :)

6. Sep 25, 2016

### Stephen Tashi

Suppose z = 101 and x = 1. You're saying the probability that Z is near 101 is approximately the probability that X is near 1 and Y is near 100. But the probability that Z is near 101 must also include other possibilities - for example that X is near 50 and Y is near 51.

7. Sep 25, 2016

### rabbed

But if Z=101,
I can set x = 1 and get y = 101-1 = 100
Z_PDF(101)*|dz| = X_PDF(1)*|dx| * Y_PDF(101-1)*|dy|

Or I can set x = 50 and get y = 101-50 = 51
Z_PDF(101)*|dz| = X_PDF(50)*|dx| * Y_PDF(101-50)*|dy|

8. Sep 25, 2016

### Stephen Tashi

You can only do that if you assume the equation you are using is correct. You might get two different answers for Z_PDF(101) if you do those two different computations.

9. Sep 25, 2016

### rabbed

But am I not "setting" the unknown Z_PDF(101) to be the same no matter if x=1 and y=100 or x=50 and y=51?
They will become the same because of the different deltas?

10. Sep 25, 2016

### rabbed

Maybe I'm lost..
But it would be good to have a 'derivation' starting from the probability approximations using the PDF's. And it should be possible because that's what's actually being used to get the destination RV's PDF? Any advice?

11. Sep 25, 2016

### Stephen Tashi

You are going to have to do something involving a summation. The value of Z_PDF( z) doesn't depend only on one particular pair of values for X and Y, so Z_PDF(z) is not going to be a function of X_PDF(x) and Y_PDF(y) for one particular pair of values x,y. The value of Z_PDF(z) depends on the values of X_PDF and Y_PDF at all possible combinations of x,y that sum to z. You have to sum over all those possible combinations.

Most dx-dy-dz arguments explicitly or implicitly express a differential equation. I don't know whether there is a dx-dy-dz way to derive the convolution formula. My suggestion is to start with the answer and try to express it as a differential or partial differential equation. Try taking the formula for the cumulative distribution of the convolution of X and Y and differentiate both sides of it. (You might have to use Leibnitz's formula for differentiating an integral where variables appear in the limits of integration. See the "Formal statement" section of https://en.wikipedia.org/wiki/Leibniz_integral_rule )

12. Sep 26, 2016

### rabbed

So since the transformation function is describing a plane, The PDF formula of the destination RV needs to account for the probability of each point on that plane (or rather the infinite line of constant z)?
Like for example a n-to-1 transformation function, where the PDF formula of the destination RV needs to account for the probability of n source RV values for each destination RV value?
(The first case having two (dimensional) source RV's and the second case having one source RV)

Last edited: Sep 26, 2016
13. Sep 26, 2016

### rabbed

Z_PDF(z)*|dz| = integral wrt x from -inf to inf of X_PDF(x)*|dx| * Y_PDF(z-x)*|dy|

Looks better?

14. Sep 26, 2016

### Stephen Tashi

Yes.

Yes, it's a similar situation. The convolution is a special case of a many-to-one transformation. The transformation Z = X + Y maps many points (x,y) to one point z.

The convolution of independent random variables is a even more special case where the joint PDF of X and Y is the product of their individual PDFs.

15. Sep 26, 2016

### Stephen Tashi

It looks better, but your integration isn't defined. You say "wrt x" but you also have a "dy" in the integrand.

In general, if you were asked to integrate a function f(x,) of two variables over the line x + y = 2, how would you write this integration ?

16. Sep 26, 2016

### rabbed

I was thinking I could divide both sides by |dz| and change |dy/dz| into it's derivative.. If y=z-x, then dy/dz = 1?

It was a long time since I did much integrating, but maybe you mean a double integral?
Or setting the integration limits according to that relation..

Last edited: Sep 26, 2016
17. Sep 26, 2016

### Stephen Tashi

You can divide both sides of an equation by something once you have an equation. But what you wrote isn't an actual equation because the integration on the right hand side isn't completely defined.

My use of the terminology "integrating over" was not good. If we integrate g(x,y) over an area, then we use a double integral and, yes, one of the limits would involve a variable - since given a value of x, we can vary y and still have (x,y) be inside an area. However, we try to integrate g(x,y) "over a line" by using a double integral, we get zero since a line has no area. If we set a value of x, we have no choice about the value of y.

What I mean to say is that we want the integral $h(z) = \int_{-\infty}^{\infty} g(x,z-x) dx$, which doesn't involve any "dy".

18. Sep 27, 2016

### rabbed

In programming terms I see the integral sign as a summing for-loop:
for (v=a, result=0; v<b; v+=dv) result += f(parameters)
where you can choose the limits a and b and which of the parameters that should be substituted by the value of v in some expression f.
(Same with sigma, where the only difference being the increment 1 instead of the infinitesimal dv)
Although, we can only do integration if the expression contains dx (in the case where the parameter x was chosen).

Shouldn't the right hand side also mathematically just be a sum in
Z_PDF(z)*|dz| = integral wrt x from -inf to inf of X_PDF(x)*|dx| * Y_PDF(z-x)*|dy|
and it shouldn't matter if some factor is infinitesimal.

How does this extend to n source RV's? Would I let n-1 variables not be substituted in terms of the other variables and be integrated, while one of the source variables is expressed in terms of the other?

For example:
Q = X + Y + Z + W
Q_PDF(q)*|dq| = integral wrt x from -inf to inf of integral wrt y from -inf to inf of integral wrt z from -inf to inf of X_PDF(x)*|dx| * Y_PDF(y)*|dy| * Z_PDF(z)*|dz| * W_PDF(q-x-y-z)*|dw|

Last edited: Sep 27, 2016
19. Sep 27, 2016

### Stephen Tashi

If we are approximating the integral of a function "f", the code would be:
result += f(v)*dv

The sum we are approximating doesn't involve any factor of dy. The function being integrated is: f(x) = X_PDF(x)*Y_PDF(z-x) where z is a constant since we do this approximation when we are given a specific value of z in order to find Z_PDF(z).

If that were true then for X,Y independent and Z = X + Y we would have the convolution formula
$Z\_PDF(z) = \int_{-\infty}^{\infty} \ ( \int_{-\infty}^{\infty} X\_PFF(x) Y\_PDF(z-y) dy)\ dx$
$= \int_{-\infty}^{\infty} \ ( X\_PDF(x) \int_{-\infty}^{\infty} Y\_PDF(z-y) dy)\ dx$
$= \int_{-\infty}^{\infty} X\_PDF(x) (1) dx$
$= 1$

20. Sep 27, 2016

### rabbed

But if I keep |dz| on the left side, the products on the right hand side will have a factor |dy| in their sum? And afterwards I divide both sides with |dz| to get rid of the |dy|.

If I have n independent source RV's and n=2 (Z=X+Y), I assign an integral sign for n-1 of them, for example x, and the remaining one, y, I substitute with the expression z-x.
So I will always have n-1 integral signs and 1 change of variable. Will this not work for n>2?

21. Sep 28, 2016

### Stephen Tashi

If you are going to begin the derivation with the assertion $Z_{pdf}(z) dz = \int_{-\infty}^{\infty} X_{pdf}(x) dx Y_{pdf}(y) dy$ then you need to justify that equation before proceeding. For example, is that equation correct even in a simple case such as when X and Y are each uniformly distributed on [0,1] ?

A derivation beginning with the assertion $Z_{pdf} dz = \int_{-\infty}^{\infty} X_{pdf}(x) dx Y_{pdf}(y) dy$ is apparently claiming that an appropriate dy can be chosen to make the two sides of the equation equal, because there is no other stipulation on what dy represents.

However, by a similar technique, we can begin by assuming the equation $1 dz = 2 dy$. Dividing both sides of that equation by dz doesn't prove 1 = 2. An equation like $1 dz = 2 dy$ only holds when there is particular relationship between dz and dy.

So you if attempt to derive the general convolution formula by assuming a special realationship between Y and Z, you must show that this relationship is not actually "special". It must always hold between Y and Z regardless of how X is distributed.

22. Sep 28, 2016

### rabbed

I'm just trying (for my own sake) to explain the change of variables/PDF method and have done this by:
- first explain the PDF formula for one-to-one transformation functions
- then explain how to use that in the PDF formula for many-to-one transformation functions (using OR-logic with the probability approximations:
Y_PDF(y)*|dy| = X_PDF(t1^-1(y)) * dx + X_PDF(t2^-1(y)) * dx + ..
where t1^-1(y), t2^1(y) .. are the different solutions of the inverse transformation function)

Now I want to learn how to treat the probabilities in the case of multidimensional transformation functions, starting with convolutions.
So as long as the reasoning makes sense probability-wise (using AND/OR logic and some change of variables), I'm happy :)

Since, before I started with the change of variables method-explanation I already explained how the AND/OR-'formulas' hold for probabilities and that the probability of a specific outcome value for a continuous random variable can be approximated by multiplying its PDF with the infinitesimal outcome size, if I can explain how to treat the probabilities for the separate transformation cases above in a way that makes sense intuitively and mathematically it's feels like proof enough for me.

Last edited: Sep 28, 2016
23. Sep 28, 2016

### Stephen Tashi

That's an interesting approach. I don't recall ever seeing convolution presented as special case of a more general theorem - a theorem for a general "function of several random variables".

Ok, but what you've presented so far doesn't make sense. If we are doing to do a dx,dy,dz type of argument it has to make sense to a dx,dy,dz type of thinker.

If we accept Z_PDF(z) dz as an approximation for the probabilty of z - dz < Z < z + dz then there is some area in the X,Y plane that corresponds to this event. How do we integrate the joint density g(x,y) of X and Y over that area? The area isn't simply a rectangle of dimensions dx by dy with sides parallel to the respective coordinate axes.

For $g(X,Y) = X + Y$, I think the area looks like two "infinite triangles" with a common vertex at (0,0). Every line that satisfies the equation x + y = k for z-dz < k < z+dz is within the area.

So I think the integration we need is $\int_{-\infty}^{\infty} \int_{ymin}^{ymax} g(x,y) dy dx$ where
$ymin = MIN( z - dz - x, z + dz - x)$ and $ymax = MAX(z - dz - x, z + dz - x)$.

24. Sep 29, 2016

### rabbed

I think the case of Z = X + Y does make sense:
Z_PDF(z)*|dz| = integral wrt x from -inf to inf of X_PDF(x)*|dx| * Y_PDF(z-x)*|dy|
This would produce the right hand side sum:

X_PDF(-inf)*|dx| * Y_PDF(z-(-inf))*|dy| +
X_PDF(-inf+dx)*|dx| * Y_PDF(z-(-inf+dx))*|dy|
X_PDF(-inf+2*dx)*|dx| * Y_PDF(z-(-inf+2*dx))*|dy| +
...
X_PDF(inf)*|dx| * Y_PDF(z-(inf))*|dy|

We get the sum of the probabilities of each infinitesimal point on that infinite line where z is some constant c and the result is Z_PDF(c)*|dz| (the probability that any point OR the others on that line will be the outcome).

Wouldn't the same reasoning work for another dimension of source RV, giving us a plane of points we need to sum the probabilities for? Maybe we don't need to look at the area/volume etc. of the shape, just the probabilities of the points? So we only use the integral signs to produce the coordinates of those points and make sure that the change of the last variable (y = z-x in the above 2D case) keeps the coordinates on that line/place etc.

2D: z = x + y => (x, z-x) are the points on the 1D line where z is constant
3D: w = x + y + z => (x, y, w-x-y) are points on the 2D plane where w is constant
4D: q = x + y + z + w => (x, y, z, q-x-y-z) are points in the 3D volume where q is constant

Hm, is that correct?

Then maybe the area/volume element comes in automatically when dividing the right hand side substituted variable's delta by the left hand side delta. but yes, it feels too speculative.. so maybe we need a more analytic solution that you propose.

Last edited: Sep 29, 2016
25. Sep 29, 2016

### Stephen Tashi

If you think that makes sense as an approximation, try coding it for a specific example and see if you can approximate Z_PDF(z) that way for various values of z.

The problem I see is that you are integrating the joint density over a line of uniform thickness, but the bundle of lines that define the probability that z - dz < Z < z + dz is not of uniform thickness.