# Change of variables: many-to-many transformations

1. Sep 6, 2016

### rabbed

With the change-of-variables method for a many-to-one transformation function Y = t(X),
what's the logic behind summing the different densities at the roots of x = t^-1(y)?
Probabilities should be OK to add, but densities?

Also, is there no way to extend this method to many-to-many transformation functions, e.g. by calculating
the summed density over the roots of y = t(x) separately for each branch and then combining those sums of densities?

For example, if X has uniform distribution (-1 < x < 1) and Y = +/- sqrt(1-X^2), maybe
it's possible to sum the two densities of positive/negative x for both positive and negative y, then
add the two densities of positive/negative y and divide by two? Would that work if there is asymmetry?

2. Sep 9, 2016

### chiro

Hey rabbed.

Aside from the mathematics, you should realize that you are just mapping the probabilities from one set of numbers to another; assuming that the transformation is analytic, the probabilities will simply be stretched across the new space.

The things being mapped to change, but the probabilities themselves don't (unless you have a situation like squaring a random variable, in which case the 1-1 property is violated and you do get a big change in how the new probabilities are spread across the space), and that is the key to understanding the formula.
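A quick numerical sketch of that point (a strictly monotone map just relocates the probability mass, it doesn't change it):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 100_000)

# A strictly increasing (1-1) transformation only relocates probability mass:
# the event {a < X < b} is the same set of outcomes as {t(a) < Y < t(b)}.
def t(v):
    return v**3

y = t(x)
a, b = -0.5, 0.25
p_x = np.mean((x > a) & (x < b))
p_y = np.mean((y > t(a)) & (y < t(b)))
print(p_x, p_y)  # the two proportions agree (up to floating-point rounding)
```

Only where the mass sits changes; how much mass the corresponding event carries does not.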

I'm not sure what you mean about summing the densities, but you will need a random variable that covers the whole space, and you use that as your distribution.

Symmetry only exists in very specific cases and it's not something I'd recommend going by in general.

The algebra for finding new distributions that are functions of others spans a lot of probability and statistical theory and it isn't just one or two results but rather an entire collection of them used in a variety of contexts and situations.

3. Sep 10, 2016

### rabbed

Hi chiro

Thanks, you always give good answers on a level that's hard to find elsewhere :)

And yes, I realize symmetry is a special case so I'm not comfortable with the solution I have so far for this case:

I want to know the distribution of the y-coordinate given the distribution of the x-coordinate and the fact that (x,y) is a point on the unit circle:
X_PDF(x) = 1/(1-(-1)) = 0.5 (-1 < x < 1)
Yp = tp(X) = +sqrt(1-X^2) (t for transformation, p for positive, since the change of variables method only allows many-to-one transformation functions)

Since t is many-to-one, I find the inverse transformation for all roots (naming the inverse functions tpn for positive Y, negative X and tpp for positive Y, positive X):
Xn = tpn^-1(Y) = -sqrt(1-Y^2)
Xp = tpp^-1(Y) = +sqrt(1-Y^2)

For the formula I will also need the derivative of tp:
tp'(x) = -x/sqrt(1-x^2)

Now I get:
Yp_PDF(y) = X_PDF(tpn^-1(y)) / |tp'(tpn^-1(y))| + X_PDF(tpp^-1(y)) / |tp'(tpp^-1(y))| = |y|/sqrt(1-y^2) (for 0 < y < 1)
(this is what I mean by summing densities)

This gives me a distribution which I can use to calculate the probability that Y will be in a certain range, but only for positive y's.
If I want to calculate the probability that Y will be in a certain range for the whole span (-1 < y < 1), I use the symmetry and multiply Yp_PDF(y) by 0.5 so that it integrates to 1 for (-1 < y < 1) instead.
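Here's a Monte Carlo sanity check of the above (assuming, as the symmetry argument does, that the sign of Y is chosen with equal probability):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 1_000_000)

# Positive branch only: Yp = +sqrt(1 - X^2), density y/sqrt(1-y^2) on (0,1),
# whose CDF is 1 - sqrt(1-y^2)
yp = np.sqrt(1.0 - x**2)
a, b = 0.2, 0.8
exact_pos = np.sqrt(1 - a**2) - np.sqrt(1 - b**2)
emp_pos = np.mean((yp > a) & (yp < b))

# Both branches with a fair random sign: density 0.5*|y|/sqrt(1-y^2) on (-1,1),
# i.e. the positive-branch density scaled by 0.5
sign = rng.choice([-1.0, 1.0], size=x.size)
y = sign * yp
emp_full = np.mean((y > a) & (y < b))   # should be half of exact_pos
print(emp_pos, exact_pos, emp_full)
```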

It's this last part that sucks and makes me want to extend the method to many-to-many transformation functions.

That's why I thought, maybe you can do another calculation, starting with Yn = tn(X) = -sqrt(1-X^2), which will get me
Yn_PDF(y) = X_PDF(tnn^-1(y)) / |tn'(tnn^-1(y))| + X_PDF(tnp^-1(y)) / |tn'(tnp^-1(y))| = |y|/sqrt(1-y^2) (for -1 < y < 0)

And somehow get:
Y_PDF(y) = f( Yn_PDF(y), Yp_PDF(y) )

This made me wonder why it was OK to sum densities when calculating Yp_PDF(y) and what the logic is behind that?

4. Sep 10, 2016

### chiro

It seems that what you are saying is that you are creating a new distribution by considering all of the branch cuts (I use that terminology as it comes from analysis).

If they are disjoint you could do that, but when you have overlap you typically have to know the original distribution if you are to go back to one branch.

In other words, if you have a random variable X with Y = f(X) and then you invert to get back [and the inverse has multiple branches], then you have to know specific things about X to deal with that; in general it means understanding how to partition the random variable so that you get back to the definition of X.

You can certainly partition random variables so that you get, say, X_all = X_1 + X_2 where X_1 < 0 and X_2 >= 0, if you want to do it that way, with some sort of indicator function that selects the random variable for each branch.
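A sketch of that partition idea (the indicator arrays and names here are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, 100_000)

# Indicator functions partition the samples into the two branches
neg = x < 0    # selects the "X_1 < 0" branch
pos = ~neg     # selects the "X_2 >= 0" branch

# Any event's probability decomposes across the disjoint branches
a, b = -0.5, 0.5
p_total = np.mean((x > a) & (x < b))
p_split = np.mean((x > a) & (x < b) & neg) + np.mean((x > a) & (x < b) & pos)
print(p_total, p_split)
```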

Many-to-many transformations should utilize the Jacobian [though it will likely need an absolute value in there somewhere] with a standard substitution from one coordinate system to another, but you would have to derive it and check the result.

The idea is similar to the one-dimensional change of variables, except that you are looking for a combination of random variables, which means you have matrix theory and multi-variable calculus in combination with the ideas of the one-dimensional change of variables.

You could look at the linear case [i.e. a linear combination of random variables] first and then generalize to the non-linear case later on, but an arbitrary transformation will involve operators. I'd look at the case where you get a determinant of zero to see what happens when you don't have a 1-1 mapping, since something similar happens in the non-linear situation as well.
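For the linear case, a quick numerical check might look like this (a sketch, assuming X uniform on the unit square and some invertible matrix A):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
x = rng.uniform(0.0, 1.0, (n, 2))     # X uniform on the unit square

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])            # invertible linear map, det A = 2
y = x @ A.T                           # Y = A X, row by row

# 2D change of variables: f_Y(y) = f_X(A^{-1} y) / |det A|,
# so Y is uniform on the image parallelogram with density 1/|det A| = 0.5.
# Check on a small box that lies entirely inside the parallelogram:
box_p = np.mean((y[:, 0] > 1.2) & (y[:, 0] < 1.8) &
                (y[:, 1] > 0.3) & (y[:, 1] < 0.7))
predicted = 0.6 * 0.4 / abs(np.linalg.det(A))   # box area * density
print(box_p, predicted)
```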

There is a reason for the absolute value of the derivative, and I'd read about that first before using the substitution integral for the higher-dimensional version.