# Combination of two dependent discrete random variables

1. Dec 16, 2011

### simcc

Hi,
I’m looking for a way to combine two discrete random variables (which I have as probability distributions). The combination should be the product (or other operation) of the two variables.
This would be easy if they were independent, but they’re not. There is a known correlation between the variables.

Question: how to combine two discrete random variables with correlation?
Given: The marginal probabilities of the two variables & a correlation function
Result: either the individual probabilities in a probability table or the complete probability distribution of the combination.

Simple example:
Variables A and B are the distributions:
PA(a=1, 4) = [0.75, 0.25]
PB(b=4, 8, 10) = [0.25, 0.25, 0.5]

Their joint probability function is shown in their joint probability table and joint value table:
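| P | B=4 | B=8 | B=10 | P(A) |
|---|---|---|---|---|
| A=1 | ? | ? | ? | 0.75 |
| A=4 | ? | ? | ? | 0.25 |
| P(B) | 0.25 | 0.25 | 0.5 | 1 |

| value | B=4 | B=8 | B=10 |
|---|---|---|---|
| A=1 | 4 | 8 | 10 |
| A=4 | 16 | 32 | 40 |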

(tables are clearer in attached file)

The correlation between the two variables is: b = 10 – 2/3*a

P(A*B)(4, 8, 10, 16, 32, 40) = ?

#### Attached Files:

• ###### Combining two dependant discrete random variables.doc
2. Dec 16, 2011

### Stephen Tashi

You've described an interesting type of problem. This general type of problem is "ill posed", meaning that there are examples of it that have infinitely many solutions. However, ill posed problems arise in many real-world situations, such as in the mathematics of computing CAT scans, MRI scans etc., so you shouldn't let the ill posed nature of the problem deter you from thinking about it if you find it interesting.

To solve for the joint probability distribution (or determine that there are no solutions or infinitely many solutions), set up the simultaneous equations that the entries in the joint probability table must satisfy. Each entry in the joint probability table is an unknown. The fact that each row sum is known gives you an equation for each row. Likewise the totals for each column give you an equation for each column.
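As a rough sketch of that setup for the 2×3 example in the first post (the variable names and the row-by-row flattening of the table are my own choices, not anything from the thread), the row-sum and column-sum equations can be written as a linear system and its rank checked:

```python
import numpy as np

# Marginals from the example in the first post
p_a = np.array([0.75, 0.25])        # P(A=1), P(A=4)
p_b = np.array([0.25, 0.25, 0.5])   # P(B=4), P(B=8), P(B=10)
n_a, n_b = len(p_a), len(p_b)

# Unknowns: the joint table flattened row by row,
# x[i * n_b + j] = P(A = a_i, B = b_j)
rows, rhs = [], []

for i in range(n_a):                # one equation per row sum
    eq = np.zeros(n_a * n_b)
    eq[i * n_b:(i + 1) * n_b] = 1.0
    rows.append(eq)
    rhs.append(p_a[i])

for j in range(n_b):                # one equation per column sum
    eq = np.zeros(n_a * n_b)
    eq[j::n_b] = 1.0
    rows.append(eq)
    rhs.append(p_b[j])

M = np.array(rows)
y = np.array(rhs)

rank = np.linalg.matrix_rank(M)
free_parameters = M.shape[1] - rank
print(rank, free_parameters)        # prints: 4 2
```

There are five equations but only four are independent (the row sums and the column sums both add up to 1), so the marginals alone leave two free parameters among the six unknowns. Any extra information about the dependence has to pin those down, and the entries must also stay between 0 and 1.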

You need to clarify what you mean by "the correlation function". If you mean the line that is computed by doing linear regression (to get a least squares fit), there is some ambiguity about that line. The line computed by treating B as the independent variable is not the same as the line you get by treating A as the independent variable. There is also a method called "total least squares" that fits a regression line that may be different from both of the aforementioned lines. (If you intended to say "correlation coefficient", that is a single quantity, not a line. Likewise, the "covariance" of A and B is not a line.)

How you define the "correlation function" will give you more equations for the unknowns in the joint probability table.

You may find that in some cases, the simultaneous equations have no solution and in some cases they may have infinitely many solutions.
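To illustrate the "no solution" case, here is a sketch under one particular reading of the original post, which is an assumption on my part: that b = 10 – 2/3*a is the least-squares regression line of B on A. Since A takes only two values, such a line would pass through both conditional means, i.e. E[B | A=a] = 10 – (2/3)a. Adding those two equations to the marginal equations and comparing the rank of the coefficient matrix with the rank of the augmented matrix tests whether the system is consistent:

```python
import numpy as np

a_vals = np.array([1.0, 4.0])
b_vals = np.array([4.0, 8.0, 10.0])
p_a = np.array([0.75, 0.25])
p_b = np.array([0.25, 0.25, 0.5])
n_a, n_b = len(a_vals), len(b_vals)

rows, rhs = [], []

# Marginal constraints: row sums, then column sums
for i in range(n_a):
    eq = np.zeros(n_a * n_b)
    eq[i * n_b:(i + 1) * n_b] = 1.0
    rows.append(eq)
    rhs.append(p_a[i])
for j in range(n_b):
    eq = np.zeros(n_a * n_b)
    eq[j::n_b] = 1.0
    rows.append(eq)
    rhs.append(p_b[j])

# ASSUMPTION (one possible reading of the "correlation function"):
# E[B | A=a_i] = 10 - (2/3) a_i, written as
# sum_j b_j * P(A=a_i, B=b_j) = P(A=a_i) * (10 - (2/3) a_i)
for i in range(n_a):
    eq = np.zeros(n_a * n_b)
    eq[i * n_b:(i + 1) * n_b] = b_vals
    rows.append(eq)
    rhs.append(p_a[i] * (10.0 - (2.0 / 3.0) * a_vals[i]))

M = np.array(rows)
y = np.array(rhs)

# A linear system is solvable only if rank(M) == rank([M | y])
rank_M = np.linalg.matrix_rank(M)
rank_aug = np.linalg.matrix_rank(np.column_stack([M, y]))
consistent = rank_M == rank_aug

# The same conflict, seen directly in the mean of B
mean_b_marginal = p_b @ b_vals                            # from P(B)
mean_b_line = p_a @ (10.0 - (2.0 / 3.0) * a_vals)         # from the line
print(rank_M, rank_aug, consistent)
print(mean_b_marginal, mean_b_line)
```

Under this reading the system is inconsistent: the marginal of B gives E[B] = 8, while the line together with the marginal of A implies E[B] ≈ 8.83. A different definition of the "correlation function" would give different equations and possibly a different conclusion.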

As to the probability distribution for the quantity AB, it would be defined by a table that gives all the possible values of AB and their probabilities. It would not list a value twice. So if your "joint value" table for AB had several entries all equal to the same number, then the final table for the random variable AB would list that number as a value only once. The probability of that value would be the sum of all probabilities in the joint distribution table that correspond to that "joint value".
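A minimal sketch of that collapsing step is below. The function name `product_distribution` is hypothetical, and the joint table used for the original example is only a placeholder built as if A and B were independent, which is explicitly not the case in the question; the real joint table would be substituted once it is known.

```python
from collections import defaultdict

def product_distribution(a_vals, b_vals, joint):
    """Distribution of A*B given a joint table joint[i][j] = P(A=a_i, B=b_j)."""
    dist = defaultdict(float)
    for i, a in enumerate(a_vals):
        for j, b in enumerate(b_vals):
            # Equal products land on the same key, so their probabilities add
            dist[a * b] += joint[i][j]
    return dict(sorted(dist.items()))

# Values from the first post
a_vals = [1, 4]
b_vals = [4, 8, 10]

# ASSUMPTION: placeholder joint table (independence), for illustration only
joint = [
    [0.75 * 0.25, 0.75 * 0.25, 0.75 * 0.5],
    [0.25 * 0.25, 0.25 * 0.25, 0.25 * 0.5],
]
p_ab = product_distribution(a_vals, b_vals, joint)

# Hypothetical second example where two cells share a product (1*4 == 2*2)
p_merge = product_distribution([1, 2], [2, 4], [[0.25, 0.25], [0.25, 0.25]])
print(p_ab)
print(p_merge)   # {2: 0.25, 4: 0.5, 8: 0.25}
```

In the original example all six products (4, 8, 10, 16, 32, 40) are distinct, so nothing merges and the distribution of AB is just the joint table relabelled. The second call shows the merging case.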

Last edited: Dec 16, 2011
3. Dec 16, 2011

### SW VandeCarr

If you can use the normal approximation to your two distributions and know the correlation $\rho$, you should be able to use the characteristic function:

$$\phi(t_{1},t_{2})=\exp\left[i(t_{1}\mu_{1}+t_{2}\mu_{2})-\tfrac{1}{2}\left(\sigma_{1}^{2}t_{1}^{2}+2\rho \sigma_{1} \sigma_{2} t_{1}t_{2}+\sigma_{2}^{2} t_{2}^{2}\right)\right]$$
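For reference, the formula can be evaluated numerically as in the sketch below. The function name is my own, the means and variances are the ones computed from the marginals in the first post, and the value of $\rho$ is purely hypothetical since the thread does not give one.

```python
import numpy as np

def bivariate_normal_cf(t1, t2, mu1, mu2, sigma1, sigma2, rho):
    """Characteristic function of a bivariate normal, as in the formula above."""
    mean_term = 1j * (t1 * mu1 + t2 * mu2)
    quad_term = (sigma1**2 * t1**2
                 + 2.0 * rho * sigma1 * sigma2 * t1 * t2
                 + sigma2**2 * t2**2)
    return np.exp(mean_term - 0.5 * quad_term)

# Moments of the marginals from the first post
mu_a = 0.75 * 1 + 0.25 * 4                                 # 1.75
mu_b = 0.25 * 4 + 0.25 * 8 + 0.5 * 10                      # 8.0
sigma_a = np.sqrt(0.75 * 1 + 0.25 * 16 - mu_a**2)          # sqrt(1.6875)
sigma_b = np.sqrt(0.25 * 16 + 0.25 * 64 + 0.5 * 100 - mu_b**2)  # sqrt(6)

rho = -0.5   # HYPOTHETICAL value, not given in the thread

phi_origin = bivariate_normal_cf(0.0, 0.0, mu_a, mu_b, sigma_a, sigma_b, rho)
phi_marginal = bivariate_normal_cf(0.3, 0.0, mu_a, mu_b, sigma_a, sigma_b, rho)
print(phi_origin, phi_marginal)
```

Setting $t_2 = 0$ recovers the characteristic function of the first normal marginal, $\exp(i t_1 \mu_1 - \sigma_1^2 t_1^2 / 2)$, which is a quick sanity check. Whether a normal approximation is reasonable for two-point and three-point distributions like the ones in the example is a separate question.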

Last edited: Dec 16, 2011