Combination of two dependent discrete random variables

SUMMARY

This discussion focuses on the method to combine two dependent discrete random variables, specifically through their joint probability distribution. The variables A and B are represented by their marginal probabilities, with A having values PA(a=1, 4) = [0.75, 0.25] and B having PB(b=4, 8, 10) = [0.25, 0.25, 0.5]. The correlation between these variables is defined by the equation b = 10 – 2/3*a. To derive the joint probability distribution, one must set up simultaneous equations based on the known row and column sums of the joint probability table.

PREREQUISITES
  • Understanding of discrete random variables and their probability distributions
  • Knowledge of joint probability tables and marginal probabilities
  • Familiarity with correlation functions and linear regression concepts
  • Basic skills in solving simultaneous equations
NEXT STEPS
  • Research methods for deriving joint probability distributions from marginal probabilities
  • Learn about the implications of correlation coefficients in probability distributions
  • Study the concept of total least squares in regression analysis
  • Explore the use of characteristic functions in probability theory
USEFUL FOR

Statisticians, data scientists, and mathematicians interested in advanced probability theory and the analysis of dependent random variables.

simcc
Hi,
I’m looking for a way to combine two discrete random variables (which I have as probability distributions). The combination should be the product (or other operation) of the two variables.
This would be easy if they were independent, but they’re not. There is a known correlation between the variables.

Question: how to combine two discrete random variables with correlation?
Given: The marginal probabilities of the two variables & a correlation function
Result: either the individual probabilities in a probability table or the complete probability distribution of the combination.

Simple example:
Variables A and B are the distributions:
PA(a=1, 4) = [0.75, 0.25]
PB(b=4, 8, 10) = [0.25, 0.25, 0.5]

Their joint probability function is shown in their joint probability table and joint value table:
P       B=4    B=8    B=10   sum
A=1     ?      ?      ?      0.75
A=4     ?      ?      ?      0.25
sum     0.25   0.25   0.5    1

value   B=4    B=8    B=10
A=1     4      8      10
A=4     16     32     40

(tables are clearer in attached file)

The correlation between the two variables is: b = 10 – 2/3*a

P(A*B)(4, 8, 10, 16, 32, 40) = ?
 

You've described an interesting type of problem. This general type of problem is "ill-posed", meaning that some instances of it have infinitely many solutions. However, ill-posed problems arise in many real-world situations, such as the mathematics of computing CAT scans and MRI scans, so you shouldn't let the ill-posed nature of the problem deter you from thinking about it if you find it interesting.

To solve for the joint probability distribution (or determine that there are no solutions or infinitely many solutions), set up the simultaneous equations that the entries in the joint probability table must satisfy. Each entry in the joint probability table is an unknown. The fact that each row sum is known gives you an equation for each row. Likewise the totals for each column give you an equation for each column.
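As a rough illustration of the setup just described, here is a minimal Python sketch for the 2 x 3 example in the question. It builds one equation per row sum and one per column sum, then compares the number of unknowns with the rank of the system. The helper names `build_system` and `rank` are made up for this sketch, and exact fractions are used only to avoid rounding issues.

```python
from fractions import Fraction as F

row_sums = [F(3, 4), F(1, 4)]            # P(A=1), P(A=4)
col_sums = [F(1, 4), F(1, 4), F(1, 2)]   # P(B=4), P(B=8), P(B=10)
n_rows, n_cols = len(row_sums), len(col_sums)


def build_system(row_sums, col_sums):
    """Return augmented rows [coefficients..., rhs] for the marginal constraints.

    The unknown p[i][j] is stored at column index i * n_c + j.
    """
    n_r, n_c = len(row_sums), len(col_sums)
    eqs = []
    for i, r in enumerate(row_sums):          # one equation per row sum
        coeffs = [F(0)] * (n_r * n_c)
        for j in range(n_c):
            coeffs[i * n_c + j] = F(1)
        eqs.append(coeffs + [r])
    for j, c in enumerate(col_sums):          # one equation per column sum
        coeffs = [F(0)] * (n_r * n_c)
        for i in range(n_r):
            coeffs[i * n_c + j] = F(1)
        eqs.append(coeffs + [c])
    return eqs


def rank(rows, n_cols_used):
    """Rank of the first n_cols_used columns, by exact Gaussian elimination."""
    m = [row[:n_cols_used] for row in rows]
    rk = 0
    for col in range(n_cols_used):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                factor = m[r][col] / m[rk][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk


system = build_system(row_sums, col_sums)
n_unknowns = n_rows * n_cols
coeff_rank = rank(system, n_unknowns)        # rank of the coefficient matrix
aug_rank = rank(system, n_unknowns + 1)      # rank including the right-hand side
free_parameters = n_unknowns - coeff_rank

print(n_unknowns, coeff_rank, aug_rank, free_parameters)
```

Equal ranks mean the marginal constraints are consistent, and any leftover free parameters mean the marginals alone do not pin down the table. The sketch ignores the extra requirement that every entry lie between 0 and 1, which restricts the free parameters further.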

You need to clarify what you mean by "the correlation function". If you mean the line that is computed by doing linear regression (to get a least-squares fit), there is some ambiguity about that line. The line computed by treating B as the independent variable is not the same as the line you get by treating A as the independent variable. There is also a method called "total least squares" that fits a regression line that may be different from both the aforementioned lines. (If you intended to say "correlation coefficient", that is a single quantity, not a line. Likewise, the "covariance" of A and B is not a line.)
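To make the ambiguity concrete, the sketch below computes both least-squares lines from a joint table. The table used here is hypothetical: it was picked only because it reproduces the marginals in the question, and it is not claimed to be the answer. The function name `moments` is likewise just for illustration.

```python
from fractions import Fraction as F

a_vals = [1, 4]
b_vals = [4, 8, 10]

# Hypothetical joint table (an assumption, not the thread's answer).
# Row sums are 3/4 and 1/4; column sums are 1/4, 1/4 and 1/2.
joint = [
    [F(0), F(1, 4), F(1, 2)],   # A = 1
    [F(1, 4), F(0), F(0)],      # A = 4
]


def moments(a_vals, b_vals, joint):
    """Return (mean_a, mean_b, var_a, var_b, cov) for a joint probability table."""
    cells = [(a, b, joint[i][j])
             for i, a in enumerate(a_vals)
             for j, b in enumerate(b_vals)]
    mean_a = sum(a * p for a, _, p in cells)
    mean_b = sum(b * p for _, b, p in cells)
    var_a = sum((a - mean_a) ** 2 * p for a, _, p in cells)
    var_b = sum((b - mean_b) ** 2 * p for _, b, p in cells)
    cov = sum((a - mean_a) * (b - mean_b) * p for a, b, p in cells)
    return mean_a, mean_b, var_a, var_b, cov


mean_a, mean_b, var_a, var_b, cov = moments(a_vals, b_vals, joint)

# Regress B on A (A is the independent variable): b = intercept + slope * a
slope_b_on_a = cov / var_a
intercept_b_on_a = mean_b - slope_b_on_a * mean_a

# Regress A on B (B is the independent variable), then rewrite that same
# line with b on the left-hand side so the two lines can be compared.
slope_a_on_b_as_b_of_a = var_b / cov
intercept_a_on_b_as_b_of_a = mean_b - slope_a_on_b_as_b_of_a * mean_a

print(slope_b_on_a, intercept_b_on_a)
print(slope_a_on_b_as_b_of_a, intercept_a_on_b_as_b_of_a)
```

Both lines pass through the point (E[A], E[B]), but their slopes differ unless the correlation coefficient is exactly +1 or -1, so "the regression line" only becomes an equation for the joint table once you say which fit is meant.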

How you define the "correlation function" will give you more equations for the unknowns in the joint probability table.
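For example, suppose (and this is only one possible reading of the question) that b = 10 – (2/3)a is meant as the regression of B on A, so that the mean of B given A = a equals 10 – (2/3)a. Each row of the table then contributes one more equation, and a quick necessary check is whether the mean of B implied by the line agrees with the mean of B from its marginal. A minimal sketch of that check, with the hypothetical helper `line`:

```python
from fractions import Fraction as F

a_vals, p_a = [1, 4], [F(3, 4), F(1, 4)]
b_vals, p_b = [4, 8, 10], [F(1, 4), F(1, 4), F(1, 2)]


def line(a):
    """Assumed reading of the 'correlation function': E[B | A = a]."""
    return 10 - F(2, 3) * a


# Mean of B computed directly from the marginal distribution of B.
mean_b_marginal = sum(b * p for b, p in zip(b_vals, p_b))

# Mean of B implied by the line: sum over a of P(A = a) * E[B | A = a].
mean_b_implied = sum(line(a) * p for a, p in zip(a_vals, p_a))

consistent = mean_b_marginal == mean_b_implied
print(mean_b_marginal, mean_b_implied, consistent)
```

Under this particular reading the two means disagree for the numbers in the question, so no joint table could satisfy the marginals and the line at the same time. A different definition of the "correlation function" could of course behave differently.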

You may find that in some cases, the simultaneous equations have no solution and in some cases they may have infinitely many solutions.

As to the probability distribution for the quantity AB, it would be defined by a table that gives all the possible values of AB and their probabilities. It would not list a value twice. So if your "joint value" table for AB had several entries all equal to the same number, then the final table for the random variable AB would list that number as a value only once. The probability of that value would be the sum of all probabilities in the joint distribution table that correspond to that "joint value".
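The merging step can be sketched in a few lines of Python. The function name `product_distribution` and both input tables are assumptions for illustration: the first table is a made-up case in which two cells share the same product, and the second uses the marginals from the question with an independence table as a stand-in, since the true joint table is still unknown.

```python
from collections import defaultdict
from fractions import Fraction as F


def product_distribution(a_vals, b_vals, joint):
    """Collapse a joint table into the distribution of A*B, merging equal products."""
    dist = defaultdict(F)                     # Fraction() is 0
    for i, a in enumerate(a_vals):
        for j, b in enumerate(b_vals):
            dist[a * b] += joint[i][j]
    return dict(sorted(dist.items()))


# Made-up case: the products 1*4 and 2*2 coincide, so their cells are summed.
quarter = F(1, 4)
merged = product_distribution([1, 2], [2, 4],
                              [[quarter, quarter], [quarter, quarter]])

# The question's marginals with a placeholder joint table (independence is
# assumed here only to have numbers to work with; it is NOT the dependent case).
p_a = [F(3, 4), F(1, 4)]
p_b = [F(1, 4), F(1, 4), F(1, 2)]
placeholder_joint = [[pa * pb for pb in p_b] for pa in p_a]
placeholder = product_distribution([1, 4], [4, 8, 10], placeholder_joint)

print(merged)
print(placeholder)
```

In the question's example all six products (4, 8, 10, 16, 32, 40) are distinct, so nothing merges and the distribution of AB is just the joint table read off cell by cell.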
 
If you can use the normal approximation to your two distributions and know the correlation \rho, you should be able to use the characteristic function:

\phi(t_{1},t_{2})=\exp\left[i(t_{1}\mu_{1}+t_{2}\mu_{2})-\tfrac{1}{2}\left(\sigma_{1}^{2}t_{1}^{2}+2\rho\sigma_{1}\sigma_{2}t_{1}t_{2}+\sigma_{2}^{2}t_{2}^{2}\right)\right]
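For anyone who wants to evaluate this numerically, here is a minimal sketch of the formula in Python. The function name `bivariate_normal_cf` is made up, the means and variances are the ones that follow from the marginals in the question, and the value of rho is a placeholder, since the thread does not give a correlation coefficient.

```python
import cmath
import math


def bivariate_normal_cf(t1, t2, mu1, mu2, sigma1, sigma2, rho):
    """Characteristic function of a bivariate normal at the point (t1, t2)."""
    linear = 1j * (t1 * mu1 + t2 * mu2)
    quadratic = 0.5 * (sigma1 ** 2 * t1 ** 2
                       + 2 * rho * sigma1 * sigma2 * t1 * t2
                       + sigma2 ** 2 * t2 ** 2)
    return cmath.exp(linear - quadratic)


# Moments from the marginals in the question:
# E[A] = 1.75, Var[A] = 27/16, E[B] = 8, Var[B] = 6.
mu1, sigma1 = 1.75, math.sqrt(27 / 16)
mu2, sigma2 = 8.0, math.sqrt(6.0)
rho = -0.5   # placeholder value, an assumption for this sketch

at_origin = bivariate_normal_cf(0.0, 0.0, mu1, mu2, sigma1, sigma2, rho)
sample = bivariate_normal_cf(0.3, -0.2, mu1, mu2, sigma1, sigma2, rho)
print(at_origin, sample)
```

Setting t2 = 0 recovers the characteristic function of the first variable alone, which is a handy sanity check. Whether a normal approximation is reasonable for distributions with only two or three support points is a separate question that this sketch does not address.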
 