Estimating joint distributions from marginals

Discussion Overview

The discussion revolves around estimating joint probability distributions from given marginal probability density functions of two random variables, A and B. Participants explore methods for deriving the joint density function, potential challenges with small datasets, and the implications of using conditional probabilities.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant describes their approach to modeling the marginals P(A) and P(B) using a mixture model and expresses interest in finding the joint density function P(A and B).
  • Another participant asserts that it is generally impossible to determine a joint distribution solely from marginal distributions, suggesting that additional information or special circumstances are necessary.
  • One participant proposes using Bayes' theorem to approximate the joint distribution; another replies that the identity P(A and B) = P(A) P(B | A) is exact, a theorem rather than an approximation, and asks whether data exist to estimate the conditional distributions.
  • One contributor discusses the challenges of estimating conditional probabilities from a small dataset, noting that this can introduce noise and error.
  • Another participant suggests that incorporating structure into the model, such as assuming a family of distributions defined by a few parameters, may help mitigate issues arising from limited data.
  • Clarifications are made regarding the interpretation of continuous probability density functions and the distinction between observed frequencies and probabilities.

Areas of Agreement / Disagreement

Participants express differing views on the feasibility of estimating joint distributions from marginals, with some asserting it is generally not possible without additional information, while others explore methods that might work under certain conditions. The discussion remains unresolved regarding the best approach to take.

Contextual Notes

Limitations include potential missing assumptions about the nature of the data and the distributions involved, as well as the dependence on the size of the dataset and the choice of distribution family.

exmachina
Suppose I have the marginal probability density functions of two random variables A and B, P(A), and P(B). Suppose I modeled P(A) and P(B) using a mixture model from some dataset D and obtained a closed form pdf for each.

I am interested in finding their joint density function P(A and B) and associated properties such as maxima, minima, etc.

Ideally the joint density is expressed as a closed form 2D mixture model as well, but this is not critical.

I could perhaps do something by brute force using Bayes' theorem:

i.e., I can approximate

P(A and B) = P(A) P(B | A) = P(B) P(A | B)

But eventually I need to extend this to higher dimensions, e.g. P(A and B and C and D ...), and this is certainly no trivial task.
 
In general, you cannot determine a joint probability distribution when given only the marginal probability distributions, so if your problem can be solved the solution depends on special circumstances or information that you haven't mentioned. To get the best advice, you should describe the situation completely.
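
A minimal sketch of why the marginals alone are not enough, assuming two binary variables and two made-up joint tables (the numbers are illustrative and do not come from the thread). Both tables give the same P(A) and P(B), yet they are different joint distributions:

```python
# joint[a][b] = P(A = a, B = b) for A, B in {0, 1}.
# Both tables are hypothetical examples chosen only to share their marginals.
independent = [[0.25, 0.25],
               [0.25, 0.25]]
correlated = [[0.40, 0.10],
              [0.10, 0.40]]

def marginals(joint):
    """Return (P(A), P(B)) by summing the joint table over the other variable."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(joint[a][b] for a in range(len(joint)))
           for b in range(len(joint[0]))]
    return p_a, p_b

print("independent:", marginals(independent))
print("correlated: ", marginals(correlated))
print("same joint? ", independent == correlated)
```

The extra information needed to pick one table over the other is exactly the dependence between A and B, which the marginals discard.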

exmachina said:
I could perhaps do something by brute force using Bayes' theorem:

i.e., I can approximate

P(A and B) = P(A) P(B | A) = P(B) P(A | B)

That isn't a mere approximation. It is a theorem.

Are you saying that you have data that could be used to estimate the conditional probability distributions?
 
Well, A and B are two variables that completely specify the state of the system. Suppose I've sampled a whole bunch of data points (a, b), enough that I can generate their PDFs.

I can also approximate P(B | A = a1) and P(A | B = b1) by taking a slice of my dataset (e.g. B = b1 ± 0.1) and counting the occurrences of A. However, this can be bad because my entire dataset may be quite small, and using only a subset of it will result in a lot of noise and error.
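
The slicing idea can be sketched as follows. This is only an illustration: the dataset is synthetic (a correlated Gaussian pair invented for the example), and `slice_a_given_b` is a hypothetical helper name. It shows how few points survive a slice of half-width 0.1 when the whole sample is small:

```python
import random
import statistics

random.seed(0)

# Synthetic stand-in for the dataset: B depends on A, so the two are correlated.
def sample_point():
    a = random.gauss(0.0, 1.0)
    b = 0.8 * a + random.gauss(0.0, 0.6)
    return a, b

data = [sample_point() for _ in range(200)]

def slice_a_given_b(points, b1, half_width=0.1):
    """Return the A values of all points whose B lies within b1 +/- half_width."""
    return [a for a, b in points if abs(b - b1) <= half_width]

a_slice = slice_a_given_b(data, b1=1.0)
print(f"{len(a_slice)} of {len(data)} points fall in the slice")
if len(a_slice) >= 2:
    # Statistics computed from so few points are noisy estimates of P(A | B = b1).
    print("slice mean of A:", statistics.mean(a_slice))
    print("slice std of A: ", statistics.stdev(a_slice))
```

Widening the slice keeps more points but blurs the conditioning value, which is the usual trade-off between noise and bias in this kind of estimate.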
 
exmachina said:
this can be bad because my entire dataset may be quite small, and using only a subset of it will result in a lot of noise and error.

I think the only convenient cure for small amounts of data is to build a lot of structure into the answer. For example, you might assume the distribution you are trying to determine is from a family of distributions that are defined by only a few parameters, and estimate those parameters from the data. To do this you must employ any expert knowledge that you have about the situation. For example, you may know that certain families of distributions have a plausible shape and others don't.
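
As a rough sketch of the parametric approach, assume (purely for illustration) that the joint density is a single bivariate Gaussian, which has only five parameters. The data below are synthetic, and `fit_bivariate_gaussian` and `joint_pdf` are hypothetical helper names; a mixture family would need more parameters and an iterative fit.

```python
import math
import random

random.seed(1)

# Synthetic sample of (a, b) pairs standing in for the real dataset.
data = []
for _ in range(200):
    a = random.gauss(0.0, 1.0)
    data.append((a, 0.8 * a + random.gauss(0.0, 0.6)))

def fit_bivariate_gaussian(points):
    """Maximum-likelihood estimates (mu_a, mu_b, sigma_a, sigma_b, rho)."""
    n = len(points)
    mu_a = sum(a for a, _ in points) / n
    mu_b = sum(b for _, b in points) / n
    var_a = sum((a - mu_a) ** 2 for a, _ in points) / n
    var_b = sum((b - mu_b) ** 2 for _, b in points) / n
    cov = sum((a - mu_a) * (b - mu_b) for a, b in points) / n
    rho = cov / math.sqrt(var_a * var_b)
    return mu_a, mu_b, math.sqrt(var_a), math.sqrt(var_b), rho

def joint_pdf(a, b, params):
    """Bivariate normal density at (a, b) for the given five parameters."""
    mu_a, mu_b, s_a, s_b, rho = params
    za = (a - mu_a) / s_a
    zb = (b - mu_b) / s_b
    q = (za * za - 2.0 * rho * za * zb + zb * zb) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * s_a * s_b * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm

params = fit_bivariate_gaussian(data)
print("fitted (mu_a, mu_b, sigma_a, sigma_b, rho):", params)
print("density at the fitted mean:", joint_pdf(params[0], params[1], params))
```

Every data point contributes to all five estimates, so the fit uses the whole sample instead of a thin slice of it. The price is that the answer is only as good as the assumed family.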

You haven't described the problem clearly, but from your remarks, I conjecture that you are dealing with continuous variates. Some technicalities about your terminology: The value of a continuous probability density function does not give "the probability of" particular values. (For example, think about the uniform probability density function on the interval [0, 1/2], which has constant value 2.) However, I agree that it is often helpful to think about density functions informally that way. Observed frequencies of values in a sample are not probabilities (unless you are talking about randomly selecting a value from the sample itself), so you shouldn't use the p(A|B) notation for them. Of course, observed frequencies can be used as estimators of probabilities.
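
The uniform example can be checked numerically. This is a small sketch with hypothetical helper names (`uniform_pdf`, `interval_probability`): the density takes the value 2, which cannot be a probability, while probabilities come from integrating the density over an interval.

```python
def uniform_pdf(x, low=0.0, high=0.5):
    """Density of the uniform distribution on [low, high]."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def interval_probability(pdf, lo, hi, steps=10_000):
    """Approximate P(lo <= X <= hi) with a midpoint Riemann sum of the density."""
    width = (hi - lo) / steps
    return sum(pdf(lo + (i + 0.5) * width) for i in range(steps)) * width

print(uniform_pdf(0.25))                            # density value, larger than 1
print(interval_probability(uniform_pdf, 0.0, 0.5))  # total probability, close to 1
print(interval_probability(uniform_pdf, 0.1, 0.2))  # close to 0.2
```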
 
