Combining Distributions (ex. Mixture distribution, copula)

In summary, the thread discusses mixture distributions and copulas as ways to describe the population that results from combining random variables drawn from different distributions. The original poster asks whether there are other ways to link or relate the random variables. A reply mentions principal component analysis and independent component analysis, notes that it is unclear what a "way" of relating the variables means, and asks whether "statistic" is meant in its technical sense.
  • #1
Apteronotus
This is a vague question and I apologize in advance for not being able to explain it better.

I'm combining r.v.'s from different populations (distributions). The resulting population can be thought to come from a mixture distribution. I think another way of describing the resulting distribution may be by the use of copulas.

I'm wondering if there are other ways aside from mixture distributions and copulas.

Many thanks
 
  • #2
Apteronotus said:
I think another way of describing the resulting distribution may be by the use of copulas.

A copula is used to handle a joint distribution of two random variables, so the two variables might represent two different physical quantities (e.g. weight and temperature). A mixture distribution represents a single random variable (e.g. weight). You can model the joint distribution of several random variables that have the same physical unit (e.g. weights of pieces of candy) and this would imply a distribution for their sum (e.g. weight of a box of 20 pieces of candy).

What are the physical units of the random variables involved in your study?
 
  • #3
The physical unit of my variables is dollars ($).

One random variable represents dollars lost due to fraud. The second represents dollars lost due to external circumstances.

These r.v.'s may come from different distributions but they may not necessarily be independent.
 
  • #4
How are you seeking to "combine" the random variables? If the object is to compute total cost, this would imply adding them. Did you mean to ask for ways to "relate" two random variables?
 
  • #5
Hi Stephen,

Thank you for your post. I'm sorry if I'm failing to describe the situation clearly. Here's a second attempt...

Suppose I take all incidents of loss due to fraud (r.v. X) and those due to external circumstances (r.v. Y) and put them in one "box". Then the members of my box can come from either the X or the Y population, each of which has a different distribution.

A mixture distribution allows me to describe the distribution of my box.

Is there another way of doing that aside from mixture distributions?
 
  • #6
Apteronotus said:
Is there another way of doing that aside from mixture distributions?

I'd say no, meaning that the natural model for "drawing a loss at random from the box containing two types of losses" is a mixture distribution. If you change your mental picture of how a loss is generated, then the mathematics could change.

For example, suppose we think some frauds result from exaggerating actual losses from external circumstances. This could lead to a model where one first draws a loss due to external circumstances at random and then makes another random selection to determine the amount of fraud added to that loss. From that point of view, any way of representing a joint distribution of the two variables (external loss, added fraud) would model the situation.
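The two mental pictures described above can be contrasted in a rough simulation sketch. Everything numeric here is an assumption made only for illustration: the lognormal and exponential loss distributions, the probabilities `p_fraud` and `p_exaggerate`, and the rule that the added fraud is a random fraction of the external loss. None of these come from the thread.

```python
import random

def draw_from_box(rng, p_fraud=0.3):
    """Mixture view: pick a loss type at random, then draw one loss of that type."""
    if rng.random() < p_fraud:
        return rng.lognormvariate(8.0, 1.0)   # hypothetical fraud-loss distribution
    return rng.expovariate(1 / 2000.0)        # hypothetical external-loss distribution (mean 2000)

def draw_exaggerated_claim(rng, p_exaggerate=0.2):
    """Two-stage view: draw an external loss, then possibly add a fraudulent amount."""
    external = rng.expovariate(1 / 2000.0)
    added_fraud = 0.0
    if rng.random() < p_exaggerate:
        # hypothetical rule: the fraud is a random fraction of the real loss
        added_fraud = external * rng.uniform(0.1, 1.0)
    return external, added_fraud

rng = random.Random(0)
box = [draw_from_box(rng) for _ in range(10_000)]             # one number per draw
claims = [draw_exaggerated_claim(rng) for _ in range(10_000)]  # one pair per draw
totals = [external + fraud for external, fraud in claims]
```

In the first function each draw is a single dollar amount, so its distribution is a mixture. In the second, each draw is a pair of dependent amounts, so the object being modeled is a joint distribution, and the total is their sum.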
 
  • #7
Hi Stephen,

Yes, I think a mixture distribution is a very natural way of describing the distribution of the "box".

Also, copulas allow a way to describe the joint distribution of the two r.v.'s.

Do you know if there is any other statistic which combines or links the r.v.'s?
 
  • #8
There are various empirical ways of representing data as a sum of components. The data can be analyzed by "principal component analysis" or "independent component analysis". I suppose representing data as a sum of components could be called using a mixture of distributions, but the same data can be represented in different ways by mixtures.

Almost any function of two variables can be modified to create a joint distribution. What you mean by a "way" of relating two random variables isn't clear. Is a "way" a "family" of models that has known techniques for finding a member of that family that fits some given data?


Apteronotus said:
Do you know if there is any other statistic which combines or links the r.v.'s?

The word "statistic" has a technical meaning in mathematical statistics. Do you mean "statistic" in that technical sense?
 

1. What is a mixture distribution?

A mixture distribution is a probability distribution built from two or more component distributions in order to model a population made up of several subpopulations. Its density (or distribution function) is a weighted average of the component densities, where the weights are nonnegative, sum to 1, and represent the probability that an observation comes from each component.
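As a small illustration of the weighted-average definition, the sketch below builds a two-component mixture density and checks numerically that it still integrates to about 1. The normal components and the weights 0.3 and 0.7 are arbitrary choices for the example, and `normal_pdf` and `mixture_pdf` are helper names introduced here.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, components):
    """Weighted average of the component densities evaluated at x."""
    return sum(w * pdf(x) for w, pdf in zip(weights, components))

weights = [0.3, 0.7]  # nonnegative and summing to 1
components = [
    lambda x: normal_pdf(x, 0.0, 1.0),
    lambda x: normal_pdf(x, 5.0, 2.0),
]

# Riemann sum over [-20, 30): the mixture density should integrate to about 1
step = 0.01
area = sum(mixture_pdf(-20 + i * step, weights, components) * step for i in range(5000))
```

Because the weights sum to 1 and each component integrates to 1, the weighted average is itself a valid density.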

2. How is a mixture distribution different from a traditional distribution?

A mixture distribution is still the distribution of a single random variable. What sets it apart from a single parametric family (such as one normal or one exponential distribution) is that its density is a weighted combination of several component densities, so it can represent features such as multiple modes that one simple family cannot capture.

3. What is a copula in relation to combining distributions?

A copula is a function that joins the marginal distributions of two or more random variables into their joint distribution, describing the dependence between the variables separately from the marginals. It answers a different question from a mixture: a copula describes how several variables vary together, whereas a mixture describes a single variable drawn from one of several populations.
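One common construction is the Gaussian copula, sketched below for the two loss variables from the thread. This is only an illustration under stated assumptions: the choice of a Gaussian copula, the correlation `rho=0.6`, and the lognormal and exponential marginals with their parameters are all hypothetical, and the helper names are introduced here.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def gaussian_copula_pair(rng, rho):
    """Return (u, v), each uniform on (0, 1), with dependence controlled by rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)

def lognormal_inv_cdf(u, mu, sigma):
    """Quantile function of a lognormal distribution."""
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(u))

def exponential_inv_cdf(u, mean):
    """Quantile function of an exponential distribution with the given mean."""
    return -mean * math.log(1.0 - u)

rng = random.Random(1)
pairs = []
for _ in range(5000):
    u, v = gaussian_copula_pair(rng, rho=0.6)
    fraud = lognormal_inv_cdf(u, mu=7.0, sigma=1.0)   # hypothetical fraud marginal
    external = exponential_inv_cdf(v, mean=2000.0)    # hypothetical external marginal
    pairs.append((fraud, external))

total_losses = [fraud + external for fraud, external in pairs]
```

The marginals are set by the two quantile functions, and the dependence is set only by `rho`, which is the separation a copula provides. The sum `total_losses` then has whatever distribution the joint model implies.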

4. Can any distributions be combined into a mixture distribution?

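Yes, provided the components are distributions of the same kind of quantity, that is, they are defined on the same space of values (for example, all are distributions of a dollar amount). The components do not need to share the same support or belong to the same family: any weighted average of them with nonnegative weights that sum to 1 is again a valid distribution. Even a discrete and a continuous component can be mixed, as in zero-inflated models.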

5. How are the weights determined in a mixture distribution?

The weights in a mixture distribution are usually estimated from data together with the parameters of the component distributions, most commonly by maximum likelihood, which is often computed with the expectation-maximization (EM) algorithm. The weights can also be fixed from prior knowledge, for example a known proportion of each population in the sample.
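A minimal sketch of estimating a mixture weight with EM is shown below. To keep it short it assumes the two component densities are fully known and only the weight is unknown; in practice the component parameters are usually estimated as well. The normal components, the true weight 0.3, and the function name `estimate_weight` are assumptions made for the example.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def estimate_weight(data, pdf_a, pdf_b, iterations=50):
    """EM updates for the weight of component A when both densities are known."""
    w = 0.5  # starting guess
    for _ in range(iterations):
        # E-step: probability that each observation came from component A
        resp = [w * pdf_a(x) / (w * pdf_a(x) + (1 - w) * pdf_b(x)) for x in data]
        # M-step: the new weight is the average of those probabilities
        w = sum(resp) / len(resp)
    return w

# Simulated data from a mixture with a known (hypothetical) weight of 0.3
rng = random.Random(2)
true_weight = 0.3
data = [
    rng.gauss(0.0, 1.0) if rng.random() < true_weight else rng.gauss(5.0, 2.0)
    for _ in range(4000)
]

w_hat = estimate_weight(
    data,
    lambda x: normal_pdf(x, 0.0, 1.0),
    lambda x: normal_pdf(x, 5.0, 2.0),
)
```

With this much simulated data, `w_hat` should land close to the weight used to generate it.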
