Combining Conditional Probability Distributions

In summary: f(x|B) means the distribution of x given B and (C or not C); g(x|C) means the distribution of x given C and (B or not B); h(x|B,C) means the distribution of x given B and C. When the button B is pressed, there is one distribution of rainfall the next day, x; when the button C is pressed, there is another. The thread asks how to estimate h(x|B,C), the distribution when both buttons are pressed.
  • #1
Whenry
Hi all,

My question is the following. Let's say I have two probability distributions;

[tex]f(x|b)\,g(x|c)[/tex]

b and c are discrete events while x is a continuous variable, i.e. when the button b is pressed there is some distribution for the amount of rainfall the next day, x. When the button c is pressed there is a different distribution of rainfall the next day, x. Are there any strategies for estimating the distribution of rainfall if both buttons are pressed,

[tex]h(x|b,c)\,?[/tex]

And, what assumptions do those strategies rest on?

Thank you in advance,

Will
 
  • #2
Whenry said:
Hi all,

My question is the following. Let's say I have two probability distributions;

[tex]f(x|b)\,g(x|c)[/tex]

b and c are discrete events while x is a continuous variable, i.e. when the button b is pressed there is some distribution for the amount of rainfall the next day, x. When the button c is pressed there is a different distribution of rainfall the next day, x. Are there any strategies for estimating the distribution of rainfall if both buttons are pressed,

[tex]h(x|b,c)\,?[/tex]

And, what assumptions do those strategies rest on?

Thank you in advance,

Will

Hey Whenry and welcome to the forums.

The subtlety with this kind of problem is one of interpretation and it boils down to the atomicity of events.

In probability, we usually break things down in a way that lets us identify events that cannot be broken down any further (atomic) and that are completely disjoint from every other event.

In your situation, you have to interpret what these atomic events refer to, because the first formulation implies that your 'b' and 'c' events are disjoint; mentioning that 'both' can be pressed breaks that assumption if the double keypress corresponds to a real event.

One way you can deal with this is to take a distribution that is an 'average' of the two distributions, which will be a proper distribution with respect to the Kolmogorov axioms if both distributions are defined on the same domain and are both valid PDFs. If they are not, then you need to account for this.
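That averaging idea can be sketched numerically. Everything below is hypothetical (the two densities are made-up normals, not from any data in this thread); the point is only that an equal-weight mixture of two valid PDFs on the same domain is itself a valid PDF:

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    """Normal density, standing in for the two hypothetical rainfall PDFs."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical rainfall grid and densities (all parameters invented)
x = np.linspace(0.0, 20.0, 2001)
dx = x[1] - x[0]
f = norm_pdf(x, 5.0, 1.5)   # distribution of next-day rainfall after button b
g = norm_pdf(x, 9.0, 2.0)   # distribution of next-day rainfall after button c

# Equal-weight mixture: a valid PDF whenever f and g share the same domain
h = 0.5 * f + 0.5 * g

# The mixture still integrates to (approximately) 1 over the domain
print((h * dx).sum())
```

The weights need not be 0.5/0.5; any non-negative weights summing to 1 give a valid mixture, and choosing them is exactly the interpretation question raised above.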

But again, it's more important that you consider what the events refer to rather than just trying to fudge things mathematically. If you want to consider three events (B only, C only, B and C) then this will need an interpretation. If you want to consider only two (B only, C only) then this will have an interpretation.

Without an interpretation and a subsequent understanding thereof, you have a mathematical model that really has no basis for understanding.
 
  • #3
chiro said:
Hey Whenry and welcome to the forums.

The subtlety with this kind of problem is one of interpretation and it boils down to the atomicity of events.

In probability, we usually break things down in a way that lets us identify events that cannot be broken down any further (atomic) and that are completely disjoint from every other event.

In your situation, you have to interpret what these atomic events refer to, because the first formulation implies that your 'b' and 'c' events are disjoint; mentioning that 'both' can be pressed breaks that assumption if the double keypress corresponds to a real event.

One way you can deal with this is to take a distribution that is an 'average' of the two distributions, which will be a proper distribution with respect to the Kolmogorov axioms if both distributions are defined on the same domain and are both valid PDFs. If they are not, then you need to account for this.

But again, it's more important that you consider what the events refer to rather than just trying to fudge things mathematically. If you want to consider three events (B only, C only, B and C) then this will need an interpretation. If you want to consider only two (B only, C only) then this will have an interpretation.

Without an interpretation and a subsequent understanding thereof, you have a mathematical model that really has no basis for understanding.

Thank you Chiro,

I apologize for the lack of clarity. I mean the following cases (I am not sure of the proper notation): [itex] f(x|B) [/itex] means the distribution of x given B and (C or not C). [itex] g(x|C) [/itex] means the distribution of x given C and (B or not B). [itex] h(x|B,C) [/itex] means the distribution of x given B and C.

So, [itex] f(x|B) [/itex] is the PDF of x over (C or not C) and B.

In my original analogy, this would be the distribution of rainfall x when B is definitely pressed and C may or may not be pressed. The probability of C being pressed is assumed to be independent of B, [itex] p(C|B) = p(C) [/itex], and vice versa, [itex] p(B|C) = p(B) [/itex].

I can relate this to a more realistic example where B and C are not buttons but distinct weather patterns, i.e. B represents a distinct pattern over Greenland, C represents a distinct pattern over the Atlantic Ocean, and x is rainfall over England. I have enough data to reasonably determine [itex] f(x|B)[/itex] and [itex] g(x|C)[/itex], but I would like to infer something about [itex] h(x|B,C)[/itex]. Unfortunately, I have a very small sample of data where both B and C have occurred simultaneously. The probabilities of B and C occurring are relatively small: [itex]p(B)≈0.05[/itex] and [itex]p(C)≈0.05[/itex].

I hope that helps. I appreciate your feedback.
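For what it's worth, one common way to formalize this setup (an extra assumption, beyond anything stated in the thread) is to take B and C as conditionally independent given x; Bayes' theorem then gives h(x|B,C) ∝ f(x|B) g(x|C) / p(x), where p(x) is the marginal (climatological) rainfall distribution. A numerical sketch with entirely invented densities:

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    """Normal density used as a stand-in for the hypothetical rainfall PDFs."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical densities on a rainfall grid (all parameters invented)
x = np.linspace(0.0, 30.0, 3001)
dx = x[1] - x[0]
prior = norm_pdf(x, 6.0, 3.0)   # p(x): overall rainfall distribution
f = norm_pdf(x, 8.0, 2.0)       # f(x|B)
g = norm_pdf(x, 10.0, 2.5)      # g(x|C)

# If B and C are conditionally independent given x, Bayes' theorem gives
#   h(x|B,C) ∝ f(x|B) g(x|C) / p(x),
# so form the product ratio on the grid and normalize numerically.
h = f * g / prior
h /= (h * dx).sum()

mode = x[np.argmax(h)]
print(mode)   # mode of the combined distribution
```

Whether conditional independence given x is defensible for two weather patterns is exactly the domain question chiro raises below; this is a modeling choice, not a mathematical consequence of having f and g.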
 
  • #4
Whenry said:
Thank you Chiro,

I apologize for the lack of clarity. I mean the following cases (I am not sure of the proper notation): [itex] f(x|B) [/itex] means the distribution of x given B and (C or not C). [itex] g(x|C) [/itex] means the distribution of x given C and (B or not B). [itex] h(x|B,C) [/itex] means the distribution of x given B and C.

So, [itex] f(x|B) [/itex] is the PDF of x over (C or not C) and B.

In my original analogy, this would be the distribution of rainfall x when B is definitely pressed and C may or may not be pressed. The probability of C being pressed is assumed to be independent of B, [itex] p(C|B) = p(C) [/itex], and vice versa, [itex] p(B|C) = p(B) [/itex].

I can relate this to a more realistic example where B and C are not buttons but distinct weather patterns, i.e. B represents a distinct pattern over Greenland, C represents a distinct pattern over the Atlantic Ocean, and x is rainfall over England. I have enough data to reasonably determine [itex] f(x|B)[/itex] and [itex] g(x|C)[/itex], but I would like to infer something about [itex] h(x|B,C)[/itex]. Unfortunately, I have a very small sample of data where both B and C have occurred simultaneously. The probabilities of B and C occurring are relatively small: [itex]p(B)≈0.05[/itex] and [itex]p(C)≈0.05[/itex].

I hope that helps. I appreciate your feedback.

I misunderstood what B and C were referring to: it seems that these are three different events with clear and distinct meanings which is what you need.

If you want to infer something like a whole distribution, this is a little more complicated than making an inference on, say, a mean, a group of means, or a variance.

I recommend you look into something along the lines of a Markov chain Monte Carlo (MCMC) scheme in the Bayesian setting. Bayesian statistics is very useful, especially when you do not have a lot of data.

There is a program called WinBUGS:

http://www.mrc-bsu.cam.ac.uk/bugs/

This can generate distributions based on given priors, likelihoods, and specific data, and from these it produces means, variances, and so on.

The key thing of course is specifying the model parameters and you will need to understand Bayesian statistics and the MCMC method.
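As a taste of what such a scheme does, here is a minimal random-walk Metropolis sampler (a hand-rolled sketch, not WinBUGS) targeting the posterior of a mean rainfall μ under a normal prior, with a small invented data set and a known observation spread:

```python
import math
import random

random.seed(1)

# Hypothetical setup: a few rainfall observations from days when both
# patterns occurred; we infer the posterior of the mean rainfall mu.
data = [7.2, 9.1, 8.4]
sigma = 2.0           # assumed-known observation spread
mu0, tau = 6.0, 4.0   # normal prior on mu

def log_posterior(mu):
    lp = -0.5 * ((mu - mu0) / tau) ** 2                        # log prior
    lp += sum(-0.5 * ((xi - mu) / sigma) ** 2 for xi in data)  # log likelihood
    return lp

# Random-walk Metropolis: propose, then accept with the Metropolis ratio
mu, samples = 8.0, []
for _ in range(20000):
    proposal = mu + random.gauss(0.0, 0.8)
    if math.log(random.random()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal
    samples.append(mu)

# Discard burn-in, then summarize the chain
posterior_mean = sum(samples[5000:]) / len(samples[5000:])
print(posterior_mean)   # conjugate answer for this prior/data is about 8.06
```

This toy model is conjugate, so MCMC is overkill here; the point is only to show the propose/accept loop that WinBUGS automates for models where no closed form exists.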

Have you had experience with this kind of thing before? Have you been exposed to Bayesian inference?
 
  • #5
chiro said:
I misunderstood what B and C were referring to: it seems that these are three different events with clear and distinct meanings which is what you need.

If you want to infer something like a whole distribution, this is a little more complicated than making an inference on, say, a mean, a group of means, or a variance.

I recommend you look into something along the lines of a Markov chain Monte Carlo (MCMC) scheme in the Bayesian setting. Bayesian statistics is very useful, especially when you do not have a lot of data.

There is a program called WinBUGS:

http://www.mrc-bsu.cam.ac.uk/bugs/

This can generate distributions based on given priors, likelihoods, and specific data, and from these it produces means, variances, and so on.

The key thing of course is specifying the model parameters and you will need to understand Bayesian statistics and the MCMC method.

Have you had experience with this kind of thing before? Have you been exposed to Bayesian inference?

I do have experience coding naive Bayes binomial classifiers, but that is where my experience ends. I certainly have no experience using Bayesian inference to arrive at PDFs of continuous variables, as x is in the above example. Nor do I have experience with MCMC.

I will need to find some crash course with examples, as I need to make some quick decisions about how to find a reasonable estimate of [itex] h(x|b,c) [/itex].

Any more pointers or advice would be very appreciated.

thank you,

Will
 
  • #6
Whenry said:
I do have experience coding naive Bayes binomial classifiers, but that is where my experience ends. I certainly have no experience using Bayesian inference to arrive at PDFs of continuous variables, as x is in the above example. Nor do I have experience with MCMC.

I will need to find some crash course with examples, as I need to make some quick decisions about how to find a reasonable estimate of [itex] h(x|b,c) [/itex].

Any more pointers or advice would be very appreciated.

thank you,

Will

I guess the only advice would be to know the limits of your data and the other assumptions that will be used to generate simulated distributions using MCMC.

Understanding the limitations of your prior, and how you describe it, will be important, as will the consequences of using priors, especially with few data points.

This kind of thing though is really application and domain specific, and you are ultimately going to have the expert knowledge that I don't have a chance of having.
 
  • #7
chiro said:
I guess the only advice would be to know the limits of your data and the other assumptions that will be used to generate simulated distributions using MCMC.

Understanding the limitations of your prior, and how you describe it, will be important, as will the consequences of using priors, especially with few data points.

This kind of thing though is really application and domain specific, and you are ultimately going to have the expert knowledge that I don't have a chance of having.

Thank you chiro, I appreciate your feedback. I have been doing some investigating into Bayesian inference, and it seems that I will have to have some data points within the distribution [itex] h(x|b,c) [/itex] in order to infer the parameters of the distribution. Unfortunately, I will have very few to none of these data points, especially considering that the full application will involve more conditions than only b and c, i.e. [itex] h(x|b,c,d,e,f,...) [/itex]. I think the best strategy may be to discretize the random variable x into categories, e.g. (in our above example) "0-2 inches of rain", "2-5 inches of rain", "5-7 inches of rain"; then I can use a multinomial naive Bayes network to model the relative probabilities of each category, and fit a distribution to that (?).
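That binned scheme can be sketched directly. The naive-Bayes combination p(k|B,C) ∝ p(k|B) p(k|C) / p(k) rests on the same conditional-independence assumption as the classifier mentioned above; every number below is invented for illustration:

```python
# Hypothetical sketch of discretize-and-combine: rainfall is binned into
# categories k, and per-condition category probabilities are combined
# naive-Bayes style:  p(k|B,C) ∝ p(k|B) p(k|C) / p(k).

bins = ["0-2 inches", "2-5 inches", "5-7 inches"]
p_k   = [0.5, 0.3, 0.2]   # marginal category probabilities p(k), invented
p_k_B = [0.2, 0.4, 0.4]   # p(k|B), as if estimated from the B-only data
p_k_C = [0.3, 0.3, 0.4]   # p(k|C), as if estimated from the C-only data

# Product ratio per category, then normalize so the result sums to 1
unnorm = [b * c / m for b, c, m in zip(p_k_B, p_k_C, p_k)]
total = sum(unnorm)
p_k_BC = [u / total for u in unnorm]

for name, p in zip(bins, p_k_BC):
    print(f"{name}: {p:.3f}")
```

With more conditions d, e, f, ..., the same formula just gains more factors in the numerator and higher powers of p(k) in the denominator, which is why the conditional-independence assumption becomes increasingly load-bearing as conditions are added.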
 
  • #8
Whenry said:
Thank you Chiro,

I apologize for the lack of clarity. I mean the following cases (I am not sure of the proper notation): [itex] f(x|B) [/itex] means the distribution of x given B and (C or not C). [itex] g(x|C) [/itex] means the distribution of x given C and (B or not B). [itex] h(x|B,C) [/itex] means the distribution of x given B and C.

So, [itex] f(x|B) [/itex] is the PDF of x over (C or not C) and B.

In my original analogy, this would be the distribution of rainfall x when B is definitely pressed and C may or may not be pressed. The probability of C being pressed is assumed to be independent of B, [itex] p(C|B) = p(C) [/itex], and vice versa, [itex] p(B|C) = p(B) [/itex].

I can relate this to a more realistic example where B and C are not buttons but distinct weather patterns, i.e. B represents a distinct pattern over Greenland, C represents a distinct pattern over the Atlantic Ocean, and x is rainfall over England. I have enough data to reasonably determine [itex] f(x|B)[/itex] and [itex] g(x|C)[/itex], but I would like to infer something about [itex] h(x|B,C)[/itex]. Unfortunately, I have a very small sample of data where both B and C have occurred simultaneously. The probabilities of B and C occurring are relatively small: [itex]p(B)≈0.05[/itex] and [itex]p(C)≈0.05[/itex].

I hope that helps. I appreciate your feedback.

Hi Whenry,

If you don't have an underlying model for h, a theoretical background that helps you guess its behavior, or any other information about it, then you are left with your little data, and that's all you have.

The way you express the problem makes it seem as if f and g should tell you how h behaves, but it is you who has to assess, with the experience you have in the field, what that relationship is. If there are no grounds for any relationship among f, g, h, B, or C, then you simply need more data.
 

1. What is the concept of combining conditional probability distributions?

Combining conditional probability distributions is the process of using multiple conditional probability distributions to calculate the probability of an event occurring. It involves understanding the relationship between different variables and their impact on the final outcome.

2. Why is it important to combine conditional probability distributions?

Combining conditional probability distributions allows for a more accurate prediction of the likelihood of an event. By taking into account multiple factors, the resulting probability is more comprehensive and reliable.

3. How do you combine conditional probability distributions?

To combine conditional probability distributions, you multiply the individual probabilities together. This is known as the product rule and rests on the fact that the probability of two independent events occurring together is the product of their individual probabilities.

4. What are some common applications of combining conditional probability distributions?

Combining conditional probability distributions is commonly used in fields such as finance, economics, and machine learning. It is also used in risk assessment and prediction models in various industries.

5. Are there any limitations to combining conditional probability distributions?

One limitation of combining conditional probability distributions is that it assumes the events are independent, which may not always be the case. Additionally, it can become more complex and difficult to calculate with a large number of variables.
