# Bayesian probability question about Dirichlet prior distributions

1. Oct 14, 2012

Hi there,

I have a question about Bayesian probability.

I have a list of the starting times of journeys. I binned the data into 15 minute bins so I have 96 bins in total (4*24=96). So for example a journey start time of 08:05 am would be in bin number 29.

As an example here is the data for bins numbers 28-50 (8am until 12.30pm).

https://dl.dropbox.com/u/54057365/All/bin.JPG [Broken]

I've calculated the frequency density of the bins in the last column.

Would anybody be able to tell me how I would do the following:

Taking a Dirichlet prior distribution over the density of each bin for a multinomial model, you estimate the parameters. This way you get a non-zero probability for each bin. Each parameter is basically some prior parameter plus the frequency of the data in that bin.

Would anybody know if this can be done with an Excel add-in?

Regards

John

Last edited by a moderator: May 6, 2017

2. Oct 16, 2012

Hello,

Just wondering if anybody has had time to consider my question?

Regards

3. Oct 16, 2012

### Stephen Tashi

To many mathematicians the stumbling block in your question is the phrase "in Excel". Your question is rather like asking "How do I tie my shoes - in a phone booth, standing on my head?" Lots of people know the answer to the first part. Not many know the answer with the added restrictions.

I suggest you look in the computer sections of the forum if you want to know about Excel. I'm not very familiar with those sections, so I can't suggest which ones. Do a search on "excel" and find places on the forum where people who know about sophisticated excel plugins make posts.

4. Oct 16, 2012

Hi Stephen,

Thanks for your comment. It doesn't necessarily have to be in Excel. How would you suggest doing it?

Thanks

John

5. Oct 16, 2012

### Stephen Tashi

(The term "frequency" is not a good choice of words since it might also mean the fraction of trials that fell in a given bin. That is not the intended meaning here.)

A plausible "non-informative" prior is the Dirichlet distribution with all parameters set equal to 1.
The posterior is a Dirichlet distribution with parameters $\alpha_i = 1 + b_i$ where $b_i$ is the number of observations in the data that were in bin $i$.
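
A minimal sketch of this update, assuming made-up counts for five bins (the `counts` values are illustrative and are not the data from the thread). It forms $\alpha_i = 1 + b_i$ and the posterior mean $\alpha_i / \sum_j \alpha_j$, which stays positive even for a bin with no observations.

```python
# Hypothetical bin counts b_i; the real data set would have 96 entries.
counts = [0, 3, 7, 2, 0]

prior_alpha = [1.0] * len(counts)                           # Dirichlet prior, all parameters 1
post_alpha = [a + b for a, b in zip(prior_alpha, counts)]   # alpha_i = 1 + b_i

total = sum(post_alpha)
post_mean = [a / total for a in post_alpha]                 # E[p_i | data] = alpha_i / sum(alpha)

for i, (a, m) in enumerate(zip(post_alpha, post_mean), start=1):
    print(f"bin {i}: alpha = {a:.0f}, posterior mean = {m:.3f}")
```

The empty bins end up with a small but non-zero estimated probability, which is the behaviour asked for in the original question.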

What is it that you want to do with this posterior distribution?

6. Oct 17, 2012

### chiro

You can code routines in Excel if they don't exist using formulas or VBA code.

7. Oct 18, 2012

Hi,

I'm confused about the exact steps for determining the posterior distribution. I want to sample from the posterior distribution. I've been told that these are the steps to follow, but I can't seem to grasp the method. I'd be grateful if you could confirm whether I'm doing it correctly. I want to estimate the posterior distribution of journey start times. I believe you do the following.

1) You have the column of all starting times.

2) Choose a bin size, giving let's say k bins.

3) Taking a Dirichlet prior distribution over the density of each bin for a multinomial model, you estimate the parameters. This way you get a non-zero probability for each bin. Each parameter is basically some prior parameter plus the frequency of the data in that bin. The data is binned when you want to model your data using a discrete probability distribution or create a frequency table and do classical analysis. In the Bayesian approach you get a discrete probability model for the probability in each bin.

4) Thus you have estimated f(journey starting time)

So I binned the data into 15 minute bins so I have 96 bins in total (4*24=96). So for example a journey start time of 08:05 am would be in bin number 29.

As an example here is the data for bins numbers 28-50 (8am until 12.30pm).

https://dl.dropbox.com/u/54057365/All/bin.JPG [Broken]

I'm using this excel software

Is this problem an example of a Conjugate prior distribution? http://www.vosesoftware.com/ModelRiskHelp/index.htm#Analysing_and_using_data/Bayesian/Bayesian_inference.htm

So I have the frequency in each bin. Is the next step to find the prior distribution of each bin, and then the likelihood function for the data in each bin?

How do these come together to form one complete posterior distribution that you can sample from? Am I understanding this correctly or have I got it all wrong?

Could you tell me if this page is doing what I am trying to do?

Apologies for the long-winded post. I've been trying to do this for a few days now with no success.

I'm going to be doing conditional probabilities next so I need to grasp the basics.

Last edited by a moderator: May 6, 2017

8. Oct 19, 2012

### chiro

If you are using your data to update the parameters of your distribution, then typically your last posterior becomes your new prior, and you repeat the cycle each time new data arrive.

If you are doing this on a computer with bins, and your likelihood for each bin is P(X|theta) and your prior is P(theta), then multiply the two together cell by cell. Once you've done this, normalize the whole distribution (i.e. sum all the newly created cells of the posterior distribution and divide each cell by this value).

You can then take this distribution and get, for example, a point estimate for the parameter (by taking the expectation of each element in the theta vector), and you can even get interval estimates (credible intervals, the Bayesian counterpart of confidence intervals) by taking the appropriate quantiles.
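
The multiply-and-normalize recipe can be sketched on a grid for the probability `theta` of a single bin. Everything specific here is an assumption made for illustration: the counts `n_trials` and `n_hits`, the 101-point grid, the flat prior, and the binomial form of the likelihood for one bin versus all the others.

```python
# Grid approximation for the probability theta of ONE bin (illustrative numbers).
n_trials, n_hits = 20, 5                 # hypothetical: 5 of 20 journeys fell in this bin

grid = [i / 100 for i in range(101)]     # candidate theta values 0.00 .. 1.00
prior = [1.0 / len(grid)] * len(grid)    # flat prior over the grid
# Binomial likelihood; the constant binomial coefficient cancels when normalizing.
likelihood = [t**n_hits * (1 - t)**(n_trials - n_hits) for t in grid]

unnorm = [l * p for l, p in zip(likelihood, prior)]   # multiply cell by cell
norm = sum(unnorm)
posterior = [u / norm for u in unnorm]                # cells now sum to 1

post_mean = sum(t * w for t, w in zip(grid, posterior))   # point estimate

def quantile(q):
    """Smallest grid value whose cumulative posterior mass reaches q."""
    cum = 0.0
    for t, w in zip(grid, posterior):
        cum += w
        if cum >= q:
            return t
    return grid[-1]

interval = (quantile(0.025), quantile(0.975))   # central 95% interval on the grid
print(post_mean, interval)
```

With a Dirichlet prior none of this numerical work is needed, because the posterior has a closed form; the grid is only a way to see what the multiplication and normalization are doing.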

If for example you got the point estimates from the posterior, these can be used in your new prior and the process goes on.
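
For a Dirichlet prior the update cycle is just repeated addition of counts. The sketch below uses made-up batches (`day1`, `day2`) and a hypothetical helper `update`; note that it carries the full parameter vector forward rather than only the point estimates, so no information about how much data has been seen is lost.

```python
def update(alpha, counts):
    """Dirichlet-multinomial update: add the observed counts to the parameters."""
    return [a + c for a, c in zip(alpha, counts)]

prior = [1, 1, 1, 1]     # flat Dirichlet prior over four hypothetical bins
day1 = [2, 0, 5, 1]      # made-up counts from a first batch of data
day2 = [1, 3, 4, 0]      # made-up counts from a second batch

after_day1 = update(prior, day1)        # posterior after batch 1 ...
after_day2 = update(after_day1, day2)   # ... reused as the prior for batch 2

# Processing both batches at once gives the same posterior parameters.
all_at_once = update(prior, [a + b for a, b in zip(day1, day2)])
print(after_day2, all_at_once)
```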

9. Oct 19, 2012

Thank you for explaining that.

I have 2 questions:

1. Is the frequency density (last column) the prior distribution for that bin?

https://dl.dropbox.com/u/54057365/All/screenshot.JPG

2. How do you estimate the likelihood for a bin? I can't seem to find a similar example online.

John

10. Oct 19, 2012

### chiro

If you are using a multinomial model, then you will have quite a complex distribution that sums over many different combinations of outcomes. For example, a multinomial with five free parameters for N trials corresponds to, say, rolling a die N times, and 6^N gets very large very quickly.

Your prior for P choices should have P-1 different unique probabilities (the last can be calculated by 1 - sum_of_the_rest), so you will have 5 different values for your actual prior (like a vector).

Your likelihood will contain an entry for every possible combination and you can think about this in either a multi-dimensional way or in a uni-dimensional way where the uni-dimensional way is basically like taking every combination and laying it out in a big massive line (one way to think about this is instead of having a square, you take the square and slowly un-pack all the cells in one huge line instead of seeing it as two dimensional).

The likelihood function for the multinomial is just the multinomial like any likelihood function and it will be represented as P(X|theta) where theta is your set of parameters (remember for P choices you will have P-1 parameters) and X will be any particular possible outcome.
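
As a concrete illustration, here is a small sketch of the multinomial likelihood P(X|theta) evaluated for one outcome X given as a vector of counts. The function name `multinomial_log_likelihood` and the example numbers are assumptions for illustration; the log form is used only to avoid overflow from large factorials.

```python
import math

def multinomial_log_likelihood(counts, probs):
    """log P(X = counts | theta = probs) for a multinomial model."""
    n = sum(counts)
    # log of the multinomial coefficient n! / (x_1! * ... * x_k!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # sum of x_i * log(theta_i); categories with x_i == 0 contribute nothing
    log_terms = sum(x * math.log(p) for x, p in zip(counts, probs) if x > 0)
    return log_coef + log_terms

probs = [0.5, 0.3, 0.2]   # the last entry is 1 minus the others, so only 2 are free
counts = [2, 1, 0]        # hypothetical outcome of 3 trials
print(math.exp(multinomial_log_likelihood(counts, probs)))
```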

What you will need to do is relate your excel spreadsheet data with the actual choices with regards to the probabilities.

So as an example, with three throws of a die we can get anything from {1,1,1} to {6,6,6}, so if you lay this out in a one-dimensional form, then you could have the first cell corresponding to {1,1,1} and the last cell corresponding to {6,6,6}.
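
The one-dimensional layout can be sketched with the standard library; `flat_index` is a hypothetical helper that maps an outcome to its cell position by treating the three faces as digits of a base-6 number.

```python
from itertools import product

faces = range(1, 7)
outcomes = list(product(faces, repeat=3))   # (1,1,1), (1,1,2), ..., (6,6,6)

def flat_index(outcome):
    """0-based position of an outcome in the one-dimensional layout."""
    idx = 0
    for face in outcome:
        idx = idx * 6 + (face - 1)
    return idx

print(len(outcomes), outcomes[0], outcomes[-1], flat_index((6, 6, 6)))
```

There are 6^3 = 216 cells for three throws, which shows how quickly the layout grows with the number of trials.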

So the likelihood function will be one field and the prior for that combination will be some fixed vector.

Now your posterior probabilities will basically have distributions for every single parameter (i.e. P-1 different ones) and your posterior will simply be a vector where you calculate the likelihood for every combination (with regard to those parameters) and then you multiply that by the given prior vector to get a posterior vector.

Remember that your posterior of P(theta|X) will be a vector where theta corresponds to <theta1,theta2,theta3,....>^T where this vector is the vector of all parameters in the multinomial.

This new posterior will become your prior and will represent the "updated" version of the parameters under the given sample you got before given that the likelihood model is a multinomial distribution.

11. Oct 19, 2012

### Stephen Tashi

If you read my previous post, I describe how you get a specific posterior distribution. To draw a sample using that distribution, do two steps.

1) Draw a sample from the posterior distribution. This sample gives you a vector of probabilities.
2) Use the vector of probabilities as the probabilities that a start time falls in a given bin. Make a random draw to determine in which bin the start time falls.

Repeat both steps each time you want to generate a random sample.
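
A minimal sketch of these two steps using only Python's standard library. The Dirichlet draw in step 1 is done by normalizing independent gamma variates, which is one standard method and an assumption of this sketch; the names `sample_dirichlet` and `sample_bin` and the parameter values are made up for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """One Dirichlet draw: independent Gamma(alpha_i, 1) variates, normalized."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def sample_bin(post_alpha, rng):
    """Step 1: draw the bin probabilities. Step 2: draw a bin number (1-based)."""
    probs = sample_dirichlet(post_alpha, rng)
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]

rng = random.Random(0)           # fixed seed so that runs are repeatable
post_alpha = [1, 4, 8, 3, 1]     # hypothetical posterior parameters 1 + b_i
draws = [sample_bin(post_alpha, rng) for _ in range(10)]
print(draws)
```

Each call to `sample_bin` redraws the probability vector first, matching the instruction to repeat both steps for every sample.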

That's not a specific question because that page covers a wide variety of topics.

If you are using a Dirichlet prior, you don't have to calculate the posterior. People have already done this and the formula is known.

http://en.wikipedia.org/wiki/Dirichlet_distribution, see the section "Conjugate to categorical/multinomial".
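
For reference, the known result follows in one line. The notation is chosen here for illustration: $p_i$ is the probability of bin $i$, $K$ is the number of bins, $b_i$ is the count in bin $i$, and $\alpha_i^{(0)}$ is the prior parameter.

$$
P(p \mid b) \;\propto\; \underbrace{\prod_{i=1}^{K} p_i^{\,b_i}}_{\text{multinomial likelihood}} \times \underbrace{\prod_{i=1}^{K} p_i^{\,\alpha_i^{(0)} - 1}}_{\text{Dirichlet prior}} \;=\; \prod_{i=1}^{K} p_i^{\,\alpha_i^{(0)} + b_i - 1}
$$

The right-hand side is the kernel of a Dirichlet distribution with parameters $\alpha_i^{(0)} + b_i$, so setting $\alpha_i^{(0)} = 1$ gives $\alpha_i = 1 + b_i$.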

The only complicated thing I see about your problem is how to take a sample from a Dirichlet distribution. We can discuss that, or some other forum member may know an easy way. The general version of this question is "How do I draw a random sample from a joint probability distribution of several random variables?".

12. Oct 19, 2012

### digfarenough

Regarding Stephen's final point:
That Wikipedia page also describes sampling from a Dirichlet distribution: draw an independent gamma-distributed number for each element of the vector (with shape parameter $\alpha_i$), then normalize the vector by its sum. If you have MATLAB this is really easy to do, but in Excel, I'm not so sure. I don't think Excel can generate gamma-distributed numbers. I did a quick search for you and something like this package might help: it has a function called ntrandgamma that may be what you're looking for. It seems to be free, but I have not used it and cannot vouch for it.

The general way of sampling from a joint distribution is also mentioned on that wikipedia page, where you sample from the marginal distribution of one element in a multivariate distribution, and then repeatedly use conditional distributions to sample each additional element (see chain rule). Intuitively it is easy to imagine for a 2D distribution: you pick an x point using the marginal distribution of the x coordinate, then the slice through the distribution at that x value is the conditional distribution of y on x, which you sample from to get the y point, and then you have your desired point.
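
The 2D case can be sketched for a small discrete distribution. The table `joint` and the helper `sample_joint` are made up for illustration: `x` is drawn from its marginal, then `y` is drawn from the row of the table at that `x`, which is proportional to the conditional distribution of `y` given `x`.

```python
import random

# Hypothetical joint distribution P(x, y) on a 2 x 3 grid; the entries sum to 1.
joint = [
    [0.10, 0.20, 0.10],   # x = 0
    [0.30, 0.05, 0.25],   # x = 1
]

def sample_joint(joint, rng):
    """Draw (x, y): x from its marginal, then y from P(y | x)."""
    marginal_x = [sum(row) for row in joint]   # P(x) = sum over y of P(x, y)
    x = rng.choices(range(len(joint)), weights=marginal_x, k=1)[0]
    # The chosen row is proportional to P(y | x); choices() normalizes the weights.
    y = rng.choices(range(len(joint[x])), weights=joint[x], k=1)[0]
    return x, y

rng = random.Random(1)
samples = [sample_joint(joint, rng) for _ in range(20000)]
freq_10 = sum(1 for s in samples if s == (1, 0)) / len(samples)
print(freq_10)   # should be close to joint[1][0] = 0.30
```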

Hope that is useful!