How To Find 68% Confidence Interval After Running MCMC

In summary: An MCMC run gives you a sample of the posterior distribution, so you don't need any formulas. If your software package can generate the empirical CDF, use that to take any credible interval you like. Many Bayesian packages have a function for obtaining the highest posterior density interval at any given level. If your package doesn't have either of those, just sort your posterior samples and pick the appropriate range.
  • #1
xdrgnh
This is not a homework problem, so I'm asking it here. I just ran an MCMC and found the best-fit parameters for a model of massive gravity. I now need to find the 68% confidence interval for those parameters. I have never done anything like this before, so I'm clueless about where to begin or even where to look for a reference. I have the parameter errors from the MCMC and the correlation matrix, if any of that helps. Any help will be appreciated. Thank you.
 
  • #2
I assume you have a sample of size N with sample mean ## \bar{\mu} ##, where the population standard deviation ## \sigma ## is known. An interval centered at the sample mean with half-width ## \sigma ##, i.e., ## (\bar{\mu} - \sigma, \bar{\mu} + \sigma) ##, will be a 68% confidence interval, using the 68-95-99.7 rule. Please specify if you are using a different setup. EDIT: This last works for the mean; let me check for similar results for different parameters, i.e., different versions of the CLT for different parameters.
 
  • #3
From the MCMC you should have a sample of the posterior distribution. You can directly construct your 68% highest posterior density intervals from that.
 
  • #4
Dale said:
From the MCMC you should have a sample of the posterior distribution. You can directly construct your 68% highest posterior density intervals from that.
By posterior distribution, do you mean the list of values of the parameters that were generated as it went through its 39,000 steps?
 
  • #5
WWGD said:
I assume you have a sample of size N with sample mean ## \bar{\mu} ##, where the population standard deviation ## \sigma ## is known. An interval centered at the sample mean with half-width ## \sigma ##, i.e., ## (\bar{\mu} - \sigma, \bar{\mu} + \sigma) ##, will be a 68% confidence interval, using the 68-95-99.7 rule. Please specify if you are using a different setup. EDIT: This last works for the mean; let me check for similar results for different parameters, i.e., different versions of the CLT for different parameters.
This is about the median, not the mean, actually. Does that change anything?
 
  • #6
Sorry, I jumped in without asking questions. What is an MCMC?
 
  • #7
xdrgnh said:
By posterior distribution, do you mean the list of values of the parameters that were generated as it went through its 39,000 steps?
Yes, that is the output of an MCMC.

Are you familiar with Bayesian statistics? Each step of the MCMC method generates a sample of the posterior distribution of the parameters.

WWGD said:
Sorry, I jumped in without asking questions. What is an MCMC?
Markov Chain Monte Carlo. It is the standard method for doing Bayesian statistics. It is a very interesting and powerful technique.
 
  • #8
Dale said:
Yes, that is the output of an MCMC.

Are you familiar with Bayesian statistics? Each step of the MCMC method generates a sample of the posterior distribution of the parameters.

Markov Chain Monte Carlo. It is the standard method for doing Bayesian statistics. It is a very interesting and powerful technique.

Ah, yes, I just happen to call it E, so that E=MCMC= (MC)^2 ... So Close ; ).
 
  • #10
Dale said:
Yes, that is the output of an MCMC.

Are you familiar with Bayesian statistics? Each step of the MCMC method generates a sample of the posterior distribution of the parameters.

Markov Chain Monte Carlo. It is the standard method for doing Bayesian statistics. It is a very interesting and powerful technique.
Not as much as I should lol. This is my first time doing anything meaningful with statistics. So I have my posterior distribution now. From there do I just find the standard deviation for my parameters and use the formulas I learned in intro to Stat to find the confidence intervals around the median?
 
  • #11
xdrgnh said:
So I have my posterior distribution now. From there do I just find the standard deviation for my parameters and use the formulas I learned in intro to Stat to find the confidence intervals around the median?
No, since you have a good sample of the posterior distribution you don't need to do any formulas. If your software package can generate the empirical CDF, then just use that to take any credible interval you like. Many Bayesian packages have a function for obtaining the highest posterior density interval at any given level. If your package doesn't have any of those then just sort your posterior samples and pick the appropriate range.
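
The sort-and-pick approach can be sketched as follows. This is a minimal illustration in Python with NumPy rather than Mathematica; the function name `interval_by_sorting` and the synthetic `chain` array are assumptions that stand in for one parameter's column of the real MCMC output.

```python
import numpy as np

def interval_by_sorting(samples, level=0.68):
    """Equal-tailed interval read off the sorted posterior samples."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = s.size
    tail = (1.0 - level) / 2.0                      # probability left in each tail
    lo_idx = int(np.floor(tail * (n - 1)))          # index of the lower bound
    hi_idx = int(np.ceil((1.0 - tail) * (n - 1)))   # index of the upper bound
    return s[lo_idx], s[hi_idx]

# Synthetic stand-in for a chain of 39,000 steps (assumption, not real data).
rng = np.random.default_rng(0)
chain = rng.normal(loc=1.0, scale=0.5, size=39_000)

lo, hi = interval_by_sorting(chain, level=0.68)
print(f"68% interval: [{lo:.3f}, {hi:.3f}]")
```

Both bounds are elements of the sample itself, since they are picked from the sorted list without interpolation.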
 
  • #12
Dale said:
No, since you have a good sample of the posterior distribution you don't need to do any formulas. If your software package can generate the empirical CDF, then just use that to take any credible interval you like. Many Bayesian packages have a function for obtaining the highest posterior density interval at any given level. If your package doesn't have any of those then just sort your posterior samples and pick the appropriate range.

I'm using Mathematica. So I form the CDF, but I'm not clear on what to do from there. How do I decide the appropriate range?
 
  • #13
xdrgnh said:
I'm using Mathematica. So I form the CDF, but I'm not clear on what to do from there. How do I decide the appropriate range?
Well, if you want a 95% confidence interval then you would take the 0.025 quantile and the 0.975 quantile from the CDF
 
  • #14
xdrgnh said:
I now need to find the 68% confidence interval for those parameters.

It isn't clear what you mean by a confidence interval for the parameters of the model.

Is the general scenario that you had prior distributions for the parameters and obtained a posterior distribution for them?

Or did you set some criteria for "best" fit and solve the problem as an optimization problem without specifying a prior distribution for the parameters?

"Confidence interval" is a concept from frequentist statistics, but MCMC is usually used in Bayesian statistics.
 
  • #15
Stephen Tashi said:
It isn't clear what you mean by a confidence interval for the parameters of the model.

Is the general scenario that you had prior distributions for the parameters and obtained a posterior distribution for them?

Or did you set some criteria for "best" fit and solve the problem as an optimization problem without specifying a prior distribution for the parameters?

"Confidence interval" is a concept from frequentist statistics, but MCMC is usually used in Bayesian statistics.
I'm trying to reproduce the 68% confidence level in this paper: https://arxiv.org/pdf/1205.1613v1.pdf. I never studied this type of statistics before, which is why I have a lot of questions and really don't understand the terminology.
 
  • #16
Dale said:
Well, if you want a 95% confidence interval then you would take the 0.025 quantile and the 0.975 quantile from the CDF

Alright, I think I found the 95% confidence interval for my parameters from the posterior distribution. To check, though: the boundaries of my intervals should be elements of my posterior distribution, right? Also, what are the quantiles for the 68% interval, and where are those quantiles derived from? Thank you so much for all of your help so far.
 
  • #17
xdrgnh said:
Also, what are the quantiles for the 68% interval, and where are those quantiles derived from?
It would be (1-0.68)/2 and 1-(1-0.68)/2, i.e., the 0.16 and 0.84 quantiles, which leaves 16% of the posterior samples in each tail.
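
The same rule works for any level. A small sketch in Python with NumPy, assuming the posterior samples for one parameter are in a 1-D array; the helper names `tail_quantiles` and `equal_tailed_interval` and the synthetic `chain` are hypothetical, while `np.quantile` is the standard NumPy call.

```python
import numpy as np

def tail_quantiles(level):
    """Lower and upper quantile levels for an equal-tailed interval."""
    alpha = 1.0 - level                  # total probability outside the interval
    return alpha / 2.0, 1.0 - alpha / 2.0

def equal_tailed_interval(samples, level=0.68):
    """Read the interval off the empirical distribution of the samples."""
    q_lo, q_hi = tail_quantiles(level)
    lo, hi = np.quantile(np.asarray(samples, dtype=float), [q_lo, q_hi])
    return float(lo), float(hi)

# Synthetic, deliberately skewed stand-in for a posterior sample (assumption).
rng = np.random.default_rng(1)
chain = rng.gamma(shape=2.0, scale=1.0, size=39_000)

print(tail_quantiles(0.68))   # approximately (0.16, 0.84)
print(tail_quantiles(0.95))   # approximately (0.025, 0.975)
print(equal_tailed_interval(chain, 0.68))
```

Unlike picking elements from the sorted list, `np.quantile` interpolates between neighbouring samples by default, so the bounds need not be elements of the chain.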
 
  • #18
Does the process of obtaining confidence intervals in MCMC use some version of the CLT, i.e., do we use i.i.d. random variables for each approximation? Otherwise, how do we apply confidence intervals to generic distributions? Hope this does not sound too confused. EDIT: For example, how would we compute a confidence interval for the median, variance, etc., if not by using a version of the CLT? I know, e.g., that the sample median obtained from a normal population has a known distribution, but, AFAIK, it does not have a nice form otherwise.
 
  • #19
Bayesian statistics is fundamentally different in that respect. The CLT doesn't really get used in Bayesian statistics.

In frequentist statistics you always work with probabilities that are defined as proportions over an imagined infinite number of trials given a fixed true hypothesis.

In Bayesian statistics it is the hypothesis that is uncertain given a fixed set of data. You don't assume anything about infinite numbers of repetitions of the experiment, you just refine your hypothesis as much as possible given the data that you actually have.
 
  • #20
Dale said:
In Bayesian statistics it is the hypothesis that is uncertain given a fixed set of data. You don't assume anything about infinite numbers of repetitions of the experiment, you just refine your hypothesis as much as possible given the data that you actually have.

True, but looking at the paper that the OP linked, I can't tell whether Bayesian statistics is being used. (I'm not a cosmologist.) The term "priors" for the cosmic microwave background radiation (CMBR) is used, but this may not refer to a probability distribution in the Bayesian sense of a prior.

Can anyone comment on what "priors" means in the context of CMBR?
 
  • #21
Stephen Tashi said:
True, but looking at the paper that the OP linked, I can't tell whether Bayesian statistics is being used. (I'm not a cosmologist.)
I am also not a cosmologist, but they do say "Bayesian framework", and they use MCMC, and talk about both the likelihood and the prior. So if it isn't Bayesian, it is a very well camouflaged frequentist.
 
  • #22
Dale said:
Well, if you want a 95% confidence interval then you would take the 0.025 quantile and the 0.975 quantile from the CDF

Does that give you, in general, the highest density interval?
 
  • #23
Valerio M said:
Does that give you, in general, the highest density interval?
Not in general, no. In particular, if a posterior is skewed, then the highest density interval will take more of the “short” side.
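
To see the difference on a skewed sample, here is a rough sketch in Python with NumPy. It assumes a unimodal posterior and uses a common sample-based estimate of the highest density interval: the shortest window of sorted samples that contains the requested fraction. The function name `hpd_interval` and the synthetic gamma-distributed `chain` are assumptions for illustration, not any package's API.

```python
import numpy as np

def hpd_interval(samples, level=0.68):
    """Shortest interval containing a fraction `level` of the samples.

    Approximates the highest posterior density interval for a
    unimodal posterior.
    """
    s = np.sort(np.asarray(samples, dtype=float))
    n = s.size
    k = int(np.ceil(level * n))          # number of samples inside the interval
    if k < 2 or k > n:
        raise ValueError("level is too small or too large for this sample size")
    widths = s[k - 1:] - s[:n - k + 1]   # width of every window of k sorted samples
    i = int(np.argmin(widths))           # start of the shortest window
    return s[i], s[i + k - 1]

# Synthetic right-skewed stand-in for a posterior sample (assumption).
rng = np.random.default_rng(2)
chain = rng.gamma(shape=2.0, scale=1.0, size=39_000)

h_lo, h_hi = hpd_interval(chain, 0.68)
e_lo, e_hi = np.quantile(chain, [0.16, 0.84])   # equal-tailed interval
print(f"HPD:          [{h_lo:.3f}, {h_hi:.3f}]  width {h_hi - h_lo:.3f}")
print(f"equal-tailed: [{e_lo:.3f}, {e_hi:.3f}]  width {e_hi - e_lo:.3f}")
```

For this right-skewed sample, the shortest interval sits closer to the mode and is narrower than the equal-tailed one. For a symmetric posterior the two should nearly coincide.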
 

1. What is MCMC and how is it related to confidence intervals?

MCMC stands for Markov Chain Monte Carlo, a computational method for drawing a large number of samples from a probability distribution, typically the posterior distribution of a model's parameters. Because the samples approximate the posterior, credible intervals (the Bayesian counterpart of confidence intervals) can be read directly from them.

2. How does running MCMC help in finding the 68% confidence interval?

Running MCMC generates a large number of samples from the posterior distribution of the parameters. The 68% interval for a parameter can be taken directly from those samples: sort them and take the range between the 0.16 and 0.84 quantiles, or use the highest posterior density interval. The mean ± one standard deviation matches this interval only when the posterior is approximately normal.

3. What are the assumptions made when calculating a 68% confidence interval using MCMC?

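The main assumption is that the chain has converged, so that the retained samples are a good representation of the posterior distribution. The quantile approach does not require the posterior to be normal. Successive MCMC samples are generally correlated rather than independent, so the chain must be long enough to compensate.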

4. How does the number of MCMC iterations affect the accuracy of the 68% confidence interval?

The more iterations that are run, the more accurate the estimated parameters and confidence interval will be. However, there is a trade-off between accuracy and computational time, as running a larger number of iterations can be time-consuming.

5. Can MCMC be used to calculate confidence intervals for any type of data?

MCMC can be used to obtain credible intervals for a wide range of models, as long as the likelihood and the prior can be evaluated, at least up to a normalizing constant. It is particularly useful for complex and high-dimensional models where analytical methods may not be applicable.
