rude man said: Problem: how can p(d/-) be less than 0.001? And by 3 orders of magnitude?
Get me past this hurdle so I can start reading your 1st blog!
Thx.
BTW I did p(d/+) and it came out right at about 9%.
rude man said: I hope that's right.

Yes, that formula is indeed correct, and it looks like you did the arithmetic correctly as well. What you are seeing is the correct and expected result.
rude man said: Problem: how can p(d/-) be less than 0.001?

Remember that p(d) = 0.001 is our prior belief. So even without the test we were pretty convinced that he didn't have the disease, simply because the disease is rare. When we get a negative test result we become even more certain that he doesn't have the disease. So a negative test will always reduce our belief that the disease is present.
rude man said: And by 3 orders of magnitude?

It is actually about 2 orders of magnitude: from ##1\times 10^{-3}## to ##1\times 10^{-5}##. As a rough estimate you can look at the Bayes factor for a negative test: $$\frac{P(-|d)}{P(-|d^*)}=\frac{1-0.99}{0.99}\approx 0.01$$ so we do expect the probability to change by about two orders of magnitude.
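A quick numerical check of both posteriors (a sketch in Python; the 0.001 prevalence and the 99% sensitivity/specificity are the figures used in this example):

```python
# Bayes' rule for the disease-testing example discussed above.
prior = 0.001   # p(d): prior prevalence
sens = 0.99     # p(+|d): sensitivity
spec = 0.99     # p(-|d*): specificity

# p(d|+) = p(+|d) p(d) / [ p(+|d) p(d) + p(+|d*) p(d*) ]
p_d_pos = sens * prior / (sens * prior + (1 - spec) * (1 - prior))

# p(d|-) = p(-|d) p(d) / [ p(-|d) p(d) + p(-|d*) p(d*) ]
p_d_neg = (1 - sens) * prior / ((1 - sens) * prior + spec * (1 - prior))

print(f"p(d|+) = {p_d_pos:.4f}")  # ~0.0902, the "about 9%" above
print(f"p(d|-) = {p_d_neg:.2e}")  # ~1.01e-05, two orders below the prior
```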
To get samples of the posterior predictive distribution we take a sample from the posterior parameter distribution, get a set of parameters, plug those parameters into the model, and generate a predicted sample of the data. The distribution of those predictions is the posterior predictive distribution.
Source https://www.physicsforums.com/insights/posterior-predictive-distributions-in-bayesian-statistics/
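A minimal sketch of that recipe (my own illustration, not from the article; it assumes a normal model with unknown mean ##\mu##, known ##\sigma##, and a conjugate normal prior so the posterior is available in closed form):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: normal with unknown mean mu and known sigma.
data = rng.normal(loc=2.0, scale=1.0, size=20)
sigma = 1.0
prior_mean, prior_var = 0.0, 10.0

# Conjugate normal posterior for mu given the data.
post_var = 1.0 / (1.0 / prior_var + len(data) / sigma**2)
post_mean = post_var * (prior_mean / prior_var + data.sum() / sigma**2)

# Posterior predictive sampling, exactly as described above: take a sample
# from the posterior parameter distribution, plug it into the model, and
# generate a predicted sample of the data (one point per draw, i.e. N = 1).
M = 10_000
mu_draws = rng.normal(post_mean, np.sqrt(post_var), size=M)
y_pred = rng.normal(mu_draws, sigma)  # samples of the posterior predictive

print(y_pred.mean(), y_pred.std())
```

Note that each posterior draw generates a single predicted point, matching the ##N=1## convention discussed below.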
Stephen Tashi said: That definition of the posterior predictive distribution is ambiguous - perhaps the intent is to use examples to make it precise, but I'd have to learn the computer code to figure it out.

Yes, it is a bit ambiguous, sorry about that. Again, I am not a statistician, so I will not be able to be as rigorous as one. This is, as claimed, by and for non-statisticians.
Stephen Tashi said: we generate batches of data by repeating the following process ##M## times: Generate one value ##\theta_i## of ##\Theta## from the distribution ##g##. Then generate a lot of values ##x_{i,j},\ j=1,2,3,\dots,N## of ##X## from the distribution ##F(x,\theta_i)##.

Typically ##N=1##. In principle you could have ##N>1##, but that is not generally done. That is why I said "generate a predicted sample". I don't know if there is a specific reason for that, but it is what I have always seen done in the literature, and so I have copied that.
Stephen Tashi said: probability model for ##Y## does not match the Bayesian assumption we adopted because the Bayesian assumption is that a single value of the parameter ##\Theta## was used when Nature generated the data ##D##.

I disagree. Even if nature has some underlying process with a single value of the parameter, we don't know what the value of that parameter is. So the posterior predictive distribution is the best prediction we can make of future observations, given our current data. Any single value of the parameters that we select would underestimate our uncertainty in the prediction. Accounting for our uncertainty in the parameters is definitely Bayesian; selecting a single value of the parameters would not be a good Bayesian approach even if we believe that nature has such a single value.
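To make that concrete with the same toy normal model as the sketch above (again my own illustration): fixing ##\mu## at a single value visibly narrows the predictive spread compared with the posterior predictive distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

sigma = 1.0
post_mean, post_var = 2.0, 0.25  # assumed posterior for mu, for illustration
M = 100_000

# Plug-in prediction: a single selected value of the parameter.
y_plugin = rng.normal(post_mean, sigma, size=M)

# Posterior predictive: parameter uncertainty propagated into the prediction.
mu_draws = rng.normal(post_mean, np.sqrt(post_var), size=M)
y_ppd = rng.normal(mu_draws, sigma)

print(y_plugin.std())  # ~ sigma = 1.00
print(y_ppd.std())     # ~ sqrt(sigma**2 + post_var), about 1.12: wider
```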
Stephen Tashi said: Each simulation of a batch of data from a distribution ##F(x,\theta_i)## can be used to make point estimates of a different parameter ##\gamma_i## of the distribution ##F(x,\theta_i)##. (For example, if ##\theta_i## is the (population) mean of the distribution, then the sample variance of the data we generated can be used to make a point estimate of the variance ##\sigma_i^2## of ##F(x,\theta_i)##.) The simulation process provides different batches of data, so it provides a histogram of ##\gamma_i,\ i=1,2,\dots,M## that estimates the posterior distribution of ##\gamma##. We can make a point estimate of ##\gamma## using the original data ##D## and see where this estimate falls on the histogram of the simulated data for ##\gamma##.

I haven't seen it done that way, but I don't know why it couldn't be done that way. Each individual batch would systematically understate the uncertainty in the predicted data, but I am not sure that would mean that the resulting histogram of ##\gamma## would similarly have artificially reduced uncertainty. It would be plausible to me that the inter-batch variation would adequately represent our uncertainty in ##\gamma##.
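A sketch of that batch procedure under the same toy model (here ##\gamma## is taken to be the observation variance, estimated per batch by the sample variance; the batch size and posterior values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

sigma = 1.0
post_mean, post_var = 2.0, 0.25  # assumed posterior for mu, as before
M, N = 5_000, 50                 # M batches of N simulated points each

gamma_hat = np.empty(M)
for i in range(M):
    mu_i = rng.normal(post_mean, np.sqrt(post_var))  # draw theta_i from g
    batch = rng.normal(mu_i, sigma, size=N)          # batch from F(x, theta_i)
    gamma_hat[i] = batch.var(ddof=1)                 # point estimate gamma_i

# The histogram of gamma_hat is the simulated distribution against which a
# point estimate of gamma from the original data D could be compared.
print(np.percentile(gamma_hat, [2.5, 50.0, 97.5]))
```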
Dale said: I disagree. Even if nature has some underlying process with a single value of the parameter, we don't know what the value of that parameter is. So the posterior predictive distribution is the best prediction we can make of future observations, given our current data.

It would be interesting to define what "best prediction" means in this case. The various senses of "best" for point estimators are well known (unbiased, minimum variance, maximum likelihood, etc.). But what does it mean for an estimator whose outcome is a distribution to be the best estimator for a distribution?
Stephen Tashi said: It would be interesting to define what "best prediction" means in this case. The various senses of "best" for point estimators are well known (unbiased, minimum variance, maximum likelihood, etc.). But what does it mean for an estimator whose outcome is a distribution to be the best estimator for a distribution?

I certainly cannot be rigorous, but I think that I adequately demonstrated several very useful features of the posterior predictive distribution. In particular, one feature that I would like from a "best estimator" distribution is that it neither ignores outliers nor overfits them. I was quite excited to find exactly that in my experience applying this method to real-world data. It was one of those things that I didn't know I wanted until I saw it.