I Questions about error range from Bayesian statistics

  • Thread starter Buzz Bloom

Stephen Tashi

Science Advisor
The method I have in mind shares some of the ideas you suggest for generated simulated data. Specifically, the method repeats the following a relatively large number of times.
1. Randomly select a fraction of the data points, say for example, one half.
2. Find the value of h0 which corresponds to the least mean square fit.
Each of these h0 values is a random variable, and the collection can be used to calculate a mean and standard deviation.
This approach is a resampling method closely related to "bootstrapping" (the standard bootstrap draws its samples with replacement), and we can look up articles about it.
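The two steps quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thread does not give the model or the data, so the straight-line model v = h0 * d through the origin, the synthetic data, and the names `fit_h0` and `half_sample_h0` are all hypothetical. Each subset is drawn without replacement, as the steps describe.

```python
import numpy as np

def fit_h0(d, v):
    """Least-squares slope for the ASSUMED model v = h0 * d (line through the origin)."""
    return float(np.dot(d, v) / np.dot(d, d))

def half_sample_h0(d, v, n_trials=1000, fraction=0.5, seed=0):
    """Repeat: pick a random subset of the points, refit h0, collect the estimates."""
    rng = np.random.default_rng(seed)
    n = len(d)
    k = max(2, int(round(fraction * n)))  # size of each random subset
    estimates = np.empty(n_trials)
    for i in range(n_trials):
        idx = rng.choice(n, size=k, replace=False)  # step 1: random half of the data
        estimates[i] = fit_h0(d[idx], v[idx])       # step 2: least-squares h0
    return estimates

# Synthetic data for illustration only (slope 70, noise sd 50); not from the thread.
rng = np.random.default_rng(1)
d = rng.uniform(10.0, 100.0, size=200)
v = 70.0 * d + rng.normal(0.0, 50.0, size=200)

h0_all = fit_h0(d, v)            # estimate from all of the data
h0_sub = half_sample_h0(d, v)    # one estimate per random half
print(h0_all, h0_sub.mean(), h0_sub.std(ddof=1))
```

Passing `replace=True` and `size=n` to `rng.choice` would turn the same loop into the usual bootstrap.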

I would expect this mean to be close to the previously calculated value using all of the data. I may be mistaken, but I think it likely that the original value using all the data is a better estimate of the best value to use than the second value. It is in this sense that I used the term "mean" before the distribution was generated. One question I am not certain about is the best number of iterations/trials to use for the calculation of the standard deviation. My guess is that the same number as the original set of data points is a good choice. I would also add the square of the difference between the two mean values to the square of the calculated standard deviation.
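The suggestion of adding the squared difference between the two means to the squared standard deviation can be written out as below. The function name `combined_spread` and the five numbers are made up for illustration. One observation that may help: when the standard deviation is computed with `ddof=0`, this sum is exactly the mean squared deviation of the resampled estimates around the full-data value.

```python
import numpy as np

def combined_spread(estimates, full_estimate):
    """sqrt(sd^2 + (mean of estimates - full-data estimate)^2), with sd using ddof=0."""
    estimates = np.asarray(estimates, dtype=float)
    sd = estimates.std(ddof=0)
    offset = estimates.mean() - full_estimate
    return float(np.sqrt(sd**2 + offset**2))

# Hypothetical resampled h0 values and a hypothetical full-data estimate.
estimates = np.array([69.8, 70.4, 70.1, 69.5, 70.7])
full_estimate = 70.0

spread = combined_spread(estimates, full_estimate)
# Same quantity computed directly as an RMS deviation about the full-data value.
rms_about_full = float(np.sqrt(np.mean((estimates - full_estimate) ** 2)))
print(spread, rms_about_full)
```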
The question of which estimators are better or best depends on the technical definition of "best". Among the possible interpretations of "best" are: minimum variance, unbiased, maximum likelihood, and best mean square. I don't know which, if any, of those criteria are met by the proposed bootstrap estimator. We can probably find an article about it on the web.
How would a calculation be made using Bayesian methods leading to a value for a mean and standard deviation for a distribution of possible h0 values? What would the conditional probabilities and priors be?
For a straight linear regression there is a conjugate prior, so you can do this analytically. It is described here:

The conjugate prior is normal for the regression coefficients and inverse-gamma for the error variance.
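As a rough sketch of what the analytic route looks like, the standard normal-inverse-gamma update can be coded directly. Everything specific here is an assumption for illustration: the one-coefficient model v = h0 * d, the synthetic data, the weak prior values, and the function name `nig_posterior`.

```python
import numpy as np

def nig_posterior(X, y, mu0, Lambda0, a0, b0):
    """Conjugate update for y = X @ beta + noise, noise ~ N(0, sigma^2).

    Prior: beta | sigma^2 ~ N(mu0, sigma^2 * inv(Lambda0)),
           sigma^2 ~ InverseGamma(a0, b0).
    Returns the posterior parameters in the same form.
    """
    n = len(y)
    Lambda_n = X.T @ X + Lambda0
    mu_n = np.linalg.solve(Lambda_n, Lambda0 @ mu0 + X.T @ y)
    a_n = a0 + 0.5 * n
    b_n = b0 + 0.5 * (y @ y + mu0 @ Lambda0 @ mu0 - mu_n @ Lambda_n @ mu_n)
    return mu_n, Lambda_n, a_n, b_n

# Synthetic data for illustration only (slope 70, noise sd 50); not from the thread.
rng = np.random.default_rng(2)
d = rng.uniform(10.0, 100.0, size=200)
v = 70.0 * d + rng.normal(0.0, 50.0, size=200)
X = d[:, None]  # single coefficient h0, no intercept (an assumption)

# Hypothetical weak prior; a real one would come from domain knowledge.
mu0 = np.array([0.0])
Lambda0 = np.array([[1e-6]])
a0, b0 = 2.0, 1.0

mu_n, Lambda_n, a_n, b_n = nig_posterior(X, v, mu0, Lambda0, a0, b0)

# The marginal posterior of beta is a Student t distribution; its covariance is
# b_n / (a_n - 1) * inv(Lambda_n), which is finite when a_n > 1.
post_cov = b_n / (a_n - 1.0) * np.linalg.inv(Lambda_n)
h0_mean = float(mu_n[0])
h0_sd = float(np.sqrt(post_cov[0, 0]))
print(h0_mean, h0_sd)
```

The posterior mean and standard deviation of h0 come straight out of the updated parameters, with no sampling needed.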

In practice, however, I think most people just use Markov Chain Monte Carlo methods. That allows more flexibility in specifying both the prior and the model. So usually you can just specify a prior that makes sense, plug it and the data into the MCMC package of your choice and get your posterior distribution.
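To show the idea without committing to any particular MCMC package, here is a hand-rolled random-walk Metropolis sampler. It is a minimal sketch, not what a package does internally. The assumptions are loud ones: the model v = h0 * d, a known noise standard deviation `sigma`, a normal prior on h0 with made-up mean and width, synthetic data, and hypothetical function names.

```python
import numpy as np

def log_posterior(h0, d, v, sigma, prior_mean, prior_sd):
    """Unnormalized log posterior: normal prior on h0 plus Gaussian likelihood."""
    log_prior = -0.5 * ((h0 - prior_mean) / prior_sd) ** 2
    resid = v - h0 * d
    log_like = -0.5 * np.sum((resid / sigma) ** 2)
    return log_prior + log_like

def metropolis(d, v, sigma, prior_mean, prior_sd, start, step, n_samples=20000, seed=0):
    """Random-walk Metropolis: propose h0 + step * N(0, 1), accept with the usual ratio."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_samples)
    current = start
    current_lp = log_posterior(current, d, v, sigma, prior_mean, prior_sd)
    for i in range(n_samples):
        proposal = current + step * rng.normal()
        proposal_lp = log_posterior(proposal, d, v, sigma, prior_mean, prior_sd)
        # Accept with probability min(1, posterior ratio), done in log space.
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain

# Synthetic data for illustration only (slope 70, noise sd 50); not from the thread.
rng = np.random.default_rng(3)
d = rng.uniform(10.0, 100.0, size=200)
v = 70.0 * d + rng.normal(0.0, 50.0, size=200)

# Hypothetical prior (mean 65, sd 20) and tuning values.
chain = metropolis(d, v, sigma=50.0, prior_mean=65.0, prior_sd=20.0, start=65.0, step=0.1)
posterior = chain[2000:]  # discard early samples as burn-in
print(posterior.mean(), posterior.std(ddof=1))
```

The mean and standard deviation of the retained samples play the role of the mean and standard deviation of the distribution of possible h0 values asked about above.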

As far as how to specify the prior, that requires domain knowledge that I don’t have for this problem. If there were prior studies you might use those to generate a mean and a standard deviation and then use a prior with say double that standard deviation. Or if there haven’t been previous studies but some values are absurd then you could shape your prior accordingly. The point is that the prior should summarize all of the currently available information.
