Bernoulli and Bayesian probabilities

SUMMARY

This discussion compares maximum likelihood and Bayesian estimation of a Bernoulli parameter from a 20-value sample. The maximum likelihood estimate (MLE) is 0.75, while Bayesian (MAP) estimates under Beta(8, 4) and Beta(4, 8) priors yield 0.7333 and 0.6, respectively. Which estimate is more accurate depends on the true population parameter, which the problem statement never specifies; the thread concludes that without that value, the question of MLE versus Bayesian accuracy cannot be answered.

PREREQUISITES
  • Understanding of Bernoulli Distribution and its parameter estimation
  • Familiarity with Maximum Likelihood Estimation (MLE)
  • Knowledge of Bayesian statistics and Beta distribution
  • Proficiency in MATLAB or similar programming for statistical calculations
NEXT STEPS
  • Explore the concept of Maximum Likelihood Estimation (MLE) in depth
  • Learn about Bayesian parameter estimation and its applications
  • Study the properties of the Beta distribution and its role in Bayesian analysis
  • Investigate methods for determining the true population parameter in statistical models
USEFUL FOR

Statisticians, data scientists, machine learning practitioners, and anyone involved in parameter estimation and statistical modeling.

hdp12
Summary:: Hello there, I'm a mechanical engineer pursuing my graduate degree, and I'm taking a class on machine learning. Coding is a skill of mine, but statistics is not. Anyway, I have a homework problem on Bernoulli and Bayesian probabilities. I believe I've done the first few parts correctly, but the final question asks me to explain why one estimate is more accurate than another, and the inverse as well. I'm not sure, so I figured I'd reach out here and ask. My work and the relevant equations are below:

1. (10 pts) Consider 20 values randomly sampled from the Bernoulli Distribution with parameter :

Matlab:
x = [1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1];
N = length(x);

(a) Estimate the parameter using the maximum likelihood approach and the 20 data values.
Matlab:
u = sum(x==1)/N; % MLE: fraction of ones, u = 0.75
bern = (u.^x).*(1-u).^(1-x); % per-sample Bernoulli probabilities at u

p = 0;
for n = 1:N
    pTemp = x(n)*log(u) + (1-x(n))*log(1-u);
    p = p+pTemp;
end

%ln(a) = b <--> a = e^b
p = exp(p); % p = 1.3050e-05
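As a cross-check of part (a), the MLE and the likelihood evaluated at that estimate can be reproduced in a few lines of Python (a sketch mirroring the MATLAB above, not part of the original solution):

```python
import math

x = [1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1]
N = len(x)

# MLE of the Bernoulli parameter: fraction of ones in the sample.
u = sum(x) / N  # 0.75

# Log-likelihood summed over the sample, then exponentiated,
# mirroring the MATLAB loop above.
log_p = sum(xn * math.log(u) + (1 - xn) * math.log(1 - u) for xn in x)
p = math.exp(log_p)  # approx 1.3050e-05
```

Note that `p` is the likelihood of the data at u = 0.75, not a second estimate of the parameter.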
(b) Estimate the parameter using the Bayesian approach. Use the beta distribution Beta(a=8, b=4).
Matlab:
% Posterior is Beta(a + sum(xn), b + N - sum(xn)); take its mode (MAP):
% (8 + 15 - 1) / (8 + 4 + 20 - 2) = 22/30
u = 22/30; % u = 0.7333

(c) Estimate the parameter using the Bayesian approach. Use the beta distribution Beta(a=4, b=8).
Matlab:
% Mode of the Beta(4 + 15, 8 + 5) posterior:
% (4 + 15 - 1) / (4 + 8 + 20 - 2) = 18/30
u = 18/30; % u = 0.6
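Both Bayesian answers above are the posterior mode (MAP estimate): with a Beta(a, b) prior and m ones in N trials, the posterior is Beta(a + m, b + N − m), whose mode is (a + m − 1)/(a + b + N − 2). A short Python sketch of that formula (the function name is illustrative, not from the assignment):

```python
def beta_bernoulli_map(a, b, m, N):
    """MAP estimate of a Bernoulli parameter under a Beta(a, b) prior,
    given m successes in N trials: mode of the Beta(a+m, b+N-m) posterior."""
    return (a + m - 1) / (a + b + N - 2)

m, N = 15, 20
u_b = beta_bernoulli_map(8, 4, m, N)  # part (b): 22/30, approx 0.7333
u_c = beta_bernoulli_map(4, 8, m, N)  # part (c): 18/30 = 0.6
```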
(d) Discuss why the estimation from (b) is more accurate than that from (a) and why the estimation from (c) is worse than that from (a).
Matlab:
uA = 0.75;
uB = 0.7333;
uC = 0.6;


Thanks in advance for any help!
 
I think something has gone wrong in part (a).
The maximum likelihood estimate of the Bernoulli parameter is simply the number of 1s divided by the sample size, which gives 15/20 = 0.75.
I don't understand the reason for the extra calculations you show, which give a value of approximately 10^-5 — that is the likelihood of the data at u = 0.75, not an estimate of the parameter.

It is too hard to work out what you were trying to do based only on computer code. Better to write out mathematical reasoning and explain the steps you took.

Also, we can't assess accuracy without knowing what the population parameter is. It looks like you meant to include that in the problem statement, but it is missing.
 
I did a little too much in part (a), you're right.

Nothing else was provided in the problem statement, though. Is there any reason why one method of estimation would be more accurate than another?
 
Do you know the true value of your parameter u (i.e., the value with which your x was generated)? Otherwise the question about accuracy doesn't make sense: how would you know whether the MLE or the Bayesian posterior is more accurate if you don't know the actual value you are trying to estimate?

However, if I assume the true value is somewhere near 0.7, then the comparison between (c) and (a) is rather straightforward once you plot the Beta prior used in (c)...
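One way to see ChrisVer's point without plotting: the mode of a Beta(a, b) prior sits at (a − 1)/(a + b − 2), so Beta(8, 4) peaks at 0.7 while Beta(4, 8) peaks at 0.3. If the true parameter really is near 0.7, the second prior drags the posterior in the wrong direction. A quick check (a Python sketch rather than the thread's MATLAB):

```python
def beta_mode(a, b):
    """Mode of a Beta(a, b) density, valid for a, b > 1."""
    return (a - 1) / (a + b - 2)

prior_b = beta_mode(8, 4)  # 0.7 -- prior already centered near the data
prior_c = beta_mode(4, 8)  # 0.3 -- prior pulls the estimate toward low values
```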
 
hdp12 said:
Is there any reason why one estimation would be more accurate than another method of estimation?
Yes. "Accuracy" measures the difference between the estimate and the true population parameter. Your data sample gives a MLE estimate of 0.75. You have done two different Bayesian calcs, call them B1 and B2, producing estimates of 0.7333 and 0.6. If you line these up on a number line, you can see that :
  • MLE is most accurate if the population parameter is greater than (0.7333 + 0.75) / 2, approx 0.742
  • B1 is most accurate if the population parameter is between (0.6 + 0.7333) / 2 and (0.7333 + 0.75) / 2 (approx between 0.67 and 0.742)
  • B2 is most accurate if the population parameter is less than (0.6 + 0.7333) / 2 (approx 0.67).
Any of those three could be true given the information provided in the OP!

As @ChrisVer points out, if the question doesn't state the population parameter's value, it is impossible to answer.
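The number-line argument above can be made concrete: for any assumed true value, whichever estimate has the smallest absolute error is "most accurate". A small sketch (the true values fed in below are purely illustrative, since the problem never states one):

```python
estimates = {"MLE": 0.75, "B1 (Beta(8,4))": 22 / 30, "B2 (Beta(4,8))": 0.6}

def most_accurate(true_u):
    """Label of the estimate closest to an assumed true parameter."""
    return min(estimates, key=lambda k: abs(estimates[k] - true_u))

# With the midpoints quoted above (~0.742 and ~0.67):
best_high = most_accurate(0.8)  # above 0.742 -> MLE wins
best_mid = most_accurate(0.7)   # between 0.67 and 0.742 -> B1 wins
best_low = most_accurate(0.5)   # below 0.67 -> B2 wins
```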
 