Efficient parameter estimation of marginal distribution

omg! · Jun 28, 2010

hi all,
the situation is this:

i have a conditional PDF [tex]f_{X|(D,S)}(x|d,s)[/tex] and the unconditional PDFs D and S, [tex]f_{D}[/tex], [tex]f_{S}[/tex].
Now, I can marginalize the joint PDF [tex]f_{X,D,S}=f_{X|(D,S)}f_{D}f_{S}[/tex] by integrating over [tex]d[/tex] and [tex]s[/tex] over their respective supports. What I'm really interested in is a distribution parameter of the random variable D, let's call it [tex]p[/tex]. The marginal PDF f_{X}(x|p) is the entity at interest, and for example I could perform MLE to determine the parameter [tex]p[/tex].

My question is as follows: Do you know a method that can do without explicitly calculating the marginal PDF [tex]f_{X}[/tex], i.e. not requiring integration which is computationally slow.

I really do not have a good overview over all the methods in statistics. From the sources that I was able to find, the methods Expectation-maximization algorithm, MCMC method were mentioned frequently but I don't understand how they work and how they can be useful for my problem.

thanks a lot!

winterfors · Jun 29, 2010

It doesn't make much sense being interested in [tex]f_{X|P}(x|p)[/tex].

Do you mean you want to estimate [tex]P[/tex] given an observation of [tex]X[/tex] using [tex]f_{P|X}(p|x)[/tex]?

omg! · Jun 29, 2010

winterfors said:

It doesn't make much sense being interested in [tex]f_{X|P}(x|p)[/tex].

sorry for my unclear statement. the p in [tex]f_{X}(x|p)[/tex] is not to be regarded as a random variable, it a parameter in the [tex]f_{D}(d|p)[/tex] PDF and I'm integrating over [tex]d[/tex] (and also [tex]s[/tex] from [tex]f_{S}(s)[/tex], the difference here being that all parameters of S are known to me) so that p is left in the marginal PDF as (the only) remaining parameter...

Do you mean you want to estimate [tex]P[/tex] given an observation of [tex]X[/tex] using [tex]f_{P|X}(p|x)[/tex]?

this is exactly right! i want to estimate the parameter [tex]p[/tex] which I'm currently able to do by calculating the log likelihood from the marginal PDF. But this takes a lot of time due to the 2D integration that I have to do. Is there a faster way? thank you

winterfors · Jun 29, 2010

It depends what kind of estimator of [tex]P[/tex] you want to calculate.

I would suggest computing the (Bayesian) expected value of [tex]P[/tex] instead of only the maximum likelihood (which can be misleading if the distribution [tex]f_{P}(p|x)[/tex] is very asymmetric).

To estimate the expected value, you only have to generate a MCMC sample of the joint distribution [tex]f(d,s,p,x)=f(x|d,s,p)f(d|p)f(p)f(s)[/tex] (I have dropped the indices) with the parameter [tex]x[/tex] locked to your observation of [tex]X[/tex]. This MCMC sample will be a list of triplets [tex](d,s,p)[/tex]. Taking the mean of [tex]p[/tex] (discarding [tex]d[/tex] and [tex]s[/tex]) gives the estimator of the expectation of [tex]f(p|x)[/tex], i.e. [tex]P[/tex] given [tex]X[/tex]. You can also estimate the uncertainty by calculating the variance of the sample of [tex]p[/tex]:sIn the notation above, I have included the parameter [tex]p[/tex] so that [tex]X[/tex] and [tex]D[/tex] are conditional on [tex]p[/tex]. For this Bayesian analysis, you need to assume a prior distribution [tex]f(p)[/tex] on the parameter [tex]P[/tex], e.g. a constant if you do not have knowledge of that its value might be.

The integration over [tex]D[/tex] and [tex]S[/tex] corresponds to the discarding of the [tex]d[/tex] and [tex]s[/tex] in the list of MCMC samples, so it is extremely "numerically cheap" to perform :-)

omg! · Jun 30, 2010

thanks, i have the feeling that this is exactly what I'm looking for. however, since my knowledge and experience on this topic is extremely limited, i would appreciate it if you could elaborate further...

first of all, am i right that you are suggesting to calculate a bayesian estimator using MCMC?

the reason behind the computational cost of MLE with the marginal distribution is a combined effect of the marginal density being moderate slow to calculate and the many evaluations of the marginal likelihood (essentially solving extreme value problem). the method you are proposing seems to solve the problem on both ends, is that right?

if it is not too much trouble could you write in a nutshell a step-wise recipe how to implement
the method? how do i generating a sample of the joint distribution if i have no knowledge of p?

could you comment on the expectation-maximization algorithm?

thanks a lot!

winterfors · Jun 30, 2010

omg! said:

first of all, am i right that you are suggesting to calculate a bayesian estimator using MCMC?

Yes, that is correct.

omg! said:

could you comment on the expectation-maximization algorithm?

If you use the mean of the posterior distribution as estimator (more reliable than maximum likelihood, e.g. in that is a non-biased estimator), there is no need for the expectation-maximization algorithm. It is only used for calculating maximum likelihood.

I could give you some more details on how to implement the MCMC, but it would help to know what math- or statistics software are you using for your implementation...

omg! · Jun 30, 2010

i'm using MATLAB (R2010a) with all the toolboxes available. I'm able to generate random numbers from the distributions of [tex]D[/tex] (for given [tex]p[/tex], this is parameter I'm trying to determine), [tex]S[/tex] and [tex]X[/tex] (for given [tex]d[/tex] and [tex]s[/tex]. For my current problem [tex]P[/tex] can be assumed to be constant although a non-trivial distribution could arise in the future.

thank you for your effort

winterfors · Jun 30, 2010

Matlab has a function for (Metropolis-Hastings) MCMC sampling in the statistics package called mhsample. You could use it as follows:
function [p_est p_var] = estimateMeanAndVariance(x)
%Input: value of observation of x
%Returns: estimation of p given x, uncertainty of the estimate (variance)

%You need to modify the definitions of the first 3 of these PDFs
%(they don't need to be normalized, i.e. integrate to one)

%f_x(x, d, s, p) = f(x | d, s, p)
%f_d(d, p) = f(d | p)
%f_s(s) = f(s)
%f_p(p) = f(p) sigma = 1;
nsamples = 10000;
start = [0 0 0]+0.1;

%m = [d s p]
pdf = @(m)f_x(x, m(1), m(2), m(3))...
*f_d(m(1), m(3))...
*f_s(m(2))...
*f_p(m(3)); %target distribution
proppdf = @(x,y)exp(-sum((x-y).^2)/2/sigma^2);
proprnd = @(x)x+sigma*randn(1,3);

[smpl, accept] = mhsample(start, nsamples,...
'pdf',pdf,...
'proprnd',proprnd,...
'proppdf',proppdf);

p_est = mean(smpl(:,3));
p_var = var(smpl(:,3));

hist(smpl(:,3),ceil(nsamples/500))
title('Approximate posterior distribution of p given x')
xlabel('p')

disp('accept=')
disp(accept)
disp('It should ideally be around 0.3.')
disp('If not adjust tuning parameter sigma.')

function res = f_x(x, d, s, p)
res = normpdf(x, p, 1);

function res = f_d(d, p)
res = normpdf(d, 0, 1);

function res = f_s(s)
res = normpdf(s,0,1);

function res = f_p(p)
res = 1;

DrDu · Jul 1, 2010

I don't see why to use Bayesian statistics to find estimates for the parameter p. After all, omg! stated explicitly that p is not a random variable but a fixed parameter, which excludes a Bayesian setting. I also don't see why maximum likelihood should lead to "misleading" results for the estimator if the distribution is asymmetric. In the worst case, it has a small bias, which can be corrected for.

winterfors · Jul 1, 2010

DrDu said:

I don't see why to use Bayesian statistics to find estimates for the parameter p. After all, omg! stated explicitly that p is not a random variable but a fixed parameter, which excludes a Bayesian setting. I also don't see why maximum likelihood should lead to "misleading" results for the estimator if the distribution is asymmetric. In the worst case, it has a small bias, which can be corrected for.

Just for the record: Bayesian methods are applicable also to cases where there is no uncertainty in the parameter estimates (even though there are then more simple methods to come up with the same answer). Here there is clearly uncertainty in the sought parameter, so I still think a Bayesian approach is the best.

How good or bad an estimator maximum likelihood (ML) or maximum a posteriori (MAP) methods will produce depends, of course, on the shape of the probability distribution. The bias can be small, or very large.

The biggest problem with ML and MAP methods is however that you have no measure of how uncertain the estimator is. It is far better if you can also calculate some uncertainty measure (e.g. variance) on the full posterior distribution over the parameter of interest.

Efficient parameter estimation of marginal distribution

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Graduate Hypothesis testing: Defining H0, HA hypotheses so that ( H_A)_A' makes sense

Undergrad My basic understanding of set theory

Undergrad The problem of points

Graduate Expected numbers of cards of a last color remaining

Undergrad How do E[X] and E[|X|] relate?

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect