npit said:
For example, MLE is defined as estimating the distribution that gives the maximum probability of the observations ##x_i## given the distribution parameters, ##p(x|\theta)##.
That is the underlying motivation for a maximum likelihood estimate, but technically such an estimate maximizes the "likelihood" of the data, not the "probability" of the data. For a discrete random variable, the probability mass function ##f(\cdot)## evaluated at a value ##v## can be interpreted as the "probability of ##v##", but for a continuous random variable, the probability density function ##f(\cdot)## evaluated at a value ##v## is said to give the "likelihood" of ##v##. The "probability" of an outcome of exactly ##v## is zero for any distribution described by a density.
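To make that concrete, here is a minimal sketch (my own example, not from the thread) assuming normally distributed observations: the probability of observing exactly a value ##v## is zero, but the density at ##v## is a positive "likelihood", and the MLE maximizes the sum of log densities of the observed ##x_i##.

```python
# Minimal sketch (assumed example, not from the original posts): for a
# continuous variable P(X = v) is zero, but the density f(v; theta) is positive.
# The MLE maximizes the sum of log densities of the observed x_i.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100)      # observations x_i

def neg_log_likelihood(theta):
    mu, sigma = theta                             # theta = (mu, sigma)
    return -np.sum(stats.norm.logpdf(x, loc=mu, scale=sigma))

result = optimize.minimize(neg_log_likelihood, x0=[0.0, 1.0],
                           bounds=[(None, None), (1e-6, None)])
mu_hat, sigma_hat = result.x
print("MLE (mu, sigma):", mu_hat, sigma_hat)      # near the sample mean and std
print("density at v = 5:", stats.norm.pdf(5.0, mu_hat, sigma_hat))  # positive
print("P(X = 5) exactly:", 0.0)                   # probability of an exact value
```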
However, my instructor stressed that since ##\theta## is not a random variable but a parameter of the distribution, it is meaningless to condition on it, and instead uses the notation ##p(x;\theta)##.
Glancing at the topic on Wikipedia and Wolfram, those authors don't follow the notation used by your instructor. However, your instructor is making a good argument from the viewpoint of "frequentist" statistics. There is a distinct difference in treating a quantity in a statistical problem as "fixed, but unknown" versus "a fixed value that is a realization of a random variable". Frequentist statistics often uses variables representing "fixed, but unknown" quantities.
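In symbols (a standard statement of the frequentist setup, not a quote from either poster): the same expression is read as a density in ##x## when ##\theta## is held fixed, and as the likelihood function of ##\theta## once the data are fixed, and MLE maximizes the latter:
$$\mathcal{L}(\theta\,;x) = f(x\,;\theta), \qquad \hat{\theta}_{\text{MLE}} = \arg\max_{\theta}\, \mathcal{L}(\theta\,;x).$$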
I understand that the word "given" can be misleading, since in natural language it can very well mean "given the specific value of the parameter", but in probability it refers to conditional probability.
That could be the distinction between "##\theta##" as a symbol representing a fixed but unspecified value of an ordinary "variable" and "##\theta##" as a symbol for the realization of a "random variable". In both cases ##\theta## can represent a fixed value. The distinction between the two situations is what ##\theta## is a value of. By analogy, in a physics book, "##\theta##" might be used to represent a specific angle of a triangle on one page and on another page it might represent the specific phase angle in a wave function.
A good example of the distinction between "fixed, but unknown" and "realized value of a random variable" comes up in interpreting "confidence intervals". Suppose you work a typical confidence interval problem and show "There is a 0.90 probability that the population mean is within 23.6 of the sample mean". If the mean of the sample is 130, you may not claim that "There is a 0.90 probability that the population mean is in the interval ##[130 - 23.6, 130+23.6]##". The whole procedure of computing the "confidence interval" width is based on assuming the population mean has a "fixed, but unknown" value. So you can't wind up your work by saying something about the probability that the population mean is somewhere. There's no probability about it; it has a fixed but unknown value.
If you want to make claims like "There is a 0.90 probability that the population mean is in the interval ##[130 - 23.6, 130+23.6]##", you have to define the problem differently. If you take a Bayesian approach, you assume the population mean has some prior distribution. Then you can compute a "credible interval" and make specific claims about the probability of the population mean being in it.
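A quick way to see the frequentist reading is a simulation (the ##\sigma## and ##n## below are assumed for illustration, chosen to give a half-width near the 23.6 in the example above): the population mean stays fixed, the interval moves from sample to sample, and roughly 90% of the intervals cover that fixed mean.

```python
# Minimal sketch (assumed sigma and n, not taken from the thread): the population
# mean is fixed; the interval endpoints are random.  Over repeated samples,
# roughly 90% of the 90% confidence intervals happen to cover the fixed mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_mean, sigma, n = 130.0, 45.0, 10         # hypothetical "fixed but unknown" mean
z = stats.norm.ppf(0.95)                      # two-sided 90% interval, known sigma
half_width = z * sigma / np.sqrt(n)           # about 23.4 with these assumed numbers

trials = 10_000
covered = 0
for _ in range(trials):
    sample_mean = rng.normal(true_mean, sigma / np.sqrt(n))  # sampling distribution of the mean
    if abs(sample_mean - true_mean) <= half_width:
        covered += 1

print("half-width:", round(half_width, 1))
print("coverage:", covered / trials)          # close to 0.90
```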
My question is whether the use of the ##|## notation is valid, or whether the view of ##\theta## as a random variable is illegal.
The legality of the view depends on how the problem is defined. As to notation, people use all sorts of ambiguous notation; it's largely a matter of personal preference.
Isn't MLE, however, in comparison to MAP, viewed as a specific case of MAP where the probability of ##\theta## is uniform?
MAP and MLE solve two different problems. There are cases where they produce the same numerical answer. The current Wikipedia article says:

"A maximum likelihood estimator coincides with the most probable Bayesian estimator given a uniform prior distribution on the parameters."
That is a statement that two different problems may have the same numerical answer, not that the definition of one problem is a special case of the other.
For example, if the parameter is the mean of a normal distribution, how are you going to put a "uniform prior" on it? You will have to assume it is in some bounded interval. Setting up an estimation problem as MAP involves making more assumptions than solving an MLE problem.
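To spell that out (with an interval ##[a,b]## assumed just for illustration): if the prior on ##\theta## is uniform on ##[a,b]##, the posterior is
$$g(\theta \mid x_0) \;\propto\; f(x_0 \mid \theta)\,\frac{1}{b-a} \;\propto\; f(x_0 \mid \theta), \qquad \theta \in [a,b],$$
so the MAP estimate agrees numerically with the MLE whenever the MLE happens to fall inside ##[a,b]##, and the two problems can give different answers when it does not.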
Maximum a posteriori estimation assumes that the parameter ##\theta## is a realized value of some random variable. Having done that, there are cases where maximizing the posterior density of ##\theta## gives the same answer as ignoring the prior distribution of ##\theta##. For example, if the family of distributions for the data has densities ##f(x;\theta)##, the density of ##\theta## is given by ##g(\theta)##, and the observed data is ##x_0##, it might happen that maximizing ##f(x_0;\theta)## as a function of the parameter ##\theta## gives the same answer for ##\theta## as maximizing the (unnormalized) posterior ##f(x_0|\theta)\, g(\theta)##. That would be an example of two different mathematical problems having the same answer.
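Here is a small numerical illustration of that point (the normal-normal model and all numbers are my own assumptions, not from the thread): with a very wide, nearly flat prior the MAP answer matches the MLE, while an informative prior pulls it away.

```python
# Minimal sketch (assumptions mine): x_i ~ N(theta, sigma^2) with sigma known,
# prior theta ~ N(m0, s0^2).  The MLE maximizes f(x_0; theta); the MAP maximizes
# f(x_0 | theta) g(theta).  For this model both have closed forms.
import numpy as np

rng = np.random.default_rng(2)
sigma = 2.0
x = rng.normal(4.0, sigma, size=20)

def map_estimate(m0, s0):
    # Posterior mode of the normal-normal model: a precision-weighted average
    # of the sample information and the prior mean.
    n = len(x)
    precision = n / sigma**2 + 1 / s0**2
    return (x.sum() / sigma**2 + m0 / s0**2) / precision

mle = x.mean()                                        # maximizes f(x_0; theta)
print("MLE:              ", mle)
print("MAP, wide prior:  ", map_estimate(0.0, 1e6))   # essentially equals the MLE
print("MAP, tight prior: ", map_estimate(0.0, 0.5))   # pulled toward the prior mean 0
```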
What justification would you give for using the MLE method? You can make intuitive arguments like "Well, what value of the parameter should I have picked? Should I have picked one that made the data least likely? ... Huh?". However, the MLE problem doesn't conclude anything (mathematically) about the probability of its result being close to the "true" value of the parameter being estimated.