Hi everyone. I am reading through these notes, which cover some very interesting topics:
https://arxiv.org/abs/1807.05996
So far I have reached Section 5. The author gives me the impression that they do not shy away from saying what is Bayesian and what is frequentist, which makes the distinction in applications quite clear. Section 5, however, raised quite a few things I had never thought about, because they had always felt intuitively correct to me.
For example, I had always thought that a uniform prior p(\mu) can reflect complete ignorance (I think I have also read it described as the maximum-entropy choice or something similar), since any interval [\mu, \mu+d\mu] has the same probability as any other. Now, however, I am starting to doubt how confidently one can make such a claim, because it is parametrization-dependent: if we say we are ignorant about \mu^2 by choosing a uniform prior in \mu^2, the prior is no longer uniform in \mu (Sec 5.3, last sentence of the 2nd paragraph). I think this happens because of the Jacobian that shows up once we transform variables. I find it quite unintuitive: "I have no idea what your temperature is, it could be anything between 33 and 46 °C. But that is only because I chose to take your temperature as my parameter. Apparently I do have prior preferences about what your temperature squared might be!"
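To make the Jacobian point concrete for myself (this is just my own sanity check, not something from the notes, and the variable names are mine): with \phi = \mu^2, conservation of probability gives p(\phi) = p(\mu) |d\mu/d\phi| = p(\mu) / (2\sqrt{\phi}), so a flat p(\mu) on [0, 1] turns into p(\phi) = 1/(2\sqrt{\phi}), which piles up near \phi = 0. A quick numerical check in Python:

import numpy as np

# Sanity check (mine, not from the notes): a flat prior on mu over [0, 1]
# is not flat in phi = mu^2.  Change of variables predicts
# p(phi) = p(mu) * |d mu / d phi| = 1 / (2 * sqrt(phi)).
rng = np.random.default_rng(0)
mu = rng.uniform(0.0, 1.0, size=1_000_000)   # "ignorant" (uniform) prior on mu
phi = mu**2                                  # the same prior, expressed in phi

counts, edges = np.histogram(phi, bins=20, range=(0.0, 1.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
predicted = 1.0 / (2.0 * np.sqrt(centers))   # Jacobian prediction

for c, emp, pred in zip(centers, counts, predicted):
    print(f"phi ~ {c:.3f}: empirical {emp:5.2f} vs 1/(2 sqrt(phi)) = {pred:5.2f}")

The histogram matches the Jacobian prediction, so "uniform = ignorant" really does depend on which variable I happened to write the prior in.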
Does anyone have a good explanation for that?
I think this is also somewhat related to Sec 5.5, although that section is specifically about "non-subjective" priors (such as the Jeffreys prior), which I also find weird (I have to admit I did not go through the references). "Weird" because we construct the non-subjective priors precisely so that they carry the least information (maximize our ignorance) relative to a specific measurement.
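For what it is worth, the one thing I did take away about the Jeffreys prior (assuming I am reading the usual definition p(\theta) \propto \sqrt{I(\theta)}, with I the Fisher information, correctly) is that it is built to avoid exactly the reparametrization problem above. A small check I did myself, using a Poisson mean \mu (where I(\mu) = 1/\mu) and the reparametrization \phi = \sqrt{\mu} as an example of my own choosing:

import sympy as sp

# My own check (not from the notes): the Jeffreys prior for a Poisson mean mu
# gives the same answer whether we (1) transform p(mu) ∝ 1/sqrt(mu) to
# phi = sqrt(mu) with the Jacobian, or (2) derive the Jeffreys prior directly in phi.
mu, phi = sp.symbols("mu phi", positive=True)

jeffreys_mu = 1 / sp.sqrt(mu)        # sqrt(I(mu)) with I(mu) = 1/mu
mu_of_phi = phi**2                   # inverse of phi = sqrt(mu)

# Route 1: transform the mu-prior with the Jacobian |d mu / d phi|
transformed = jeffreys_mu.subs(mu, mu_of_phi) * sp.diff(mu_of_phi, phi)

# Route 2: Jeffreys prior directly in phi, using I(phi) = I(mu) * (d mu / d phi)^2
direct = sp.sqrt((1 / mu_of_phi) * sp.diff(mu_of_phi, phi) ** 2)

print(sp.simplify(transformed), sp.simplify(direct))   # both constant (= 2), i.e. the same prior

So the "non-subjective" construction at least does not contradict itself under a change of variables, but I still find it weird that the resulting "ignorance" depends on the measurement (the likelihood) it was derived for.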