Un-estimatable properties of distributions

Stephen Tashi · Sep 11, 2012

Are there any properties of commonly encountered probability distributions that cannot be effectively estimated by sampling them?

Searching for "inestimable" led to irrelevant links. Those links discussed not being able to estimate some parameters of a model when certain types of data are missing.

Searching for "estimable parameter" led to links about using statistical software packages, which aren't relevant either. My question is theoretical.

I want to know about things that can (or cannot) be estimated in the sense that for each given [itex]\epsilon > 0[/itex], the probability that the estimate is within [itex]\epsilon[/itex] of the actual value approaches 1 as the number of independent random samples approaches infinity. (This brings up the technical question of whether the term "estimator" denotes a function of a fixed number of variables. If I want to talk about letting the number of samples approach infinity, should I talk about a sequence of estimators instead of speaking of a single estimator?)
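One way to write this criterion down, as a sketch: assume a sequence of estimators [itex]T_n[/itex], one for each sample size [itex]n[/itex], and write [itex]\theta[/itex] for the true value of the quantity being estimated (both symbols are introduced here for illustration and are not used elsewhere in the thread). The requirement is then

[itex]\lim_{n \to \infty} P\left( |T_n(X_1,\dots,X_n) - \theta| < \epsilon \right) = 1[/itex] for every [itex]\epsilon > 0[/itex],

which is the usual definition of a (weakly) consistent sequence of estimators.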

The "properties" of a distribution are more general than the "parameters" of it. I'll define a "property" of a distribution to be some function of its parameters. For example, a (weird) example of a property of a Normal distribution is whether its variance is a rational number. You can express this kind of property as a function of the parameters. A similar example is:

On the family of Normal distributions, parameterized by their mean [itex]\mu[/itex] and the variance [itex]\sigma^2[/itex], define the function [itex]g(k,\mu,\sigma^2)[/itex] by
[itex]g(k, \mu,\sigma^2) = 1[/itex] if the k-th moment of the normal distribution with those parameters is irrational.
[itex]g(k,\mu,\sigma^2) = 0[/itex] otherwise.

We can also define more complicated functions, such as

[itex]\zeta(\mu,\sigma^2) = \sum_{k=1}^{\infty} \frac {g(k,\mu,\sigma^2)}{2^k}[/itex]

Can such things be effectively estimated?

chiro · Sep 11, 2012

Here is some food for thought: Is there any property that can't be estimated from either a) an assumed existing population PDF or b) a sample PDF (i.e. histogram) generated from the sample results (i.e. put data into bins and that becomes PDF)?
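A minimal sketch of option (b), assuming equal-width bins over the observed range and a standard normal as the sampled population. The function name `histogram_density` and the choice of 20 bins are illustrative assumptions, not something specified in the thread.

```python
import random

def histogram_density(samples, num_bins):
    """Return (edges, densities) for an equal-width histogram scaled to integrate to 1."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        raise ValueError("all samples are identical; cannot form bins")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in samples:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    n = len(samples)
    # Dividing by n * width turns counts into a piecewise-constant density.
    densities = [c / (n * width) for c in counts]
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, densities

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
edges, densities = histogram_density(samples, num_bins=20)
bin_width = edges[1] - edges[0]
total_mass = sum(d * bin_width for d in densities)
print(total_mass)
```

The histogram keeps only the bin counts, so anything finer than the bin width is lost unless the raw samples are kept as well.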

Provided you have the right sample size relative to the degrees of freedom with regard to what you are estimating, the question is can you derive an estimator for any property of the underlying distribution given a function of your sample data?

I am going to speculate on your answer and say yes on the basis that every attribute of a distribution comes from a PDF whether assumed as a population, or estimated through a sample.

With regard to the questions about your irrational number, you could construct your estimator and then resort to real analysis to see if an interval contains such a number (I think you might have to do it through Dedekind cuts, but I'm not sure since there are inequalities involving rational numbers to see if an irrational number exists in a region).

In terms of the actual function, you would probably have to resort to some kind of Heaviside or floor function (or some special trig function) to get a value of g() and then throw this in the series.

If the series has no simple analytic result, the question of evaluating it comes down to what kind of error you wish to tolerate, and this will be determined by the term you stop at (for this particular series).
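For this particular series the truncation error can be bounded directly. As a sketch, assuming the values [itex]g(k,\mu,\sigma^2)[/itex] were somehow known exactly and writing [itex]K[/itex] for the index of the last term kept (a symbol introduced here), the fact that [itex]0 \le g \le 1[/itex] gives

[itex]\left| \zeta(\mu,\sigma^2) - \sum_{k=1}^{K} \frac{g(k,\mu,\sigma^2)}{2^k} \right| \le \sum_{k=K+1}^{\infty} \frac{1}{2^k} = \frac{1}{2^K}[/itex]

so stopping at term [itex]K[/itex] costs at most [itex]2^{-K}[/itex]. This bound says nothing about how to obtain the values of [itex]g[/itex] from samples, which is the hard part of the question.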

You might even want to consider an integral transform on the series itself to some other domain (like the frequency domain) to see what kind of information helps you evaluate where the most influential information is in the function so that you concentrate on it rather than on the other parts.

Theoretically we know by the Method of Moments that if we have n distinct moments then we can estimate n parameters (assuming the parameters are all independent).
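A minimal sketch of the method of moments for the two-parameter Normal family, assuming the first two non-central sample moments are matched to [itex]\mu[/itex] and [itex]\mu^2 + \sigma^2[/itex]. The function names and the true values 2.0 and 9.0 are illustrative assumptions.

```python
import random

def raw_moment(samples, k):
    """k-th non-central sample moment: the average of x**k."""
    return sum(x ** k for x in samples) / len(samples)

def normal_method_of_moments(samples):
    """Match E[X] = mu and E[X^2] = mu^2 + sigma^2 to the sample moments."""
    m1 = raw_moment(samples, 1)
    m2 = raw_moment(samples, 2)
    return m1, m2 - m1 ** 2

rng = random.Random(1)  # fixed seed so the run is repeatable
true_mu, true_var = 2.0, 9.0
samples = [rng.gauss(true_mu, true_var ** 0.5) for _ in range(50_000)]
mu_hat, var_hat = normal_method_of_moments(samples)
print(mu_hat, var_hat)
```

Here both parameters are continuous functions of the two moments, which is why the estimates settle near the true values as the sample grows.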

So given any sample, what is the requirement for a given sample to have this property? Well if we are only considering non-central moments, then basically the only real requirement is that stuff doesn't diverge.

If we are going to use a sample and all values are finite (and of course the sample size is finite as well), then it means that all point-estimates of the moments will also be finite and will exist (they may be huge, but they will still be finite).

Then the question remains: can we construct any property of the distribution that we want to estimate? Again the answer I think should be yes since the moments in total completely characterize the distribution (one of the main uses of Moment Generating Functions is that a unique MGF describes a unique distribution).

So if you have an assumed population distribution and all moments are finite, then you should be able to extract all information about a distribution and create any kind of combination of these properties using some kind of specific transformation.

If you don't have any assumption of the underlying population distribution, but you can calculate the sample non-central moments, then if the sample size is large enough, one can resort to asymptotic and frequentist results (or use priors and resort to Bayesian analysis).

Stephen Tashi · Sep 11, 2012

chiro said:

Is there any property that can't be estimated from either a) an assumed existing population PDF or b) a sample PDF (i.e. histogram) generated from the sample results (i.e. put data into bins and that becomes PDF)?

I suppose you mean "property of a probability distribution", which is technically different from a property of its associated PDF. If you assume the PDF (with certainty) then you determine the distribution, hence you can theoretically determine all functions of the distribution's parameters.

If you have a sample PDF then I think this amounts to having one big collection of random samples if you're doing independent samples (unless you're saying that you lost information about how many samples were used to generate the histogram). So I think this amounts to the question that I posed.

With regards to the questions about your irrational number, you could construct your estimator and then resort to real analysis to see if an interval contains such a number.

Optimist! I doubt that will work because there is no [itex]\epsilon[/itex] that separates an irrational number from the "closest" rational number.
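That remark can be made precise. As a sketch, with [itex]x[/itex], [itex]q[/itex] and [itex]m[/itex] as symbols introduced here: for any irrational [itex]x[/itex] and any [itex]\epsilon > 0[/itex], choose an integer [itex]m[/itex] with [itex]10^{-m} < \epsilon[/itex] and set

[itex]q = \frac{\lfloor 10^m x \rfloor}{10^m}[/itex]

Then [itex]q[/itex] is rational and [itex]0 < x - q < 10^{-m} < \epsilon[/itex]. So every interval around an irrational value, however short, also contains rational values, and an estimate that is merely within [itex]\epsilon[/itex] of the truth cannot by itself tell the two cases apart.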

Theoretically we know by the Method of Moments that if we have n distinct moments then we can estimate n parameters (assuming the parameters are all independent).

Perhaps if we "know" the moments we know everything, but if we only have estimates of the moments, even within some [itex]\epsilon_i[/itex] of each moment, then this doesn't imply that a non-continuous function of a finite number of the moments evaluated at the estimates is within some [itex]\epsilon[/itex] of the true value. And I don't know what analysis says about limits of a function (continuous or otherwise) that has an infinite number of arguments.
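A small simulation sketch of the first point, with a simpler discontinuity swapped in for the rationality indicator: the property is a step function of the mean, equal to 1 when the mean is at least 0 and 0 otherwise, and the true mean is taken to be exactly 0. The function names, the sample sizes, and the 400 repetitions are illustrative assumptions.

```python
import random

def step(mu):
    """A deliberately discontinuous 'property' of the mean: 1 if mu >= 0, else 0."""
    return 1 if mu >= 0 else 0

def fraction_plug_in_equals_one(n, repetitions, rng):
    """Fraction of repetitions in which step(sample mean) equals 1."""
    hits = 0
    for _ in range(repetitions):
        # Sample mean of n independent N(0, 1) draws; the true mean is 0.
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        hits += step(sample_mean)
    return hits / repetitions

rng = random.Random(2)  # fixed seed so the run is repeatable
results = {n: fraction_plug_in_equals_one(n, 400, rng) for n in (10, 100, 1000)}
print(results)
```

The true value is step(0) = 1, but by symmetry the sample mean is negative with probability 1/2 at every sample size, so the plug-in estimate is correct only about half the time no matter how large n gets, even though the sample mean itself converges. This shows only that the plug-in estimator fails at the discontinuity; it does not show that no other estimator could succeed.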

chiro · Sep 11, 2012

Stephen Tashi said:

I suppose you mean "property of a probability distribution", which is technically different from a property of its associated PDF. If you assume the PDF (with certainty) then you determine the distribution, hence you can theoretically determine all functions of the distribution's parameters.

If you are doing inference on some population parameter (it doesn't have to be a specific distribution, it can be a non-parametric quantity like the median or the entropy of the underlying population distribution), then as long as you have information to make an inference (which will depend on the degrees of freedom of said information and whether you meet the requirements: for example variance/standard deviation requires two observations minimum), you should be able to construct some distribution corresponding to an estimator.

Even if you have to resort to using "worst case" variation estimates for your estimator, as long as you satisfy the information requirements (i.e. degrees of freedom) for the estimator, you can construct a mean and variance for that estimator and use a distribution from theory (like an asymptotic distribution, or perhaps some really general distribution with a specific prior).

Entropy is definitely an interesting non-parametric attribute of a distribution because it only depends on the actual probabilities and not the values of a distribution which is extremely valuable for ascertaining the distribution as opposed to its parameters (for example, the entropy for any shifted distribution should keep the value unchanged).
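The shift invariance can be checked in one line for a continuous distribution. As a sketch, write [itex]f[/itex] for the density of [itex]X[/itex], [itex]h[/itex] for the differential entropy, and [itex]c[/itex] for the shift (symbols introduced here). Then [itex]Y = X + c[/itex] has density [itex]f(y - c)[/itex], and substituting [itex]u = y - c[/itex] gives

[itex]h(Y) = -\int f(y-c) \ln f(y-c) \, dy = -\int f(u) \ln f(u) \, du = h(X)[/itex]

Note that for continuous distributions this invariance holds for shifts but not for rescaling: [itex]h(aX) = h(X) + \ln |a|[/itex] for a constant [itex]a \ne 0[/itex], so differential entropy does depend on the scale of the values.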

The simplest non-parametric quantity to work with is the median, though.

If you have a sample PDF then I think this amounts to having one big collection of random samples if you're doing independent samples (unless you're saying that you lost information about how many samples were used to generate the histogram). So I think this amounts to the question that I posed.

If you had a specific distribution of a known form in mind, then you just use the general techniques in statistical inference. If you don't, you need to resort to non-parametric statistics and I would say that you would need to consider various forms of entropies (you can have many forms based on all possible conditional distributions) and collectively, they can help you make an inference on the actual distribution itself.

Optimist! I doubt that will work because there is no [itex]\epsilon[/itex] that separates an irrational number from the "closest" rational number.

I should have made it clear (I apologize) that I meant to look at an interval as opposed to a point estimate. Basically you would make a hypothesis that a particular interval contained an irrational number as opposed to it "being" an irrational number with certainty.

I agree that you probably can't test specifically whether something is an irrational number as a point estimate, but we can't really do that anyway in inference so I figure we should just look at the interval and save the headache of looking at an individual point (and we can use inequalities quite easily to say whether a region has an irrational number).

Perhaps if we "know" the moments we know everything, but if we only have estimates of the moments, even within some [itex]\epsilon_i[/itex] of each moment, then this doesn't imply that a non-continuous function of a finite number of the moments evaluated at the estimates is within some [itex]\epsilon[/itex] of the true value. And I don't know what analysis says about limits of a function (continuous or otherwise) that has an infinite number of arguments.

Well the way I see things, the important thing is to construct an estimator of the moments and then look at them collectively to determine both local and global properties that are being looked for.

It would be a very interesting direction of research to ascertain how continuous moments differ in any way from moments of a discrete distribution (I'm assuming the "gaps" would provide the hints needed).

I'm only going by what we know about inference of one particular piece of information and extending the idea to all sets of linearly independent information that make up the total information of the distribution (without the need to have a constrained form of parametrization).

With linear algebra, we know that the dimension of the space requires the minimum number of independent pieces of information to describe the whole thing.

Now we know that every PDF has some constraints (probabilities always positive, sum or integral equal to 1) and so on.

We also know that even for a continuous distribution, we need to bin the distributions in some specific interval in order for that bin to have a positive probability so the quantization thing is not a big issue (we actually need to do it anyway, even theoretically).

So the questions now become these: 1) How many pieces of information do we need to describe a distribution (with no knowledge of its structure at all so a worst case) with respect to a sample and its power set (We consider all possible sub-samples as well)? and 2) How do we determine the estimators of each piece of information? and finally 3) Given test-statistics and p-values for each distribution, how do we combine these values to either

a) construct a probability space of potential population distributions whereby under a given significance level, all such distributions lie in this space with said significance for all test statistics corresponding to each individual component of information

b) Take a specific hypothesis that wants to be tested (i.e. distribution is normal, some other property) and test it against the general information constraints that have been calculated with the test-statistics for each individual piece of information.

? (To finish the 3rd question)

In the above case, what should happen is that more samples shrink the variances on each estimator and constrain the possible choices of distributions just as we see happen when a huge sample size constrains the choices of some parameter value in a known distribution.
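A simulation sketch of that shrinkage for the simplest estimator, the sample mean, assuming a standard normal population. The function name, the sample sizes, and the 500 repetitions are illustrative assumptions.

```python
import random
import statistics

def variance_of_sample_mean(n, repetitions, rng):
    """Empirical variance of the sample mean over repeated samples of size n."""
    means = []
    for _ in range(repetitions):
        means.append(sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n)
    return statistics.variance(means)

rng = random.Random(3)  # fixed seed so the run is repeatable
spread = {n: variance_of_sample_mean(n, 500, rng) for n in (10, 100, 1000)}
for n, v in spread.items():
    # Theory for N(0, 1) samples: Var(sample mean) = 1 / n.
    print(n, v, 1 / n)
```

The empirical variances should track 1/n, which is the sense in which a larger sample narrows the set of parameter values compatible with the data.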

This is the best we can do statistically anyway, and all I'm doing is looking at a general non-parametric approach which looks at invariants of any distribution and the decomposition of said pieces of invariant information to construct the entire thing from scratch (as if we were re-constructing a vector in some space using the projected coefficients with their respective basis vectors as a linear combination).

Stephen Tashi · Sep 11, 2012

This isn't a point-by-point response to chiro's last post. It's just some remarks.

Thinking of a distribution as specified in only one way by a set of parameters may confuse the discussion, but it's a natural habit. The term "degrees of freedom", in my opinion, is used in a variety of contexts and is ambiguous in many of them.

How many parameters do we need to specify a given distribution? I think that's a trick question. For example, for a given binomial distribution defined on N+1 outcomes, we could specify it by N+1 parameters, each of which gives the probability of the corresponding outcome. Or we could specify it by two parameters (N,p), which is the usual way. Or, with the understanding that we only need to identify it as a particular member of a family of distributions for binomial trials on 5 things, we could specify it by one parameter p. Or if we were considering a larger family of distributions that included the binomial family as a subset (such as a mixture of a binomial with another distribution) we might need more than two parameters.
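A short sketch of the first two parameterizations for binomial trials on 5 things, assuming p = 0.3 as an arbitrary illustrative value. The function names are hypothetical, and `recover_n_and_p` assumes the list it is given really is a binomial distribution with at least one trial.

```python
from math import comb

def binomial_pmf(n_trials, p):
    """The (N+1)-parameter description: one probability per outcome 0..N."""
    return [comb(n_trials, k) * p**k * (1 - p)**(n_trials - k)
            for k in range(n_trials + 1)]

def recover_n_and_p(pmf):
    """Go back to the (N, p) description, using mean = N * p."""
    n_trials = len(pmf) - 1
    mean = sum(k * prob for k, prob in enumerate(pmf))
    return n_trials, mean / n_trials

pmf = binomial_pmf(5, 0.3)          # six numbers describing the distribution
n_rec, p_rec = recover_n_and_p(pmf)  # the same distribution as two numbers
print(pmf)
print(n_rec, p_rec)
```

Both descriptions pick out the same distribution; the six probabilities simply satisfy enough constraints that two numbers suffice within the binomial family.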

A "property" of a distribution might be a continuous function of its parameters under one parameterization scheme and a discontinuous function of it under a different parameterization scheme. However, it's still sensible to say that any "property" of a distribution is a function of whatever parameters define the distribution, because sufficient parameters allow us to reconstruct the entire distribution.

A "non-parametric" property of a distribution such as its median is still a function of the parameters that determine the distribution, so it includes the type of things I wish to consider. And it's not really correct to say that things like the median of a distribution aren't parameters of it. "Non-parametric" quantities aren't parameters with respect to the way distributions are traditionally parameterized. However, there might be a way to parameterize the distributions using the "non-parametric" quantities as parameters.
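A worked sketch of that last possibility for one family, the exponential distributions, which are usually parameterized by a rate. The symbols [itex]\lambda[/itex] (rate) and [itex]m[/itex] (median) are introduced here for illustration. The cumulative distribution function is [itex]F(x) = 1 - e^{-\lambda x}[/itex], and the median solves [itex]F(m) = 1/2[/itex], so

[itex]m = \frac{\ln 2}{\lambda}[/itex]

Substituting [itex]\lambda = \frac{\ln 2}{m}[/itex] into the density [itex]f(x) = \lambda e^{-\lambda x}[/itex] gives

[itex]f(x) = \frac{\ln 2}{m} \, 2^{-x/m}[/itex] for [itex]x \ge 0[/itex],

which describes the same family with the median as its only parameter. For the Normal family the median equals [itex]\mu[/itex], so there the median already is one of the traditional parameters.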

chiro · Sep 12, 2012

I also want to be specific about what I mean by degrees of freedom (and I should have said this in my response): what I mean by this relates to the amount of information required for some particular characteristic whether it is an entire generic distribution without any explicit form or parameterization, or whether it's a piece of information that represents one extremely small characteristic of a general distribution.

We know that the variance requires at least two observations (even though realistically this is pointless: it is the theoretical minimum to get both a sample and a population variance and we assume that we have different independent draws from our population).

Thus we need a minimum of two observations for this particular characteristic, and usually for any estimator, this is the absolute bare minimum (without thinking about sample size considerations and whether these are required for things like an estimator based on the CLT so that the results "make sense").
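A minimal sketch of the two-observation requirement, using Python's standard `statistics.variance`, which computes the sample variance with an n - 1 denominator and raises `StatisticsError` when given fewer than two values. The helper name `try_sample_variance` is hypothetical.

```python
import statistics

def try_sample_variance(data):
    """Return the sample variance, or None when there are too few observations."""
    try:
        return statistics.variance(data)
    except statistics.StatisticsError:
        # Raised for fewer than two data points: the n - 1 denominator would be 0.
        return None

one_point = try_sample_variance([3.0])
two_points = try_sample_variance([3.0, 5.0])
print(one_point, two_points)
```

With a single observation the estimate is undefined, and with two it is defined but, as noted above, of little practical use.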

So the question becomes: "How can we decompose a distribution in any number of arbitrary ways that allow inference on any general quantity in a way that it is probabilistically and statistically sound?"

Or using the main ideas (and absolutely the most important ideas) in linear algebra, what are some choices of bases for a probability distribution that allow this to be articulated mathematically (from initially formulated to a final result that is backed up by the relevant theoretical underpinnings via proofs)?
