Information contained in minimum value of truncated distribution

estebanox · Apr 18, 2016

Suppose that a given population is endowed with a pair of characteristics T and K. Let's think of these characteristics as random variables

(T,K)∼BiNormal((μT,μS),(σT,σS),ρ)

I observe the realisations of T for a sample consisting of those individuals with K<a, where the selection threshold a is unknown. Let t denote the minimum observed realisation of T in this sample.

In terms of the distributions and parameters above, what is t an estimator of?

To be more precise, I am trying to establish what information is contained in t that is not already contained in the truncated sample mean and variance. My intuition is that there must be some information: if selection was taking place on T itself, then it would seem intuitive to think of t as an estimator of a; but that's not the case here...

mfb · Apr 18, 2016

What exactly is that BiNormal?

If T and K are independent, then selecting for K<a does not tell you anything, it just reduces the sample size.

estebanox · Apr 18, 2016

The notation I used in the OP stands for bivariate normal with correlation coefficient ρ. So I'm asking about the general case in which K may have information about T (not the orthogonal case, which as you mention is uninteresting).

estebanox · Apr 18, 2016

@mfb: I just realized your confusion might be due to the fact that I denoted the mean and variance of the marginal distribution of K by μ_S and σ_S. That's a typo: the intended notation is (T,K)∼BiNormal((μT,μK),(σT,σK),ρ) for -1<ρ<1

mfb · Apr 19, 2016

Okay, so selecting K<a depletes your normal distribution of T in some one-sided way.

If the cut is weak (e.g. a>μK+2*σK), then the mean and variance are not influenced much, but t can hold more information, especially if the two variables are strongly correlated. You will still need the mean and variance (observed or expected) to relate this to K; otherwise you are completely insensitive to shifts and rescalings.
If the cut is strong and the correlation is weak, then I would expect the overall shape to give you more information. The sample mean and variance alone don't help unless you know the mean and variance without the cut.
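
To put rough numbers on the weak-cut versus strong-cut contrast, here is a minimal sketch using the standard closed-form moments of a normal variable truncated from above, applied to T through its regression on K. The function names, the standardised cut alpha = (a - μK)/σK, and the parameter values (μT = 0, σT = 1, ρ = 0.8) are illustrative assumptions, not taken from the thread.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_moments(mu_t, sigma_t, rho, alpha):
    """Mean and standard deviation of T given K < a.

    alpha is the standardised cut (a - mu_K) / sigma_K (an assumed
    parametrisation; sigma_t is treated as a standard deviation).
    """
    lam = phi(alpha) / Phi(alpha)  # -E[Z | Z < alpha] for standard normal Z
    mean = mu_t - rho * sigma_t * lam
    var = sigma_t ** 2 * (1.0 - rho ** 2 * (alpha * lam + lam ** 2))
    return mean, math.sqrt(var)


# Illustrative values only: weak cut at +2 sd, strong cut at -1 sd.
weak = truncated_moments(0.0, 1.0, 0.8, 2.0)
strong = truncated_moments(0.0, 1.0, 0.8, -1.0)
print("weak cut   (mean, sd):", weak)
print("strong cut (mean, sd):", strong)
```

With these illustrative values, the cut two standard deviations above μK moves the mean of T by less than 0.05 σT, while the cut one standard deviation below μK shifts it by more than σT.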

estebanox · Apr 19, 2016

Thanks. This makes sense: if I understand correctly, your intuition is that t is informative about shift/rescaling (and how much information it carries will depend on the value of t and ρ). Do you know how to derive this more precisely? In other words: is there an analytical expression for it in terms of the parameters (i.e. (μT,μK),(σT,σK),ρ)?

mfb · Apr 19, 2016

estebanox said:

your intuition is that t is informative about shift/rescaling

t alone is not.
(T,K)∼BiNormal((μT,μK),(σT,σK),ρ) and (T,K)∼BiNormal((μT,μK+c),(σT,σK),ρ) lead to the same distribution of t, but the best estimate of a has to be shifted by c. With a similar but a bit more complicated formula you can show that you can also change σK and a without changing anything related to t.
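
One way to make this invariance explicit, as a sketch: standardise K and write T through its regression on K. The symbols Z, ε, α and the scale factor s are introduced here for illustration and are not in the thread; σK > 0 and |ρ| < 1 are assumed.

```latex
\begin{aligned}
Z &= \frac{K-\mu_K}{\sigma_K}, \qquad \alpha = \frac{a-\mu_K}{\sigma_K}, \\
T &= \mu_T + \sigma_T\left(\rho Z + \sqrt{1-\rho^2}\,\varepsilon\right),
  \qquad \varepsilon \sim N(0,1) \text{ independent of } Z, \\
K < a &\iff Z < \alpha .
\end{aligned}
```

Under these assumptions, the distribution of T in the selected sample, and hence of t for a given sample size, depends on (μK, σK, a) only through α. The shift (μK, a) → (μK+c, a+c) leaves α unchanged, and so does the rescaling (σK, a) → (s·σK, μK + s·(a−μK)) for any s > 0.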

I would be surprised if there is an analytic expression for the distribution of t as function of the other parameters (if we apply the cut on K>a).

estebanox · Apr 19, 2016

That's quite clear and helpful. Thanks!

estebanox · Apr 19, 2016

mfb said:

I would be surprised if there is an analytic expression for the distribution of t as function of the other parameters (if we apply the cut on K>a).

Your answer makes me wonder if applying the cut on both K and T would change things. For instance, for some other cut b, can we know what E[Min[ T | T>t , K<b]] is?

mfb · Apr 19, 2016

If you know the parameters of the distribution, and b, you can calculate this numerically. Same as above, I don't expect an exact analytic expression. For some parameters there can be a good analytic approximation.
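
As a rough illustration of such a numerical calculation, here is a minimal Monte Carlo sketch (simulation rather than quadrature). It assumes a fixed number n of selected individuals, a cut expressed in standardised form alpha = (b - μK)/σK, and an optional lower cut on T. The function name `expected_min`, the argument `t_floor`, and all parameter values are hypothetical choices made for this example.

```python
import math
import random


def expected_min(mu_t, sigma_t, rho, alpha, n, t_floor=None, reps=2000, seed=0):
    """Monte Carlo estimate of E[min T] over n selected individuals.

    A draw is kept when the standardised K is below alpha and, if
    t_floor is given, when T is also above t_floor (rejection sampling).
    """
    rng = random.Random(seed)
    root = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(reps):
        smallest = math.inf
        kept = 0
        while kept < n:
            z = rng.gauss(0.0, 1.0)  # standardised K
            if z >= alpha:
                continue  # fails the cut on K
            t = mu_t + sigma_t * (rho * z + root * rng.gauss(0.0, 1.0))
            if t_floor is not None and t <= t_floor:
                continue  # fails the optional cut on T
            kept += 1
            smallest = min(smallest, t)
        total += smallest
    return total / reps


# Illustrative values only: mu_T = 0, sigma_T = 1, rho = 0.8, cut at mu_K.
e5 = expected_min(0.0, 1.0, 0.8, 0.0, n=5)
e50 = expected_min(0.0, 1.0, 0.8, 0.0, n=50)
e50_floor = expected_min(0.0, 1.0, 0.8, 0.0, n=50, t_floor=-2.0)
print(e5, e50, e50_floor)
```

The estimate carries simulation noise that shrinks as `reps` grows. Rejection sampling becomes slow when the cuts are very strong, in which case a direct numerical integration of the conditional distribution would be the more sensible route.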

Stephen Tashi · Apr 20, 2016

estebanox said:

I observe the realisations of T for a sample consisting of those individuals with K<a, where the selection threshold a is unknown. Let t denote the minimum observed realisation of T in this sample.

In terms of the distributions and parameters above, what is t an estimator of?

I think the question should be rephrased. For each parameter of a distribution, any function of the sample values can be called an estimator of that parameter. We pay more attention to functions that are "good" estimators of the parameter. There are various ways to define what "good" means (e.g. unbiased, minimum variance, maximum likelihood).

A question about estimators could be made specific by asking things like

1. Is the minimum value of S in the sample an unbiased estimator of the distribution parameter "a" that defines the cutoff threshold for S?

or

2. Is there a simple-looking function f of the distribution parameters for which the minimum value of S in the sample is a maximum likelihood estimator for f evaluated at the particular parameter values of the distribution being sampled?

The more general question of "is there information" could be phrased in terms of a question about "sufficient statistics", which I think mfb is effectively doing.
