Information contained in minimum value of truncated distribution


Discussion Overview

The discussion revolves around the properties of the minimum value of a truncated bivariate normal distribution, specifically focusing on what information this minimum value, denoted as t, contains regarding the unknown selection threshold a. Participants explore the relationship between t, the truncated sample mean and variance, and the underlying distribution parameters.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Areas of Agreement / Disagreement

Participants express differing views on the information contained in the minimum value t and its relationship to the distribution parameters. There is no consensus on whether t can be considered a good estimator of the threshold a, and the question of whether an analytical expression exists for the distribution of t remains unresolved.

Contextual Notes

Limitations include the potential dependence on the definitions of the parameters and the unresolved nature of the mathematical relationships discussed. The implications of correlation strength and the effects of different cuts on the distributions are also noted as areas requiring further exploration.

estebanox
Suppose that a given population is endowed with a pair of characteristics T and K. Let's think of these characteristics as random variables

(T,K)∼BiNormal((μT,μS),(σT,σS),ρ)

I observe the realisations of T for a sample consisting of those individuals with K<a, where the selection threshold a is unknown. Let t denote the minimum observed realisation of T in this sample.

In terms of the distributions and parameters above, what is t an estimator of?

To be more precise, I am trying to establish what information is contained in t that is not already contained in the truncated sample mean and variance. My intuition is that there must be some information: if selection was taking place on T itself, then it would seem intuitive to think of t as an estimator of a; but that's not the case here...
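The sampling scheme described above can be sketched as a small simulation. This is only an illustration under stated assumptions: σT and σK are read as standard deviations, the parameter values are arbitrary, and the function name `draw_selected_t` is hypothetical.

```python
import random

def draw_selected_t(n, mu_t, mu_k, sigma_t, sigma_k, rho, a, seed=0):
    """Draw n pairs (T, K) from a bivariate normal and return the T values
    of the individuals that pass the selection K < a."""
    rng = random.Random(seed)
    selected = []
    for _ in range(n):
        z_k = rng.gauss(0.0, 1.0)  # standardised K
        z_e = rng.gauss(0.0, 1.0)  # independent noise for T
        k = mu_k + sigma_k * z_k
        # T is built so that corr(T, K) = rho
        t_val = mu_t + sigma_t * (rho * z_k + (1.0 - rho**2) ** 0.5 * z_e)
        if k < a:
            selected.append(t_val)
    return selected

# Illustrative values only: standard marginals, rho = 0.8, threshold a = 0.
sample = draw_selected_t(20000, 0.0, 0.0, 1.0, 1.0, 0.8, a=0.0)
t_min = min(sample)                   # the statistic t from the question
mean_t = sum(sample) / len(sample)    # truncated sample mean
print(len(sample), t_min, mean_t)
```

In a real data set only `sample` would be observed; K and a stay hidden, which is what makes the question about t non-trivial.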
 
What exactly is that BiNormal?

If T and K are independent, then selecting for K<a does not tell you anything, it just reduces the sample size.
 
The notation I used in the OP stands for bivariate normal with correlation coefficient ρ. So I'm asking about the general case in which K may have information about T (not the orthogonal case, which as you mention is uninteresting).
 
@mfb: I just realized your confusion might be due to the fact that I denoted the mean and variance of the marginal distribution of K by μ_S and σ_S. That's a typo; the obvious notation would read (T,K)∼BiNormal((μT,μK),(σT,σK),ρ) for -1<ρ<1
 
Okay, so selecting K<a depletes your normal distribution of T in some one-sided way.

If the cut is weak (e.g. a > μK + 2σK), the mean and variance are not influenced much, and t can hold more information, especially when the two variables are strongly correlated. You will still need the mean and variance (observed or expected) to relate this to K; otherwise you are completely insensitive to shifts and rescalings.
If the cut is strong and correlation is weak, then I would expect that the overall shape gives you more information. The sample mean and variance alone don't help unless you know the mean and variance without cut.
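To put rough numbers on the weak-cut versus strong-cut remark, the mean and variance of T after the selection K<a can be computed from the standard results for a bivariate normal truncated on one coordinate. The sketch below assumes σT and σK are standard deviations; the function name `truncated_moments` and the parameter values are illustrative only.

```python
from statistics import NormalDist

def truncated_moments(mu_t, sigma_t, mu_k, sigma_k, rho, a):
    """Mean and variance of T given K < a for a bivariate normal (T, K)."""
    std = NormalDist()                       # standard normal
    alpha = (a - mu_k) / sigma_k             # standardised threshold
    lam = std.pdf(alpha) / std.cdf(alpha)    # so that E[Z | Z < alpha] = -lam
    mean = mu_t - rho * sigma_t * lam
    var = sigma_t**2 * (1.0 - rho**2 * lam * (lam + alpha))
    return mean, var

# Weak cut (a two sigma above the mean of K) versus strong cut (a one sigma below it).
weak = truncated_moments(0.0, 1.0, 0.0, 1.0, 0.8, a=2.0)
strong = truncated_moments(0.0, 1.0, 0.0, 1.0, 0.8, a=-1.0)
print("weak cut  :", weak)
print("strong cut:", strong)
```

Both moments depend on a only through the standardised threshold (a−μK)/σK, and on ρ, so without the untruncated mean and variance of T they cannot pin down a itself.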
 
Thanks. This makes sense: if I understand correctly, your intuition is that t is informative about shift/rescaling (and how much information will be a function of the value of t and ρ). Do you know how to derive this more precisely? In other words: is there an analytical expression for it in terms of the parameters (i.e. (μT,μK),(σT,σK),ρ)?
 
estebanox said:
your intuition is that t is informative about shift/rescaling
t alone is not.
(T,K)∼BiNormal((μT,μK),(σT,σK),ρ) and (T,K)∼BiNormal((μT,μK+c),(σT,σK),ρ) lead to the same distribution of t, but the best estimate of a has to be shifted by c. With a similar but a bit more complicated formula you can show that you can also change σK and a without changing anything related to t.
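The invariance claimed here can be checked with a short simulation that reuses the same random draws under a shifted and a rescaled parameterisation of K. This is a sketch, not a proof: `min_t` is a hypothetical helper, the numbers are arbitrary, and the rescaling used (σK multiplied by s, with a moved to μK + s(a−μK)) is one assumed form of the "more complicated formula".

```python
import random

def min_t(n, mu_t, mu_k, sigma_t, sigma_k, rho, a, seed):
    """Minimum T among n draws of (T, K) that satisfy K < a."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n):
        z_k = rng.gauss(0.0, 1.0)
        z_e = rng.gauss(0.0, 1.0)
        if mu_k + sigma_k * z_k < a:
            t_val = mu_t + sigma_t * (rho * z_k + (1.0 - rho**2) ** 0.5 * z_e)
            best = min(best, t_val)
    return best

c, s = 5.0, 3.0
seeds = range(200)
base = [min_t(500, 0.0, 0.0, 1.0, 1.0, 0.8, a=0.5, seed=k) for k in seeds]
# Shift mu_K and a together by c.
shifted = [min_t(500, 0.0, c, 1.0, 1.0, 0.8, a=0.5 + c, seed=k) for k in seeds]
# Rescale sigma_K by s and move a so that (a - mu_K) / sigma_K is unchanged.
rescaled = [min_t(500, 0.0, 0.0, 1.0, s, 0.8, a=s * 0.5, seed=k) for k in seeds]

print(max(abs(x - y) for x, y in zip(base, shifted)))
print(max(abs(x - y) for x, y in zip(base, rescaled)))
```

With identical seeds the three runs select the same individuals (barring floating-point rounding exactly at the threshold), so t is the same even though a differs in each case.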

I would be surprised if there is an analytic expression for the distribution of t as a function of the other parameters (if we apply the cut on K>a).
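For reference, the distribution of t can at least be written down formally, here for the cut K<a used in the original post. The expression below assumes the number n of selected individuals is treated as fixed and that they are independent draws; Φ is the standard normal CDF and Φ₂(·,·;ρ) the standard bivariate normal CDF, which has no elementary closed form, so this is consistent with not expecting a simple analytic answer.

```latex
\alpha = \frac{a-\mu_K}{\sigma_K}, \qquad
F(x) = P(T \le x \mid K < a)
     = \frac{\Phi_2\!\left(\dfrac{x-\mu_T}{\sigma_T},\ \alpha;\ \rho\right)}{\Phi(\alpha)}, \qquad
P(t \le x) = 1 - \bigl(1 - F(x)\bigr)^{n}
```

The parameters of K enter only through α, which is another way to see the shift and rescaling invariance.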
 
That's quite clear and helpful. Thanks!
 
mfb said:
I would be surprised if there is an analytic expression for the distribution of t as a function of the other parameters (if we apply the cut on K>a).

Your answer makes me wonder if applying the cut on both K and T would change things. For instance, for some other cut b, can we know what E[Min[ T | T>t , K<b]] is?
 
  • #10
If you know the parameters of the distribution, and b, you can calculate this numerically. Same as above, I don't expect an exact analytic expression. For some parameters there can be a good analytic approximation.
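One way to do such a numerical calculation is plain Monte Carlo with rejection sampling. The sketch below assumes the expectation is taken over samples of a fixed size n that satisfy both cuts (T above a lower bound, K below b); the function name `expected_min_t`, the defaults, and the parameter values are illustrative, and σT, σK are read as standard deviations.

```python
import random

def expected_min_t(n, reps, mu_t, mu_k, sigma_t, sigma_k, rho,
                   b=float("inf"), t_lower=float("-inf"), seed=0):
    """Monte Carlo estimate of E[min T] over n individuals with K < b and T > t_lower."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        smallest = float("inf")
        kept = 0
        while kept < n:  # rejection sampling until n individuals pass both cuts
            z_k = rng.gauss(0.0, 1.0)
            z_e = rng.gauss(0.0, 1.0)
            k = mu_k + sigma_k * z_k
            t_val = mu_t + sigma_t * (rho * z_k + (1.0 - rho**2) ** 0.5 * z_e)
            if k < b and t_val > t_lower:
                kept += 1
                smallest = min(smallest, t_val)
        total += smallest
    return total / reps

# No cuts and rho = 0: reduces to the expected minimum of 10 standard normals.
no_cut = expected_min_t(10, 4000, 0.0, 0.0, 1.0, 1.0, 0.0)
# Both cuts applied, with illustrative values b = 0 and lower bound -1 on T.
both_cuts = expected_min_t(10, 4000, 0.0, 0.0, 1.0, 1.0, 0.8, b=0.0, t_lower=-1.0)
print(no_cut, both_cuts)
```

Rejection sampling becomes slow when the cuts are very strong; in that regime numerical integration of the distribution of the minimum would be a more sensible choice.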
 
  • #11
estebanox said:
I observe the realisations of T for a sample consisting of those individuals with K<a, where the selection threshold a is unknown. Let t denote the minimum observed realisation of T in this sample.

In terms of the distributions and parameters above, what is t an estimator of?

I think the question should be rephrased. For each parameter of a distribution, any function of the sample values can be called an estimator of that parameter. We pay more attention to functions that are "good" estimators of the parameter. There are various ways to define what "good" means (e.g. unbiased, minimum variance, maximum likelihood).

A question about estimators could be made specific by asking things like

1. Is the minimum value of T in the sample an unbiased estimator of the distribution parameter "a" that defines the cutoff threshold for K?

or

2. Is there a simple-looking function f of the distribution parameters for which the minimum value of T in the sample is a maximum likelihood estimator for f evaluated at the particular parameter values of the distribution being sampled?

The more general question of "is there information" could be phrased in terms of a question about "sufficient statistics", which I think mfb is effectively doing.
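Question 1 can be probed by simulation. The sketch below works in standardised units (means 0, standard deviations 1, which is an assumption made for brevity) and estimates E[t] for several sample sizes; `mean_min_t` is a hypothetical helper and the values of ρ and a are arbitrary.

```python
import random

def mean_min_t(n, reps, rho, a, seed=0):
    """Average of t = min T over reps samples of n individuals with K < a,
    for standardised (T, K) with correlation rho."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        smallest = float("inf")
        kept = 0
        while kept < n:
            z_k = rng.gauss(0.0, 1.0)
            z_e = rng.gauss(0.0, 1.0)
            if z_k < a:  # selection on K
                kept += 1
                t_val = rho * z_k + (1.0 - rho**2) ** 0.5 * z_e
                smallest = min(smallest, t_val)
        total += smallest
    return total / reps

a = 0.0
averages = {n: mean_min_t(n, 200, rho=0.8, a=a) for n in (10, 100, 1000)}
for n, avg in averages.items():
    print(n, avg, "bias relative to a:", avg - a)
```

Because T given K<a has unbounded support when |ρ|<1, the average of t keeps decreasing as n grows, so t cannot be unbiased for the fixed threshold a at more than one sample size.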
 
