I Determine P(t = x) if % of sample drawn from gaussian > t

  • Thread starter NotASmurf
Sorry for the bad title; there was limited space.

A sample group of size n, as well as a number t, is drawn randomly from a normal distribution. If we know how many members of the sample group are bigger than t, can we determine a PDF for the value of t? Are there any simplifications we can use to home in on it? Any help appreciated.
 

Ray Vickson

Your question is incomprehensible. You need to take more time explaining what you mean, and if you need more space in which to say it, then take more space.

As I understand it, you have some distribution ##F(x)## (which you say is normal, but never mind that for now), and you draw a sample of size ##n+1## from ##F##. You call the first sampled-value ##t##, then call the others ##X_1, X_2, \ldots, X_n.## After that, you lose me. Apparently you connect the values of ##t## and the ##X_i## in some way, and then ask about some probabilities, but I cannot figure out what you want.

Or, maybe the value of ##t## is not really drawn from the same distribution ##F## as are the ##X_i##. In that case, are you asking for the distribution of ##t## if it is to be less than all the ##X_i?## Well, that's easy:
$$P(t < X_1, t < X_2, \ldots t < X_n) = P(X_1 > t) P(X_2 > t) \cdots P(X_n > t)\\
\hspace{4ex} = \left( \bar{F}(t) \right)^n,$$
where ##\bar{F}(t) = 1 - F(t)## is the complementary cdf. For the normal distribution ##N(\mu,\sigma)## this would be
$$\left( 1 - \Phi \left( \frac{t-\mu}{\sigma} \right) \right)^n$$ where ##\Phi(z)## is the cdf of the standard normal random variable ##N(0,1).##
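
The formula above can be evaluated directly. The following is a minimal sketch, assuming ##t## is a fixed number and the ##X_i## are independent ##N(\mu,\sigma)##; the function names are illustrative and not part of the thread.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_all_exceed(t, n, mu=0.0, sigma=1.0):
    """P(X_1 > t, ..., X_n > t) for independent N(mu, sigma) samples."""
    return (1.0 - normal_cdf(t, mu, sigma)) ** n

# With t at the mean, each X_i exceeds t with probability 1/2.
print(prob_all_exceed(0.0, 3))
```

The probability shrinks geometrically in ##n##, so for a large sample it is only non-negligible when ##t## sits far in the lower tail.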
 
Thanks for the reply, but they are all from the same distribution. We know that X1, X2, ..., Xk are bigger than t, but Xk+1, Xk+2, ..., Xn are smaller. We know all the Xi's, but NOT t; I seek to "determine" t, i.e. P(t = x).
Obviously, if more than half of the Xi's are bigger than t, it is more likely that t is smaller than the median, etc. I seek to build a distribution out of this.
 

Ray Vickson

OK. You cannot hope to have the samples ##X_1, X_2, \ldots,X_n## come out in descending values: there is no reason why we cannot have ##X_1 < X_2## and ##X_2 > X_3,## for example. However the so-called order statistics ##X_{(1)}, X_{(2)}, \ldots X_{(n)}## are, by definition, the ##X_i## values re-sorted into ascending order. That is, ##X_{(1)} = ## smallest of ##S \equiv \{X_1, X_2, \ldots, X_n\},## ##X_{(2)} = ## second smallest of ##S##, all the way up to ##X_{(n)} = ## largest of ##S##. Although you want the ##X## values sorted into descending order, the standard probabilistic formulas apply to them in ascending order. For that reason, I am going to suppose that you want ##X_{(k)} < t < X_{(k+1)}##, so the first ##k## of them are less than ##t## and the remaining ##n-k## of them are ##> t##.
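
To make the order-statistics convention concrete, here is a small sketch with made-up sample values (the numbers and the helper name `split_around` are hypothetical). It sorts the sample into ##X_{(1)} \le \ldots \le X_{(n)}## and finds the ##k## for which ##X_{(k)} < t < X_{(k+1)}##.

```python
from bisect import bisect_left

def split_around(sample, t):
    """Return the order statistics and k = number of values strictly below t."""
    ordered = sorted(sample)          # X_(1) <= X_(2) <= ... <= X_(n)
    k = bisect_left(ordered, t)       # everything left of this index is < t
    return ordered, k

sample = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1]   # illustrative values only
ordered, k = split_around(sample, 0.2)
print(ordered, k)
```

Here ##k## of the sorted values lie below ##t## and the remaining ##n-k## lie above it, regardless of the order in which the ##X_i## were drawn.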

The question only makes sense to me if you fix the values of ##n## and ##k##, and in that case you want to know
$$P(X_{(k)} < T < X_{(k+1)})$$ where ##T## is another independently-generated sample point from the same distribution ##F## as the ##X_i.##

That is a classical problem: if the random variables ##X_i## are continuous, with probability density function ##f(x)## and (cumulative) distribution function ##F(x) = \int_{-\infty}^x f(s) \, ds## then, for any pair ##u \le v## the event ##E_{u,v} = \{ X_{(k)} < u \} \cap \{X_{(k+1)} > v \}## occurs whenever ##k## of the ##X_i## are ##< u## and the remaining ##n-k## of them are ##> v##. The ones that are to be ##< u## can be chosen from the sample in ##{ n \choose k }## ways (binomial coefficient), hence
$$P(E_{u,v}) = {n \choose k} F(u)^k (1-F(v))^{n-k} $$ Since ##T## is independent of the ##X_i##, conditioning on ##T = t## just means taking ##u = v = t##, so
$$P(X_{(k)} < T < X_{(k+1)}| T=t) = P(E_{t,t}) = {n \choose k} F(t)^k (1-F(t))^{n-k}.$$

The answer to your question (if I have interpreted it correctly) is
$$\text{ans.} = \int_{-\infty}^\infty P(X_{(k)} < T < X_{(k+1)}| T=t) \, f(t) \, dt = {n \choose k} \int_{-\infty}^\infty F(t)^k (1-F(t))^{n-k} f(t) \, dt.$$ The substitution ##p = F(t)##, ##dp = f(t) \, dt## turns this into a Beta integral:
$$\text{ans.} = {n \choose k} \int_0^1 p^k (1-p)^{n-k} \, dp = {n \choose k} \frac{k! \, (n-k)!}{(n+1)!} = \frac{1}{n+1}.$$ This does not depend on ##F## at all, as symmetry suggests: ##T## is equally likely to occupy any of the ##n+1## positions among the sorted ##X_i##. What does depend on ##F## is the integrand ##{n \choose k} F(t)^k (1-F(t))^{n-k} f(t)##, viewed as a function of ##t##: it shows which values of ##t## are compatible with having exactly ##k## of the ##X_i## below it. If the ##X_i## and ##T## all come from the same ##N(\mu,\sigma)## there will not be any closed-form formula for it in terms of elementary functions (even ##F(x)## itself has none), so you would probably need to evaluate it numerically for a given example, and perhaps be satisfied with graphical results. However, as indicated in some of the on-line articles about order statistics, if the sample size ##n## is large you might be able to employ reasonable normal-distribution approximations to some of the quantities, and so get a bit further towards a usable formula.

For more about order statistics, see
https://en.wikipedia.org/wiki/Order_statistic
and
https://www2.stat.duke.edu/courses/Spring12/sta104.1/Lectures/Lec15.pdf
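
For a numerical feel, here is a minimal sketch, assuming ##T## and the ##X_i## are i.i.d. ##N(\mu,\sigma)## and that ##k## counts the ##X_i## below ##t##. Conditional on ##T = t##, that count is Binomial(##n, F(t)##), so the chance of seeing exactly ##k## is ##{n \choose k} F(t)^k (1-F(t))^{n-k}##. Weighting by ##f(t)## and integrating over ##t## should give ##1/(n+1)##, and dividing by that constant gives a candidate density for ##t## given ##k## (Bayes' rule). The function names and the midpoint-rule integrator are illustrative choices, not something from the thread.

```python
import math

def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def likelihood(t, n, k, mu=0.0, sigma=1.0):
    """P(exactly k of n samples fall below t | T = t): a binomial probability."""
    p = normal_cdf((t - mu) / sigma)
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def density_given_k(t, n, k, mu=0.0, sigma=1.0):
    """Density of T given k samples below it; 1/(n+1) is the normalizer."""
    prior = normal_pdf((t - mu) / sigma) / sigma
    return (n + 1) * likelihood(t, n, k, mu, sigma) * prior

def midpoint_integral(g, a, b, steps=20000):
    """Plain midpoint rule; adequate for these smooth, fast-decaying integrands."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

n, k = 6, 1  # illustrative: 1 of 6 samples below t, 5 above
marginal = midpoint_integral(lambda t: likelihood(t, n, k) * normal_pdf(t), -10.0, 10.0)
total = midpoint_integral(lambda t: density_given_k(t, n, k), -10.0, 10.0)
mean_t = midpoint_integral(lambda t: t * density_given_k(t, n, k), -10.0, 10.0)
print(marginal, 1.0 / (n + 1), total, mean_t)
```

With most of the sample above ##t##, the resulting density concentrates below the mean, which matches the intuition that ##t## is then probably smaller than the median. Equivalently, ##F(T)## given ##k## follows a Beta(##k+1, n-k+1##) distribution.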
 

RPinPA

You do not need to try to fit the entire question in the title. I am also struggling to understand what you are trying to do.

"but they are all from the same distribution"
They're not? You didn't say that. What distribution is X1 from? What distribution is X2 from?

"we know that X1, X2..Xk are bigger than t,"
We do? How do we know that? You didn't say that before.

Isn't X1 the first random number we sample? If it's not, then what is the definition of X1?

"Obviously if more than half of the Xi's are bigger than t,"
Why would they be? How was t determined? What are the Xi? Where are they coming from?

I think you'd better try to give an example of your procedure. Then we can help you model it. I think you are using non-standard terminology for almost everything in your question.
 
[Image: upload_2019-1-27_11-4-46.png]


OK, so I made a program that enumerates all possible samples of length 6 (an arbitrary number for now). This is a simplified example, where the distribution is linear.

Now we notice that when t = 0, 59.049% of the time 5/6 elements of the sample were bigger than t; when t = 1, 20.48% of the time 3/6 elements were bigger than t, etc. The format is

"t = x : [percentage of the time that k/6 elements of the sample are bigger than t when t is x | k]. "

[Image: upload_2019-1-27_11-11-0.png]


Here is an unnormalized graph of y = percentage of the time that t = x (for the case where 5/6 elements are greater).

I believe I can answer my question if I can understand why the graph is that shape and what type of graph it is. Any ideas?
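
One possible reading of the numbers quoted above, offered as a guess since the program itself is not shown: the values are drawn uniformly from the integers 0 to 9, t is one of the six draws, and the count is how many of the other five are strictly bigger than t. The sketch below enumerates every such tuple under those assumptions; the function name and parameters are hypothetical.

```python
from collections import Counter
from itertools import product

def bigger_count_distribution(t, values=range(10), others=5):
    """Fraction of equally likely tuples in which exactly k draws exceed t.

    Assumes (not confirmed by the thread) a uniform distribution on `values`
    and `others` draws besides t itself.
    """
    counts = Counter(
        sum(x > t for x in draw) for draw in product(values, repeat=others)
    )
    total = len(values) ** others
    return {k: counts[k] / total for k in range(others + 1)}

dist0 = bigger_count_distribution(0)
dist1 = bigger_count_distribution(1)
print(f"t = 0: {100 * dist0[5]:.3f}% of tuples have 5 values bigger than t")
print(f"t = 1: {100 * dist1[3]:.3f}% of tuples have 3 values bigger than t")
```

Under these assumptions the enumeration gives 59.049% for t = 0 and 20.48% for t = 1, matching the quoted figures. Each row is then a binomial distribution with success probability q = (9 - t)/10, so the curve for the "5 bigger" case would be q^5 as a function of t, a power curve that falls steeply as t grows.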
 


WWGD

It seems like you are asking for the distribution of the sample median coming from a normal population? Or maybe you are coming up with a statistic to test for skewness of your population? The latter would not make sense.
 
