Determine P(t = x) if % of sample drawn from gaussian > t

  • Thread starter NotASmurf
In summary, the thread discusses a scenario in which a sample of size n and a further number t are drawn at random from the same normal distribution. The question is whether a probability density for t can be determined when we know how many members of the sample are greater than t. The original poster also asks whether any simplifications can help narrow down t.
  • #1
NotASmurf
Sorry for the bad title, limited space

A sample group of size n, as well as a number t, is drawn randomly from a normal distribution. If we have the number of people in the sample group bigger than t, can we determine a PDF for the value of t? Are there any simplifications we can use to home in on it? Any help appreciated.
 
  • #2
NotASmurf said:
Sorry for the bad title, limited space

A sample group of size n, as well as a number t, is drawn randomly from a normal distribution. If we have the number of people in the sample group bigger than t, can we determine a PDF for the value of t? Are there any simplifications we can use to home in on it? Any help appreciated.

Your question is incomprehensible. You need to take more time explaining what you mean, and if you need more space in which to say it, then take more space.

As I understand it, you have some distribution ##F(x)## (which you say is normal, but never mind that for now), and you draw a sample of size ##n+1## from ##F##. You call the first sampled-value ##t##, then call the others ##X_1, X_2, \ldots, X_n.## After that, you lose me. Apparently you connect the values of ##t## and the ##X_i## in some way, and then ask about some probabilities, but I cannot figure out what you want.

Or, maybe the value of ##t## is not really drawn from the same distribution ##F## as are the ##X_i##. In that case, are you asking for the distribution of ##t## if it is to be less than all the ##X_i?## Well, that's easy:
$$\begin{aligned} P(t < X_1, t < X_2, \ldots, t < X_n) &= P(X_1 > t) P(X_2 > t) \cdots P(X_n > t)\\
&= \left( \bar{F}(t) \right)^n, \end{aligned}$$
where ##\bar{F}(t) = 1 - F(t)## is the complementary cdf. For the normal distribution ##N(\mu,\sigma)## this would be
$$\left( 1 - \Phi \left( \frac{t-\mu}{\sigma} \right) \right)^n$$ where ##\Phi(z)## is the cdf of the standard normal random variable ##N(0,1).##
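
A minimal Python sketch of the formula above, for a fixed threshold ##t##. It assumes that ##\sigma## in ##N(\mu,\sigma)## is the standard deviation; the function names and the Monte Carlo check are illustrative additions, not part of the original reply.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma), written with the error function (sigma = std. dev.)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_all_exceed(t, n, mu=0.0, sigma=1.0):
    """(1 - Phi((t - mu) / sigma))^n: all n independent draws exceed a fixed t."""
    return (1.0 - normal_cdf(t, mu, sigma)) ** n


def simulate_all_exceed(t, n, trials=100_000, mu=0.0, sigma=1.0, seed=0):
    """Monte Carlo estimate of the same probability, as a sanity check."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw n values and record whether every one of them is above t.
        if all(rng.gauss(mu, sigma) > t for _ in range(n)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(prob_all_exceed(0.0, 3))      # exact value for t = mu, n = 3
    print(simulate_all_exceed(0.0, 3))  # simulated estimate
```

For ##t = \mu## each draw exceeds ##t## with probability 1/2, so the formula reduces to ##(1/2)^n##, which makes a convenient spot check.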
 
  • #3
Thanks for the reply, but they are all from the same distribution, we know that X1, X2..Xk are bigger than t, but Xk+1,Xk+2..Xn are smaller, we know all the Xi's, but NOT t, I seek to "determine" t, i.e. P(t = x).
Obviously if more than half of the Xi's are bigger than t, it is more likely t is smaller than the median etc, I seek to build a distribution out of this
 
  • #4
NotASmurf said:
Thanks for the reply, but they are all from the same distribution, we know that X1, X2..Xk are bigger than t, but Xk+1,Xk+2..Xn are smaller, we know all the Xi's, but NOT t, I seek to "determine" t, i.e. P(t = x).
Obviously if more than half of the Xi's are bigger than t, it is more likely t is smaller than the median etc, I seek to build a distribution out of this

OK. You cannot hope to have the samples ##X_1, X_2, \ldots,X_n## come out in descending values: there is no reason why we cannot have ##X_1 < X_2## and ##X_2 > X_3,## for example. However the so-called order statistics ##X_{(1)}, X_{(2)}, \ldots X_{(n)}## are, by definition, the ##X_i## values re-sorted into ascending order. That is, ##X_{(1)} = ## smallest of ##S \equiv \{X_1, X_2, \ldots, X_n\},## ##X_{(2)} = ## second smallest of ##S##, all the way up to ##X_{(n)} = ## largest of ##S##. Although you want the ##X## values sorted into descending order, the standard probabilistic formulas apply to them in ascending order. For that reason, I am going to suppose that you want ##X_{(k)} < t < X_{(k+1)}##, so the first ##k## of them are less than ##t## and the remaining ##n-k## of them are ##> t##.
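
To make the order-statistic notation concrete, here is a small Python sketch. The helper names are hypothetical, and the padding with ##X_{(0)} = -\infty## and ##X_{(n+1)} = +\infty## is an added convention so that ##k = 0## and ##k = n## fit the same pattern.

```python
import math


def count_below(sample, t):
    """k = number of sample values strictly less than t."""
    return sum(1 for x in sample if x < t)


def bracket(sample, t):
    """Return (k, X_(k), X_(k+1)), the order statistics that enclose t.

    The sorted sample is padded with -inf and +inf, so index j of the padded
    list holds the j-th order statistic X_(j) in 1-based notation.
    """
    ordered = [-math.inf] + sorted(sample) + [math.inf]
    k = count_below(sample, t)
    return k, ordered[k], ordered[k + 1]


if __name__ == "__main__":
    # The raw sample is not sorted, but its order statistics are.
    print(bracket([3.0, 1.0, 2.0], 2.5))
```

With continuous draws, ties between ##t## and a sample value have probability zero, so the sketch does not treat them specially.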

The question only makes sense to me if you fix the values of ##n## and ##k##, and in that case you want to know
$$P(X_{(k)} < T < X_{(k+1)})$$ where ##T## is another independently-generated sample point from the same distribution ##F## as the ##X_i.##

That is a classical problem: if the random variables ##X_i## are continuous, with probability density function ##f(x)## and (cumulative) distribution function ##F(x) = \int_{-\infty}^x f(s) \, ds## then, for any pair ##u \le v## the event ##E_{u,v} = \{ X_{(k)} < u \} \cap \{X_{(k+1)} > v \}## occurs whenever ##k## of the ##X_i## are ##< u## and the remaining ##n-k## of them are ##> v##. The ones that are to be ##< u## can be chosen from the sample in ##{ n \choose k }## ways (binomial coefficient), hence
$$P(E_{u,v}) = {n \choose k} F(u)^k (1-F(v))^{n-k} $$ Taking ##u = v = t##, and using the independence of ##T## from the ##X_i##, this implies that
$$P(X_{(k)} < T < X_{(k+1)}| T=t) = {n \choose k} F(t)^k (1-F(t))^{n-k}$$

The answer to your question (if I have interpreted it correctly) is
$$\text{ans.} = \int_{-\infty}^\infty P(X_{(k)} < T < X_{(k+1)}| T=t) \, f(t) \, dt = {n \choose k} \int_{-\infty}^\infty F(t)^k (1-F(t))^{n-k} f(t) \, dt$$ The substitution ##p = F(t)## turns this into a Beta integral,
$$\text{ans.} = {n \choose k} \int_0^1 p^k (1-p)^{n-k} \, dp = {n \choose k} \frac{k! \, (n-k)!}{(n+1)!} = \frac{1}{n+1},$$ for every ##k## and every continuous ##F##: the rank of ##T## among the ##n+1## sampled values is uniformly distributed. What does depend on ##F## is the conditional density of ##T## given that exactly ##k## of the ##X_i## fall below it,
$$f(t \mid k) = (n+1) {n \choose k} F(t)^k (1-F(t))^{n-k} f(t),$$ which says that ##F(T)## has a Beta##(k+1, n-k+1)## distribution. If the ##X_i## and ##T## all come from the same ##N(\mu,\sigma)## there will not be any closed-form formula for this density in terms of standard, elementary functions (even ##F(x)## itself has none). Probably you would need to evaluate it numerically for a given example, and perhaps be satisfied with graphical results. However, as indicated in some of the on-line articles about order statistics, if the sample size ##n## is large you might be able to employ reasonable normal-distribution approximations to some of the quantities, and so get a bit further towards a usable formula.
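
One way to sanity-check any formula for ##P(X_{(k)} < T < X_{(k+1)})## is a direct simulation. The sketch below is an illustrative addition: the function name, the default parameters and the choice of a standard normal are assumptions, not something from the thread.

```python
import random
from collections import Counter


def rank_frequencies(n, trials=100_000, mu=0.0, sigma=1.0, seed=0):
    """Estimate P(X_(k) < T < X_(k+1)) for k = 0..n by simulation.

    T and X_1..X_n are independent draws from N(mu, sigma); k is the number
    of X_i that fall below T. Returns a list of n + 1 relative frequencies.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        t = rng.gauss(mu, sigma)
        k = sum(1 for _ in range(n) if rng.gauss(mu, sigma) < t)
        counts[k] += 1
    return [counts[k] / trials for k in range(n + 1)]


if __name__ == "__main__":
    for k, freq in enumerate(rank_frequencies(5)):
        print(k, round(freq, 4))
```

Because ##T## and the ##X_i## are independent, identically distributed and continuous, every rank of ##T## among the ##n+1## values is equally likely, so each estimated frequency should land close to ##1/(n+1)## whatever the values of ##\mu## and ##\sigma##.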

For more about order statistics, see
https://en.wikipedia.org/wiki/Order_statistic
and
https://www2.stat.duke.edu/courses/Spring12/sta104.1/Lectures/Lec15.pdf
 
  • #5
NotASmurf said:
Sorry for the bad title, limited space

A sample group of size n, as well as a number t, is drawn randomly from a normal distribution. If we have the number of people in the sample group bigger than t, can we determine a PDF for the value of t? Are there any simplifications we can use to home in on it? Any help appreciated.

You do not need to try to fit the entire question in the title. I am also struggling to understand what you are trying to do.
NotASmurf said:
but they are all from the same distribution
They're not? You didn't say that. What distribution is X1 from? What distribution is X2 from?

NotASmurf said:
we know that X1, X2..Xk are bigger than t,
We do? How do we know that? You didn't say that before.

Isn't X1 the first random number we sample? If it's not, then what is the definition of X1?

NotASmurf said:
Obviously if more than half of the Xi's are bigger than t,
Why would they be? How was t determined? What are the Xi? Where are they coming from?

I think you'd better try to give an example of your procedure. Then we can help you model it. I think you are using non-standard terminology for almost everything in your question.
 
  • #6
upload_2019-1-27_11-4-46.png


OK, so I made a program that takes all possible sample subsets of length 6 (an arbitrary number for now). This is a simplified example, where the distribution is linear.

Now we notice that when t = 0, 59.049% of the time 5/6 elements of the sample were bigger than t; when t = 1, 20.48% of the time 3/6 elements were bigger than t, etc. Format is

"t = x : [percentage of the time that k/6 elements of the sample are bigger than t when t is x | k]. "

upload_2019-1-27_11-11-0.png


Here is an unnormalized graph of y = percentage of the time that t = x (for the case where 5/6 elements are greater)

I believe I can answer my question if I can understand why the graph is that shape and what type of graph it is. Any ideas?
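
A sketch of where a curve of that kind can come from, under assumptions that are guesses about the program rather than facts from the post: the values are integers 0..m-1, all equally likely; n further values are drawn independently of t; and k counts how many of them are bigger than t. The function names are hypothetical.

```python
from math import comb


def count_likelihood(t, k, n, m):
    """P(exactly k of n independent draws exceed t), draws uniform on 0..m-1."""
    p = (m - 1 - t) / m  # chance that a single draw is bigger than t
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior_over_t(k, n, m):
    """Distribution of t given the count k, with t itself uniform on 0..m-1.

    With a uniform prior on t the posterior is the binomial likelihood,
    viewed as a function of t and rescaled to sum to 1.
    """
    weights = [count_likelihood(t, k, n, m) for t in range(m)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(count_likelihood(0, 5, 5, 10))
    print(count_likelihood(1, 3, 5, 10))
    print(posterior_over_t(5, 6, 10))
```

With m = 10 and n = 5 the likelihood evaluates to 0.59049 at t = 0, k = 5 and to 0.2048 at t = 1, k = 3, the two figures quoted above, which suggests (but does not prove) a setup of this kind. Under these assumptions the graph is a binomial likelihood in ##p = P(X > t)##, read as a function of t.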
 

  • #7
It seems like you are asking for the distribution of the sample median coming from a normal population? Or maybe you are coming up with a statistic to test for skewness of your population? The latter would not make sense.
 

1. What is P(t = x)?

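In this thread, P(t = x) is shorthand for the probability density of the unknown draw t at the value x, given how many of the n sample values are greater than t. Because t is continuous, it is a density rather than the probability of a single point.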

2. How is P(t = x) determined?

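P(t = x) is determined with Bayes' rule: the Gaussian density of t at x is weighted by the binomial probability that exactly the observed number of the n sample values exceed x, and the result is normalized so that it integrates to 1.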
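
For the setup discussed in the thread (##T## and ##X_1, \ldots, X_n## independent draws from the same ##N(\mu,\sigma)##, with only the count of ##X_i## below ##T## known), the joint weight ##{n \choose k} F(t)^k (1-F(t))^{n-k} f(t)## integrates to ##1/(n+1)##, so the density is that weight multiplied by ##n+1##. A minimal Python sketch follows; the names are illustrative, `k_below` counts values below t (so it equals n minus the number of values bigger than t), and ##\sigma## is assumed to be the standard deviation.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma), with sigma the standard deviation."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def posterior_density(t, k_below, n, mu=0.0, sigma=1.0):
    """Density of T at t, given that exactly k_below of the n draws are < T."""
    F = normal_cdf(t, mu, sigma)
    weight = math.comb(n, k_below) * F**k_below * (1.0 - F) ** (n - k_below)
    return (n + 1) * weight * normal_pdf(t, mu, sigma)


def integrate(func, lo, hi, steps=4000):
    """Plain trapezoidal rule, enough for a smooth, fast-decaying integrand."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h


if __name__ == "__main__":
    # Only 1 of 6 draws lies below T: the density should lean to the left.
    area = integrate(lambda t: posterior_density(t, 1, 6), -8.0, 8.0)
    mean = integrate(lambda t: t * posterior_density(t, 1, 6), -8.0, 8.0)
    print(area, mean)
```

The area check confirms the normalization numerically, and the negative mean matches the intuition from the thread that t is probably below the median when most of the sample is bigger than t.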

3. What does it mean if % of sample drawn from Gaussian > t?

It means that the fraction of the n sample values that are greater than t is the known quantity. For example, if more than half of the sample exceeds t, then t is more likely to lie below the median of the distribution.

4. How does the Gaussian distribution affect the calculation of P(t = x)?

The Gaussian distribution, also known as the normal distribution, supplies both the density of t before the sample is taken into account and, through its cumulative distribution function Φ, the probability that a single sample value exceeds a given t. Φ has no closed form in elementary functions, so P(t = x) is normally evaluated numerically.

5. Can P(t = x) be greater than 1 or less than 0?

A probability cannot be greater than 1 or less than 0, with 0 representing impossibility and 1 representing certainty. However, since t is continuous, P(t = x) in this thread is a density: it cannot be negative and must integrate to 1, but it may exceed 1 at individual points, for example when σ is small.
