Analysing the Probability of a Statistical Fluctuation in a Sample

  • Context: Graduate
  • Thread starter: Swatje

Discussion Overview

The discussion revolves around quantifying the probability of statistical fluctuations in a sample set, particularly focusing on the application of Poisson and t-distributions. Participants explore various statistical methods for analyzing sample data, including the implications of using estimated versus actual means and standard deviations.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant questions how to quantify the probability of a statistical fluctuation in a Poisson distribution and suggests using the mean and standard deviation to assess significance.
  • Concerns are raised about the appropriateness of using the t-distribution when the mean is estimated, with a suggestion that this may introduce inaccuracies.
  • Another participant discusses the formulation of one-sample and two-sample t-statistics, highlighting differences in their application based on known parameters.
  • Different scenarios are presented regarding the calculation of t-values when dealing with known versus unknown means and standard deviations, raising questions about the validity of each method.
  • There is a discussion about the assumption of equal variances in statistical tests, particularly in the context of pooled t-tests.

Areas of Agreement / Disagreement

Participants express differing views on the appropriateness of various statistical methods and the implications of using estimated versus actual parameters. No consensus is reached on which methods are definitively correct or incorrect.

Contextual Notes

Participants note limitations regarding the assumptions made in statistical tests, such as the requirement for known variances and the implications of using estimated values. The discussion remains open-ended regarding the best approach to take in different scenarios.

Swatje
Hi,

I was wondering: if you have a certain sample set, how can you quantify the probability of a certain statistical fluctuation?

For example, say you have a Poisson distribution with large counts, up into the fifties, and there is one peak. How do you determine the probability that this peak is just a statistical fluctuation?

The simple answer, to me, would be: calculate the mean, take the square root of the mean as the standard deviation of the Poisson distribution, and then see how many sigmas the peak lies above the mean. You could then look up the probability in Gaussian tables (for large counts the Poisson distribution looks like a Gaussian).
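A minimal Python sketch of this "count the sigmas" approach, assuming the peak is left out of the mean (the EDIT further down asks about exactly this choice), the mean is positive, and the peak is an integer count. It uses the square root of the mean as the Poisson standard deviation, converts the number of sigmas into a one-sided Gaussian tail probability with `math.erfc`, and also sums the exact Poisson tail for comparison. The function name and the counts are hypothetical, for illustration only.

```python
import math

def poisson_peak_significance(background, peak):
    """Rough significance of one high bin against Poisson-distributed background bins.

    Assumptions: `background` holds the other bins (peak excluded), their common
    mean is > 0, and `peak` is an integer count.
    """
    mean = sum(background) / len(background)      # estimated Poisson mean
    sigma = math.sqrt(mean)                       # Poisson: variance equals the mean
    z = (peak - mean) / sigma                     # number of sigmas above the mean
    p_gauss = 0.5 * math.erfc(z / math.sqrt(2))   # one-sided Gaussian tail P(Z >= z)

    # Exact Poisson tail P(N >= peak) = 1 - P(N <= peak - 1), built from the pmf recursion
    pmf = math.exp(-mean)                         # P(N = 0)
    cdf_below = 0.0
    for k in range(peak):
        cdf_below += pmf                          # add P(N = k)
        pmf *= mean / (k + 1)                     # P(N = k + 1) from P(N = k)
    p_poisson = 1.0 - cdf_below
    return z, p_gauss, p_poisson


# Hypothetical counts, for illustration only
background = [48, 52, 47, 55, 50, 49, 51, 53, 46, 49]
z, p_gauss, p_poisson = poisson_peak_significance(background, 71)
print(f"z = {z:.2f}, Gaussian tail = {p_gauss:.2g}, Poisson tail = {p_poisson:.2g}")
```

The two tail probabilities are not identical because a Poisson distribution with a mean of 50 is still slightly skewed; `1.0 - cdf_below` also loses precision for extremely small tails, so this is only meant for moderate significances.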

Some things bother me about this. For example, the standard deviation is an estimate, so you should use the t-distribution. But the t-distribution requires knowledge of the actual mean. Would it be better to use the t-distribution anyway, because not knowing the standard deviation is the more important factor? However, if you used the estimated mean in the t-distribution, you would still be wrong. Is there a distribution that gives values for both an estimated mean and an estimated standard deviation?

My second question: if a Poisson distribution approaches a Gaussian for large counts, does that mean you could also estimate the standard deviation with the usual sample formula, i.e. the square root of the 1/(N-1) sum of squared deviations? Or do the two estimates converge for large counts, so that it is always better to use the Poisson estimate?
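The two estimates in this question can be put side by side with a short sketch (hypothetical function name and counts, standard-library `statistics` module assumed): one takes the square root of the mean, the other uses the 1/(N-1) sample standard deviation.

```python
import math
import statistics

def compare_spread_estimates(counts):
    """Return (Poisson estimate, sample estimate) of the standard deviation."""
    poisson_sigma = math.sqrt(statistics.fmean(counts))  # sqrt(mean): assumes variance == mean
    sample_sigma = statistics.stdev(counts)              # sqrt(sum((x - mean)^2) / (N - 1)), no Poisson assumption
    return poisson_sigma, sample_sigma


# Hypothetical counts, for illustration only
print(compare_spread_estimates([48, 52, 47, 55, 50, 49, 51, 53, 46, 49]))
```

If the data really are Poisson, both numbers estimate the same quantity, and the square root of the mean is usually the less noisy of the two because it exploits the Poisson relation. The sample standard deviation makes no such assumption, so a large disagreement between the two hints that the counts are not purely Poisson.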

I thank you in advance for your time.

EDIT: I forgot to ask this:

If you were to analyse this sample, would it be better to include the fluctuation in your mean or to leave it out? In other words, do you consider it part of your sample set, or do you remove it and check whether it is consistent with the rest of the sample set?
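To see what this choice does numerically, here is a minimal sketch (hypothetical function name and data) that measures the suspect bin against the mean both with and without that bin, assuming Poisson bins so that the square root of the mean serves as the standard deviation. Including a genuine outlier pulls the mean up and therefore makes the outlier look less significant.

```python
import math
import statistics

def suspect_vs_rest(counts, index):
    """Significance of counts[index] measured against the mean with and without that bin.

    Assumption: the bins are Poisson, so sqrt(mean) is used as the standard deviation.
    """
    suspect = counts[index]
    rest = counts[:index] + counts[index + 1:]        # every bin except the suspect one
    results = {}
    for label, data in (("included", counts), ("excluded", rest)):
        mean = statistics.fmean(data)
        results[label] = (mean, (suspect - mean) / math.sqrt(mean))  # (mean, sigmas above it)
    return results


# Hypothetical counts: nine ordinary bins and one suspect bin at the end
print(suspect_vs_rest([50] * 9 + [80], 9))
```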
 
The one-sample t-statistic has the form (x - m)/s.e., where x is a normally distributed random variable (e.g. a sample average), m is a deterministic value, and s.e. is the standard error of x. (m is usually the expected value of x, although in regression analysis it is typically zero.)
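For the common case where x is a sample average, a minimal sketch of this statistic (hypothetical function name) takes s.e. = s/sqrt(n) with s the 1/(N-1) sample standard deviation, and returns the usual n - 1 degrees of freedom alongside it. If SciPy is available, `scipy.stats.ttest_1samp(sample, m)` should report the same statistic.

```python
import math
import statistics

def one_sample_t(sample, m):
    """One-sample t-statistic (x - m) / s.e. with x the sample average and s.e. = s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the sample average
    return (x_bar - m) / se, n - 1                 # statistic and its degrees of freedom


print(one_sample_t([1, 2, 3, 4, 5], 2))
```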

In contrast, the two-sample t-statistic is (x - y)/s.e.p., where x and y are both random variables and the denominator s.e.p. is the pooled standard error of x - y. In this formulation, the t-statistic does not include a deterministic parameter value. (Note that x and y are independent and both normal, so x - y is normal as well.)
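A minimal sketch of the pooled version (hypothetical function name), where x and y are the averages of two independent samples. Pooling the two sample variances assumes both samples share the same underlying variance; the statistic has n_x + n_y - 2 degrees of freedom. With SciPy, `scipy.stats.ttest_ind(x, y)` (equal variances by default) should agree.

```python
import math
import statistics

def pooled_two_sample_t(x, y):
    """Pooled two-sample t-statistic (x_bar - y_bar) / s.e.p. and its degrees of freedom."""
    nx, ny = len(x), len(y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)   # 1/(N-1) sample variances
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)     # pooled variance, assumes equal variances
    sep = math.sqrt(sp2 * (1 / nx + 1 / ny))                   # pooled standard error of x_bar - y_bar
    return (mx - my) / sep, nx + ny - 2


print(pooled_two_sample_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```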
 
OK, thanks for your reply. Let's consider four situations:

1) You have 10 numbers, and one of them stands out. You calculate s and the average, take the difference between the average and the value that stands out, divide by s, and look at the t-value:

t = \frac{\bar{x}-x}{s}

With 9 d.o.f.

2) You have 10 numbers, and one of them stands out, but now you already know the actual mean of the parent distribution. Again you calculate the t-value in the same way:

t = \frac{\mu-x}{s}

With 10 d.o.f. (Because you know the mean.)

3) You have 10 numbers, and a different test of 40 numbers gave a certain average, but you know neither the s nor the sigma of that test, nor any of the numbers needed to calculate them. You want to know whether your mean is consistent with the other one, so again you calculate:

t = \frac{\bar{x}_{1}-\bar{x}_{2}}{s_{1}/\sqrt{10}}

This one bothers me, but I see no other way to do it.

4) Same situation as 3, but now you know the s of the second sample.

(Two-sample pooled t-test with equal variances; a long formula that I'm not going to type out.)
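For reference, the standard pooled two-sample t-statistic (equal variances assumed) that scenario 4 refers to, with sample sizes n_1, n_2 and sample standard deviations s_1, s_2, is:

```latex
t = \frac{\bar{x}_{1}-\bar{x}_{2}}{s_{p}\sqrt{\frac{1}{n_{1}}+\frac{1}{n_{2}}}},
\qquad
s_{p}^{2} = \frac{(n_{1}-1)\,s_{1}^{2}+(n_{2}-1)\,s_{2}^{2}}{n_{1}+n_{2}-2}
```

It has n_1 + n_2 - 2 degrees of freedom, i.e. 48 for samples of 10 and 40 numbers.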

Which methods are correct, and which are wrong?
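To compare scenarios 1 to 3 on actual numbers, the sketch below simply evaluates the three formulas as written; it does not decide which of them (or which degrees of freedom) is appropriate. Not stated in the post, and therefore an assumption here: the average and s come from all the numbers, including the one that stands out, and s uses the 1/(N-1) rule. Written this way, scenario 3 is the one-sample statistic from the first reply with m set to the other test's average, i.e. that average is treated as exact. The function name and data are hypothetical.

```python
import math
import statistics

def scenario_statistics(numbers, suspect, mu=None, other_mean=None):
    """Evaluate the t-values of scenarios 1-3 exactly as written in the post.

    Assumption: x_bar and s come from all the numbers (suspect included), s with the 1/(N-1) rule.
    """
    n = len(numbers)
    x_bar = statistics.fmean(numbers)
    s = statistics.stdev(numbers)
    results = {"t1": (x_bar - suspect) / s}             # scenario 1 (the post assigns N - 1 d.o.f.)
    if mu is not None:
        results["t2"] = (mu - suspect) / s              # scenario 2 (the post assigns N d.o.f.)
    if other_mean is not None:
        # scenario 3: the other test's average is treated as an exact value
        results["t3"] = (x_bar - other_mean) / (s / math.sqrt(n))
    return results


# Hypothetical numbers, for illustration only
print(scenario_statistics([1, 2, 3, 4, 5], suspect=5, mu=2, other_mean=2))
```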
 
(3) assumes equal variances (for lack of a better solution), but (4) doesn't have to. Isn't that so?
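If equal variances are not assumed in scenario 4, one common choice is Welch's t-test, which uses each sample's own variance together with approximate (Welch-Satterthwaite) degrees of freedom. A minimal sketch that works from summary statistics only, which fits the case where the second test's mean and s are known but its raw numbers are not; the function name is hypothetical. With SciPy, `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` should give the same statistic.

```python
import math

def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Welch's unequal-variance t-statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = s1 ** 2 / n1        # squared standard error of mean1
    v2 = s2 ** 2 / n2        # squared standard error of mean2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # usually not an integer
    return t, df
```

With equal sample sizes and equal sample variances it reproduces the pooled result; it differs from the pooled test when the sizes or spreads differ, as with samples of 10 and 40 numbers.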
 
