Confidence in the means despite high standard deviation?

In summary: a model may fit the binned means poorly yet still fall within the large standard deviation; the question is how to argue statistically that such a model is a bad fit.
  • #1
Hypatio
I have a one-dimensional set of data, of which I have attached an image. The data is an intensity z as a function of time t. There are many data points over time with a large amount of scatter. However, if the data is binned over very short intervals, we see that there is smooth variation over time. The standard deviation is large due to the scatter in the data, but it is also about the same for all times. In addition, we can create a model of the phenomenon at work which fits the means very well. On the other hand, if we simply change parameters in the model, we can get a model which fits the means poorly but is still within the standard deviation, because the standard deviation is so large.

What I want to know is: what kind of statistical problems are encountered if you want to argue that a model which does not fit the bins well (even if it lies within the standard deviation) is actually a bad fit? I can think of at least two things, but I don't know how to talk about them as well as I would like:

1) There is clearly a systematic long-wavelength variation which can be fit by a model, but how does that translate to a statistical argument for confidence that the means are more important than the standard deviation?

2) The roughness of the data may simply be noise which is randomly distributed. If the 'roughness' of the noise (e.g. the root-mean-square residual) is about the same magnitude as the standard deviation, would this not show that the means are more robust than they would seem, given the large standard deviation?

In short, how do you argue for the robustness of the means in data with a large standard deviation?
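The effect I mean can be sketched numerically. Everything below is made-up data standing in for the attached plot (a smooth signal buried in large scatter), not the actual measurements:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the intensity-vs-time data described above:
# a smooth long-wavelength signal buried in large random scatter.
t = np.linspace(0.0, 10.0, 2000)
signal = np.sin(0.5 * t)
noise_sd = 2.0
z = signal + rng.normal(0.0, noise_sd, size=t.size)

# Bin over short intervals and take the mean of each bin.
n_bins = 50
edges = np.linspace(t.min(), t.max(), n_bins + 1)
idx = np.clip(np.digitize(t, edges) - 1, 0, n_bins - 1)
bin_t = 0.5 * (edges[:-1] + edges[1:])
bin_means = np.array([z[idx == k].mean() for k in range(n_bins)])

# The raw scatter is large, but the bin means hug the underlying signal:
# the standard error of each bin mean shrinks as 1/sqrt(points per bin).
raw_sd = z.std()
rms_resid = np.sqrt(np.mean((bin_means - np.sin(0.5 * bin_t)) ** 2))
print(f"raw standard deviation:    {raw_sd:.3f}")
print(f"RMS of bin-mean residuals: {rms_resid:.3f}")
```

With 40 points per bin, the RMS residual of the bin means comes out far below the raw standard deviation, which is the sense in which the means are "more robust than they would seem".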

What do you think?
 

Attachments

  • TEST.jpg (27 KB)
  • #2
Your questions about standard deviation seem to assume that all the "error" is in the luminosity measurement and that there is no error in the time measurement. Is that your assumption?
 
  • #3
Actually, the assumption is that the "error" (standard deviation) lies in a source of non-experimental random noise. I say non-experimental because the noise is real but could be linked to a second-order process. The measurements themselves (luminosity and time) can be treated as exact.

So we have a main process A which generates the large-scale variation, and then a small-scale process B which generates the "noise".

So if I want to constrain the variation due to process A, it is correct to fit to the means only and incorrect to just fit any curve within the standard deviation. But what kind of statistical ideas are involved here? How can you statistically argue that you must fit to the means?
 
Last edited:
  • #4
So basically, are you saying: given some assumption for an error residual, what is the effect of this residual on the measured distribution (with respect to the true population), and how should we deal with it when we want to estimate the population distribution (without the noise)?
 
  • #5
Hypatio said:
So if I want to constrain the variation due to process A, it is correct to fit to the means only and incorrect to just fit any curve within the standard deviation. But what kind of statistical ideas are involved here? How can you statistically argue that you must fit to the means?

The "best fit" is a subjective decision until you have an objective definition of "best". Some people define "best" by expressing faith in some statistic, like the (estimator of the) mean. Some people define a "loss function" or "merit function" and try to minimize the expected loss or maximize the expected merit. As I recall, using the mean value of a population to predict samples from the population minimizes a quadratic loss function. The mean value of a sample is one estimator of the mean value of the population.
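That quadratic-loss property is easy to check numerically. The sample below is hypothetical; the point is just that a brute-force scan for the constant minimizing mean squared error lands on the sample mean:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(5.0, 2.0, size=1000)  # a hypothetical sample

def quadratic_loss(c):
    # Mean squared error of predicting every observation with the constant c.
    return np.mean((x - c) ** 2)

# Scan candidate constants: the minimizer agrees with the sample mean
# to within the grid spacing, illustrating the quadratic-loss claim.
cs = np.linspace(x.min(), x.max(), 20001)
best_c = cs[np.argmin([quadratic_loss(c) for c in cs])]
print(best_c, x.mean())
```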

If you are working on something that you'll try to publish, then look in the journals that you're submitting it to and see what kind of statistical methods the editors of the journal have accepted.
 
  • #6
Chiro,

I think that sounds right. I want to say that the error residual, with respect to the sample mean (as an approximation of the true population mean), is superfluous to "process A". I suppose that this requires an estimate of the population distribution without the noise by determining the effect of the error residual on the measured distribution?
 
  • #7
You might need to invent some theory yourself, but what I would recommend is to look at the standard goodness-of-fit tests and allow some "slack" to take the random noise into account.

Without noise, something like a chi-square goodness-of-fit test would be the exact way to detect a disturbance. When you add noise, this uncertainty (and hence variance) becomes larger, since you are essentially adding more variance to the distribution.

One thing I definitely think you should do is to use some statistical theory to "average" over all possible distributions to get a "mean" distribution and then use that for your chi-square goodness of fit.

You could also use a Bayesian approach.

The above ideas should all provide some way of factoring in the random noise.
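A rough sketch of the chi-square idea, with entirely hypothetical binned data. The per-bin standard error (noise standard deviation over the square root of the points per bin) is what folds the noise variance into the test; a model inside the raw scatter can still be rejected against the bin means:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical bin means: a smooth underlying signal plus noise at the
# level of the per-bin standard error (assumed noise sd 2.0, 40 pts/bin).
t = np.linspace(0.0, 10.0, 20)
true_mean = np.sin(0.5 * t)
se = 2.0 / np.sqrt(40)  # standard error of each bin mean
bin_means = true_mean + rng.normal(0.0, se, size=t.size)

def reduced_chi2(model):
    # Chi-square per bin against the bin means; values near 1 indicate a
    # good fit, values much larger reject the model even though it may
    # still sit inside the raw (unbinned) standard deviation.
    return np.sum(((bin_means - model) / se) ** 2) / t.size

good = np.sin(0.5 * t)  # a model that tracks the bin means
bad = np.zeros_like(t)  # a flat model well within the raw scatter
print(f"reduced chi2, good model: {reduced_chi2(good):.2f}")
print(f"reduced chi2, bad model:  {reduced_chi2(bad):.2f}")
```

The flat model stays inside the large raw standard deviation everywhere, yet its reduced chi-square against the bin means is several times larger than the good model's, which is the statistical form of "you must fit to the means".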
 

FAQ: Confidence in the means despite high standard deviation?

1. What does a high standard deviation mean in terms of confidence in the means?

A high standard deviation indicates that there is a wide range of values in the data set. This can suggest that the data is not tightly clustered around the mean, and therefore, there may be more variability in the results.

2. How does a high standard deviation affect the accuracy of the mean?

A high standard deviation reduces the precision of the mean as an estimate: the standard error of the mean equals the standard deviation divided by the square root of the sample size, so more scatter means a less certain mean for the same amount of data. The mean is also sensitive to extreme values, and a high standard deviation often signals that extreme values are present, which can skew the result.

3. Can we still have confidence in the mean if the standard deviation is high?

Yes, it is possible to have confidence in the mean even if the standard deviation is high, especially with a large sample size, since the standard error of the mean shrinks as the square root of the number of observations. If the data is roughly normally distributed, with a majority of the values clustered around the mean, the mean can still be a reliable measure of central tendency.

4. How can we interpret the confidence interval when the standard deviation is high?

A confidence interval is a range of values that is likely to contain the true population mean. When the standard deviation is high, the confidence interval will also be wider, indicating that there is more uncertainty in the estimate of the mean. This means that the true population mean could potentially be further away from the estimated mean.

5. Is a high standard deviation always a bad thing for confidence in the means?

Not necessarily. A high standard deviation can indicate a high level of variability in the data, but it can also indicate that there is a diverse range of values present. In some cases, this can be important information and may not necessarily affect the confidence in the means. It is important to consider the context and the distribution of the data when interpreting the impact of a high standard deviation on confidence in the means.
