How to Quantify Statistical Significance of Model Deviation from Data Points?

  • Context: Graduate
  • Thread starter: Amanheis
  • Tags: Data, Theory

Discussion Overview

The discussion centers on quantifying the statistical significance of model deviations from a set of predicted data points, specifically in the context of a one-parameter model of a time-dependent quantity. Participants explore methods to assess how well alternative models fit the predicted data, which includes error bars, without performing a traditional fit.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant seeks to quantify how much models with parameters a1 and a2 deviate from predicted data points, which include error bars, and expresses difficulty in proceeding after calculating the likelihood L(a).
  • Another participant questions why the actual observed values cannot be calculated from the predicted values and error terms.
  • A subsequent participant clarifies that the error bars are indeed forecasted errors, prompting a request for further details on what these errors represent.
  • One participant provides an example involving historical temperature predictions to illustrate their understanding of the problem, noting that they want to constrain the model as much as possible based on expected error bars from a future experiment.
  • Another participant suggests a method for determining whether projections based on different parameters (a0 vs. a1) are statistically significantly different by resampling data points using a joint distribution implied by the error bars.

Areas of Agreement / Disagreement

Participants express differing levels of understanding regarding the nature of the predicted errors and the feasibility of calculating observed values. There is no consensus on the best method to quantify the statistical significance of model deviations, and multiple approaches are discussed.

Contextual Notes

The discussion involves assumptions about the nature of the predicted errors and the statistical methods applicable to the problem. Some participants may have different interpretations of the data and its implications for model fitting.

Who May Find This Useful

This discussion may be useful for researchers and practitioners in fields involving statistical modeling, particularly those dealing with predictions and uncertainties in experimental data.

Amanheis
I need to quantify the statistical significance of how much a model deviates from a given set of data points, and I cannot do a fit.

Let's say the model is a one-parameter description of some time-dependent quantity f_a(t). I have data points at n different times, including error bars, so p_i = {t_i, f_{a_0}(t_i), σ_i}. The reason I cannot do a fit is that the data are actually the predicted errors for some fiducial value a_0, and the fit would obviously just find a = a_0.

I want to know how far off a couple of models with either a = a_1 or a = a_2 are. In other words, how well does f_{a_1}(t) fit the data points p_i? Can I rule it out at some confidence level?


My problem is difficult to express; I hope it is clear enough. All I am asking for is a hint in the right direction, so if you know some relevant reference, I can just get it from the library. I already calculated the likelihood L(a) but don't really know how to proceed.

Thanks.
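A common way to proceed from the likelihood under Gaussian errors is a chi-squared goodness-of-fit test: with no parameters fitted, the deviation of a fixed competing model from the fiducial points follows a chi-squared distribution with n degrees of freedom. A minimal sketch, where the model f and all numbers are invented for illustration:

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical one-parameter model f_a(t); any functional form would do.
def f(a, t):
    return a * np.log(t)

# Fiducial "data": model values at a0 with forecast error bars (made up).
a0, a1 = 1.0, 1.2
t = np.array([1500.0, 1700.0, 1900.0])
y = f(a0, t)                         # predicted central values
sigma = np.array([0.1, 0.15, 0.2])   # forecast 1-sigma errors

# Chi-squared of the competing model a1 against the fiducial points.
# No parameters are fitted, so chi2 has n = len(t) degrees of freedom.
chisq = np.sum(((y - f(a1, t)) / sigma) ** 2)
p_value = chi2.sf(chisq, df=len(t))
print(chisq, p_value)
```

A small p-value here would let one say the a_1 model is ruled out at the corresponding confidence level, given the forecast error bars.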
 
If you have the predicted (projected) value and the error term for each t, why can't you calculate ("back out") the actual observed value corresponding to that t?
 
I don't understand. How am I supposed to do that? The actual observed value won't be available until several years from now. And "observed value" implies that it is observed, I don't see how I can calculate it.
 
So, the error bars you mentioned are forecast (future) errors? That wasn't clear to me.

Can you post your understanding of what these errors are, or what they represent?
 
Yes, the error bars are predicted. I am basically using the specifications of the experiment and the Fisher matrix formalism to assess the quality of the constraints that will be put on fiducial values of a set of parameters.
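For a single parameter, the Fisher-matrix forecast mentioned here reduces to a scalar, F = Σ_i (∂f/∂a)²/σ_i², with forecast constraint σ_a = F^(-1/2). A minimal sketch, with a hypothetical model and invented experiment specifications:

```python
import numpy as np

# Hypothetical model and experiment specs (all numbers made up).
def f(a, t):
    return a * np.log(t)

a0 = 1.0
t = np.array([1500.0, 1700.0, 1900.0])
sigma = np.array([0.1, 0.15, 0.2])   # per-point forecast errors

# One-parameter Fisher "matrix": F = sum_i (df/da)^2 / sigma_i^2.
eps = 1e-6
dfda = (f(a0 + eps, t) - f(a0 - eps, t)) / (2 * eps)  # numerical derivative
F = np.sum(dfda ** 2 / sigma ** 2)

# Forecast 1-sigma constraint on a around the fiducial value a0.
sigma_a = 1.0 / np.sqrt(F)
print(sigma_a)
```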
 
Let me make up an example: you are predicting the value of a random variable Y for, say, February 2012. You have a point estimate, Ŷ(Feb. 2012), and an error "bar" wrapped around the point estimate. Have I got it?
 
EnumaElish said:
Let me make up an example: you are predicting the value of a random variable Y for, say, February 2012. You have a point estimate, Ŷ(Feb. 2012), and an error "bar" wrapped around the point estimate. Have I got it?

Not quite. We already kind of know the value of Y(Feb. 2012). For the sake of clarity, let's choose a date in the past, say the average temperature at some place in the year 1500. I will also adopt my notation from the original post.
We already kind of know how the temperature behaves over time, depending on a set of parameters, one of them being a = a_0. According to earlier measurements, we expect something like f(1500) = 20 °C, f(1700) = 22 °C, and f(1900) = 25 °C.
But we want to constrain it as much as possible, and we plan an improved experiment. According to my calculations, we expect error bars of 0.1 °C, 0.15 °C, and 0.2 °C on these values after the new experiment. Now we want to see whether we will be able to test a competing theory (i.e., one in which the parameter a has a different value). I want to quantify how much the curve of the new theory with a = a_1 would deviate from the data with the predicted error bars.
 
I can describe how to tell whether the projections based on a0 are statistically significantly different from a projection based on a1. The error bars imply a joint distribution of temperatures across years. You can resample data points using this joint distribution and re-estimate f as a function of t based on each (re)sample. This will yield a set of values (an empirical distribution) for the a parameter, with mean = a0. You can then test whether a1 is statistically different from the mean, using the empirical distribution of a.
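The resampling procedure described above might look like this in Python, assuming independent Gaussian errors per point and a toy model that is linear in a, so the weighted least-squares re-estimate has a closed form (all names and numbers are illustrative, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model, fiducial value, and forecast error bars (made up).
def f(a, t):
    return a * np.log(t)

a0, a1 = 1.0, 1.2
t = np.array([1500.0, 1700.0, 1900.0])
sigma = np.array([0.1, 0.15, 0.2])
y0 = f(a0, t)  # fiducial predictions

# Resample data sets from the independent Gaussians implied by the error
# bars, and re-estimate a for each sample via weighted least squares
# (closed form because the model is linear in a).
w = 1.0 / sigma ** 2
x = np.log(t)
a_samples = []
for _ in range(10_000):
    y = rng.normal(y0, sigma)
    a_hat = np.sum(w * x * y) / np.sum(w * x ** 2)  # WLS estimate of a
    a_samples.append(a_hat)
a_samples = np.array(a_samples)

# Empirical p-value: how often a resampled fit lands at least as far
# from a0 as the competing value a1 does.
p = np.mean(np.abs(a_samples - a0) >= abs(a1 - a0))
print(a_samples.mean(), p)
```

The empirical distribution of `a_samples` plays the role of the distribution described in the post, centered on a_0; a small p would mean a_1 is distinguishable given the forecast errors.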
 
