How to set up and interpret Chi^2 test results for my data?

  • Context: Graduate 
  • Thread starter: liquidFuzz
  • Tags: Chi square
SUMMARY

The discussion focuses on setting up and interpreting Chi-squared (χ²) test results for data derived from a nonlinear growth model. Participants clarify that observed values (y_i) are the actual data points, while expected values (e_i) are derived from the model, such as simple linear regression. A low χ² value does not always indicate a good fit; it must be evaluated against a ChiSq per degree of freedom (P) table to assess its significance. Additionally, the treatment of bins with expected values close to zero is discussed, emphasizing the importance of merging bins to avoid skewed results.

PREREQUISITES
  • Understanding of Chi-squared test and its formula: χ² = Σ((y_i - e_i)² / e_i)
  • Familiarity with least squares estimation and regression models
  • Knowledge of statistical significance and p-values
  • Experience with data binning and distribution testing
NEXT STEPS
  • Research the Chi-squared goodness of fit test and its applications
  • Learn about merging bins in statistical tests to handle zero expected values
  • Explore the implications of overestimating uncertainties in data analysis
  • Investigate geometric Brownian motion and ARIMA models for time series analysis
USEFUL FOR

Statisticians, data analysts, researchers working with growth models, and anyone involved in hypothesis testing and data distribution analysis.

liquidFuzz
I have a curve fit of a nonlinear function (a growth model). As a sanity check I want to do a chi2 test, but I'm not really sure how to set it up properly. My data consists of sample points and the corresponding estimated points from the fit. In a chi2 test the variables are usually referred to as observed and expected values. What do these translate to in a chi2 test of a least squares fit? In addition, if I get a really low chi2 value, is that always a good thing, i.e., is there nothing I should worry about close to the origin or the like?

Thanks!
 
The ##\chi^2## test statistic is defined as $$\sum_{i} \frac{\left(y_i-e_i\right)^2}{e_i},$$
where ##y_i## are the observed values and ##e_i## are the expected values (say, if you want to test whether your data come from a binomial distribution ##\text{Bin}\left(n,p\right)##, then ##e_i=n\cdot p##). After computing the test statistic you compare it to a table of critical values (according to the degrees of freedom you have); alternatively, you can compute the p-value associated with the test statistic and compare it to the significance level ##\alpha##. I hope I gave some kind of an answer to your question, as (tbh) I didn't understand it completely.
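
For anyone who wants to see the arithmetic, here is a minimal sketch of the statistic and the table comparison described above. The die-roll counts are made up for illustration, and the function name `chi_square_statistic` is hypothetical; the critical value 11.070 is the standard table entry for ##\alpha = 0.05## with 5 degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Pearson statistic: sum of (y_i - e_i)^2 / e_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((y - e) ** 2 / e for y, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die; a fair die expects 10 per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10.0] * 6

chi2 = chi_square_statistic(observed, expected)
dof = len(observed) - 1        # categories minus one; no parameters were fitted
CRITICAL_5PCT_DOF5 = 11.070    # table value for alpha = 0.05, dof = 5

reject = chi2 > CRITICAL_5PCT_DOF5
print(f"chi2 = {chi2:.3f}, dof = {dof}, reject at 5%: {reject}")
```

With these counts the statistic is well below the critical value, so the fair-die hypothesis would not be rejected at the 5% level.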
 
What are my observed values ##y_i## and expected values ##e_i## if the model was fitted with a least squares method?
 
The observed values are the data given, and the expected ones, if your model is, say, simple linear regression, are $$e_i=\hat{\beta}_0+\hat{\beta}_1x_i,$$
where ##\hat{\beta}_0## and ##\hat{\beta}_1## are the least squares estimators.
The same holds analogously for any other kind of model, e.g. multiple regression.
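
A small sketch of how the expected values come out of a simple linear fit, assuming made-up data and a hypothetical helper `fit_line` that implements the textbook closed-form least squares estimators. For a nonlinear growth model the idea would be the same: evaluate the fitted model at each ##x_i## to get ##e_i##.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Made-up sample: x_i are the sample points, y_i the observed values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

b0, b1 = fit_line(x, y)
expected = [b0 + b1 * xi for xi in x]                  # e_i from the fitted model
residuals = [yi - ei for yi, ei in zip(y, expected)]   # y_i - e_i

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

The residuals are the quantities that enter the numerator of the test statistic.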
 
Thanks!
 
I am only familiar with the Chi-squared goodness of fit test, which compares the histogram of the data with the expected theoretical frequency distribution. This seems to be a different test.
 
liquidFuzz said:
If I get a really low chi2 test value, is that always a good thing, i.e., there's nothing I should worry about in close proximity to origin or such?
Look at the ChiSq per degree of freedom (P) table to see how reasonable a low value can be. It gives the probability that your value of ChiSq will be exceeded for the given degrees of freedom (number of data points minus the number of fitted parameters). If that probability is large, then the ChiSq is probably too low. One thing that can account for this is overestimating the uncertainties of the data.
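
To illustrate the point about overestimated uncertainties, here is a rough sketch. It assumes the uncertainty-weighted form of the statistic, ##\sum_i (y_i - e_i)^2/\sigma_i^2##, which is the usual choice for curve fits with measurement errors but is not spelled out in this thread; the data, the ##\sigma## values and the function name `reduced_chi_square` are all made up.

```python
def reduced_chi_square(y, e, sigma, n_params):
    """Uncertainty-weighted chi-square divided by the degrees of freedom."""
    dof = len(y) - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    chi2 = sum(((yi - ei) / si) ** 2 for yi, ei, si in zip(y, e, sigma))
    return chi2 / dof, dof


# Made-up observations and model values from a two-parameter fit.
y = [2.1, 3.9, 6.2, 7.8, 10.0]
e = [2.06, 4.03, 6.00, 7.97, 9.94]

# Same residuals, two different assumed measurement uncertainties.
red_plausible, dof = reduced_chi_square(y, e, [0.2] * 5, n_params=2)
red_inflated, _ = reduced_chi_square(y, e, [1.0] * 5, n_params=2)

print(f"dof = {dof}")
print(f"sigma = 0.2 -> chi2/dof = {red_plausible:.3f}")
print(f"sigma = 1.0 -> chi2/dof = {red_inflated:.3f}")
```

With the larger assumed ##\sigma## the value per degree of freedom drops far below 1, which is the "too low" situation: the fit has not improved, only the claimed error bars have grown.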
 
A question regarding zero entries in expected values.

Let's say I want to test whether a set of data could be considered normally distributed. How do I treat bins where the expected value is close to zero? Use fewer bins, or just reject the hypothesis outright?

Edit: additionally, if I instead test against the cumulative distribution, can I use that as a test?
 
liquidFuzz said:
A question regarding zero entries in expected values.

Let's say I want to test whether a set of data could be considered normally distributed. How do I treat bins where the expected value is close to zero? Use fewer bins, or just reject the hypothesis outright?

Edit: additionally, if I instead test against the cumulative distribution, can I use that as a test?
If you include those bins in your test, does it change the results? You can combine some bins so that they add up to non-zero expected numbers. If your hypothesized distribution has many bins with expected values near zero and your sample has results in those bins, then the hypothesis might be rightfully rejected. It is not unusual for the extreme tails of an actual distribution to differ from a normal distribution. You will have to use your judgement, based on the situation, on what to do in that case.
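
One possible way to combine bins, as a sketch only: merge neighbours from left to right until each merged bin has an expected count of at least some threshold. The threshold of 5 is a common rule of thumb and is an assumption here, not something stated in this thread; the counts and the function name `merge_sparse_bins` are made up.

```python
def merge_sparse_bins(observed, expected, min_expected=5.0):
    """Merge adjacent bins left to right until each has expected >= min_expected."""
    merged_obs, merged_exp = [], []
    acc_obs, acc_exp = 0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs, acc_exp = 0, 0.0
    # Fold any leftover tail into the last completed bin (or keep it alone).
    if acc_obs > 0 or acc_exp > 0:
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


# Made-up histogram: sparse expected counts in both tails.
observed = [1, 3, 12, 30, 33, 15, 4, 2]
expected = [0.5, 4.5, 15.0, 30.0, 30.0, 15.0, 4.5, 0.5]

merged_obs, merged_exp = merge_sparse_bins(observed, expected)
print(merged_obs)
print(merged_exp)
```

Remember that the degrees of freedom should be counted from the merged bins, not the original ones.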
 
  • #10
Thanks! I'll play around with merged bins and see if I get something useful out of it.

I was hoping for a clear yes or no... 🤪
 
  • #11
If you have a growth model, linearize it by taking differences or log returns. If you don't do this, the data won't be stationary and most statistical tests won't make sense. Look at geometric Brownian motion or ARIMA models for examples.
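
As a quick illustration of the log-return transformation mentioned above, here is a sketch on a made-up series that grows by 10% per step; the helper name `log_returns` is hypothetical.

```python
import math


def log_returns(series):
    """Return log(x[t+1] / x[t]) for each pair of consecutive values."""
    return [math.log(b / a) for a, b in zip(series, series[1:])]


# Made-up exponential growth: each value is 1.1 times the previous one.
series = [100.0, 110.0, 121.0, 133.1]
returns = log_returns(series)

print([round(r, 4) for r in returns])
```

For pure exponential growth the log returns are constant, so the trend in the raw series is gone and what remains is closer to something a stationarity-based test can handle.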
 
  • #12
liquidFuzz said:
I was hoping for a clear yes or no...
You poor soul! :-p
I read about the Chi-squared test as part of null hypothesis testing. It's interesting.

##\displaystyle \sum_i \frac{(y_i - e_i)^2}{e_i}## is the crux of it. Gracias @mathguy_1995
 
