How to set up and interpret Chi^2 test results for my data?

  • Context: Graduate
  • Thread starter: liquidFuzz
  • Tags: Chi square

Discussion Overview

The discussion centers around the setup and interpretation of Chi-squared (χ²) test results in the context of a nonlinear growth model. Participants explore how to properly define observed and expected values in a Chi-squared test, particularly in relation to least squares fitting, and address concerns regarding low Chi-squared values and their implications.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant seeks clarification on how to set up a Chi-squared test for their nonlinear growth model, questioning the definitions of observed and expected values in this context.
  • Another participant provides the formula for the Chi-squared test function and suggests comparing the test value to a table based on degrees of freedom or extracting a p-value for significance testing.
  • A participant asks for clarification on what constitutes observed and expected values when using the least squares method.
  • It is suggested that observed values are the actual data points, while expected values can be derived from the fitted model, such as a linear regression equation.
  • Concerns are raised about the interpretation of low Chi-squared values, with one participant noting that a low value may not always be favorable and suggesting the use of Chi-squared per degree of freedom for better assessment.
  • Questions arise regarding how to handle expected values that are close to zero in the context of testing for normal distribution, with suggestions to either combine bins or reject the hypothesis based on the situation.
  • Another participant mentions the importance of linearizing growth models to ensure data stationarity before applying statistical tests.

Areas of Agreement / Disagreement

Participants express differing views on the implications of low Chi-squared values and how to handle expected values near zero. The discussion remains unresolved regarding the best approach to these issues, with multiple perspectives presented.

Contextual Notes

Some participants highlight the need for careful consideration of data characteristics, such as stationarity and the treatment of bins with low expected values, which may affect the validity of the Chi-squared test results.

liquidFuzz
I have a curve fit of a nonlinear function (a growth model). As a sanity check I do a chi2 test, but I'm not really sure how to set it up properly. My data consist of sample points and estimated (fitted) points. In a chi2 test the variables are usually referred to as observed and expected values. What do these correspond to in a chi2 test of a least-squares fit? In addition, if I get a really low chi2 test value, is that always a good thing, i.e., is there nothing I should worry about when the value is very close to zero?

Thanks!
 
The ##\chi^2## test statistic is defined as $$\sum_{i} \frac{\left(y_i-e_i\right)^2}{e_i},$$
where ##y_i## are the observed values and ##e_i## the expected values (say, if you want to test whether your data come from a binomial distribution, ##\text{Bin}\left(n,p\right)##, then the expected number of successes is ##e_i=n\cdot p##). After computing the test statistic, you compare it with a table of critical values for your number of degrees of freedom; alternatively, you can compute the p-value corresponding to the test statistic and compare it with the significance level ##\alpha##. I hope I gave some kind of answer to your question, as (tbh) I didn't understand it completely.
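As a rough sketch of the procedure just described, the Python below computes the Pearson statistic for binned counts and converts it into a p-value. The die-roll counts are made up for illustration, and `chi2_sf` is a hypothetical stdlib-only helper for the upper-tail probability of the ##\chi^2## distribution; if SciPy is available, `scipy.stats.chi2.sf(x, df)` computes the same quantity.

```python
import math


def chi2_sf(x: float, k: int) -> float:
    """P(X > x) for a chi-squared variable with integer k degrees of freedom.

    Hypothetical helper: uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1),
    starting from Q(1, y) = exp(-y) (even k) or Q(1/2, y) = erfc(sqrt(y)) (odd k), y = x / 2.
    """
    y = x / 2.0
    a, q = (1.0, math.exp(-y)) if k % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < k / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def pearson_chi2(observed, expected):
    """Sum over bins of (y_i - e_i)^2 / e_i."""
    return sum((y - e) ** 2 / e for y, e in zip(observed, expected))


# Hypothetical example: 120 rolls of a die, testing the hypothesis that it is fair.
observed = [25, 17, 15, 23, 24, 16]
expected = [120 / 6] * 6            # e_i = 20 for every face

stat = pearson_chi2(observed, expected)
dof = len(observed) - 1             # bins minus 1 (no parameters estimated from the data)
p_value = chi2_sf(stat, dof)
alpha = 0.05
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.3f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

Comparing `p_value` with `alpha` is equivalent to comparing `stat` with the tabulated critical value for the same degrees of freedom.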
 
What are my observed values ##y_i## and expected values ##e_i## if I calculated the model with a least-squares method?
 
The observed values are the given data, and the expected ones, if your model is, say, simple linear regression, are $$e_i=\hat{\beta}_0+\hat{\beta}_1x_i,$$
where ##\hat{\beta}_0## and ##\hat{\beta}_1## are the least-squares estimators.
The same applies to any other kind of model, e.g. multiple regression: the expected values are the fitted values of the model.
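A minimal sketch of where the observed and expected values come from in a least-squares fit, assuming a simple linear model; the data points are invented for illustration and `least_squares_line` is a hypothetical helper. The list `y` holds the observed values ##y_i##, and the fitted values `e` are the expected values ##e_i=\hat{\beta}_0+\hat{\beta}_1x_i##.

```python
def least_squares_line(x, y):
    """Closed-form least-squares estimates (beta0_hat, beta1_hat) for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Hypothetical data: observed values y_i at sample points x_i.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_line(x, y)
e = [b0 + b1 * xi for xi in x]                   # expected (fitted) values e_i
residuals = [yi - ei for yi, ei in zip(y, e)]    # y_i - e_i, the terms entering the test
print(f"beta0_hat = {b0:.3f}, beta1_hat = {b1:.3f}")
for xi, yi, ei in zip(x, y, e):
    print(f"x = {xi}: observed {yi}, expected {ei:.3f}")
```

For a nonlinear growth model the idea is the same: the expected values are the model evaluated at each ##x_i## with the fitted parameters.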
 
Thanks!
 
I am only familiar with the chi-squared goodness-of-fit test, which compares the histogram of the data with the expected theoretical frequency distribution. This seems to be a different test.
 
liquidFuzz said:
If I get a really low chi2 test value, is that always a good thing, i.e., there's nothing I should worry about in close proximity to origin or such?
Look at a table of ChiSq per degree of freedom (with its probabilities P) to see how plausible a low value is. For your number of degrees of freedom (the number of data points minus the number of fitted parameters), it gives the probability that your value of ChiSq per degree of freedom would be exceeded. If that probability is large (close to 1), your ChiSq is probably too low. One thing that can account for this is overestimating the uncertainties of the data.
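A sketch of this per-degree-of-freedom check, assuming each data point has a known standard uncertainty ##\sigma_i##, in which case the statistic for a fit is commonly written as ##\sum_i (y_i-e_i)^2/\sigma_i^2## rather than with ##e_i## in the denominator. The measurements, fitted values, the two ##\sigma## guesses and the `chi2_sf` / `fit_chi2` helpers are all hypothetical; `chi2_sf` is a stdlib-only stand-in for `scipy.stats.chi2.sf`.

```python
import math


def chi2_sf(x: float, k: int) -> float:
    """P(X > x) for chi-squared with integer k dof, i.e. Q(k/2, x/2) (hypothetical helper)."""
    y = x / 2.0
    a, q = (1.0, math.exp(-y)) if k % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < k / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def fit_chi2(observed, fitted, sigma):
    """Uncertainty-weighted chi-squared: sum of ((y_i - e_i) / sigma_i)^2."""
    return sum(((y - e) / s) ** 2 for y, e, s in zip(observed, fitted, sigma))


# Hypothetical measurements and fitted values from a two-parameter model.
observed = [2.10, 3.90, 6.20, 7.80, 10.10]
fitted = [2.04, 4.03, 6.02, 8.01, 10.00]
n_params = 2
dof = len(observed) - n_params              # data points minus fitted parameters

for s in (0.50, 0.15):                      # two guesses for the per-point uncertainty
    chi2 = fit_chi2(observed, fitted, [s] * len(observed))
    p_exceed = chi2_sf(chi2, dof)           # chance a correct model gives a larger chi2
    print(f"sigma = {s}: chi2 = {chi2:.3f}, chi2/dof = {chi2 / dof:.3f}, P(>chi2) = {p_exceed:.3f}")
```

In this made-up example, ##\sigma = 0.50## gives a probability of exceedance above 0.9, so the fit looks "too good", which is the situation attributed above to overestimated uncertainties; with ##\sigma = 0.15## the probability is about 0.19, which is unremarkable.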
 
A question regarding zero entries in expected values.

Let's say I want to test whether a set of data could be considered normally distributed. How do I treat bins where the expected value is close to zero? Use fewer bins, or just outright reject the hypothesis?

Edit, in addition: if I instead test against the cumulative distribution, can I use that as a test?
 
If you include those bins in your test, does it change the results? You can combine adjacent bins so that their expected numbers add up to something non-zero. If your hypothesized distribution has many bins with an expected count of zero and your sample has results in those bins, then the hypothesis might rightfully be rejected. It is not unusual for the extreme tails of an actual distribution to differ from a normal distribution. You will have to use your judgement, based on the situation, to decide what to do in that case.
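One way to act on the "combine some bins" suggestion, sketched under assumptions: expected counts under a normal hypothesis are computed from the CDF via the standard library's `statistics.NormalDist`, and adjacent bins are merged until every merged bin reaches a minimum expected count. The threshold of 5 is a commonly quoted rule of thumb, not something stated in this thread, and the binned sample, cut points and the `expected_counts` / `merge_bins` helpers are hypothetical.

```python
from statistics import NormalDist


def expected_counts(cuts, n, dist):
    """Expected counts of n observations in the bins defined by the interior cut points.
    The outermost bins are open-ended, so the bin probabilities sum to 1."""
    cdf = [0.0] + [dist.cdf(c) for c in cuts] + [1.0]
    return [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


def merge_bins(observed, expected, min_expected=5.0):
    """Merge adjacent bins left to right until each merged bin has an expected count of at
    least min_expected; a leftover group that is still too small joins the previous bin."""
    obs_out, exp_out = [], []
    o_acc = e_acc = 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= min_expected:
            obs_out.append(o_acc)
            exp_out.append(e_acc)
            o_acc = e_acc = 0.0
    if o_acc or e_acc:                     # tail bins that never reached the threshold
        if exp_out:
            obs_out[-1] += o_acc
            exp_out[-1] += e_acc
        else:
            obs_out.append(o_acc)
            exp_out.append(e_acc)
    return obs_out, exp_out


# Hypothetical sample of 100 values, already binned, tested against N(0, 1).
cuts = [-2.5, -2.0, -1.0, 0.0, 1.0, 2.0, 2.5]      # 8 bins, the outer ones nearly empty
observed = [1, 2, 14, 33, 35, 12, 2, 1]
expected = expected_counts(cuts, sum(observed), NormalDist(mu=0.0, sigma=1.0))

obs_m, exp_m = merge_bins(observed, expected)
stat = sum((o - e) ** 2 / e for o, e in zip(obs_m, exp_m))
dof = len(obs_m) - 1       # subtract one more per parameter estimated from the data
print("merged observed:", obs_m)
print("merged expected:", [round(e, 2) for e in exp_m])
print(f"chi2 = {stat:.3f} with {dof} degrees of freedom")
```

Here the tiny tail bins are absorbed into their neighbours, leaving four bins. Whether to merge, and with what threshold, remains the judgement call described in the reply.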
 
Thanks! I'll play around with merged bins and see if I get something useful out of it.

I was hoping for a clear yes or no... 🤪
 
If you have a growth model, linearize it by taking differences or log returns. If you don't do this, the data won't be stationary and most statistical tests won't make sense. Look at geometric Brownian motion or ARIMA models for examples.
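A minimal illustration of the differencing idea, assuming a hypothetical series that grows roughly exponentially with multiplicative noise (similar in spirit to a discretely sampled geometric Brownian motion). The raw levels trend strongly upward, while the log returns ##\log y_t - \log y_{t-1}## scatter around a constant growth rate. The series is simulated, not data from the thread.

```python
import math
import random

random.seed(1)
rate = 0.05                                   # hypothetical growth rate per step
y = [100.0]
for _ in range(200):
    # multiplicative noise: each step grows by about 5 % with random scatter
    y.append(y[-1] * math.exp(rate + random.gauss(0.0, 0.02)))

# Log returns: log(y_t) - log(y_{t-1}); these no longer trend with t.
log_returns = [math.log(b) - math.log(a) for a, b in zip(y, y[1:])]
mean_lr = sum(log_returns) / len(log_returns)
print(f"first/last level: {y[0]:.1f} / {y[-1]:.1f}")
print(f"mean log return: {mean_lr:.4f} (true rate {rate})")
```

Tests or fits would then be applied to `log_returns` rather than to the raw levels `y`.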
 
liquidFuzz said:
I was hoping for a clear yes or no...
You poor soul! :-p
I read about the chi-squared test as part of null hypothesis testing. It's interesting.

##\displaystyle \sum_i \frac{(y_i - e_i)^2}{e_i}## is the crux of it. Gracias @mathguy_1995
 