How to set up and interpret Chi^2 test results for my data?

  • Thread starter: liquidFuzz
  • Tags: Chi square
AI Thread Summary
To properly set up a Chi-squared test for a nonlinear growth model, observed values correspond to the actual data points, while expected values are derived from the model's predictions. A low Chi-squared value can indicate a good fit, but it should be assessed against the Chi-squared per degree of freedom to ensure it's reasonable; overestimated uncertainties can lead to misleadingly low values. When dealing with expected values close to zero, it may be necessary to combine bins to avoid issues with the test's validity. Linearizing the growth model through differences or log returns is recommended to ensure data stationarity for statistical tests. Overall, careful consideration of the model and data characteristics is crucial for accurate interpretation of Chi-squared results.
liquidFuzz
I have a curve fit of a nonlinear function (a growth model). As a sanity check I run a chi2 test, but I'm not really sure how to set it up properly. My data consist of sample points and the corresponding estimated points. In a chi2 test the variables are usually called observed and expected. What do these correspond to in a chi2 test of a least-squares fit? In addition, if I get a really low chi2 value, is that always a good thing, i.e., is there nothing I should worry about close to the origin or such?

Thanks!
 
The ##\chi^2## test statistic is defined as $$\sum_{i} \frac{\left(y_i-e_i\right)^2}{e_i},$$
where ##y_i## are the observed values and ##e_i## are the expected values (say, if you want to test whether your data come from a binomial distribution, ##\text{Bin}\left(n,p\right)##, then ##e_i=n\cdot p##). After computing the test statistic you compare it to a table of critical values (according to the degrees of freedom you have); alternatively, you can compute the p-value corresponding to the test statistic and compare it to the significance level ##\alpha##. I hope this answers your question in some way, as (tbh) I didn't understand it completely.
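
A minimal sketch of the statistic above, assuming made-up counts in three categories. The numbers and the function name `pearson_chi2` are illustrative and not from the thread. With two degrees of freedom the chi-squared survival function has the closed form exp(-x/2), so this particular example needs no table.

```python
import math

def pearson_chi2(observed, expected):
    """Return sum((y_i - e_i)^2 / e_i) over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((y - e) ** 2 / e for y, e in zip(observed, expected))

# Hypothetical counts: 60 draws, expected to be uniform over 3 categories.
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]

stat = pearson_chi2(observed, expected)

# 3 categories and no fitted parameters give 3 - 1 = 2 degrees of freedom.
# For exactly 2 degrees of freedom the survival function is exp(-x / 2).
p_value = math.exp(-stat / 2)

alpha = 0.05  # assumed significance level
reject = p_value < alpha
print(f"chi2 = {stat:.3f}, p = {p_value:.3f}, reject H0: {reject}")
```

For other degrees of freedom the p-value has to come from a table or a statistics library instead of the closed form used here.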
 
What are my observed values ##y_i## and expected values ##e_i## if the model was fitted with a least-squares method?
 
The observed values are the given data, and the expected ones, if your model is, say, simple linear regression, are $$e_i=\hat{\beta}_0+\hat{\beta}_1x_i,$$
where ##\hat{\beta}_0## and ##\hat{\beta}_1## are the least-squares estimators.
The same holds analogously for any other kind of model, e.g. multiple regression.
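
To make the simple linear case concrete, here is a small sketch that computes the least-squares estimators and the resulting expected values. The data points and the helper name `fit_line` are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = beta0 + beta1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1

# Hypothetical observations y_i at positions x_i.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

beta0, beta1 = fit_line(xs, ys)

# Expected values e_i are the model predictions at the same x_i.
expected = [beta0 + beta1 * x for x in xs]
residuals = [y - e for y, e in zip(ys, expected)]
print(beta0, beta1, residuals)
```

For a nonlinear growth model the idea is the same: the fitted curve evaluated at each sample point plays the role of ##e_i##.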
 
Thanks!
 
I am only familiar with the Chi-squared goodness of fit test, which compares the histogram of data with the expected theoretical frequency distribution. This seems to be a different test.
 
liquidFuzz said:
If I get a really low chi2 test value, is that always a good thing, i.e., there's nothing I should worry about in close proximity to origin or such?
Look at a table of ChiSq per degree of freedom versus probability (P) to see how reasonable a low value can be. It gives the probability that your value of ChiSq will be exceeded for the given degrees of freedom (number of data points minus the number of fitted parameters). If that probability is very large, then the ChiSq is probably too low. One thing that can account for this is overestimating the uncertainties of the data.
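
For a curve fit with a measurement uncertainty on each point, a commonly used form of the statistic divides each squared residual by the squared uncertainty rather than by the expected value. The sketch below assumes that form; the data, the fitted values, the uncertainties, and the name `reduced_chi2` are all hypothetical. It shows how inflating the uncertainties drives the ChiSq per degree of freedom far below 1.

```python
def reduced_chi2(ys, model, sigmas, n_params):
    """Chi-squared per degree of freedom, weighted by uncertainties."""
    dof = len(ys) - n_params  # data points minus fitted parameters
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    chi2 = sum(((y - m) / s) ** 2 for y, m, s in zip(ys, model, sigmas))
    return chi2 / dof

# Hypothetical data, fitted values from a 2-parameter model, and errors.
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
model = [1.06, 3.03, 5.00, 6.97, 8.94]
sigmas = [0.15] * len(ys)

honest = reduced_chi2(ys, model, sigmas, n_params=2)

# Overestimating every uncertainty by a factor of 4 shrinks the
# statistic by a factor of 16, which makes the fit look "too good".
inflated = reduced_chi2(ys, model, [4 * s for s in sigmas], n_params=2)
print(honest, inflated)
```

A value near 1 is what one would expect when the model is adequate and the uncertainties are realistic; a value much smaller than 1 is a hint to recheck the error estimates.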
 
A question regarding zero entries in expected values.

Let's say I want to test whether a set of data could be considered normally distributed. How do I treat bins where the expected value is close to zero? Use fewer bins, or just outright reject the hypothesis?

Edit: additionally, if I instead test against the cumulative distribution, can I use that as a test?
 
liquidFuzz said:
A question regarding zero entries in expected values.

Let's say I want to test whether a set of data could be considered normally distributed. How do I treat bins where the expected value is close to zero? Use fewer bins, or just outright reject the hypothesis?

Edit: additionally, if I instead test against the cumulative distribution, can I use that as a test?
If you include those bins in your test, does it change the results? You can combine some bins so that they add up to non-zero expected numbers. If your hypothesized distribution has many bins with an expected count of zero and your sample has results in those bins, then the hypothesis might be rightfully rejected. It is not unusual for the extreme tails of an actual distribution to differ from a normal distribution. You will have to use your judgement, based on the situation, on what to do in that case.
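
One possible way to combine bins is sketched below: adjacent bins are pooled from left to right until the pooled expected count reaches a threshold. The threshold `min_expected=5.0` is a commonly cited rule of thumb and an assumption here, not something stated in the thread; the counts and the name `merge_sparse_bins` are also illustrative.

```python
def merge_sparse_bins(observed, expected, min_expected=5.0):
    """Pool adjacent bins until each pooled expected count is large enough."""
    merged_obs, merged_exp = [], []
    acc_obs, acc_exp = 0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs, acc_exp = 0, 0.0
    # Leftover sparse bins at the right edge join the last pooled bin.
    if acc_obs > 0 or acc_exp > 0:
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp

# Hypothetical histogram with thin tails.
observed = [1, 2, 9, 14, 10, 3, 1]
expected = [0.5, 3.0, 9.5, 14.0, 9.5, 3.0, 0.5]

obs_m, exp_m = merge_sparse_bins(observed, expected)
print(obs_m, exp_m)
```

The degrees of freedom should then be counted from the number of bins after merging, not from the original binning.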
 
  • #10
Thanks! I'll play around with merged bins and see if I get something useful out of it.

I was hoping for a clear yes or no... 🤪
 
  • #11
If you have a growth model, linearize it by taking differences or log returns. If you don’t do this, the data won’t be stationary and most statistical tests won’t make sense. Look at geometric Brownian motion or ARIMA models for examples.
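
As a rough illustration of the two transformations, assuming a made-up series that grows by exactly 10% per step (the helper names `differences` and `log_returns` are hypothetical):

```python
import math

def differences(series):
    """First differences x[t+1] - x[t]."""
    return [b - a for a, b in zip(series, series[1:])]

def log_returns(series):
    """Log returns log(x[t+1] / x[t]); requires strictly positive values."""
    return [math.log(b / a) for a, b in zip(series, series[1:])]

# Hypothetical exponential growth: 10% per step, starting at 100.
series = [100.0 * 1.1 ** t for t in range(6)]

diffs = differences(series)
rets = log_returns(series)
print(diffs)
print(rets)
```

For purely exponential growth like this, the log returns are constant while the raw differences keep growing, which is why log returns are the more natural choice for such a model.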
 
  • #12
liquidFuzz said:
I was hoping for a clear yes or no...
You poor soul! :-p
I read about the Chi-squared test in the context of null hypothesis testing. It's interesting.

##\displaystyle \sum_i \frac{(y_i - e_i)^2}{e_i}## is the crux of it. Gracias @mathguy_1995
 
