About chi-squared and r-squared test for fitting data

In summary, the chi-squared statistic discussed in this thread is used to judge whether a model is a good fit for data when the standard deviation of each measurement is known. It is distinct from Pearson's chi-square statistic, which is applied to binned data. It does not require the variance to be constant across the data or the model to be linear in its parameters.
  • #1
chastiell
Hi all, I just want you to tell me whether my ideas are correct:

As far as I can see, the R^2 test is usually used with the OLS (ordinary least squares) method, where several conditions are imposed on the data (linearity in the coefficients, zero expectation value for the perturbations, and constant variance for the perturbations).

In experiments these conditions are often not met. After reading about this I found an old book in my files, Statistical Data Analysis by Glen Cowan, where least squares is derived from the maximum likelihood method of parameter estimation. This feels natural and general to me because the linearity and constant-variance restrictions do not appear: you have data with any known variance and any model, with no requirement of linearity in the coefficients, and then you only need to minimize the function:

##\chi^2=\sum_i {(ydata_i-f(parameters,xdata))^2\over yerror_i^2}##
(Sorry for the strange LaTeX code, but I don't know how to use equations in this editor.)
<<Moderator's note: simply use the proper tags. See https://www.physicsforums.com/help/latexhelp/>>

for the parameters; the chi-squared divided by the number of degrees of freedom is then used as a measure of goodness of fit. Any minimization method can be used (I prefer numerical methods).
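The procedure described above (numerically minimizing chi-squared for a model that need not be linear in its parameters, then quoting chi-squared per degree of freedom) might look like the following minimal Python sketch. Everything specific in it is an assumption made for illustration: the one-parameter model `decay`, the invented data and errors, and the golden-section search used as the minimizer. It uses the squared residual in the numerator, as in the usual least-squares definition.

```python
import math

def chi_squared(params, model, xdata, ydata, yerror):
    """Sum over points of ((y_i - f(params, x_i)) / yerror_i)^2."""
    return sum(((y - model(params, x)) / s) ** 2
               for x, y, s in zip(xdata, ydata, yerror))

def golden_section_minimize(func, lo, hi, tol=1e-8):
    """Minimize a one-argument function assumed unimodal on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d   # the minimum lies in [a, d]
        else:
            a = c   # the minimum lies in [c, b]
    return (a + b) / 2

def decay(params, x):
    """Hypothetical model, nonlinear in its single parameter k."""
    (k,) = params
    return math.exp(-k * x)

# Invented measurements, each with its own known standard deviation.
xdata = [0.0, 1.0, 2.0, 3.0, 4.0]
ydata = [1.02, 0.59, 0.38, 0.21, 0.15]
yerror = [0.05, 0.04, 0.03, 0.03, 0.02]

k_best = golden_section_minimize(
    lambda k: chi_squared((k,), decay, xdata, ydata, yerror), 0.0, 2.0)
chi2_min = chi_squared((k_best,), decay, xdata, ydata, yerror)
ndf = len(xdata) - 1            # data points minus fitted parameters
reduced_chi2 = chi2_min / ndf

print("k =", k_best, "chi2 =", chi2_min, "chi2/ndf =", reduced_chi2)
```

A reduced chi-squared near 1 is the usual sign that the residuals are about as large as the stated errors; the errors are not required to be equal from point to point.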

Without more information (I couldn't find any), I'm tempted to conclude (at least this is my hypothesis) that r-squared is a goodness-of-fit coefficient only when the OLS conditions hold (there are few ways to use it when the constant-variance condition is not met), and that chi-squared goodness of fit is used more generally (I mean without the constant-variance and linearity conditions). Do you agree with me? Why or why not? Thanks for your answers :)
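
The maximum-likelihood argument mentioned in this post can be sketched in two lines, assuming independent Gaussian measurement errors with known standard deviations. The symbols ##L## (likelihood), ##\theta## (the parameters), ##y_i##, ##x_i## and ##\sigma_i## (for ydata, xdata and yerror) are notation introduced here, not taken from the thread:

$$L(\theta)=\prod_i \frac{1}{\sqrt{2\pi\sigma_i^2}}\exp\left[-\frac{(y_i-f(\theta,x_i))^2}{2\sigma_i^2}\right]$$

$$\ln L(\theta)=-\frac{1}{2}\sum_i\frac{(y_i-f(\theta,x_i))^2}{\sigma_i^2}+\text{const}=-\frac{1}{2}\chi^2(\theta)+\text{const}$$

Under these assumptions, maximizing the likelihood is the same as minimizing ##\chi^2##, and neither step requires ##f## to be linear in ##\theta## or the ##\sigma_i## to be equal.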
 
  • #2
chastiell said:
After reading about this I found an old book in my files, Statistical Data Analysis by Glen Cowan, where least squares is derived from the maximum likelihood method of parameter estimation. This feels natural and general to me because the linearity and constant-variance restrictions do not appear: you have data with any known variance and any model, with no requirement of linearity in the coefficients, and then you only need to minimize the function:

##\chi^2=\sum_i {(ydata_i-f(parameters,xdata))^2\over yerror_i^2}##
I think these notes summarize what you are talking about: https://www.physics.ohio-state.edu/~gan/teaching/spring04/Chapter6.pdf

You need an "i" subscript on the "xdata": ##\chi^2=\sum_i {(ydata_i-f(parameters,xdata_i))^2\over yerror_i^2}##
chastiell said:
and that chi-squared goodness of fit is used more generally (I mean without the constant-variance and linearity conditions). Do you agree with me? Why or why not? Thanks for your answers :)

This is a use of chi-square that is distinct from "Pearson's chi-square". Unlike Pearson's, it does not use binned data. However, you need to know ##yerror_i##, the standard deviation of the error in each measurement. In the terminology of the notes at that link, you need the same information for R as for ##\chi^2##. Perhaps you are talking about a definition of R that differs from the one given there.

I agree that a maximum likelihood fit has a clearer intuitive meaning than a least squares fit. The one caution about maximum likelihood is that the most likely outcome in a model may not be very likely at all. Maximum likelihood fits are convincing when there is a large probability that something "about the same" as the most likely event will happen. If the maximum likelihood event is an isolated thin peak in a distribution, then that event or something "about the same" may have a very small probability of occurring.
 
  • #3
Hi Stephen, I'm a bit confused about your last paragraph. Can you elaborate on what you mean? A likelihood function is not a probability, so I get confused when you use "likelihood" and "large probability" in the same sentence.
 
  • #4
MarneMath said:
Hi Stephen, I'm a bit confused about your last paragraph. Can you elaborate on what you mean? A likelihood function is not a probability, so I get confused when you use "likelihood" and "large probability" in the same sentence.

An interval around the single value x = a that produces the maximum likelihood has a probability. Fitting by the criterion of maximum likelihood makes sense when the predicted distribution has an interval "near" x = a that has a large probability, using whatever definition of "near" applies to the specific practical problem.
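
A small numerical illustration of this point, as a sketch only: the density below is an invented mixture of a very narrow Gaussian spike at x = 0 and a broad Gaussian at x = 5, and "near" is arbitrarily taken to mean within 0.5 of a value. None of these numbers come from the thread.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def normal_cdf(x, mean, sd):
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# Hypothetical mixture: 5% of the probability in a thin spike, 95% in a broad bulk.
SPIKE_WEIGHT = 0.05
SPIKE_MEAN, SPIKE_SD = 0.0, 0.01
BULK_MEAN, BULK_SD = 5.0, 1.0

def density(x):
    return (SPIKE_WEIGHT * normal_pdf(x, SPIKE_MEAN, SPIKE_SD)
            + (1 - SPIKE_WEIGHT) * normal_pdf(x, BULK_MEAN, BULK_SD))

def mixture_cdf(x):
    return (SPIKE_WEIGHT * normal_cdf(x, SPIKE_MEAN, SPIKE_SD)
            + (1 - SPIKE_WEIGHT) * normal_cdf(x, BULK_MEAN, BULK_SD))

def interval_probability(center, half_width):
    """Probability of landing within half_width of center."""
    return mixture_cdf(center + half_width) - mixture_cdf(center - half_width)

# The spike is the highest point of the density ...
print("density at spike:", density(0.0), "density at bulk:", density(5.0))
# ... but little probability lies near it compared with the bulk.
print("P(near spike):", interval_probability(0.0, 0.5))
print("P(near bulk): ", interval_probability(5.0, 0.5))
```

With these invented numbers the most likely single value is x = 0, yet only about 5% of the probability lies within 0.5 of it, against roughly 36% within 0.5 of x = 5.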
 
  • #5
I think I get what you're saying now. I usually hear your point as a Bayesian criticism of the MLE, i.e. that the MLE fails to account for the volume of the parameter space that fits the data well.
 
  • #6
Hi all, thanks for your answers, and thanks for the link :)
 

FAQ: About chi-squared and r-squared test for fitting data

What is the chi-squared test and when is it used?

The chi-squared test (Pearson's) is a statistical test used to determine whether observed data differ significantly from expected data. It is typically used for categorical or binned data, such as survey responses or counts.

How is the chi-squared test calculated?

The chi-squared statistic is calculated by dividing each squared difference between an observed value and its expected value by that expected value, then summing the results. The statistic is then compared with a critical value from a chi-squared distribution to determine significance.
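
As a minimal sketch of that calculation in Python: the die-roll counts are invented, the function name `pearson_chi_squared` is hypothetical, and the 5% critical value for 5 degrees of freedom is the standard tabulated figure rounded to 11.07.

```python
def pearson_chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Invented example: 60 rolls of a die, 10 expected per face if the die is fair.
observed = [8, 12, 9, 11, 10, 10]
expected = [10.0] * 6

statistic = pearson_chi_squared(observed, expected)
dof = len(observed) - 1              # categories minus one
CRITICAL_5_PERCENT_DOF_5 = 11.07     # tabulated chi-squared critical value

print("chi-squared:", statistic, "dof:", dof)
print("reject fairness at 5%:", statistic > CRITICAL_5_PERCENT_DOF_5)
```

Here the statistic is close to 1.0, far below the critical value, so these counts give no reason to doubt the fair-die hypothesis.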

What is r-squared and how does it relate to chi-squared?

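R-squared is a measure of how well a regression model fits the data. It represents the proportion of the variation in the dependent variable that is explained by the independent variable(s). Pearson's chi-squared test applies to categorical data while r-squared applies to continuous data, so the two are not directly related. The chi-squared statistic used for least-squares fits in the thread above is a different quantity from Pearson's and does apply to continuous data.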
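
A minimal sketch of r-squared for an ordinary least-squares straight line, using the common definition 1 - SS_res / SS_tot. The data and the helper names `fit_line` and `r_squared` are invented for illustration, and other definitions of R exist.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Invented data lying close to a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print("slope:", slope, "intercept:", intercept, "R^2:", r2)
```

In this form every point is weighted equally, which corresponds to the constant-variance condition raised in the thread.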

What does a high or low chi-squared value indicate?

A high chi-squared value, relative to the number of degrees of freedom, indicates a larger difference between the observed and expected values, meaning there is likely a significant difference between the two. A low chi-squared value indicates a smaller difference and suggests that the observed and expected values are similar.

Can the chi-squared test be used for all types of data?

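No. Pearson's chi-squared test is appropriate only for categorical or binned data. For continuous data, other tests such as the t-test or ANOVA are used; the chi-squared goodness-of-fit statistic discussed in the thread above applies to continuous measurements with known uncertainties.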
