Solving Overdetermined Problems: χ² Distribution Requirements

  • Context: Graduate
  • Thread starter: Niles
  • Tags: Distribution

Discussion Overview

The discussion revolves around the conditions under which the chi-squared statistic follows a chi-squared distribution in the context of overdetermined problems, specifically addressing the implications of data distribution on degrees of freedom. Participants explore the relationship between normally distributed data and the degrees of freedom in chi-squared tests, as well as the implications for non-normally distributed data.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Conceptual clarification

Main Points Raised

  • One participant questions whether the number of degrees of freedom of the chi-squared statistic is always m-n, regardless of how the data are distributed, or whether the chi-squared result specifically requires normally distributed data, as stated in their book.
  • Another participant requests clarification on the types of parameters and data points being discussed, suggesting that the lack of clear definitions may be contributing to the confusion.
  • A participant expresses uncertainty about their own understanding, acknowledging that if the data is not normally distributed, it raises questions about how to perform goodness-of-fit estimates without using chi-squared.
  • Some participants reference the Central Limit Theorem as a justification for assuming normality in certain cases, suggesting that sums or averages of random variables may approximate normal distribution.
  • Links to external sources are provided by participants to support claims regarding the necessity of normal distribution for chi-squared tests and to elaborate on the conditions for degrees of freedom.

Areas of Agreement / Disagreement

Participants do not reach a consensus on whether the requirement for normality is essential for the chi-squared distribution or if degrees of freedom can be applied more generally. Multiple competing views remain regarding the implications of data distribution on statistical analysis.

Contextual Notes

Participants note the importance of clearly defining terms such as "parameters" and "data points," as well as the potential limitations of applying chi-squared tests to non-normally distributed data. The discussion highlights the need for further exploration of goodness-of-fit methods in such cases.

Niles
Hi

I'm not sure this is the right place to post, but I'll go ahead. In my book it says that if I am dealing with an overdetermined problem with m data points and n parameters (so m>n), then my observed chi-square [itex]\chi^2_{\text{obs}}[/itex] follows a [itex]\chi^2[/itex] distribution with m-n degrees of freedom if the data points are normally distributed.

I thought that the number of degrees of freedom was always m-n, regardless of what distribution my data follows. Am I right or is it correct what the book is stating?
 
Niles said:
Am I right or is it correct what the book is stating?

I think no one has answered this because you haven't given a clear statement of what the book said. For example, what kind of parameters is the book talking about? Means? Covariances? Any old parameter? What kind of data are the "data points"?

Do you have a source or link that supports your own opinion that the random variables need not be normally distributed?
 
Niles said:
Hi

I'm not sure this is the right place to post, but I'll go ahead. In my book it says that if I am dealing with an overdetermined problem with m data points and n parameters (so m>n), then my observed chi-square [itex]\chi^2_{\text{obs}}[/itex] follows a [itex]\chi^2[/itex] distribution with m-n degrees of freedom if the data points are normally distributed.

I thought that the number of degrees of freedom was always m-n, regardless of what distribution my data follows. Am I right or is it correct what the book is stating?

The chi-squared distribution has an essential parameter called the number of degrees of freedom. So, the bolded and red text in your quote is all part of the name.
 
By "parameters" I mean parameters used to make a fit to the data. And data points are physically measured data, which is why I believe the book is so keen on always dealing with normally distributed data (cf. Central Limit Theorem).

I have no source for my statement. In fact I believe I might be wrong. But I still think it is an interesting question: If I am dealing with data that isn't Gaussian, then how would I go about making a goodness-of-fit estimate, considering I can't use [itex]\chi^2[/itex]?

Thanks.
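
One possible approach to the question above, offered only as a sketch: if the noise distribution is assumed known, the chi-squared statistic can still be used, but its reference distribution is obtained by simulation (a parametric bootstrap) instead of from chi-squared tables. Everything named below is hypothetical and not from the thread: a straight-line model, a centred exponential noise model, and the helpers fit_line, chi2_stat and monte_carlo_p_value.

```python
import random

def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    m = len(x)
    xbar = sum(x) / m
    ybar = sum(y) / m
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def chi2_stat(x, y, a, b, sigma):
    """Sum of squared residuals in units of the (assumed known) sigma."""
    return sum(((yi - a - b * xi) / sigma) ** 2 for xi, yi in zip(x, y))

def centered_exponential(rng, sigma):
    """Assumed noise model: skewed, mean 0, standard deviation sigma."""
    return sigma * (rng.expovariate(1.0) - 1.0)

def monte_carlo_p_value(x, y, sigma, noise, n_sim=2000, seed=0):
    """Fraction of simulated data sets whose statistic is >= the observed one."""
    rng = random.Random(seed)
    a, b = fit_line(x, y)
    observed = chi2_stat(x, y, a, b, sigma)
    exceed = 0
    for _ in range(n_sim):
        # Simulate from the fitted model, then refit exactly as for real data.
        y_sim = [a + b * xi + noise(rng, sigma) for xi in x]
        a_s, b_s = fit_line(x, y_sim)
        if chi2_stat(x, y_sim, a_s, b_s, sigma) >= observed:
            exceed += 1
    # The +1 terms keep the estimated p-value strictly positive.
    return (exceed + 1) / (n_sim + 1)

x = [float(i) for i in range(10)]
y_curved = [0.5 * xi ** 2 for xi in x]  # data a straight line cannot describe
p_curved = monte_carlo_p_value(x, y_curved, 1.0, centered_exponential)
print("p-value for curved data:", p_curved)
```

A small p-value means the observed misfit is larger than almost anything the assumed noise model produces around the fitted line. The result is only as trustworthy as the assumed noise model.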
 
Niles said:
By "parameters" I mean parameters used to make a fit to the data.

And by parameters I meant coefficients that characterize the probability density function, just like the expectation a and the standard deviation [itex]\sigma[/itex] in the normal distribution [itex]\mathcal{N}(a, \sigma)[/itex], or the endpoints a and b in the uniform distribution [itex]\mathcal{U}(a, b)[/itex] or the parameter [itex]\lambda[/itex] in the Poisson distribution [itex]\mathcal{P}(\lambda)[/itex].
 
The "if the data points are normally distributed." part may be invoked by using the Central Limit Theorem. If the data points are sums or averages of many RVs then one may assume it is "close to" normally distributed and thus the statistic is "close to" chi-squared.
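
The "close to" claim can be checked numerically. The sketch below rests on assumptions that are not from the thread: a straight-line fit (n = 2) to m = 10 points, and noise built as the standardized average of k exponential random variables, so that larger k means more nearly normal noise. The helper names are hypothetical. For a chi-squared distribution with 8 degrees of freedom the mean is 8 and the variance is 16.

```python
import random
import statistics

def chi2_of_line_fit(x, y, sigma):
    """Least-squares fit of y = a + b*x, then sum((residual / sigma)^2)."""
    m = len(x)
    xbar = sum(x) / m
    ybar = sum(y) / m
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return sum(((yi - a - b * xi) / sigma) ** 2 for xi, yi in zip(x, y))

def averaged_exponential_noise(rng, k):
    """Mean of k Exp(1) draws, shifted and scaled to mean 0 and variance 1."""
    mean_k = sum(rng.expovariate(1.0) for _ in range(k)) / k
    return (mean_k - 1.0) * k ** 0.5

def simulate(k, m=10, trials=4000, seed=0):
    """Return (mean, variance) of the chi-squared statistic over many fits."""
    rng = random.Random(seed)
    x = [float(i) for i in range(m)]
    stats = []
    for _ in range(trials):
        y = [1.0 + 2.0 * xi + averaged_exponential_noise(rng, k) for xi in x]
        stats.append(chi2_of_line_fit(x, y, sigma=1.0))
    return statistics.fmean(stats), statistics.variance(stats)

results = {k: simulate(k) for k in (1, 5, 50)}
for k, (mean, var) in results.items():
    print(f"k={k:3d}  mean={mean:6.2f}  variance={var:6.2f}")
```

The mean should sit near m-n = 8 for every k, since that only requires uncorrelated errors with the stated variance. The variance should be well above 16 for k = 1 and approach 16 as k grows, which is the sense in which the statistic becomes "close to" chi-squared.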

(BTW: one should say "regardless of" or "irrespective of" or even "irregarding" but not "irregardless".)
 
Cochran's theorem (http://en.wikipedia.org/wiki/Cochran%27s_theorem) gives the precise conditions under which the distribution is chi-squared and what the number of degrees of freedom is.
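
For a fit that is linear in its parameters, the role of normality can be made explicit. This is a sketch under assumptions not stated in the thread: the model is [itex]y = A\theta + \varepsilon[/itex], with [itex]A[/itex] an [itex]m \times n[/itex] matrix of full column rank, and the errors are uncorrelated with mean zero and known common variance [itex]\sigma^2[/itex]. The symbols [itex]A[/itex], [itex]\theta[/itex], [itex]\varepsilon[/itex] and [itex]H[/itex] are introduced here for illustration.

[tex]H = A (A^T A)^{-1} A^T, \qquad r = y - A\hat{\theta} = (I - H)\,\varepsilon[/tex]

[tex]\chi^2_{\text{obs}} = \frac{r^T r}{\sigma^2} = \frac{\varepsilon^T (I - H)\,\varepsilon}{\sigma^2}[/tex]

[tex]E\left[\chi^2_{\text{obs}}\right] = \operatorname{tr}(I - H) = m - n[/tex]

The expectation holds for any error distribution satisfying the assumptions above. Since [itex]I - H[/itex] is symmetric and idempotent with rank m-n, Cochran's theorem gives [itex]\chi^2_{\text{obs}} \sim \chi^2_{m-n}[/itex] when the errors are additionally independent and normal. Without normality the statistic need not follow a chi-squared distribution, even though its mean is still m-n.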
 