Chi-squared test for normality

Joon · Mar 16, 2019

Homework Statement

Hello, I was given 2 sets of data, containing 20 and 35 temperature values respectively. The data sets look like this:

Data 1         Data 2
Temperature    Temperature
30.9           28.5
30.6           30.4
..             ..
(20 values)    (35 values)

I have done the Chi-squared test for these two sets of data. However, I am also required to analyse a third case, for which no data is provided. All that is given is that the third dataset consists of 66 data points (and, within the limits of any random effects, matches the characteristics of datasets 1 and 2). I need to suggest the number of Chi-squared groups for the third dataset.

I want to ask what the best way is to determine the number of Chi-squared groups for a specific number of data points (in this case 66).

Homework Equations

The Attempt at a Solution

For the first and second sets of data, I simply split the values into groups of 5 points, giving 4 groups for the first set and 7 for the second.

WWGD · Mar 16, 2019

The Chi-squared tests for normality I know use the 68-95-99.7% rule: You compute the sample data mean, SE and then you expect 68% of the data points to be within 1 SE from the mean, etc. Is that the one you have in mind?
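As a rough illustration of the 68-95-99.7 idea, the sketch below computes how many points a normal distribution would be expected to put in bins bounded at whole numbers of SDs from the mean. The function name `expected_counts` and the choice of edges at -3 to +3 SD are assumptions made for this example, not something specified in the thread.

```python
from statistics import NormalDist

def expected_counts(n, edges_in_sd=(-3, -2, -1, 0, 1, 2, 3)):
    """Expected number of points per bin for n draws from a normal distribution.

    Bin edges are given in units of SD from the mean; the two outer bins
    are open-ended tails, so there are len(edges_in_sd) + 1 bins in total.
    """
    z = NormalDist()  # standard normal: mean 0, SD 1
    cdf = [0.0] + [z.cdf(e) for e in edges_in_sd] + [1.0]
    return [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

counts = expected_counts(35)          # 35 = size of dataset 2
within_1_sd = counts[3] + counts[4]   # the two bins covering -1 SD to +1 SD
print([round(c, 2) for c in counts])
print(round(within_1_sd / 35, 3))     # fraction within 1 SD, about 0.683
```

With only 20 or 35 points the tail bins have expected counts well below 1, which is one reason the tails are usually merged into neighbouring groups.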

Also, I am not sure I understood. Did you conduct a Chi-squared test on each data set, checking for normality? Or are you comparing the two data sets to test if they come from the same distribution? If the latter, a Wilcoxon rank test may work.

Joon · Mar 16, 2019

Thanks for your reply, do you mean standard deviation by SE?

WWGD · Mar 16, 2019

Joon said:

Thanks for your reply, do you mean standard deviation by SE?

Sure. But, do you have the population SD? If not, you need to use SE from the sample data.

Joon · Mar 16, 2019

It was only given that the third dataset has 66 data points, nothing more. In this case, should I use the mean and SD taken from Dataset 1 or 2?

I am required to suggest a possible scheme, i.e. how the 66 data points could be separated into n groups.

WWGD · Mar 16, 2019

I would think to do the same type/level of binning as in the other cases, i.e., the same number of categories. There isn't really much going on in normal distributions beyond 4sigma from the mean. If your data is equal up to random error, then it seems you would use the same number of bins/categories.

EDIT: Are the sample mean, SD roughly the same for data sets 1,2?
EDIT2: It ultimately comes down to comparing the observed frequency, using the Chi^2, to the expected frequency. If the data come from the same population, you can assume the same mean (except maybe if your SE is extremely small) and SE for the third group.
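For reference, the comparison of observed and expected frequencies is usually written as Pearson's chi-squared sum over the ##k## groups. The symbols ##O_i## (observed count in group ##i##) and ##E_i## (count expected in group ##i## under the assumed normal distribution) are labels chosen for this note, not notation from the thread:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$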

It seems a bit confusing to me: did you reject/accept the claim from datasets 1,2? What is the purpose then for the 3rd data set? Just trying to understand better what you are aiming/testing for.

Joon · Mar 16, 2019

I initially separated data 1 with 20 data points into 4 groups, 5 data points in each group. 7 groups for data 2 with 35 data points.
I want to ask a question:
Is the Chi-squared value of a dataset affected by how I separate the data points? For instance, I could separate 20 data points into 4 groups containing 4, 6, 4 and 6 points instead of 5 each. Or does the Chi-squared value end up the same irrespective of the number of groups and how many data points are in each group?
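One way to explore this question is to compute the Chi-squared sum for the same data under two different groupings. The sketch below does that on synthetic data (20 random normal values with a fixed seed, standing in for the real temperatures, which are not listed in full here). The helper name `chi_squared_normal` and the way boundaries are placed midway between neighbouring values are assumptions for illustration only.

```python
import random
from statistics import NormalDist, mean, stdev

def chi_squared_normal(data, edges):
    """Pearson chi-squared sum of data against a normal fitted to that data.

    edges are the interior bin boundaries; the outer bins are open-ended.
    """
    fit = NormalDist(mean(data), stdev(data))
    cdf = [0.0] + [fit.cdf(e) for e in edges] + [1.0]
    bounds = [float("-inf")] + list(edges) + [float("inf")]
    total = 0.0
    for lo, hi, p_lo, p_hi in zip(bounds, bounds[1:], cdf, cdf[1:]):
        observed = sum(lo <= x < hi for x in data)
        expected = len(data) * (p_hi - p_lo)
        total += (observed - expected) ** 2 / expected
    return total

random.seed(0)  # synthetic stand-in data, NOT the thread's temperatures
data = [random.gauss(50, 10) for _ in range(20)]

ordered = sorted(data)
# Boundaries midway between neighbours so the groups hold 5, 5, 5, 5 points ...
edges_equal = [(ordered[i - 1] + ordered[i]) / 2 for i in (5, 10, 15)]
# ... or 4, 6, 4, 6 points.
edges_uneven = [(ordered[i - 1] + ordered[i]) / 2 for i in (4, 10, 14)]

print(chi_squared_normal(data, edges_equal))
print(chi_squared_normal(data, edges_uneven))
```

The two printed sums generally differ, because moving a boundary changes both the observed and the expected count in the groups on either side of it.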

Could you explain a bit more about the mean ± 4 SD?
Also, by the same number of bins/categories, do you mean dataset 3 with 66 data points could just be separated into 66/5 (about 13) groups?

WWGD · Mar 16, 2019

EDIT: Good questions, please give me some more time to think it through. My point about ##\pm 4## SDs is re the 68-95-99.7 rule: assuming normality, 99.7% of your data will be within 3 SD of the mean, and essentially all of it (about 99.99%) within 4 SD. But I will check the remainder. It is also an issue of how well/fast the data converges to a normal. You expect that the larger the data set (assuming normality), the closer you get to an actual normal: 35 points should look closer to normal than 20, and 66 closer than 35. My thought is to use the SD from data set 2, divide by ##\sqrt{66}## instead of ##\sqrt{35}##, and then see how many of those SEs cover the entire range of the data.

Joon · Mar 16, 2019

For EDIT: For the first dataset, mean is 50.10 and SD is 9.77. For the second dataset, mean is 49.89 and SD is 9.98. They are similar to some extent but I'm not sure if it is okay to take the mean and SD values from one of these datasets for data 3.
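If one did want a single mean and SD to carry over to dataset 3, one possible option (a suggestion for illustration, not something the thread settles on) is to pool the two samples instead of picking one. The sketch below uses the summary values quoted above and assumes the quoted SDs are sample SDs computed with an n - 1 divisor.

```python
from math import sqrt

# Summary statistics quoted in the thread.
n1, mean1, sd1 = 20, 50.10, 9.77
n2, mean2, sd2 = 35, 49.89, 9.98

# Size-weighted mean of the two samples.
pooled_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

# Pooled within-sample SD, assuming sd1 and sd2 are sample SDs (n - 1 divisor).
pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

print(round(pooled_mean, 2), round(pooled_sd, 2))  # 49.97 9.91
```

Both pooled values sit between the two individual estimates, so the choice between dataset 1, dataset 2, or a pooled value makes little practical difference here.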
EDIT 2: I understand what you mean. Thanks for the explanation.

This is for my statistics coursework: without being given the actual data for dataset 3, I'm required to divide 66 data points into n groups. The questions below were placed right after the dataset 3 part and I need to answer them. Do you think they make dataset 3 more useful (as a test of a student's knowledge of this topic)?

Suppose that for a given grouping arrangement the sum of Chi-Squared values obtained was 0.1 (arbitrary units here) what would the confidence level be? Also, there is often some sensitivity to the grouping arrangements and if 2 other grouping arrangements produced Chi-Squared sums of 0.2 and 0.4 (arb. units) what would be the confidence levels for these cases?
-To be honest, I have no idea what the question requires. The confidence level can simply be looked up in a Chi-squared distribution table if I know the degrees of freedom, so what do you think is the point of the question?

Joon · Mar 16, 2019

I calculated SD using sqrt(66) instead of sqrt(35):
SD from dataset 2 is 9.98 so 9.98 * sqrt(35) / sqrt(66) = 7.27.
Dataset 2 has values from 28.4 up to 70.6 and therefore 70.6 / 7.27 gives 9.71.
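The arithmetic in this post can be checked with a few lines. For comparison, the sketch also prints the usual standard error of the mean, SD / sqrt(n), for n = 66. Whether 9.98 should be treated as an SD or as an SE is not settled in the thread, so both readings are shown without choosing between them; the variable names are made up for this example.

```python
from math import sqrt

sd_2 = 9.98           # SD of dataset 2, as quoted in the thread
n_2, n_3 = 35, 66     # sizes of dataset 2 and dataset 3

rescaled = sd_2 * sqrt(n_2) / sqrt(n_3)   # the quantity computed in the post
ratio = 70.6 / round(rescaled, 2)         # largest value divided by that quantity

# For comparison: the usual standard error of the mean for n = 66 is SD / sqrt(n).
se_mean_66 = sd_2 / sqrt(n_3)

print(round(rescaled, 2), round(ratio, 2), round(se_mean_66, 2))  # 7.27 9.71 1.23
```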

WWGD · Mar 16, 2019

Joon said:

I calculated SD using sqrt(66) instead of sqrt(35):
SD from dataset 2 is 9.98 so 9.98 * sqrt(35) / sqrt(66) = 7.27.
Dataset 2 has values from 28.4 up to 70.6 and therefore 70.6 / 7.27 gives 9.71.

Ok, so my idea is that you use 4 SEs unless you have enough outlier values beyond that. So you would go 4 SEs from the sample mean in either direction unless you find enough outliers, though that is not likely under the assumption that data set 3 is similar to data sets 1 and 2. But let me mull it over some more.

Joon · Mar 16, 2019

Sorry, I've just found this figure in my lecture notes.
I suppose data 3 has nothing to do with calculations using the mean and SD from Data 1 or 2, and it's just a matter of using this graph?
Data 3 with 66 data points will have 10 groups according to the graph.
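The lecture figure is not reproduced in this thread, so its rule cannot be checked here. For comparison only, the sketch below evaluates a few widely used rules of thumb for the number of groups, plus the common guideline that each group should have an expected count of at least 5. The function name `bin_count_rules` and its dictionary keys are invented for this example, and none of these rules is claimed to be the one behind the lecture graph.

```python
from math import ceil, floor, log2, sqrt

def bin_count_rules(n, min_expected=5):
    """Common rule-of-thumb group counts for n data points."""
    return {
        "sturges": ceil(log2(n)) + 1,            # 1 + log2(n), rounded up
        "square_root": ceil(sqrt(n)),            # sqrt(n), rounded up
        "rice": ceil(2 * n ** (1 / 3)),          # 2 * cube root of n, rounded up
        "max_for_min_expected": floor(n / min_expected),  # keeps n/k >= min_expected
    }

for n in (20, 35, 66):
    print(n, bin_count_rules(n))
```

For n = 66 these give 8, 9, 9 and an upper limit of 13 respectively, so 10 groups is in the same range and still keeps the average count per group above 5.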

How would I determine the number of members in each group though? Do you have any idea?

andrewkirk · Mar 16, 2019

@Joon It is hard to give useful advice without fully understanding the problem. A myriad of questions arise, e.g. are we to assume the two samples come from the same population, and that the third sample will, too?

Could you please supply the full statement of the problem, showing what we are told to assume and what we are asked to test for?

Joon · Mar 16, 2019

Datasets 1, 2 and 3 are all repeated data from the same sample. No actual data was given for dataset 3, just the number of data points: 66.
I have done the Chi-squared analysis for datasets 1 and 2. I just need to suggest a possible scheme for the number of groups and the number of data points in each group for dataset 3. According to the graph above, data with 66 data points should be split into 10 groups.
66 / 10 = 6.6, so I'm trying to figure out how many data points I should have in each group.
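Two possible ways of handling the non-integer 6.6 are sketched below, as illustrations and not as the required answer. Option A picks group boundaries so that every group has the same expected count of 6.6 under an assumed normal model (mean and SD borrowed from dataset 2, which is an assumption since dataset 3 is not given). Option B simply spreads 66 points over 10 groups in whole numbers.

```python
from statistics import NormalDist

n_points, n_groups = 66, 10

# Option A: equal-probability groups under an assumed normal model.
# Mean and SD are borrowed from dataset 2 (an assumption; dataset 3 is not given).
model = NormalDist(mu=49.89, sigma=9.98)
edges = [model.inv_cdf(i / n_groups) for i in range(1, n_groups)]
expected_per_group = n_points / n_groups   # 6.6 in every group

# Option B: whole-number group sizes that add up to 66.
base, extra = divmod(n_points, n_groups)   # 6 remainder 6
sizes = [base + 1] * extra + [base] * (n_groups - extra)   # six 7s, four 6s

print([round(e, 1) for e in edges])
print(expected_per_group, sizes)
```

In Option A the expected counts do not need to be whole numbers; only the observed counts do, and they are whatever falls between the boundaries once the data are available.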

Joon · Mar 16, 2019

Suppose that for a given grouping arrangement the sum of Chi-Squared values obtained was 0.1 (arbitrary units here) what would the confidence level be? Also, there is often some sensitivity to the grouping arrangements and if 2 other grouping arrangements produced Chi-Squared sums of 0.2 and 0.4 (arb. units) what would be the confidence levels for these cases?

For the bit above, I think it's just asking for basic knowledge. Degrees of freedom = number of groups - 1 - number of fitted parameters (mean and SD) = 10 - 1 - 2 = 7.
The values of Chi-squared sums are given, so confidence levels can simply be calculated.
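A minimal sketch of that lookup, using only the standard library, is given below. The helper `chi2_cdf` is written for this example (it sums the standard power series for the regularized lower incomplete gamma function and is only intended for small to moderate arguments). The thread does not say which tail the "confidence level" refers to, so both probabilities are printed.

```python
from math import exp, lgamma, log

def chi2_cdf(x, dof, terms=200):
    """P(chi-squared with dof degrees of freedom <= x), via the series for the
    regularized lower incomplete gamma function P(dof/2, x/2)."""
    if x <= 0:
        return 0.0
    a, half_x = dof / 2, x / 2
    term = total = 1.0
    for k in range(1, terms):
        term *= half_x / (a + k)
        total += term
    return total * exp(a * log(half_x) - half_x - lgamma(a + 1))

dof = 10 - 1 - 2  # 10 groups, minus 1, minus 2 fitted parameters (mean and SD)
for chi2_sum in (0.1, 0.2, 0.4):
    lower = chi2_cdf(chi2_sum, dof)
    print(chi2_sum, f"P(chi2 <= x) = {lower:.2e}", f"P(chi2 > x) = {1 - lower:.6f}")
```

With 7 degrees of freedom, sums as small as 0.1 to 0.4 lie far in the lower tail, so the upper-tail probability is very close to 1 in all three cases.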
