MHB Is Normalization Necessary for Chi-Square and K-S Tests?

shivajikobardan
Say two-digit numbers are given, like 10, 20, 30, 55, 95, 85, 12, 13, 52, etc. Is it necessary to normalize them to numbers between 0 and 1, i.e., 0.10 for 10, 0.20 for 20, and so on? I've read that this is the case for the K-S test, but I'm not sure about the chi-square test. I'm not 100% sure of this information, as I haven't seen it stated everywhere.
http://www-i4.informatik.rwth-aache...s/sub/simulation/simulationSS06/slides/05.pdf
It looks like this is the case for the K-S test, since x needs to be between 0 and 1. Is it also the case for the chi-square test?
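A minimal sketch of why the slides want values in [0, 1], assuming (as dividing by 100 implicitly does) that the two-digit numbers are treated as draws from a continuous uniform distribution on [0, 100). The one-sample K-S statistic D compares the empirical CDF with a reference CDF, and the U(0,1) version uses F(x) = x, which only fits data already in [0, 1]. Rescaling the data and testing against F(x) = x should give the same D as testing the raw values against F(y) = y/100. The helpers `ks_statistic` and `clamp01` are hypothetical names, written in plain Python rather than taken from a library.

```python
def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = max(D+, D-).

    D+ = max_i (i/n - F(x_(i))) and D- = max_i (F(x_(i)) - (i-1)/n),
    where x_(1) <= ... <= x_(n) is the sorted sample.
    """
    xs = sorted(sample)
    n = len(xs)
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def clamp01(p):
    """Keep a CDF value inside [0, 1]."""
    return min(max(p, 0.0), 1.0)


data = [10, 20, 30, 55, 95, 85, 12, 13, 52]  # the numbers from the question

# Option 1: rescale to [0, 1) and test against the U(0,1) CDF F(x) = x.
d_scaled = ks_statistic([x / 100 for x in data], lambda u: clamp01(u))

# Option 2 (assumption: uniform on [0, 100)): keep raw values, use F(y) = y/100.
d_raw = ks_statistic(data, lambda y: clamp01(y / 100))

print(d_scaled, d_raw)  # identical values
```

In this sketch, dividing by 100 is a convenience for reusing the standard U(0,1) form of the test; the statistic itself only needs the reference CDF to match the scale of the data.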
 
It doesn't seem as though it's necessary according to https://www.scribbr.com/statistics/chi-square-tests/
(also 10 doesn't normalize to .10 unless the total is 100)
They define
##\chi^2 = \sum \frac{(O-E)^2}{E}##, which isn't quite a normalization. Here O is the observed value and E is the expected value.
You also need a threshold to compare the result against, which they call the critical value.
I am not sure if it matters, though.

If we look at a specific example with 1 data point (using your "normalization", i.e., dividing by 100):
##O = 15,\ E = 10 \Rightarrow \chi^2 = \frac{5^2}{10} = 2.5##, versus ##\frac{0.05^2}{0.1} = 0.025## after normalization.
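The arithmetic in this one-point example can be checked with a few lines of Python. The helper `chi_square` is a hypothetical name for a plain implementation of the formula ##\chi^2 = \sum \frac{(O-E)^2}{E}##. Dividing both O and E by a common constant c divides each term by c, so the two results should differ by a factor of 100.

```python
def chi_square(observed, expected):
    """Pearson's statistic: sum over bins of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


raw = chi_square([15], [10])                 # 5^2 / 10
scaled = chi_square([15 / 100], [10 / 100])  # both O and E divided by 100
print(raw, scaled, raw / scaled)             # ratio is 100, up to float rounding
```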

Thinking about it, it's not even clear how you should normalize O and E. Is it always necessary that ##\sum O = \sum E##? I think it depends.
If your list is randomly generated, you should expect 50 from each draw; with 10 data points, that's 500 total expected units. It could be the case that in some niche trial the computer generates all 1's or all 99's, in which case the observed total (10 or 990) would be far below or far above the expected total of 500. The only way to actually normalize both O and E would be to normalize them separately, like ##\frac{O_i}{\sum_k O_k}## and ##\frac{E_i}{\sum_k E_k}##. I'm not sure whether that's reasonable.
Suppose the numbers are all random: we generate 10 points from 1 to 99, the average is 50, and that is what we should expect from each. Now suppose the computer generates ten 1's. Our normalized O values would each be 1/10, and our normalized E values would each be 50/500 = 1/10. This would give ##\chi^2 = 0##, which would imply that our model was right on the money. This is clearly not the case.
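A quick check of the ten-1's case, normalizing O and E separately by their own totals (`chi_square` is again a hypothetical helper implementing the formula above). Both normalized lists come out as 0.1 everywhere, so the statistic is exactly zero even though every draw was 1.

```python
def chi_square(observed, expected):
    """Pearson's statistic: sum over bins of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [1] * 10   # the computer generated ten 1's
expected = [50] * 10  # mean of a uniform draw from 1..99

# Normalize each list by its own total.
o_norm = [o / sum(observed) for o in observed]  # each 1/10
e_norm = [e / sum(expected) for e in expected]  # each 50/500 = 1/10
print(chi_square(o_norm, e_norm))  # 0.0
```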

If we instead divide both O and E by ##\sum_k E_k = 500##, we get ##\bar{O}_i = \frac{1}{500} = 0.002## and ##\bar{E}_i = \frac{50}{500} = 0.1##, so ##\chi^2 = \sum_{i=1}^{10} \frac{(0.002 - 0.1)^2}{0.1} = 0.9604##.
Compare that with the result without "normalization":
##\chi^2 = \sum_{i=1}^{10} \frac{(1-50)^2}{50} = 480.2##
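These are 2 very different numbers, but interestingly enough the 2nd one is exactly 500 times larger. That follows from the formula: dividing every O and E by the same constant c divides each term ##\frac{(O-E)^2}{E}## by c.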
The first one seems to be something to the effect of a %error, and at the end of the day, I think it's all going to come down to standards. The website I read didn't normalize, nor did it mention normalization, and the Wikipedia article gives a specific example without normalizing either.
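The two statistics for the ten-1's case, once divided by ##\sum_k E_k## and once left as is, can be reproduced the same way (hypothetical `chi_square` helper as before). Their ratio equals the common divisor, 500.

```python
def chi_square(observed, expected):
    """Pearson's statistic: sum over bins of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [1] * 10
expected = [50] * 10
total_expected = sum(expected)  # 500

by_total_e = chi_square([o / total_expected for o in observed],
                        [e / total_expected for e in expected])
raw = chi_square(observed, expected)
print(by_total_e, raw, raw / by_total_e)  # about 0.9604, 480.2, 500
```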

I haven't seen it written, but I think the concept of "normalization" is ambiguous for this calculation.
https://en.wikipedia.org/wiki/Chi-squared_test#Example_chi-squared_test_for_categorical_data
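For comparison, here is a sketch of the counts-based form used in the Wikipedia example, applied to two-digit numbers. The setup is an assumption about how one might do it, not something from the thread or the linked slides: split 10..99 into 9 bins of width 10, count how many values land in each bin (O), and use the count expected under uniformity, N/9, as E. Because E is derived from the same total N, ##\sum O = \sum E## holds by construction and no rescaling is involved. The critical value 15.507 is the standard table value for 8 degrees of freedom at the 5% level; the function names are hypothetical.

```python
import random


def chi_square(observed, expected):
    """Pearson's statistic: sum over bins of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def bin_counts(values, low=10, width=10, bins=9):
    """Count integer values in the bins [10, 20), [20, 30), ..., [90, 100)."""
    counts = [0] * bins
    for v in values:
        counts[(v - low) // width] += 1
    return counts


def uniformity_test(values):
    """Chi-square goodness-of-fit test for uniformity of integers in 10..99.

    Uses 9 bins, so 8 degrees of freedom; 15.507 is the 5% critical value.
    """
    bins, critical = 9, 15.507
    observed = bin_counts(values, bins=bins)
    expected = [len(values) / bins] * bins  # sums to len(values) = sum(observed)
    stat = chi_square(observed, expected)
    return stat, stat > critical


random.seed(0)
sample = [random.randint(10, 99) for _ in range(900)]
print(uniformity_test(sample))      # statistic and reject flag for a pseudo-random sample
print(uniformity_test([10] * 900))  # every value in the first bin: (7200.0, True)
```

Uniformity is rejected when the statistic exceeds the critical value. Since scaling O and E by a common constant scales the statistic, the comparison against the table value only makes sense when O and E are raw counts.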
 