Why shift the Mann Whitney distribution?

  • Thread starter: fadecomic
  • Tags: Distribution, Shift
AI Thread Summary
The discussion centers on the Mann Whitney test and its use of the U statistic, which represents the difference between the observed sum of ranks and the maximum or minimum possible sum of ranks. The question raised is why the U distribution is shifted rather than directly tabulated based on the sum of ranks, as seen in the Wilcoxon test. The participant seeks clarity on the advantages of this approach and its necessity, especially since the expected value of the U statistic is easily proven to be (n_a*n_b)/2. The confusion lies in understanding the rationale behind this shifting of the distribution. Overall, the discussion highlights a fundamental inquiry into the methodology of the Mann Whitney test and its statistical implications.
I'm reading up on the Mann-Whitney test, and I can't wrap my head around one thing. Most of the test makes perfect sense. If two samples come from populations with similar medians, then the sum of ranks of each sample should hover around some expected value. The "T" or "U" statistic, depending on what you're reading, is computed, and one checks whether or not U falls within a certain interval. Fine. U is defined as the difference between the observed sum of ranks and either the minimum or maximum possible value of the sum of ranks (it doesn't matter which). The U distribution is parameterized by the two sample sizes and nothing more, as are the minimum and maximum possible sums of ranks. That means U is nothing but a shifted version of the sum-of-ranks distribution. Why bother? Is there some advantage to doing it that way? Why not tabulate the distribution based on the sum of ranks only (which is done; it's called the "Wilcoxon test")?

Thanks.
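
To make the "it's only a shift" point concrete, here is a minimal Python sketch. It assumes no ties and takes U as the rank sum of sample a minus its minimum possible value, n_a(n_a+1)/2. The function names and the sample sizes 3 and 4 are made up for illustration. It enumerates the exact null distribution of the rank sum and then re-indexes the same counts by U.

```python
from itertools import combinations
from collections import Counter

def rank_sum_distribution(n_a, n_b):
    """Exact null counts of the rank sum R_a of sample a (assumes no ties)."""
    ranks = range(1, n_a + n_b + 1)
    # Under the null, every choice of n_a ranks out of n_a + n_b is equally likely.
    return Counter(sum(c) for c in combinations(ranks, n_a))

def u_distribution(n_a, n_b):
    """The same counts, re-indexed by U = R_a - n_a(n_a+1)/2."""
    shift = n_a * (n_a + 1) // 2  # minimum possible rank sum for sample a
    r_dist = rank_sum_distribution(n_a, n_b)
    return Counter({r - shift: count for r, count in r_dist.items()})

n_a, n_b = 3, 4  # hypothetical sample sizes
r_dist = rank_sum_distribution(n_a, n_b)
u_dist = u_distribution(n_a, n_b)

total = sum(u_dist.values())
mean_u = sum(u * count for u, count in u_dist.items()) / total

print("rank sum range:", min(r_dist), "to", max(r_dist))
print("U range:", min(u_dist), "to", max(u_dist))
print("mean of U:", mean_u, "vs n_a*n_b/2 =", n_a * n_b / 2)
```

The two tables hold identical counts; only the index moves. The rank sum starts at n_a(n_a+1)/2, while U starts at 0 and tops out at n_a*n_b.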
 
Incidentally, the expected value for the U statistic is (n_a*n_b)/2, which is easy to prove. So why does the Mann-Whitney test require a statistic that is shifted to a central value of (n_a*n_b)/2?
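
For reference, a sketch of that proof, assuming no ties and taking U as the rank sum R_a of sample a minus its minimum possible value (the symbols R_a and N = n_a + n_b are introduced here only for the derivation). Under the null hypothesis each observation in sample a is equally likely to hold any of the N ranks, so its expected rank is (N+1)/2:

```latex
\begin{align*}
E[R_a] &= n_a \cdot \frac{N + 1}{2} = \frac{n_a (n_a + n_b + 1)}{2}, \\
E[U]   &= E[R_a] - \frac{n_a (n_a + 1)}{2}
        = \frac{n_a (n_a + n_b + 1) - n_a (n_a + 1)}{2}
        = \frac{n_a n_b}{2}.
\end{align*}
```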
 