Generalization of mean, median

  • #1
JoAuSc
I recently learned that if you minimize these functions with respect to "a", you get the mean and the median respectively:

[tex]\sum (y_i - a)^2[/tex]

[tex]\sum |y_i - a|[/tex]

What would you get if you minimized an expression like [tex]\sum |y_i - a|^n [/tex] for various n's? Do the resulting expressions have any use, or are they just a slightly different, more complicated mean?
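
One way to see where the mean and median come from, and what changes for general n, is to set the derivative with respect to a to zero. This is only a sketch: N (the number of data points) and the sgn notation are not from the original post, the middle line holds only where a differs from every y_i, and the last line assumes n > 1 so that each term is differentiable.

[tex]\frac{d}{da}\sum (y_i - a)^2 = -2\sum (y_i - a) = 0 \;\Rightarrow\; a = \frac{1}{N}\sum y_i[/tex]

[tex]\frac{d}{da}\sum |y_i - a| = -\sum \operatorname{sgn}(y_i - a) = 0[/tex]

[tex]\frac{d}{da}\sum |y_i - a|^n = -n\sum \operatorname{sgn}(y_i - a)\,|y_i - a|^{n-1} = 0[/tex]

The first condition gives the mean. The second says that as many points lie above a as below it, which is the median. The third generally has no closed-form solution, but for n ≥ 1 the sum is convex in a, so a real minimizer always exists between the smallest and largest y_i.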
 
  • #2
Higher powers of (y-a) may not have real solutions; the solutions may all be imaginary.

What is an example of min sum |y - a|^n that does not produce the identical result with min sum |y - a|^2? (Remember, the median has to be an element of the data set.)
 
  • #3
EnumaElish said:
What is an example of min sum |y - a|^n that does not produce the identical result with min sum |y - a|^2?

The multiset {0, 0, 3} would work. It's minimized at a = 1.327480002... for n = 4, but a = 1 for n = 2.
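
These two values can be checked numerically. The following is a minimal Python sketch, not something from the thread: the function name `lp_center` is made up for illustration, and the ternary search assumes n ≥ 1 so that the sum is convex in a. For this particular multiset the n = 4 condition 8a^3 = 4(3 - a)^3 can also be solved by hand, giving a = 3 / (1 + 2^(1/3)), which the code uses as a cross-check.

```python
def lp_center(ys, n, iters=200):
    """Approximate the a that minimizes sum(|y - a|**n) by ternary search.

    Assumes n >= 1, so the objective is convex in a and a minimizer
    lies between min(ys) and max(ys).
    """
    def cost(a):
        return sum(abs(y - a) ** n for y in ys)

    lo, hi = min(ys), max(ys)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if cost(m1) < cost(m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return (lo + hi) / 2.0


data = [0, 0, 3]
a2 = lp_center(data, 2)               # ordinary mean of the data
a4 = lp_center(data, 4)               # "fourth-order mean"
closed_form = 3 / (1 + 2 ** (1 / 3))  # hand-derived value for n = 4
print(a2, a4, closed_form)
```

Ternary search is used only because it needs nothing beyond convexity; any one-dimensional minimizer would do.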
 
  • #4
CRGreathouse said:
The multiset {0, 0, 3} would work. It's minimized at a = 1.327480002... for n = 4, but a = 1 for n = 2.
But 1.327480002... is not in the set. (Neither is 1.) By convention, the "4th-order median" is identical to the median (=0).
 
  • #5
EnumaElish said:
But 1.327480002... is not in the set. (Neither is 1.) By convention, the "4th-order median" is identical to the median (=0).

Why would a need to be in the multiset? It's a fourth-order mean.
 
  • #6
I looked at this problem in MATLAB (code is below), and it does seem like there exists a "mean" for each n that lies within the range of values "a" is defined on.


Code:
clear
hold off

% determine distribution over [0,1]
x = rand(100,1);
t = 0.0:(1.0/99.0):1.0;
y = exp(-10*(x).^2);
y2 = exp(-10*(t).^2);
plot(t,y2)

a = 0.0:0.001:1.0;

% preallocate: one row per candidate a, one column per exponent n
S = zeros(numel(a), 10);
for k = 1:numel(a)
    for n = 1:10
        S(k,n) = sum( ( abs(y-a(k)) ).^n );
    end
end

figure
hold all
minPts = zeros(1,10);
for n = 1:10
    % plot
    plot(a,log(S(:,n)))
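    % determine the grid index of the minimum. find(S == min(S)) can return
    % several indices when grid points tie (e.g. n = 1, where the sum is flat
    % between the two middle data values), so use min's second output instead
    [~, minPts(n)] = min(S(:,n));
end
title('log plot of summary functions');

figure
hold all
for n = 1:10
    plot(a,S(:,n))
end
title('summary functions')


% minimizing a for each n = 1..10
a(minPts)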
 
  • #7
JoAuSc said:
I looked at this problem in MATLAB (code is below), and it does seem like there exists a "mean" for each n that lies within the range of values "a" is defined on.
My post about imaginary roots was in reference to the mean, not the median. ({1,2,3,4,5} - a)^n does not have a real root for every n.
 
  • #8
CRGreathouse said:
Why would a need to be in the multiset? It's a fourth-order mean.
My post in reference to your multiset was about the median, not the mean. The main question of the OP, I think, was also about the median.
 
  • #9
Choquet theory, anyone?

JoAuSc said:
I recently learned that if you minimize these functions with respect to "a", you get the mean and the median respectively:
[tex]\sum (y_i - a)^2[/tex]
[tex]\sum |y_i - a|[/tex]

With some qualifications (in the case of median), you can easily generalize this to hold for finite-dimensional vector spaces. (If you like that, and have a graduate level background in functional analysis, check out Choquet theory, in which we average over infinite dimensional simplices and find beautiful geometric interpretations of important concepts in ergodic theory.)

What memorable formal property of the median is most easily spotted in the vector space setting?

Speaking of mean and median, in the one-dimensional case, what can you say about situations in which the mean exceeds the median and conversely? Can these situations arise plausibly when grading quizzes?

(A phrase from "A Prairie Home Companion" always makes me smile: "Where all the children are above average". Outside Lake Wobegon we can't do quite that well, but with a sufficiently unlikely distribution ;) we can come close!)
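
To make the "come close" remark concrete, here is a small Python sketch with made-up quiz scores (the class size and the scores are purely hypothetical, not from the thread). One very low score drags the mean below everyone else's score, so all but one student end up above average and the mean falls below the median.

```python
from statistics import mean, median

# Hypothetical class of 30: one student scores 0, the other 29 score 100.
scores = [0] + [100] * 29

avg = mean(scores)                      # pulled down by the single 0
mid = median(scores)                    # unaffected by the single 0
above = sum(s > avg for s in scores)    # students strictly above the mean

print(avg, mid, above)
```

Reversing the skew (one very high score among many low ones) gives the opposite case, where the mean exceeds the median.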

JoAuSc said:
What would you get if you minimized an expression like [tex]\sum |y_i - a|^n [/tex] for various n's? Do the resulting expressions have any use, or are they just a slightly different, more complicated mean?

Well, first of all, what can you say about the "resulting expressions"?
 
  • #10
JoAuSc said:
What would you get if you minimized an expression like [tex]\sum |y_i - a|^n [/tex] for various n's? Do the resulting expressions have any use, or are they just a slightly different, more complicated mean?

I suppose you could call the value of a that minimizes [tex]\sum |y_i - a|^n [/tex] the [tex]\mathcal L^n[/tex] mean. The question is, does such a mean yield any practical value? The median and mean are quite useful statistical measures. The [tex]\mathcal L^{\infty}[/tex] mean might be of some use (just a supposition; I can't think of any uses off the top of my head). For example, the [tex]\mathcal L^{\infty}[/tex] mean of the set {0, 0, 3} is 1.5.
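
The value 1.5 can be understood as a limit. As n grows, the n-th root of the sum of |y_i - a|^n tends to the largest single deviation, max |y_i - a|, and that is minimized at the midpoint between the smallest and largest data values. The Python sketch below illustrates this for {0, 0, 3}. The helper names are invented for illustration, and the closed form a_n = 3 / (1 + 2^(1/(n-1))) is specific to this data set: it comes from setting the derivative of 2a^n + (3 - a)^n to zero and assumes n > 1.

```python
def max_deviation(ys, a):
    """Largest absolute deviation of the data from a (the n -> infinity cost)."""
    return max(abs(y - a) for y in ys)


def midrange(ys):
    """Midpoint between the smallest and largest data values."""
    return (min(ys) + max(ys)) / 2.0


def a_n(n):
    """Minimizer of 2*a**n + (3 - a)**n for the data {0, 0, 3}, valid for n > 1."""
    return 3 / (1 + 2 ** (1 / (n - 1)))


data = [0, 0, 3]
for n in (2, 4, 10, 100, 1000):
    print(n, a_n(n))          # drifts from 1 toward 1.5 as n grows

center = midrange(data)
print(center, max_deviation(data, center))
```

No other choice of a on the interval [0, 3] gives a smaller maximum deviation than the midrange does.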
 

1. What is the generalization of mean and median?

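The mean is the value of a that minimizes [tex]\sum (y_i - a)^2[/tex], and the median is a value that minimizes [tex]\sum |y_i - a|[/tex]. The generalization discussed in this thread replaces the exponent with an arbitrary n: the value of a that minimizes [tex]\sum |y_i - a|^n[/tex], which one poster suggests calling the [tex]\mathcal L^n[/tex] mean.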

2. How is the generalization of mean and median calculated?

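For n = 2 the minimizer is the ordinary mean: add all the values and divide by the number of values. For n = 1 it is the median: arrange the values in ascending order and take the middle value or, by the usual convention, the average of the two middle values if there is an even number of values. For other n there is generally no simple formula and the minimizer is found numerically; for the multiset {0, 0, 3} and n = 4 it is a = 1.327480002..., and as n grows it approaches the [tex]\mathcal L^{\infty}[/tex] mean, 1.5.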

3. What is the purpose of generalizing mean and median?

Like the mean and median, each of these values summarizes a data set with a single number that represents its central tendency. The exponent n controls how heavily large deviations are penalized relative to small ones.

4. Can the generalization of mean and median be affected by outliers?

Yes, and more strongly as n increases. An outlier is a value that is significantly different from the other values in the dataset. The median (n = 1) is largely insensitive to a few outliers, the mean (n = 2) is pulled toward them, and the [tex]\mathcal L^{\infty}[/tex] mean depends only on the smallest and largest values.

5. When is it appropriate to use the generalization of mean and median?

The median suits skewed data or data with extreme outliers, and the mean suits roughly symmetric data without them. The thread does not identify an established use for other values of n; the [tex]\mathcal L^{\infty}[/tex] mean is floated only as a possibility.
