I was told that, given a probability distribution P(x) dx, the expected value of x is

⟨x⟩ = Σ x_i P(x_i) = ∫ x P(x) dx

This part makes sense to me; it was justified to me through the use of weighted averages. However, my teacher then made a hand-wavy move to generalize the above formula. I quote:

"This way of calculating the average can be easily generalized, since it depends neither on the number of different events nor on the total number of events; it only depends on the probabilities of all the different possibilities. So we can consider an experiment where we are measuring some quantity x, and all the possible outcomes are x_1, x_2, ..., x_n. If we denote the probability of the outcome x_i by P(x_i), then we can write the average of x as

⟨x⟩ = Σ x_i P(x_i) = ∫ x P(x) dx    (17)

We may also be interested in calculating the average of some given function of x, call it f(x). The different possible values of f(x) are f(x_1), f(x_2), ..., f(x_n), and the probability P(f(x_i)) of the value f(x_i) is, of course, the same as the probability for x to have the value x_i, i.e.

P(f(x_i)) = P(x_i)

We can now use rule (17) to find the average of f(x):

⟨f⟩ = Σ f(x_i) P(f(x_i)) = ∫ f(x) P(f(x)) dx

⟨f⟩ = Σ f(x_i) P(x_i) = ∫ f(x) P(x) dx"

-end of quote-

It's this last part I don't understand:

P(f(x_i)) = P(x_i)

I don't see how this can be true for anything other than f(x_i) = x_i or P(x_i) = (a constant). Can someone please justify this to me?

-----

Edit: after extensive searching, I finally came across this: http://en.wikipedia.org/wiki/Law_of_the_unconscious_statistician

I've been searching for a long time now, and I still haven't found a justification for why this works.
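For what it's worth, here is a quick numerical sanity check I put together of that last formula (the distribution and the function f below are arbitrary examples I made up, not from my teacher's notes). It compares weighting f(x_i) by P(x_i) against directly averaging f over samples of x:

```python
import random

# Sanity check of <f> = sum_i f(x_i) P(x_i) (the "law of the
# unconscious statistician"): compare weighting f(x_i) by the
# probabilities of x itself against a direct Monte Carlo average.

outcomes = [0, 1, 2, 3]        # possible values x_i (made up)
probs = [0.1, 0.2, 0.3, 0.4]   # P(x_i), sums to 1 (made up)

def f(x):
    return x ** 2 + 1          # some function of x

# LOTUS: weight f(x_i) by P(x_i); no distribution of f(x) needed
lotus = sum(f(x) * p for x, p in zip(outcomes, probs))

# Direct estimate: sample x, apply f, average the results
random.seed(0)
n = 200_000
samples = random.choices(outcomes, weights=probs, k=n)
mc = sum(f(x) for x in samples) / n

print(lotus)   # exact: 0.1*1 + 0.2*2 + 0.3*5 + 0.4*10 = 6.0
print(mc)      # should come out close to 6.0
```

The point is that the left-hand computation never needs the probability distribution of the values f(x_i) themselves, only the probabilities of the underlying x_i, and the two numbers agree.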