Sample mean and linear relation?

AI Thread Summary
The discussion focuses on the calculation of the sample mean and its simplification through linear transformations. The sample mean is defined as the sum of values divided by the number of observations, and when applying a linear transformation, the mean of the transformed data can be expressed in terms of the original mean. The choice of 280 as a reference point for transformation is to simplify calculations, but any constant can be used, as demonstrated with an alternative choice of 20. The importance of constants 'a' and 'b' in the linear relation y = ax + b is emphasized, with 'a' representing the scaling factor and 'b' the shift, which aids in simplifying the computation of the mean. Ultimately, the transformation allows for easier calculations while ensuring the final mean remains consistent across different transformations.
jwxie

Homework Statement



The book states the following:

The sample mean is defined by: \bar{x}=\sum_{i=1}^{n}x_{i}/n

The computation of the sample mean can often be simplified by noting that if, for constants a and b, y_{i}=ax_{i}+b, then the sample mean of the data set y_{1}, ..., y_{n} is: \bar{y}=\sum_{i=1}^{n}(ax_{i}+b)/n=\sum_{i=1}^{n}ax_{i}/n+\sum_{i=1}^{n}b/n=a\bar{x}+b
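As a quick numerical sanity check of \bar{y}=a\bar{x}+b, here is a minimal Python sketch. The data values and the constants a = 5/2, b = -4 are arbitrary illustrations, not from the book, and the helper name `sample_mean` is hypothetical. Exact fractions are used only to avoid floating-point rounding in the comparison.

```python
from fractions import Fraction

def sample_mean(values):
    # x_bar = (x_1 + ... + x_n) / n, computed exactly with Fractions
    values = [Fraction(v) for v in values]
    return sum(values) / len(values)

# Arbitrary illustrative data and constants (not from the book)
x = [3, 7, 8, 12]
a, b = Fraction(5, 2), -4

y = [a * xi + b for xi in x]      # y_i = a*x_i + b
lhs = sample_mean(y)              # mean of the transformed data
rhs = a * sample_mean(x) + b      # a * x_bar + b
print(lhs, rhs, lhs == rhs)
```

Both sides come out equal, as the derivation predicts for any choice of a and b.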


Given the question:


The winning scores in the U.S. Masters golf tournament in the years from 1982 to 1991 were as follows: 284, 280, 277, 282, 279, 285, 281, 283, 278, 277

The book computes as follows:

Rather than directly adding these values, it is easier to first subtract 280 from each one to obtain the new values y_{i} = x_{i} − 280:

4, 0, −3, 2, −1, 5, 1, 3, −2, −3

Because the arithmetic average of the transformed data set is
\bar{y} = 6/10 = 0.6,
it follows that
\bar{x} = \bar{y} + 280 = 280.6
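The book's shortcut can be reproduced in a few lines of Python. The list is the ten winning scores quoted above. Using exact fractions is an implementation choice of this sketch, not something the book does; it simply keeps 6/10 from picking up floating-point error.

```python
from fractions import Fraction

# Winning scores, 1982-1991, as quoted in the question
scores = [284, 280, 277, 282, 279, 285, 281, 283, 278, 277]

shift = 280
y = [s - shift for s in scores]     # y_i = x_i - 280: 4, 0, -3, 2, -1, 5, 1, 3, -2, -3
y_bar = Fraction(sum(y), len(y))    # 6/10, stored in lowest terms as 3/5
x_bar = y_bar + shift               # x_bar = y_bar + 280

print(y, y_bar, float(x_bar))
```

Converting x_bar to a float gives 280.6, the same as summing the raw scores directly (2806/10).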
(1) Why are we choosing 280? What is the reason for that? I tried other numbers, but they don't give the same sample mean.

(2) I also want to confirm my understanding of the linear relation in the summation. I know the constant a should be 1/n, since x_1/n + x_2/n + ... + x_n/n is the same as (x_1 + x_2 + ... + x_n)/n. Why do we need b, the intercept? Is it just stating the obvious, a general form? Or is it possible that we will encounter a sample mean that crosses the y-axis?

But what I just said doesn't make sense for y_i = x_i − 280, where the constant a is 1... Can someone correct me? Thanks!
 
Let's subtract 20 instead of 280. Then the new data set becomes

264 260 257 262 259 265 261 263 258 257

Hence \bar{y} = 260.6, and it follows that \bar{x} = \bar{y} + 20 = 260.6 + 20 = 280.6.
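To see that the choice of constant does not matter, the following minimal Python sketch repeats the shift-average-unshift procedure for several constants. The function name `mean_via_shift` and the particular constants tried are illustrative assumptions; every run recovers the same mean, 280.6.

```python
from fractions import Fraction

# Winning scores, 1982-1991, as quoted in the question
scores = [284, 280, 277, 282, 279, 285, 281, 283, 278, 277]

def mean_via_shift(data, c):
    # Subtract c from every value, average the shifted values, then add c back
    shifted = [d - c for d in data]
    y_bar = Fraction(sum(shifted), len(shifted))
    return y_bar + c

# Any constant works; c = 280 just keeps the shifted numbers small
for c in (0, 20, 280, 1000):
    print(c, float(mean_via_shift(scores, c)))
```

A constant near the middle of the data, such as 280, is simply the most convenient because it leaves only small numbers to add by hand.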

y = ax + b is the general form of the linear relation between x and y.
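In this form, subtracting 20 is the special case a = 1, b = -20. Working it through with the book's relation \bar{y}=a\bar{x}+b gives:

```latex
y_i = x_i - 20 \quad (a = 1,\ b = -20)
\;\Rightarrow\; \bar{y} = \bar{x} - 20
\;\Rightarrow\; \bar{x} = \bar{y} + 20 = 260.6 + 20 = 280.6
```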
 
The book states that if the mean is \bar{x}=\sum_{i=1}^{n}x_{i}/n,

then for another data set related to it by y_{i}=ax_{i}+b,

the mean is \bar{y}=a\bar{x}+b
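Writing the book's derivation with the 1/n pulled out front may help separate the roles of the constants. The 1/n comes from the definition of the mean (it is not the constant a), while a and b belong to the transformation applied to each x_i. The b term survives because averaging n copies of the same constant gives that constant back:

```latex
\bar{y} = \frac{1}{n}\sum_{i=1}^{n}(a x_i + b)
        = a\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) + \frac{1}{n}\sum_{i=1}^{n} b
        = a\bar{x} + \frac{nb}{n}
        = a\bar{x} + b
```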

What they are trying to say here is that by properly choosing 'a' and 'b', the process of finding the mean can be simplified.

In the example solution you mentioned,

a = 1, b = -280 (these can be any real numbers with a ≠ 0, but they are chosen so as to make the problem simpler),

so the equation becomes y_{i} = x_{i} - 280.

This effectively converts the problem into a much simpler one involving small numbers. Once we have computed \bar{y}, we have to get back \bar{x}. This is done by solving \bar{y}=a\bar{x}+b for \bar{x} (\bar{y}, a and b are all known).
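Solving the relation for \bar{x} (possible whenever a ≠ 0) and plugging in the example's values gives:

```latex
\bar{x} = \frac{\bar{y} - b}{a} \quad (a \neq 0)
\qquad \text{e.g. } a = 1,\ b = -280:\quad
\bar{x} = \frac{0.6 - (-280)}{1} = 280.6
```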

This gives the answer.
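The same recovery step works for a scaling factor other than 1. Below is a minimal Python sketch, assuming a hypothetical helper `mean_via_linear` that applies y_i = a*x_i + b, averages, and then solves \bar{y}=a\bar{x}+b for \bar{x}. The second pair of constants (a = 10, b = -2800) is an arbitrary illustration, not from the book.

```python
from fractions import Fraction

def mean_via_linear(data, a, b):
    # Transform y_i = a*x_i + b, average the y's, then solve y_bar = a*x_bar + b for x_bar
    if a == 0:
        raise ValueError("a must be nonzero, otherwise x_bar cannot be recovered")
    y = [a * x + b for x in data]
    y_bar = Fraction(sum(y), len(y))
    return (y_bar - b) / a

# Winning scores, 1982-1991, as quoted in the question
scores = [284, 280, 277, 282, 279, 285, 281, 283, 278, 277]
print(mean_via_linear(scores, 1, -280))     # the book's choice: a = 1, b = -280
print(mean_via_linear(scores, 10, -2800))   # an arbitrary choice with a != 1
```

Both calls print 1403/5, i.e. 280.6. The a ≠ 0 check is there because with a = 0 every y_i equals b and the original mean is lost.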
 
Back
Top