Statistics Sum of Squares x*y Proof

In summary, the proof shows that the sum of the products of the deviations of x and y from their respective means is equal to the sum of the products of the deviations of x from its mean and y itself. This is proven by simplifying the equation and using the relation between the sum and mean values.
  • #1
laz0r

Homework Statement



Prove that

[tex]\sum[(x_{i} - \overline{x})(y_{i} - \overline{y})] = \sum[(x_{i} - \overline{x})y_{i}][/tex]


Homework Equations


None.


The Attempt at a Solution


I tried using the fact that the sum of the mean values is just the mean value, because the sum of a constant is simply a constant, and I expanded out the first sum, but I didn't end up anywhere.

[tex] \sum[x_{i}y_{i}] - \overline{y}\sum(x_{i}) - \overline{x}\sum(y_{i}) + \overline{x}\,\overline{y} [/tex]

I have no idea where to go from here, so some inspiration would be much appreciated.
 
  • #2
laz0r said:

[tex] \sum[x_{i}y_{i}] - \overline{y}\sum(x_{i}) - \overline{x}\sum(y_{i}) + \overline{x}\,\overline{y} [/tex]


I assume you have n values ##x_1, \ldots, x_n## and ##y_1, \ldots, y_n##. In that case your last term above, ##\bar{x} \bar{y}##, is incorrect. Can you see why? You can also make other simplifications, but if I say more I will essentially be doing the question for you.
 
  • #3
laz0r said:

[tex] \sum[x_{i}y_{i}] - \overline{y}\sum(x_{i}) - \overline{x}\sum(y_{i}) + \overline{x}\,\overline{y} [/tex]


Notice that your first and third terms give what is on the right side of the equation. So you are left with showing
$$-\sum x_i\bar y + \sum \bar y \bar x = 0$$
You're pretty close...
 
  • #4
Ok, so I actually did end up getting somewhere and didn't realize it =)

EDIT:

I've done the proof and ended up with:

[tex] \sum[(x_{i}-\overline{x})y_{i}] + \sum[\overline{y}(\overline{x} - x_{i})][/tex]

Playing around in Excel has taught me that [tex]\sum[\overline{y}(\overline{x} - x_{i})][/tex] is actually just equal to zero, because you're subtracting each of the x values from their mean, multiplying the differences by a constant, [tex]\overline{y}[/tex], and then adding them up, which nets you zero.
 
  • #5
laz0r said:
Ok, so I actually did end up getting somewhere and didn't realize it =)

I've done most of the proof and I've gotten up until this point

[tex] \sum[(x_{i}-\overline{x})y_{i}] + \sum[\overline{y}(\overline{x} - x_{i})][/tex]

So I take it that either [tex] \overline{y} [/tex] or [tex] (\overline{x} - x_{i}) [/tex] must equal zero, but I'm not entirely sure why.

Try writing out a few small examples, such as for n = 2 or n = 3. These are small enough that you can write down everything explicitly and see exactly what is going on.
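For instance, a minimal sketch of such a check for n = 3 in Python. The data values are made up for illustration, and `fractions.Fraction` is used only to keep the arithmetic exact; none of this comes from the original problem.

```python
from fractions import Fraction

# Hypothetical data for n = 3; any other values would do as well.
x = [Fraction(1), Fraction(2), Fraction(6)]
y = [Fraction(4), Fraction(0), Fraction(5)]
n = len(x)

x_bar = sum(x) / n  # mean of x
y_bar = sum(y) / n  # mean of y

# Left side: sum of products of both deviations.
lhs = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
# Right side: deviation of x times y itself.
rhs = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
# The leftover term that has to vanish for the two sides to agree.
leftover = sum(y_bar * (x_bar - xi) for xi in x)

print(lhs, rhs, leftover)  # prints: 7 7 0
```

Both sides agree and the leftover term is zero for this data. A numerical check like this is not a proof, but it shows which piece of the expression has to cancel.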
 
  • #6
I've figured out why, but I'm not sure how to explain it symbolically. Do you think I need to elaborate more, or is my edited explanation good enough in your view?

Thanks for the help!
 
  • #7
##\bar x## and ##\sum x_i## are related to each other. How?
 
  • #8
LCKurtz said:
##\bar x## and ##\sum x_i## are related to each other. How?

I'm a little bit rusty on my operation of sums, so excuse me if this is incorrect, but can I do this?

[tex]

[(\overline{y})/n][\sum x_{i}] - \overline{y}[\sum x_{i}]

= [(\overline{y}^2)/n][\sum(x_{i} - x_{i})]

[/tex]

Then what tends to zero appears to be obvious
 
  • #9
LCKurtz said:
##\bar x## and ##\sum x_i## are related to each other. How?

laz0r said:
I'm a little bit rusty on my operation of sums, so excuse me if this is incorrect, but can I do this?

[tex]

[(\overline{y})/n][\sum x_{i}] - \overline{y}[\sum x_{i}]

= [(\overline{y}^2)/n][\sum(x_{i} - x_{i})]

[/tex]

Then what tends to zero appears to be obvious

I don't follow that at all. Why don't you just answer my question at the top?
 
  • #10
LCKurtz said:
I don't follow that at all. Why don't you just answer my question at the top?

They're related by

[tex]\overline{x} = \frac{1}{n}\sum x_{i}[/tex]
 
  • #11
laz0r said:
They're related by

[tex]\overline{x} = \frac{1}{n}\sum x_{i}[/tex]

So you could use ##\sum x_i = n\bar x##. Do you see how you might use that to get$$
\sum \bar y(\bar x - x_i) = 0\text{?}$$
 
  • #12
LCKurtz said:
So you could use ##\sum x_i = n\bar x##. Do you see how you might use that to get$$
\sum \bar y(\bar x - x_i) = 0\text{?}$$

Wow, it appears my brain took a nap today.

Thank you for your help lol, I appreciate it.
 
  • #13
So amuse me and show the finishing steps...
 
  • #14
Sorry, I'm very slow at typing this LaTeX code...

[tex]
\overline{y}*[\sum(\overline{x} - x_{i})] = 0
[/tex]

[tex]
[\sum(\overline{x} - x_{i})] = 0
[/tex]

[tex]
\sum(\overline{x}) - \sum(x_{i}) = 0
[/tex]

[tex]
n(\overline{x}) - n(\overline{x}) = 0
[/tex]

LHS = RHS

Here I used [tex]\sum x_{i} = n\overline{x}[/tex] and, for a constant c,

[tex]
\sum_{i=1}^{n} c = n*c
[/tex]

where 1 is the lower bound and n is the upper bound of the sum.
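Put together, the steps in this thread amount to the following chain. This is only a sketch, assuming all sums run over i = 1, ..., n and using nothing beyond the two facts above, that the sum of the x values is n times their mean and that the sum of a constant c over n terms is nc.

```latex
\begin{aligned}
\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})
  &= \sum_{i=1}^{n}(x_i-\bar{x})\,y_i \;-\; \bar{y}\sum_{i=1}^{n}(x_i-\bar{x}) \\
  &= \sum_{i=1}^{n}(x_i-\bar{x})\,y_i \;-\; \bar{y}\,\bigl(n\bar{x}-n\bar{x}\bigr) \\
  &= \sum_{i=1}^{n}(x_i-\bar{x})\,y_i .
\end{aligned}
```

Written this way, the argument starts from the left side and arrives at the right side, rather than starting from the statement that the leftover term is zero.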
 
  • #15
Good. Thanks for finishing it up.
 

1. What is the purpose of calculating the "Sum of Squares x*y" in statistics?

The Sum of Squares x*y, often written ##S_{xy} = \sum (x_i - \bar x)(y_i - \bar y)##, measures how two variables vary together about their means. It is the numerator of both the sample covariance and the Pearson correlation coefficient, and it appears in the least-squares slope of y on x.

2. How is the "Sum of Squares x*y" calculated?

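The Sum of Squares x*y is calculated by subtracting each variable's mean from its data points, multiplying the paired deviations, and adding up all of these products. By the identity proved in this thread it also equals ##\sum (x_i - \bar x) y_i##, and expanding further gives ##\sum x_i y_i - n \bar x \bar y##. This calculation is usually done with statistical software or a calculator.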
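As a rough illustration of the calculation, the sketch below computes the sum of cross-products three equivalent ways: from both deviations, from the deviation of x times y itself, and from the raw products minus n times the product of the means. The function name `s_xy_forms` and the data are hypothetical, chosen as integers so that exact fractions can be used.

```python
from fractions import Fraction

def s_xy_forms(x, y):
    """Return the sum of cross-products computed three equivalent ways.

    Illustrative helper; expects equal-length lists of integers.
    """
    n = len(x)
    x_bar = Fraction(sum(x), n)  # exact mean of x
    y_bar = Fraction(sum(y), n)  # exact mean of y

    both_deviations = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    x_deviation_only = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
    raw_products = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    return both_deviations, x_deviation_only, raw_products

# Made-up data, for illustration only.
forms = s_xy_forms([2, 4, 4, 10], [1, 3, 2, 6])
print(forms)
```

All three entries of the returned tuple are equal for any data, which is the identity from the thread plus one more expansion step.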

3. What does a high value for "Sum of Squares x*y" indicate?

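A large positive value for the Sum of Squares x*y indicates that x and y tend to be above or below their means together, and a large negative value indicates that they tend to move in opposite directions. The magnitude depends on the units of measurement and on the sample size, so the strength of a linear relationship is judged from the correlation coefficient ##r = S_{xy}/\sqrt{S_{xx} S_{yy}}## rather than from the Sum of Squares x*y alone.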
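To illustrate how the size of the value depends on the units of measurement, here is a small sketch with made-up data. The helper names `cross_sum` and `pearson_r` are hypothetical; the correlation formula used is the standard one, the cross-product sum divided by the square root of the product of the two sums of squared deviations.

```python
import math

def cross_sum(u, v):
    """Sum of products of the deviations of u and v from their means."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))

def pearson_r(x, y):
    """Pearson correlation built from the three deviation sums."""
    return cross_sum(x, y) / math.sqrt(cross_sum(x, x) * cross_sum(y, y))

# Made-up data; x_scaled is the same x expressed in units 100 times smaller.
x = [2.0, 4.0, 4.0, 10.0]
y = [1.0, 3.0, 2.0, 6.0]
x_scaled = [100.0 * xi for xi in x]

print(cross_sum(x, y), cross_sum(x_scaled, y))  # cross-product sum grows with the units
print(pearson_r(x, y), pearson_r(x_scaled, y))  # correlation is unaffected
```

Rescaling x multiplies the cross-product sum by the same factor but leaves the correlation unchanged, which is why the raw sum is not a measure of strength on its own.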

4. Is the "Sum of Squares x*y" affected by outliers in the data?

Yes. Because each term is a product of two deviations, a single point far from both means can dominate the sum and raise or lower the Sum of Squares x*y substantially, which in turn changes the apparent relationship between the two variables.

5. How is the "Sum of Squares x*y" used in hypothesis testing?

The Sum of Squares x*y is the numerator of the correlation coefficient, which is then used in hypothesis testing to assess whether there is a linear relationship between the two variables. A statistically significant correlation coefficient means the observed association would be unlikely if the true correlation were zero. The Sum of Squares x*y by itself is not a test statistic, because its size depends on the units and the sample size.
