# Homework Help: Mean centering data

1. Apr 13, 2010

### physical101

1. The problem statement, all variables and given/known data

Hi there, I have been working on spectroscopic data, and I mean centered them all before carrying out PCA on them. The mean centering and standardisation operations are simple: just take away the mean and divide through by the standard deviation, respectively. Only today, however, have I wondered how the mean centering operation actually works. If you take away the mean from the dataset, will you not redistribute the dataset so that the new dataset represents the distance of each original data point from the mean? How then is the data mean centered such that each column of the matrix sums to 0? I would have thought that for this to be true, the negative values must balance the positive values in the mean centered data. How does this happen? Would this not mean that you can only use mean centering on equally distributed data? If so, what would be the point, as it would be restricted to very few cases? Please help, I'm really stuck.

2. Apr 13, 2010

### jrlaguna

I'm not sure this is the right forum... Anyway... yes, mean centering means that you subtract the mean, so you get only the "deviations". This is needed if you're going to fit it to a standard shape, e.g. a Gaussian or Lorentzian...
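For a data matrix like the one used before PCA, the same subtraction is applied column by column. Here is a minimal sketch in plain Python, assuming rows are samples and columns are variables; the toy matrix `X` and the helper name `center_and_scale` are made up for illustration and are not from the thread. It uses the sample standard deviation (dividing by N - 1), which is also an assumption.

```python
from statistics import mean, stdev

# Hypothetical toy data: 4 samples (rows) x 2 variables (columns).
X = [
    [1.0, 10.0],
    [2.0, 10.0],
    [3.0, 40.0],
    [10.0, 20.0],
]

def center_and_scale(rows):
    """Mean center each column, then divide it by its sample standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]  # sample standard deviation (N - 1)
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    scaled = [[c / s for c, s in zip(row, sds)] for row in centered]
    return centered, scaled

centered, scaled = center_and_scale(X)

# Each mean centered column sums to zero.
column_sums = [sum(col) for col in zip(*centered)]
print(column_sums)  # [0.0, 0.0]
```

The first column here has three values below its mean and only one above, yet its deviations (-3, -2, -1, 6) still sum to zero.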

3. Apr 13, 2010

### physical101

But once you have the deviations from the mean, why is their sum equal to 0?

4. Apr 13, 2010

### jrlaguna

An example: let's say your data are 4, 5, 6. Mean: 5. Subtracting the mean: -1, 0, 1.

5. Apr 13, 2010

### physical101

Okay, but the data above are equally distributed on both sides of the mean. What if you had more negatives than positives? How come this still adds up to 0? So sorry to bother you, just really stuck.

6. Apr 13, 2010

### jrlaguna

Want a proof, eh? :) [btw, no bother at all!]

The mean of $$x_i$$ with $$i=1\cdots N$$ is defined as

$$\bar x={1\over N} \sum x_i$$

OK, now subtract the mean from the data $$y_i=x_i-\bar x$$ and take the mean of these values:

$$\bar y = {1\over N} \sum (x_i - \bar x) = {1\over N} \sum x_i - {1\over N} N \bar x = \bar x - \bar x = 0$$

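In simpler terms: let's say you have some data and its average is 5. If you add 7 to all the values... the new average is 12, right? In the same way, if you subtract 5 from all the values, the new average is 0, and a mean of 0 means the sum is 0 too.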
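To see that this does not depend on the data being evenly spread around the mean, here is a small sketch with made-up, deliberately skewed numbers (the list `data` is purely illustrative). There are more negative deviations than positive ones, but the single positive one is large enough to cancel them.

```python
# Hypothetical skewed data: one large value, the rest small.
data = [1, 1, 1, 2, 15]

x_bar = sum(data) / len(data)            # 20 / 5 = 4.0
deviations = [x - x_bar for x in data]   # [-3.0, -3.0, -3.0, -2.0, 11.0]

negatives = [d for d in deviations if d < 0]
positives = [d for d in deviations if d > 0]

print(len(negatives), len(positives))    # 4 1
print(sum(negatives), sum(positives))    # -11.0 11.0
print(sum(deviations))                   # 0.0
```

So it is the total size of the negative deviations that matches the total size of the positive ones, not their count.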

7. Apr 13, 2010

### physical101

Thank you so much - I have been struggling all day with this - if I knew you I'd buy you chocolates

8. Apr 13, 2010

### jrlaguna

:) I appreciate them even if they're virtual... ;)