Understanding Mean Centering in Spectroscopic Data Analysis

  • Thread starter: physical101
  • Tags: Data, Mean
AI Thread Summary
Mean centering in spectroscopic data analysis involves subtracting the mean from each data point, resulting in a dataset whose average is zero. This operation allows for the analysis of deviations from the mean, which is essential for techniques like PCA. The sum of these deviations equals zero because the mean is defined as the sum of the data divided by the number of points, so the positive and negative deviations always balance out. This holds regardless of the distribution of the data: in a skewed dataset, a few large deviations on one side of the mean are offset by many small deviations on the other. Mean centering only shifts the data; it does not change its shape or remove skew.
physical101

Homework Statement



Hi there. I have been working on spectroscopic data, and I mean centered them all before carrying out PCA on them. The mean centering and standardisation operations are simple: just take away the mean and divide through by the standard deviation, respectively. Only today, however, have I wondered how the mean centering operation actually works. If you take away the mean from the dataset, do you not redistribute the data so that each new value represents the distance of the original data point from the mean? How, then, is the data mean centered such that each column of the matrix sums to 0? I would have thought that for this to be true, the negative values must cancel the positive values in the mean centered data. How does this happen? Would this not mean that you can only use mean centering on equally distributed data? If so, what would be the point, as it would be restricted to very few cases? Please help, I am really stuck.
 
I'm not sure this is the right forum... Anyway... yes, mean centering means that you subtract the mean, so you get only the "deviations". This is needed if you're going to fit the data to a standard shape, e.g. a Gaussian or Lorentzian...
 
But once you have the deviations from the mean, why is their sum equal to 0?
 
An example. Let's say your data are 4, 5, 6. Mean: 5. Subtracting the mean: -1, 0, 1. These sum to 0.
 
Okay, but the data above are equally distributed on both sides of the mean. What if you had more negatives than positives? How come this still adds up to 0? So sorry to bother you, just really stuck.
 
Want a proof, eh? :) [btw, no bother at all!]

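The mean of x_i with i=1\cdots N is defined as

\bar x={1\over N} \sum_{i=1}^N x_i

OK, now subtract the mean from the data, y_i=x_i-\bar x, and take the mean of these values:

\bar y = {1\over N} \sum_{i=1}^N (x_i - \bar x) = {1\over N} \sum_{i=1}^N x_i - {1\over N} N \bar x = \bar x - \bar x = 0

Since the mean of the y_i is 0, their sum N \bar y is 0 as well, however the x_i are distributed.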

In simpler terms: let's say you have some data and its average is 5. If you add 7 to all the values... the new average is 12, right? Likewise, if you subtract 5 (the mean itself) from all the values, the new average is 0.
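
A quick numerical check may help here. The sketch below is only an illustration, not anything from the original posts: the sample values, the `spectra` matrix and its exponential distribution are all made-up assumptions, chosen to be lopsided on purpose. It uses NumPy to show that the deviations still sum to zero when there are more negatives than positives, and that each column of a mean centered matrix sums to zero (up to floating-point rounding).

```python
import numpy as np

# Hypothetical lopsided sample: three values below the mean, one far above it.
x = np.array([1.0, 1.0, 1.0, 9.0])
dev = x - x.mean()  # the mean is 3.0, so the deviations are -2, -2, -2, 6

# Three small negative deviations are balanced by one large positive one.
print(dev, dev.sum())

# Hypothetical data matrix: rows are samples, columns are variables
# (e.g. wavelengths). Exponential draws are deliberately skewed.
rng = np.random.default_rng(0)
spectra = rng.exponential(scale=2.0, size=(50, 4))

# Mean centering: subtract each column's own mean from that column.
centered = spectra - spectra.mean(axis=0)

# Every column of the centered matrix sums to zero, up to rounding error.
col_sums = centered.sum(axis=0)
print(np.allclose(col_sums, 0.0))
```

Note that the centered columns are just as skewed as before; only their location has moved so that the mean sits at zero.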
 
Thank you so much - I have been struggling all day with this - if I knew you I'd buy you chocolates.
 
:) I appreciate them even if they're virtual... ;)
 