PCA (principal component analysis) on standardized data

Discussion Overview

The discussion revolves around the use of standardized data in principal component analysis (PCA), specifically comparing the normalization of data using a correlation matrix versus converting all measurements to the same units. The context includes theoretical considerations and practical implications in data analysis.

Discussion Character

  • Debate/contested
  • Technical explanation

Main Points Raised

  • Some participants question the advantage of using a correlation matrix for normalization over simply converting all measurements to a consistent unit, such as meters per second.
  • Others express confusion about the definition of "normalizing" data and whether it involves calculating z-scores.
  • One participant mentions that their professor analyzed race data using both methods and compared the results, suggesting that different approaches may yield different outcomes.
  • There is a suggestion that defining what "better" means mathematically is necessary to evaluate the effectiveness of each method.
  • Some participants note that the order of unit conversion and PCA application may lead to different results, but do not conclude which method is superior.

Areas of Agreement / Disagreement

Participants express differing views on the effectiveness of using a correlation matrix versus unit conversion for normalization in PCA. The discussion remains unresolved, with no consensus on which method is preferable.

Contextual Notes

Participants highlight the need for clarity on definitions and the mathematical implications of the methods discussed. There is an acknowledgment that the choice of method may depend on specific contexts and interpretations of "better."

cutesteph
Why is it better to use standardized data, i.e. the correlation matrix, than to just convert the data into the same units? Say I had car race data where some of the times were measured in seconds and the rest in minutes. Why would it be better to use the correlation matrix to normalize the data than to just convert all the times to, say, meters traveled per second?
 
cutesteph said:
Why would it be better to use the correlation matrix to normalize the data than to just convert all the times to, say, meters traveled per second?

I don't understand this sentence, but in general data analysis requires all data to have the same units.
 
I mean, say we are looking at car race data where the times for the 1/4 mile and 1 mile races are in seconds, while the times for the 10 mile and 50 mile races are in minutes. Can't you normalize the data using the correlation matrix within each group, so that a 1/4 mile race, even though it is in seconds, can be compared to a 10 mile race, even though it is in minutes? My professor analyzed data that way in a lecture and compared it to a method where you just convert all units to meters per second and take the covariance matrix of that.
 
cutesteph said:
Why would it be better to use the correlation matrix to normalize the data than to just convert all the times to, say, meters traveled per second?

What is your definition of "normalizing" the data? Does it amount to replacing the data on each axis by the "z-score" of the data?
 
Yes. That would be equivalent to using the correlation matrix in lieu of the covariance matrix for PCA. I'm just not sure exactly why it would be better to use that method than to just change everything to the same units, say meters per second in my example, for each race length.
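
For anyone who wants to check that equivalence numerically, here is a minimal sketch in NumPy. The data are synthetic (made up for illustration, not the professor's race data), and the variable names are my own. It only shows that the covariance matrix of z-scored columns matches the correlation matrix of the raw columns, so PCA on either gives the same components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 3 correlated columns on very different scales.
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
X = (rng.normal(size=(200, 3)) @ mixing) * np.array([1.0, 60.0, 0.01])

# z-score each column (sample standard deviation, ddof=1, to match np.cov).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

cov_of_z = np.cov(Z, rowvar=False)        # covariance of standardized data
corr_of_x = np.corrcoef(X, rowvar=False)  # correlation of raw data

# The two matrices agree up to floating-point error.
same = np.allclose(cov_of_z, corr_of_x)
```

Since the two matrices are the same, their eigenvectors and eigenvalues are too, which is all PCA uses.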
 
cutesteph said:
I'm just not sure exactly why it would be better to use that method than to just change everything to the same units, say meters per second in my example, for each race length.

We'd have to define what "better" means mathematically to investigate that question.

Perhaps the professor was illustrating that you can get different answers if you convert units and do PCA than if you do PCA and convert the units in the principal components afterwards. That difference doesn't mean that one way is always better or worse than the other.
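
A rough way to see the unit dependence, as a sketch only: the data below are synthetic and the helper `explained_ratio` is a name I made up. It rescales one column (as if switching it from seconds to minutes) and compares what happens to covariance-based PCA and to the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Two synthetic columns that share a common factor, both in "seconds".
common = rng.normal(size=n)
X = np.column_stack([common + 0.5 * rng.normal(size=n),
                     common + 0.5 * rng.normal(size=n)])

# Same data with the second column expressed in "minutes" instead.
X_rescaled = X * np.array([1.0, 1.0 / 60.0])

def explained_ratio(matrix):
    """Fraction of total variance per principal component, largest first."""
    eigenvalues = np.linalg.eigvalsh(matrix)  # returned in ascending order
    return eigenvalues[::-1] / eigenvalues.sum()

ratio_seconds = explained_ratio(np.cov(X, rowvar=False))
ratio_minutes = explained_ratio(np.cov(X_rescaled, rowvar=False))

# Covariance-based PCA changes when one column's units change...
cov_changed = not np.allclose(ratio_seconds, ratio_minutes)

# ...while the correlation matrix is unaffected by the rescaling.
corr_same = np.allclose(np.corrcoef(X, rowvar=False),
                        np.corrcoef(X_rescaled, rowvar=False))
```

This does not say which approach is "better". It only illustrates that covariance-based PCA depends on the choice of units, while correlation-based PCA does not.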
 
