Do higher-order moments mean more attention to local areas?

SUMMARY

This discussion centers on the relationship between higher-order moments and how algorithms such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA) represent data. The first moment (mean) describes location and the second (variance) describes overall spread, while the third (skewness) measures asymmetry and the fourth (kurtosis) weights the tails heavily, so higher-order moments put increasing weight on outliers. One reply notes that odd moments capture lopsidedness while even moments capture spread; another points to the characteristic function (the Fourier transform of the PDF) as the link between moments and frequency information. The thread does not settle whether ICA's reliance on higher-order statistics explains why its components capture local features while PCA, which uses only the covariance matrix, captures global trends.

PREREQUISITES
  • Understanding of statistical moments: mean, variance, skewness, and kurtosis.
  • Familiarity with Principal Component Analysis (PCA) and Independent Component Analysis (ICA).
  • Knowledge of statistical analysis algorithms and their applications.
  • Basic concepts of probability density functions (PDFs) and their Fourier transforms (characteristic functions).
NEXT STEPS
  • Research the implications of higher order moments in statistical analysis.
  • Study the differences between PCA and ICA in depth, focusing on their mathematical foundations.
  • Explore the role of kurtosis in data analysis and its effect on outlier detection.
  • Learn about the relationship between moments and frequency information in probability distributions.
USEFUL FOR

Data scientists, statisticians, and machine learning practitioners interested in advanced data analysis techniques and the theoretical foundations of algorithms like PCA and ICA.

Wenlong
Dear all,

Sorry to post this question in this section again.

I am currently looking into a few statistical analysis algorithms. I noticed that they use moments or cumulants of different orders to analyse the data. I guess this is because these algorithms focus on different aspects of the data itself.

As far as I know, the 1st (mean) and 2nd (variance) moments focus on the dispersion of the data as a whole, the 3rd moment (skewness) looks into the tail area of the distribution, and the 4th moment (kurtosis) concentrates on the peaks.
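For reference, here are the standard definitions of the quantities mentioned, for a random variable X with mean μ and standard deviation σ, together with the third and fourth cumulants κ3 and κ4 that some algorithms use instead:

```latex
\mu = \mathbb{E}[X], \qquad
\sigma^2 = \mathbb{E}\big[(X-\mu)^2\big], \qquad
\gamma_1 = \frac{\mathbb{E}\big[(X-\mu)^3\big]}{\sigma^3}, \qquad
\beta_2 = \frac{\mathbb{E}\big[(X-\mu)^4\big]}{\sigma^4}

\kappa_3 = \mathbb{E}\big[(X-\mu)^3\big], \qquad
\kappa_4 = \mathbb{E}\big[(X-\mu)^4\big] - 3\sigma^4, \qquad
\gamma_1 = \frac{\kappa_3}{\sigma^3}, \qquad
\beta_2 - 3 = \frac{\kappa_4}{\sigma^4}
```

Skewness γ1 and kurtosis β2 are standardised, so they do not depend on the scale of X; the excess kurtosis β2 − 3 (equivalently κ4/σ^4) is zero for a Gaussian.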

Can I then deduce that using higher moments means the algorithm pays more attention to local statistical properties?

Can anyone give me an explicit answer to help me out of this headache, or recommend some books or papers? I greatly appreciate your kind consideration and help.

Best wishes
Wenlong
 
The odd and even moments tend to behave differently. Odd moments will naturally tell you things about lopsidedness (because an odd power of a negative number is negative). The mean is a measure of lopsidedness compared with a distribution more evenly placed about 0. Even moments treat both sides equally, so say more about spread.
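A quick numerical check of this point, as a minimal pure-Python sketch. The exponential sample, the seed, and the helper name `central_moment` are illustrative choices, not from the thread. Mirroring a right-skewed sample about 0 flips the sign of the odd central moments and leaves the even ones unchanged.

```python
import random


def central_moment(xs, k):
    """k-th central moment of a sample (divides by n, not n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


random.seed(1)
right_skewed = [random.expovariate(1.0) for _ in range(10_000)]
left_skewed = [-x for x in right_skewed]  # mirror image of the same sample about 0

# Odd k: same magnitude, opposite sign. Even k: identical values.
for k in (2, 3, 4):
    print(k, round(central_moment(right_skewed, k), 3),
          round(central_moment(left_skewed, k), 3))
```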
Higher order moments put a heavier weight on the outliers. A distribution with a sharp peak and long tails will have a higher kurtosis than one with the same variance but a broader centre that then falls off quickly.
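To see the kurtosis comparison numerically, here is a minimal pure-Python sketch. The three distributions, the sample size, the seed, and the helper name `variance_and_excess_kurtosis` are illustrative choices, not from the thread. All three samples have variance close to 1: the Laplace distribution has a sharp peak and long tails, the uniform distribution has a broad flat centre that stops abruptly, and the Gaussian sits in between. Their population excess kurtoses are 3, 0 and −1.2 respectively; sample estimates scatter around these values.

```python
import math
import random


def variance_and_excess_kurtosis(xs):
    """Return (variance, excess kurtosis) of a sample, dividing by n."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return var, m4 / var ** 2 - 3.0


random.seed(2)
n = 20_000
b = 1 / math.sqrt(2)  # Laplace scale chosen so that the variance 2*b**2 equals 1
h = math.sqrt(3)      # uniform on [-h, h] also has variance 1
samples = {
    # Sharp peak, long tails: the difference of two Exp(rate 1/b) variables is Laplace(0, b)
    "laplace": [random.expovariate(1 / b) - random.expovariate(1 / b) for _ in range(n)],
    "gaussian": [random.gauss(0.0, 1.0) for _ in range(n)],
    # Broad flat centre that stops abruptly
    "uniform": [random.uniform(-h, h) for _ in range(n)],
}
results = {name: variance_and_excess_kurtosis(xs) for name, xs in samples.items()}
for name, (var, kurt) in results.items():
    print(f"{name:8s} variance={var:.2f}  excess kurtosis={kurt:+.2f}")
```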
 
Hi, Haruspex

Thank you very much for your reply. It helps a lot.

Then may I ask a further question based on this? Take PCA and ICA (independent component analysis) as an example: PCA computes principal components from the covariance matrix (a 2nd-order moment), while ICA computes independent components using negentropy (measured via kurtosis or other higher-order moments).

Comparing the principal components and independent components of the same set of observations, I find that the independent components are better at representing local features, while the principal components are better at representing global trends.
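To make the second-order versus fourth-order contrast concrete, here is a minimal two-dimensional sketch in pure Python. It is a toy under stated assumptions, not a standard ICA implementation: the sources are uniform (sub-Gaussian), the mixing matrix `[[1, 0.5], [0.5, 1]]` is hypothetical, and instead of a negentropy approximation or FastICA it whitens the data with PCA and then brute-force searches for the rotation that maximises the summed |excess kurtosis|. All helper names (`pca_2d`, `ica_2d`, and so on) are made up for this example.

```python
import math
import random


def centered(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def variance(xs):
    c = centered(xs)
    return sum(v * v for v in c) / len(c)


def excess_kurtosis(xs):
    c = centered(xs)
    n = len(c)
    var = sum(v * v for v in c) / n
    return sum(v ** 4 for v in c) / n / var ** 2 - 3.0


def corr(a, b):
    ca, cb = centered(a), centered(b)
    cov = sum(x * y for x, y in zip(ca, cb))
    return cov / math.sqrt(sum(x * x for x in ca) * sum(y * y for y in cb))


def rotate(u, v, theta):
    """Project the 2-D points (u, v) onto axes rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return ([c * a + s * b for a, b in zip(u, v)],
            [-s * a + c * b for a, b in zip(u, v)])


def pca_2d(x1, x2):
    """PCA from the 2x2 sample covariance matrix: second-order statistics only."""
    c1, c2 = centered(x1), centered(x2)
    n = len(c1)
    a = sum(v * v for v in c1) / n
    b = sum(u * v for u, v in zip(c1, c2)) / n
    d = sum(v * v for v in c2) / n
    phi = 0.5 * math.atan2(2 * b, a - d)  # angle of the leading eigenvector
    return rotate(c1, c2, phi)            # (first PC, second PC)


def ica_2d(x1, x2, steps=90):
    """Whiten with PCA, then pick the rotation that maximises summed |excess kurtosis|."""
    p1, p2 = pca_2d(x1, x2)
    w1 = [v / math.sqrt(variance(p1)) for v in p1]
    w2 = [v / math.sqrt(variance(p2)) for v in p2]
    best = None
    for k in range(steps):
        theta = (math.pi / 2) * k / steps  # beyond pi/2 rotations only permute/flip components
        y1, y2 = rotate(w1, w2, theta)
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]


random.seed(0)
n = 4000
h = math.sqrt(3.0)  # uniform on [-h, h] has unit variance
s1 = [random.uniform(-h, h) for _ in range(n)]
s2 = [random.uniform(-h, h) for _ in range(n)]
x1 = [a + 0.5 * b for a, b in zip(s1, s2)]  # hypothetical mixing matrix [[1, 0.5], [0.5, 1]]
x2 = [0.5 * a + b for a, b in zip(s1, s2)]

pcs = pca_2d(x1, x2)
ics = ica_2d(x1, x2)
# Components are only defined up to sign, scale and order, hence max |corr|.
for name, comps in (("PCA", pcs), ("ICA", ics)):
    match = [round(max(abs(corr(c, s)) for c in comps), 3) for s in (s1, s2)]
    kurt = [round(excess_kurtosis(c), 2) for c in comps]
    print(name, "max |corr| with each source:", match, "excess kurtosis:", kurt)
```

In this toy the principal components remain mixtures of both sources (correlation of roughly 0.7 with each), because the covariance matrix cannot tell apart rotations of the whitened data; the kurtosis term is what singles out the rotation that lines the components up with individual sources. This only illustrates what each method optimises; it does not by itself show why independent components look more "local" in a particular data set.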

Is this because of the different orders of moments they use, or is it just a coincidence?

Many thanks in advance.

Best wishes
Wenlong
 

BTW, how can I reply to a respondent directly in this forum?
 

Hey Wenlong.

Are you aware of the relationship between the moments and the characteristic function of a probability distribution, and of how the Fourier and inverse Fourier transforms are interpreted in terms of frequency information?

This will help you understand the relationship between the various moments (not central moments, just moments) and the frequency information of the PDF itself.
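One standard way to write down the relationship this reply refers to, for a random variable X with density f_X. The e^{itx} sign convention is the usual one for characteristic functions (some Fourier-transform conventions use e^{-itx} instead), the inversion formula assumes φ_X is integrable, and the moment formulas assume E|X|^n is finite:

```latex
\varphi_X(t) = \mathbb{E}\big[e^{itX}\big] = \int_{-\infty}^{\infty} e^{itx} f_X(x)\,dx,
\qquad
f_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\varphi_X(t)\,dt

\mathbb{E}\big[X^k\big] = i^{-k}\,\varphi_X^{(k)}(0) \quad (k \le n),
\qquad
\varphi_X(t) = \sum_{k=0}^{n} \frac{(it)^k}{k!}\,\mathbb{E}\big[X^k\big] + o\big(t^n\big) \quad (t \to 0)
```

So the raw moments are, up to the factor i^k/k!, the Taylor coefficients of the characteristic function at t = 0: they describe the Fourier transform of the PDF in a neighbourhood of zero frequency, with each higher k supplying the next term of that expansion.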
 
Wenlong said:
Take PCA and ICA (independent component analysis) as an example: PCA computes principal components from the covariance matrix (a 2nd-order moment), while ICA computes independent components using negentropy (measured via kurtosis or other higher-order moments).

Comparing the principal components and independent components of the same set of observations, I find that the independent components are better at representing local features, while the principal components are better at representing global trends.

Is this because of the different orders of moments they use, or is it just a coincidence?
You've gone beyond my limits of expertise with that one.
As far as I've been able to discern:
- PCA is often used as a preliminary (whitening) step for ICA anyway;
- ICA requires non-Gaussianity in (all but one of) the sources, whereas PCA does not;
- ICA doesn't rank the components
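The second point can be illustrated with a small pure-Python sketch; the sample size, seed, angle grid and helper names are hypothetical choices. Rotating a pair of independent, unit-variance Gaussian sources leaves the excess kurtosis near 0 at every angle, so a kurtosis-based contrast has nothing to lock on to, whereas for uniform sources it varies clearly with the angle. A standard bivariate Gaussian is rotationally symmetric, which is why the rotation left over after whitening cannot be identified when the sources are Gaussian.

```python
import math
import random


def excess_kurtosis(xs):
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 4 for x in xs) / n / var ** 2 - 3.0


def kurtosis_by_angle(s1, s2, degrees=range(0, 91, 5)):
    """Excess kurtosis of cos(t)*s1 + sin(t)*s2 for each rotation angle t (in degrees)."""
    out = {}
    for deg in degrees:
        c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
        out[deg] = excess_kurtosis([c * a + s * b for a, b in zip(s1, s2)])
    return out


random.seed(3)
n = 20_000
h = math.sqrt(3)  # uniform on [-h, h] has unit variance
gauss = kurtosis_by_angle([random.gauss(0, 1) for _ in range(n)],
                          [random.gauss(0, 1) for _ in range(n)])
unif = kurtosis_by_angle([random.uniform(-h, h) for _ in range(n)],
                         [random.uniform(-h, h) for _ in range(n)])
for deg in gauss:
    print(f"{deg:3d} deg  gaussian={gauss[deg]:+.2f}  uniform={unif[deg]:+.2f}")
```

For the uniform pair the population values run from −1.2 at 0° and 90° to −0.6 at 45°, so the angle that separates the sources stands out; for the Gaussian pair every angle looks the same up to sampling noise.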
 

Similar threads

  • 4 replies · 3K views
  • 6 replies · 2K views
  • 1 reply · 3K views
  • 3 replies · 1K views
  • 9 replies · 13K views
  • 8 replies · 2K views
  • 13 replies · 3K views
  • 10 replies · 5K views
  • 11 replies · 4K views
  • 23 replies · 37K views