Grouping Non-Continuous Variables

In summary, the conversation discussed alternatives to PCA for collapsing several non-continuous variables into one. Latent Class Analysis was put forward as a candidate. Analysis of Variance was suggested and then withdrawn, since it explains the variance of a continuous dependent variable. Kernel PCA and auto-encoders were also suggested, and it was noted that discrete optimization problems are often NP-hard and benefit from a continuous relaxation.
  • #1
WWGD
Hi All,
Is there a technique other than PCA (Principal Component Analysis) to decide whether it is reasonable to group together, aka "collapse", several non-continuous variables (Categorical, Likert, Ordinal, etc.) into a single one? The idea is, of course, to lose only a negligible amount of explanatory/predictive power by doing this. PCA (possibly Latent Class Analysis, LCA, as well) forms its groups through the use of the covariance matrix.
Questions:
1) Are there other bases/justifications for collapsing several variables into one?
2) To what extent does PCA generalize to non-continuous variables?
Thanks.
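
A minimal sketch of the covariance-based grouping the post describes, assuming that 1-5 Likert codes may be treated as numeric scores (a debatable choice for ordinal data). The responses and the item names q1-q4 are invented for illustration, and the leading component is found by plain power iteration rather than a library routine.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
         for b in range(p)]
        for a in range(p)
    ]

def top_component(cov, iters=500):
    """Leading eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)
    )
    return eigenvalue, v

# Hypothetical 1-5 Likert responses: q1-q3 move together, q4 does not.
responses = [
    [1, 1, 2, 4],
    [2, 2, 2, 1],
    [3, 3, 3, 5],
    [4, 4, 5, 2],
    [5, 5, 4, 3],
    [2, 1, 1, 3],
    [4, 5, 5, 4],
    [3, 3, 2, 2],
]

cov = covariance_matrix(responses)
eigenvalue, loadings = top_component(cov)
total_variance = sum(cov[j][j] for j in range(len(cov)))
explained_share = eigenvalue / total_variance  # share of variance kept by one component

print("loadings:", [round(x, 3) for x in loadings])
print("explained share:", round(explained_share, 3))
```

Items with large loadings of the same sign on the first component (here q1-q3) are the candidates for collapsing. The explained share gives a rough idea of how much variance a single combined score would keep.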
 
  • #2
There is a statistical field called "analysis of variance" that can be used with discrete and non-ordered variables. With it, you can analyze which variables do the most to explain the variance of the data and which do not contribute much that the important variables did not already explain.
 
  • #3
CORRECTION: I'm sorry. The Analysis of Variance (ANOVA) that I know of is to be used to explain the variance of a continuous dependent variable. The independent variables do not have to be continuous. Maybe there are also ways to use it with a discrete or non-ordered dependent variable, but I am not familiar with them. And I don't think it is a good replacement for cluster analysis (which may be what you are looking for, but I am also not familiar with that).

Therefore, I cannot help and will bow out of this thread.
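
A minimal sketch of the setting described above: a one-way ANOVA with a categorical independent variable and a continuous dependent variable, written out by hand. The variable name `scores_by_region`, the level names and the numbers are hypothetical. Applying it to the question of collapsing levels is an illustration, not something the post proposes.

```python
def one_way_anova(groups):
    """Return (F, ss_between, ss_within) for a dict {level: [continuous values]}."""
    all_values = [x for values in groups.values() for x in values]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((x - mean) ** 2 for x in values)
    # F compares the mean square between groups to the mean square within groups.
    f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f_stat, ss_between, ss_within

# Hypothetical data: a continuous score observed at three levels of a categorical factor.
scores_by_region = {
    "north": [4.0, 5.0, 6.0],
    "south": [7.0, 8.0, 9.0],
    "west": [4.0, 5.0, 6.0],
}
f_full, ssb_full, ssw_full = one_way_anova(scores_by_region)

# Collapse "north" and "west" into one level and refit.
collapsed = {
    "north_or_west": scores_by_region["north"] + scores_by_region["west"],
    "south": scores_by_region["south"],
}
f_merged, ssb_merged, ssw_merged = one_way_anova(collapsed)

print(f_full, ssb_full, ssw_full)
print(f_merged, ssb_merged, ssw_merged)
```

In this toy data the between-group sum of squares is 18 both before and after merging, so collapsing the two levels with equal means gives up none of the explained variance. With real data the drop in that quantity is one way to judge how much is lost.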
 
  • #4
FactChecker said:
CORRECTION: I'm sorry. The Analysis of Variance (ANOVA) that I know of is to be used to explain the variance of a continuous dependent variable. [...]
Thanks. If you're interested, Latent Class Analysis does some of this.
 
  • #5
WWGD said:
Thanks. If you're interested, Latent Class Analysis does some of this.
Thanks. I'll check it out.
 
  • #6
FactChecker said:
Thanks. I'll check it out.
No problem. It is actually pretty interesting stuff IMO. As I understand it, it is the study of (quantitative) qualities that are not directly observable, like depression or intelligence: you don't measure them directly, but you observe signs/evidence of these traits and infer from them the existence of the unobservables. It ultimately seems to come down to using some version of PCA and seeing whether the variances line up the right way.
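
A rough sketch of the inference described above, under stated assumptions. Latent class models are usually fit as finite mixtures by expectation-maximization (EM), not by an eigendecomposition, and that is the route taken here. The sketch assumes two latent classes, binary indicators that are independent within a class, and a fixed asymmetric starting point. The function name `fit_latent_classes` and the data are hypothetical.

```python
def fit_latent_classes(data, n_classes=2, iters=200):
    """EM for a latent class model with binary indicators.

    Returns (class weights, per-class item probabilities, posteriors per row).
    """
    n, p = len(data), len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Deterministic, asymmetric start so the classes can separate.
    probs = [[(c + 1) / (n_classes + 1)] * p for c in range(n_classes)]
    posteriors = []
    for _ in range(iters):
        # E-step: posterior probability of each class given the observed pattern.
        posteriors = []
        for row in data:
            joint = []
            for c in range(n_classes):
                like = weights[c]
                for j in range(p):
                    like *= probs[c][j] if row[j] else 1.0 - probs[c][j]
                joint.append(like)
            total = sum(joint)
            posteriors.append([x / total for x in joint])
        # M-step: re-estimate class sizes and item probabilities.
        for c in range(n_classes):
            mass = sum(r[c] for r in posteriors)
            weights[c] = mass / n
            for j in range(p):
                raw = sum(r[c] * row[j] for r, row in zip(posteriors, data)) / mass
                probs[c][j] = min(max(raw, 1e-6), 1.0 - 1e-6)  # keep away from 0 and 1
    return weights, probs, posteriors

# Hypothetical yes/no indicators (1 = sign present) for ten respondents.
observed_signs = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1],
    [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0],
]

weights, probs, posteriors = fit_latent_classes(observed_signs)
for row, post in zip(observed_signs, posteriors):
    print(row, [round(x, 3) for x in post])
```

The unobserved trait is never measured. Each respondent only receives a posterior probability of belonging to each class, inferred from which signs are present.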
 
  • #7
A few thoughts:

If you get tired of regular PCA, you may enjoy mixing in kernels and doing kernel PCA.

You may also check out auto-encoders -- basically neural nets meet PCA.

In general, discrete optimization problems have a habit of being NP-hard, and continuity (or something 'close') turns out to be a very helpful relaxation. Coming at this from a different direction: consider the use of the Fiedler vector in the max cut problem.
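
A small sketch of the relaxation idea, shown on the partitioning task where the Fiedler vector is most commonly applied: spectral bisection, a minimum-cut-style split. It is not an implementation of max cut. The discrete question "which side does each node go on?" is relaxed to a real-valued eigenvector, and the signs of its entries give the split. The graph is hypothetical. Reading its nodes as variables and its edges as strong pairwise association is an assumption made only for illustration.

```python
import math

def fiedler_vector(adjacency, iters=2000):
    """Eigenvector of the second-smallest eigenvalue of the graph Laplacian L = D - A.

    Uses power iteration on (shift*I - L) while projecting out the constant vector.
    Assumes a connected, undirected graph given as a 0/1 adjacency matrix.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    shift = 2.0 * max(degrees)  # upper bound on the largest Laplacian eigenvalue
    v = [i - (n - 1) / 2.0 for i in range(n)]  # start orthogonal to the all-ones vector
    for _ in range(iters):
        # (shift*I - L) v, with L v = D v - A v
        w = [
            shift * v[i] - degrees[i] * v[i]
            + sum(adjacency[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        mean = sum(w) / n
        w = [x - mean for x in w]  # remove any drift toward the constant vector
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adjacency[a][b] = adjacency[b][a] = 1

vector = fiedler_vector(adjacency)
side_a = {i for i, x in enumerate(vector) if x < 0}
side_b = {i for i, x in enumerate(vector) if x >= 0}
print(sorted(side_a), sorted(side_b))
```

The continuous solution is cheap to compute, and rounding it by sign recovers the two tightly connected groups, cutting only the single bridging edge.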
 
  • #8
StoneTemplePython said:
A few thoughts: [...]
Thanks. Congrats on the SA Badge.
 

What is grouping of non-continuous variables?

Grouping of non-continuous variables is a data analysis technique where similar data points are categorized into groups or categories based on their values. This is done to simplify data analysis and make it easier to interpret the data.

Why is grouping of non-continuous variables important?

Grouping of non-continuous variables is important because it helps to simplify complex data and make it easier to understand and interpret. It also allows for easier comparison between data points and identification of patterns or trends.

How is grouping of non-continuous variables done?

Grouping of non-continuous variables is typically done by merging categories (or, for ordinal scales, adjacent levels) into a smaller number of groups, and then assigning each data point to the appropriate group based on its value. This can be done manually or using statistical software.

What are the benefits of grouping non-continuous variables?

The benefits of grouping non-continuous variables include simplifying data analysis, making it easier to interpret data, identifying patterns and trends, and facilitating comparison between data points.

What are the potential drawbacks of grouping non-continuous variables?

One potential drawback of grouping non-continuous variables is the loss of information and precision. Grouping can also lead to biased results if the groups are not chosen carefully. Additionally, grouping can make it more difficult to detect outliers or unusual data points.
