Feature Selection for Revolution: Stats or Subject Matter?

  • Context: Graduate 
  • Thread starter: WWGD

Discussion Overview

The discussion revolves around the factors contributing to the onset of revolution, specifically examining the role of demographic features such as a large population of young people and a population pyramid that is thick at the bottom. Participants explore whether these features are determined by statistical analysis, subject matter knowledge, or a combination of both, and consider methodologies for feature selection in this context.

Discussion Character

  • Debate/contested
  • Exploratory
  • Technical explanation

Main Points Raised

  • Some participants propose that feature selection for revolution onset could involve statistical methods, including ANOVA and cross-validation techniques, to ensure robust findings.
  • Others argue that relying solely on statistics may be flawed due to small sample sizes and the influence of numerous other variables, citing historical counterexamples like the USSR and Germany.
  • One participant suggests that older populations may be less likely to initiate revolutions due to their vested interests in the status quo, while younger populations may be more inclined to seek change.
  • Socioeconomic factors, such as wealth and the Gini coefficient, are mentioned as possible influences on the likelihood of revolutions, indicating that demographic features alone may not be sufficient to explain revolutionary activity.

Areas of Agreement / Disagreement

Participants express differing views on the validity of using statistical methods to predict revolutionary outcomes, with some emphasizing the limitations of such approaches and others advocating for their use in conjunction with subject matter knowledge. The discussion remains unresolved regarding the best methodology for feature selection and the role of demographics in revolutions.

Contextual Notes

Participants highlight the complexity of establishing causal relationships between demographic features and revolutionary activity, noting the influence of external factors and the potential for statistical analysis to yield misleading results.

WWGD
TL;DR
How to find features that provide a high correlation with a dependent variable.
Hi, I remember reading a paper a while back that argued/proved that a large population of young people (say, under 19 years old or so) and a population pyramid that is thick at the bottom are necessary features for the onset of revolution.
**My Question** Is this determination based on statistics alone, subject matter knowledge, or a combination of both? What process would one follow to do feature selection, that is, to choose which features to associate with a given dependent variable, other than just using basic correlation analysis? Maybe some type of ANOVA?
 
WWGD said:
Summary: How to find features that provide a high correlation with a dependent variable.

Hi, I remember reading a paper a while back that argued/proved that a large population of young people (say, under 19 years old or so) and a population pyramid that is thick at the bottom are necessary features for the onset of revolution.
**My Question** Is this determination based on statistics alone, subject matter knowledge, or a combination of both? What process would one follow to do feature selection, that is, to choose which features to associate with a given dependent variable, other than just using basic correlation analysis? Maybe some type of ANOVA?
There are a number of ways.
https://scikit-learn.org/stable/modules/feature_selection.html

Always make sure that you test on a subset of the data that didn't inform the selection. For example, if you are using the features in a predictive model and evaluating it with cross-validation, do the feature selection inside the cross-validation loop, not beforehand on the full data. If your feature selection process includes a search for optimal parameters, do that within an inner (nested) cross-validation loop. In classical machine learning, the feature selection process is often itself part of the model (the whole pipeline is, including preprocessing, feature selection, and parameter tuning).
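
As a rough sketch of what that can look like with scikit-learn: the data below is synthetic, and the choice of an ANOVA F-test selector, logistic regression, and the grid of `k` values are all illustrative assumptions, not a recommendation for the revolution question. The point is only that scaling, selection, and tuning are refit inside every training fold, so the outer test folds never inform which features get picked.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 150 samples, 40 candidate features,
# of which only 5 actually carry signal about the binary outcome.
X, y = make_classification(n_samples=150, n_features=40, n_informative=5,
                           n_redundant=0, random_state=0)

# Preprocessing, feature selection, and the classifier form one estimator,
# so each step is fit only on the training part of whatever split is used.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),  # ANOVA F-test per feature
    ("model", LogisticRegression(max_iter=1000)),
])

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

# Inner loop: choose how many features to keep, using training data only.
search = GridSearchCV(pipeline, param_grid={"select__k": [2, 5, 10, 20]},
                      cv=inner_cv)

# Outer loop: score the whole "select + tune + fit" procedure on held-out folds.
scores = cross_val_score(search, X, y, cv=outer_cv)
print("outer-fold accuracy:", [round(float(s), 3) for s in scores])
print("mean accuracy:", round(float(scores.mean()), 3))
```

The outer-fold mean is the number to report. The best score found by the inner search is optimistically biased, because it was used to pick `k`.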

In some cases there is a large number of candidate features, and just searching for the best ones can fail, because with enough candidates noise alone is likely to make some feature look correlated with the outcome. In those cases, and to some extent in general, it is important to also have some reason to believe that the feature might have a causal relationship with the outcome, or that the population distributions should show a correlation. That way you begin with a hypothesis and a much smaller number of candidates, and you have a better chance that your finding is reliable.
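
A small simulation makes the point concrete. This is a minimal sketch using only the standard library; the sample size of 30 (think of 30 countries) and the feature counts are made-up numbers. Every feature is pure noise, yet the best absolute correlation found can only grow as more candidates are searched.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_noise_correlation(n_samples, checkpoints, seed=0):
    """Largest |r| between a random target and the first k pure-noise features,
    recorded for each k in `checkpoints`."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    results = {}
    for k in range(1, max(checkpoints) + 1):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]  # unrelated to target
        best = max(best, abs(pearson(feature, target)))
        if k in checkpoints:
            results[k] = best
    return results


# Hypothetical setting: 30 observations, up to 1000 candidate features.
results = best_noise_correlation(n_samples=30, checkpoints=(1, 10, 100, 1000))
for k, r in results.items():
    print(f"{k:>5} noise features -> best |r| = {r:.2f}")
```

None of these features has any relationship to the target, so whatever correlation the search turns up is an artifact of having looked at many candidates with few observations.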

It is believed that a very large share of published statistical research is faulty because of this issue. Different scientific fields and sub-fields are always working towards more robust methodology to avoid these kinds of pitfalls. For example, the p-value threshold that can be relied on depends strongly on the application. Many, many works have presented false discoveries, or bad results in general, because of this issue, for example by assuming p<0.05 is enough (not to mention p-hacking).
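
To see why a fixed p<0.05 threshold is fragile when many features are tested, here is a back-of-the-envelope calculation. It assumes, for simplicity, that the tests are independent and that none of the features has a real effect; real candidate features are usually correlated, so treat the numbers as illustrative only.

```python
def prob_any_false_positive(alpha, n_tests):
    """P(at least one p-value < alpha) when every null hypothesis is true
    and the tests are independent (a simplifying assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


for m in (1, 5, 20, 100):
    p_any = prob_any_false_positive(0.05, m)
    print(f"{m:>3} tests at alpha=0.05 -> P(any false positive) = {p_any:.2f}")
```

With 20 independent tests, the chance of at least one spurious "significant" feature is already about 64%.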
 
BWV
I don't think statistics will prove anything, as the samples are too small and there are too many other variables. Did the USSR have a revolution in 1991? What about Germany in 1918? If so, there are counterexamples. Until the last few decades, almost every country had a pyramid-shaped demographic profile, except for periods when war had killed a large number of younger people (like the post-war USSR). The countries without a large number of young people over the past 20-30 years have tended to be rich liberal democracies, so is the lack of revolutions in Western Europe due to being rich or being old?
 
BWV said:
I don't think statistics will prove anything, as the samples are too small and there are too many other variables. Did the USSR have a revolution in 1991? What about Germany in 1918? If so, there are counterexamples. Until the last few decades, almost every country had a pyramid-shaped demographic profile, except for periods when war had killed a large number of younger people (like the post-war USSR). The countries without a large number of young people over the past 20-30 years have tended to be rich liberal democracies, so is the lack of revolutions in Western Europe due to being rich or being old?
The idea is that older people usually have other concerns, like work and taking care of their families, and tend to have more invested in the status quo than younger people do, so they are less willing to threaten their station in life by trying to overthrow the system. It may be that a pyramidal distribution increases the odds, but a level of general discontent must prevail too. Maybe a Gini coefficient beyond a certain point contributes as well.
 
