Variable selection for CART and SVM?

In summary, when fitting models like CART and SVM, you do not need to include every variable. Various feature selection techniques can identify the most relevant variables and often improve performance.
  • #1
Niendel
Hey, I am taking an applied statistics course and have a question about the analysis of a data set.
I observe a high degree of positive correlation among the variables, and as expected some of them come out non-significant when I fit a general linear regression.
I also want to try methods like CART and SVM on this classification response. When I fit these models, is it necessary to include all the variables? How can I work out what to include for CART and SVM?
If I use only the significant variables that I found through backward selection in the GLM analysis, the model has a smaller error and MRS than if I include all of the variables.
Is there a formal method for doing this kind of variable selection for CART and SVM?
 
  • #2
Yes, there are methods for formally selecting variables for CART and SVM. For CART, you can use feature selection techniques such as recursive feature elimination (RFE), forward selection, and backward elimination, and a fitted tree also reports variable importances that you can rank and threshold; these help you identify the most important features to include in your model. For SVM, you can use a wrapper search such as a genetic algorithm or a grid search over feature subsets to select the most relevant features. You can also use regularization: an L1 penalty drives some feature weights exactly to zero and so removes those features, whereas an L2 penalty only shrinks weights, so it is the L1 form that makes the model sparser and more interpretable.
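As a minimal sketch of two of these ideas (Python/scikit-learn; the synthetic data and parameter values are placeholders, not a recommendation for your actual data set), here is RFE driven by a linear SVM alongside an L1-penalised SVM whose zeroed weights effectively drop features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

# Placeholder data standing in for the correlated predictors described above
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=10, random_state=0)

# Wrapper approach: RFE refits the SVM and drops the weakest feature each round
rfe = RFE(estimator=LinearSVC(dual=False), n_features_to_select=5)
rfe.fit(X, y)
print("kept by SVM-RFE:", np.where(rfe.support_)[0])

# Embedded approach: an L1 penalty drives some SVM weights exactly to zero
l1_svm = LinearSVC(penalty="l1", dual=False, C=0.1).fit(X, y)
print("kept by L1-SVM:", np.where(np.abs(l1_svm.coef_[0]) > 1e-8)[0])
```

The same RFE wrapper can be pointed at a DecisionTreeClassifier instead, since RFE only needs an estimator that exposes coefficients or feature importances.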
 

Related to Variable selection for CART and SVM?

1. What is the purpose of variable selection in CART and SVM models?

The purpose of variable selection in CART (Classification and Regression Trees) and SVM (Support Vector Machines) models is to identify the most important variables or features in predicting the outcome variable. This helps to simplify the model, reduce overfitting, and improve the model's interpretability.

2. How is variable selection done in CART and SVM models?

In CART models, variable selection happens implicitly: at each node the tree splits on the variable that gives the largest impurity reduction (information gain or Gini decrease), so variables that are never chosen for a split contribute nothing to the fitted model. SVM models do not perform variable selection on their own; they fit the separating hyperplane, determined by the support vectors, using all of the supplied features. Variable selection for an SVM is therefore usually done externally, for example by ranking features by the magnitude of a linear SVM's weights (as in SVM-RFE) or by using an L1-penalised SVM that drives some weights to zero.
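To illustrate both points (scikit-learn, synthetic data; the sizes and depths here are arbitrary), you can read off which variables a fitted tree actually split on and rank features by the linear SVM's weight magnitudes:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, n_informative=3,
                           random_state=1)

# Variables the tree never split on contribute nothing to its predictions
tree = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X, y)
used = np.unique(tree.tree_.feature[tree.tree_.feature >= 0])
print("features the tree actually split on:", used)

# For a linear SVM, larger |weight| means more influence on the hyperplane
svm = LinearSVC(dual=False).fit(X, y)
print("features ranked by |w|:", np.argsort(-np.abs(svm.coef_[0])))
```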

3. How do CART and SVM models handle categorical variables in variable selection?

In classical CART, a categorical variable is handled by partitioning its categories into two groups at each node, with the grouping chosen to maximise the impurity reduction (many software implementations do not support this directly and require the variable to be encoded numerically first). In SVM models, categorical variables must be converted to numerical form, typically with one-hot encoding, before the model is trained.
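A small hypothetical example of the one-hot encoding step (scikit-learn; the column names and toy data are invented for illustration):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Invented toy data: one categorical and one numeric predictor
df = pd.DataFrame({"colour": ["red", "blue", "red", "green", "blue", "green"],
                   "size":   [1.2, 3.4, 0.7, 2.2, 3.1, 1.9]})
y = [0, 1, 0, 1, 1, 0]

pre = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["colour"]),
    ("num", StandardScaler(), ["size"]),
])

# The SVM only ever sees the numeric, one-hot encoded design matrix
model = Pipeline([("prep", pre), ("svm", SVC(kernel="rbf"))]).fit(df, y)
print(model.predict(df))
```

Scaling the numeric column alongside the encoding matters for an RBF-kernel SVM, since the kernel is distance-based.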

4. Are there any limitations to variable selection in CART and SVM models?

One limitation in CART models is that the greedy, one-variable-at-a-time splitting may overlook variables whose usefulness only shows up in combination with others. In SVM models, selecting variables by their individual predictive power may miss the true relationship between variables when the classes are not separable in those variables alone. Additionally, both models may overfit if the selected variables are not truly predictive of the outcome variable.

5. Can variable selection be automated in CART and SVM models?

Yes, there are automated variable selection techniques that can be used with CART and SVM models. These include wrapper methods such as forward/backward and stepwise selection, and embedded methods such as the LASSO, whose L1 penalty can shrink coefficients exactly to zero and thereby drop variables (ridge regression, by contrast, only shrinks coefficients and does not remove any). It is important to note that automated methods do not always find the best subset, so you should evaluate the results carefully and bring in subject-matter knowledge during variable selection.
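One common automated pattern, sketched here with scikit-learn (the data and settings are placeholders): let an L1-penalised model choose a feature subset, then fit the SVM (or a tree) on only those columns, evaluating everything inside cross-validation so the selection step does not leak information:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=30, n_informative=5,
                           random_state=2)

# The L1-penalised logistic regression zeroes weak coefficients;
# SelectFromModel keeps only the non-zero ones for the downstream SVM.
pipe = Pipeline([
    ("select", SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=0.5))),
    ("svm", SVC(kernel="rbf")),
])
print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean().round(3))
```

Wrapping the selection and the classifier in one Pipeline is what keeps the comparison honest: the selection is refit on each training fold rather than on the full data.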
