Regression: Regularization parameter 0

SchroedingersLion
TL;DR Summary
ScikitLearn's Elastic Net regression gives a regularization hyperparameter of 0, implying that ordinary least squares is the best method.
Hi guys,

I am using ScikitLearn's Elastic Net implementation to perform regression on a data set where the number of data points is larger than the number of features. The routine uses cross-validation to find the two hyperparameters: ElasticNetCV.
The elastic net minimizes ##\frac{1}{2N} \|y - Xw\|_2^2 + \alpha c \|w\|_1 + \frac{1}{2} \alpha (1-c) \|w\|_2^2##, where ##\alpha## and ##c## are the hyperparameters.
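
For reference, a minimal sketch of how that routine is typically invoked; the data and the hyperparameter grids here are purely illustrative, not the actual setup from this thread:

```python
# Minimal sketch of the setup described above; data and grids are illustrative.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                      # N = 500 points, 5 features
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=500)

# Cross-validates jointly over alpha and l1_ratio (the "c" in the formula).
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0],
                     alphas=np.logspace(-4, 1, 50),
                     cv=5).fit(X, y)
print(model.alpha_, model.l1_ratio_)               # selected hyperparameters
```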

However, I obtain a hyperparameter of ##\alpha=0##, which means the routine prefers no regularization at all. I was wondering what this means. The regularization is done in order to decrease overfitting on test data. What does a parameter of 0 imply? Does it mean I cannot have overfitting in this case?

SL
 
Generally, if the number of features is much smaller than the number of data points and there is little correlation between the features, then OLS is the best method.
 
So overfitting is not an issue when I have more data points than features?
That makes some sense: if I want to fit a model with n parameters, and I have N >> n data points, I cannot hit each data point as closely as I want, not even in the training data. So overfitting is suppressed. (A quick synthetic check is sketched below.)
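
The sketch, on purely synthetic data with arbitrary sizes: with N far larger than the number of features, plain OLS gives nearly identical train and test scores, i.e. there is little room to overfit.

```python
# Purely synthetic check: with N >> number of features, OLS train and
# test scores come out almost identical, i.e. overfitting is suppressed.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
N, n_features = 10_000, 5                 # N >> n_features
X = rng.normal(size=(N, n_features))
y = X @ rng.normal(size=n_features) + rng.normal(scale=1.0, size=N)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
ols = LinearRegression().fit(X_tr, y_tr)
print("train R^2:", ols.score(X_tr, y_tr))
print("test  R^2:", ols.score(X_te, y_te))   # nearly the same as the train score
```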
 
The elastic net is a combination of the lasso and ridge regression and will penalize collinear and low t-stat variables, so if you get the same results as OLS, your predictors are fine.
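
A small synthetic illustration of that point (the penalty settings are arbitrary): with a nearly duplicated predictor, OLS may split the weight erratically between the two copies, while the elastic net shrinks and stabilizes the coefficients.

```python
# Synthetic illustration: a nearly duplicated predictor. OLS may split the
# weight erratically between the copies; the elastic net shrinks them.
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression

rng = np.random.default_rng(2)
x1 = rng.normal(size=1000)
x2 = x1 + rng.normal(scale=0.01, size=1000)   # almost a copy of x1
X = np.column_stack([x1, x2])
y = 2.0 * x1 + rng.normal(scale=0.5, size=1000)

print(LinearRegression().fit(X, y).coef_)                    # offsetting, unstable
print(ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y).coef_)   # shrunk, stable
```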
 
Thank you!
 
I should add that none of these methods deal well with autocorrelation; you need GLS or GMM for that.
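
One option for the GLS route is statsmodels (my own suggestion here, not something from the thread); the sketch below assumes an AR(1) error structure and uses GLSAR, which alternates between estimating the AR coefficient and a GLS fit.

```python
# Sketch with statsmodels: GLSAR fits a regression with AR(p) errors by
# alternating between estimating the AR coefficient(s) and a GLS fit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
N = 500
X = sm.add_constant(rng.normal(size=(N, 2)))

# AR(1)-correlated noise: e_t = 0.8 * e_{t-1} + u_t
e = np.zeros(N)
for t in range(1, N):
    e[t] = 0.8 * e[t - 1] + rng.normal()
y = X @ np.array([1.0, 2.0, -1.0]) + e

model = sm.GLSAR(y, X, rho=1)            # rho=1 -> AR(1) error structure
results = model.iterative_fit(maxiter=10)
print(results.params)                    # regression coefficients
print(model.rho)                         # estimated AR(1) coefficient
```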
 
