A Regression: Regularization parameter 0

SchroedingersLion
TL;DR Summary
Scikit-learn's Elastic Net regression gives a regularization hyperparameter of 0, implying that ordinary least squares is the best method.
Hi guys,

I am using scikit-learn's Elastic Net implementation, ElasticNetCV, to perform regression on a data set where the number of data points is larger than the number of features. The routine uses cross-validation to find the two hyperparameters.
The elastic net minimizes ##\frac {1}{2N} ||y-Xw||^2 + \alpha c ||w||_1 + \frac 1 2 \alpha (1-c) ||w||_2^2 ##, where ##\alpha## and ##c## are the hyperparameters.

However, I obtain a hyperparameter of ##\alpha=0##, which means the routine prefers no regularization at all. I was wondering what this means. Regularization is meant to reduce overfitting, i.e. poor generalization to test data. What does a parameter of 0 imply? Does it mean I cannot have overfitting in this case?

SL
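A minimal sketch of this setup, with synthetic data (the data, `w_true`, and the `l1_ratio` grid here are illustrative, not from the actual problem):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
N, n = 200, 5                       # many more data points than features
X = rng.normal(size=(N, n))
w_true = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=N)

# 5-fold cross-validation over both hyperparameters:
# alpha (overall strength) and l1_ratio (the mixing parameter c)
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(X, y)
print(model.alpha_, model.l1_ratio_)
```

Note that with its default grid, ElasticNetCV never tries ##\alpha=0## exactly (the grid runs from a data-dependent maximum down to a small positive value), so obtaining exactly 0 suggests a user-supplied alpha grid.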
 
Generally, if the number of features is much smaller than the number of data points and there is little correlation between the features, then OLS is the best method.
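In that regime the unregularized fit can be computed directly with ordinary least squares. A small sketch with synthetic, uncorrelated features (names like `w_true` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 500, 4                    # many samples, few uncorrelated features
X = rng.normal(size=(N, n))
w_true = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ w_true + 0.05 * rng.normal(size=N)

# OLS solution: w = argmin ||y - Xw||^2, no penalty term
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_ols)                     # recovers w_true closely
```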
 
So overfitting is not an issue when I have more data points than features?
That makes some sense: if I fit a model with n parameters and I have N >> n data points, I cannot hit every data point exactly, not even in the training data. So overfitting is suppressed.
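This intuition can be checked numerically: fitting a low-parameter polynomial to many noisy points leaves a visibly nonzero training residual. A sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 100, 3                      # far more points than parameters
x = np.linspace(0, 1, N)
y = 2.0 * x + 0.5 + 0.2 * rng.normal(size=N)   # noisy line

# Least-squares fit of an n-parameter polynomial (degree n-1)
coeffs = np.polyfit(x, y, deg=n - 1)
residual = y - np.polyval(coeffs, x)
print(np.abs(residual).max())      # training error stays well above zero
```

With only n parameters and N >> n points, the fit cannot interpolate the noise, so the training residual stays on the order of the noise level.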
 
The elastic net is a combination of lasso and ridge and will penalize collinear and low-t-stat variables, so if you get the same results as OLS, your predictors are fine.
 
Thank you!
 
Should add that none of these methods deals well with autocorrelation; you need GLS or GMM for that.
 