Why Does L1 Regularization Lead to Sparse Solutions Unlike L2?

  • Context: Graduate 
  • Thread starter: newphysist
  • Tags: Regularization
SUMMARY

L1 regularization leads to sparse solutions in ordinary least squares (OLS) because of the geometry of its constraint region: in the R2 picture the L1 ball is a diamond (a square rotated 45 degrees), and the error ellipses typically first touch it at a corner, which lies on an axis. The L2 ball is a circle, so the point of contact can be anywhere on it and is generically off the axes. The same effect appears in one dimension as thresholding: the minimum stays exactly at zero until the influence of the quadratic term, controlled by a parameter 'a', exceeds a threshold. This illustrates the distinctive behavior of L1 regularization compared to L2.

PREREQUISITES
  • Understanding of ordinary least squares (OLS) regression
  • Familiarity with L1 and L2 regularization techniques
  • Basic knowledge of geometric representations in R2
  • Concept of subderivatives in optimization
NEXT STEPS
  • Explore the geometric interpretations of L1 and L2 regularization in greater detail
  • Study the implications of thresholding effects in optimization problems
  • Learn about the role of the parameter 'a' in controlling the influence of regularization terms
  • Investigate other regularization techniques and their effects on model sparsity
USEFUL FOR

Data scientists, machine learning practitioners, and statisticians interested in understanding the differences between L1 and L2 regularization and their impact on model performance and sparsity.

newphysist
I am trying to understand the difference between L1 and L2 regularization in OLS. I understand that the center of the ellipse is the unconstrained optimal solution and that the ellipses themselves are contours of constant squared error. With L2 regularization we impose a circular (spherical) constraint on the coefficients, while with L1 the constraint region in the R2 representation is a diamond, i.e. a square rotated 45 degrees.

In all the corresponding pictorial representations in the literature, the R2 picture always shows the ellipse intersecting the circle in the first quadrant but touching the square at one of its corners, i.e. on an axis. Why does the ellipse intersect the square only at corners in L1 regularization, while for L2 it can touch the circle at any point? Wouldn't we get a sparse solution for L2 as well if the ellipse happened to touch the circle on an axis?

Thanks
 
The best intuition I've found comes from considering a special case, the 1D minimization problem
argmin_x |x| + (a/2)(x-1)^2.

The absolute value term tries to put the minimum at x=0, and the quadratic term tries to put the minimum at x=1. The constant 'a' determines the relative influence of the quadratic term compared to the absolute value term, and so as you change 'a', the minimum will move between 0 and 1.

However, a really interesting "thresholding" phenomenon occurs. The minimum stays at exactly zero for a <= 1, and only when 'a' exceeds 1 does the minimizer move away from zero, to x = 1 - 1/a. Draw some pictures to convince yourself this is true. The reason is that the absolute value function has many "subderivatives" (slopes of tangent lines lying beneath the function) at zero, so it can absorb changes in the other term's derivative, but only up to a certain limit. This is convenient: it says that x should be exactly zero unless there is a really good reason for it not to be, and the threshold for letting it be nonzero is determined by the parameter 'a'.
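This thresholding is easy to check numerically. The sketch below (Python/NumPy; the grid resolution and the values of 'a' are arbitrary choices of mine) minimizes the objective on a fine grid and compares the result against the closed-form minimizer x* = max(0, 1 - 1/a):

```python
import numpy as np

def objective(x, a):
    """f(x) = |x| + (a/2) * (x - 1)^2"""
    return np.abs(x) + 0.5 * a * (x - 1) ** 2

# Fine grid around the interval [0, 1] where the minimizer lives.
xs = np.linspace(-0.5, 1.5, 200001)

for a in [0.5, 1.0, 2.0, 4.0]:
    x_star = xs[np.argmin(objective(xs, a))]
    closed_form = max(0.0, 1.0 - 1.0 / a)  # soft-thresholding formula
    print(f"a = {a}: grid minimizer = {x_star:.4f}, "
          f"closed form = {closed_form:.4f}")
```

For a = 0.5 and a = 1.0 the minimizer sits at exactly zero; for a = 2.0 and a = 4.0 it jumps to 0.5 and 0.75, matching 1 - 1/a.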

On the other hand, if the first term were differentiable at zero (say x^2/2 instead of |x|), then even for a very small 'a' the minimum would be slightly positive (recall that the minimum is where the derivative of the quantity being minimized is zero). Thus there is no thresholding effect when both terms are differentiable.
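For contrast, here is the same grid experiment with the smooth penalty x^2/2 in place of |x| (again a sketch with made-up values of 'a'; the closed-form minimizer a/(1+a) follows from setting the derivative x + a(x-1) to zero). The minimizer is strictly positive for every a > 0, however small:

```python
import numpy as np

def smooth_objective(x, a):
    """g(x) = x^2/2 + (a/2)(x - 1)^2 : an L2-style penalty, differentiable at 0."""
    return 0.5 * x ** 2 + 0.5 * a * (x - 1) ** 2

xs = np.linspace(-0.5, 1.5, 200001)

for a in [0.01, 0.5, 2.0]:
    x_star = xs[np.argmin(smooth_objective(xs, a))]
    # Setting the derivative x + a(x - 1) to zero gives x = a / (1 + a) > 0.
    print(f"a = {a}: minimizer = {x_star:.4f}, closed form = {a / (1 + a):.4f}")
```

Even at a = 0.01 the minimizer is a small positive number rather than exactly zero, which is why the quadratic penalty shrinks coefficients but never zeroes them out.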

L1 regularization is basically a multidimensional generalization of this principle: each coefficient is held at exactly zero unless the data provide a strong enough pull to cross the threshold.
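One concrete multidimensional version is the proximal operator of the L1 norm, which acts coordinatewise as soft-thresholding; iterative lasso solvers (e.g. ISTA or coordinate descent) apply exactly this map at each step. A minimal sketch (the vector v and threshold t below are made-up illustrative values):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||x||_1:
    argmin_x  t * ||x||_1 + 0.5 * ||x - v||^2,
    applied coordinatewise: shrink each entry toward zero by t,
    and set it exactly to zero if |v_i| <= t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

v = np.array([0.3, -0.05, 1.2, -0.8, 0.01])
out = soft_threshold(v, 0.1)
print(out)  # entries with |v_i| <= 0.1 become exactly zero
```

The small entries are mapped to exact zeros while the large ones are merely shrunk, which is the sparsity mechanism of L1 in every dimension at once.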
 
