Models Overview
===============

The library provides both basic and SR3-empowered implementations of
modern regularized regression algorithms. Below we give an overview of
the available options and advice on how to choose a model suitable for
your goals.

Table of Models
---------------

Linear Models
~~~~~~~~~~~~~

Located in the ``pysr3.linear`` submodule.

================= ===================== ========================
Regularizer       without SR3           with SR3
================= ===================== ========================
No regularization ``SimpleLinearModel`` ``SimpleLinearModelSR3``
L0                ``LinearL0Model``     ``LinearL0ModelSR3``
LASSO             ``LinearL1Model``     ``LinearL1ModelSR3``
CAD               ``LinearCADModel``    ``LinearCADModelSR3``
SCAD              ``LinearSCADModel``   ``LinearSCADModelSR3``
================= ===================== ========================

The class ``LinearModel`` is the base class that implements the core
functionality.

Linear Mixed-Effects Models
~~~~~~~~~~~~~~~~~~~~~~~~~~~

================= ================== =====================
Regularizer       without SR3        with SR3
================= ================== =====================
No regularization ``SimpleLMEModel`` ``SimpleLMEModelSR3``
L0                ``L0LmeModel``     ``L0LmeModelSR3``
LASSO             ``L1LmeModel``     ``L1LmeModelSR3``
CAD               ``CADLmeModel``    ``CADLmeModelSR3``
SCAD              ``SCADLmeModel``   ``SCADLmeModelSR3``
================= ================== =====================

The class ``LMEModel`` is the base class that implements the core
functionality.

Regularizers
------------

The library currently implements five popular regularizers: L0, LASSO,
A-LASSO, CAD, and SCAD. Below we plot the values and the proximal
mappings for three of them:

.. code:: ipython3

    from pysr3.regularizers import L1Regularizer, CADRegularizer, SCADRegularizer
    import numpy as np
    from matplotlib import pyplot as plt

    x = np.linspace(-4, 4)
    fig, ax = plt.subplots(nrows=2, ncols=3)
    for i, (name, regularizer) in enumerate({"L1": L1Regularizer(lam=1),
                                             "CAD": CADRegularizer(lam=1, rho=1),
                                             "SCAD": SCADRegularizer(lam=1, sigma=0.5, rho=3.7)}.items()):
        # Top row: the value of the penalty; bottom row: its proximal mapping.
        ax[0, i].plot(x, [regularizer.value(xi) for xi in x])
        ax[0, i].set_title(f"{name}: value")
        ax[1, i].plot(x, [regularizer.prox(xi, alpha=1) for xi in x])
        ax[1, i].set_title(f"{name}: prox mapping")
    fig.tight_layout()
    plt.show()

.. image:: models_overview_files/models_overview_3_0.png

L0
~~

This is a regularizer that allows no more than a pre-defined number
``k`` of parameters to be non-zero:

.. math:: R(x) = \delta(\|x\|_0 \leq k)

where :math:`\delta` is an indicator function.

L1 (LASSO)
~~~~~~~~~~

This is a well-known sparsity-promoting regularizer that penalizes the
loss function with the sum of absolute values of the model's
coefficients:

.. math:: R(x) = \lambda \|x\|_1

In ``pysr3``, it is implemented as the
``pysr3.regularizers.L1Regularizer`` class. :math:`\lambda` is a
hyper-parameter that controls the strength of the regularization:
larger values of :math:`\lambda` lead to sparser sets of coefficients.
The optimal value is normally chosen via cross-validation grid-search
that minimizes an information criterion of choice (AIC, BIC, etc.) on
the validation part of the splits. See the example in the Quickstart
tutorial.
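Below is a minimal sketch of such a search. It assumes that
``LinearL1Model`` follows the scikit-learn estimator API (so it works
with ``GridSearchCV``) and, for brevity, scores candidates by mean
squared error rather than an information criterion; the grid of ``lam``
values is arbitrary:

.. code:: ipython3

    import numpy as np
    from sklearn.model_selection import GridSearchCV

    from pysr3.linear.models import LinearL1Model
    from pysr3.linear.problems import LinearProblem

    # A toy problem: 20 features, half of which are truly zero.
    x, y = LinearProblem.generate(num_objects=100, num_features=20, true_x=[0, 1]*10).to_x_y()

    # Cross-validated grid-search over the regularization strength lam.
    # neg_mean_squared_error only needs predict(); an information
    # criterion could be plugged in as a custom scorer instead.
    grid = GridSearchCV(estimator=LinearL1Model(),
                        param_grid={"lam": np.logspace(-2, 2, 10)},
                        scoring="neg_mean_squared_error",
                        cv=3)
    grid.fit(x, y)
    print(grid.best_params_)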
Adaptive LASSO (A-LASSO)
~~~~~~~~~~~~~~~~~~~~~~~~

Adaptive LASSO uses :math:`\bar{x}`, a solution of the non-regularized
problem, as weights for the LASSO penalty:

.. math:: R(x) = \lambda\sum_{i=1}^n \frac{|x_i|}{|\bar{x}_i|}

In ``pysr3``, A-LASSO can be implemented by providing custom
regularization weights to a LASSO model:

.. code:: ipython3

    import numpy as np

    from pysr3.linear.problems import LinearProblem
    from pysr3.linear.models import SimpleLinearModel, LinearL1Model

    x, y = LinearProblem.generate(num_objects=100, num_features=200, true_x=[0, 1]*100).to_x_y()
    non_regularized_coefficients = SimpleLinearModel().fit(x, y).coef_["x"]
    # Weights are reciprocals of the absolute values of the non-regularized
    # solution; 1e-3 safeguards against zeros in the denominator.
    regularization_weights = 1 / (np.abs(non_regularized_coefficients) + 1e-3)
    alasso_model = LinearL1Model(lam=1).fit(x, y, regularization_weights=regularization_weights)

CAD
~~~

The Clipped Absolute Deviation (CAD) penalty is a clipped version of
LASSO. When the absolute value of a coefficient is less than the
hyper-parameter :math:`\rho`, it works like LASSO, but it does not
penalize coefficients whose absolute value is larger than :math:`\rho`
any further. It addresses the issue of LASSO penalizing large
coefficients too much. In ``pysr3``, the CAD regularizer is implemented
as ``pysr3.regularizers.CADRegularizer``.

SCAD
~~~~

The Smoothly Clipped Absolute Deviation (SCAD) penalty is a smoothed
version of CAD. Instead of imposing a hard threshold, it uses a smooth
two-knot spline interpolation to connect the flat part and the
absolute-value part. The knots of the spline are located at the points
:math:`\rho` and :math:`\sigma\rho`. In ``pysr3``, the SCAD regularizer
is implemented as ``pysr3.regularizers.SCADRegularizer``.

Regular PGD vs SR3-PGD
----------------------

Proximal Gradient Descent (PGD) is currently the core numerical solver
for ``pysr3``. It has a very simple form of iteration that requires
only gradient information of the loss function and the proximal
operator of the regularizer:

.. math:: x_{k+1} = \text{prox}_{\alpha R}(x_k - \alpha \nabla f(x_k))

The core methodological innovation of ``pysr3`` is the Sparse Relaxed
Regularized Regression (SR3) framework. It improves the conditioning of
the loss function, which leads to faster convergence and more accurate
feature selection. The picture below illustrates how SR3 changes the
landscape of the likelihood for a Linear Mixed-Effects Model. Notice
that the same optimization method takes nearly two orders of magnitude
fewer iterations to converge.

|image0|

.. |image0| image:: sr3_mixed_intuition.png

Every model in ``pysr3`` has an SR3-empowered version. For all of them,
the parameter ``ell`` controls the degree of relaxation, with larger
parameter values giving tighter relaxations. ``ell`` can be left at its
default value or can be found via grid-search simultaneously with
:math:`\lambda`.
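Below is a minimal sketch of such a joint search. As in the LASSO
example above, it assumes that ``LinearL1ModelSR3`` follows the
scikit-learn estimator API and accepts ``lam`` and ``ell`` as
constructor parameters; the grids and the mean-squared-error scoring
are arbitrary choices made for brevity:

.. code:: ipython3

    import numpy as np
    from sklearn.model_selection import GridSearchCV

    from pysr3.linear.models import LinearL1ModelSR3
    from pysr3.linear.problems import LinearProblem

    x, y = LinearProblem.generate(num_objects=100, num_features=20, true_x=[0, 1]*10).to_x_y()

    # Search over the regularization strength lam and the relaxation
    # parameter ell simultaneously.
    grid = GridSearchCV(estimator=LinearL1ModelSR3(),
                        param_grid={"lam": np.logspace(-2, 2, 5),
                                    "ell": np.logspace(-1, 2, 4)},
                        scoring="neg_mean_squared_error",
                        cv=3)
    grid.fit(x, y)
    print(grid.best_params_)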