Analysis of Cross-Sectional Data

Machine Learning Approaches to Regression

Kevin Sheppard

Analysis of Cross-Sectional Data

Best Subset Regression and Stepwise Regression

  • Best Subset Regression
  • Forward Stepwise Regression
  • Backward Stepwise Regression
  • Hybrid Stepwise Regression

Best Subset Regression

  • With $p$ candidate variables, consider all $2^p$ possible models
  • For each $k=1,\ldots,p$, find the best model in terms of $SSE$ (see the sketch after this list)
  • From set of $p$ best models, select preferred model using cross-validation
  • Example regression is portfolio tracking/replication
$$ VWM_t = \sum_{i=1}^k \beta_i R_{i,t} + \epsilon_t$$
  • $R_{i,t}$ are industry portfolio returns
  • BSR example uses 15 randomly selected industry portfolios due to computational limits
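
A minimal sketch of the enumeration is below; it assumes the candidate industry returns are the columns of a DataFrame `industries` and the value-weighted market return is a Series `vwm` (both names are hypothetical, not the objects used in these slides).

from itertools import combinations

import numpy as np
import pandas as pd


def best_subset(y: pd.Series, x: pd.DataFrame) -> dict:
    """For each model size k, return the variable set with the smallest SSE."""
    yv = y.to_numpy()
    best = {}
    for k in range(1, x.shape[1] + 1):
        best_sse, best_vars = np.inf, None
        for cols in combinations(x.columns, k):
            xk = x[list(cols)].to_numpy()
            # OLS fit; the residual sum of squares measures in-sample fit
            beta, *_ = np.linalg.lstsq(xk, yv, rcond=None)
            sse = float(np.sum((yv - xk @ beta) ** 2))
            if sse < best_sse:
                best_sse, best_vars = sse, cols
        best[k] = (list(best_vars), best_sse)
    return best


# Usage (hypothetical data): best_models = best_subset(vwm, industries)

With 15 candidate portfolios this loop fits $2^{15}-1$ regressions, which is why the example restricts the candidate set.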

Best Subset Regression

In [3]:
xval_plot(bsr_sse, bsr_sse_xv)

Forward Stepwise Regression

  • When $p$ is large, Best Subset Regression is computationally infeasible
  • Forward Stepwise Regression builds a sequence of models using two principles
    1. Start from the previous model with $i$ variables
    2. Considering all excluded variables, add the one that produces the largest reduction in SSE to construct model $i+1$ (see the sketch after this list)
  • Begins with model $0$ that includes no variables (or possibly a constant)
  • Produces a sequence of $p$ (or $p+1$ if a constant) models
  • Preferred model is selected from these models using cross-validated SSE
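
A minimal sketch of the forward pass under the same hypothetical data layout (`y` is the target Series, `x` holds the candidate regressors); it omits the constant for simplicity.

import numpy as np
import pandas as pd


def sse(y: np.ndarray, x: np.ndarray) -> float:
    """Sum of squared OLS residuals from regressing y on x."""
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    return float(np.sum((y - x @ beta) ** 2))


def forward_stepwise(y: pd.Series, x: pd.DataFrame) -> list:
    """Return the sequence of models built by adding one variable at a time."""
    yv = y.to_numpy()
    included, models = [], []
    while len(included) < x.shape[1]:
        excluded = [c for c in x.columns if c not in included]
        # Add the excluded variable that lowers the SSE the most
        best = min(excluded, key=lambda c: sse(yv, x[included + [c]].to_numpy()))
        included = included + [best]
        models.append(list(included))
    return models

The preferred model is then chosen from this sequence by cross-validated SSE.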

Forward Stepwise Regression

In [5]:
xval_plot(fs_sse, fs_sse_xv)

Backward Stepwise Regression

  • Similar to Forward Stepwise Regression, except models are constructed by removing variables
  • Backward Stepwise Regression builds a sequence of models using two principles
    1. Start from the previous model with $i$ variables
    2. Considering all variables included in model $i$, remove the one that increases the SSE the least to construct model $i-1$ (see the sketch after this list)
  • Begins with model $p$ that includes all variables
  • Like FSR, produces a sequence of $p$ (or $p+1$ if a constant) models
  • Preferred model is selected from these models using cross-validated SSE
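
A minimal sketch of the backward pass under the same hypothetical data layout (`y` target Series, `x` candidate regressors).

import numpy as np
import pandas as pd


def sse(y: np.ndarray, x: np.ndarray) -> float:
    """Sum of squared OLS residuals from regressing y on x."""
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    return float(np.sum((y - x @ beta) ** 2))


def backward_stepwise(y: pd.Series, x: pd.DataFrame) -> list:
    """Return the sequence of models built by removing one variable at a time."""
    yv = y.to_numpy()
    included = list(x.columns)
    models = [list(included)]
    while len(included) > 1:
        # Remove the variable whose deletion increases the SSE the least
        drop = min(
            included,
            key=lambda c: sse(yv, x[[v for v in included if v != c]].to_numpy()),
        )
        included = [v for v in included if v != drop]
        models.append(list(included))
    return models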

Backward Stepwise Regression

In [7]:
xval_plot(bs_sse, bs_sse_xv)

Comparing Forward and Backward Stepwise Regression

In [8]:
fs_model = fs_models[pd.Series(fs_sse_xv).idxmin()]
bs_model = bs_models[pd.Series(bs_sse_xv).idxmin()]
print(f"Number of common regressors: {len(set(bs_model).intersection(fs_model))}")
print(f"Only in FS: {', '.join(sorted(set(fs_model).difference(bs_model)))}")
print(f"Only in BS: {', '.join(sorted(set(bs_model).difference(fs_model)))}")
Number of common regressors: 23
Only in FS: Mach
Only in BS: Coal, Fun, Meals

Hybrid Approaches

  • Forward and Backward can be combined to produce a larger set of candidate models
  • Forward-Backward
    • Use FSR to build candidate models with $k=1,\ldots,p$ variables
    • Starting from each of these $p$ models, perform Backward Stepwise Regression (see the sketch after the output below)
  • Can be iterated for as long as one is willing to wait
  • Might better approximate Best Subset Regression
  • Final model selected from enlarged set of candidate models by minimizing the cross-validated SSE
In [10]:
print(f"The number of models selected using forward-backward is {len(fb_models)}")
print(f"Not in FS: {', '.join(sorted(set(fb_model).difference(fs_model)))}")
print(f"Not in BS: {', '.join(sorted(set(fb_model).difference(bs_model)))}")
The number of models selected using forward-backward is 117
Not in FS: Coal, Meals
Not in BS: 
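
A minimal sketch of the forward-backward pass, assuming the hypothetical `forward_stepwise` and `backward_stepwise` helpers from the earlier sketches (not the functions behind these results).

def forward_backward(y, x) -> list:
    """Collect candidate models from a forward pass followed by backward passes."""
    candidates = []
    for model in forward_stepwise(y, x):
        # Run backward elimination starting from each forward-selected model
        for reduced in backward_stepwise(y, x[model]):
            if sorted(reduced) not in [sorted(m) for m in candidates]:
                candidates.append(reduced)
    return candidates

The union of models from both directions forms the enlarged candidate set; the final model minimizes the cross-validated SSE over this set.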

Hybrid Approaches

In [11]:
xval_plot(fb_sse, fb_sse_xv, k=fb_k)

Analysis of Cross-Sectional Data

Shrinkage Estimators

  • Ridge Regression
  • LASSO: Least Absolute Shrinkage and Selection Operator

Note: It is important to standardize the regressors when using shrinkage estimators
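
A minimal sketch of this standardization, assuming the regressors are the columns of a DataFrame `x` (hypothetical name):

import pandas as pd


def standardize(x: pd.DataFrame) -> pd.DataFrame:
    """Demean each regressor and scale it to unit standard deviation."""
    return (x - x.mean()) / x.std()


# Usage (hypothetical data): x_std = standardize(industries)

Standardizing puts all regressors on a common scale so that the penalty treats each coefficient symmetrically.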

Ridge Regression

Fit a modified least squares problem

$$ \arg\!\min_{\boldsymbol{\beta}}\,\left(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\right)^{\prime}\left(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\right)\text{ subject to }\sum_{j=1}^{k}\beta_{j}^{2}\leq\omega $$

Equivalent penalized (Lagrangian) formulation, with $\lambda$ determined by the constraint $\omega$

$$ \arg\!\min_{\boldsymbol{\beta}}\,\left(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\right)^{\prime}\left(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\right)+\lambda\sum_{j=1}^{k}\beta_{j}^{2}$$

Analytical solution

$$\hat{\boldsymbol{\beta}}^{\textrm{Ridge}}=\left(\mathbf{X}^{\prime}\mathbf{X}+\lambda \mathbf{I}_{k}\right)^{-1}\mathbf{X}^{\prime}\mathbf{y}$$
  • $\lambda$ is the key shrinkage (or regularization) parameter
  • The optimal $\lambda$ is chosen by cross-validation across a grid of values (see the sketch below)
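
A minimal sketch of the closed-form estimator and a grid search for $\lambda$ by cross-validated SSE; the data names are hypothetical, the regressors are assumed to already be standardized, and this is not the `ridge_cv_plot` helper used below.

import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


def ridge_beta(y: np.ndarray, x: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge estimator (X'X + lam * I)^{-1} X'y."""
    return np.linalg.solve(x.T @ x + lam * np.eye(x.shape[1]), x.T @ y)


def ridge_cv_sse(y: pd.Series, x: pd.DataFrame, lambdas, folds: int = 10) -> pd.Series:
    """Cross-validated SSE for each candidate penalty in lambdas."""
    yv, xv = y.to_numpy(), x.to_numpy()
    total = pd.Series(0.0, index=list(lambdas))
    for train, test in KFold(n_splits=folds, shuffle=True, random_state=0).split(xv):
        for lam in lambdas:
            beta = ridge_beta(yv[train], xv[train], lam)
            resid = yv[test] - xv[test] @ beta
            total[lam] += float(resid @ resid)
    return total


# Usage (hypothetical data):
# cv_sse = ridge_cv_sse(vwm, standardize(industries), np.logspace(-3, 3, 25))
# best_lambda = cv_sse.idxmin()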

Ridge Regression $\alpha$ Selection

In [13]:
ridge_cv_plot()