# Analysis of Cross-Sectional Data

## Best Subset Regression and Stepwise Regression

• Best Subset Regression
• Forward Stepwise Regression
• Backward Stepwise Regression
• Hybrid Stepwise Regression

# Best Subset Regression

• With $p$ candidate variables, consider all $2^p$ possible models
• For each $k=1,\ldots,p$, find the best model with $k$ variables in terms of SSE
• From this set of $p$ best models, select the preferred model using cross-validation
• An example regression is portfolio tracking/replication of the value-weighted market ($VWM$)
$$VWM_t = \sum_{i=1}^k \beta_i R_{i,t} + \epsilon_t$$
• $R_{i,t}$ are industry portfolio returns
• The BSR example uses 15 randomly selected industry portfolios due to computational limits; a minimal sketch of the search follows
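
A minimal sketch of the exhaustive search, assuming hypothetical names `y` (the $VWM_t$ series as a NumPy vector) and `X` (a matrix with one column per industry portfolio); the cross-validated choice among the $p$ per-size winners is omitted:

```python
import itertools

import numpy as np


def best_subset_sse(y, X):
    """For each size k, find the column subset of X with the smallest SSE."""
    n, p = X.shape
    best = {}
    for k in range(1, p + 1):
        best_sse, best_cols = np.inf, None
        for cols in itertools.combinations(range(p), k):
            cols = list(cols)
            # OLS fit on the candidate subset and its sum of squared errors
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            resid = y - X[:, cols] @ beta
            sse = resid @ resid
            if sse < best_sse:
                best_sse, best_cols = sse, cols
        best[k] = (best_cols, best_sse)
    return best
```

With $p=15$ this already fits $2^{15}-1=32{,}767$ models, which is why the example restricts attention to 15 randomly selected portfolios.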

# Best Subset Regression

In [3]:
xval_plot(bsr_sse, bsr_sse_xv)


# Forward Stepwise Regression

• When $p$ is large, Best Subset Regression is computationally infeasible
• Forward Stepwise Regression builds a sequence of models using two principles
1. Start from the previous model, which contains $i$ variables
2. Considering all excluded variables, add the one that produces the largest reduction in SSE to form model $i+1$
• Begins with model $0$ that includes no variables (or possibly a constant)
• Produces a sequence of $p$ (or $p+1$ if a constant is included) models
• Preferred model is selected from this sequence using cross-validated SSE (see the sketch below)
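
A minimal sketch of the greedy forward search under the same assumed `y` and `X`; it returns the full sequence of candidate models and leaves the cross-validated choice among them to a separate step:

```python
import numpy as np


def forward_stepwise(y, X):
    """Return the sequence of models built by greedy forward selection."""
    n, p = X.shape
    included, path = [], []
    for _ in range(p):
        best_sse, best_j = np.inf, None
        # Try adding each currently excluded variable
        for j in range(p):
            if j in included:
                continue
            cols = included + [j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            resid = y - X[:, cols] @ beta
            sse = resid @ resid
            if sse < best_sse:
                best_sse, best_j = sse, j
        included.append(best_j)
        path.append(list(included))
    return path
```

The greedy search fits $O(p^2)$ regressions in total rather than $2^p$, which is what keeps it feasible when $p$ is large.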

# Forward Stepwise Regression

In [5]:
xval_plot(fs_sse, fs_sse_xv)


# Backward Stepwise Regression

• Similar to Forward Stepwise, except the sequence is constructed by removing variables
• Backward Stepwise Regression builds a sequence of models using two principles
1. Start from the previous model, which contains $i$ variables
2. Considering all variables included in model $i$, remove the one that increases the SSE the least to construct model $i-1$
• Begins with model $p$ that includes all variables
• Like FSR, produces a sequence of $p$ (or $p+1$ if a constant is included) models
• Preferred model is selected from this sequence using cross-validated SSE (a sketch follows)
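
A matching sketch of backward elimination under the same assumptions; note that it requires the full model with all $p$ variables to be estimable:

```python
import numpy as np


def backward_stepwise(y, X):
    """Return the sequence of models built by greedy backward elimination."""
    n, p = X.shape
    included = list(range(p))
    path = [list(included)]
    while len(included) > 1:
        best_sse, drop_j = np.inf, None
        # Try deleting each currently included variable
        for j in included:
            cols = [c for c in included if c != j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            resid = y - X[:, cols] @ beta
            sse = resid @ resid
            if sse < best_sse:
                best_sse, drop_j = sse, j
        included.remove(drop_j)
        path.append(list(included))
    return path
```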

# Backward Stepwise Regression

In [7]:
xval_plot(bs_sse, bs_sse_xv)


# Comparing Forward and Backward Stepwise Regression

In [8]:
# Select the cross-validation-minimizing model from each search path
fs_model = fs_models[pd.Series(fs_sse_xv).idxmin()]
bs_model = bs_models[pd.Series(bs_sse_xv).idxmin()]
print(f"Number of common regressors: {len(set(bs_model).intersection(fs_model))}")
print(f"Only in FS: {', '.join(sorted(set(fs_model).difference(bs_model)))}")
print(f"Only in BS: {', '.join(sorted(set(bs_model).difference(fs_model)))}")

Number of common regressors: 23
Only in FS: Mach
Only in BS: Coal, Fun, Meals


# Hybrid Approaches

• Forward and Backward can be combined to produce a larger set of candidate models
• Forward-Backward
• Use FSR to select candidate models with $k=1,\ldots,p$ variables
• Starting from each of these $p$ models, perform Backward Stepwise Regression
• Can be iterated for as long as one is willing to wait
• Might better approximate Best Subset Regression
• Final model is selected from the enlarged set of candidates by minimizing the cross-validated SSE (see the sketch below)
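
A minimal sketch of a single forward-backward pass, reusing the `forward_stepwise` and `backward_stepwise` sketches above; the distinct candidates would then be screened by cross-validated SSE:

```python
def forward_backward(y, X):
    """Candidate models from a forward pass plus a backward pass
    restarted at each forward model."""
    candidates = set()
    for cols in forward_stepwise(y, X):
        candidates.add(tuple(sorted(cols)))
        # backward_stepwise indexes the reduced matrix X[:, cols],
        # so map its indices back to the original columns via cols[j]
        for sub in backward_stepwise(y, X[:, cols]):
            candidates.add(tuple(sorted(cols[j] for j in sub)))
    return candidates
```
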
In [10]:
print(f"The number of models selected using forward-backward is {len(fb_models)}")
print(f"Not in FS: {', '.join(sorted(set(fb_model).difference(fs_model)))}")
print(f"Not in BS: {', '.join(sorted(set(fb_model).difference(bs_model)))}")

The number of models selected using forward-backward is 117
Not in FS: Coal, Meals
Not in BS:


# Hybrid Approaches

In [11]:
xval_plot(fb_sse, fb_sse_xv, k=fb_k)


# Analysis of Cross-Sectional Data

## Shrinkage Estimators

• Ridge Regression
• LASSO: Least Absolute Shrinkage and Selection Operator

Note: It is important to standardize regressors when using shrinkage estimators, as in the snippet below
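
A one-line illustration of standardization, assuming `X` is the regressor matrix as a NumPy array:

```python
# Demean each regressor and rescale it to unit standard deviation
# (X is the assumed regressor matrix, a NumPy array)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```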

# Ridge Regression

Fit a modified least squares problem

$$\arg\!\min_{\boldsymbol{\beta}}\,\left(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\right)^{\prime}\left(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\right)\text{ subject to }\sum_{j=1}^{k}\beta_{j}^{2}\leq\omega$$

Equivalent formulation

$$\arg\!\min_{\boldsymbol{\beta}}\,\left(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\right)^{\prime}\left(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\right)+\lambda\sum_{j=1}^{k}\beta_{j}^{2}$$

Analytical solution

$$\hat{\boldsymbol{\beta}}^{\textrm{Ridge}}=\left(\mathbf{X}^{\prime}\mathbf{X}+\lambda \mathbf{I}_{k}\right)^{-1}\mathbf{X}^{\prime}\mathbf{y}$$
• $\lambda$ is the key shrinkage (or regularization) parameter
• Optimal $\lambda$ is chosen using cross-validation across a grid of values, as sketched below
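
A minimal sketch of the closed-form estimator together with grid-based cross-validation of $\lambda$ using scikit-learn's `RidgeCV`, assuming `y` and the standardized `X_std` from above; scikit-learn names the penalty `alpha`, presumably the $\alpha$ in the next slide's title:

```python
import numpy as np
from sklearn.linear_model import RidgeCV


def ridge(y, X, lam):
    """Closed-form ridge estimator (X'X + lam * I)^{-1} X'y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)


# Cross-validate the penalty over a log-spaced grid of candidate values
cv = RidgeCV(alphas=np.logspace(-3, 3, 50)).fit(X_std, y)
print(cv.alpha_)  # the cross-validation-selected penalty
```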

# Ridge Regression $\alpha$ Selection

In [13]:
ridge_cv_plot()