Analysis of Cross-Sectional Data

Kevin Sheppard

Course Structure

  • Course presented through three channels:
    1. Pre-recorded content with a focus on technical aspects of the course
      • Designed to be viewed in sequence
      • Each module should be short
      • Approximately 2 hours of content per week
    2. In-person lectures with a focus on applied aspects of the course
      • Expected that pre-recorded content has been viewed before the lecture
    3. Notes that accompany the lecture content
      • Read before or after the lecture or when necessary for additional background
  • Slides are primary – material presented during lectures, whether pre-recorded or live, is examinable
  • Notes are secondary and provide more background for the slides
  • Slides are derived from notes so there is a strong correspondence

Monitoring Your Progress

  • Self-assessment
    • Review questions in pre-recorded content
    • Multiple choice questions on Canvas made available each week
      • Answers available immediately
    • Long-form problem distributed each week
      • Answers presented in a subsequent class
  • Marked Assessment
    • Empirical projects applying the material in the lectures
    • Both individual and group
    • Each empirical assignment will have a written and code component

Analysis of Cross-Sectional Data

Introduction to Regression Models

  • Notation
  • Factor Models
  • Data
  • Variable Transformations

Linear Regression

Scalar Notation

$$ Y_{i}=\beta_{1}X_{1,i}+\beta_{2}X_{2,i}+\ldots+\beta_{k}X_{k,i}+\epsilon_{i} $$
  • $Y_{i}$: Regressand, Dependent Variable, LHS Variable
  • $X_{j,i}$: Regressor, also Independent Variable, RHS Variable, Explanatory Variable
  • $\epsilon_{i}$: Innovation, also Shock, Error or Disturbance
  • $n$ observations, indexed $i=1,2,\ldots,n$
  • $k$ regressors, indexed $j=1,2,\ldots,k$

Linear Regression

Matrix Notation

Common to use convenient matrix notation

$$ \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon} $$
  • $\mathbf{y}$ is $n$ by $1$
  • $\mathbf{X}$ is $n$ by $k$
  • $\boldsymbol{\beta}$ is $k$ by $1$
  • $\boldsymbol{\epsilon}$ is $n$ by $1$
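
A minimal sketch of these dimensions using simulated data (every name and number below is illustrative, not from the course data):

import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3                      # n observations, k regressors
X = np.column_stack([np.ones(n),   # constant in the first column
                     rng.standard_normal((n, k - 1))])
beta = np.array([0.5, 1.0, -0.3])  # k-by-1 coefficient vector
eps = rng.standard_normal(n)       # n-by-1 innovation vector
y = X @ beta + eps                 # y is n-by-1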

Factor Models

  • Factor models are widely used in finance

    • Capital Asset Pricing Model (CAPM)
    • Arbitrage Pricing Theory (APT)
    • Risk Exposure
  • Basic specification $R_{i}=\mathbf{f}_{i}\boldsymbol{\beta}+\epsilon_{i}$

    • $R_{i}$: Return on dependent asset, often excess $(R_{i}^{e})$
    • $\mathbf{f}_{i}$: $1\times k$ vector of factor innovations
    • $\epsilon_{i}$: innovation, with $\mathrm{corr}(\epsilon_{i},f_{j,i})=0$, $j=1,2,\ldots,k$
  • Special Case: CAPM $$R_{i}-R_{i}^{f}=\beta(R_{i}^{m}-R_{i}^{f})+\epsilon_{i} $$ $$R_{i}^{e}=\beta R_{i}^{me}+\epsilon_{i}$$
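
A sketch of estimating the CAPM form above by OLS with statsmodels; the data here are simulated stand-ins, so the column names Re and Rme are illustrative:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rme = rng.standard_normal(240)                   # excess market return
re = 0.1 + 1.2 * rme + rng.standard_normal(240)  # excess asset return
df = pd.DataFrame({"Re": re, "Rme": rme})
capm = smf.ols("Re ~ 1 + Rme", data=df).fit()    # alpha (intercept) and market beta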

Data

  • Data from the Fama-French 3 factors + Momentum
    • $VWM^e$ - Excess return on Value-Weighted-Market
    • $SMB$ - Return on the size portfolio
    • $HML$ - Return on the value portfolio
    • $MOM$ - Return on the momentum portfolio
  • Size-Value sorted portfolio return data
    • Size
      • S: Small
      • B: Big
    • Value
      • H: High BE/ME
      • M: Middle BE/ME
      • L: Low BE/ME
  • 49 Industry Portfolios
  • All returns are excess except $SMB$, $HML$, and $MOM$, which are long-short (zero-investment) portfolio returns
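
One way to obtain these series is the famafrench reader in pandas-datareader; this is a sketch that assumes the package is installed and that the Ken French library dataset names below are current:

import pandas_datareader.data as web

# monthly Fama-French 3 factors (Mkt-RF, SMB, HML, RF) plus momentum
ff3 = web.DataReader("F-F_Research_Data_Factors", "famafrench")[0]
mom = web.DataReader("F-F_Momentum_Factor", "famafrench")[0]
factors = ff3.join(mom)
factors.columns = factors.columns.str.strip()  # some columns have padded names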

Fama-French Factors

Summary Statistics

In [3]:
summ_plot(factors)

Fama-French Factors

Correlation Structure

In [4]:
factors.corr()
Out[4]:
$VWM^e$ $SMB$ $HML$ $MOM$
$VWM^e$ 1.000000 0.300958 -0.226222 -0.149518
$SMB$ 0.300958 1.000000 -0.174962 -0.024014
$HML$ -0.226222 -0.174962 1.000000 -0.195242
$MOM$ -0.149518 -0.024014 -0.195242 1.000000

Size and Value components

In [6]:
summ_plot(components)

Industry Portfolios

In [7]:
summ_plot(subset)

Variable Transformations

  • Dummy variables
    • 0-1 variables based on an indicator function $$ I_{[X_{i,j} > 0]} $$
      • Asymmetries at 0
      • Monthly Effects (construction sketched after the table below)
In [9]:
monthly_dummies.head(8)
Out[9]:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
0 0 0 0 0 0 0 1 0 0 0 0 0
1 0 0 0 0 0 0 0 1 0 0 0 0
2 0 0 0 0 0 0 0 0 1 0 0 0
3 0 0 0 0 0 0 0 0 0 1 0 0
4 0 0 0 0 0 0 0 0 0 0 1 0
5 0 0 0 0 0 0 0 0 0 0 0 1
6 1 0 0 0 0 0 0 0 0 0 0 0
7 0 1 0 0 0 0 0 0 0 0 0 0
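
monthly_dummies is not constructed in the excerpt; a minimal sketch that reproduces the pattern above from a monthly date index (the July 1926 start matches the first row being July, but is otherwise illustrative):

import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
dates = pd.date_range("1926-07-31", periods=600, freq="M")
# one column per month: 1 in the observation's month, 0 elsewhere
monthly_dummies = pd.get_dummies(dates.strftime("%b")).astype(int)[months]
monthly_dummies.head(8)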

Variable Transformation: Interactions

  • Interactions dramatically expand the functional forms that can be specified
    • Powers and Cross-products: $ X_{i,j}^2 $, $X_{i,j}X_{i,m}$
    • Dummy Interactions to Produce Asymmetries: $ X_{i,j} \times I_{[X_{i,j}<0]} $ (construction sketched after the table below)
In [11]:
interactions.tail(10)
Out[11]:
Market Negative Negative Return Squared Returns
2019-11-30 0 0.00 14.9769
2019-12-31 0 0.00 7.6729
2020-01-31 1 -0.11 0.0121
2020-02-29 1 -8.13 66.0969
2020-03-31 1 -13.38 179.0244
2020-04-30 0 0.00 186.3225
2020-05-31 0 0.00 31.1364
2020-06-30 0 0.00 6.0516
2020-07-31 0 0.00 33.2929
2020-08-31 0 0.00 58.0644
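
The three columns above are a negative-market dummy, the dummy-return interaction, and the squared market return; a sketch of their construction, using market return values recovered from the table (the series name vwm is an assumption):

import pandas as pd

vwm = pd.Series([2.77, -0.11, -8.13, -13.38, 13.65],
                index=pd.period_range("2019-12", periods=5, freq="M"))
negative = (vwm < 0).astype(int)           # Market Negative dummy
interactions = pd.DataFrame({
    "Market Negative": negative,
    "Negative Return": vwm * negative,     # asymmetric effect below zero
    "Squared Returns": vwm ** 2,           # power term
})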

Analysis of Cross-Sectional Data

Parameter Estimation and Model Fit

  • Parameter Estimation
  • Models with Interactions
  • Other estimated quantities
  • Regression Coefficient in Factor Models

Parameter Estimation

Least Squares

$$\textrm{argmin}_{\boldsymbol{\beta}}\sum_{i=1}^{n}(Y_{i}-\mathbf{x}_{i}\boldsymbol{\beta})^{2}$$
In [13]:
ls = smf.ols("BHe ~ 1 + VWMe + SMB + HML + MOM", data).fit(cov_type="HC0")
summary(ls)
coef std err z P>|z| [0.025 0.975]
Intercept -0.0859 0.043 -1.991 0.046 -0.170 -0.001
VWMe 1.0798 0.012 93.514 0.000 1.057 1.102
SMB 0.0019 0.017 0.110 0.912 -0.032 0.036
HML 0.7643 0.021 36.380 0.000 0.723 0.805
MOM -0.0354 0.013 -2.631 0.009 -0.062 -0.009
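
The point estimates above can be cross-checked against the normal equations, $\hat{\boldsymbol{\beta}}=\left(\mathbf{X}^\prime\mathbf{X}\right)^{-1}\mathbf{X}^\prime\mathbf{y}$; a sketch assuming data is the DataFrame from the earlier cells (patsy ships as a statsmodels dependency):

import numpy as np
import patsy

# build y and X from the same formula used in smf.ols
y, X = patsy.dmatrices("BHe ~ 1 + VWMe + SMB + HML + MOM", data)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # matches ls.params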

Parameter Estimation

Least Absolute Deviations

$$\textrm{argmin}_{\boldsymbol{\beta}} \sum_{i=1}^{n}|Y_{i}-\mathbf{x}_{i}\boldsymbol{\beta}|$$
In [14]:
lad = smf.quantreg("BHe ~ 1 + VWMe + SMB + HML + MOM", data).fit(q=0.5)
summary(lad)
coef std err t P>|t| [0.025 0.975]
Intercept -0.0306 0.044 -0.696 0.487 -0.117 0.056
VWMe 1.0716 0.010 103.257 0.000 1.051 1.092
SMB 0.0161 0.015 1.090 0.276 -0.013 0.045
HML 0.7503 0.016 47.702 0.000 0.719 0.781
MOM -0.0272 0.011 -2.581 0.010 -0.048 -0.007

Estimating Models with Interactions

Add an asymmetry and the square of $VWM^e$ to the 4-factor model

$$ Util_i =\beta_1 + \beta_2 VWM_i^e + \beta_3 \left(VWM_i^e\right)^2 + \beta_4 VWM_i^e I_{[VWM_i^e < 0]} + \beta_5 SMB_i + \beta_6 HML_i + \beta_7 MOM_i +\epsilon_i $$
In [15]:
model = f"Util ~ 1 + VWMe + I(VWMe**2) + I(VWMe * (VWMe < 0)) + SMB + HML + MOM"
ls_interact = smf.ols(model, data).fit(cov_type="HC0")
summary(ls_interact)
coef std err z P>|z| [0.025 0.975]
Intercept 0.2857 0.225 1.268 0.205 -0.156 0.727
VWMe 0.4594 0.089 5.154 0.000 0.285 0.634
I(VWMe ** 2) 0.0159 0.007 2.240 0.025 0.002 0.030
I(VWMe * (VWMe < 0)) 0.3524 0.188 1.870 0.061 -0.017 0.722
SMB -0.1972 0.048 -4.087 0.000 -0.292 -0.103
HML 0.3470 0.060 5.810 0.000 0.230 0.464
MOM 0.0611 0.039 1.578 0.114 -0.015 0.137

Expected and Fitted Values

  • Fitted values:
$$ \hat{Y}_i = \mathbf{x}_i \hat{\boldsymbol{\beta}} $$
  • Expected values:
$$ E[Y|X=\mathbf{x}] = \mathbf{x} \hat{\boldsymbol{\beta}} $$
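
Both are available from a fitted statsmodels result; a sketch assuming ls and data from the cells above (the evaluation point is illustrative):

# fitted values at the observed regressors
y_hat = ls.fittedvalues

# expected value at a hypothetical point: market up 1%, other factors at 0
point = data[["VWMe", "SMB", "HML", "MOM"]].iloc[:1] * 0.0
point["VWMe"] = 1.0
e_y = ls.predict(point)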

Expected and Fitted Values

In [18]:
plot_market_interactions()

Typical Regression Coefficients

Factor Components

In [20]:
beta_plot(betas, titles)

Typical Regression Coefficients

Industry Portfolios

In [22]:
beta_plot(betas, titles)

Evidence of Non-linear returns

  • Add square and asymmetry to 4-factor model
In [24]:
beta_plot(betas, titles)

Measuring Fit

  • Coefficient of Determination $$ R^2 = 1- \frac{SSE}{TSS} = \frac{RSS}{TSS}$$
  • Based on a complete decomposition $ TSS = SSE + RSS $, which holds when the model includes a constant
  • Total Sum of Squares $$ TSS = \sum_{i=1}^n (Y_i - \bar{Y})^2$$
  • Sum of Squared Errors $$SSE = \sum_{i=1}^n \hat{\epsilon}_i^2$$
  • Regression Sum of Squares $$ RSS = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 = \sum_{i=1}^n (\mathbf{x}_i\hat{\boldsymbol{\beta}}- \bar{Y})^2$$
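
A sketch verifying the decomposition and both forms of $R^2$, assuming ls is the fitted result from the earlier cells:

import numpy as np

y = ls.model.endog                               # the regressand
tss = ((y - y.mean()) ** 2).sum()
sse = (ls.resid ** 2).sum()
rss = ((ls.fittedvalues - y.mean()) ** 2).sum()
assert np.isclose(tss, sse + rss)                # complete decomposition
r2 = 1 - sse / tss                               # equals rss / tss and ls.rsquared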

Measuring Fit

In [25]:
ls = smf.ols("BHe ~ 1 + VWMe + SMB + HML + MOM", data).fit(cov_type="HC0")
summary(ls, [0])
Dep. Variable: BHe R-squared: 0.954
Model: OLS Adj. R-squared: 0.954

Measuring Fit

Component and Industry Fits

In [27]:
r2_plot()

Measuring Fit

Shifting variables

$$ BH_i^e + 99 = \beta_1 + \beta_2 VWM_i^e + \beta_3 SMB_i + \beta_4 HML_i + \beta_5 MOM_i + \epsilon_i$$
In [28]:
shifted_mod = "I(BHe + 99) ~ 1 + VWMe + SMB + HML + MOM"
ls_shift = smf.ols(shifted_mod, data).fit(cov_type="HC0")
summary(ls_shift, [0])
Dep. Variable: I(BHe + 99) R-squared: 0.954
Model: OLS Adj. R-squared: 0.954
In [29]:
summary(ls, [0])
Dep. Variable: BHe R-squared: 0.954
Model: OLS Adj. R-squared: 0.954

Measuring Fit

Rescaling variables

$$ \pi BH_i^e = \beta_1 + \beta_2 VWM_i^e + \beta_3 SMB_i + \beta_4 HML_i + \beta_5 MOM_i + \epsilon_i$$
In [30]:
rescaled_mod ="I(np.pi * BHe) ~ 1 + VWMe + SMB + HML + MOM"
ls_scale = smf.ols(rescaled_mod, data).fit(cov_type="HC0")
summary(ls_scale, [0])
Dep. Variable: I(np.pi * BHe) R-squared: 0.954
Model: OLS Adj. R-squared: 0.954
In [31]:
summary(ls, [0])
Dep. Variable: BHe R-squared: 0.954
Model: OLS Adj. R-squared: 0.954

Measuring Fit

Changing the LHS Variable

$$ (BH_i^e - VWM^e_i - HML_i) = \beta_1 + \beta_2 VWM_i^e + \beta_3 SMB_i + \beta_4 HML_i + \beta_5 MOM_i + \epsilon_i$$
In [32]:
model = "I(BHe - VWMe - HML) ~ 1 + VWMe + SMB + HML + MOM"
ls_lhs = smf.ols(model, data).fit(cov_type="HC0")
summary(ls_lhs, [0])
Dep. Variable: I(BHe - VWMe - HML) R-squared: 0.382
Model: OLS Adj. R-squared: 0.378
In [33]:
summary(ls, [0])
Dep. Variable: BHe R-squared: 0.954
Model: OLS Adj. R-squared: 0.954

Measuring Fit

Caveats when model excludes the constant

$$ BH_i^e + 99 = \beta_1 VWM_i^e + \beta_2 SMB_i + \beta_3 HML_i + \beta_4 MOM_i + \epsilon_i$$
In [34]:
ls_p99 = smf.ols("I(BHe + 99) ~ VWMe + SMB + HML + MOM - 1", data).fit(cov_type="HC0")
summary(ls_lhs, [0, 1])  # the shifted-LHS fit from the previous slide, shown for comparison
Dep. Variable: I(BHe - VWMe - HML) R-squared: 0.382
Model: OLS Adj. R-squared: 0.378
coef std err z P>|z| [0.025 0.975]
Intercept -0.0859 0.043 -1.991 0.046 -0.170 -0.001
VWMe 0.0798 0.012 6.910 0.000 0.057 0.102
SMB 0.0019 0.017 0.110 0.912 -0.032 0.036
HML -0.2357 0.021 -11.219 0.000 -0.277 -0.195
MOM -0.0354 0.013 -2.631 0.009 -0.062 -0.009
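
When the constant is excluded, statsmodels reports an uncentered $R^2 = 1 - SSE/\sum_{i=1}^n Y_i^2$, so shifting the regressand by 99 inflates it toward one. A sketch, assuming data from the earlier cells:

import statsmodels.formula.api as smf

ls_p99 = smf.ols("I(BHe + 99) ~ VWMe + SMB + HML + MOM - 1", data).fit(cov_type="HC0")
y = data["BHe"] + 99
sse = (ls_p99.resid ** 2).sum()
r2_uncentered = 1 - sse / (y ** 2).sum()             # what ls_p99.rsquared reports
r2_centered = 1 - sse / ((y - y.mean()) ** 2).sum()  # comparable to earlier slides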

Estimating the residual variance

Small-sample corrected estimator

  • Variance of shock estimated using model residuals
$$ s^2 = \frac{1}{n-k}\sum_{i=1}^n \hat{\epsilon}_i^2 = \frac {\hat{\boldsymbol{\epsilon}}^\prime\hat{\boldsymbol{\epsilon}}}{n-k}$$
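
eps, n, and k are not defined in the excerpt; they can be recovered from the fitted result, e.g.:

import numpy as np

eps = np.asarray(ls.resid)  # residuals from the four-factor fit above
n = int(ls.nobs)            # number of observations
k = len(ls.params)          # number of estimated parameters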
In [36]:
s2 = eps.T @ eps / (n - k)
pretty(f"{s2:0.3f}")
1.132

Estimating the residual variance

Large-sample estimator

  • Asymptotic results usually use the large-sample version of the variance estimator
$$ \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n \hat{\epsilon}_i^2 = \frac {\hat{\boldsymbol{\epsilon}}^\prime\hat{\boldsymbol{\epsilon}}}{n} $$
In [37]:
sigma2 = eps.T @ eps / n
pretty(f"{sigma2:0.3f}")
1.124

Scores and the first-order condition of OLS

  • The FOC of a regression is $$ \mathbf{X}^\prime \hat{\boldsymbol{\epsilon}} = \sum_{i=1}^n \mathbf{x}^\prime_i \hat{\epsilon}_i = \mathbf{0} $$
  • Estimated residuals are always orthogonal to the included regressors
  • Later we will see that these conditions can be used to test models by checking whether they are $\approx \mathbf{0}$
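
scores is not defined in the excerpt; one way to compute it from the regressor matrix and residuals used earlier (assuming x is a DataFrame that includes the constant):

import pandas as pd

scores = pd.DataFrame({"Scores": x.T @ eps})  # X'e, zero up to rounding error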
In [39]:
scores
Out[39]:
Scores
Intercept 7.056578e-13
VWMe 1.274314e-11
SMB 5.496090e-13
HML 3.228973e-12
MOM -2.858463e-12

Analysis of Cross-Sectional Data

Properties of OLS Estimators

  • Invariance to Affine Transformations
  • Asymptotic Distribution
  • Feasible Central Limit Theorems
  • Bootstrap Estimation of the Covariance

Variable Transformations

Rescaling by a constant

$$ \frac{Y_i}{100} = \beta_1 + \beta_2 \frac{X_{i,2}}{100} + \ldots + \beta_k \frac{X_{i,k}}{100} + \epsilon_i$$
In [40]:
model = "BHe ~ 1 + VWMe + SMB + HML + MOM"
rescaled_ls = smf.ols(model, data / 100.0).fit(cov_type="HC0")
show_params(rescaled_ls, ls, columns=["Rescaled", "Orig"])
Out[40]:
Rescaled Orig
Intercept -0.000859 -0.085899
VWMe 1.079785 1.079785
SMB 0.001894 0.001894
HML 0.764300 0.764300
MOM -0.035397 -0.035397

Variable Transformations

Rescaling single variables

$$ Y_i = \beta_1 + \beta_2 \left(2 VWM^e_i\right) + \beta_3 SMB_i + \beta_4 \frac{HML_i}{2} + \beta_5 MOM_i + \epsilon_i$$
In [41]:
model = "BHe ~ 1 + I(2 * VWMe) + SMB + I(1/2 * HML) + MOM"
ls_scaled = smf.ols(model, data).fit(cov_type="HC0")
show_params(ls_scaled, columns=["Rescaled"])
Out[41]:
Rescaled
Intercept -0.085899
I(2 * VWMe) 0.539893
SMB 0.001894
I(1 / 2 * HML) 1.528600
MOM -0.035397

Variable Transformations

Affine Transformations

$$ \left(3 BH^e_i + 7 \right) = \beta_1 + \beta_2 \left(2 VWM^e_i - 9\right) + \beta_3 \frac{SMB_i}{2} + \beta_4 HML_i + \beta_5 MOM_i + \epsilon_i$$
In [42]:
model = "I(3 * BHe + 7)  ~ 1 + I(2 * VWMe - 9) + I(1/2 *SMB) + HML + MOM"
ls_affine = smf.ols(model, data).fit(cov_type="HC0")
show_params(ls_affine, columns=["Affine"])
Out[42]:
Affine
Intercept 21.319408
I(2 * VWMe - 9) 1.619678
I(1 / 2 * SMB) 0.011361
HML 2.292901
MOM -0.106192
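
The mapping back to the original estimates can be verified by hand: each slope is scaled by 3 (the LHS multiplier) divided by its own regressor's multiplier, and the intercept absorbs the shifts. A sketch assuming ls holds the original four-factor fit:

b = ls.params
slope_vwm = 3 * b["VWMe"] / 2                       # about 1.6197
slope_smb = 3 * b["SMB"] * 2                        # about 0.0114
intercept = 3 * b["Intercept"] + 7 + 9 * slope_vwm  # about 21.3194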

Characterizing Parameter Estimation Error

  • Central Limit Theorem
$$\sqrt{n}\left(\hat{\boldsymbol{\beta}}_n-\boldsymbol{\beta}\right) \stackrel{d}{\rightarrow} N\left(\mathbf{0},\boldsymbol{\Sigma}_{XX}^{-1}\mathbf{S}\boldsymbol{\Sigma}_{XX}^{-1}\right)$$
  • Covariance components $\boldsymbol{\Sigma}_{XX} = E\left[\mathbf{x}_i^\prime\mathbf{x}_i\right]$ and $\mathbf{S} = \operatorname{plim}_{n\rightarrow\infty} \mathrm{Var}\left[\sqrt{n}\frac{1}{n}\sum_{i=1}^n\mathbf{x}_i^\prime \epsilon_i\right]$.

  • In practice

$$\hat{\boldsymbol{\beta}}_n \stackrel{\approx}{\sim}N\left(\boldsymbol{\beta},\frac{\hat{\boldsymbol{\Sigma}}_{XX}^{-1}\hat{\mathbf{S}}\hat{\boldsymbol{\Sigma}}_{XX}^{-1}}{n}\right) $$

Characterizing Parameter Estimation Error

Parameter Covariance Matrix

In [44]:
ls.cov_params()
Out[44]:
Intercept VWMe SMB HML MOM
Intercept 0.001860 -0.000171 0.000079 -0.000157 -0.000154
VWMe -0.000171 0.000133 -0.000060 0.000039 0.000019
SMB 0.000079 -0.000060 0.000297 0.000042 0.000019
HML -0.000157 0.000039 0.000042 0.000441 0.000122
MOM -0.000154 0.000019 0.000019 0.000122 0.000181

Characterizing Parameter Estimation Error

Estimating the Covariance

$$ \hat{\boldsymbol{\Sigma}}_{XX} = \frac{1}{n}\mathbf{X}^\prime\mathbf{X} \text{ and } \hat{\mathbf{S}} = \frac{1}{n}\sum_{i=1}^n \hat{\epsilon}_i^2 \mathbf{x}_i^\prime\mathbf{x}_i$$
In [45]:
xe = x.mul(eps, axis=0)  # scale each row of x by its residual
S = xe.T @ xe / n
Sigma_inv = np.linalg.inv(x.T @ x / n)
cov = 1 / n * (Sigma_inv @ S @ Sigma_inv)
cov.index = cov.columns = x.columns
cov
Out[45]:
Intercept VWMe SMB HML MOM
Intercept 0.001860 -0.000171 0.000079 -0.000157 -0.000154
VWMe -0.000171 0.000133 -0.000060 0.000039 0.000019
SMB 0.000079 -0.000060 0.000297 0.000042 0.000019
HML -0.000157 0.000039 0.000042 0.000441 0.000122
MOM -0.000154 0.000019 0.000019 0.000122 0.000181

Characterizing Parameter Estimation Error

Standard Errors

  • Square roots of the diagonal elements of the parameter VCV
In [46]:
pretty(ls.bse)
Out[46]:
Intercept 0.043134
VWMe 0.011547
SMB 0.017224
HML 0.021009
MOM 0.013455
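
Equivalently, the standard errors are the square roots of the diagonal of the parameter covariance computed earlier:

import numpy as np

std_errs = np.sqrt(np.diag(cov))  # matches ls.bse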

Bootstrapping the Covariance

  • Simulate from the data to estimate the covariance
  • Randomly sample $n$ observations $\left(Y_i,\mathbf{x}_i\right)$ with replacement
  • Estimate $\hat{\boldsymbol{\beta}}_b$ from the resampled data
  • Repeat $B$ times
  • Compute the covariance of the bootstrapped $\hat{\boldsymbol{\beta}}_b$
In [47]:
betas = []
g = np.random.default_rng(2020)
lhs = ls.model.data.orig_endog          # the original regressand
for i in range(1000):
    idx = g.integers(n, size=n)         # draw n row indices with replacement
    xb = x.iloc[idx]                    # bootstrap sample of the regressors
    yb = lhs.iloc[idx]                  # matching bootstrap sample of the regressand
    beta = sm.OLS(yb, xb).fit().params  # re-estimate on the resampled data
    betas.append(beta)
betas = pd.DataFrame(betas, columns=x.columns, index=np.arange(1, len(betas) + 1))
betas.index.name = "b"

Bootstrap $\beta$ estimates

In [48]:
betas.head()
Out[48]:
Intercept VWMe SMB HML MOM
b
1 -0.032329 1.073407 -0.002263 0.747751 -0.059899
2 -0.098095 1.082409 0.047483 0.750626 -0.023940
3 -0.037763 1.085002 0.036935 0.776792 -0.025469
4 -0.019000 1.083847 -0.002470 0.689036 -0.053007
5 -0.052940 1.067186 0.032972 0.783204 -0.034621

Comparing the Bootstrap and the Traditional Estimator

In [49]:
betas.cov()
Out[49]:
Intercept VWMe SMB HML MOM
Intercept 0.001868 -0.000182 0.000124 -0.000100 -0.000134
VWMe -0.000182 0.000135 -0.000062 0.000035 0.000015
SMB 0.000124 -0.000062 0.000313 0.000033 0.000018
HML -0.000100 0.000035 0.000033 0.000405 0.000105
MOM -0.000134 0.000015 0.000018 0.000105 0.000182
In [50]:
ls.cov_params()
Out[50]:
Intercept VWMe SMB HML MOM
Intercept 0.001860 -0.000171 0.000079 -0.000157 -0.000154
VWMe -0.000171 0.000133 -0.000060 0.000039 0.000019
SMB 0.000079 -0.000060 0.000297 0.000042 0.000019
HML -0.000157 0.000039 0.000042 0.000441 0.000122
MOM -0.000154 0.000019 0.000019 0.000122 0.000181

Analysis of Cross-Sectional Data

Wald and $t$-tests

  • Linear Equality Hypotheses
  • Testing a Single Restriction with a $t$-test
  • The $t$-statistic
  • Multiple Restrictions and the Wald Test
  • The $F$-statistic

Hypothesis Testing

  • Null in a Linear Equality Test $$ H_0: \mathbf{R}\boldsymbol{\beta} = \mathbf{r}$$
  • Three classes of tests
    • Wald and $t$-test
    • Lagrange Multiplier
    • Likelihood Ratio

Hypothesis Testing

$t$-tests

  • Asymptotically normally distributed
  • Test a single restriction
  • Values outside of $\pm 1.96 \approx \pm 2$ lead to rejection at the 5% size
  • Can be used to test one-sided hypotheses

Hypothesis Testing

$t$-test Example

Testing that the total additional factor effect is 0

$$H_0: \beta_{SMB} + \beta_{HML} + \beta_{MOM} = 0$$
$$ \mathbf{R} = \left[0,0,1,1,1\right], \quad r=0 $$
In [51]:
summary(ls)
coef std err z P>|z| [0.025 0.975]
Intercept -0.0859 0.043 -1.991 0.046 -0.170 -0.001
VWMe 1.0798 0.012 93.514 0.000 1.057 1.102
SMB 0.0019 0.017 0.110 0.912 -0.032 0.036
HML 0.7643 0.021 36.380 0.000 0.723 0.805
MOM -0.0354 0.013 -2.631 0.009 -0.062 -0.009

Hypothesis Testing

$t$-test Example

In [52]:
R = np.array([[0, 0, 1, 1, 1]])
c = ls.cov_params()
h0_vcv = np.squeeze(R @ c @ R.T)
t = (R @ ls.params) / np.sqrt(h0_vcv)
pretty(f"The t-test statistic is {t[0]:0.2f}")
The t-test statistic is 20.39

Hypothesis Testing

$t$-test Example on Industry Portfolios

In [55]:
test_plot(t_tests, title="$H_0: SMB + HML + MOM = 0$", two_sided=True)

Hypothesis Testing

$t$-stats

  • The $t$-stat is the special case of a $t$-test of $H_0:\beta_j=0$
  • Most commonly reported test statistic
  • Asymptotically normal
  • 5% critical values $\pm 1.96 \approx \pm 2$
In [56]:
pretty(ls.tvalues)
Out[56]:
Intercept -1.991463
VWMe 93.513503
SMB 0.109934
HML 36.380381
MOM -2.630803

Hypothesis Testing

Significance in Industry Portfolios

In [58]:
multi_test_plot(t_stats)

Hypothesis Testing

Wald Tests

  • Test multiple restrictions jointly
  • Exploit properties of multivariate normals
  • $\chi^2_m$ distributed in large samples
  • Test statistic is
$$ W = n\left(\mathbf{R}\hat{\boldsymbol{\beta}}-\mathbf{r}\right)^\prime \left[\mathbf{R}\hat{\boldsymbol{\Sigma}}_{XX}^{-1}\hat{\mathbf{S}}\hat{\boldsymbol{\Sigma}}_{XX}^{-1}\mathbf{R}^\prime\right]^{-1} \left(\mathbf{R}\hat{\boldsymbol{\beta}}-\mathbf{r}\right)$$

Hypothesis Testing

Wald Tests

Testing the CAPM

  • Multiple $\beta$ all zero: $H_0: \beta_{SMB}=\beta_{HML}=\beta_{MOM}=0$
$$ R = \left[\begin{array}{ccccc} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ \end{array}\right], r = \left[\begin{array}{c} 0 \\ 0 \\ 0 \end{array}\right] $$

Hypothesis Testing

Testing the CAPM

In [59]:
R, r = np.zeros((3, 5)), np.zeros(3)
R[0, 2] = R[1, 3] = R[2, 4] = 1
h0_vcv = R @ c @ R.T
h0_vcv.columns = h0_vcv.index = [f"Restr {i}" for i in range(1, 4)]
h0_vcv
Out[59]:
Restr 1 Restr 2 Restr 3
Restr 1 0.000297 0.000042 0.000019
Restr 2 0.000042 0.000441 0.000122
Restr 3 0.000019 0.000122 0.000181
In [60]:
numerator = R @ ls.params - r
wald = numerator @ np.linalg.inv(h0_vcv) @ numerator.T
pretty(f"W={wald:0.1f}")
W=1749.6

Wald Tests

Industry Portfolios

In [62]:
dof = 3
pretty(f"The crit. value is {stats.chi2(dof).ppf(0.95):0.2f} from a $\chi^2_{dof}$")
test_plot(walds, cv=stats.chi2(dof).ppf(0.95), title="Wald Statistics")
The crit. value is 7.81 from a $\chi^2_3$