# Course Structure¶

• Course presented through three channels:
1. Pre-recorded content with a focus on technical aspects of the course
• Designed to be viewed in sequence
• Each module should be short
• Approximately 2 hours of content per week
2. In-person lectures with a focus on applied aspects of the course
• Expected that pre-recorded content has been viewed before the lecture
3. Notes that accompany the lecture content
• Read before or after the lecture or when necessary for additional background
• Slides are primary: material presented during lectures, whether pre-recorded or live, is examinable
• Notes are secondary and provide more background for the slides
• Slides are derived from notes so there is a strong correspondence

# Monitoring Your Progress¶

• Self assessment
• Review questions in pre-recorded content
• Multiple choice questions on Canvas made available each week
• Answers available immediately
• Long-form problem distributed each week
• Answers presented in a subsequent class
• Marked Assessment
• Empirical projects applying the material in the lectures
• Both individual and group
• Each empirical assignment will have a written and code component

# Analysis of Cross-Sectional Data¶

## Introduction to Regression Models¶

• Notation
• Factor Models
• Data
• Variable Transformations

# Linear Regression¶

## Scalar Notation¶

$$Y_{i}=\beta_{1}X_{1,i}+\beta_{2}X_{2,i}+\ldots+\beta_{k}X_{k,i}+\epsilon_{i}$$
• $Y_{i}$: Regressand, Dependent Variable, LHS Variable
• $X_{j,i}$: Regressor, also Independent Variable, RHS Variable, Explanatory Variable
• $\epsilon_{i}$: Innovation, also Shock, Error or Disturbance
• $n$ observations, indexed $i=1,2,\ldots,n$
• $k$ regressors, indexed $j=1,2,\ldots,k$

# Linear Regression¶

## Matrix Notation¶

Common to use convenient matrix notation

$$\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}$$
• $\mathbf{y}$ is $n$ by $1$
• $\mathbf{X}$ is $n$ by $k$
• $\boldsymbol{\beta}$ is $k$ by $1$
• $\boldsymbol{\epsilon}$ is $n$ by $1$
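As a dimension check, the matrix form can be verified with NumPy; the data below are randomly generated purely to illustrate the shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 4

X = rng.standard_normal((n, k))      # n-by-k regressor matrix
beta = rng.standard_normal((k, 1))   # k-by-1 coefficient vector
eps = rng.standard_normal((n, 1))    # n-by-1 innovation vector

y = X @ beta + eps                   # n-by-1 regressand
print(y.shape)                       # (100, 1)
```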

# Factor Models¶

• Factor models are widely used in finance

• Capital Asset Pricing Model (CAPM)
• Arbitrage Pricing Theory (APT)
• Risk Exposure
• Basic specification $R_{i}=\mathbf{f}_{i}\boldsymbol{\beta}+\epsilon_{i}$

• $R_{i}$: Return on dependent asset, often excess $(R_{i}^{e})$
• $\mathbf{f}_{i}$: $1\times k$ vector of factor innovations
• $\epsilon_{i}$: innovation, $\mathrm{corr}(\epsilon_{i},f_{j,i})=0$, $j=1,2,\ldots,k$
• Special Case: CAPM
$$R_{i}-R_{i}^{f}=\beta(R_{i}^{m}-R_{i}^{f})+\epsilon_{i}$$
$$R_{i}^{e}=\beta R_{i}^{me}+\epsilon_{i}$$
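A minimal simulation sketch of the CAPM special case (all numbers here are made up; `1.2` is an arbitrary true $\beta$) showing that least squares recovers the factor loading:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
r_me = 4.5 * rng.standard_normal(n)   # simulated excess market return
eps = rng.standard_normal(n)          # idiosyncratic innovation
r_e = 1.2 * r_me + eps                # R_i^e = beta * R_i^me + eps_i

# Least-squares estimate of beta (regression through the origin)
beta_hat = (r_me @ r_e) / (r_me @ r_me)
```

With 5,000 simulated observations, `beta_hat` lands very close to the true value of 1.2.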

# Data¶

• Data from the Fama-French 3 factors + Momentum
• $VWM^e$ - Excess return on Value-Weighted-Market
• $SMB$ - Return on the size portfolio
• $HML$ - Return on the value portfolio
• $MOM$ - Return on the momentum portfolio
• Size-Value sorted portfolio return data
• Size
• S: Small
• B: Big
• Value
• H: High BE/ME
• M: Middle BE/ME
• L: Low BE/ME
• 49 Industry Portfolios
• All returns excess except $SMB$, $HML$, $MOM$

# Fama-French Factors¶

## Summary Statistics¶

In [3]:
summ_plot(factors)


# Fama-French Factors¶

## Correlation Structure¶

In [4]:
factors.corr()

Out[4]:
$VWM^e$ $SMB$ $HML$ $MOM$
$VWM^e$ 1.000000 0.300958 -0.226222 -0.149518
$SMB$ 0.300958 1.000000 -0.174962 -0.024014
$HML$ -0.226222 -0.174962 1.000000 -0.195242
$MOM$ -0.149518 -0.024014 -0.195242 1.000000

# Size and Value components¶

In [6]:
summ_plot(components)


# Industry Portfolios¶

In [7]:
summ_plot(subset)


# Variable Transformations¶

• Dummy variables
• 0-1 variables based on an indicator function $$I_{[X_{i,j} > 0]}$$
• Asymmetries at 0
• Monthly Effects
In [9]:
monthly_dummies.head(8)

Out[9]:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
0 0 0 0 0 0 0 1 0 0 0 0 0
1 0 0 0 0 0 0 0 1 0 0 0 0
2 0 0 0 0 0 0 0 0 1 0 0 0
3 0 0 0 0 0 0 0 0 0 1 0 0
4 0 0 0 0 0 0 0 0 0 0 1 0
5 0 0 0 0 0 0 0 0 0 0 0 1
6 1 0 0 0 0 0 0 0 0 0 0 0
7 0 1 0 0 0 0 0 0 0 0 0 0
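Dummy columns like these can be built directly with pandas `get_dummies`; a sketch using a hypothetical month-label series that follows the pattern in the table above:

```python
import pandas as pd

# Month labels for eight consecutive observations (hypothetical, matching the table)
months = pd.Series(["Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan", "Feb"])

dummies = pd.get_dummies(months).astype(int)
order = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Reorder to calendar order; months absent from the sample get all-zero columns
dummies = dummies.reindex(columns=order, fill_value=0)
```

Each row contains exactly one 1, marking the observation's month.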

# Variable Transformation: Interactions¶

• Interactions dramatically expand the functional forms that can be specified
• Powers and Cross-products: $X_{i,j}^2$, $X_{i,j}X_{i,m}$
• Dummy Interactions to Produce Asymmetries: $X_{i,j} \times I_{[X_{i,j}<0]}$
In [11]:
interactions.tail(10)

Out[11]:
| | Market Negative | Negative Return | Squared Returns |
|---|---|---|---|
| 2019-11-30 | 0 | 0.00 | 14.9769 |
| 2019-12-31 | 0 | 0.00 | 7.6729 |
| 2020-01-31 | 1 | -0.11 | 0.0121 |
| 2020-02-29 | 1 | -8.13 | 66.0969 |
| 2020-03-31 | 1 | -13.38 | 179.0244 |
| 2020-04-30 | 0 | 0.00 | 186.3225 |
| 2020-05-31 | 0 | 0.00 | 31.1364 |
| 2020-06-30 | 0 | 0.00 | 6.0516 |
| 2020-07-31 | 0 | 0.00 | 33.2929 |
| 2020-08-31 | 0 | 0.00 | 58.0644 |
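Interaction columns of this kind are simple element-wise transformations; a sketch using a short hypothetical excess-return series:

```python
import pandas as pd

# Hypothetical excess market returns for illustration
vwme = pd.Series([13.65, -0.11, -8.13, -13.38], name="VWMe")

interactions = pd.DataFrame(
    {
        "Market Negative": (vwme < 0).astype(int),  # dummy I[VWMe < 0]
        "Negative Return": vwme * (vwme < 0),       # asymmetry VWMe * I[VWMe < 0]
        "Squared Returns": vwme ** 2,               # power VWMe^2
    }
)
```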

# Analysis of Cross-Sectional Data¶

## Parameter Estimation and Model Fit¶

• Parameter Estimation
• Models with Interactions
• Other estimated quantities
• Regression Coefficient in Factor Models

# Parameter Estimation¶

## Least Squares¶

$$\textrm{argmin}_{\beta}\sum_{i=1}^{n}(Y_{i}-\mathbf{x}_{i}\boldsymbol{\beta})^{2}$$
In [13]:
ls = smf.ols("BHe ~ 1 + VWMe + SMB + HML + MOM", data).fit(cov_type="HC0")
summary(ls)

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | -0.0859 | 0.043 | -1.991 | 0.046 | -0.170 | -0.001 |
| VWMe | 1.0798 | 0.012 | 93.514 | 0.000 | 1.057 | 1.102 |
| SMB | 0.0019 | 0.017 | 0.110 | 0.912 | -0.032 | 0.036 |
| HML | 0.7643 | 0.021 | 36.380 | 0.000 | 0.723 | 0.805 |
| MOM | -0.0354 | 0.013 | -2.631 | 0.009 | -0.062 | -0.009 |
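The minimization above has the closed-form solution $\hat{\boldsymbol{\beta}} = (\mathbf{X}^\prime\mathbf{X})^{-1}\mathbf{X}^\prime\mathbf{y}$; a quick check on simulated data (not the Fama-French data) that the solution attains a smaller sum of squared errors than even the true coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
beta = np.array([1.0, 0.5, -0.25])
y = X @ beta + rng.standard_normal(n)

# Closed-form least-squares solution
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The minimizer produces a (weakly) smaller SSE than any other candidate
sse_hat = np.sum((y - X @ beta_hat) ** 2)
sse_true = np.sum((y - X @ beta) ** 2)
assert sse_hat <= sse_true
```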

# Parameter Estimation¶

## Least Absolute Deviations¶

$$\textrm{argmin}_{\beta} \sum_{i=1}^{n}|Y_{i}-\mathbf{x}_{i}\boldsymbol{\beta}|$$
In [14]:
lad = smf.quantreg("BHe ~ 1 + VWMe + SMB + HML + MOM", data).fit(q=0.5)
summary(lad)

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | -0.0306 | 0.044 | -0.696 | 0.487 | -0.117 | 0.056 |
| VWMe | 1.0716 | 0.010 | 103.257 | 0.000 | 1.051 | 1.092 |
| SMB | 0.0161 | 0.015 | 1.090 | 0.276 | -0.013 | 0.045 |
| HML | 0.7503 | 0.016 | 47.702 | 0.000 | 0.719 | 0.781 |
| MOM | -0.0272 | 0.011 | -2.581 | 0.010 | -0.048 | -0.007 |

# Estimating Models with Interactions¶

Added an asymmetry and the square of $VWM^e$ to the 4-factor model

$$Util_i =\beta_1 + \beta_2 VWM_i^e + \beta_3 \left(VWM_i^e\right)^2 + \beta_4 VWM_i^e I_{[VWM_i^e < 0]} + \beta_5 SMB_i + \beta_6 HML_i + \beta_7 MOM_i +\epsilon_i$$
In [15]:
model = "Util ~ 1 + VWMe + I(VWMe**2) + I(VWMe * (VWMe < 0)) + SMB + HML + MOM"
ls_interact = smf.ols(model, data).fit(cov_type="HC0")
summary(ls_interact)

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.2857 | 0.225 | 1.268 | 0.205 | -0.156 | 0.727 |
| VWMe | 0.4594 | 0.089 | 5.154 | 0.000 | 0.285 | 0.634 |
| I(VWMe ** 2) | 0.0159 | 0.007 | 2.240 | 0.025 | 0.002 | 0.030 |
| I(VWMe * (VWMe < 0)) | 0.3524 | 0.188 | 1.870 | 0.061 | -0.017 | 0.722 |
| SMB | -0.1972 | 0.048 | -4.087 | 0.000 | -0.292 | -0.103 |
| HML | 0.3470 | 0.060 | 5.810 | 0.000 | 0.230 | 0.464 |
| MOM | 0.0611 | 0.039 | 1.578 | 0.114 | -0.015 | 0.137 |

# Expected and Fitted Values¶

• Fitted values:
$$\hat{Y}_i = \mathbf{x}_i \hat{\boldsymbol{\beta}}$$
• Expected values:
$$E[Y|X=\mathbf{x}] = \mathbf{x} \hat{\boldsymbol{\beta}}$$

# Expected and Fitted Values¶

In [18]:
plot_market_interactions()


# Typical Regression Coefficients¶

## Factor Components¶

In [20]:
beta_plot(betas, titles)


# Typical Regression Coefficients¶

## Industry Portfolios¶

In [22]:
beta_plot(betas, titles)


# Evidence of Non-linear returns¶

• Add square and asymmetry to 4-factor model
In [24]:
beta_plot(betas, titles)


# Measuring fit¶

• Coefficient of Determination $$R^2 = 1- \frac{SSE}{TSS} = \frac{RSS}{TSS}$$
• Based on a complete decomposition $TSS = SSE + RSS$
• Total Sum of Squares $$TSS = \sum_{i=1}^n (Y_i - \bar{Y})^2$$
• Sum of Squared Errors $$SSE = \sum_{i=1}^n \hat{\epsilon}_i^2$$
• Regression Sum of Squares $$RSS = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 = \sum_{i=1}^n (\mathbf{x}_i\hat{\boldsymbol{\beta}}- \bar{Y})^2$$
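The decomposition $TSS = SSE + RSS$ can be verified numerically; a sketch on simulated data with a constant included in the model:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([0.5, 1.5]) + rng.standard_normal(n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
fitted = X @ beta_hat
resid = y - fitted

tss = np.sum((y - y.mean()) ** 2)          # total sum of squares
sse = np.sum(resid ** 2)                   # sum of squared errors
rss = np.sum((fitted - y.mean()) ** 2)     # regression sum of squares

# Exact (up to floating point) when the model includes a constant
assert np.isclose(tss, sse + rss)
r2 = 1 - sse / tss
```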

# Measuring Fit¶

In [25]:
ls = smf.ols("BHe ~ 1 + VWMe + SMB + HML + MOM", data).fit(cov_type="HC0")
summary(ls, [0])

Dep. Variable: BHe, Model: OLS, R-squared: 0.954, Adj. R-squared: 0.954

# Measuring Fit¶

## Component and Industry Fits¶

In [27]:
r2_plot()


# Measuring Fit¶

## Shifting variables¶

$$BH_i^e + 99 = \beta_1 + \beta_2 VWM_i^e + \beta_3 SMB_i + \beta_4 HML_i + \beta_5 MOM_i + \epsilon_i$$
In [28]:
shifted_mod = "I(BHe + 99) ~ 1 + VWMe + SMB + HML + MOM"
ls_shift = smf.ols(shifted_mod, data).fit(cov_type="HC0")
summary(ls_shift, [0])

Dep. Variable: I(BHe + 99), Model: OLS, R-squared: 0.954, Adj. R-squared: 0.954
In [29]:
summary(ls, [0])

Dep. Variable: BHe, Model: OLS, R-squared: 0.954, Adj. R-squared: 0.954

# Measuring Fit¶

## Rescaling variables¶

$$\pi BH_i^e = \beta_1 + \beta_2 VWM_i^e + \beta_3 SMB_i + \beta_4 HML_i + \beta_5 MOM_i + \epsilon_i$$
In [30]:
rescaled_mod ="I(np.pi * BHe) ~ 1 + VWMe + SMB + HML + MOM"
ls_scale = smf.ols(rescaled_mod, data).fit(cov_type="HC0")
summary(ls_scale, [0])

Dep. Variable: I(np.pi * BHe), Model: OLS, R-squared: 0.954, Adj. R-squared: 0.954
In [31]:
summary(ls, [0])

Dep. Variable: BHe, Model: OLS, R-squared: 0.954, Adj. R-squared: 0.954
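The invariance holds because shifting or rescaling the regressand changes SSE and TSS by the same factor (or leaves both unchanged); a numerical sketch on simulated data:

```python
import numpy as np

def r_squared(y, X):
    """R^2 from an OLS fit of y on X, where X includes a constant."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    sse = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - sse / tss

rng = np.random.default_rng(3)
n = 300
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([0.1, 1.0]) + rng.standard_normal(n)

# Shifting or rescaling y leaves R^2 unchanged
assert np.isclose(r_squared(y, X), r_squared(y + 99, X))
assert np.isclose(r_squared(y, X), r_squared(np.pi * y, X))
```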

# Measuring Fit¶

## Changing the LHS Variable¶

$$(BH_i^e - VWM^e_i - HML_i) = \beta_1 + \beta_2 VWM_i^e + \beta_3 SMB_i + \beta_4 HML_i + \beta_5 MOM_i + \epsilon_i$$
In [32]:
model = "I(BHe - VWMe - HML) ~ 1 + VWMe + SMB + HML + MOM"
ls_lhs = smf.ols(model, data).fit(cov_type="HC0")
summary(ls_lhs, [0])

Dep. Variable: I(BHe - VWMe - HML), Model: OLS, R-squared: 0.382, Adj. R-squared: 0.378
In [33]:
summary(ls, [0])

Dep. Variable: BHe, Model: OLS, R-squared: 0.954, Adj. R-squared: 0.954

# Measuring fit¶

## Caveats when model excludes the constant¶

$$BH_i^e + 99 = \beta_1 VWM_i^e + \beta_2 SMB_i + \beta_3 HML_i + \beta_4 MOM_i + \epsilon_i$$
In [34]:
ls_p99 = smf.ols("I(BHe + 99) ~ VWMe + SMB + HML + MOM - 1", data).fit(cov_type="HC0")
summary(ls_lhs, [0, 1])

Dep. Variable: I(BHe - VWMe - HML), Model: OLS, R-squared: 0.382, Adj. R-squared: 0.378

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | -0.0859 | 0.043 | -1.991 | 0.046 | -0.170 | -0.001 |
| VWMe | 0.0798 | 0.012 | 6.910 | 0.000 | 0.057 | 0.102 |
| SMB | 0.0019 | 0.017 | 0.110 | 0.912 | -0.032 | 0.036 |
| HML | -0.2357 | 0.021 | -11.219 | 0.000 | -0.277 | -0.195 |
| MOM | -0.0354 | 0.013 | -2.631 | 0.009 | -0.062 | -0.009 |

# Estimating the residual variance¶

## Small-sample corrected estimator¶

• Variance of shock estimated using model residuals
$$s^2 = \frac{1}{n-k}\sum_{i=1}^n \hat{\epsilon}_i^2 = \frac {\hat{\boldsymbol{\epsilon}}^\prime\hat{\boldsymbol{\epsilon}}}{n-k}$$
In [36]:
s2 = eps.T @ eps / (n - k)
pretty(f"{s2:0.3f}")

1.132

# Estimating the residual variance¶

## Large-sample estimator¶

• Asymptotic results usually use the large-sample version of the variance estimator
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n \hat{\epsilon}_i^2 = \frac {\hat{\boldsymbol{\epsilon}}^\prime\hat{\boldsymbol{\epsilon}}}{n}$$
In [37]:
sigma2 = eps.T @ eps / n
pretty(f"{sigma2:0.3f}")

1.124
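The two estimators differ only by the degrees-of-freedom factor, $\hat{\sigma}^2 = \frac{n-k}{n}\, s^2$, so the gap vanishes as $n$ grows; a quick check on simulated stand-in residuals:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 100, 5
eps = rng.standard_normal(n)   # stand-in for OLS residuals

s2 = eps @ eps / (n - k)       # small-sample corrected estimator
sigma2 = eps @ eps / n         # large-sample estimator

# The estimators are linked exactly by the degrees-of-freedom factor
assert np.isclose(sigma2, (n - k) / n * s2)
assert sigma2 < s2
```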

# Scores and the first-order condition of OLS¶

• The FOC of a regression is $$\mathbf{X}^\prime \hat{\boldsymbol{\epsilon}} = \sum_{i=1}^n \mathbf{x}^\prime_i \hat{\epsilon}_i = \mathbf{0}$$
• Estimated residuals are always orthogonal with included regressors
• Later we will see that these scores can be used to test models by checking whether they are $\approx \mathbf{0}$
In [39]:
scores

Out[39]:
Scores
Intercept 6.605827e-13
VWMe 1.290656e-11
SMB 5.131937e-13
HML 2.596146e-12
MOM -2.812278e-12
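The first-order condition can be confirmed numerically: residuals from any OLS fit are orthogonal to the included regressors up to floating-point error, just as in the scores table above. A sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 250
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = X @ np.array([0.2, 1.0, -0.5]) + rng.standard_normal(n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
eps_hat = y - X @ beta_hat

# X' eps-hat is zero up to numerical precision
scores = X.T @ eps_hat
assert np.allclose(scores, 0.0, atol=1e-8)
```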

# Analysis of Cross-Sectional Data¶

## Properties of OLS Estimators¶

• Invariance to Affine Transformations
• Asymptotic Distribution
• Feasible Central Limit Theorems
• Bootstrap Estimation of the Covariance

# Variable Transformations¶

## Rescaling by a constant¶

$$\frac{Y_i}{100} = \beta_1 + \beta_2 \frac{X_{i,2}}{100} + \ldots + \beta_k \frac{X_{i,k}}{100} + \epsilon_i$$
In [40]:
model = "BHe ~ 1 + VWMe + SMB + HML + MOM"
rescaled_ls = smf.ols(model, data / 100.0).fit(cov_type="HC0")
show_params(rescaled_ls, ls, columns=["Rescaled", "Orig"])

Out[40]:
Rescaled Orig
Intercept -0.000859 -0.085899
VWMe 1.079785 1.079785
SMB 0.001894 0.001894
HML 0.764300 0.764300
MOM -0.035397 -0.035397

# Variable Transformations¶

## Rescaling single variables¶

$$Y_i = \beta_1 + \beta_2 \left(2 VWM^e_i\right) + \beta_3 SMB_i + \beta_4 \frac{HML_i}{2} + \beta_5 MOM_i + \epsilon_i$$
In [41]:
model = "BHe ~ 1 + I(2 * VWMe) + SMB + I(1/2 * HML) + MOM"
ls_scaled = smf.ols(model, data).fit(cov_type="HC0")
show_params(ls_scaled, columns=["Rescaled"])

Out[41]:
Rescaled
Intercept -0.085899
I(2 * VWMe) 0.539893
SMB 0.001894
I(1 / 2 * HML) 1.528600
MOM -0.035397

# Variable Transformations¶

## Affine Transformations¶

$$\left(3 BH^e_i + 7 \right) = \beta_1 + \beta_2 \left(2 VWM^e_i - 9\right) + \beta_3 \frac{SMB_i}{2} + \beta_4 HML_i + \beta_5 MOM_i + \epsilon_i$$
In [42]:
model = "I(3 * BHe + 7)  ~ 1 + I(2 * VWMe - 9) + I(1/2 *SMB) + HML + MOM"
ls_affine = smf.ols(model, data).fit(cov_type="HC0")
show_params(ls_affine, columns=["Affine"])

Out[42]:
Affine
Intercept 21.319408
I(2 * VWMe - 9) 1.619678
I(1 / 2 * SMB) 0.011361
HML 2.292901
MOM -0.106192
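The mapping between transformed and original coefficients is exact: with regressand $3y + 7$ and regressor $2x - 9$, the slope becomes $3/2$ times the original and the intercept absorbs the shifts. A numerical sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x = rng.standard_normal(n)
y = -0.1 + 1.1 * x + rng.standard_normal(n)

def ols(y, x):
    """Intercept and slope from an OLS fit of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.solve(X.T @ X, X.T @ y)

b = ols(y, x)                   # original fit
a = ols(3 * y + 7, 2 * x - 9)   # affine-transformed fit

# Slope scales by 3/2; intercept is 3*b1 + 7 + 13.5*b2
assert np.isclose(a[1], 1.5 * b[1])
assert np.isclose(a[0], 3 * b[0] + 7 + 13.5 * b[1])
```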

# Characterizing Parameter Estimation Error¶

• Central Limit Theorem
$$\sqrt{n}\left(\hat{\boldsymbol{\beta}}_n-\boldsymbol{\beta}\right) \stackrel{d}{\rightarrow} N\left(\mathbf{0},\boldsymbol{\Sigma}_{XX}^{-1}\mathbf{S}\boldsymbol{\Sigma}_{XX}^{-1}\right)$$
• Covariance components $\boldsymbol{\Sigma}_{XX} = E\left[\mathbf{x}_i^\prime\mathbf{x}_i\right]$ and $\mathbf{S} = \mathrm{p}-\lim_{n\rightarrow\infty} \mathrm{Var}\left[\sqrt{n}\frac{1}{n}\sum_{i=1}^n\mathbf{x}_i^\prime \epsilon_i\right]$.

• In practice

$$\hat{\boldsymbol{\beta}}_n \stackrel{\approx}{\sim}N\left(\boldsymbol{\beta},\frac{\hat{\boldsymbol{\Sigma}}_{XX}^{-1}\hat{\mathbf{S}}\hat{\boldsymbol{\Sigma}}_{XX}^{-1}}{n}\right)$$
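The $\sqrt{n}$ rate behind the CLT can be illustrated by Monte Carlo: quadrupling the sample size roughly halves the spread of $\hat{\beta}$. A sketch for a single-regressor model with homoskedastic errors (all settings are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
beta = 1.0

def slope_estimates(n, reps=2000):
    """OLS slope (no constant) across `reps` simulated samples of size n."""
    x = rng.standard_normal((reps, n))
    eps = rng.standard_normal((reps, n))
    y = beta * x + eps
    return np.sum(x * y, axis=1) / np.sum(x * x, axis=1)

# Standard deviation of beta-hat should roughly halve when n quadruples
sd_small = slope_estimates(100).std()
sd_large = slope_estimates(400).std()
assert 1.5 < sd_small / sd_large < 2.7
```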

# Characterizing Parameter Estimation Error¶

## Parameter Covariance Matrix¶

In [44]:
ls.cov_params()

Out[44]:
Intercept VWMe SMB HML MOM
Intercept 0.001860 -0.000171 0.000079 -0.000157 -0.000154
VWMe -0.000171 0.000133 -0.000060 0.000039 0.000019
SMB 0.000079 -0.000060 0.000297 0.000042 0.000019
HML -0.000157 0.000039 0.000042 0.000441 0.000122
MOM -0.000154 0.000019 0.000019 0.000122 0.000181

# Characterizing Parameter Estimation Error¶

## Estimating the Covariance¶

$$\hat{\boldsymbol{\Sigma}}_{XX} = \frac{1}{n}\mathbf{X}^\prime\mathbf{X} \text { and } \hat{\mathbf{S}} = \frac{1}{n}\sum_{i=1}^n \hat{\epsilon}_i^2 \mathbf{x}_i^\prime\mathbf{x}_i$$
In [45]:
xe = x * eps
S = xe.T @ xe / n
Sigma_inv = np.linalg.inv(x.T @ x / n)
cov = 1 / n * (Sigma_inv @ S @ Sigma_inv)
cov.index = cov.columns = x.columns
cov

Out[45]:
Intercept VWMe SMB HML MOM
Intercept 0.001860 -0.000171 0.000079 -0.000157 -0.000154
VWMe -0.000171 0.000133 -0.000060 0.000039 0.000019
SMB 0.000079 -0.000060 0.000297 0.000042 0.000019
HML -0.000157 0.000039 0.000042 0.000441 0.000122
MOM -0.000154 0.000019 0.000019 0.000122 0.000181

# Characterizing Parameter Estimation Error¶

## Standard Errors¶

• Square root of the diagonal elements of the parameter covariance matrix
In [46]:
pretty(ls.bse)

Out[46]:
Intercept 0.043134
VWMe 0.011547
SMB 0.017224
HML 0.021009
MOM 0.013455

# Bootstrapping the Covariance¶

• Resample from the data to estimate the parameter covariance
• Randomly sample $n$ observations $\left(y_i,\mathbf{x}_i\right)$ with replacement
• Estimate $\hat{\boldsymbol{\beta}}_b$ from the resampled data
• Repeat $B$ times
• Compute the covariance from the bootstrapped $\hat{\boldsymbol{\beta}}_b$
In [47]:
betas = []
g = np.random.default_rng(2020)
lhs = ls.model.data.orig_endog
for i in range(1000):
    idx = g.integers(n, size=n)
    xb = x.iloc[idx]
    yb = lhs.iloc[idx]
    beta = sm.OLS(yb, xb).fit().params
    betas.append(beta)
betas = pd.DataFrame(betas, columns=x.columns, index=np.arange(1, len(betas) + 1))
betas.index.name = "b"


# Bootstrap $\beta$ estimates¶

In [48]:
betas.head()

Out[48]:
Intercept VWMe SMB HML MOM
b
1 -0.032329 1.073407 -0.002263 0.747751 -0.059899
2 -0.098095 1.082409 0.047483 0.750626 -0.023940
3 -0.037763 1.085002 0.036935 0.776792 -0.025469
4 -0.019000 1.083847 -0.002470 0.689036 -0.053007
5 -0.052940 1.067186 0.032972 0.783204 -0.034621

# Comparing the Bootstrap and the Traditional Estimator¶

In [49]:
betas.cov()

Out[49]:
Intercept VWMe SMB HML MOM
Intercept 0.001868 -0.000182 0.000124 -0.000100 -0.000134
VWMe -0.000182 0.000135 -0.000062 0.000035 0.000015
SMB 0.000124 -0.000062 0.000313 0.000033 0.000018
HML -0.000100 0.000035 0.000033 0.000405 0.000105
MOM -0.000134 0.000015 0.000018 0.000105 0.000182
In [50]:
ls.cov_params()

Out[50]:
Intercept VWMe SMB HML MOM
Intercept 0.001860 -0.000171 0.000079 -0.000157 -0.000154
VWMe -0.000171 0.000133 -0.000060 0.000039 0.000019
SMB 0.000079 -0.000060 0.000297 0.000042 0.000019
HML -0.000157 0.000039 0.000042 0.000441 0.000122
MOM -0.000154 0.000019 0.000019 0.000122 0.000181

# Analysis of Cross-Sectional Data¶

## Wald and $t$-tests¶

• Linear Equality Hypotheses
• Testing a Single Restriction with a $t$-test
• The $t$-statistic
• Multiple Restrictions and the Wald test
• The $F$-statistic

# Hypothesis Testing¶

• Null in a Linear Equality Test
$$H_0: \mathbf{R}\boldsymbol{\beta} = \mathbf{r}$$
• Three classes of tests
• Wald and $t$-test
• Lagrange Multiplier
• Likelihood Ratio

# Hypothesis Testing¶

## $t$-tests¶

• Asymptotically normally distributed
• Test a single restriction
• Values outside of $\pm 1.96 \approx \pm 2$ lead to rejection at the 5% size
• Can be used to test 1-sided hypotheses
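The critical value and two-sided p-values come from the standard normal and are easily computed with scipy.stats; `-2.63` below is just an illustrative test statistic:

```python
from scipy import stats

# Two-sided 5% critical value from the standard normal
cv = stats.norm.ppf(0.975)
print(round(cv, 2))  # 1.96

# Two-sided p-value for an observed t-statistic
t_stat = -2.63
p_value = 2 * stats.norm.cdf(-abs(t_stat))
```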

# Hypothesis Testing¶

## $t$-test Example¶

Testing that the total additional effect is 0

$$H_0: SMB + HML + MOM = 0$$
$$R = \left[0,0,1,1,1\right], \quad r=0$$
In [51]:
summary(ls)

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | -0.0859 | 0.043 | -1.991 | 0.046 | -0.170 | -0.001 |
| VWMe | 1.0798 | 0.012 | 93.514 | 0.000 | 1.057 | 1.102 |
| SMB | 0.0019 | 0.017 | 0.110 | 0.912 | -0.032 | 0.036 |
| HML | 0.7643 | 0.021 | 36.380 | 0.000 | 0.723 | 0.805 |
| MOM | -0.0354 | 0.013 | -2.631 | 0.009 | -0.062 | -0.009 |

# Hypothesis Testing¶

## $t$-test Example¶

In [52]:
R = np.array([[0, 0, 1, 1, 1]])
c = ls.cov_params()
h0_vcv = np.squeeze(R @ c @ R.T)
t = (R @ ls.params) / np.sqrt(h0_vcv)
pretty(f"The t-test statistic is {t[0]:0.2f}")

The t-test statistic is 20.39

# Hypothesis Testing¶

## $t$-test Example on Industry Portfolios¶

In [55]:
test_plot(t_tests, title="$H_0: SMB + HML + MOM = 0$", two_sided=True)


# Hypothesis Testing¶

## $t$-stats¶

• The $t$-stat is a special case for $H_0:\beta_j=0$
• Most commonly reported test statistic
• Asymptotic normal
• 5% critical values $\pm 1.96 \approx \pm 2$
In [56]:
pretty(ls.tvalues)

Out[56]:
Intercept -1.991463
VWMe 93.513503
SMB 0.109934
HML 36.380381
MOM -2.630803

# Hypothesis Testing¶

## Significance in Industry Portfolios¶

In [58]:
multi_test_plot(t_stats)


# Hypothesis Testing¶

## Wald Tests¶

• Test multiple hypotheses jointly
• Null is
$$H_0: \mathbf{R}\boldsymbol{\beta} = \mathbf{r}$$
• Exploit properties of multivariate normal CLT
• $\chi^2_m$ distributed in large samples
• Test statistic is
$$W = n\left(\mathbf{R}\hat{\boldsymbol{\beta}}-\mathbf{r}\right)^\prime \left[\mathbf{R}\hat{\boldsymbol{\Sigma}}_{XX}^{-1}\hat{\mathbf{S}}\hat{\boldsymbol{\Sigma}}_{XX}^{-1}\mathbf{R}^\prime\right]^{-1} \left(\mathbf{R}\hat{\boldsymbol{\beta}}-\mathbf{r}\right)$$

# Hypothesis Testing¶

## Wald Tests¶

### Testing the CAPM¶

• Multiple $\beta$ all zero:
$$H_0: SMB=HML=MOM=0$$
• Restriction matrix and value
$$R = \left[\begin{array}{ccccc} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ \end{array}\right], r = \left[\begin{array}{c} 0 \\ 0 \\ 0 \end{array}\right]$$

# Hypothesis Testing¶

## Testing the CAPM¶

In [59]:
R, r = np.zeros((3, 5)), np.zeros(3)
R[0, 2] = R[1, 3] = R[2, 4] = 1
h0_vcv = R @ c @ R.T
h0_vcv.columns = h0_vcv.index = [f"Restr {i}" for i in range(1, 4)]
h0_vcv

Out[59]:
Restr 1 Restr 2 Restr 3
Restr 1 0.000297 0.000042 0.000019
Restr 2 0.000042 0.000441 0.000122
Restr 3 0.000019 0.000122 0.000181
In [60]:
numerator = R @ ls.params - r
wald = numerator @ np.linalg.inv(h0_vcv) @ numerator.T
pretty(f"W={wald:0.1f}")

W=1749.6

# Wald Tests¶

## Industry Portfolios¶

In [62]:
dof = 3
pretty(f"The crit. value is {stats.chi2(dof).ppf(0.95):0.2f} from a $\chi^2_{dof}$")
test_plot(walds, cv=stats.chi2(dof).ppf(0.95), title="Wald Statistics")

The crit. value is 7.81 from a $\chi^2_3$