## 14.6 Lag Length Selection using Information Criteria

The selection of lag lengths in AR and ADL models can sometimes be guided by economic theory. However, there are statistical methods that help determine how many lags should be included as regressors. In general, too many lags inflate the standard errors of coefficient estimates and thus increase the forecast error, while omitting lags that should be included in the model may result in estimation bias.

The order of an AR model can be determined using two approaches:

**The F-test approach.** Estimate an AR(\(p\)) model and test the significance of the largest lag(s). If the test indicates that some lags are not significant, we may consider removing them from the model. This approach tends to produce models whose order is too large: in a significance test we always face the risk of rejecting a true null hypothesis!
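As an illustration of the F-test approach, one could check whether the largest lag in an AR(\(2\)) model of GDP growth is significant. This is only a sketch: it assumes the series `GDPGR_level` and the packages `dynlm`, `lmtest` and `sandwich` that are used elsewhere in this chapter.

```
# sketch: F-test approach for an AR(2) model of GDP growth
library(dynlm)
library(lmtest)
library(sandwich)

# estimate an AR(2) model
ar2 <- dynlm(ts(GDPGR_level) ~ L(ts(GDPGR_level), 1:2))

# robust coefficient tests; if the second lag turns out insignificant,
# one may consider dropping it and estimating an AR(1) model instead
coeftest(ar2, vcov. = sandwich)
```
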

**Relying on an information criterion.** To circumvent the issue of producing models that are too large, one may choose the lag order that minimizes one of the following two information criteria:

The *Bayes information criterion* (BIC): \[BIC(p) = \log\left(\frac{SSR(p)}{T}\right) + (p + 1) \frac{\log(T)}{T}.\]

The *Akaike information criterion* (AIC): \[AIC(p) = \log\left(\frac{SSR(p)}{T}\right) + (p + 1) \frac{2}{T}.\]

Both criteria are estimators of the optimal lag length \(p\). The lag order \(\widehat{p}\) that minimizes the respective criterion is called the *BIC estimate* or the *AIC estimate* of the optimal model order. The basic idea of both criteria is that the \(SSR\) decreases as additional lags are added to the model, so the first term decreases while the second term increases as the lag order grows. One can show that the \(BIC\) is a consistent estimator of the true lag order while the \(AIC\) is not, which is due to the differing factors in the second addend. Nevertheless, both estimators are used in practice, where the \(AIC\) is sometimes preferred as an alternative when the \(BIC\) yields a model with “too few” lags.

The function `dynlm()` does not compute information criteria by default. We will therefore write a short function that reports the \(BIC\) (along with the chosen lag order \(p\) and \(\bar{R}^2\)) for objects of class `dynlm`.

```
# compute BIC for AR model objects of class 'dynlm'
BIC <- function(model) {

  ssr <- sum(model$residuals^2)
  t <- length(model$residuals)
  npar <- length(model$coef)

  return(
    round(c("p" = npar - 1,
            "BIC" = log(ssr/t) + npar * log(t)/t,
            "Adj.R2" = summary(model)$adj.r.squared), 4)
  )
}
```
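Since the \(AIC\) differs from the \(BIC\) only in the penalty term, an analogous function is obtained by replacing \(\log(T)/T\) with \(2/T\). The following is a sketch; like the `BIC()` function above, it masks the base R function of the same name (`stats::AIC`).

```
# compute AIC for AR model objects of class 'dynlm'
AIC <- function(model) {

  ssr <- sum(model$residuals^2)
  t <- length(model$residuals)
  npar <- length(model$coef)

  return(
    round(c("p" = npar - 1,
            "AIC" = log(ssr/t) + npar * 2/t,   # penalty 2/T instead of log(T)/T
            "Adj.R2" = summary(model)$adj.r.squared), 4)
  )
}
```
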

Table 14.3 of the book presents a breakdown of how the \(BIC\) is computed for AR(\(p\)) models of GDP growth with order \(p=1,\dots,6\). The final result can easily be reproduced using `sapply()` and the function `BIC()` defined above.

```
# apply the BIC() to an intercept-only model of GDP growth
BIC(dynlm(ts(GDPGR_level) ~ 1))
#> p BIC Adj.R2
#> 0.0000 2.4394 0.0000
# loop 'BIC()' over models of different orders
order <- 1:6

BICs <- sapply(order, function(x)
        BIC(dynlm(ts(GDPGR_level) ~ L(ts(GDPGR_level), 1:x))))
BICs
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> p 1.0000 2.0000 3.0000 4.0000 5.0000 6.0000
#> BIC 2.3486 2.3475 2.3774 2.4034 2.4188 2.4429
#> Adj.R2 0.1099 0.1339 0.1303 0.1303 0.1385 0.1325
```

Note that increasing the lag order increases \(R^2\) because the \(SSR\) decreases as additional lags are added to the model. \(\bar{R}^2\) takes the number of parameters into account and adjusts for this mechanical increase in \(R^2\), but here it would favor the AR(\(5\)) model. The \(BIC\) penalizes additional lags more strongly and thus helps us decide whether the decrease in \(SSR\) is enough to justify adding another regressor: according to the \(BIC\), we should settle for the AR(\(2\)) model.

If we had to compare a bigger set of models, a convenient way to select the model with the lowest \(BIC\) is to use the function `which.min()`.

```
# select the AR model with the smallest BIC
BICs[, which.min(BICs[2, ])]
#> p BIC Adj.R2
#> 2.0000 2.3475 0.1339
```

The \(BIC\) may also be used to select lag lengths in time series regression models with multiple predictors. In a model with \(K\) coefficients, including the intercept, we have \[\begin{align*} BIC(K) = \log\left(\frac{SSR(K)}{T}\right) + K \frac{\log(T)}{T}. \end{align*}\] Notice that choosing the optimal model according to the \(BIC\) can be computationally demanding because there may be many different combinations of lag lengths when there are multiple predictors.

To give an example, we estimate ADL(\(p\),\(q\)) models of GDP growth where, as above, the additional variable is the term spread between short-term and long-term bonds. We impose the restriction that \(p=q_1=\dots=q_k\) so that only \(p_{max}\) models (\(p=1,\dots,p_{max}\)) need to be estimated. In the example below we choose \(p_{max} = 12\).

```
# loop 'BIC()' over multiple ADL models
order <- 1:12

BICs <- sapply(order, function(x)
        BIC(dynlm(GDPGrowth_ts ~ L(GDPGrowth_ts, 1:x) + L(TSpread_ts, 1:x),
                  start = c(1962, 1), end = c(2012, 4))))
BICs
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#> p 2.0000 4.0000 6.0000 8.0000 10.0000 12.0000 14.0000 16.0000 18.0000
#> BIC 2.3411 2.3408 2.3813 2.4181 2.4568 2.5048 2.5539 2.6029 2.6182
#> Adj.R2 0.1332 0.1692 0.1704 0.1747 0.1773 0.1721 0.1659 0.1586 0.1852
#> [,10] [,11] [,12]
#> p 20.0000 22.0000 24.0000
#> BIC 2.6646 2.7205 2.7664
#> Adj.R2 0.1864 0.1795 0.1810
```

From the definition of `BIC()` it follows that, for ADL models with \(p=q\), `p` reports the number of estimated coefficients *excluding* the intercept. The lag order is therefore obtained by dividing `p` by 2.

```
# select the ADL model with the smallest BIC
BICs[, which.min(BICs[2, ])]
#> p BIC Adj.R2
#> 4.0000 2.3408 0.1692
```

The \(BIC\) thus favors the ADL(\(2\),\(2\)) model (14.5) we have estimated before.