15.4 HAC Standard Errors


The error term \(u_t\) in the distributed lag model (15.2) may be serially correlated due to serially correlated determinants of \(Y_t\) that are not included as regressors. When these factors are not correlated with the regressors included in the model, serially correlated errors do not violate the assumption of exogeneity such that the OLS estimator remains unbiased and consistent.

However, autocorrelated errors render the usual homoskedasticity-only and heteroskedasticity-robust standard errors invalid and may lead to misleading inference. HAC standard errors are a remedy.

Key Concept 15.2

HAC Standard Errors

Problem:

If the error term \(u_t\) in the distributed lag model (15.2) is serially correlated, statistical inference that rests on usual (heteroskedasticity-robust) standard errors can be strongly misleading.

Solution:

Heteroskedasticity- and autocorrelation-consistent (HAC) estimators of the variance-covariance matrix circumvent this issue. R functions such as vcovHAC() from the package sandwich make it convenient to compute such estimators.

The package sandwich also contains the function NeweyWest(), an implementation of the HAC variance-covariance estimator proposed by Newey and West (1987).

Consider the distributed lag regression model with no lags and a single regressor \(X_t\), \[\begin{align*} Y_t = \beta_0 + \beta_1 X_t + u_t, \end{align*}\] with autocorrelated errors. A brief derivation of \[\begin{align} \overset{\sim}{\sigma}^2_{\widehat{\beta}_1} = \widehat{\sigma}^2_{\widehat{\beta}_1} \widehat{f}_T, \tag{15.4} \end{align}\] the so-called Newey-West estimator of the variance of the OLS estimator of \(\beta_1\), is presented in Chapter 15.4 of the book. In (15.4), \(\widehat{\sigma}^2_{\widehat{\beta}_1}\) is the heteroskedasticity-robust variance estimate of \(\widehat{\beta}_1\) and \[\begin{align} \widehat{f}_T = 1 + 2 \sum_{j=1}^{m-1} \left(\frac{m-j}{m}\right) \overset{\sim}{\rho}_j \tag{15.5} \end{align}\] is a correction factor that adjusts for serially correlated errors and involves estimates of \(m-1\) autocorrelation coefficients \(\overset{\sim}{\rho}_j\).

As it turns out, using the sample autocorrelation as implemented in acf() to estimate the autocorrelation coefficients renders (15.4) inconsistent; see pp. 650-651 of the book for a detailed argument. Therefore, we use a somewhat different estimator. For a regressor series \(X_t\) and OLS residuals \(\hat u_t\) we have \[ \overset{\sim}{\rho}_j = \frac{\sum_{t=j+1}^T \hat v_t \hat v_{t-j}}{\sum_{t=1}^T \hat v_t^2}, \ \text{with} \ \hat v_t = (X_t-\overline{X}) \hat u_t. \] We implement this estimator in the function acf_c() below.

\(m\) in (15.5) is a truncation parameter to be chosen. A rule of thumb for choosing \(m\) is \[\begin{align} m = \left \lceil{0.75 \cdot T^{1/3}}\right\rceil. \tag{15.6} \end{align}\]
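As a quick illustration of (15.6), the following sketch evaluates the rule of thumb for a few sample sizes. The helper name truncation_rule() and the chosen sample sizes are hypothetical and only serve this example. The sketch also reports the value obtained when rounding down, which is what the simulation code further below does with floor().

```r
# hypothetical helper: rule-of-thumb truncation parameter from (15.6)
# n_obs is the sample size T; round_fun controls the rounding direction
truncation_rule <- function(n_obs, round_fun = ceiling) {
  round_fun(0.75 * n_obs^(1/3))
}

# sample sizes chosen for illustration
sample_sizes <- c(50, 100, 500, 1000)

# rounding up, as written in (15.6)
m_up <- truncation_rule(sample_sizes)

# rounding down, as done with floor() in the simulation code
m_down <- truncation_rule(sample_sizes, round_fun = floor)

data.frame(T = sample_sizes, m_ceiling = m_up, m_floor = m_down)
```

For \(T = 100\) we have \(0.75 \cdot 100^{1/3} \approx 3.48\), so rounding up gives \(m = 4\) while rounding down gives \(m = 3\).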

We simulate a time series that, as stated above, follows a distributed lag model with autocorrelated errors and then show how to compute the Newey-West HAC estimate of \(SE(\widehat{\beta}_1)\) using R. This is done via two separate but, as we will see, equivalent approaches: first, we follow the derivation presented in the book step by step and compute the estimate “manually”. We then show that the result is exactly the estimate obtained when using the function NeweyWest().
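The errors in the simulation are drawn from an MA(1) process with coefficient 0.5 using arima.sim(). As a sketch of what this implies, the theoretical autocorrelations of such a process can be obtained with ARMAacf() from the stats package. The first-order autocorrelation is \(0.5/(1+0.5^2) = 0.4\) and all higher-order autocorrelations are zero. The object names are chosen for illustration only.

```r
# MA(1) coefficient used for the simulated errors
theta <- 0.5

# theoretical autocorrelations at lags 0, 1, 2, 3
rho_theory <- ARMAacf(ma = theta, lag.max = 3)
rho_theory

# closed-form first-order autocorrelation of an MA(1) process
rho_1 <- theta / (1 + theta^2)
rho_1
```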

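# packages used below: Lag() (quantmod), NeweyWest() (sandwich), coeftest() (lmtest)
library(quantmod)
library(sandwich)
library(lmtest)

# function that computes rho tilde for lag j
acf_c <- function(x, j) {
  return(
    t(x[-c(1:j)]) %*% na.omit(Lag(x, j)) / t(x) %*% x
  )
}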

# simulate time series with serially correlated errors
set.seed(1)

N <- 100

eps <- arima.sim(n = N, model = list(ma = 0.5))
X <- runif(N, 1, 10)
Y <- 0.5 * X + eps

# compute OLS residuals
res <- residuals(lm(Y ~ X))

# compute v
v <- (X - mean(X)) * res

# compute heteroskedasticity-robust estimate of the variance of the OLS estimator of beta_1
var_beta_hat <- 1/N * (1/(N-2) * sum((X - mean(X))^2 * res^2) ) / 
                        (1/N * sum((X - mean(X))^2))^2

# rule of thumb truncation parameter
# (floor() rounds down, giving m = 3 for N = 100)
m <- floor(0.75 * N^(1/3))

# compute correction factor
f_hat_T <- 1 + 2 * sum(
  (m - 1:(m-1))/m * sapply(1:(m - 1), function(i) acf_c(x = v, j = i))
  ) 

# compute Newey-West HAC estimate of the standard error 
sqrt(var_beta_hat * f_hat_T)
#> [1] 0.04036208

For the code to be reusable in other applications, we use sapply() to estimate the \(m-1\) autocorrelations \(\overset{\sim}{\rho}_j\).
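The manual steps above can be collected in a single function. The following is a minimal sketch for the single-regressor case only. The function name nw_se_manual(), its arguments and the simulated data are hypothetical, and only base R is used. The sketch uses seq_len(m - 1) rather than 1:(m - 1), so that m = 1 yields a correction factor of one and the function returns the heteroskedasticity-robust standard error.

```r
# hypothetical helper: Newey-West SE of the slope in a regression of y on x,
# following (15.4) and (15.5); single regressor only
nw_se_manual <- function(y, x, m) {
  n <- length(y)
  res <- residuals(lm(y ~ x))
  v <- (x - mean(x)) * res

  # heteroskedasticity-robust variance estimate of the slope
  var_robust <- 1/n * (1/(n - 2) * sum(v^2)) / (1/n * sum((x - mean(x))^2))^2

  # rho tilde for lags 1, ..., m - 1 (empty if m = 1)
  lags <- seq_len(m - 1)
  rho <- vapply(lags, function(j) {
    sum(v[(j + 1):n] * v[1:(n - j)]) / sum(v^2)
  }, numeric(1))

  # correction factor (15.5) and resulting standard error
  f_hat <- 1 + 2 * sum((m - lags) / m * rho)
  sqrt(var_robust * f_hat)
}

# usage with simulated data (seed and sample size chosen for illustration)
set.seed(123)
x_sim <- runif(200, 1, 10)
u_sim <- as.numeric(arima.sim(n = 200, model = list(ma = 0.5)))
y_sim <- 0.5 * x_sim + u_sim

nw_se_manual(y_sim, x_sim, m = 1)  # robust SE, no autocorrelation correction
nw_se_manual(y_sim, x_sim, m = 4)  # Newey-West SE using three autocorrelations
```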

# Using NeweyWest():
NW_VCOV <- NeweyWest(lm(Y ~ X), 
              lag = m - 1, prewhite = FALSE, 
              adjust = TRUE)

# compute standard error
sqrt(diag(NW_VCOV))[2]
#>          X 
#> 0.04036208

By choosing lag = m-1 we ensure that the maximum order of autocorrelations used is \(m-1\), just as in equation (15.5). Setting prewhite to FALSE and adjust to TRUE ensures that formula (15.4) is applied without prewhitening and that finite-sample adjustments are made.
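To see why a lag of \(m-1\) matches (15.5), one can compare the weights. The sketch below assumes that a Newey-West estimator with lag \(L\) applies the Bartlett weights \(1 - j/(L+1)\), the kernel proposed by Newey and West (1987). It does not call NeweyWest() itself, and the object names are illustrative.

```r
# truncation parameter as in the simulation (m = 3 for N = 100)
m_example <- 3
j <- seq_len(m_example - 1)

# weights appearing in the correction factor (15.5)
w_book <- (m_example - j) / m_example

# Bartlett weights for lag L = m - 1 (assumed form: 1 - j / (L + 1))
L <- m_example - 1
w_bartlett <- 1 - j / (L + 1)

# both weight sequences agree
all.equal(w_book, w_bartlett)
```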

We find that the computed standard errors coincide. Of course, a variance-covariance matrix estimate as computed by NeweyWest() can be supplied as the argument vcov in coeftest() such that HAC \(t\)-statistics and \(p\)-values are provided by the latter.

example_mod <- lm(Y ~ X)
coeftest(example_mod, vcov = NW_VCOV)
#> 
#> t test of coefficients:
#> 
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 0.542310   0.235423  2.3036  0.02336 *  
#> X           0.423305   0.040362 10.4877  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
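
The entries of this table can be reproduced by hand. The sketch below copies the rounded estimates and HAC standard errors from the output above and assumes that coeftest() uses a \(t\) distribution with the residual degrees of freedom of the model, here \(100 - 2 = 98\). Small differences from the printed values are due to rounding, and the object names are illustrative.

```r
# values copied from the coeftest() output (rounded)
est <- c("(Intercept)" = 0.542310, X = 0.423305)
se_hac <- c("(Intercept)" = 0.235423, X = 0.040362)

# assumed degrees of freedom: N - 2 with N = 100
df_resid <- 100 - 2

# HAC t-statistics and two-sided p-values
t_stat <- est / se_hac
p_val <- 2 * pt(-abs(t_stat), df = df_resid)

round(t_stat, 4)
signif(p_val, 4)
```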

References

Newey, Whitney K., and Kenneth D. West. 1987. “A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econometrica 55: 703–8.