
## 10.3 Fixed Effects Regression

Consider the panel regression model

\[Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}\]
where the \(Z_i\) are unobserved time-invariant heterogeneities across the entities \(i=1,\dots,n\). We aim to estimate \(\beta_1\), the effect on \(Y_{it}\) of a change in \(X_{it}\), holding \(Z_i\) constant. Letting \(\alpha_i = \beta_0 + \beta_2 Z_i\) we obtain the model
\[\begin{align}
Y_{it} = \alpha_i + \beta_1 X_{it} + u_{it} \tag{10.1}.
\end{align}\]
Because it has entity-specific intercepts \(\alpha_i\), \(i=1,\dots,n\), each of which can be understood as the fixed effect of entity \(i\), this model is called the *fixed effects model*.
The variation in the \(\alpha_i\), \(i=1,\dots,n\) comes from the \(Z_i\). (10.1) can be rewritten as a regression model containing \(n-1\) dummy regressors and a constant:
\[\begin{align}
Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \gamma_3 D3_i + \cdots + \gamma_n Dn_i + u_{it} \tag{10.2}.
\end{align}\]
Model (10.2) has \(n\) different intercepts — one for every entity. (10.1) and (10.2) are equivalent representations of the fixed effects model.
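The equivalence of (10.1) and (10.2) can be checked numerically. The following is a minimal sketch on a small simulated panel; the data, the variable names (`id`, `alpha`, `x`, `y`) and the true slope of \(0.5\) are made up for illustration. It fits both parametrizations with `lm()` and recovers the \(\alpha_i\) from (10.2) via \(\alpha_1 = \beta_0\) and \(\alpha_i = \beta_0 + \gamma_i\) for \(i \geq 2\).

```r
# simulated panel with n = 4 entities and 5 periods (all values are made up)
set.seed(1)
n <- 4
T_periods <- 5
id_num <- rep(1:n, each = T_periods)
id <- factor(id_num)
alpha <- c(1, 2, 3, 4)                          # true entity intercepts
x <- alpha[id_num] + rnorm(n * T_periods)       # X is correlated with alpha
y <- alpha[id_num] + 0.5 * x + rnorm(n * T_periods, sd = 0.1)

# representation (10.1): one intercept per entity, no common constant
fit_alpha <- lm(y ~ x + id - 1)

# representation (10.2): constant plus n - 1 dummies (entity 1 is the baseline)
fit_dummy <- lm(y ~ x + id)

# the slope estimate is the same in both parametrizations
coef(fit_alpha)["x"]
coef(fit_dummy)["x"]

# recover the alpha_i from (10.2): alpha_1 = beta_0, alpha_i = beta_0 + gamma_i
alpha_from_dummy <- coef(fit_dummy)["(Intercept)"] +
  c(0, coef(fit_dummy)[paste0("id", 2:n)])
alpha_from_alpha <- coef(fit_alpha)[paste0("id", 1:n)]
all.equal(unname(alpha_from_dummy), unname(alpha_from_alpha))
```

Both fits describe the same regression; they differ only in whether the intercepts are reported directly or as deviations from a baseline entity.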

The fixed effects model can be generalized to contain more than just one determinant of \(Y\) that is correlated with \(X\) and changes over time. Key Concept 10.2 presents the generalized fixed effects regression model.

### Key Concept 10.2

### The Fixed Effects Regression Model

The fixed effects regression model is

\[\begin{align} Y_{it} = \beta_1 X_{1,it} + \cdots + \beta_k X_{k,it} + \alpha_i + u_{it} \tag{10.3} \end{align}\]

with \(i=1,\dots,n\) and \(t=1,\dots,T\). The \(\alpha_i\) are entity-specific intercepts that capture heterogeneities across entities. An equivalent representation of this model is given by

\[\begin{align} Y_{it} = \beta_0 + \beta_1 X_{1,it} + \cdots + \beta_k X_{k,it} + \gamma_2 D2_i + \gamma_3 D3_i + \cdots + \gamma_n Dn_i + u_{it} \tag{10.4} \end{align}\]

where the \(D2_i,D3_i,\dots,Dn_i\) are dummy variables.

### Estimation and Inference

Software packages use a so-called “entity-demeaned” OLS algorithm which is computationally more efficient than estimating regression models with \(k+n\) regressors as needed for models (10.3) and (10.4).

Taking averages over time on both sides of (10.1) for each entity \(i\) we obtain
\[\begin{align*} \frac{1}{T} \sum_{t=1}^T Y_{it} =& \, \beta_1 \frac{1}{T} \sum_{t=1}^T X_{it} + \frac{1}{T} \sum_{t=1}^T \alpha_i + \frac{1}{T} \sum_{t=1}^T u_{it} \\ \overline{Y}_i =& \, \beta_1 \overline{X}_i + \alpha_i + \overline{u}_i. \end{align*}\]
Since \(\alpha_i\) does not vary over time, its time average is \(\alpha_i\) itself. Subtraction from (10.1) yields
\[\begin{align} \begin{split} Y_{it} - \overline{Y}_i =& \, \beta_1(X_{it}-\overline{X}_i) + (u_{it} - \overline{u}_i) \\ \overset{\sim}{Y}_{it} =& \, \beta_1 \overset{\sim}{X}_{it} + \overset{\sim}{u}_{it}. \end{split} \tag{10.5} \end{align}\]
In this model, the OLS estimate of the parameter of interest \(\beta_1\) is equal to the estimate obtained using (10.2), without the need to estimate \(n-1\) dummies and an intercept.
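To illustrate that entity-demeaning and the dummy regression give the same slope, here is a minimal sketch on simulated data. The panel dimensions, the true slope of \(-0.7\) and all variable names are assumptions made for this example. The demeaned variables are obtained by subtracting each entity's time average, and the resulting estimate is compared with the dummy regression and with the closed-form expression \(\sum_{i,t} \overset{\sim}{X}_{it} \overset{\sim}{Y}_{it} / \sum_{i,t} \overset{\sim}{X}_{it}^2\).

```r
# simulated panel with n = 6 entities and 4 periods (all values are made up)
set.seed(42)
n <- 6
T_periods <- 4
id <- rep(1:n, each = T_periods)
alpha <- rnorm(n, sd = 2)                       # entity fixed effects
x <- alpha[id] + rnorm(n * T_periods)           # X is correlated with alpha
y <- alpha[id] - 0.7 * x + rnorm(n * T_periods, sd = 0.5)

# entity-demeaning: subtract each entity's time average
x_tilde <- x - ave(x, id)
y_tilde <- y - ave(y, id)

# OLS on the demeaned data (no intercept needed)
b_within <- coef(lm(y_tilde ~ x_tilde - 1))[["x_tilde"]]

# OLS on the dummy regression
b_dummy <- coef(lm(y ~ x + factor(id)))[["x"]]

# closed-form OLS slope for a regression through the origin
b_formula <- sum(x_tilde * y_tilde) / sum(x_tilde^2)

all.equal(b_within, b_dummy)
all.equal(b_within, b_formula)
```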

We conclude that there are two ways of estimating \(\beta_1\) in the fixed effects regression:

1. OLS of the dummy regression model as shown in (10.2)

2. OLS using the entity-demeaned data as in (10.5)

Provided the fixed effects regression assumptions stated in Key Concept 10.3 hold, the sampling distribution of the OLS estimator in the fixed effects regression model is normal in large samples. The variance of the estimates can be estimated and we can compute standard errors, \(t\)-statistics and confidence intervals for coefficients. In the next section, we see how to estimate a fixed effects model using `R` and how to obtain a model summary that reports heteroskedasticity-robust standard errors. We leave aside complicated formulas of the estimators. See Chapter 10.5 and Appendix 10.2 of the book for a discussion of theoretical aspects.

### Application to Traffic Deaths

Following Key Concept 10.2, the simple fixed effects model for estimation of the relation between traffic fatality rates and the beer taxes is \[\begin{align} FatalityRate_{it} = \beta_1 BeerTax_{it} + StateFixedEffects + u_{it}, \tag{10.6} \end{align}\] a regression of the traffic fatality rate on beer tax and 48 binary regressors — one for each federal state.

We can simply use the function `lm()` to obtain an estimate of \(\beta_1\).

```
fatal_fe_lm_mod <- lm(fatal_rate ~ beertax + state - 1, data = Fatalities)
fatal_fe_lm_mod
#>
#> Call:
#> lm(formula = fatal_rate ~ beertax + state - 1, data = Fatalities)
#>
#> Coefficients:
#> beertax stateal stateaz statear stateca stateco statect statede statefl statega stateid stateil statein stateia
#> -0.6559 3.4776 2.9099 2.8227 1.9682 1.9933 1.6154 2.1700 3.2095 4.0022 2.8086 1.5160 2.0161 1.9337
#> stateks stateky statela stateme statemd statema statemi statemn statems statemo statemt statene statenv statenh
#> 2.2544 2.2601 2.6305 2.3697 1.7712 1.3679 1.9931 1.5804 3.4486 2.1814 3.1172 1.9555 2.8769 2.2232
#> statenj statenm stateny statenc statend stateoh stateok stateor statepa stateri statesc statesd statetn statetx
#> 1.3719 3.9040 1.2910 3.1872 1.8542 1.8032 2.9326 2.3096 1.7102 1.2126 4.0348 2.4739 2.6020 2.5602
#> stateut statevt stateva statewa statewv statewi statewy
#> 2.3137 2.5116 2.1874 1.8181 2.5809 1.7184 3.2491
```
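The `- 1` in the formula drops the common intercept, so that `lm()` creates one dummy per state instead of treating one state as the baseline. A small sketch with a made-up three-state data set (the values of `beertax` here are invented and are not taken from `Fatalities`) shows the difference in the design matrix:

```r
# toy data: three states with two observations each (values are made up)
state <- factor(c("al", "al", "az", "az", "ar", "ar"))
beertax <- c(1.5, 1.4, 0.2, 0.3, 0.6, 0.5)

# without a constant: one dummy column per state
X_no_const <- model.matrix(~ beertax + state - 1)
colnames(X_no_const)

# with a constant: the first factor level is absorbed into the intercept
X_const <- model.matrix(~ beertax + state)
colnames(X_const)
```

Both design matrices have the same number of columns and span the same column space, which is why the estimate of the coefficient on `beertax` does not depend on this choice.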

As discussed in the previous section, it is also possible to estimate \(\beta_1\) by applying OLS to the demeaned data, that is, to run the regression

\[\overset{\sim}{FatalityRate}_{it} = \beta_1 \overset{\sim}{BeerTax}_{it} + \overset{\sim}{u}_{it}. \]

```
# obtain demeaned data
Fatalities_demeaned <- with(Fatalities,
            data.frame(fatal_rate = fatal_rate - ave(fatal_rate, state),
                       beertax = beertax - ave(beertax, state)))
# estimate the regression
summary(lm(fatal_rate ~ beertax - 1, data = Fatalities_demeaned))
```

The function `ave` is convenient for computing group averages. We use it to obtain state specific averages of the fatality rate and the beer tax.
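One caveat when applying `lm()` to manually demeaned data: the coefficient estimate is correct, but `lm()` does not know that \(n\) entity means have been estimated. With one regressor it uses \(nT - 1\) residual degrees of freedom instead of \(nT - n - 1\), so the conventional standard errors it reports are too small. The following sketch on simulated data (panel dimensions, true slope and variable names are assumptions made for illustration) shows the discrepancy and the degrees-of-freedom correction.

```r
# simulated panel with n = 10 entities and 5 periods (all values are made up)
set.seed(7)
n <- 10
T_periods <- 5
N <- n * T_periods
id <- rep(1:n, each = T_periods)
alpha <- rnorm(n)
x <- alpha[id] + rnorm(N)
y <- alpha[id] + 1.2 * x + rnorm(N)

# dummy regression: degrees of freedom are N - n - 1
fit_dummy <- lm(y ~ x + factor(id) - 1)
se_dummy <- summary(fit_dummy)$coefficients["x", "Std. Error"]

# regression on demeaned data: lm() uses N - 1 degrees of freedom
x_dm <- x - ave(x, id)
y_dm <- y - ave(y, id)
fit_demeaned <- lm(y_dm ~ x_dm - 1)
se_demeaned <- summary(fit_demeaned)$coefficients["x_dm", "Std. Error"]

# rescale to account for the n estimated entity means
se_corrected <- se_demeaned * sqrt((N - 1) / (N - n - 1))

c(dummy = se_dummy, demeaned = se_demeaned, corrected = se_corrected)
```

The correction works because both regressions produce identical residuals and differ only in the divisor used for the error variance.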

Alternatively one may use `plm()` from the package with the same name.

```
# install and load the 'plm' package
## install.packages("plm")
library(plm)
```

As for `lm()` we have to specify the regression formula and the data to be used in our call of `plm()`. Additionally, we must pass a vector with the names of the entity and time ID variables to the argument `index`. For `Fatalities`, the ID variable for entities is named `state` and the time ID variable is `year`. Since the fixed effects estimator is also called the *within estimator*, we set `model = "within"`. Finally, the function `coeftest()` (from the package `lmtest`) lets us obtain inference based on robust standard errors.

```
# estimate the fixed effects regression with plm()
fatal_fe_mod <- plm(fatal_rate ~ beertax,
                    data = Fatalities,
                    index = c("state", "year"),
                    model = "within")
# print summary using robust standard errors
coeftest(fatal_fe_mod, vcov. = vcovHC, type = "HC1")
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> beertax -0.65587 0.28880 -2.271 0.02388 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The estimated coefficient is again \(-0.6559\). Note that `plm()` uses the entity-demeaned OLS algorithm and thus does not report dummy coefficients. The estimated regression function is

\[\begin{align} \widehat{FatalityRate} = -\underset{(0.29)}{0.66} \times BeerTax + StateFixedEffects. \tag{10.7} \end{align}\]

The coefficient on \(BeerTax\) is negative and significant. The interpretation is that the estimated reduction in traffic fatalities due to an increase in the real beer tax by \(\$1\) is \(0.66\) per \(10000\) people, which is still pretty high. Although including state fixed effects eliminates the risk of a bias due to omitted factors that vary across states but not over time, we suspect that there are other omitted variables that vary over time and thus cause a bias.
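The significance statement can be retraced from the estimate and the robust standard error in the `coeftest()` output. The sketch below recomputes the \(t\)-statistic and an approximate \(95\%\) confidence interval. It uses the standard normal critical value as a large-sample approximation, which is an assumption of this example; the \(p\)-value printed by `coeftest()` is based on a \(t\) distribution instead.

```r
# estimate and robust standard error, copied from the coeftest() output
est <- -0.65587
se <- 0.28880

# t-statistic for the null hypothesis beta_1 = 0
t_stat <- est / se

# approximate 95% confidence interval using the normal critical value
ci_95 <- est + c(-1, 1) * qnorm(0.975) * se

round(t_stat, 3)
round(ci_95, 2)
```

The interval lies entirely below zero, which agrees with the rejection of \(\beta_1 = 0\) at the \(5\%\) level.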