
## 13.5 Exercises

The following exercises guide you through reproducing some of the results of one of the most famous DID studies, Card and Krueger (1994). The authors use geography as the “as if” random treatment assignment to study how an increase in New Jersey’s state minimum wage in 1992 affected employment in fast food restaurants, see Chapter 13.4.

The study is based on survey data collected in February 1992, before New Jersey’s minimum wage rose by \(\$0.80\) from \(\$4.25\) to \(\$5.05\) in April 1992, and again in November 1992, after the increase.

Estimating the effect of the wage increase simply by computing the change in employment in New Jersey (as you are asked to do in Exercise 3) would fail to control for omitted variables. Using Pennsylvania as a control in a difference-in-differences (DID) model controls for variables that influence New Jersey (the treatment group) and Pennsylvania (the control group) in the same way. This substantially reduces the risk of omitted variable bias and works even when these variables are unobserved.

For the DID approach to work we must assume that New Jersey and Pennsylvania have parallel trends over time, i.e., that (unobserved) factors influence employment in both states in the same manner. This allows us to interpret the observed change in employment in Pennsylvania as the change New Jersey would have experienced had there been no increase in the minimum wage (and vice versa).
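In terms of sample means, the comparison described above can be sketched as follows. The notation is introduced here for illustration only and is an assumption, not taken from the original text: \(\overline{Y}\) denotes mean employment, the superscripts denote the state and the survey wave.

\[\hat\beta^{DID} = \left(\overline{Y}^{NJ,\,after} - \overline{Y}^{NJ,\,before}\right) - \left(\overline{Y}^{PA,\,after} - \overline{Y}^{PA,\,before}\right)\]

The second bracket is the change observed in the control group. Under parallel trends it stands in for the change New Jersey would have experienced without the wage increase.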

Contrary to what standard economic theory would suggest, the authors did not find evidence that the increased minimum wage raised unemployment in New Jersey. Using the DID approach, their results suggest instead that the \(\$0.80\) minimum wage increase led to an increase in employment of 2.75 full-time equivalents (FTE).

#### 1. The Data from Card & Krueger (1994)

`fastfood.dta`, the dataset used by Card & Krueger (1994), can be downloaded here. See this link for a detailed explanation of the variables.

This exercise asks you to import the dataset into `R` and to perform some formatting necessary for the subsequent analysis. This can be tedious using base `R` functions but is easily done using the `dplyr` package introduced in Chapter 3.6.

The URL to the dataset is saved in `data_URL`.

**Instructions:**

- Attach the packages `dplyr` and `foreign`.
- Read in the dataset `fastfood.dta` using `data_URL` and assign it to a `data.frame` named `dat`.
- In their study, Card & Krueger (1994) measure employment in full-time equivalents, which they define as the number of full-time employees (`empft` and `empft2`) plus the number of managers (`nmgrs` and `nmgrs2`) plus 0.5 times the number of part-time employees (`emppt` and `emppt2`). Define full-time equivalent employment before (`FTE`) and after the wage increase (`FTE2`) and add both variables to `dat`.

**Hints:**

- `read.dta()` from the `foreign` package reads `.dta` files, a format used by the statistical software package *STATA*.
- `mutate()` generates new columns using existing ones.
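The full-time equivalent definition above can be illustrated on a small toy `data.frame`. This is only a sketch: the object `toy` and all numbers in it are hypothetical and merely reuse the column names of the real dataset.

```r
library(dplyr)

# Hypothetical toy values (NOT taken from fastfood.dta); only the
# column names mirror the variables described in the exercise.
toy <- data.frame(
  empft  = c(10, 5),  nmgrs  = c(2, 1),  emppt  = c(8, 20),
  empft2 = c(12, 4),  nmgrs2 = c(2, 1),  emppt2 = c(6, 22)
)

# FTE = full-time employees + managers + 0.5 * part-time employees
toy <- toy %>%
  mutate(FTE  = empft  + nmgrs  + 0.5 * emppt,
         FTE2 = empft2 + nmgrs2 + 0.5 * emppt2)

print(toy[, c("FTE", "FTE2")])
```

The same `mutate()` call should carry over to `dat` once the dataset has been read in.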

#### 2. State Specific Estimates of Full-Time Employment — I

This exercise asks you to perform a quick calculation of state specific sample means in order to check whether our data on full-time employment is in alignment with the data used by Card & Krueger (1994).

**Instructions:**

- Generate subsets of `dat` to separate the observations for New Jersey and Pennsylvania. Save them as `dat_NJ` and `dat_PA`.
- Compute sample means of full-time employment equivalents for New Jersey and Pennsylvania both before and after the minimum wage increase in New Jersey. It suffices if your code prints the correct values to the console.

**Hints:**

- You may use `group_by()` in conjunction with `summarise()` to compute groupwise means. Both functions come with the `dplyr` package.
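As a minimal sketch of the groupwise approach from the hint, the following computes state-specific means on hypothetical toy data. The values are made up, and the use of `na.rm = TRUE` is an assumption about how missing survey responses might be handled.

```r
library(dplyr)

# Hypothetical toy data: state = 1 for New Jersey, 0 for Pennsylvania
toy <- data.frame(
  state = c(1, 1, 1, 0, 0),
  FTE   = c(20, 22, NA, 24, 26),
  FTE2  = c(21, 23, 19, 22, 24)
)

# One row per state with the mean FTE before and after
means <- toy %>%
  group_by(state) %>%
  summarise(mean_FTE  = mean(FTE,  na.rm = TRUE),
            mean_FTE2 = mean(FTE2, na.rm = TRUE))

print(means)
```

Without `na.rm = TRUE`, a single missing value in a group would make that group's mean `NA`.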

#### 3. State Specific Estimates of Full-Time Employment — II

A naive approach to investigate the impact of the minimum wage increase on employment is to use the estimated difference in mean employment before and after the wage increase for New Jersey fast food restaurants.

This exercise asks you to compute this difference and to test whether it is significantly different from zero using a *robust* \(t\)-test.

The subsets `dat_NJ` and `dat_PA` from the previous exercise are available in your working environment.

**Instructions:**

- Use `dat_NJ` for a robust test, at the \(5\%\) level, of the hypothesis that there is no difference in full-time employment before and after the wage hike in New Jersey.

**Hints:**

- The testing problem amounts to a two-sample \(t\)-test, which is conveniently done using `t.test()`.
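The following sketch shows what such a two-sample test might look like on hypothetical toy vectors; `before` and `after` are made-up stand-ins for the two employment measures. It relies on the fact that `t.test()` does not assume equal variances by default (`var.equal = FALSE`), which is the sense in which the test is treated as robust here.

```r
# Hypothetical toy values standing in for employment before and after
before <- c(20, 18, 25, 30, 22)
after  <- c(21, 19, 24, 33, 25)

# Welch two-sample t-test (default: var.equal = FALSE)
res <- t.test(after, before)

# Estimated difference in means (after minus before)
diff_means <- unname(res$estimate[1] - res$estimate[2])

print(diff_means)
print(res$p.value)  # compare with 0.05 for a test at the 5% level
```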

#### 4. Preparing the Data for Regression

The estimates from Exercise 3 and the difference-in-differences approach we are working towards can both be shown to produce the same results as OLS applied to specific regression models, see Chapters 13.1 and 3.6.

This exercise asks you to construct a dataset which is more convenient for this purpose than the dataset `dat`.

**Instructions:**

Generate the dataset `reg_dat` from `dat` in *long format*, i.e., make sure that for each restaurant (identified by `sheet`) one observation before and one after the minimum wage increase (identified by `D`) are included.

Only consider the following variables:

- `id`: sheet number (unique store id)
- `chain`: chain 1=Burger King; 2=KFC; 3=Roy Rogers; 4=Wendys
- `state`: 1 if New Jersey; 0 if Pennsylvania
- `empl`: measure of full-time employment (`FTE`/`FTE2`)
- `D`: dummy indicating if the observation was made before or after the minimum wage increase in New Jersey.

**Hints:**

- The original dataset `dat` has 410 observations of 48 variables (check this using `dim(dat)`). The dataset `reg_dat` you are asked to generate must consist of 820 observations of the variables listed above.
- It is straightforward to generate a `data.frame` from the columns of another `data.frame` using `data.frame(…)`.
- Use `rbind()` to combine two objects of type `data.frame` by row.
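The stacking idea from the hints can be sketched on a hypothetical wide toy `data.frame`. The object names `wide`, `before`, `after` and `long` and all values are assumptions for illustration; the coding `D = 0` for before and `D = 1` for after is also an assumption here.

```r
# Hypothetical wide toy data: one row per restaurant
wide <- data.frame(
  sheet = c(1, 2, 3),
  chain = c(1, 2, 4),
  state = c(1, 1, 0),
  FTE   = c(20, 15, 30),
  FTE2  = c(22, 14, 28)
)

# One data.frame per survey wave, with identical column names
before <- data.frame(id = wide$sheet, chain = wide$chain,
                     state = wide$state, empl = wide$FTE,  D = 0)
after  <- data.frame(id = wide$sheet, chain = wide$chain,
                     state = wide$state, empl = wide$FTE2, D = 1)

# Stack the two waves: each restaurant now appears twice
long <- rbind(before, after)

print(dim(long))
```

`rbind()` requires both inputs to have the same column names, which is why the employment measure is stored as `empl` in both waves.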

#### 5. A Difference Estimate using Data from Card & Krueger (1994) — I

`reg_dat` from Exercise 4 is a *panel dataset* as it has two observations for each fast food restaurant \(i=1,\dots,410\), at time periods \(t=0,1\).

Thus we may write down the simple regression model

\[employment_{i,t} = \beta_0 + \beta_1 D_t + \varepsilon_{i,t},\]

where \(D_t\) is a dummy variable which equals \(0\) if the observation was made before the minimum wage change (\(t=0\)) and \(1\) after the minimum wage change (\(t=1\)), i.e.,

\[D_t = \begin{cases} 0, & \text{if } t=0 \text{ (before wage change),} \\ 1, & \text{if } t=1 \text{ (after wage change),} \end{cases}\]

and assume that observations for *New Jersey restaurants only* are used in computing \(\hat\beta_1\), the OLS estimator of \(\beta_1\), which is also called the *differences estimator*.

The dataset `reg_dat` from Exercise 4 and the New Jersey subset `dat_NJ` are available in your working environment.

**Instructions:**

- Estimate \(\beta_1\) in the model above using OLS. Save the estimated model to `emp_mod`.
- Obtain a robust summary of the results and interpret your findings.

**Hints:**

- Remember that dependencies of the `AER` package include functions for robust inference on regression models.
- The argument `subset` in `lm()` takes a logical vector which identifies observations used for estimation.
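A minimal sketch of the two hints on hypothetical toy data follows. It loads `lmtest` and `sandwich` directly, which `AER` attaches as dependencies. The object names and values are made up, and `type = "HC1"` is one possible choice of robust covariance estimator.

```r
library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()

# Hypothetical long-format toy data: 3 restaurants per state, 2 periods
toy <- data.frame(
  state = rep(c(1, 1, 1, 0, 0, 0), times = 2),
  D     = rep(c(0, 1), each = 6),
  empl  = c(20, 18, 25, 30, 28, 26, 21, 20, 27, 27, 26, 25)
)

# Differences regression using observations with state == 1 only
mod <- lm(empl ~ D, data = toy, subset = state == 1)

# Heteroskedasticity-robust coefficient summary
print(coeftest(mod, vcov. = vcovHC, type = "HC1"))
```

The coefficient on `D` equals the difference between the after and before sample means within the selected subset.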

#### 6. A Difference Estimate using Data from Card & Krueger (1994) — II

The estimate obtained using `t.test()` on the New Jersey subset in Exercise 3 and the OLS estimate \(\hat\beta_1\) from Exercise 5 are numerically the same. This also holds for the reported \(t\)-statistics if the same standard error formulas are used (`t.test(…, var.equal = T)` and `coeftest(…, vcov. = vcovHC, type = "HC1")`).

This exercise asks you to check that the above statement is true.

The data from the previous exercises, the result of `t.test(…)` from Exercise 3 as well as the regression model object `emp_mod` from Exercise 5 are available in your working environment. The `AER` package has been attached.

*No submission correctness tests are run.*

**Instructions:**

- Check that the estimate of \(\beta_1\) in Exercise 5 is equal to the estimated difference in mean employment of New Jersey fast food restaurants before and after the minimum wage increase from Exercise 3.
- Convince yourself that the \(t\)-statistics reported by `coeftest(…)` in Exercise 5 and `t.test(…)` in Exercise 3 match.
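One way such a check might look is sketched below on hypothetical toy data, not the Card & Krueger sample. Both periods have the same number of observations in this toy example, and in that case the pooled-variance and HC1 standard errors coincide. With unequal group sizes, for example because of missing values, the two formulas need not agree exactly.

```r
library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()

# Hypothetical toy values for one state, before and after
before <- c(20, 18, 25, 30, 22)
after  <- c(21, 19, 24, 33, 25)

# Two-sample t-test with pooled variance
tt <- t.test(after, before, var.equal = TRUE)

# Equivalent dummy regression with an HC1 robust summary
long <- data.frame(empl = c(before, after), D = rep(c(0, 1), each = 5))
mod  <- lm(empl ~ D, data = long)
ct   <- coeftest(mod, vcov. = vcovHC, type = "HC1")

# Compare the point estimates and the t-statistics
diff_ttest <- unname(tt$estimate[1] - tt$estimate[2])
diff_ols   <- unname(coef(mod)["D"])
print(all.equal(diff_ttest, diff_ols))
print(all.equal(unname(tt$statistic), ct["D", "t value"]))
```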

#### 7. A Difference-in-Differences Estimate — II

As mentioned in Chapter 3.6, the approach discussed in Exercises 5 and 6 is naive: \(\hat\beta_1\) is a biased estimate of the average effect of the minimum wage increase on employment because we cannot control for other determinants of employment that correlate with \(D_t\). As an example, think about macro-economic developments which have a positive impact on the labor market such that employment is higher in the period after the minimum wage increase. It is likely that \(D_t\) is positively correlated with the error term such that \(\hat\beta_1\) *overestimates* the effect of the wage hike on employment.

This motivates usage of the difference-in-differences (DID) estimator outlined in Chapter 3.6.

Consider the linear regression model

\[employment_{i,t} = \beta_0 + \beta_1 D_t + \beta_2 state_i + \beta_3 (D_t \times state_i) + \varepsilon_{i,t},\]

where we use indices \(i\) and \(t\) just as in the simple regression model in Exercise 5. In this model, \(\beta_3\) is the coefficient of interest: it measures the average change in employment in New Jersey fast food restaurants from before to after the wage increase, net of the change over the same period in Pennsylvania, the control group. In this way it controls for unobservables which are common to both states. The OLS estimator of \(\beta_3\) is called a DID estimator.

**Instructions:**

Estimate the above model using OLS and obtain a robust summary.

Interpret your findings.
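The structure of this regression can be sketched on hypothetical toy data. The values are made up and `did_mod` is an illustrative name. The sketch also confirms that the interaction coefficient equals the difference of the two before and after mean differences.

```r
library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()

# Hypothetical long-format toy data: 3 restaurants per state, 2 periods
toy <- data.frame(
  state = rep(c(1, 1, 1, 0, 0, 0), times = 2),
  D     = rep(c(0, 1), each = 6),
  empl  = c(20, 18, 25, 30, 28, 26, 21, 20, 27, 27, 26, 25)
)

# D * state expands to D + state + D:state
did_mod <- lm(empl ~ D * state, data = toy)
print(coeftest(did_mod, vcov. = vcovHC, type = "HC1"))

# The same quantity from the four cell means
m <- tapply(toy$empl, list(toy$state, toy$D), mean)
did_manual <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])
print(did_manual)
```

The coefficient named `D:state` in the output corresponds to \(\beta_3\) in the model above.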

### References

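Card, David, and Alan B. Krueger. 1994. “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania.” *The American Economic Review* 84 (4): 772–93.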