**Open Review**. We want your feedback to make the book better for you and other students. You may annotate some text by selecting it with the cursor and then click "Annotate" in the pop-up menu. You can also see the annotations of others: click the arrow in the upper right hand corner of the page

## 12.4 Application to the Demand for Cigarettes

Are the general sales tax and the cigarette-specific tax valid instruments? If not, TSLS is not helpful to estimate the demand elasticity for cigarettes discussed in Chapter 12.2. As discussed in Chapter 12.1, both variables are likely to be relevant but whether they are exogenous is a different question.

The book argues that cigarette-specific taxes could be endogenous because there might be state specific historical factors like economic importance of the tobacco farming and cigarette production industry that lobby for low cigarette specific taxes. Since it is plausible that tobacco growing states have higher rates of smoking than others, this would lead to endogeneity of cigarette specific taxes. If we had data on the size of the tobacco and cigarette industry, we could solve this potential issue by including the information in the regression. Unfortunately, this is not the case.

However, since the role of the tobacco and cigarette industry is a factor that can be assumed to differ across states but not over time we may exploit the panel structure of `CigarettesSW` instead: as shown in Chapter 10.2, regression using data on *changes* between two time periods eliminates such state specific and time invariant effects. Following the book we consider changes in variables between 1985 and 1995. That is, we are interested in estimating the *long-run elasticity* of the demand for cigarettes.

The model to be estimated by TSLS using the general sales tax and the cigarette-specific sales tax as instruments hence is

\[\begin{align} \begin{split} \log(Q_{i,1995}^{cigarettes}) - \log(Q_{i,1985}^{cigarettes}) =& \, \beta_0 + \beta_1 \left[\log(P_{i,1995}^{cigarettes}) - \log(P_{i,1985}^{cigarettes}) \right] \\ &+ \beta_2 \left[\log(income_{i,1995}) - \log(income_{i,1985})\right] + u_i. \end{split}\tag{12.10} \end{align}\]

We first create differences from 1985 to 1995 for the dependent variable, the regressors and both instruments.

```
# subset data for year 1985
c1985 <- subset(CigarettesSW, year == "1985")
# define differences in variables
packsdiff <- log(c1995$packs) - log(c1985$packs)
pricediff <- log(c1995$price/c1995$cpi) - log(c1985$price/c1985$cpi)
incomediff <- log(c1995$income/c1995$population/c1995$cpi) -
log(c1985$income/c1985$population/c1985$cpi)
salestaxdiff <- (c1995$taxs - c1995$tax)/c1995$cpi -
(c1985$taxs - c1985$tax)/c1985$cpi
cigtaxdiff <- c1995$tax/c1995$cpi - c1985$tax/c1985$cpi
```

We now perform three different IV estimations of (12.10) using `ivreg()`:

TSLS using only the difference in the sales taxes between 1985 and 1995 as the instrument.

TSLS using only the difference in the cigarette-specific sales taxes 1985 and 1995 as the instrument.

TSLS using both the difference in the sales taxes 1985 and 1995 and the difference in the cigarette-specific sales taxes 1985 and 1995 as instruments.

```
# estimate the three models
cig_ivreg_diff1 <- ivreg(packsdiff ~ pricediff + incomediff | incomediff +
salestaxdiff)
cig_ivreg_diff2 <- ivreg(packsdiff ~ pricediff + incomediff | incomediff +
cigtaxdiff)
cig_ivreg_diff3 <- ivreg(packsdiff ~ pricediff + incomediff | incomediff +
salestaxdiff + cigtaxdiff)
```

As usual we use `coeftest()` in conjunction with `vcovHC()` to obtain robust coefficient summaries for all models.

```
# robust coefficient summary for 1.
coeftest(cig_ivreg_diff1, vcov = vcovHC, type = "HC1")
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.117962 0.068217 -1.7292 0.09062 .
#> pricediff -0.938014 0.207502 -4.5205 4.454e-05 ***
#> incomediff 0.525970 0.339494 1.5493 0.12832
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# robust coefficient summary for 2.
coeftest(cig_ivreg_diff2, vcov = vcovHC, type = "HC1")
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.017049 0.067217 -0.2536 0.8009
#> pricediff -1.342515 0.228661 -5.8712 4.848e-07 ***
#> incomediff 0.428146 0.298718 1.4333 0.1587
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# robust coefficient summary for 3.
coeftest(cig_ivreg_diff3, vcov = vcovHC, type = "HC1")
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.052003 0.062488 -0.8322 0.4097
#> pricediff -1.202403 0.196943 -6.1053 2.178e-07 ***
#> incomediff 0.462030 0.309341 1.4936 0.1423
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

We proceed by generating a tabulated summary of the estimation results using `stargazer()`.

```
# gather robust standard errors in a list
rob_se <- list(sqrt(diag(vcovHC(cig_ivreg_diff1, type = "HC1"))),
sqrt(diag(vcovHC(cig_ivreg_diff2, type = "HC1"))),
sqrt(diag(vcovHC(cig_ivreg_diff3, type = "HC1"))))
# generate table
stargazer(cig_ivreg_diff1, cig_ivreg_diff2,cig_ivreg_diff3,
header = FALSE,
type = "html",
omit.table.layout = "n",
digits = 3,
column.labels = c("IV: salestax", "IV: cigtax", "IVs: salestax, cigtax"),
dep.var.labels.include = FALSE,
dep.var.caption = "Dependent Variable: 1985-1995 Difference in Log per Pack Price",
se = rob_se)
```

Dependent variable: 1985-1995 difference in log per pack price | |||

IV: salestax | IV: cigtax | IVs: salestax, cigtax | |

(1) | (2) | (3) | |

pricediff | -0.938^{***} | -1.343^{***} | -1.202^{***} |

(0.208) | (0.229) | (0.197) | |

incomediff | 0.526 | 0.428 | 0.462 |

(0.339) | (0.299) | (0.309) | |

Constant | -0.118^{*} | -0.017 | -0.052 |

(0.068) | (0.067) | (0.062) | |

Observations | 48 | 48 | 48 |

R^{2} | 0.550 | 0.520 | 0.547 |

Adjusted R^{2} | 0.530 | 0.498 | 0.526 |

Residual Std. Error (df = 45) | 0.091 | 0.094 | 0.091 |

Note: | ^{*}p<0.1; ^{**}p<0.05; ^{***}p<0.01 |

Table 12.1: TSLS Estimates of the Long-Term Elasticity of the Demand for Cigarettes using Panel Data

Table 12.1 reports negative estimates of the coefficient on `pricediff` that are quite different in magnitude. Which one should we trust? This hinges on the validity of the instruments used. To assess this we compute \(F\)-statistics for the first-stage regressions of all three models to check instrument relevance.

```
# first-stage regressions
mod_relevance1 <- lm(pricediff ~ salestaxdiff + incomediff)
mod_relevance2 <- lm(pricediff ~ cigtaxdiff + incomediff)
mod_relevance3 <- lm(pricediff ~ incomediff + salestaxdiff + cigtaxdiff)
```

```
# check instrument relevance for model (1)
linearHypothesis(mod_relevance1,
"salestaxdiff = 0",
vcov = vcovHC, type = "HC1")
#> Linear hypothesis test
#>
#> Hypothesis:
#> salestaxdiff = 0
#>
#> Model 1: restricted model
#> Model 2: pricediff ~ salestaxdiff + incomediff
#>
#> Note: Coefficient covariance matrix supplied.
#>
#> Res.Df Df F Pr(>F)
#> 1 46
#> 2 45 1 28.445 3.009e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
# check instrument relevance for model (2)
linearHypothesis(mod_relevance2,
"cigtaxdiff = 0",
vcov = vcovHC, type = "HC1")
#> Linear hypothesis test
#>
#> Hypothesis:
#> cigtaxdiff = 0
#>
#> Model 1: restricted model
#> Model 2: pricediff ~ cigtaxdiff + incomediff
#>
#> Note: Coefficient covariance matrix supplied.
#>
#> Res.Df Df F Pr(>F)
#> 1 46
#> 2 45 1 98.034 7.09e-13 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
# check instrument relevance for model (3)
linearHypothesis(mod_relevance3,
c("salestaxdiff = 0", "cigtaxdiff = 0"),
vcov = vcovHC, type = "HC1")
#> Linear hypothesis test
#>
#> Hypothesis:
#> salestaxdiff = 0
#> cigtaxdiff = 0
#>
#> Model 1: restricted model
#> Model 2: pricediff ~ incomediff + salestaxdiff + cigtaxdiff
#>
#> Note: Coefficient covariance matrix supplied.
#>
#> Res.Df Df F Pr(>F)
#> 1 46
#> 2 44 2 76.916 4.339e-15 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

We also conduct the overidentifying restrictions test for model three which is the only model where the coefficient on the difference in log prices is overidentified (\(m=2\), \(k=1\)) such that the \(J\)-statistic can be computed. To do this we take the residuals stored in `cig_ivreg_diff3` and regress them on both instruments and the presumably exogenous regressor `incomediff`. We again use `linearHypothesis()` to test whether the coefficients on both instruments are zero which is necessary for the exogeneity assumption to be fulfilled. Note that with `test = “Chisq”` we obtain a chi-squared distributed test statistic instead of an \(F\)-statistic.

```
# compute the J-statistic
cig_iv_OR <- lm(residuals(cig_ivreg_diff3) ~ incomediff + salestaxdiff + cigtaxdiff)
cig_OR_test <- linearHypothesis(cig_iv_OR,
c("salestaxdiff = 0", "cigtaxdiff = 0"),
test = "Chisq")
cig_OR_test
#> Linear hypothesis test
#>
#> Hypothesis:
#> salestaxdiff = 0
#> cigtaxdiff = 0
#>
#> Model 1: restricted model
#> Model 2: residuals(cig_ivreg_diff3) ~ incomediff + salestaxdiff + cigtaxdiff
#>
#> Res.Df RSS Df Sum of Sq Chisq Pr(>Chisq)
#> 1 46 0.37472
#> 2 44 0.33695 2 0.037769 4.932 0.08492 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

**Caution**: In this case the \(p\)-Value reported by `linearHypothesis()` is wrong because the degrees of freedom are set to \(2\). This differs from the degree of overidentification (\(m-k=2-1=1\)) so the \(J\)-statistic is \(\chi^2_1\) distributed instead of following a \(\chi^2_2\) distribution as assumed by default in `linearHypothesis()`. We may compute the correct \(p\)-Value using `pchisq()`.

```
# compute correct p-value for J-statistic
pchisq(cig_OR_test[2, 5], df = 1, lower.tail = FALSE)
#> [1] 0.02636406
```

Since this value is smaller than \(0.05\) we reject the hypothesis that both instruments are exogenous at the level of \(5\%\). This means one of the following:

- The sales tax is an invalid instrument for the per-pack price.
- The cigarettes-specific sales tax is an invalid instrument for the per-pack price.
- Both instruments are invalid.

The book argues that the assumption of instrument exogeneity is more likely to hold for the general sales tax (see Chapter 12.4 of the book) such that the IV estimate of the long-run elasticity of demand for cigarettes we consider the most trustworthy is \(-0.94\), the TSLS estimate obtained using the general sales tax as the only instrument.

The interpretation of this estimate is that over a 10-year period, an increase in the average price per package by one percent is expected to decrease consumption by about \(0.94\) percentage points. This suggests that, in the long run, price increases can reduce cigarette consumption considerably.