
## 5.7 Exercises

#### 1. Testing Two Null Hypotheses Separately

Consider the estimated regression model

\[ \widehat{TestScore} = \underset{(23.96)}{567.43} - \underset{(0.85)}{7.15} \times STR, \, R^2 = 0.8976, \, SER=15.19 \]

with standard errors in parentheses.

**Instructions:**

- Compute the \(p\)-value for a \(t\)-test of the hypothesis that the intercept is zero against the two-sided alternative that it is non-zero. Save the result to `p_int`.
- Compute the \(p\)-value for a \(t\)-test of the hypothesis that the coefficient on `STR` is zero against the two-sided alternative that it is non-zero. Save the result to `p_STR`.

**Hint:**

Both hypotheses can be tested individually using a two-sided test. Use `pnorm()` to obtain cumulative probabilities for standard normally distributed outcomes.
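A possible sketch of the computation, using the coefficients and standard errors reported for the model above:

```r
# t-statistics: estimate divided by its standard error
t_int <- 567.43 / 23.96
t_STR <- -7.15 / 0.85

# two-sided p-values based on the standard normal distribution
p_int <- 2 * (1 - pnorm(abs(t_int)))
p_STR <- 2 * (1 - pnorm(abs(t_STR)))
```

Both \(p\)-values turn out to be very close to zero, so both null hypotheses are rejected at any conventional significance level.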

#### 2. Two Null Hypotheses You Cannot Reject, Can You?

Consider again the estimated regression model

\[\widehat{TestScore} = \underset{(23.96)}{567.43} - \underset{(0.85)}{7.15} \times STR, \, R^2 = 0.8976, \,SER=15.19\]

Can you reject the null hypotheses discussed in the previous code exercise using individual \(t\)-tests at the \(5\%\) significance level?

The variables `t_int` and `t_STR` are the \(t\)-statistics. Both are available in your working environment.

**Instructions:**

- Gather `t_int` and `t_STR` in a vector `test` and use logical operators to check whether the corresponding rejection rule applies.

**Hints:**

- Both tests are two-sided \(t\)-tests. Key Concept 5.2 recaps how a two-sided \(t\)-test is conducted.
- Use `qnorm()` to obtain standard normal critical values.
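A sketch of the rejection check; since `t_int` and `t_STR` live only in the exercise environment, they are recomputed here from the reported estimates:

```r
# t-statistics recomputed from the reported estimates and standard errors
t_int <- 567.43 / 23.96
t_STR <- -7.15 / 0.85

# gather both statistics in a vector
test <- c(t_int, t_STR)

# reject H0 if |t| exceeds the two-sided 5% critical value
abs(test) > qnorm(0.975)
```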

#### 3. Confidence Intervals

`mod`, the object of class `lm` which contains the estimated regression model \[\widehat{TestScore} = \underset{(23.96)}{567.43} - \underset{(0.85)}{7.15} \times STR, \, R^2 = 0.8976, \,SER=15.19\] is available in your working environment.

**Instructions:**

Compute \(90\%\) confidence intervals for both coefficients.

**Hint:**

Use the function `confint()`, see `?confint`. The argument `level` sets the confidence level to be used.
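Because `mod` exists only in the exercise environment, the sketch below fits a stand-in model on simulated data; the `confint()` call is the relevant part:

```r
# stand-in for the model object `mod` (simulated data, for illustration only)
set.seed(1)
STR <- runif(100, 10, 30)
TestScore <- 567.43 - 7.15 * STR + rnorm(100, sd = 15)
mod <- lm(TestScore ~ STR)

# 90% confidence intervals for both coefficients
confint(mod, level = 0.9)
```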

#### 4. A Confidence Interval for the Mean I

Consider the regression model \[Y_i = \beta_1 + u_i\] where \(Y_i \sim \mathcal{N}(\mu, \sigma^2)\). Following the discussion preceding equation (5.1), a \(95\%\) confidence interval for the mean of the \(Y_i\) can be computed as

\[CI^{\mu}_{0.95} = \left[\hat\mu - 1.96 \times \frac{\sigma}{\sqrt{n}}; \, \hat\mu + 1.96 \times \frac{\sigma}{\sqrt{n}} \right].\]

**Instructions:**

- Sample \(n=100\) observations from a normal distribution with variance \(100\) and mean \(10\).
- Use the sample to estimate \(\beta_1\). Save the estimate in `mu_hat`.
- Assume that \(\sigma^2 = 100\) is known. Replace the `NA`s in the code below to obtain a \(95\%\) confidence interval for the mean of the \(Y_i\).

**Hint:**

The estimate \(\hat\mu\) is the sample mean of the observations; compute it with `mean()`. The bounds of the interval then follow from the formula above.
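A sketch under the stated assumptions (\(n = 100\), known \(\sigma^2 = 100\); note that `rnorm()` takes the standard deviation, not the variance):

```r
set.seed(1)

# sample 100 observations from N(10, 100)
Y <- rnorm(100, mean = 10, sd = 10)

# the OLS estimate of beta_1 in an intercept-only model is the sample mean
mu_hat <- mean(Y)

# 95% confidence interval for the mean with sigma = 10 known
CI <- c(mu_hat - 1.96 * 10 / sqrt(100),
        mu_hat + 1.96 * 10 / sqrt(100))
```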

#### 5. A Confidence Interval for the Mean II

For historical reasons, some `R` functions which we use to obtain inference on model parameters, among them `confint()` and `summary()`, rely on the \(t\)-distribution instead of using the large-sample normal approximation. This is why for small sample sizes (and hence small degrees of freedom), \(p\)-values and confidence intervals reported by these functions deviate from those computed using critical values or cumulative probabilities of the standard normal distribution.

The \(95\%\) confidence interval for the mean in the previous exercise is \([9.13, 13.05]\).

**Instructions:**

100 observations sampled from a normal distribution with \(\mu=10\) and \(\sigma^2=100\) have been assigned to the vector `s` which is available in your environment.

Set up a suitable regression model to estimate the mean of the observations in `s`. Then use `confint()` to compute a \(95\%\) confidence interval for the mean.

(Check that the result is different from the interval reported above.)
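A sketch with `s` simulated as a stand-in for the vector in the exercise environment; the intercept-only regression and the `confint()` call are the relevant steps:

```r
# stand-in for the vector `s` from the exercise environment
set.seed(1)
s <- rnorm(100, mean = 10, sd = 10)

# intercept-only regression: the coefficient estimate is the sample mean
lin_mod <- lm(s ~ 1)

# 95% confidence interval based on the t-distribution
confint(lin_mod, level = 0.95)
```

Because `confint()` uses \(t\)-critical values with \(n - 1\) degrees of freedom (and the estimated rather than the known \(\sigma\)), this interval differs from the one computed in the previous exercise.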

#### 6. Regression on a Dummy Variable I

Chapter 5.3 discusses regression when \(X\) is a dummy variable. We have used a `for()` loop to generate a binary variable indicating whether a school district in the `CASchools` data set has a student-teacher ratio below \(20\). Though it is instructive to use a loop for this, there are alternative ways to achieve the same result with fewer lines of code.

A `data.frame` `DF` with \(100\) observations of a variable `X` is available in your working environment.

**Instructions:**

- Use `ifelse()` to generate a binary vector `dummy` indicating whether the observations in `X` are *positive*.
- Append `dummy` to the `data.frame` `DF`.
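A sketch, with `DF` simulated as a stand-in for the data set in the exercise environment:

```r
# stand-in for the data.frame `DF` from the exercise environment
set.seed(1)
DF <- data.frame(X = rnorm(100))

# binary vector: 1 if X is positive, 0 otherwise
dummy <- ifelse(DF$X > 0, 1, 0)

# append the dummy to DF
DF$dummy <- dummy
```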

#### 7. Regression on a Dummy Variable II

A `data.frame` `DF` with 100 observations on `Y` and the binary variable `D` from the previous exercise is available in your working environment.

**Instructions:**

- Compute the group-specific sample means of the observations in `Y`: save the mean of observations in `Y` where `D == 1` to `mu_Y_D1` and assign the mean of those observations with `D == 0` to `mu_Y_D0`.
- Use `lm()` to regress `Y` on `D`, i.e., estimate the coefficients in the model \[Y_i = \beta_0 + \beta_1 \times D_i + u_i.\]

Also check that the estimates of the coefficients \(\beta_0\) and \(\beta_1\) reflect specific sample means. Can you tell which (no code submission needed)?
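A sketch with simulated stand-ins for `Y` and `D`. The key fact it illustrates: \(\hat\beta_0\) equals the sample mean of the \(D = 0\) group, and \(\hat\beta_0 + \hat\beta_1\) equals the sample mean of the \(D = 1\) group.

```r
# stand-ins for Y and D from the exercise environment
set.seed(1)
D <- rbinom(100, 1, 0.5)
Y <- -0.66 + 1.43 * D + rnorm(100)

# group-specific sample means
mu_Y_D1 <- mean(Y[D == 1])
mu_Y_D0 <- mean(Y[D == 0])

# dummy regression: intercept and slope recover the group means
dummy_mod <- lm(Y ~ D)
```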

#### 8. Regression on a Dummy Variable III

In this exercise, you will visualize some of the results from the dummy regression model \[\widehat{Y}_i = -0.66 + 1.43 \times D_i\] estimated in the previous exercise.

A `data.frame` `DF` with 100 observations on `Y` and the binary variable `dummy` as well as the model object `dummy_mod` from the previous exercise are available in your working environment.

**Instructions:**

- Start by drawing a visually appealing plot of the observations on \(Y\) and \(D\) based on the code chunk provided in `Script.R`. Replace the `???` by the correct expressions!
- Add the regression line to the plot.

#### 9. Gender Wage Gap I

The cross-section data set `CPS1985` is a subsample from the May 1985 *Current Population Survey* conducted by the *US Census Bureau* which contains observations on, among other things, wage and the gender of employees.

`CPS1985` is part of the package `AER`.

**Instructions:**

- Attach the package `AER` and load the data set `CPS1985`.
- Estimate the dummy regression model \[wage_i = \beta_0 + \beta_1 \cdot gender_i + u_i\] where

\[\begin{align*} gender_i = \begin{cases} 1, & \text{if employee} \, i \, \text{is female,} \\ 0, & \text{if employee} \, i \, \text{is male.} \end{cases} \end{align*}\]

- Save the result in `dummy_mod`.

#### 10. Gender Wage Gap II

The wage regression from the previous exercise yields \[\widehat{wage}_i = 9.995 - 2.116 \cdot gender_i.\]

The model object `dummy_mod` is available in your working environment.

**Instructions:**

- Test the hypothesis that the coefficient on \(gender_i\) is zero against the alternative that it is non-zero. The null hypothesis implies that there is no gender wage gap. Use the heteroskedasticity-robust estimator proposed by White (1980).

**Hints:**

- `vcovHC()` computes heteroskedasticity-robust estimates of the covariance matrix of the coefficient estimators for the model supplied. The estimator proposed by White (1980) is computed if you set `type = "HC0"`.
- The function `coeftest()` performs significance tests for the coefficients in model objects. A covariance matrix can be supplied using the argument `vcov.`.
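The following self-contained sketch mimics the test on simulated stand-in data, computing the White (1980) `HC0` estimate \((X'X)^{-1} X' \operatorname{diag}(\hat u_i^2) X (X'X)^{-1}\) by hand in base R; in the exercise itself, `coeftest(dummy_mod, vcov. = vcovHC(dummy_mod, type = "HC0"))` does this in one step:

```r
# stand-in data for the wage regression (simulated, for illustration only)
set.seed(1)
gender <- rbinom(200, 1, 0.5)
wage <- 9.995 - 2.116 * gender + rnorm(200, sd = 4)
dummy_mod <- lm(wage ~ gender)

# White (1980) HC0 covariance estimate computed by hand
X <- model.matrix(dummy_mod)
u <- residuals(dummy_mod)
XtX_inv <- solve(t(X) %*% X)
vcov_HC0 <- XtX_inv %*% t(X) %*% diag(u^2) %*% X %*% XtX_inv

# robust t-statistic and two-sided p-value for the coefficient on gender
t_rob <- coef(dummy_mod)[2] / sqrt(vcov_HC0[2, 2])
p_rob <- 2 * (1 - pnorm(abs(t_rob)))
```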

#### 11. Computation of Heteroskedasticity-Robust Standard Errors

In the simple regression model, the covariance matrix of the coefficient estimators is denoted

\[\text{Var} \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \end{pmatrix} = \begin{pmatrix} \text{Var}(\hat\beta_0) & \text{Cov}(\hat\beta_0,\hat\beta_1) \\ \text{Cov}(\hat\beta_0,\hat\beta_1) & \text{Var}(\hat\beta_1) \end{pmatrix}\]

The function `vcovHC` can be used to obtain estimates of this matrix for a model object of interest.

`dummy_mod`, a model object containing the wage regression dealt with in Exercises 9 and 10 is available in your working environment.

**Instructions:**

- Compute robust standard errors of the type `HC1` for the coefficient estimators in the model object `dummy_mod`. Store the standard errors in a vector named `rob_SEs`.

**Hints:**

- The standard errors we seek can be obtained by taking the square root of the diagonal elements of the estimated covariance matrix.
- `diag(A)` returns the diagonal elements of the matrix `A`.
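A self-contained sketch on stand-in data: `HC1` rescales the `HC0` estimate by \(n/(n-k)\). In the exercise, `sqrt(diag(vcovHC(dummy_mod, type = "HC1")))` achieves the same in one line.

```r
# stand-in for dummy_mod (simulated, for illustration only)
set.seed(1)
gender <- rbinom(200, 1, 0.5)
wage <- 9.995 - 2.116 * gender + rnorm(200, sd = 4)
dummy_mod <- lm(wage ~ gender)

X <- model.matrix(dummy_mod)
u <- residuals(dummy_mod)
n <- nrow(X); k <- ncol(X)

# HC1: White estimator with the small-sample correction n / (n - k)
XtX_inv <- solve(t(X) %*% X)
vcov_HC1 <- n / (n - k) * XtX_inv %*% t(X) %*% diag(u^2) %*% X %*% XtX_inv

# robust standard errors: square roots of the diagonal elements
rob_SEs <- sqrt(diag(vcov_HC1))
```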

#### 12. Robust Confidence Intervals

The function `confint()` computes confidence intervals for regression models using homoskedasticity-only standard errors, so it is not an option when there is heteroskedasticity.

The function `Rob_CI()` in `Script.R` is meant to compute and report heteroskedasticity-robust confidence intervals for both coefficients of a simple regression model.

`dummy_mod`, a model object containing the wage regression dealt with in the previous exercises is available in your working environment.

**Instructions:**

- Complete the code of `Rob_CI()` given in `Script.R` such that lower and upper bounds of \(95\%\) robust confidence intervals are returned. Use standard errors of the type `HC1`.
- Use the function `Rob_CI()` to obtain \(95\%\) confidence intervals for the model coefficients in `dummy_mod`.
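A sketch of what a completed `Rob_CI()` might look like; the `HC1` standard errors are computed by hand in base R so the chunk is self-contained (the exercise version would call `vcovHC(model, type = "HC1")` instead), and the model is a simulated stand-in:

```r
Rob_CI <- function(model) {
  # HC1 robust covariance matrix computed by hand
  X <- model.matrix(model)
  u <- residuals(model)
  n <- nrow(X); k <- ncol(X)
  XtX_inv <- solve(t(X) %*% X)
  vcov_HC1 <- n / (n - k) * XtX_inv %*% t(X) %*% diag(u^2) %*% X %*% XtX_inv
  SEs <- sqrt(diag(vcov_HC1))

  # lower and upper bounds of 95% robust confidence intervals
  cbind(lower = coef(model) - 1.96 * SEs,
        upper = coef(model) + 1.96 * SEs)
}

# stand-in for dummy_mod (simulated, for illustration only)
set.seed(1)
gender <- rbinom(200, 1, 0.5)
wage <- 9.995 - 2.116 * gender + rnorm(200, sd = 4)
dummy_mod <- lm(wage ~ gender)

Rob_CI(dummy_mod)
```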

#### 13. A Small Simulation Study — I

Consider the data generating process (DGP) \[\begin{align} X_i \sim& \, \mathcal{U}[2,10], \notag \\ e_i \sim& \, \mathcal{N}(0, X_i), \notag \\ Y_i =& \, \beta_1 X_i + e_i, \tag{5.4} \end{align}\] where \(\mathcal{U}[2,10]\) denotes the uniform distribution on the interval \([2,10]\) and \(\beta_1=2\).

Notice that the errors \(e_i\) are heteroskedastic since the variance of the \(e_i\) is a function of \(X_i\).

**Instructions:**

- Write a function `DGP_OLS()` that generates a sample \((X_i, Y_i)\), \(i=1,\dots,100\), using the DGP above and returns the OLS estimate of \(\beta_1\) based on this sample.

**Hint:**

`runif()` can be used to obtain random samples from a uniform distribution, see `?runif`.
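One possible implementation. Note that `rnorm()` takes the standard deviation, so errors with variance \(X_i\) use `sd = sqrt(X)`, and since the model has no intercept, the regression formula is `Y ~ X - 1`:

```r
DGP_OLS <- function() {
  # X uniform on [2, 10]
  X <- runif(100, min = 2, max = 10)
  # heteroskedastic errors: Var(e_i) = X_i
  e <- rnorm(100, mean = 0, sd = sqrt(X))
  Y <- 2 * X + e
  # OLS estimate of beta_1 (no intercept in the model)
  coef(lm(Y ~ X - 1))
}

set.seed(1)
DGP_OLS()
```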

#### 14. A Small Simulation Study — II

The function `DGP_OLS()` from the previous exercise is available in your working environment.

**Instructions:**

- Use `replicate()` to generate a sample of \(1000\) OLS estimates \(\widehat{\beta}_1\) using the function `DGP_OLS()`. Store the estimates in a vector named `estimates`.
- Next, estimate the variance of \(\widehat{\beta}_1\) in (5.4): compute the sample variance of the \(1000\) OLS estimates in `estimates`. Store the result in `est_var_OLS`.
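A sketch; `DGP_OLS()` is re-defined here (one possible implementation of the Exercise 13 DGP) so the chunk is self-contained:

```r
# one possible implementation of the Exercise 13 DGP
DGP_OLS <- function() {
  X <- runif(100, min = 2, max = 10)
  e <- rnorm(100, mean = 0, sd = sqrt(X))
  Y <- 2 * X + e
  coef(lm(Y ~ X - 1))
}

set.seed(1)

# 1000 OLS estimates of beta_1
estimates <- replicate(1000, DGP_OLS())

# estimated variance of the OLS estimator
est_var_OLS <- var(estimates)
```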

#### 15. A Small Simulation Study — III

By the Gauss-Markov theorem, the OLS estimator in linear regression models is the most efficient conditionally unbiased linear estimator only when the errors are homoskedastic. In other words, the OLS estimator loses the BLUE property when the assumption of homoskedasticity is violated.

It turns out that OLS applied to the weighted observations \((w_i X_i, w_i Y_i)\) where \(w_i=\frac{1}{\sigma_i}\) is the BLUE estimator under heteroskedasticity. This estimator is called the *weighted least squares* (WLS) estimator. Thus, when there is heteroskedasticity, the WLS estimator has lower variance than OLS.

The function `DGP_OLS()` and the estimated variance `est_var_OLS` from the previous exercises are available in your working environment.

**Instructions:**

- Write a function `DGP_WLS()` that generates a sample of \(100\) observations using the DGP introduced in Exercise 13 and returns the WLS estimate of \(\beta_1\). Treat \(\sigma_i\) as known, i.e., set \(w_i=\frac{1}{\sqrt{X_i}}\).
- Repeat Exercise 14 using `DGP_WLS()`. Store the variance estimate in `est_var_WLS`.
- Compare the estimated variances `est_var_OLS` and `est_var_WLS` using logical operators (`<` or `>`).

**Hints:**

- `DGP_WLS()` can be obtained by modifying the code of `DGP_OLS()`.
- Remember that functions are objects: you may print the code of a function to the console.
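A sketch of the whole comparison, with both functions defined so the chunk is self-contained. The WLS estimator runs OLS on the weighted observations \((w_i X_i, w_i Y_i)\) with \(w_i = 1/\sqrt{X_i}\):

```r
# one possible implementation of the Exercise 13 DGP with OLS
DGP_OLS <- function() {
  X <- runif(100, min = 2, max = 10)
  e <- rnorm(100, mean = 0, sd = sqrt(X))
  Y <- 2 * X + e
  coef(lm(Y ~ X - 1))
}

# same DGP, but OLS applied to the weighted observations
DGP_WLS <- function() {
  X <- runif(100, min = 2, max = 10)
  e <- rnorm(100, mean = 0, sd = sqrt(X))
  Y <- 2 * X + e
  w <- 1 / sqrt(X)  # weights: w_i = 1 / sigma_i = 1 / sqrt(X_i)
  # equivalent to lm(Y ~ X - 1, weights = 1 / X)
  coef(lm(I(w * Y) ~ I(w * X) - 1))
}

set.seed(1)
est_var_OLS <- var(replicate(1000, DGP_OLS()))
est_var_WLS <- var(replicate(1000, DGP_WLS()))

# WLS should typically show the smaller variance
est_var_WLS < est_var_OLS
```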

### References

White, Halbert. 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." *Econometrica* 48 (4): 817–38.