
9.5 Exercises

1. Simulation Study: Misspecification of Functional Form

As stated in Chapter 9.2, misspecification of the regression function violates assumption 1 of Key Concept 6.3, so the OLS estimator is biased and inconsistent. We illustrated the bias of \(\hat{\beta}_0\) for the quadratic population regression function \[Y_i = X_i^2 \] and the linear model \[Y_i = \beta_0 + \beta_1 X_i + u_i, \, u_i \sim \mathcal{N}(0,1)\] using 100 randomly generated observations. Strictly speaking, this finding could be a coincidence because it rests on a single estimate obtained from a single data set.

In this exercise, you have to generate simulation evidence for the bias of \(\hat{\beta}_0\) in the model \[Y_i = \beta_0 + \beta_1 X_i + u_i\] if the population regression function is \[Y_i = X_i^2.\]


Make sure to use the definitions suggested in the skeleton code in script.R to complete the following tasks:

  • Generate 1000 OLS estimates of \(\beta_0\) in the model above using a for() loop where \(X_i \sim \mathcal{U}[-5,5]\), \(u_i \sim \mathcal{N}(0,1)\) using samples of size \(100\). Save the estimates in beta_hats.

  • Compare the sample mean of the estimates to the true parameter using the == operator.


You can generate random numbers from a uniform distribution using runif().
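The two tasks above could be approached along the following lines. This is only a sketch: the skeleton code in script.R is not shown here, so every name other than beta_hats (n_reps, n_obs, the seed) is an assumption, as is the choice to add the \(\mathcal{N}(0,1)\) error to \(X_i^2\) when generating \(Y_i\).

```r
# Sketch only: names other than beta_hats are assumptions,
# because the skeleton in script.R is not available here.
set.seed(1)                      # assumed seed, for reproducibility

n_reps <- 1000                   # number of simulated samples
n_obs  <- 100                    # observations per sample
beta_hats <- numeric(n_reps)     # storage for the intercept estimates

for (i in seq_len(n_reps)) {
  X <- runif(n_obs, min = -5, max = 5)   # X_i ~ U[-5, 5]
  Y <- X^2 + rnorm(n_obs)                # quadratic function plus N(0, 1) error (assumed additive)
  beta_hats[i] <- coef(lm(Y ~ X))[1]     # OLS intercept of the misspecified linear model
}

mean(beta_hats)        # sample mean of the 1000 intercept estimates
mean(beta_hats) == 0   # comparison with the intercept of Y = X^2, which is 0
```

Since \(E(X_i^2) = 25/3\) and \(X_i\) is uncorrelated with \(X_i^2\) for \(X_i \sim \mathcal{U}[-5,5]\), the sample mean should land near \(25/3\) rather than near \(0\).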

2. Simulation Study: Errors-in-Variables Bias

Consider again the application of the classical measurement error model introduced in Chapter 9.2:

The single regressor \(X_i\) is measured with error so that \(\overset{\sim}{X}_i\) is observed instead. Thus one estimates \(\beta_1\) in \[\begin{align*} Y_i =& \, \beta_0 + \beta_1 \overset{\sim}{X}_i + \underbrace{\beta_1 (X_i -\overset{\sim}{X}_i) + u_i}_{=v_i} \\ Y_i =& \, \beta_0 + \beta_1 \overset{\sim}{X}_i + v_i \end{align*}\] instead of \[Y_i = \beta_0 + \beta_1 X_i + u_i,\]

where \(\overset{\sim}{X}_i = X_i + w_i\) and the zero-mean measurement error \(w_i\) is uncorrelated with \(X_i\) and \(u_i\). Then \(\beta_1\) is inconsistently estimated by OLS: \[\begin{equation} \widehat{\beta}_1 \xrightarrow{p}{\frac{\sigma_{X}^2}{\sigma_{X}^2 + \sigma_{w}^2}} \beta_1 \end{equation}\]

Let \[(X, Y) \sim \mathcal{N}\left[\begin{pmatrix}50\\ 100\end{pmatrix},\begin{pmatrix}10 & 5 \\ 5 & 10 \end{pmatrix}\right].\] Recall from (9.2) that \(E(Y_i\vert X_i) = 75 + 0.5 X_i\) in this case. Further assume that \(\overset{\sim}{X}_i = X_i + w_i\) with \(w_i \overset{i.i.d.}{\sim} \mathcal{N}(0,10)\).
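For this setup the probability limit of the OLS estimator can be worked out by hand. The calculation below reads \(\mathcal{N}(0,10)\) as a normal distribution with variance \(10\), which is an assumption about the notation but matches the variances on the diagonal of the covariance matrix: \[\begin{align*} \sigma_X^2 = 10, \quad \sigma_w^2 = 10, \quad \beta_1 = 0.5 \quad \Rightarrow \quad \frac{\sigma_{X}^2}{\sigma_{X}^2 + \sigma_{w}^2} \, \beta_1 = \frac{10}{10 + 10} \cdot 0.5 = 0.25. \end{align*}\] Under that reading, OLS based on \(\overset{\sim}{X}_i\) converges to \(0.25\), half of the true slope.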

As mentioned in Exercise 1, Chapter 9.2 discusses the consequences of the measurement error for the OLS estimator of \(\beta_1\) in this setting based on a single sample and thus just one estimate. Strictly speaking, the conclusion could be wrong because the observed bias may be due to random variation. A Monte Carlo simulation is more appropriate here.


Show that \(\beta_1\) is estimated with a bias using a simulation study. Make sure to use the definitions suggested in the skeleton code in script.R to complete the following tasks:

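  • Generate 1000 estimates of \(\beta_1\) in the simple regression model \[Y_i = \beta_0 + \beta_1 X_i + u_i,\] where the mismeasured \(\overset{\sim}{X}_i\) is used as the regressor in place of \(X_i\). Use rmvnorm() from the mvtnorm package to generate samples of 100 random observations from the bivariate normal distribution stated above.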

  • Save the estimates in beta_hats.

  • Compute the sample mean of the estimates.
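One possible way to set up this simulation is sketched below. The skeleton in script.R is not shown here, so all names except beta_hats and rmvnorm() are assumptions (n_reps, n_obs, mu, Sigma, X_tilde, the seed). The sketch also assumes that \(\mathcal{N}(0,10)\) denotes variance \(10\), so the measurement error is drawn with standard deviation \(\sqrt{10}\).

```r
# Sketch only: names other than beta_hats and rmvnorm() are assumptions.
library(mvtnorm)                 # provides rmvnorm()

set.seed(1)                      # assumed seed, for reproducibility

n_reps <- 1000                   # number of simulated samples
n_obs  <- 100                    # observations per sample
mu     <- c(50, 100)             # means of (X, Y)
Sigma  <- cbind(c(10, 5), c(5, 10))   # covariance matrix of (X, Y)
beta_hats <- numeric(n_reps)     # storage for the slope estimates

for (i in seq_len(n_reps)) {
  d <- rmvnorm(n_obs, mean = mu, sigma = Sigma)   # column 1 is X, column 2 is Y
  X <- d[, 1]
  Y <- d[, 2]
  # Observed regressor: X plus measurement error w_i with variance 10 (assumed reading)
  X_tilde <- X + rnorm(n_obs, mean = 0, sd = sqrt(10))
  beta_hats[i] <- coef(lm(Y ~ X_tilde))[2]        # OLS slope using the mismeasured regressor
}

mean(beta_hats)   # sample mean of the 1000 slope estimates
```

Comparing the result with the population slope \(0.5\) from \(E(Y_i\vert X_i) = 75 + 0.5 X_i\) shows the size of the errors-in-variables bias; replacing X_tilde by X in the call to lm() gives a benchmark without measurement error.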