
## 4.6 Exercises

#### 1. Class Sizes and Test Scores

A researcher wants to analyze the relationship between class size (measured by the student-teacher ratio) and the average test score. He therefore measures both variables in $10$ different classes and obtains the following results.

| Class Size | 23 | 19 | 30 | 22 | 23 | 29 | 35 | 36 | 33 | 25 |
|------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| Test Score | 430 | 430 | 333 | 410 | 390 | 377 | 325 | 310 | 328 | 375 |

Instructions:

• Create the vectors cs (the class size) and ts (the test score), containing the observations above.

• Draw a scatterplot of the results using plot().
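One way to carry out these steps (a sketch; the axis labels are an optional addition, not part of the instructions):

```r
# create the vectors of observations
cs <- c(23, 19, 30, 22, 23, 29, 35, 36, 33, 25)            # class sizes
ts <- c(430, 430, 333, 410, 390, 377, 325, 310, 328, 375)  # test scores

# scatterplot of test scores against class sizes
plot(cs, ts,
     xlab = "Class Size (student-teacher ratio)",
     ylab = "Test Score")
```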

#### 2. Mean, Variance, Covariance and Correlation

The vectors cs and ts are available in the working environment (you can check this: type their names into the console and press enter).

Instructions:

• Compute the mean, the sample variance and the sample standard deviation of ts.

• Compute the covariance and the correlation coefficient for ts and cs.

Hint: Use the R functions presented in this chapter: mean(), sd(), cov(), cor() and var().
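A possible solution using the functions from the hint (the data vectors are repeated here so the snippet is self-contained):

```r
# the data from Exercise 1
cs <- c(23, 19, 30, 22, 23, 29, 35, 36, 33, 25)
ts <- c(430, 430, 333, 410, 390, 377, 325, 310, 328, 375)

# mean, sample variance and sample standard deviation of ts
mean(ts)  # 370.8
var(ts)   # approx. 2002.84
sd(ts)    # approx. 44.75

# covariance and correlation coefficient for ts and cs
cov(ts, cs)  # approx. -251.44
cor(ts, cs)  # approx. -0.947
```

Note that `var()`, `sd()`, `cov()` and `cor()` all use the sample versions, i.e., they divide by $n-1$.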

#### 3. Simple Linear Regression

The vectors cs and ts are available in the working environment.

Instructions:

• Attach the package AER using library(). (Note that lm() itself is part of base R's stats package, so it is available even without attaching additional packages.)

• Use lm() to estimate the regression model $TestScore_i = \beta_0 + \beta_1 STR_i + u_i.$ Assign the result to mod.

• Obtain a statistical summary of the model.
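A sketch of these steps (the `library(AER)` call is commented out here because `lm()` ships with base R; attach AER as the instructions ask when working through the chapter):

```r
# the data from Exercise 1
cs <- c(23, 19, 30, 22, 23, 29, 35, 36, 33, 25)
ts <- c(430, 430, 333, 410, 390, 377, 325, 310, 328, 375)

# library(AER)  # as asked in the instructions; lm() itself is part of base R

# estimate the regression model TestScore_i = beta_0 + beta_1 * STR_i + u_i
mod <- lm(ts ~ cs)

# obtain a statistical summary of the model
summary(mod)  # intercept approx. 567.43, slope approx. -7.15
```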

#### 4. The Model Object

Let us see how an object of class lm is structured.

The vectors cs and ts as well as the model object mod from the previous exercise are available in your workspace.

Instructions:

• Use class() to learn about the class of the object mod.
• mod is an object of type list with named entries. Check this using the function is.list().
• See what information you can obtain from mod using names().
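These inspections can be sketched as follows (the model is re-estimated here so the snippet stands on its own):

```r
# mod as estimated in Exercise 3
cs <- c(23, 19, 30, 22, 23, 29, 35, 36, 33, 25)
ts <- c(430, 430, 333, 410, 390, 377, 325, 310, 328, 375)
mod <- lm(ts ~ cs)

# class of the model object
class(mod)    # "lm"

# an lm object is a list with named entries
is.list(mod)  # TRUE

# names of the entries
names(mod)    # "coefficients" "residuals" "effects" ...
```

Entries such as `mod$coefficients` and `mod$residuals` can then be accessed directly with the `$` operator.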

#### 10. Regression Output: No Constant Case — Ctd.

In Exercises 8 and 9 you dealt with a model without an intercept. The estimated regression function was

$\widehat{TestScore_i} = \underset{(1.36)}{12.65} \times STR_i.$

The coefficient matrix coef from Exercise 9 contains the estimated coefficient on $STR$, its standard error, the $t$-statistic of the significance test and the corresponding $p$-value.

The matrix coef from the previous exercise is available in your working environment.

Instructions:

• Print the contents of coef to the console.
• Convince yourself that the reported $t$-statistic is correct: use the entries of coef to compute the $t$-statistic and save it to t_stat.

Hints:

• X[a,b] returns the [a,b] element of the matrix X.

• The $t$-statistic for a test of the hypothesis $H_0: \beta_1 = 0$ is computed as $t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}.$
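A sketch of the computation (coef is reconstructed here as the coefficient matrix of the no-intercept regression from Exercises 8 and 9, i.e., the coefficients entry of the model summary):

```r
# data and no-intercept model from the previous exercises
cs <- c(23, 19, 30, 22, 23, 29, 35, 36, 33, 25)
ts <- c(430, 430, 333, 410, 390, 377, 325, 310, 328, 375)
mod_ni <- lm(ts ~ cs - 1)                # '- 1' drops the intercept
coef <- summary(mod_ni)$coefficients

# print the contents of coef
coef

# t-statistic: the estimate divided by its standard error
t_stat <- coef[1, 1] / coef[1, 2]
t_stat  # approx. 9.3; matches the reported t-statistic in coef[1, 3]
```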

#### 11. Two Regressions, One Plot

The two estimated regression models from the previous exercises are

$\widehat{TestScore_i} = \underset{(1.36)}{12.65} \times STR_i$

and

$\widehat{TestScore_i} = \underset{(23.96)}{567.43} - \underset{(0.85)}{7.15} \times STR_i.$

You are provided with the code line plot(cs, ts), which creates a scatterplot of ts and cs. Note that this line must be executed before calling abline()! You may color the regression lines by passing, e.g., col = "red" or col = "blue" as an additional argument to abline() to make them easier to distinguish.

The vectors cs and ts as well as the list objects mod and mod_ni from previous exercises are available in your working environment.

Instructions:

Generate a scatterplot of ts and cs and add the estimated regression lines of mod and mod_ni.
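A possible solution (both models are re-estimated so the snippet is self-contained; abline() accepts an lm object directly and, for a single-coefficient model, draws a line through the origin):

```r
# data and models from the previous exercises
cs <- c(23, 19, 30, 22, 23, 29, 35, 36, 33, 25)
ts <- c(430, 430, 333, 410, 390, 377, 325, 310, 328, 375)
mod    <- lm(ts ~ cs)      # regression with intercept
mod_ni <- lm(ts ~ cs - 1)  # regression without intercept

# scatterplot; must be drawn before calling abline()
plot(cs, ts)

# add both estimated regression lines
abline(mod, col = "red")
abline(mod_ni, col = "blue")
```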

#### 12. $TSS$ and $SSR$

If graphical inspection is inconclusive, researchers resort to analytic techniques to determine whether a model fits the data at hand well, or better than another model.

Let us go back to the simple regression model including an intercept. The estimated regression line for mod was

$\widehat{TestScore_i} = 567.43 - 7.15 \times STR_i, \, R^2 = 0.8976, \, SER=15.19.$

You can check this as mod and the vectors cs and ts are available in your working environment.

Instructions:

• Compute $SSR$, the sum of squared residuals, and save it to ssr.
• Compute $TSS$, the total sum of squares, and save it to tss.
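These quantities can be computed as follows (as a check, $1 - SSR/TSS$ reproduces the $R^2$ stated above):

```r
# data and model from the previous exercises
cs <- c(23, 19, 30, 22, 23, 29, 35, 36, 33, 25)
ts <- c(430, 430, 333, 410, 390, 377, 325, 310, 328, 375)
mod <- lm(ts ~ cs)

# SSR: sum of squared residuals
ssr <- sum(residuals(mod)^2)

# TSS: total sum of squares
tss <- sum((ts - mean(ts))^2)

# check: R^2 = 1 - SSR/TSS
1 - ssr / tss  # approx. 0.8976
```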

#### 15. The Estimated Covariance Matrix

As has been discussed in Chapter 4.4, the OLS estimators $\widehat{\beta}_0$ and $\widehat{\beta}_1$ are functions of the random error term. Therefore, they are random variables themselves. For two or more random variables, their covariances and variances are summarized by a variance-covariance matrix (which is often simply called the covariance matrix). Taking the square root of the diagonal elements of the estimated covariance matrix obtains $SE(\widehat\beta_0)$ and $SE(\widehat\beta_1)$, the standard errors of $\widehat{\beta}_0$ and $\widehat{\beta}_1$.

summary() computes the ingredients of an estimate of this matrix. The respective entry in the output of summary() (remember that summary() produces a list) is called cov.unscaled. As the name suggests, it is the covariance matrix up to scale, $(X'X)^{-1}$: multiplying it by the squared $SER$ (the sigma entry of the summary, squared) yields the estimate. The model object mod is available in your workspace.

Instructions:

• Use summary() to obtain the covariance matrix estimate for the regression of test scores on student-teacher ratios stored in the model object mod. Save the result to cov_matrix.

• Obtain the diagonal elements of cov_matrix, compute their square root and assign the result to the variable SEs.

Hint: diag(A) returns a vector containing the diagonal elements of the matrix A.
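A sketch of the computation (the model is re-estimated here for self-containedness; the convenience function vcov() returns the same scaled matrix directly and serves as a cross-check):

```r
# data and model from the previous exercises
cs <- c(23, 19, 30, 22, 23, 29, 35, 36, 33, 25)
ts <- c(430, 430, 333, 410, 390, 377, 325, 310, 328, 375)
mod <- lm(ts ~ cs)

s <- summary(mod)

# cov.unscaled is (X'X)^(-1); scaling by the squared SER (s$sigma^2)
# yields the estimated covariance matrix of the coefficient estimators
cov_matrix <- s$cov.unscaled * s$sigma^2

# equivalently: cov_matrix <- vcov(mod)

# standard errors: square roots of the diagonal elements
SEs <- sqrt(diag(cov_matrix))
SEs  # approx. 23.96 (intercept) and 0.85 (slope)
```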