
## 7.3 Joint Hypothesis Testing Using the F-Statistic

The estimated model is

\[ \widehat{TestScore} = \underset{(15.21)}{649.58} -\underset{(0.48)}{0.29} \times size - \underset{(0.04)}{0.66} \times english + \underset{(1.41)}{3.87} \times expenditure. \]

Now, can we reject the hypothesis that the coefficient on \(size\) *and* the coefficient on \(expenditure\) are zero? To answer this, we have to resort to joint hypothesis tests. A joint hypothesis imposes restrictions on multiple regression coefficients. This is different from conducting individual \(t\)-tests where a restriction is imposed on a single coefficient. Chapter 7.2 of the book explains why testing hypotheses about the model coefficients one at a time is different from testing them jointly.

The homoskedasticity-only \(F\)-Statistic is given by

\[ F = \frac{(SSR_{\text{restricted}} - SSR_{\text{unrestricted}})/q}{SSR_{\text{unrestricted}} / (n-k-1)} \]

with \(SSR_{\text{restricted}}\) being the sum of squared residuals from the restricted regression, i.e., the regression in which the restriction is imposed, \(SSR_{\text{unrestricted}}\) the sum of squared residuals from the full model, \(q\) the number of restrictions under the null, and \(k\) the number of regressors in the unrestricted regression.
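As a sketch of what this formula does, we can compute the statistic by hand and compare it with base `R`'s `anova()`, which performs the same homoskedasticity-only test. The data below are simulated for illustration (they are *not* the `CASchools` data):

```
# simulate data with three regressors; only x2 matters
set.seed(1)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y <- 1 + 0.5 * x2 + rnorm(n)

# unrestricted model and the model with the restrictions beta_1 = beta_3 = 0
unrestricted <- lm(y ~ x1 + x2 + x3)
restricted <- lm(y ~ x2)

SSR_u <- sum(residuals(unrestricted)^2)
SSR_r <- sum(residuals(restricted)^2)
q <- 2    # number of restrictions
k <- 3    # regressors in the unrestricted model

# homoskedasticity-only F-statistic
F_manual <- ((SSR_r - SSR_u) / q) / (SSR_u / (n - k - 1))

# anova() reports the same statistic for this model comparison
anova(restricted, unrestricted)
```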

It is fairly easy to conduct \(F\)-tests in `R`. We can use the function `linearHypothesis()` contained in the package `car`.

```
# attach the `car` package and estimate the multiple regression model
library(car)
model <- lm(score ~ size + english + expenditure, data = CASchools)

# execute the function on the model object and provide both linear restrictions
# to be tested as strings
linearHypothesis(model, c("size=0", "expenditure=0"))
#> Linear hypothesis test
#>
#> Hypothesis:
#> size = 0
#> expenditure = 0
#>
#> Model 1: restricted model
#> Model 2: score ~ size + english + expenditure
#>
#> Res.Df RSS Df Sum of Sq F Pr(>F)
#> 1 418 89000
#> 2 416 85700 2 3300.3 8.0101 0.000386 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The output reveals that the \(F\)-statistic for this joint hypothesis test is about \(8.01\) and the corresponding \(p\)-value is \(0.0004\). Thus, we can reject the null hypothesis that both coefficients are zero at any level of significance commonly used in practice.
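As a quick check, the reported \(p\)-value is the upper-tail probability of the \(F_{2,\,416}\) distribution evaluated at the test statistic, which base `R`'s `pf()` returns directly:

```
# p-value of the joint test: upper tail of the F(2, 416) distribution
pf(8.0101, df1 = 2, df2 = 416, lower.tail = FALSE)
```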

A heteroskedasticity-robust version of this \(F\)-test (which leads to the same conclusion) can be conducted as follows.

```
# heteroskedasticity-robust F-test
linearHypothesis(model, c("size=0", "expenditure=0"), white.adjust = "hc1")
#> Linear hypothesis test
#>
#> Hypothesis:
#> size = 0
#> expenditure = 0
#>
#> Model 1: restricted model
#> Model 2: score ~ size + english + expenditure
#>
#> Note: Coefficient covariance matrix supplied.
#>
#> Res.Df Df F Pr(>F)
#> 1 418
#> 2 416 2 5.4337 0.004682 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The standard output of a model summary also reports an \(F\)-statistic and the corresponding \(p\)-value. The null hypothesis belonging to this \(F\)-test is that *all* of the population coefficients in the model except for the intercept are zero, so the hypotheses are \[H_0: \beta_1=0, \ \beta_2 =0, \ \beta_3 =0 \quad \text{vs.} \quad H_1: \beta_j \neq 0 \ \text{for at least one} \ j=1,2,3.\]

This is also called the *overall regression \(F\)-statistic*. Its null hypothesis is clearly different from that of the previous test, where only \(\beta_1\) and \(\beta_3\) were restricted to zero.

We now check whether the \(F\)-statistic reported in the model's summary coincides with the result of `linearHypothesis()`.

```
# execute the function on the model object and provide the restrictions
# to be tested as a character vector
linearHypothesis(model, c("size=0", "english=0", "expenditure=0"))
#> Linear hypothesis test
#>
#> Hypothesis:
#> size = 0
#> english = 0
#> expenditure = 0
#>
#> Model 1: restricted model
#> Model 2: score ~ size + english + expenditure
#>
#> Res.Df RSS Df Sum of Sq F Pr(>F)
#> 1 419 152110
#> 2 416 85700 3 66410 107.45 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Access the overall F-statistic from the model's summary
summary(model)$fstatistic
#> value numdf dendf
#> 107.4547 3.0000 416.0000
```

The entry `value` is the overall \(F\)-statistic, and it equals the result of `linearHypothesis()`. The \(F\)-test rejects the null hypothesis that the model has no power in explaining test scores. It is important to know that the \(F\)-statistic reported by `summary()` is *not* robust to heteroskedasticity!
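For models with an intercept, the (homoskedasticity-only) overall \(F\)-statistic can also be recovered from \(R^2\) via \(F = \frac{R^2/k}{(1-R^2)/(n-k-1)}\). A small sketch on simulated data (again not the `CASchools` regression):

```
# recompute the overall F-statistic from R^2 on simulated data
set.seed(7)
n <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 2 + x1 + rnorm(n)

fit <- lm(y ~ x1 + x2)
k <- 2
R2 <- summary(fit)$r.squared

F_overall <- (R2 / k) / ((1 - R2) / (n - k - 1))
# agrees with the statistic reported by summary()
summary(fit)$fstatistic
```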