7.3 Joint Hypothesis Testing using the F-Statistic


The estimated model is

\[ \widehat{TestScore} = \underset{(15.21)}{649.58} -\underset{(0.48)}{0.29} \times STR - \underset{(0.04)}{0.66} \times english + \underset{(1.41)}{3.87} \times expenditure. \]

Now, can we reject the hypothesis that the coefficient on \(STR\) and the coefficient on \(expenditure\) are both zero? Answering this requires a joint hypothesis test. A joint hypothesis imposes restrictions on several regression coefficients at once, whereas an individual \(t\)-test imposes a restriction on a single coefficient. Chapter 7.2 of the book explains why testing hypotheses about the model coefficients one at a time is different from testing them jointly.

The homoskedasticity-only \(F\)-statistic is given by

\[ F = \frac{(SSR_{\text{restricted}} - SSR_{\text{unrestricted}})/q}{SSR_{\text{unrestricted}} / (n-k-1)} \]

where \(SSR_{\text{restricted}}\) is the sum of squared residuals from the restricted regression, i.e., the regression estimated with the restrictions of the null hypothesis imposed, \(SSR_{\text{unrestricted}}\) is the sum of squared residuals from the full model, \(q\) is the number of restrictions under the null, \(n\) is the number of observations, and \(k\) is the number of regressors in the unrestricted regression.
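To make the two regressions concrete for the hypothesis considered here, the following is a short sketch. It labels the coefficients on \(STR\), \(english\) and \(expenditure\) as \(\beta_1\), \(\beta_2\) and \(\beta_3\), which is a notational assumption at this point. The unrestricted regression is

\[ TestScore = \beta_0 + \beta_1 \times STR + \beta_2 \times english + \beta_3 \times expenditure + u, \]

and imposing \(H_0: \beta_1 = 0, \ \beta_3 = 0\) yields the restricted regression

\[ TestScore = \beta_0 + \beta_2 \times english + u. \]

Here \(q = 2\) restrictions are imposed and the unrestricted regression has \(k = 3\) regressors, so the denominator of the \(F\)-statistic uses \(n - 4\) degrees of freedom.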

It is fairly easy to conduct \(F\)-tests in R. We can use the function linearHypothesis() contained in the package car.

# load the car package, which provides linearHypothesis()
library(car)

# estimate the multiple regression model
model <- lm(score ~ STR + english + expenditure, data = CASchools)

# execute the function on the model object and provide both linear restrictions 
# to be tested as strings
linearHypothesis(model, c("STR=0", "expenditure=0"))
#> Linear hypothesis test
#> 
#> Hypothesis:
#> STR = 0
#> expenditure = 0
#> 
#> Model 1: restricted model
#> Model 2: score ~ STR + english + expenditure
#> 
#>   Res.Df   RSS Df Sum of Sq      F   Pr(>F)    
#> 1    418 89000                                 
#> 2    416 85700  2    3300.3 8.0101 0.000386 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The output reveals that the \(F\)-statistic for this joint hypothesis test is about \(8.01\) and the corresponding \(p\)-value is \(0.0004\). Thus, we can reject the null hypothesis that both coefficients are zero at any level of significance commonly used in practice.
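As a rough check, the reported statistic can be reproduced from the formula using only the rounded numbers printed in the output above. The sketch below assumes \(n = 420\), which is implied by the residual degrees of freedom of the unrestricted model (\(n - k - 1 = 416\) with \(k = 3\)). The helper name f_homoskedastic is hypothetical and not part of any package. Because the printed RSS values are rounded, the result agrees with the reported value only up to rounding.

```r
# homoskedasticity-only F-statistic computed from the two sums of squared residuals
# (f_homoskedastic is a hypothetical helper, not a function from car)
f_homoskedastic <- function(ssr_r, ssr_u, q, n, k) {
  ((ssr_r - ssr_u) / q) / (ssr_u / (n - k - 1))
}

# rounded values taken from the linearHypothesis() output
ssr_u <- 85700            # RSS of the unrestricted model
ssr_r <- 85700 + 3300.3   # restricted RSS = unrestricted RSS + "Sum of Sq"

# n = 420 is an assumption inferred from Res.Df = 416 and k = 3
F_stat <- f_homoskedastic(ssr_r, ssr_u, q = 2, n = 420, k = 3)

# p-value from the F distribution with q and n - k - 1 degrees of freedom
p_value <- pf(F_stat, df1 = 2, df2 = 416, lower.tail = FALSE)

round(F_stat, 2)
signif(p_value, 3)
```

Both numbers should come out close to the \(F\)-statistic and \(p\)-value shown in the output.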

A heteroskedasticity-robust version of this \(F\)-test (which leads to the same conclusion) can be conducted as follows:

# heteroskedasticity-robust F-test
linearHypothesis(model, c("STR=0", "expenditure=0"), white.adjust = "hc1")
#> Linear hypothesis test
#> 
#> Hypothesis:
#> STR = 0
#> expenditure = 0
#> 
#> Model 1: restricted model
#> Model 2: score ~ STR + english + expenditure
#> 
#> Note: Coefficient covariance matrix supplied.
#> 
#>   Res.Df Df      F   Pr(>F)   
#> 1    418                      
#> 2    416  2 5.4337 0.004682 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
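The note "Coefficient covariance matrix supplied" in the output hints at how the robust test works: the \(F\)-statistic is computed in Wald form from the coefficient estimates and a covariance matrix, and the robust version swaps in a heteroskedasticity-robust covariance estimate. The following base R sketch illustrates this idea on simulated data (not the CASchools data). The helpers wald_f() and hc1_vcov() are hypothetical names introduced here, and the HC1 formula used, the White estimator scaled by \(n/(n-k-1)\), is an assumption about what "hc1" denotes rather than a copy of the internals of car.

```r
# simulated data, purely illustrative (NOT the CASchools data)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
# the error standard deviation depends on x1, so the errors are heteroskedastic
y   <- 1 + 0.5 * x1 + rnorm(n, sd = 1 + abs(x1))
fit <- lm(y ~ x1 + x2)

# Wald-type F-statistic for H0: R %*% beta = r, given a covariance matrix V
# (hypothetical helper)
wald_f <- function(fit, R, r, V) {
  d <- R %*% coef(fit) - r
  as.numeric(t(d) %*% solve(R %*% V %*% t(R)) %*% d) / nrow(R)
}

# HC1 covariance estimate: sandwich form with n / (n - k - 1) scaling
# (hypothetical helper)
hc1_vcov <- function(fit) {
  X     <- model.matrix(fit)
  e     <- residuals(fit)
  bread <- solve(crossprod(X))
  meat  <- crossprod(X * e)   # equals X' diag(e^2) X
  nrow(X) / (nrow(X) - ncol(X)) * bread %*% meat %*% bread
}

# restrictions: coefficient on x1 = 0 and coefficient on x2 = 0
R <- rbind(c(0, 1, 0),
           c(0, 0, 1))
r <- c(0, 0)

# homoskedasticity-only version uses the usual covariance matrix
f_homo <- wald_f(fit, R, r, vcov(fit))

# robust version: same formula, robust covariance matrix
f_hc1 <- wald_f(fit, R, r, hc1_vcov(fit))
p_hc1 <- pf(f_hc1, df1 = nrow(R), df2 = df.residual(fit), lower.tail = FALSE)

c(homoskedastic = f_homo, robust = f_hc1, p_robust = p_hc1)
```

Since both slope coefficients are restricted here, f_homo should match the overall \(F\)-statistic in summary(fit), while f_hc1 generally differs from it when the errors are heteroskedastic.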

The standard output of a model summary also reports an \(F\)-statistic and the corresponding \(p\)-value. The null hypothesis belonging to this \(F\)-test is that all of the population coefficients in the model except for the intercept are zero, so the hypotheses are \[H_0: \beta_1=0, \ \beta_2 =0, \ \beta_3 =0 \quad \text{vs.} \quad H_1: \beta_j \neq 0 \ \text{for at least one} \ j=1,2,3.\]

This is also called the overall regression \(F\)-statistic. Its null hypothesis differs from the one tested above, where only \(\beta_1\) and \(\beta_3\) were restricted to zero.

We now check whether the overall \(F\)-statistic reported in the model’s summary coincides with the result of linearHypothesis().

# execute the function on the model object and provide the restrictions 
# to be tested as a character vector
linearHypothesis(model, c("STR=0", "english=0", "expenditure=0"))
#> Linear hypothesis test
#> 
#> Hypothesis:
#> STR = 0
#> english = 0
#> expenditure = 0
#> 
#> Model 1: restricted model
#> Model 2: score ~ STR + english + expenditure
#> 
#>   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
#> 1    419 152110                                  
#> 2    416  85700  3     66410 107.45 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Access the overall F-statistic from the model's summary
summary(model)$fstatistic
#>    value    numdf    dendf 
#> 107.4547   3.0000 416.0000

The entry value is the overall \(F\)-statistic and it equals the result of linearHypothesis(). The \(F\)-test rejects the null hypothesis that the model has no power in explaining test scores. It is important to know that the \(F\)-statistic reported by summary() is not robust to heteroskedasticity.
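The vector returned by summary(model)$fstatistic contains the statistic and its degrees of freedom but no \(p\)-value. A minimal sketch of how the \(p\)-value could be recovered with pf() follows. It uses the values printed above rather than the fitted model so that it runs on its own; the object names fstat and p_overall are chosen here for illustration.

```r
# values copied from the output of summary(model)$fstatistic
fstat <- c(value = 107.4547, numdf = 3, dendf = 416)

# upper-tail probability of the F distribution with numdf and dendf degrees of freedom
p_overall <- pf(fstat[["value"]],
                df1 = fstat[["numdf"]],
                df2 = fstat[["dendf"]],
                lower.tail = FALSE)

# compare with the bound printed by linearHypothesis()
p_overall < 2.2e-16
```

With a fitted model at hand, fstat could be replaced by summary(model)$fstatistic.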