11.4 Application to the Boston HMDA Data


Models (11.6) and (11.7) indicate that denial rates are higher for African American applicants, holding constant the payment-to-income ratio. Both results could be subject to omitted variable bias. To obtain a more trustworthy estimate of the effect of being black on the probability of a mortgage application denial, we estimate a linear probability model as well as several Logit and Probit models. In doing so we control for financial variables and additional applicant characteristics which are likely to influence the probability of denial and to differ between black and white applicants.

Sample averages as shown in Table 11.1 of the book can be easily reproduced using the functions mean() (as usual for numeric variables) and prop.table() (for factor variables).

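# load the data (assumption: HMDA from the AER package, with deny recoded
# to a 0/1 variable so that the output below can be reproduced)
library(AER)
data(HMDA)
HMDA$deny <- as.numeric(HMDA$deny) - 1

# mean P/I ratio
mean(HMDA$pirat)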
#> [1] 0.3308136

# housing expense-to-income ratio
mean(HMDA$hirat)
#> [1] 0.2553461

# loan-to-value ratio
mean(HMDA$lvrat)
#> [1] 0.7377759

# consumer credit score
mean(as.numeric(HMDA$chist))
#> [1] 2.116387

# mortgage credit score
mean(as.numeric(HMDA$mhist))
#> [1] 1.721008

# public bad credit record
mean(as.numeric(HMDA$phist)-1)
#> [1] 0.07352941

# denied mortgage insurance
prop.table(table(HMDA$insurance))
#> 
#>         no        yes 
#> 0.97983193 0.02016807

# self-employed
prop.table(table(HMDA$selfemp))
#> 
#>        no       yes 
#> 0.8836134 0.1163866

# single
prop.table(table(HMDA$single))
#> 
#>        no       yes 
#> 0.6067227 0.3932773

# high school diploma
prop.table(table(HMDA$hschool))
#> 
#>         no        yes 
#> 0.01638655 0.98361345

# unemployment rate
mean(HMDA$unemp)
#> [1] 3.774496

# condominium
prop.table(table(HMDA$condomin))
#> 
#>        no       yes 
#> 0.7117647 0.2882353

# black
prop.table(table(HMDA$black))
#> 
#>       no      yes 
#> 0.857563 0.142437

# deny
prop.table(table(HMDA$deny))
#> 
#>         0         1 
#> 0.8802521 0.1197479

See Chapter 11.4 of the book or use R’s help function for more on variables contained in the HMDA dataset.

Before estimating the models we transform the loan-to-value ratio (lvrat) into a factor variable, where

\[\begin{align*} lvrat = \begin{cases} \text{low} & \text{if} \ \ lvrat < 0.8, \\ \text{medium} & \text{if} \ \ 0.8 \leq lvrat \leq 0.95, \\ \text{high} & \text{if} \ \ lvrat > 0.95 \end{cases} \end{align*}\]

and convert both credit scores to numeric variables.

# define low, medium and high loan-to-value ratio
HMDA$lvrat <- factor(
  ifelse(HMDA$lvrat < 0.8, "low",
  ifelse(HMDA$lvrat >= 0.8 & HMDA$lvrat <= 0.95, "medium", "high")),
  levels = c("low", "medium", "high"))

# convert credit scores to numeric
HMDA$mhist <- as.numeric(HMDA$mhist)
HMDA$chist <- as.numeric(HMDA$chist)
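
The nested ifelse() call can be checked on a few made-up values before it is applied to the data. The following is only a small sketch: the vector lv is hypothetical and chosen so that both cut-off values, 0.8 and 0.95, are included.

```r
# hypothetical loan-to-value ratios, including both cut-off values
lv <- c(0.50, 0.80, 0.95, 0.96)

# same recoding rule as used for HMDA$lvrat
lv_cat <- factor(
  ifelse(lv < 0.8, "low",
  ifelse(lv >= 0.8 & lv <= 0.95, "medium", "high")),
  levels = c("low", "medium", "high"))

# both cut-off values should end up in the "medium" category
data.frame(lv, lv_cat)

# "low" is the first level and thus the reference category in a regression
levels(lv_cat)
```

Because "low" is the first factor level, the regression output reports coefficients for the medium and high categories relative to a low loan-to-value ratio.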

Next we reproduce the estimation results presented in Table 11.2 of the book.

# estimate all 6 models for the denial probability
lpm_HMDA <- lm(deny ~ black + pirat + hirat + lvrat + chist + mhist + phist 
               + insurance + selfemp, data = HMDA)

logit_HMDA <- glm(deny ~ black + pirat + hirat + lvrat + chist + mhist + phist 
                  + insurance + selfemp, 
                  family = binomial(link = "logit"), 
                  data = HMDA)

probit_HMDA1 <- glm(deny ~ black + pirat + hirat + lvrat + chist + mhist + phist 
                     + insurance + selfemp, 
                     family = binomial(link = "probit"), 
                     data = HMDA)

probit_HMDA2 <- glm(deny ~ black + pirat + hirat + lvrat + chist + mhist + phist 
                     + insurance + selfemp + single + hschool + unemp, 
                     family = binomial(link = "probit"), 
                     data = HMDA)

probit_HMDA3 <- glm(deny ~ black + pirat + hirat + lvrat + chist + mhist 
                     + phist + insurance + selfemp + single + hschool + unemp
                     + condomin + I(mhist == 3) + I(mhist == 4) + I(chist == 3) 
                     + I(chist == 4) + I(chist == 5) + I(chist == 6), 
                     family = binomial(link = "probit"), 
                     data = HMDA)

probit_HMDA4 <- glm(deny ~ black * (pirat + hirat) + lvrat + chist + mhist + phist 
                     + insurance + selfemp + single + hschool + unemp, 
                     family = binomial(link = "probit"), 
                     data = HMDA)

Just as in previous chapters, we store heteroskedasticity-robust standard errors of the coefficient estimators in a list which is then used as the argument se in stargazer().

rob_se <- list(sqrt(diag(vcovHC(lpm_HMDA, type = "HC1"))),
               sqrt(diag(vcovHC(logit_HMDA, type = "HC1"))),
               sqrt(diag(vcovHC(probit_HMDA1, type = "HC1"))),
               sqrt(diag(vcovHC(probit_HMDA2, type = "HC1"))),
               sqrt(diag(vcovHC(probit_HMDA3, type = "HC1"))),
               sqrt(diag(vcovHC(probit_HMDA4, type = "HC1"))))
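
To illustrate what type = "HC1" computes, the following sketch rebuilds the HC1 covariance matrix by hand for a small linear regression and compares it with vcovHC(). It uses the built-in mtcars data rather than the HMDA data, and the model mpg ~ wt + hp is an arbitrary choice made only for illustration.

```r
library(sandwich)  # provides vcovHC()

# toy regression on a built-in dataset (not the HMDA data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)   # n x k regressor matrix
e <- residuals(fit)      # OLS residuals
n <- nrow(X)
k <- ncol(X)

# sandwich formula with the HC1 degrees-of-freedom correction n / (n - k)
bread <- solve(crossprod(X))   # (X'X)^{-1}
meat  <- crossprod(X * e)      # X' diag(e^2) X
V_hc1 <- n / (n - k) * bread %*% meat %*% bread

se_manual  <- sqrt(diag(V_hc1))
se_package <- sqrt(diag(vcovHC(fit, type = "HC1")))

# the two sets of standard errors should agree up to numerical precision
all.equal(se_manual, se_package, check.attributes = FALSE)
```

The hand computation applies to the linear model only. For the Logit and Probit models vcovHC() uses the score contributions of the respective likelihood, so the formula above is not a substitute for the function call there.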

library(stargazer)  # provides stargazer()

stargazer(lpm_HMDA, logit_HMDA, probit_HMDA1, 
          probit_HMDA2, probit_HMDA3, probit_HMDA4,  
          digits = 3,
          type = "latex", 
          header = FALSE,
          se = rob_se,
          model.numbers = FALSE,
          column.labels = c("(1)", "(2)", "(3)", "(4)", "(5)", "(6)"))


	Dependent Variable: Mortgage Application Denial

	deny
	OLS	logistic	probit	probit	probit	probit
	(1)	(2)	(3)	(4)	(5)	(6)

blackyes	0.084^***	0.688^***	0.389^***	0.371^***	0.363^***	0.246
	(0.023)	(0.183)	(0.099)	(0.100)	(0.101)	(0.479)

pirat	0.449^***	4.764^***	2.442^***	2.464^***	2.622^***	2.572^***
	(0.114)	(1.332)	(0.673)	(0.654)	(0.665)	(0.728)

hirat	-0.048	-0.109	-0.185	-0.302	-0.502	-0.538
	(0.110)	(1.298)	(0.689)	(0.689)	(0.715)	(0.755)

lvratmedium	0.031^**	0.464^***	0.214^***	0.216^***	0.215^**	0.216^***
	(0.013)	(0.160)	(0.082)	(0.082)	(0.084)	(0.083)

lvrathigh	0.189^***	1.495^***	0.791^***	0.795^***	0.836^***	0.788^***
	(0.050)	(0.325)	(0.183)	(0.184)	(0.185)	(0.185)

chist	0.031^***	0.290^***	0.155^***	0.158^***	0.344^***	0.158^***
	(0.005)	(0.039)	(0.021)	(0.021)	(0.108)	(0.021)

mhist	0.021^*	0.279^**	0.148^**	0.110	0.162	0.111
	(0.011)	(0.138)	(0.073)	(0.076)	(0.104)	(0.077)

phistyes	0.197^***	1.226^***	0.697^***	0.702^***	0.717^***	0.705^***
	(0.035)	(0.203)	(0.114)	(0.115)	(0.116)	(0.115)

insuranceyes	0.702^***	4.548^***	2.557^***	2.585^***	2.589^***	2.590^***
	(0.045)	(0.576)	(0.305)	(0.299)	(0.306)	(0.299)

selfempyes	0.060^***	0.666^***	0.359^***	0.346^***	0.342^***	0.348^***
	(0.021)	(0.214)	(0.113)	(0.116)	(0.116)	(0.116)

singleyes				0.229^***	0.230^***	0.226^***
				(0.080)	(0.086)	(0.081)

hschoolyes				-0.613^***	-0.604^**	-0.620^***
				(0.229)	(0.237)	(0.229)

unemp				0.030^*	0.028	0.030
				(0.018)	(0.018)	(0.018)

condominyes					-0.055
					(0.096)

I(mhist == 3)					-0.107
					(0.301)

I(mhist == 4)					-0.383
					(0.427)

I(chist == 3)					-0.226
					(0.248)

I(chist == 4)					-0.251
					(0.338)

I(chist == 5)					-0.789^*
					(0.412)

I(chist == 6)					-0.905^*
					(0.515)

blackyes:pirat						-0.579
						(1.550)

blackyes:hirat						1.232
						(1.709)

Constant	-0.183^***	-5.707^***	-3.041^***	-2.575^***	-2.896^***	-2.543^***
	(0.028)	(0.484)	(0.250)	(0.350)	(0.404)	(0.370)


Observations	2,380	2,380	2,380	2,380	2,380	2,380
R²	0.266
Adjusted R²	0.263
Log Likelihood		-635.637	-636.847	-628.614	-625.064	-628.332
Akaike Inf. Crit.		1,293.273	1,295.694	1,285.227	1,292.129	1,288.664
Residual Std. Error	0.279 (df = 2369)
F Statistic	85.974^*** (df = 10; 2369)

Note:	^*p<0.1; ^**p<0.05; ^***p<0.01

Table 11.1: HMDA Data: LPM, Probit and Logit Models

In Table 11.1, models (1), (2) and (3) are baseline specifications that include several financial control variables. They differ only in the way they model the denial probability. Model (1) is a linear probability model, model (2) is a Logit regression and model (3) uses the Probit approach.

In the linear model (1), the coefficients have a direct interpretation. For example, an increase in the consumer credit score by \(1\) unit is estimated to increase the probability of a loan denial by about \(0.031\), that is, by \(3.1\) percentage points. Having a high loan-to-value ratio is detrimental to credit approval: the coefficient for a loan-to-value ratio higher than \(0.95\) is \(0.189\), so clients with this property are estimated to face a denial probability that is almost \(19\) percentage points higher than for those with a low loan-to-value ratio, ceteris paribus. The estimated coefficient on the race dummy is \(0.084\), which indicates that the denial probability for African Americans is \(8.4\) percentage points higher than for white applicants with the same characteristics except for race. Apart from the housing expense-to-income ratio and the mortgage credit score, all coefficients are significant at the \(5\%\) level.
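
The statements about magnitude and significance in model (1) can be traced back to the table with a few lines of arithmetic. The sketch below copies rounded estimates and robust standard errors for selected regressors from column (1) and uses the normal approximation for the p-values. Since the inputs are rounded, the results are approximate.

```r
# rounded estimates and robust standard errors of model (1), copied from the table
est <- c(blackyes = 0.084, hirat = -0.048, lvrathigh = 0.189,
         chist = 0.031, mhist = 0.021)
se  <- c(blackyes = 0.023, hirat = 0.110, lvrathigh = 0.050,
         chist = 0.005, mhist = 0.011)

# LPM coefficients are changes in probability; times 100 gives percentage points
pp <- 100 * est

# t-statistics and two-sided p-values based on the normal approximation
t_stat <- est / se
p_val  <- 2 * pnorm(-abs(t_stat))

round(cbind(pp, t_stat, p_val), 3)
```

With these inputs the p-value for mhist lies between \(0.05\) and \(0.10\), which matches the single star in the table, while the p-value for hirat is far above \(0.10\).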

Models (2) and (3) provide similar evidence that there is racial discrimination in the U.S. mortgage market. All coefficients except those on the housing expense-to-income ratio (which is not significantly different from zero) and the mortgage credit score (which is significant at the \(5\%\) level) are significant at the \(1\%\) level. As discussed above, the nonlinearity makes the interpretation of the coefficient estimates more difficult than for model (1). In order to make a statement about the effect of being black, we need to compute the estimated denial probability for two individuals that differ only in race. For the comparison we consider two individuals that share mean values for all numeric regressors. For the qualitative variables we assign the property that is most representative for the data at hand. For example, consider self-employment: we have seen that about \(88\%\) of all individuals in the sample are not self-employed, so we set selfemp = no. Using this approach, the estimate for the effect on the denial probability of being African American of the Logit model (2) is about \(4\) percentage points. The next code chunk shows how to apply this approach for models (1) to (6) using R.

# compute regressor values for two average applicants who differ only in race
new <- data.frame(
  "pirat" = mean(HMDA$pirat),
  "hirat" = mean(HMDA$hirat),
  "lvrat" = "low",
  "chist" = mean(HMDA$chist),
  "mhist" = mean(HMDA$mhist),
  "phist" = "no",
  "insurance" = "no",
  "selfemp" = "no",
  "black" = c("no", "yes"),
  "single" = "no",
  "hschool" = "yes",
  "unemp" = mean(HMDA$unemp),
  "condomin" = "no")

# difference predicted by the LPM
predictions <- predict(lpm_HMDA, newdata = new)
diff(predictions)
#>          2 
#> 0.08369674

# difference predicted by the logit model
predictions <- predict(logit_HMDA, newdata = new, type = "response")
diff(predictions)
#>          2 
#> 0.04042135

# difference predicted by probit model (3)
predictions <- predict(probit_HMDA1, newdata = new, type = "response")
diff(predictions)
#>          2 
#> 0.05049716

# difference predicted by probit model (4)
predictions <- predict(probit_HMDA2, newdata = new, type = "response")
diff(predictions)
#>          2 
#> 0.03978918

# difference predicted by probit model (5)
predictions <- predict(probit_HMDA3, newdata = new, type = "response")
diff(predictions)
#>          2 
#> 0.04972468

# difference predicted by probit model (6)
predictions <- predict(probit_HMDA4, newdata = new, type = "response")
diff(predictions)
#>          2 
#> 0.03955893
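
To see what predict() with type = "response" does for the Logit model (2), the difference can be reproduced approximately by hand. The sketch below uses the rounded coefficients from the table and the sample means reported at the beginning of this section. The object names are chosen for illustration only, and because of rounding the result only approximately matches the value of about \(0.040\) obtained above.

```r
# rounded Logit coefficients of model (2), copied from the table
b <- c(const = -5.707, black = 0.688, pirat = 4.764, hirat = -0.109,
       chist = 0.290, mhist = 0.279)

# sample means reported at the beginning of this section
x_mean <- c(pirat = 0.3308136, hirat = 0.2553461,
            chist = 2.116387, mhist = 1.721008)

# linear index for a white applicant; the remaining dummies are at their
# reference level (lvrat = low, phist = no, insurance = no, selfemp = no)
index_white <- b[["const"]] + sum(b[names(x_mean)] * x_mean)
index_black <- index_white + b[["black"]]

# the logistic CDF maps the index into a probability
p_white <- plogis(index_white)
p_black <- plogis(index_black)

# estimated difference in denial probabilities
p_black - p_white
```

For the Probit models the same steps apply with pnorm() in place of plogis().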

The estimates of the impact on the denial probability of being black are similar for models (2) and (3). It is interesting that the magnitude of the estimated effects is much smaller than for Probit and Logit models that do not control for financial characteristics (see section 11.2). This indicates that these simple models produce biased estimates due to omitted variables.

Regressions (4) to (6) use regression specifications that include different applicant characteristics and credit rating indicator variables as well as interactions. However, most of the corresponding coefficients are not significant and the estimates of the coefficient on black obtained for these models as well as the estimated difference in denial probabilities do not differ much from those obtained for the similar specifications (2) and (3).

An interesting question related to racial discrimination can be investigated using the Probit model (6) where the interactions blackyes:pirat and blackyes:hirat are added to model (4). If the coefficient on blackyes:pirat were different from zero, the effect of the payment-to-income ratio on the denial probability would be different for black and white applicants. Similarly, a non-zero coefficient on blackyes:hirat would indicate that loan officers weight the risk associated with a high housing expense-to-income ratio differently for black and white mortgage applicants. We can test whether these coefficients are jointly significant at the \(5\%\) level using an \(F\)-Test.

linearHypothesis(probit_HMDA4,
                 test = "F",
                 c("blackyes:pirat=0", "blackyes:hirat=0"),
                 vcov = vcovHC, type = "HC1")
#> Linear hypothesis test
#> 
#> Hypothesis:
#> blackyes:pirat = 0
#> blackyes:hirat = 0
#> 
#> Model 1: restricted model
#> Model 2: deny ~ black * (pirat + hirat) + lvrat + chist + mhist + phist + 
#>     insurance + selfemp + single + hschool + unemp
#> 
#> Note: Coefficient covariance matrix supplied.
#> 
#>   Res.Df Df      F Pr(>F)
#> 1   2366                 
#> 2   2364  2 0.2473 0.7809

Since \(p\text{-value} \approx 0.78\) for this test, the null cannot be rejected. Nonetheless, we can reject the hypothesis that there is no racial discrimination at all since the corresponding \(F\)-test has a \(p\text{-value}\) of about \(0.003\).

linearHypothesis(probit_HMDA4,
                 test = "F",
                 c("blackyes=0", "blackyes:pirat=0", "blackyes:hirat=0"),
                 vcov = vcovHC, type = "HC1")
#> Linear hypothesis test
#> 
#> Hypothesis:
#> blackyes = 0
#> blackyes:pirat = 0
#> blackyes:hirat = 0
#> 
#> Model 1: restricted model
#> Model 2: deny ~ black * (pirat + hirat) + lvrat + chist + mhist + phist + 
#>     insurance + selfemp + single + hschool + unemp
#> 
#> Note: Coefficient covariance matrix supplied.
#> 
#>   Res.Df Df      F   Pr(>F)   
#> 1   2367                      
#> 2   2364  3 4.7774 0.002534 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
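
The statistic reported by linearHypothesis() with test = "F" is a Wald statistic divided by the number of restrictions. The following sketch computes it by hand for a Probit model and compares it with the function's result. It runs on simulated data (not the HMDA data), all variable names are hypothetical, and the robust covariance matrix is passed as an object so that the HC1 choice is explicit.

```r
library(sandwich)  # vcovHC()
library(car)       # linearHypothesis()

# simulated data (hypothetical, not the HMDA data)
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = pnorm(-0.5 + 0.8 * x1))

fit <- glm(y ~ x1 + x2 + x3, family = binomial(link = "probit"))

# robust covariance matrix, computed once and passed explicitly
V <- vcovHC(fit, type = "HC1")

# Wald statistic for H0: the coefficients on x2 and x3 are both zero
idx <- c("x2", "x3")
b   <- coef(fit)[idx]
q   <- length(idx)
F_manual <- as.numeric(t(b) %*% solve(V[idx, idx]) %*% b) / q
p_manual <- pf(F_manual, q, df.residual(fit), lower.tail = FALSE)

# the same test with linearHypothesis()
lh <- linearHypothesis(fit, c("x2 = 0", "x3 = 0"), test = "F", vcov. = V)

c(F_manual = F_manual, F_car = lh$F[2],
  p_manual = p_manual, p_car = lh$`Pr(>F)`[2])
```

The hand computation makes clear that only the estimated coefficients under test and the corresponding block of the covariance matrix enter the statistic.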

Summary

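Models (1) to (6) provide evidence that there is an effect of being African American on the probability of a mortgage application denial: in all specifications, the effect is estimated to be positive (about \(4\) to \(5\) percentage points in the Logit and Probit models and \(8.4\) percentage points in the linear probability model). The coefficient on black is significantly different from zero at the \(1\%\) level in models (1) to (5), and in model (6) the hypothesis that the coefficients on black and its interactions are jointly zero is rejected at the \(1\%\) level. While the linear probability model seems to overestimate this effect relative to the nonlinear models, it can still be used as an approximation to an intrinsically nonlinear relationship.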

See Chapters 11.4 and 11.5 of the book for a discussion of external and internal validity of this study and some concluding remarks on regression models where the dependent variable is binary.