## 3.8 Exercises

#### 1. Biased …

Consider the following alternative estimator for \(\mu_Y\), the mean of the \(Y_i\):

\[\widetilde{Y}=\frac{1}{n-1}\sum\limits_{i=1}^n Y_i\]

In this exercise we will illustrate that this estimator is a biased estimator for \(\mu_Y\).
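As a quick check of what the simulation should show, assume the \(Y_i\) share the common mean \(\mu_Y\). Linearity of the expectation then gives

\[E(\widetilde{Y})=\frac{1}{n-1}\sum\limits_{i=1}^n E(Y_i)=\frac{n}{n-1}\,\mu_Y\]

so for \(n=5\) and \(\mu_Y=10\) the estimates should be centered around \(12.5\) rather than \(10\).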

**Instructions:**

- Define a function `Y_tilde` that implements the estimator above.
- Randomly draw 5 observations from the \(\mathcal{N}(10, 25)\) distribution and compute an estimate using `Y_tilde()`. Repeat this procedure 10000 times and store the results in `est_biased`.
- Plot a histogram of `est_biased`.
- Add a red vertical line at \(\mu=10\) using the function `abline()`.

**Hints:**

- To compute the sum of a vector you can use `sum()`; to get the length of a vector you can use `length()`.
- Use the function `replicate()` to compute estimates of repeatedly drawn random samples. With the arguments `expr` and `n` you can specify the operation and how often it is to be replicated.
- A histogram can be plotted with the function `hist()`.
- The position on the x-axis as well as the color of the vertical line can be specified via the arguments `v` and `col`.
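The following sketch shows how the hinted functions fit together. It deliberately uses the ordinary sample mean `mean()` instead of `Y_tilde()`, so it is not the solution to this exercise. The name `est_mean` and the seed are illustrative assumptions. Note that `rnorm()` is parameterized by the standard deviation, so assuming \(\mathcal{N}(10, 25)\) denotes mean 10 and variance 25, the sketch passes `sd = 5`.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# draw 5 observations and compute their sample mean, repeated 10000 times
est_mean <- replicate(n = 10000, expr = mean(rnorm(5, mean = 10, sd = 5)))

# histogram of the estimates with a red vertical line at the true mean
hist(est_mean)
abline(v = 10, col = "red")
```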

#### 2. … but consistent estimator

Consider again the estimator from the previous exercise. It is available in your environment as the function `Y_tilde()`. Repeat the procedure from the previous exercise, but this time increase the number of observations drawn from 5 to 1000.

What do you notice? What can you say about this estimator?

**Instructions:**

- Randomly draw 1000 observations from the \(\mathcal{N}(10, 25)\) distribution and compute an estimate of the mean using `Y_tilde()`. Repeat this procedure 10000 times and store the results in `est_consistent`.
- Plot a histogram of `est_consistent`.
- Add a red vertical line at \(\mu=10\) using the function `abline()`.

**Hints:**

- Use the function `replicate()` to compute estimates of repeatedly drawn random samples. Using the arguments `expr` and `n` you may specify the operation and how often it will be replicated.
- A histogram can be plotted with the function `hist()`.
- The position on the x-axis as well as the color of the vertical line can be specified via the arguments `v` and `col`.
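For comparison with the histogram, the bias of the estimator can be worked out directly. Assuming the \(Y_i\) share the common mean \(\mu_Y\), we have \(E(\widetilde{Y})=\frac{n}{n-1}\mu_Y\) and therefore

\[E(\widetilde{Y})-\mu_Y=\frac{\mu_Y}{n-1}\]

which is about \(0.01\) for \(n=1000\) and \(\mu_Y=10\), and vanishes as \(n\) grows.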

#### 3. Efficiency of an Estimator

In this exercise we want to illustrate the result that the sample mean

\[\hat{\mu}_Y=\sum\limits_{i=1}^{n}a_iY_i\] with the equal weighting scheme \(a_i=\frac{1}{n}\) for \(i=1,...,n\) is the best linear unbiased estimator (BLUE) of \(\mu_Y\).

As an alternative, consider the estimator

\[\tilde{\mu}_Y=\sum\limits_{i=1}^{n}b_iY_i\]

where \(b_i\) gives the first \(\frac{n}{2}\) observations a higher weighting than the second \(\frac{n}{2}\) observations (we assume that \(n\) is even for simplicity).

The vector of weights `w` has been defined already and is available in your working environment.

**Instructions:**

- Verify that \(\tilde{\mu}_Y\) is an unbiased estimator of \(\mu_Y\), the mean of the \(Y_i\).
- Implement the alternative estimator of \(\mu_Y\) as a function `mu_tilde()`.
- Randomly draw 100 observations from the \(\mathcal{N}(5, 100)\) distribution and compute estimates with both estimators. Repeat this procedure 10000 times and store the results in `est_bar` and `est_tilde`.
- Compute the sample variances of `est_bar` and `est_tilde`. What can you say about both estimators?

**Hints:**

- In order for \(\tilde{\mu}_Y\) to be an unbiased estimator, all weights have to sum up to 1.
- Use the function `replicate()` to compute estimates of repeatedly drawn samples. With the arguments `expr` and `n` you can specify the operation and how often it is replicated.
- You may use `var()` to compute the sample variance.
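To see where the first hint comes from, and what to look for in the variances, assume the \(Y_i\) are i.i.d. with mean \(\mu_Y\) and variance \(\sigma_Y^2\). Then

\[E(\tilde{\mu}_Y)=\sum\limits_{i=1}^{n}b_iE(Y_i)=\mu_Y\sum\limits_{i=1}^{n}b_i, \qquad \text{Var}(\tilde{\mu}_Y)=\sigma_Y^2\sum\limits_{i=1}^{n}b_i^2\]

so \(\tilde{\mu}_Y\) is unbiased exactly when \(\sum_{i=1}^{n}b_i=1\). Comparing \(\sum_{i=1}^{n}b_i^2\) with \(\sum_{i=1}^{n}a_i^2=\frac{1}{n}\) indicates which of the two estimators has the smaller variance.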

#### 4. Hypothesis Test — \(t\)-statistic

Consider the CPS dataset from Chapter 3.6 again. The dataset `cps` is available in your working environment.

We suppose that the average hourly earnings (in prices of 2012) `ahe12` exceed 23.50 \(\$/h\) and wish to test this hypothesis at a significance level of \(\alpha=0.05\). Please do the following:

**Instructions:**

- Compute the test statistic by hand and assign it to `tstat`.
- Use `tstat` to accept or reject the null hypothesis. Please do so using the normal approximation.

**Hints:**

- We test \(H_0:\mu_{Y_{ahe}}\leq 23.5\) vs. \(H_1:\mu_{Y_{ahe}}>23.5\). That is, we conduct a right-sided test.
- The \(t\)-statistic is defined as \(\frac{\bar{Y}-\mu_{Y,0}}{s_{Y}/\sqrt{n}}\) where \(s_Y\) denotes the sample standard deviation.
- To decide whether the null hypothesis is accepted or rejected you can compare the \(t\)-statistic with the respective quantile of the standard normal distribution. Use logical operators.
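The mechanics of the hints can be sketched on simulated data. Everything here is an illustrative assumption: the vector `y_sim` stands in for a sample (it is not the `cps` data), and the names `mu_0`, `t_demo` and `crit` are hypothetical.

```r
set.seed(123)                               # arbitrary seed
y_sim <- rnorm(200, mean = 24, sd = 10)     # simulated stand-in sample
mu_0  <- 23.5                               # hypothesized mean under H_0

# t-statistic: (sample mean - mu_0) / (sample sd / sqrt(n))
t_demo <- (mean(y_sim) - mu_0) / (sd(y_sim) / sqrt(length(y_sim)))

# critical value of a right-sided test at the 5% level (normal approximation)
crit <- qnorm(0.95)

# TRUE means the null hypothesis is rejected
t_demo > crit
```

For a right-sided test the rejection region lies in the upper tail, which is why the \(0.95\) quantile is used rather than the \(0.975\) quantile of a two-sided test.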

#### 5. Hypothesis Test — \(p\)-value

Reconsider the test situation from the previous exercise. The dataset `cps` as well as the vector `tstat` are available in your working environment.

Instead of using the \(t\)-statistic as decision criterion you may also use the \(p\)-value. Now please do the following:

**Instructions:**

- Compute the \(p\)-value by hand and assign it to `pval`.
- Use `pval` to accept or reject the null hypothesis.

**Hints:**

- The \(p\)-value for a right-sided test can be computed as \(p=P(t>t^{act}|H_0)\).
- We reject the null if \(p<\alpha\). Use logical operators to check for this.
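As a minimal sketch of the first hint under the normal approximation, the upper-tail probability can be obtained with `pnorm()`. The value of `t_demo` below is made up for illustration and is not the statistic from the `cps` data.

```r
t_demo <- 2.1   # hypothetical value of a t-statistic

# right-sided p-value: P(Z > t_demo) for a standard normal Z
p_demo <- pnorm(t_demo, lower.tail = FALSE)

# TRUE means the null hypothesis is rejected at the 5% level
p_demo < 0.05
```

Using `lower.tail = FALSE` is equivalent to `1 - pnorm(t_demo)` but avoids the subtraction.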

#### 6. Hypothesis Test — One Sample \(t\)-test

In the last two exercises we discussed two ways of conducting a hypothesis test. These approaches are somewhat cumbersome to apply by hand, which is why `R` provides the function `t.test()`. It does most of the work automatically. `t.test()` provides \(t\)-statistics, \(p\)-values and even confidence intervals (more on the latter in later exercises). Note that `t.test()` uses the \(t\)-distribution instead of the normal distribution, which becomes important when the sample size is small.

The dataset `cps` and the variable `pval` from the previous exercise are available in your working environment.

**Instructions:**

- Conduct the hypothesis test from the previous exercises using the function `t.test()`.
- Extract the \(t\)-statistic and the \(p\)-value from the list created by `t.test()`. Assign them to the variables `tstat` and `pvalue`.
- Verify that using the normal approximation here is valid as well by computing the difference between both \(p\)-values.

**Hints:**

- The type of the test as well as the null hypothesis can be specified via the arguments `alternative` and `mu`.
- The \(t\)-statistic and the \(p\)-value can be obtained via `$statistic` and `$p.value`, respectively.
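A sketch of these calls on simulated data might look as follows. The vector `y_sim` and the names `res`, `t_demo`, `p_t` and `p_norm` are illustrative assumptions and do not refer to the `cps` data.

```r
set.seed(42)                              # arbitrary seed
y_sim <- rnorm(50, mean = 24, sd = 10)    # simulated stand-in sample

# right-sided one-sample t-test of H_0: mu <= 23.5
res <- t.test(y_sim, alternative = "greater", mu = 23.5)

t_demo <- unname(res$statistic)           # drop the name "t" for plain arithmetic
p_t    <- res$p.value                     # p-value based on the t-distribution

# p-value based on the normal approximation, for comparison
p_norm <- pnorm(t_demo, lower.tail = FALSE)
p_t - p_norm
```

The object returned by `t.test()` is a list of class `"htest"`, so `names(res)` shows which other components can be extracted.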

#### 7. Hypothesis Test — Two Sample \(t\)-test

Consider the annual maximum sea levels at Port Pirie (South Australia) and Fremantle (Western Australia) for the last 30 years.

The observations are made available as vectors `portpirie` and `fremantle` in your working environment.

**Instructions:**

- Test whether there is a significant difference in the annual maximum sea levels at a significance level of \(\alpha=0.05\).

**Hints:**

- We test \(H_0:\mu_{P}-\mu_{F}=0\) vs. \(H_1:\mu_{P}-\mu_{F}\ne 0\). That is, we conduct a two sample \(t\)-test.
- For a two sample \(t\)-test the function `t.test()` expects two vectors containing the data.
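The call pattern can be sketched with two simulated vectors. The names `a_sim`, `b_sim` and `res` and all parameter values are made up for illustration; they are not the sea level data.

```r
set.seed(7)                                # arbitrary seed
a_sim <- rnorm(30, mean = 4.0, sd = 0.3)   # simulated stand-in for sample 1
b_sim <- rnorm(30, mean = 1.5, sd = 0.2)   # simulated stand-in for sample 2

# two-sided two-sample t-test of H_0: difference in means equals 0
res <- t.test(a_sim, b_sim)

res$method           # describes which variant of the test was run
res$p.value < 0.05   # TRUE means H_0 is rejected at the 5% level
```

By default `t.test()` does not assume equal variances in both groups (`var.equal = FALSE`), which corresponds to the Welch variant of the two-sample test.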

#### 8. Confidence Interval

Reconsider the test situation concerning the annual maximum sea levels at Port Pirie and Fremantle.

The variables `portpirie` and `fremantle` are again available in your working environment.

**Instructions:**

- Construct a \(95\%\)-confidence interval for the difference in the sea levels using `t.test()`.

**Hint:**

- The function `t.test()` computes a \(95\%\) confidence interval by default. This is accessible via `$conf.int`.
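As a hedged illustration of the hint, the interval can be pulled out of the test object as shown below. The vectors `a_sim` and `b_sim` are simulated stand-ins with made-up parameters, not the sea level data.

```r
set.seed(7)                                # arbitrary seed
a_sim <- rnorm(30, mean = 4.0, sd = 0.3)   # simulated stand-in for sample 1
b_sim <- rnorm(30, mean = 1.5, sd = 0.2)   # simulated stand-in for sample 2

# confidence interval for the difference in means (lower and upper bound)
ci <- t.test(a_sim, b_sim)$conf.int
ci

# the confidence level is stored as an attribute of the interval
attr(ci, "conf.level")
```

A different level can be requested through the `conf.level` argument of `t.test()`. For the two-sided test, the null hypothesis of a zero difference is rejected at the 5% level exactly when the 95% interval does not contain 0.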

#### 9. (Co)variance and Correlation I

Consider a random sample \((X_i, Y_i)\) for \(i=1,...,100\).

The respective vectors `X` and `Y` are available in your working environment.

**Instructions:**

- Compute the variance of \(X\) using the function `cov()`.
- Compute the covariance of \(X\) and \(Y\).
- Compute the correlation between \(X\) and \(Y\).

**Hints:**

- The variance is a special case of the covariance.
- `cov()` as well as `cor()` expect a vector for each variable.
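The relationships behind these hints can be checked on simulated data. The vectors `x_sim` and `y_sim` below are illustrative assumptions and are not the `X` and `Y` from the working environment.

```r
set.seed(3)                          # arbitrary seed
x_sim <- rnorm(100)                  # simulated stand-in for one variable
y_sim <- 0.5 * x_sim + rnorm(100)    # second variable, linearly related to the first

# the covariance of a variable with itself is its variance
cov(x_sim, x_sim)
var(x_sim)

# the correlation is the covariance scaled by both standard deviations
cov(x_sim, y_sim) / (sd(x_sim) * sd(y_sim))
cor(x_sim, y_sim)
```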

#### 10. (Co)variance and Correlation II

In this exercise we want to examine the limitations of the correlation as a dependency measure.

Once the session has initialized you will see the plot of 100 realizations from two random variables \(X\) and \(Y\).

The respective observations are available in the vectors `X` and `Y` in your working environment.

**Instructions:**

- Compute the correlation between \(X\) and \(Y\). Interpret your result critically.

**Hint:**

- `cor()` expects a vector for each variable.
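One well-known limitation can be sketched with a constructed example; the data in this exercise may well look different. The vectors `x_demo` and `y_demo` are hypothetical: `y_demo` is an exact function of `x_demo`, yet the relationship is not linear.

```r
# symmetric grid around zero and a perfectly dependent, but nonlinear, second variable
x_demo <- seq(-1, 1, length.out = 101)
y_demo <- x_demo^2

# the correlation is zero up to floating-point error
cor(x_demo, y_demo)
```

The correlation coefficient only measures linear dependence, so a value near zero does not by itself indicate that two variables are unrelated.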