Let \(X_1,X_2,X_3,\ldots\) be a sequence of sample variables for a random variable \(X\) with \(E(X)=\mu\) and \(\var(X)=\sigma^2\text{.}\) Let \(S_n\) denote the variable \(S_n=X_1+X_2+\cdots + X_n\text{,}\) so that we have \(E(S_n)=n\mu\) and \(\var(S_n)=n\sigma^2\text{.}\) Let \(T_n=(S_n-n\mu)/(\sqrt{n}\sigma)\text{.}\) We have
Let \(X\) be a random variable with \(E(X)=\mu\) and \(\var(X)=\sigma^2\text{.}\) Let \(\overline{X}=\frac{1}{n}\sum_{i=1}^n X_i\) be the average of a sample of size \(n\) of \(X\text{.}\) For \(\alpha\) in the range \(0\lt \alpha \lt 1\text{,}\) let \(z_\alpha\) be the value of a standard normal variable such that \(P(|z|\leq z_\alpha)=\alpha \text{.}\) The Central Limit Theorem says that
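The Central Limit Theorem can be checked empirically with a short simulation. The sketch below (an illustration, not part of the text; the choice of \(X\sim\text{Uniform}(0,1)\) and the sample sizes are assumed) draws many standardized sample means \(T_n=(S_n-n\mu)/(\sqrt{n}\sigma)\) and confirms that roughly \(95\%\) of them fall within \(z_{0.95}\approx 1.96\) of zero.

```python
# Sketch (assumed setup): simulate T_n = (S_n - n*mu)/(sqrt(n)*sigma)
# for X ~ Uniform(0, 1), where mu = 1/2 and sigma^2 = 1/12, and count
# how often |T_n| <= 1.96, as the Central Limit Theorem predicts.
import math
import random

random.seed(0)

n = 100          # size of each sample
trials = 10_000  # number of simulated sample sums
mu, sigma = 0.5, math.sqrt(1 / 12)

inside = 0
for _ in range(trials):
    s_n = sum(random.random() for _ in range(n))
    t_n = (s_n - n * mu) / (math.sqrt(n) * sigma)
    if abs(t_n) <= 1.96:
        inside += 1

print(inside / trials)  # close to 0.95
```

The fraction printed should be near \(0.95\text{,}\) even though the underlying uniform variable looks nothing like a normal variable.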
Equation (7.1) motivates the following procedure for estimating an unknown population parameter \(\mu\) from sample data. Intuitively, \(\overline{X}\) estimates \(\mu\) with an error estimated by \(\sigma_{\overline{X}}\text{.}\) If the population variance \(\sigma^2\) is unknown, we can use the sample SD \(s\) to estimate \(\sigma\text{,}\) and we can use \(s/\sqrt{n}\) to estimate \(\sigma_{\overline{X}}\text{.}\) Inspired by (7.1), we say that the interval
is a \(100\alpha\%\) confidence interval for the population mean \(\mu\text{.}\) Notice that the confidence interval is a random quantity, that is, it depends on the sample. Roughly speaking, we estimate that approximately \(100\alpha\%\) of all \(100\alpha\%\) confidence intervals will contain \(\mu\text{,}\) and approximately \(100(1-\alpha)\%\) of these intervals will fail to contain \(\mu\text{.}\)
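As a quick numerical sketch, the interval \(\overline{X}\pm z_\alpha\, s/\sqrt{n}\) can be computed directly. The sample numbers below are assumed for illustration only; \(z_{0.95}\approx 1.96\) is the standard value with \(P(|z|\leq 1.96)\approx 0.95\text{.}\)

```python
# Sketch of a 100*alpha% confidence interval X_bar +/- z_alpha * s / sqrt(n).
# The sample values below are assumed for illustration, not from the text.
import math

n = 36            # sample size (assumed)
x_bar = 52.1      # sample mean (assumed)
s = 4.5           # sample standard deviation (assumed)
z_alpha = 1.96    # value with P(|z| <= z_alpha) = 0.95

margin = z_alpha * s / math.sqrt(n)
lo, hi = x_bar - margin, x_bar + margin
print((round(lo, 2), round(hi, 2)))  # -> (50.63, 53.57)
```

A different sample would yield a different interval, which is the sense in which the confidence interval is a random quantity.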
Hypothesis testing uses sample data to quantify how skeptical you should be about certain claims. In this subsection we describe a hypothesis test called the 1-tail \(z\) test. We will illustrate using the following example.
Suppose that you are assigned to check the accuracy of a machine that is supposed to make identical widgets with a weight of \(\mu_0\) grams. Your sample of \(n\) widgets has an average weight of \(\overline{X}\) grams, and your weighings have a sample SD of \(s\) grams. There is a difference between your sample average \(\overline{X}\) and \(\mu_0\text{.}\) How do you decide whether the difference is significant? In a hypothesis test, we play “what-if?” in the following way. We suppose that the machine actually performs according to the claimed standards, so that the average widget has a weight of \(\mu_0\text{.}\) Now we ask, what is the probability that a random sample of size \(n\) would be as far away or farther from \(\mu_0\) as the value \(\overline{X}\) that we observed? By the Central Limit Theorem, the answer is
If this probability is small, we feel skeptical of the claim that the machine produced widgets with an average weight of \(\mu_0\text{.}\) If this probability is large, we do not feel skeptical. The threshold probability value for skepticism is adjustable; the most common default value is \(0.05\text{.}\) The formal structure of this type of hypothesis test, called a 1-tail \(z\) test, is this.
A null hypothesis, denoted \(H_0\text{,}\) is made. This is a claim that a population average for a certain random variable \(X\) has a certain value \(\mu_0\text{.}\)
A conclusion is made: if \(P\lt \alpha\text{,}\) the data is called significant, and the null hypothesis is rejected. If \(P\geq \alpha\text{,}\) the data is called not significant, and the null hypothesis is not rejected.
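The steps above can be sketched numerically. In the minimal computation below (all widget numbers are assumed for illustration), the standard normal CDF is obtained from the error function, \(\Phi(z)=\frac{1}{2}\bigl(1+\operatorname{erf}(z/\sqrt{2})\bigr)\text{,}\) and the 1-tail probability is \(P=1-\Phi(z_{\text{obs}})\) for the standardized statistic \(z_{\text{obs}}=(\overline{X}-\mu_0)/(s/\sqrt{n})\text{.}\)

```python
# Sketch of a 1-tail z test; all sample numbers are assumed, not from the text.
import math

def phi(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu_0 = 100.0   # H_0: claimed population mean weight in grams (assumed)
n = 49         # sample size (assumed)
x_bar = 101.4  # sample average weight in grams (assumed)
s = 3.5        # sample SD in grams (assumed)
alpha = 0.05   # threshold for skepticism

z_obs = (x_bar - mu_0) / (s / math.sqrt(n))  # standardized test statistic
p_value = 1 - phi(z_obs)                     # 1-tail probability P

print(round(z_obs, 2))  # -> 2.8
if p_value < alpha:
    print("significant: reject H_0")
else:
    print("not significant: do not reject H_0")
```

With these assumed numbers, \(P\) is well below \(0.05\text{,}\) so the data would be called significant and \(H_0\) rejected.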