
Intermediate Statistics

Section 4 Discrete Random Variables

A random variable is a function \(X\colon \Omega\to Y\text{,}\) where \((\Omega,P)\) is a probability model and \(Y\) is a set.
For discrete probability models, this is the end of the definition; for general (possibly uncountable) probability models, further technical specifications are required to define random variables. Fortunately, the theory of random variables on discrete probability spaces will provide practical ways to think about and use continuous random variables without having to be concerned about every detail of general probability spaces.
A random variable is called quantitative if its image is a subset of the real numbers; otherwise, a random variable is called qualitative. A random variable is called discrete if its image is finite or countably infinite; otherwise, a random variable is called continuous. In this section, we will develop vocabulary and facts for quantitative random variables whose underlying probability models are discrete.

Subsection 4.1 Definitions and properties

Definition 4.1. Events defined by random variables.

Let \((\Omega,P)\) be a discrete probability model and let \(X\colon \Omega\to Y\) be a random variable. Given a subset \(A\subseteq Y\text{,}\) we define the event \(\text{"}X\in A\text{"}\) by
\begin{equation} \text{"}X\in A\text{"} = X^{-1}(A)=\{\omega \in \Omega\colon X(\omega)\in A\}.\tag{4.1} \end{equation}
In particular, if \(X\) is a quantitative random variable and \(\lambda\) is a real number, the event \(\text{"} X\leq \lambda\text{"}\) is the event
\begin{equation} \text{"}X\leq \lambda\text{"} = X^{-1}((-\infty,\lambda])= \{\omega \in \Omega\colon X(\omega)\leq \lambda\}.\tag{4.2} \end{equation}
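To make the preimage point of view concrete, here is a minimal Python sketch. The code and the names (omega, prob, X, event) are our own illustration, not part of the definition.

```python
# A minimal sketch (our illustration): a fair six-sided die, with the
# quantitative random variable X(omega) = the number showing on the die.
omega = {1, 2, 3, 4, 5, 6}                # sample space
prob = {w: 1 / 6 for w in omega}          # probability function

def X(w):
    return w                              # the number showing

def event(condition):
    """The preimage {w in omega : condition(X(w)) holds}, an event in omega."""
    return {w for w in omega if condition(X(w))}

A = event(lambda x: x <= 4)               # the event "X <= 4"
print(A)                                  # {1, 2, 3, 4}
print(sum(prob[w] for w in A))            # P(X <= 4) = 4/6, about 0.667
```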

Checkpoint 4.2.

Make up one or more examples of a qualitative random variable and a quantitative random variable on a finite probability space with five or so elements. For both variables, choose some subsets \(A\) of the codomain and find the events \(X\in A\text{.}\)

Definition 4.3.

Let \(X\) be a quantitative discrete random variable. The (cumulative) distribution function (or c.d.f.) for \(X\) is the function \(F_X\colon \R\to [0,1]\) defined by
\begin{equation} F_X(\lambda)= P(X\leq \lambda).\tag{4.3} \end{equation}
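As a quick illustration (ours, not part of the definition), the following sketch evaluates the distribution function of the die variable from the earlier sketch at a few inputs. Note that \(F_X\) is a right-continuous step function that jumps at each value of \(X\text{.}\)

```python
# A sketch of the c.d.f. of the die variable X(w) = w from the previous sketch.
omega = {1, 2, 3, 4, 5, 6}
prob = {w: 1 / 6 for w in omega}

def F(lam):
    """F_X(lam) = P(X <= lam): add up probabilities of outcomes with X(w) <= lam."""
    return sum(prob[w] for w in omega if w <= lam)

for lam in [0.5, 1, 2.5, 6, 7.2]:
    print(lam, F(lam))      # 0, 1/6, 2/6, 1, 1 -- a step function jumping at 1,...,6
```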

Checkpoint 4.4.

Make up one or more examples of a quantitative random variable on a finite probability space with five or so elements and sketch a graph of the distribution function.

Checkpoint 4.6.

Vocabulary related to the distribution function. The distribution function provides a ranking of the values of a random variable, for which we use the following vocabulary. Let \(\lambda\) be a real number, and let \(p=F_X(\lambda)\text{.}\) We say that \(\lambda\) has the quantile rank \(p\text{,}\) and we say that \(\lambda\) has the \(100p\)-th percentile rank. These terms are used whether or not \(\lambda\) is an actual value \(\lambda=X(\omega)\) of \(X\) for some \(\omega\text{.}\)
If \(F_X\) were invertible, then we could choose any \(p\in (0,1)\) and solve the equation \(F_X(\lambda)=p\text{.}\) It would be natural to say that the solution \(\lambda\) is the \(100p\)-th percentile value of \(X\text{.}\) But distribution functions of discrete random variables are not invertible. The set of solutions to an equation \(F_X(\lambda)=p\) is either empty or consists of an interval \([u,v)\text{.}\) This means that we have to make slightly artificial definitions if we wish to refer to “the \(100p\)-th percentile” value for \(X\text{.}\) If \(F_X^{-1}(p)=\emptyset\text{,}\) then the \(100p\)-th percentile value of \(X\) is defined to be the smallest number \(\xi\) such that \(F_X(\xi)\gt p\text{.}\) If \(F_X^{-1}(p)=[u,v)\text{,}\) then the \(100p\)-th percentile value of \(X\) is defined to be \((u+v)/2\text{.}\) With these definitions, the median is the \(50\)-th percentile value. The upper quartile is the \(75\)-th percentile value, and the lower quartile is the \(25\)-th percentile value. The interquartile range (IQR) is the upper quartile minus the lower quartile.
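The following Python sketch (ours; the function name percentile_value is our own) implements exactly the convention just described for a variable with finitely many values, and computes the median, quartiles, and IQR of a fair die roll.

```python
# A sketch of the "100p-th percentile value" convention above, for a random
# variable with finitely many values and p strictly between 0 and 1.
def percentile_value(p, values, probs):
    """values: sorted distinct values of X; probs: P(X = value), in the same order."""
    cum, total = [], 0.0
    for q in probs:
        total += q
        cum.append(total)                 # cum[i] = F_X(values[i])
    for i, c in enumerate(cum):
        if abs(c - p) < 1e-12:            # F_X^{-1}(p) = [values[i], values[i+1])
            return (values[i] + values[i + 1]) / 2
        if c > p:                         # F_X^{-1}(p) is empty; F_X jumps past p here
            return values[i]

values = [1, 2, 3, 4, 5, 6]               # a fair die
probs = [1 / 6] * 6
print(percentile_value(0.50, values, probs))   # median = 3.5
print(percentile_value(0.25, values, probs))   # lower quartile = 2
print(percentile_value(0.75, values, probs))   # upper quartile = 5
# IQR = 5 - 2 = 3
```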

Checkpoint 4.7.

Explain the statements above regarding the possibilities for \(F_X^{-1}(p)\) for a discrete random variable, and give examples.
Box plots. A box plot (or a box and whiskers plot) is a visual representation of some basic features of a random variable. One dimension of the box (it can be the horizontal width or the vertical height; let’s say it is the horizontal width) is arbitrary; the other dimension (the vertical height, in this description) is equal to the interquartile range. A vertical scale is drawn on one side or the other of the box. A horizontal line from one vertical side of the box to the other is drawn at the height of the median. Vertical extensions (the “whiskers”) from the top and bottom of the box extend to the maximum and minimum values of the random variable.
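Here is one way to produce such a plot. This is a sketch of ours that assumes matplotlib as the plotting tool; note that matplotlib computes quartiles by linear interpolation (which can differ slightly from the midpoint convention above), and we pass whis=(0, 100) so that the whiskers reach the minimum and maximum as in the description.

```python
# A minimal box plot sketch with matplotlib (an assumption; any plotting tool works).
import matplotlib.pyplot as plt

data = [52, 61, 64, 70, 75, 75, 82, 88, 90, 97, 105]   # made-up values

fig, ax = plt.subplots()
ax.boxplot(data, whis=(0, 100))    # box spans the quartiles, line at the median,
                                   # whiskers extend to the minimum and maximum
ax.set_ylabel("value")
plt.show()
```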

Checkpoint 4.8.

Find examples of box plots. Pick a style you like and make a bunch for yourself.

Subsection 4.2 Histograms

Let \(F_X\) be a distribution function for a random variable \(X\text{.}\) Let \([a,b]\) be a closed interval of the real line, and let \(\mathcal{P}=\{x_0,x_1,\ldots, x_n\}\) be a partition of \([a,b]\text{,}\) that is, we have
\begin{equation*} a=x_0\lt x_1\lt x_2 \lt \cdots \lt x_n=b. \end{equation*}
Let \(I_k\) denote the interval \((x_{k-1},x_k]\text{.}\) Let \((\Delta x)_k\) denote the width \((\Delta x)_k=x_k-x_{k-1}\) of \(I_k\text{,}\) and let \(P_k=P(X\in I_k)=F_X(x_k)-F_X(x_{k-1})\text{.}\) We define \(R_k\) to be a rectangular region
\begin{equation} R_k = I_k \times [0,P_k/(\Delta x)_k]\tag{4.4} \end{equation}
so that the area of \(R_k\) is \(P_k\text{.}\) The histogram for \(X\) on \([a,b]\) with partition \(\mathcal{P}\) is a collection of \(n\) rectangular regions \(R_1,R_2,\ldots,R_n\text{.}\) The intervals \(I_k\) are called the class intervals for the histogram.
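As an illustration (ours, with matplotlib assumed as the plotting tool), the sketch below builds the rectangles \(R_k\) for the fair-die distribution function on the partition \(\{0.5, 2.5, 4.5, 6.5\}\text{;}\) each bar has height \(P_k/(\Delta x)_k\text{,}\) so its area is \(P_k\text{.}\)

```python
# A sketch of a histogram built directly from a distribution function F.
import matplotlib.pyplot as plt

def F(lam):                        # c.d.f. of a fair die (example choice)
    return min(max(int(lam), 0), 6) / 6

partition = [0.5, 2.5, 4.5, 6.5]   # class intervals (0.5,2.5], (2.5,4.5], (4.5,6.5]

lefts, widths, heights = [], [], []
for x_prev, x_next in zip(partition, partition[1:]):
    P_k = F(x_next) - F(x_prev)              # P(X in I_k)
    lefts.append(x_prev)
    widths.append(x_next - x_prev)
    heights.append(P_k / (x_next - x_prev))  # so that the area of R_k equals P_k

fig, ax = plt.subplots()
ax.bar(lefts, heights, width=widths, align="edge", edgecolor="black")
ax.set_xlabel("value")
ax.set_ylabel("probability per unit length")
plt.show()
```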

Checkpoint 4.9.

  1. A frequently used alternative convention for histograms is to switch the closed and open ends of the class intervals, that is, to use \(I_k=[x_{k-1},x_k)\text{.}\) When the distinction has to be made clear, we say that \(I_k=[x_{k-1},x_k)\) uses the left endpoint convention, while the usual definition \(I_k=(x_{k-1},x_k]\) uses the right endpoint convention. In the case of the left endpoint convention, \(P_k\) is still defined to be \(P_k=P(X\in I_k)\text{.}\) Write an expression for the left endpoint convention version \(P_k\) in terms of the distribution function \(F_X\text{.}\)
  2. What would be wrong with using \(I_k=[x_{k-1},x_k]\text{?}\)
  3. For a random variable \(X\) whose values are whole numbers only, it is common to use class intervals with edges on half-integers (that is, \(x_k=n_k+1/2\) for some integer \(n_k\text{,}\) for every \(k\)). Why is this better than using class intervals with edge points on whole numbers?
  4. Generate data (a list of numbers) in the interval \([50,150]\text{.}\) Choose three different partitions: one partition with 3 intervals, another partition with 5 intervals, and another partition with 7 intervals. Draw the histogram for each partition.

Subsection 4.3 Expectation

Here is a simple example that motivates the notion of expected value. Suppose you play a dice game in which you win two dollars every time your dice roll comes up showing the six face, and you lose a dollar if you roll something different from a six. In 600 rolls, you would expect to roll a six about 100 times. From this you would gain 200 dollars. You would expect to roll something different from a six about 500 times. From this you would lose 500 dollars. Your net gain is \(200-500=-300\) dollars, which averages to \(-.50\) dollars per roll. You could have found this by the calculation
\begin{equation*} \text{average net gain per roll }= 2\cdot 1/6 +(-1)\cdot 5/6. \end{equation*}
This is a sum of the form \(\sum (\text{value})(\text{probability})\text{,}\) where “value” is the value of a random variable (in this case, win/loss per roll). Here is the formal mathematical definition.
Let \(X\) be a random variable on a discrete probability space \((\Omega,P)\) with probability function \(p\text{.}\) The expected value of \(X\), denoted \(E(X)\text{,}\) is defined to be
\begin{equation} E(X) = \sum_{\omega\in\Omega}X(\omega) p(\omega).\tag{4.5} \end{equation}
The expected value of a random variable is also called its average or mean, and we write \(\mu_X\) (or just \(\mu\text{,}\) if \(X\) is clear from context) for \(E(X)\text{.}\) It is sometimes useful to group the summands in (4.5) as follows.
\begin{equation} E(X) = \sum_{x\in X(\Omega)}x P(X=x).\tag{4.6} \end{equation}
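To connect the two formulas, here is a short sketch (ours) that computes the expected value of the win/loss variable from the dice game both ways: summing over outcomes as in (4.5) and grouping outcomes by value as in (4.6).

```python
# A sketch computing E(X) by (4.5) and by (4.6) for the dice game above:
# X = +2 dollars on a six, -1 dollar otherwise.
from collections import defaultdict

omega = range(1, 7)
prob = {w: 1 / 6 for w in omega}
X = lambda w: 2 if w == 6 else -1

E1 = sum(X(w) * prob[w] for w in omega)        # (4.5): sum over outcomes

P_of = defaultdict(float)                      # (4.6): group outcomes by value of X
for w in omega:
    P_of[X(w)] += prob[w]                      # P(X = x)
E2 = sum(x * P for x, P in P_of.items())

print(E1, E2)                                  # both about -0.5 dollars per roll
```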

Checkpoint 4.10.

  1. Make up several random variables on small finite probability spaces. Calculate their expected values.
  2. Let \(X\) be a random variable and let \(Y=aX+b\text{.}\) Show that \(E(Y)=aE(X)+b\text{.}\)
  3. Justify (4.6).
Variance and standard deviation are measures of the spread of a random variable about its mean. The variance of \(X\text{,}\) denoted \(\var(X)\) or \(\sigma_X^2\) (or just \(\sigma^2\text{,}\) if \(X\) is understood), is
\begin{equation} \var(X)=E((X-\mu_X)^2)=E(X^2)-(E(X))^2.\tag{4.7} \end{equation}
The standard deviation of \(X\text{,}\) denoted \(\SD(X)\) or \(\sigma_X\) (or just \(\sigma\)), is the square root of the variance.
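Here is a quick numerical check (ours) of the two expressions in (4.7), again using the win/loss variable from the dice game.

```python
# A sketch checking E((X - mu)^2) = E(X^2) - (E(X))^2 numerically.
omega = range(1, 7)
prob = {w: 1 / 6 for w in omega}
X = lambda w: 2 if w == 6 else -1

E = lambda f: sum(f(w) * prob[w] for w in omega)   # expectation on (omega, prob)

mu = E(X)
var1 = E(lambda w: (X(w) - mu) ** 2)
var2 = E(lambda w: X(w) ** 2) - mu ** 2
print(mu, var1, var2, var1 ** 0.5)   # -0.5, 1.25, 1.25, standard deviation ~1.118
```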

Checkpoint 4.11.

  1. Calculate the variance for each of the examples you made up in the previous Checkpoint.
  2. Let \(X\) be a random variable and let \(Y=aX+b\text{.}\) Show that \(\var(Y)=a^2\var(X)\text{.}\)
  3. Show the two expressions for variance (4.7) are equal.
Standardized random variables. A random variable \(X\) is said to be standardized if \(E(X)=0\) and \(\var(X)=1\text{.}\) If \(X\) is any random variable with \(E(X)=\mu_X\) and \(\var(X)=\sigma_X^2\gt 0\text{,}\) then the variable \(X' = \frac{X-\mu_X}{\sigma_X}\) is standardized.
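Continuing with the same example (a numerical sketch of ours, not a proof), standardizing the win/loss variable gives mean 0 and variance 1 up to rounding.

```python
# A numerical sketch: X' = (X - mu)/sigma has mean 0 and variance 1.
omega = range(1, 7)
prob = {w: 1 / 6 for w in omega}
X = lambda w: 2 if w == 6 else -1
E = lambda f: sum(f(w) * prob[w] for w in omega)

mu = E(X)
sigma = (E(lambda w: X(w) ** 2) - mu ** 2) ** 0.5
Xp = lambda w: (X(w) - mu) / sigma

print(E(Xp))                                   # approximately 0
print(E(lambda w: Xp(w) ** 2) - E(Xp) ** 2)    # approximately 1
```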

Checkpoint 4.12.
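
Show that the standardized variable \(X' = \frac{X-\mu_X}{\sigma_X}\) defined above satisfies \(E(X')=0\) and \(\var(X')=1\text{.}\)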

Subsection 4.4 One basic example

In a later section, we will introduce several important random variables that arise naturally from samples taken from probability spaces. Many of these sample variables are built from the simplest possible probability model, one with just two outcomes. Here are the definitions.

Definition 4.13.

A Bernoulli variable is a discrete random variable \(X\) that has exactly two values, \(0\) and \(1\text{.}\) It is traditional to use the symbols \(p,q\) to denote the probabilities \(p=P(X=1)\) and \(q=1-p = P(X=0)\text{.}\)
In the definition of Bernoulli variable, the probability space is not specified. The simplest possible probability space for a Bernoulli variable is a 2-element set \(\Omega=\{A,B\}\) (where \(A,B\) might represent outcomes heads and tails, or win and lose, or yes and no, etc.), with probability function \(p(A)=p\text{,}\) \(p(B)=q=1-p\text{.}\) However, the sample space for a Bernoulli variable could have any number of elements.
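For instance (a sketch of ours, not one of the exercises below), a fair die together with the indicator of rolling a six gives a Bernoulli variable on a 6-element sample space with \(p=1/6\text{.}\)

```python
# A Bernoulli variable on a sample space with more than two elements.
omega = range(1, 7)
prob = {w: 1 / 6 for w in omega}
X = lambda w: 1 if w == 6 else 0     # indicator of rolling a six

p = sum(prob[w] for w in omega if X(w) == 1)   # P(X = 1)
q = 1 - p
print(p, q)                                    # 1/6 and 5/6
```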

Checkpoint 4.14.

Show that \(E(X)=p\) and \(\var(X)=pq\) for a Bernoulli variable \(X\text{.}\)
Find a 10-element probability space and probability function for a Bernoulli variable \(X\) with \(P(X=1)=.6\text{.}\) Find an infinite probability space and probability function for a Bernoulli variable with \(P(X=1)=.6\text{.}\)

Subsection 4.5 Independence and covariance

Definition 4.15.

Let \(X,Y\colon \Omega\to \R\) be random variables on the same sample space. The joint distribution function \(F_{XY}\colon \R^2\to [0,1]\) is given by
\begin{equation} F_{XY}(\lambda,\mu) = P(X\leq \lambda\text{ and }Y\leq \mu).\tag{4.8} \end{equation}
Variables \(X,Y\) are called independent if \(F_{XY}=F_XF_Y\text{.}\) If \(X,Y\) are not independent, they are called dependent. The covariance of \(X\) and \(Y\text{,}\) denoted \(\covar(X,Y)\text{,}\) is
\begin{equation} \covar(X,Y)=E(XY)-E(X)E(Y).\tag{4.9} \end{equation}
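Here is a small sketch (ours) computing a covariance via (4.9) on the sample space of two fair dice, with \(X\) the first roll and \(Y\) the sum of the rolls; the covariance comes out nonzero, so these variables are dependent.

```python
# A sketch computing covar(X, Y) = E(XY) - E(X)E(Y) on two fair dice.
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # ordered pairs (first, second)
prob = {w: 1 / 36 for w in omega}
E = lambda f: sum(f(w) * prob[w] for w in omega)

X = lambda w: w[0]            # the first roll
Y = lambda w: w[0] + w[1]     # the sum of the two rolls

cov = E(lambda w: X(w) * Y(w)) - E(X) * E(Y)
print(cov)                    # 35/12, about 2.92 -- nonzero, so X and Y are dependent
```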

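Proposition 4.16.

If \(X\) and \(Y\) are independent, then \(E(XY)=E(X)E(Y)\text{,}\) and consequently \(\covar(X,Y)=0\text{.}\)
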
Checkpoint 4.17.

  1. Show by example that the converse of Proposition 4.16 is false.
  2. Show that two discrete random variables on a finite sample space are independent if and only if \(P(X\in A\text{ and }Y\in B)=P(X\in A)P(Y\in B)\) for every pair of sets \(A,B\subseteq \R\text{.}\) This demonstrates why we use the term independent.