We will briefly review some definitions and concepts in probability and statistics that will be helpful for the remainder of the class.

Just like we reviewed computational tools (R and packages), we will now do the same for probability and statistics.

Note: This is not meant to be comprehensive. I am assuming you already know this and maybe have forgotten a few things.

https://xkcd.com/892/

Alternative text: “Hell, my eighth grade science class managed to conclusively reject it just based on a classroom experiment. It’s pretty sad to hear about million-dollar research teams who can’t even manage that.”

1 Random Variables and Probability

Definition 1.1 A random variable is a function that maps outcomes in the sample space \(\Omega\) (the set of all possible outcomes of an experiment) to \(\mathbb{R}\).


Example 1.1





Example 1.2





Example 1.3





Types of random variables –

Discrete random variables take values in a countable set.



Continuous random variables take values in an uncountable set (like \(\mathbb{R}\)).



1.1 Distribution and Density Functions

Definition 1.2 The probability mass function (pmf) of a random variable \(X\) is \(f_X\) defined by \[ f_X(x) = P(X = x) \] where \(P(\cdot)\) denotes the probability of its argument.

There are a few requirements of a valid pmf







Example 1.4 Let \(\Omega\) be the set of all possible values of a roll of a single die, \(\Omega = \{1, \dots, 6\}\), and let \(X\) be the outcome of a single roll, so \(X \in \{1, \dots, 6\}\).
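As a quick illustration in R (a minimal sketch; the probabilities assume a fair die), we can store this pmf as a vector and check that it is valid:

```r
# pmf of a fair six-sided die: each face has probability 1/6
x <- 1:6
f <- rep(1 / 6, times = 6)

all(f >= 0)          # a valid pmf is nonnegative...
sum(f)               # ...and sums to 1

sum(f[x %% 2 == 0])  # e.g., P(X is even), summing the pmf over the event
```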



A pmf is defined for discrete random variables, but what about continuous ones? Continuous random variables do not have positive probability mass at any single point.

Definition 1.3 The probability density function (pdf) of a random variable \(X\) is \(f_X\) defined by \[ P(X \in A) = \int\limits_{x \in A} f_X(x) dx \] for any set \(A \subseteq \mathbb{R}\).

\(X\) is a continuous random variable if there exists such a function \(f_X \ge 0\) for which this equality holds for every set \(A \subseteq \mathbb{R}\).

For \(f_X\) to be a valid pdf,





There are many named pdfs and cdfs that you have seen in other classes, e.g.


Example 1.5 Let \[ f(x) = \begin{cases} c(4x - 2x^2) & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases} \]

Find \(c\) and then find \(P(X > 1)\)
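Once you have worked this by hand, a quick numeric check in R (a minimal sketch using base R's `integrate`):

```r
# unnormalized density on (0, 2)
g <- function(x) 4 * x - 2 * x^2

# c must satisfy: c * (integral of g over (0, 2)) = 1
c_hat <- 1 / integrate(g, lower = 0, upper = 2)$value
c_hat

# P(X > 1), using the normalizing constant found above
integrate(function(x) c_hat * g(x), lower = 1, upper = 2)$value
```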





Definition 1.4 The cumulative distribution function (cdf) for a random variable \(X\) is \(F_X\) defined by \[ F_X(x) = P(X \le x), \quad x \in \mathbb{R}. \]

The cdf has the following properties







A random variable \(X\) is continuous if \(F_X\) is a continuous function and discrete if \(F_X\) is a step function.

Example 1.6 Find the cdf for the previous example.






Note \(f(x) = F'(x) = \frac{dF(x)}{dx}\) in the continuous case.

Recall an indicator function is defined as

\[ \mathbb{1}_{\{A\}} = \begin{cases} 1 & \text{if } A \text{ is true} \\ 0 & \text{otherwise} \end{cases}. \]
Example 1.7



Example 1.8 If \(X \sim N(0,1)\), the pdf is \(f(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)\) for \(-\infty < x < \infty\).

If \(f(x) = \frac{c}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)\mathbb{1}_{\{x > 0\}}\), what is \(c\)?
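Again, a numeric sanity check is easy in R (a sketch; `dnorm` is the standard normal pdf):

```r
# probability mass of the standard normal pdf on (0, Inf)
half_mass <- integrate(dnorm, lower = 0, upper = Inf)$value
half_mass      # 1/2, by symmetry of the standard normal

# c must rescale that mass so the restricted density integrates to 1
1 / half_mass
```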









1.2 Two Continuous Random Variables

Definition 1.5 The joint pdf of the continuous vector \((X,Y)\) is defined as \[ P((X, Y) \in A) = \iint\limits_{A} f_{X,Y}(x, y) dx dy \] for any set \(A \subset \mathbb{R}^2\).

Joint pdfs have the following properties





and a support defined to be \(\{(x, y):f_{X,Y}(x,y) > 0\}\).

Example 1.9



The marginal densities of \(X\) and \(Y\) are given by

\[ f_X(x) = \int\limits_{-\infty}^{\infty} f_{X,Y}(x,y) dy \qquad\text{and}\qquad f_Y(y) = \int\limits_{-\infty}^{\infty} f_{X,Y}(x,y) dx. \]

Example 1.10 (From Devore (2008) Example 5.3, pg. 187) A bank operates both a drive-up facility and a walk-up window. On a randomly selected day, let \(X\) be the proportion of time that the drive-up facility is in use and \(Y\) be the proportion of time that the walk-up window is in use.

Then the set of possible values for \((X, Y)\) is the square \(D = \{(x, y): 0 \le x \le 1, 0 \le y \le 1\}\). Suppose the joint pdf is given by \[ f_{X, Y}(x, y) = \begin{cases} \frac{6}{5}(x + y^2) & x \in [0,1], y \in [0,1] \\ 0 & \text{otherwise} \end{cases} \]

Evaluate the probability that both the drive-up and the walk-up windows are used a quarter of the time or less.








Find the marginal densities for \(X\) and \(Y\).









Compute the probability that the drive-up facility is used a quarter of the time or less.
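A numeric sketch of these calculations in R, using nested calls to base R's `integrate` (the inner integral is wrapped in `sapply` so that `integrate` can evaluate it at a vector of points):

```r
# joint density from the example
f_xy <- function(x, y) 6 / 5 * (x + y^2)

# P(X <= 1/4, Y <= 1/4) via nested one-dimensional integrals
inner <- function(y) {
  sapply(y, function(yy) integrate(function(x) f_xy(x, yy), 0, 0.25)$value)
}
integrate(inner, lower = 0, upper = 0.25)$value

# marginal density of X: integrate out y over [0, 1]
f_x <- function(x) {
  sapply(x, function(xx) integrate(function(y) f_xy(xx, y), 0, 1)$value)
}

# P(X <= 1/4) for the drive-up facility
integrate(f_x, lower = 0, upper = 0.25)$value
```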






2 Expected Value and Variance

Definition 2.1 The expected value (average or mean) of a random variable \(X\) with pdf or pmf \(f_X\) is defined as \[ E[X] = \begin{cases} \sum\limits_{x \in \mathcal{X}} x f_X(x) & X \text{ is discrete} \\ \int\limits_{x \in \mathcal{X}} x f_X(x) dx & X \text{ is continuous,} \\ \end{cases} \] where \(\mathcal{X} = \{x: f_X(x) > 0\}\) is the support of \(X\).

This is a weighted average of all possible values in \(\mathcal{X}\), weighted by the probability distribution.

Example 2.1 Let \(X \sim \text{Bernoulli}(p)\). Find \(E[X]\).
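A tiny numeric sketch of this calculation in R (using \(p = 0.3\) purely as an illustration):

```r
p <- 0.3          # an arbitrary illustrative value of p
x <- c(0, 1)      # support of a Bernoulli random variable
f <- c(1 - p, p)  # pmf: P(X = 0) and P(X = 1)
sum(x * f)        # E[X] as a probability-weighted sum
```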






Example 2.2 Let \(X \sim \text{Exp}(\lambda)\). Find \(E[X]\).
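We can also check a calculation like this by simulation (a sketch; note that R's `rexp` is parameterized by the rate \(\lambda\), and \(\lambda = 2\) is an arbitrary choice):

```r
set.seed(1)
lambda <- 2                    # arbitrary illustrative rate
x <- rexp(1e5, rate = lambda)  # large iid sample from Exp(lambda)
mean(x)                        # should be close to the theoretical E[X]
```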






Definition 2.2 Let \(g(X)\) be a function of a continuous random variable \(X\) with pdf \(f_X\). Then, \[ E[g(X)] = \int_{x \in \mathcal{X}} g(x) f_X(x) dx. \]
Definition 2.3 The variance (a measure of spread) is defined as \[\begin{align*} Var[X] &= E\left[(X - E[X])^2\right] \\ &= E[X^2] - \left(E[X]\right)^2 \end{align*}\]



Example 2.3 Let \(X\) be the number of cylinders in a car engine. The following is the pmf for the size of car engines.

x      4.0   6.0   8.0
f(x)   0.5   0.3   0.2

Find

\(E[X]\)



\(Var[X]\)
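A minimal R sketch of the weighted-sum calculations:

```r
x <- c(4, 6, 8)
f <- c(0.5, 0.3, 0.2)

EX <- sum(x * f)      # E[X]
EX
sum(x^2 * f) - EX^2   # Var[X] = E[X^2] - (E[X])^2
```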




Covariance measures how two random variables vary together (their linear relationship).






Definition 2.4 The covariance of \(X\) and \(Y\) is defined by \[\begin{align*} Cov[X,Y] &= E\left[(X - E[X])(Y - E[Y])\right] \\ &= E[XY] - E[X]E[Y] \end{align*}\] and the correlation of \(X\) and \(Y\) is defined as \[ \rho(X, Y) = \frac{Cov[X, Y]}{\sqrt{Var[X]Var[Y]}}. \]

Two variables \(X\) and \(Y\) are uncorrelated if \(\rho(X,Y) = 0\).
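To connect these definitions to sample quantities, a small simulation sketch (the construction of \(Y\) below is just an illustrative choice):

```r
set.seed(1)
x <- rnorm(1e4)
y <- x + rnorm(1e4, sd = 2)  # Y depends linearly on X, plus noise

cov(x, y)  # sample covariance
cor(x, y)  # sample correlation, always between -1 and 1
```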

3 Independence and Conditional Probability

In classical probability, the conditional probability of an event \(A\) given that event \(B\) has occurred is \[ P(A|B) = \frac{P(A\cap B)}{P(B)}. \]
Definition 3.1 Two events \(A\) and \(B\) are independent if \(P(A|B) = P(A)\). The converse is also true, so \[ A \text{ and } B \text{ are independent} \Leftrightarrow P(A | B) = P(A) \Leftrightarrow P(A \cap B) = \]


Theorem 3.1 (Bayes’ Theorem) Let \(A\) and \(B\) be events. Then, \[ P(A|B) = \frac{P(A \cap B)}{P(B)} = \]
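As a numeric illustration of the theorem (the numbers below are hypothetical, not from the notes): suppose a test is positive with probability 0.95 when a condition is present, is positive with probability 0.10 when it is absent, and the condition has prevalence 0.01.

```r
p_pos_given_A    <- 0.95  # P(positive | condition present), hypothetical
p_pos_given_notA <- 0.10  # P(positive | condition absent), hypothetical
p_A              <- 0.01  # P(condition present), hypothetical prevalence

# law of total probability gives the denominator P(positive)
p_pos <- p_pos_given_A * p_A + p_pos_given_notA * (1 - p_A)

# Bayes' Theorem: P(condition present | positive)
p_pos_given_A * p_A / p_pos
```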

3.1 Random variables

The same ideas hold for random variables. If \(X\) and \(Y\) have joint pdf \(f_{X,Y}(x,y)\), then the conditional density of \(X\) given \(Y = y\) is \[ f_{X|Y = y}(x) = \frac{f_{X,Y}(x,y)}{f_{Y}(y)}. \]



Thus, two random variables \(X\) and \(Y\) are independent if and only if \[ f_{X,Y}(x,y) = f_X(x)f_Y(y). \]

Also, if \(X\) and \(Y\) are independent, then \[ f_{X|Y = y}(x) = \qquad\qquad\qquad\qquad\qquad\qquad\qquad \]

4 Properties of Expected Value and Variance

Suppose that \(X\) and \(Y\) are random variables, and \(a\) and \(b\) are constants. Then the following hold:

  1. \(E[aX + b] =\)


  2. \(E[X + Y] =\)


  3. If \(X\) and \(Y\) are independent, then \(E[XY] =\)


  4. \(Var[b] =\)


  5. \(Var[aX + b] =\)


  6. If \(X\) and \(Y\) are independent, \(Var[X + Y] =\)
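Once you have filled these in, a quick simulation can make a couple of them concrete (a sketch; the constants and the Normal distribution below are arbitrary choices, and the code checks properties 1 and 5 empirically):

```r
set.seed(1)
a <- 3
b <- -2
x <- rnorm(1e5, mean = 1, sd = 2)

mean(a * x + b); a * mean(x) + b  # property 1: the two should agree closely
var(a * x + b);  a^2 * var(x)     # property 5: the two should agree closely
```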


5 Random Samples

Definition 5.1 Random variables \(\{X_1, \dots, X_n\}\) are defined as a random sample from \(f_X\) if \(X_1, \dots, X_n \stackrel{iid}{\sim}f_X\).


Example 5.1





Theorem 5.1 If \(X_1, \dots, X_n \stackrel{iid}{\sim}f_X\), then \[ f(x_1, \dots, x_n) = \prod\limits_{i = 1}^n f_X(x_i). \]


Example 5.2 Let \(X_1, \dots, X_n\) be iid. Derive the expected value and variance of the sample mean \(\overline{X}_n = \frac{1}{n}\sum\limits_{i = 1}^n X_i\).
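After deriving these, a simulation sketch to check the results (assuming, purely for illustration, Exp(1) draws so that \(\mu = \sigma^2 = 1\)):

```r
set.seed(1)
n <- 10
xbar <- replicate(1e4, mean(rexp(n, rate = 1)))  # many independent sample means

mean(xbar)  # compare to the population mean (1 here)
var(xbar)   # compare to your derived expression, evaluated at sigma^2 = 1
```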

6 R Tips

From here on in the course we will be dealing with a lot of randomness. In other words, running our code can return a different result each time it is run.

But what about reproducibility??

When we generate “random” numbers in R, we are actually generating numbers that look random, but are pseudo-random (not really random). The vast majority of computer languages operate this way.

This means all is not lost for reproducibility!

Before running our code, we can fix the starting point (seed) of the pseudorandom number generator so that we can reproduce results.

Speaking of generating numbers, we can generate random numbers from named distributions in R (and also evaluate their densities, distribution functions, and quantile functions).
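For example (a minimal sketch using the Normal family; the `d`/`p`/`q`/`r` prefixes work the same way for other named distributions):

```r
set.seed(42)                # fix the seed so "random" results are reproducible

rnorm(3, mean = 0, sd = 1)  # r*: generate random numbers
dnorm(0)                    # d*: density evaluated at 0
pnorm(1.96)                 # p*: distribution function, P(X <= 1.96)
qnorm(0.975)                # q*: quantile function, the 0.975 quantile
```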

7 Limit Theorems

Motivation

For some new statistics, we may want to derive features of the distribution of the statistic.

When we can’t do this analytically, we need to use statistical computing methods to approximate them.

We will return to some basic theory to motivate and evaluate the computational methods to follow.

7.1 Laws of Large Numbers

Limit theorems describe the behavior of sequences of random variables as the sample size increases (\(n \rightarrow \infty\)).





Often we describe these limits in terms of how close the sequence is to the truth.





We can evaluate this distance in several ways.

Some modes of convergence –





Laws of large numbers –
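Whatever form of the law we write down here, a small simulation shows the idea: the running sample mean of iid draws settles toward the population mean (Exp(1) draws are used purely for illustration, so the population mean is 1).

```r
set.seed(1)
x <- rexp(1e4, rate = 1)                  # iid draws with population mean 1
running_mean <- cumsum(x) / seq_along(x)  # sample mean after each new draw

plot(running_mean, type = "l", xlab = "n", ylab = "running sample mean")
abline(h = 1, lty = 2)                    # population mean
```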

7.2 Central Limit Theorem

Theorem 7.1 (Central Limit Theorem (CLT)) Let \(X_1, \dots, X_n\) be a random sample from a distribution with mean \(\mu\) and finite variance \(\sigma^2 > 0\). Then the limiting distribution of \(Z_n = \frac{\overline{X}_n - \mu}{\sigma/\sqrt{n}}\) is \(N(0, 1)\).

Interpretation:





Note that the CLT doesn’t require the population distribution to be Normal.
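A simulation sketch of this point (using a skewed Exp(1) population, so \(\mu = \sigma = 1\), purely for illustration): standardized sample means look approximately \(N(0,1)\) even though the population is far from Normal.

```r
set.seed(1)
n <- 30
z <- replicate(1e4, (mean(rexp(n, rate = 1)) - 1) / (1 / sqrt(n)))

hist(z, breaks = 50, freq = FALSE, main = "Standardized sample means")
curve(dnorm(x), add = TRUE)  # overlay the N(0, 1) density
```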





8 Estimates and Estimators

Let \(X_1, \dots, X_n\) be a random sample from a population.

Let \(T_n = T(X_1, \dots, X_n)\) be a function of the sample.





Statistics estimate parameters.

Example 8.1





Definition 8.1 An estimator is a rule for calculating an estimate of a given quantity.
Definition 8.2 An estimate is the result of applying an estimator to observed data samples in order to estimate a given quantity.





We need to be careful not to confuse the above ideas:

\(\overline{X}_n\)

\(\overline{x}_n\)

\(\mu\)


We can make any number of estimators to estimate a given quantity. How do we know the “best” one?

9 Evaluating Estimators

There are many ways we can describe how good or bad (evaluate) an estimator is.

9.1 Bias

Definition 9.1 Let \(X_1, \dots, X_n\) be a random sample from a population, \(\theta\) a parameter of interest, and \(\hat{\theta}_n = T(X_1, \dots, X_n)\) an estimator. Then the bias of \(\hat{\theta}_n\) is defined as \[ bias(\hat{\theta}_n) = E[\hat{\theta}_n] - \theta. \]
Definition 9.2 An unbiased estimator is defined to be an estimator \(\hat{\theta}_n = T(X_1, \dots, X_n)\) where





Example 9.1





Example 9.2





Example 9.3

9.2 Mean Squared Error (MSE)

Definition 9.3 The mean squared error (MSE) of an estimator \(\hat{\theta}_n\) for parameter \(\theta\) is defined as \[\begin{align*} MSE(\hat{\theta}_n) &= E\left[(\theta - \hat{\theta}_n)^2\right] \\ &= Var(\hat{\theta}_n) + \left(bias(\hat{\theta}_n)\right)^2. \end{align*}\]

Generally, we want estimators with





Sometimes an unbiased estimator \(\hat{\theta}_n\) can have a larger variance than a biased estimator \(\tilde{\theta}_n\).

Example 9.4 Let’s compare two estimators of \(\sigma^2\). \[ s^2 = \frac{1}{n-1}\sum(X_i - \overline{X}_n)^2 \qquad \hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \overline{X}_n)^2 \]
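A simulation sketch of this comparison (assuming \(N(0,1)\) data with \(n = 10\), so the true \(\sigma^2 = 1\)): estimate the bias, variance, and MSE of each estimator from repeated samples.

```r
set.seed(1)
n <- 10
sigma2 <- 1
sims <- replicate(1e4, {
  x <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  c(s2 = var(x), sigma2_hat = (n - 1) / n * var(x))
})

rowMeans(sims) - sigma2     # estimated bias of each estimator
apply(sims, 1, var)         # variance of each estimator
rowMeans((sims - sigma2)^2) # MSE of each estimator
```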

9.3 Standard Error

Definition 9.4 The standard error of an estimator \(\hat{\theta}_n\) of \(\theta\) is defined as \[ se(\hat{\theta}_n) = \sqrt{Var(\hat{\theta}_n)}. \]

We seek estimators with small \(se(\hat{\theta}_n)\).

Example 9.5





10 Comparing Estimators

We typically compare statistical estimators based on the following basic properties:













Example 10.1 Let us consider the efficiency of estimates of the center of a distribution. A measure of central tendency estimates the central or typical value for a probability distribution.

The mean and the median are two measures of central tendency. Both are unbiased for the center of a symmetric distribution; which is more efficient?
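One way to compare them empirically (a sketch, assuming a \(N(0,1)\) population so both estimate the center 0): simulate many samples and compare the variability of the sample mean and the sample median.

```r
set.seed(1)
n <- 25
est <- replicate(1e4, {
  x <- rnorm(n)
  c(mean = mean(x), median = median(x))
})

apply(est, 1, var)  # smaller variance indicates the more efficient estimator
```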























Next up: In Ch. 5, we'll look at a method that produces unbiased estimators of \(E(g(X))\)!