We will briefly review some definitions and concepts in probability and statistics that will be helpful for the remainder of the class.
Just like we reviewed computational tools (R and packages), we will now do the same for probability and statistics.
Note: This is not meant to be comprehensive. I am assuming you already know this and maybe have forgotten a few things.
[Comic omitted. Alt text: “Hell, my eighth grade science class managed to conclusively reject it just based on a classroom experiment. It’s pretty sad to hear about million-dollar research teams who can’t even manage that.”]
1 Random Variables and Probability
Types of random variables –

- Discrete random variables take values in a countable set.
- Continuous random variables take values in an uncountable set (like \(\mathbb{R}\)).
1.1 Distribution and Density Functions
There are a few requirements of a valid pmf
A pmf is defined for discrete variables, but what about continuous ones? Continuous random variables do not have positive probability mass at any single point.
\(X\) is a continuous random variable if there exists a function \(f_X \ge 0\) such that for all \(x \in \mathbb{R}\), \[ P(X \le x) = \int_{-\infty}^{x} f_X(t)\, dt. \]
For \(f_X\) to be a valid pdf,
There are many named pdfs and cdfs that you have seen in other classes, e.g.
Example 1.5 Let \[ f(x) = \begin{cases} c(4x - 2x^2) & 0 < x < 2 \\ 0 & \text{otherwise} \end{cases} \]
Find \(c\) and then find \(P(X > 1)\)
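If you want to check a hand-derived answer, R’s integrate() gives a quick numerical verification (this check is my addition; kernel and c_hat are just illustrative names):

# numerical check: c must make the density integrate to 1 over (0, 2)
kernel <- function(x) 4 * x - 2 * x^2
c_hat <- 1 / integrate(kernel, lower = 0, upper = 2)$value
c_hat
# P(X > 1) under the normalized density
integrate(function(x) c_hat * kernel(x), lower = 1, upper = 2)$value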
The cdf has the following properties
A random variable \(X\) is continuous if \(F_X\) is a continuous function and discrete if \(F_X\) is a step function.
Note \(f(x) = F'(x) = \frac{dF(x)}{dx}\) in the continuous case.
Recall an indicator function is defined as
\[ \mathbb{1}_{\{A\}} = \begin{cases} 1 & \text{if } A \text{ is true} \\ 0 & \text{otherwise} \end{cases}. \]
Example 1.8 If \(X \sim N(0,1)\), the pdf is \(f(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)\) for \(-\infty < x < \infty\).
If \(f(x) = \frac{c}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)\mathbb{1}_{\{x > 0\}}\), what is \(c\)?
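A numerical hint, if you want to check your answer (my addition, not part of the notes): integrate() can measure how much standard normal mass sits on the positive half-line.

half_mass <- integrate(dnorm, lower = 0, upper = Inf)$value  # mass of N(0, 1) on (0, Inf)
half_mass
1 / half_mass  # the constant needed to rescale that mass to 1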
1.2 Two Continuous Random Variables
Joint pdfs have the following properties
and a support defined to be \(\{(x, y):f_{X,Y}(x,y) > 0\}\).
The marginal densities of \(X\) and \(Y\) are given by
\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dy \qquad\text{and}\qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dx; \]
Example 1.10 (From Devore (2008) Example 5.3, pg. 187) A bank operates both a drive-up facility and a walk-up window. On a randomly selected day, let \(X\) be the proportion of time that the drive-up facility is in use and \(Y\) is the proportion of time that the walk-up window is in use.
Then the set of possible values for \((X, Y)\) is the square \(D = \{(x, y): 0 \le x \le 1, 0 \le y \le 1\}\). Suppose the joint pdf is given by \[ f_{X, Y}(x, y) = \begin{cases} \frac{6}{5}(x + y^2) & x \in [0,1], y \in [0,1] \\ 0 & \text{otherwise} \end{cases} \]
Evaluate the probability that both the drive-up and the walk-up windows are used a quarter of the time or less.
Find the marginal densities for \(X\) and \(Y\).
Compute the probability that the drive-up facility is used a quarter of the time or less.
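These can all be checked numerically; here is a sketch using iterated calls to integrate() (the names f_xy and inner are mine, and 0.25 encodes “a quarter of the time”):

# joint density from Example 1.10
f_xy <- function(x, y) 6 / 5 * (x + y^2)

# P(X <= 0.25, Y <= 0.25): integrate over x for each y, then over y
inner <- function(y) {
  sapply(y, function(yy) integrate(function(x) f_xy(x, yy), 0, 0.25)$value)
}
integrate(inner, 0, 0.25)$value

# marginal density of X evaluated at a point, e.g. x = 0.5
integrate(function(y) f_xy(0.5, y), 0, 1)$value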
2 Expected Value and Variance
This is a weighted average of all possible values \(x \in \mathcal{X}\), weighted by the probability distribution.
| x    | 4.0 | 6.0 | 8.0 |
|------|-----|-----|-----|
| f(x) | 0.5 | 0.3 | 0.2 |
Find

- \(E[X]\)
- \(Var[X]\)
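Hand calculations like these are easy to verify in R (a small check of my own):

x <- c(4, 6, 8)
fx <- c(0.5, 0.3, 0.2)
EX <- sum(x * fx)             # E[X]: values weighted by their probabilities
VarX <- sum(x^2 * fx) - EX^2  # Var[X] = E[X^2] - (E[X])^2
c(EX = EX, VarX = VarX)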
Covariance measures how two random variables vary together (their linear relationship).
Two variables \(X\) and \(Y\) are uncorrelated if \(\rho(X,Y) = 0\).
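In R, cov() and cor() compute the sample covariance and correlation; a tiny simulated illustration (the data here are made up):

set.seed(400)
x <- rnorm(100)
y <- 2 * x + rnorm(100)  # y is linearly related to x
cov(x, y)                # how x and y vary together
cor(x, y)                # scale-free version; values near 0 suggest uncorrelated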
3 Independence and Conditional Probability
In classical probability, the conditional probability of an event \(A\) given that event \(B\) has occurred is \[ P(A|B) = \frac{P(A\cap B)}{P(B)}. \]
3.1 Random Variables
The same ideas hold for random variables. If \(X\) and \(Y\) have joint pdf \(f_{X,Y}(x,y)\), then the conditional density of \(X\) given \(Y = y\) is \[ f_{X|Y = y}(x) = \frac{f_{X,Y}(x,y)}{f_{Y}(y)}. \]
Thus, two random variables \(X\) and \(Y\) are independent if and only if
\[
f_{X,Y}(x,y) = f_X(x)f_Y(y).
\]
Also, if \(X\) and \(Y\) are independent, then \[ f_{X|Y = y}(x) = \qquad\qquad\qquad\qquad\qquad\qquad\qquad \]
4 Properties of Expected Value and Variance
Suppose that \(X\) and \(Y\) are random variables, and \(a\) and \(b\) are constants. Then the following hold:
- \(E[aX + b] =\)
- \(E[X + Y] =\)
- If \(X\) and \(Y\) are independent, then \(E[XY] =\)
- \(Var[b] =\)
- \(Var[aX + b] =\)
- If \(X\) and \(Y\) are independent, \(Var[X + Y] =\)
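After filling in the blanks, a Monte Carlo sanity check makes the identities concrete (the distributions and constants below are arbitrary choices of mine):

set.seed(400)
m <- 1e6
x <- rnorm(m, mean = 2, sd = 1)  # X ~ N(2, 1)
y <- rexp(m, rate = 1)           # Y ~ Exp(1), independent of X
a <- 3
b <- -1
mean(a * x + b)  # compare to a * E[X] + b
mean(x + y)      # compare to E[X] + E[Y]
mean(x * y)      # compare to E[X] * E[Y] (independence)
var(a * x + b)   # compare to a^2 * Var[X]
var(x + y)       # compare to Var[X] + Var[Y] (independence)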
5 Random Samples
6 R Tips
From here on in the course we will be dealing with a lot of randomness. In other words, running our code will return a random result.
But what about reproducibility??
When we generate “random” numbers in R, we are actually generating numbers that look random but are pseudo-random (not truly random). The vast majority of computer languages operate this way.
This means all is not lost for reproducibility!
Before running our code, we can fix the starting point (seed) of the pseudo-random number generator so that we can reproduce results.
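For example:

set.seed(400)  # fix the generator's starting point
rnorm(3)       # three pseudo-random draws
set.seed(400)  # reset to the same seed ...
rnorm(3)       # ... and get exactly the same three draws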
Speaking of generating numbers, we can generate numbers from named distributions in R (and also evaluate their densities, distribution functions, and quantile functions).
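R uses a consistent prefix convention for named distributions: d (density), p (distribution function), q (quantile function), and r (random generation). For the normal distribution, for example:

dnorm(1.96)   # density f(x) at x = 1.96
pnorm(1.96)   # cdf F(1.96) = P(X <= 1.96)
qnorm(0.975)  # quantile function F^{-1}(0.975)
rnorm(5)      # five random draws from N(0, 1)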
7 Limit Theorems
Motivation
For a new statistic, we may want to derive features of its sampling distribution.
When we can’t do this analytically, we need to use statistical computing methods to approximate them.
We will return to some basic theory to motivate and evaluate the computational methods to follow.
7.1 Laws of Large Numbers
Limit theorems describe the behavior of sequences of random variables as the sample size increases (\(n \rightarrow \infty\)).
Often we describe these limits in terms of how close the sequence is to the truth.
We can evaluate this distance in several ways.
Some modes of convergence –
Laws of large numbers –
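As a picture of what a law of large numbers promises, the running mean of iid draws settles down near the true mean. A quick sketch of my own with Unif(0, 1) draws (whose mean is 1/2):

library(ggplot2)
set.seed(400)
x <- runif(1e4)
df <- data.frame(n = seq_along(x), running_mean = cumsum(x) / seq_along(x))
ggplot(df) +
  geom_line(aes(n, running_mean)) +
  geom_hline(yintercept = 0.5, linetype = 2)  # the true mean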
7.2 Central Limit Theorem
Interpretation:
Note that the CLT doesn’t require the population distribution to be Normal.
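A quick simulation makes this concrete: standardized sample means from an Exp(1) population (far from Normal) already look close to N(0, 1). The sample size and number of replications here are arbitrary choices of mine.

library(ggplot2)
set.seed(400)
n <- 30
# (xbar - mu) / (sigma / sqrt(n)) with mu = sigma = 1 for Exp(1)
z <- replicate(10000, sqrt(n) * (mean(rexp(n, rate = 1)) - 1))
ggplot(data.frame(z = z)) +
  geom_density(aes(z)) +
  stat_function(fun = dnorm, linetype = 2)  # N(0, 1) reference density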
8 Estimates and Estimators
Let \(X_1, \dots, X_n\) be a random sample from a population.
Let \(T_n = T(X_1, \dots, X_n)\) be a function of the sample.
Statistics estimate parameters.
We need to be careful not to confuse the above ideas:

- \(\overline{X}_n\) is the estimator, a random variable (a function of the sample),
- \(\overline{x}_n\) is the estimate, the observed value of \(\overline{X}_n\) for a particular sample,
- \(\mu\) is the parameter, a fixed (unknown) feature of the population.
We can construct any number of estimators for a given quantity. How do we decide which one is “best”?
9 Evaluating Estimators
There are many ways we can describe how good or bad (evaluate) an estimator is.
9.1 Bias
9.2 Mean Squared Error (MSE)
Generally, we want estimators with
Sometimes an unbiased estimator \(\hat{\theta}_n\) can have a larger variance than a biased estimator \(\tilde{\theta}_n\).
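This trade-off is easiest to see through the standard decomposition of MSE (stated here for reference):

\[
MSE(\hat{\theta}_n) = E\left[(\hat{\theta}_n - \theta)^2\right] = Var[\hat{\theta}_n] + \left(Bias[\hat{\theta}_n]\right)^2,
\]

so a biased estimator with a sufficiently smaller variance can still have a smaller MSE.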
9.3 Standard Error
We seek estimators with small \(se(\hat{\theta}_n)\).
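For the sample mean, for instance, the usual plug-in estimate of the standard error is sd(x) / sqrt(n); as a one-line helper (my own, for illustration):

se_mean <- function(x) sd(x) / sqrt(length(x))  # estimated se of the sample mean
se_mean(rnorm(100))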
10 Comparing Estimators
We typically compare statistical estimators based on the following basic properties:
Example 10.1 Let us consider the efficiency of estimates of the center of a distribution. A measure of central tendency estimates the central or typical value for a probability distribution.
Mean and median are two measures of central tendency. Both are unbiased here; which is more efficient?

library(tidyverse)  # provides %>%, gather(), and ggplot()

set.seed(400)
times <- 10000 # number of times to make a sample
n <- 100 # size of the sample
uniform_results <- data.frame(mean = numeric(times), median = numeric(times))
normal_results <- data.frame(mean = numeric(times), median = numeric(times))
for (i in 1:times) {
  x <- runif(n)  # a sample from Unif(0, 1)
  y <- rnorm(n)  # a sample from N(0, 1)
  uniform_results[i, "mean"]   <- mean(x)
  uniform_results[i, "median"] <- median(x)
  normal_results[i, "mean"]    <- mean(y)
  normal_results[i, "median"]  <- median(y)
}
uniform_results %>%
  gather(statistic, value, everything()) %>%
  ggplot() +
  geom_density(aes(value, lty = statistic)) +
  ggtitle("Unif(0, 1)") +
  theme(legend.position = "bottom")
normal_results %>%
  gather(statistic, value, everything()) %>%
  ggplot() +
  geom_density(aes(value, lty = statistic)) +
  ggtitle("Normal(0, 1)") +
  theme(legend.position = "bottom")
Next up: in Ch. 5, we’ll look at a method that produces unbiased estimators of \(E(g(X))\)!