Monte Carlo methods may refer to any method in statistical inference or numerical analysis where simulation is used.
We have so far learned about Monte Carlo methods for estimation.
We will now look at Monte Carlo methods to estimate coverage probability for confidence intervals, Type I error of a test procedure, and power of a test.
In statistical inference there is uncertainty in an estimate. We will use repeated sampling (Monte Carlo methods) from a given probability model to investigate this uncertainty.
1 Monte Carlo Estimate of Coverage
1.1 Confidence Intervals
Recall from your intro stats class that a \(95\%\) confidence interval for \(\mu\) (when \(\sigma\) is known and \(X_1, \dots, X_n \stackrel{iid}{\sim} N(\mu, \sigma^2)\)) is of the form \[\left(\overline{x} - z_{1 - \frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}},\ \overline{x} + z_{1 - \frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right), \quad \text{with } \alpha = 0.05.\]
Interpretation: if we repeated the sampling procedure many times, approximately \(95\%\) of the intervals constructed this way would contain the true value of \(\mu\).
Comments:
Mathematical interpretation: \(P\left(\overline{X} - z_{1 - \frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} < \mu < \overline{X} + z_{1 - \frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha\).
In general, \(P(L < \theta < U) = 1 - \alpha\), where the endpoints \(L\) and \(U\) are statistics computed from the sample.
So, if we have formulas for \(L\) and \(U\), we can use Monte Carlo integration to estimate \(\alpha\).
An estimate of \(1 - \alpha\) tells us about the behavior of our estimator \([L, U]\) in practice.
1.2 Vocabulary
We say \(P(L < \theta < U) = P(\text{CI contains } \theta) = 1 - \alpha.\)
\(1 - \alpha =\) the nominal coverage: the confidence level we claim, chosen before seeing any data.
\(1 - \hat{\alpha} =\) the empirical coverage: the Monte Carlo estimate of the proportion of intervals that contain \(\theta\).
1.3 Algorithm
Let \(X \sim F_X\), and let \(\theta\) be the parameter of interest.
Consider a confidence interval for \(\theta\), \(C = [L, U]\).
Then, a Monte Carlo estimator of coverage can be obtained with the following algorithm:

1. Generate \(X_1, \dots, X_n \stackrel{iid}{\sim} F_X\).
2. Compute the confidence interval \([L_i, U_i]\) from the sample.
3. Record \(Y_i = 1\) if \(\theta \in [L_i, U_i]\) and \(Y_i = 0\) otherwise.
4. Repeat steps 1–3 for \(i = 1, \dots, m\). The estimated coverage is \(1 - \hat{\alpha} = \frac{1}{m}\sum_{i = 1}^m Y_i\).
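As an illustration, here is a minimal R sketch of these steps for the known-\(\sigma\) interval above (the choices \(\mu = 0\), \(\sigma = 1\), \(n = 5\), \(m = 1000\) match the Your Turn below and are otherwise arbitrary):

mu <- 0; sigma <- 1; n <- 5; m <- 1000; alpha <- 0.05
contains <- logical(m)
for (i in 1:m) {
  x <- rnorm(n, mu, sigma)                          # step 1: simulate from F_X
  L <- mean(x) - qnorm(1 - alpha/2)*sigma/sqrt(n)   # step 2: compute [L, U]
  U <- mean(x) + qnorm(1 - alpha/2)*sigma/sqrt(n)
  contains[i] <- (L < mu & mu < U)                  # step 3: does the CI contain theta?
}
mean(contains)  # step 4: empirical coverage, 1 - alpha-hat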
1.4 Motivation
Why do we want empirical and nominal coverage to match?
Your Turn
We want to examine empirical coverage for confidence intervals of the mean.
1. Coverage for the CI for \(\mu\) when \(\sigma\) is known, \(\left(\overline{x} - z_{1 - \frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}, \overline{x} + z_{1 - \frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right)\):
   a. Simulate \(X_1, \dots, X_n \stackrel{iid}{\sim} N(0, 1)\). Compute the empirical coverage for a \(95\%\) confidence interval for \(n = 5\) using \(m = 1000\) MC samples.
   b. Plot 100 confidence intervals using geom_segment() and add a line indicating the true value \(\mu = 0\). Color your intervals by whether or not they contain \(\mu\) (one possible skeleton is sketched below).
   c. Repeat the Monte Carlo estimate of coverage 100 times. Plot the distribution of the results. This is the Monte Carlo estimate of the distribution of the coverage.
2. a. Repeat part 1, but with \(\sigma\) unknown. Now plug in an estimate of \(\sigma\) (using sd()) when you compute the CI, still using the formula that assumes \(\sigma\) is known. What happens to the empirical coverage? What can we do to improve the coverage? Now increase \(n\). What happens to coverage?
   b. Repeat 2a when the data are distributed \(\text{Unif}[-1, 1]\) and the variance is unknown. What happens to the coverage? What can we do to improve coverage in this case and why?
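A possible skeleton for the plot in 1b (a sketch, not the only approach; it assumes \(\sigma = 1\) known with \(n = 5\), and the column names are hypothetical):

library(tidyverse)
m_plot <- 100
ci <- t(replicate(m_plot, mean(rnorm(5)) + c(-1, 1)*qnorm(0.975)/sqrt(5)))
ci_df <- data.frame(lower = ci[, 1], upper = ci[, 2]) %>%
  mutate(sample = 1:m_plot,
         contains_mu = lower < 0 & upper > 0)
ggplot(ci_df) +
  geom_segment(aes(x = lower, xend = upper, y = sample, yend = sample,
                   colour = contains_mu)) +
  geom_vline(xintercept = 0, lty = 2)  # true value mu = 0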
2 Monte Carlo Methods for Hypothesis Tests
There are two aspects of hypothesis tests that we will investigate through the use of Monte Carlo methods: Type I error and Power.
Example 2.1 Assume we want to test the following hypotheses
\[\begin{align*} H_0&: \mu = 5 \\ H_a&: \mu > 5 \end{align*}\]
with the test statistic
\[ T^* = \frac{\overline{x} - 5}{s/\sqrt{n}}. \]
This leads to the following decision rule: reject \(H_0\) if \(T^* > t_{n - 1, 1 - \alpha}\), the \(1 - \alpha\) quantile of the \(t\) distribution with \(n - 1\) degrees of freedom.

What are we assuming about \(X\)?
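For example, a sketch of carrying out this test once in R (the data x here are simulated placeholders, and \(\alpha = 0.05\) is an arbitrary choice):

x <- rnorm(20, mean = 5.5, sd = 1)        # placeholder data
n <- length(x)
t_star <- (mean(x) - 5)/(sd(x)/sqrt(n))   # test statistic T*
t_star > qt(0.95, df = n - 1)             # TRUE => reject H0 at alpha = 0.05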
2.1 Types of Errors
Type I error: rejecting \(H_0\) when \(H_0\) is actually true; \(\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})\).
Type II error: failing to reject \(H_0\) when \(H_0\) is actually false; \(\beta = P(\text{fail to reject } H_0 \mid H_0 \text{ false})\).
Usually we set \(\alpha = 0.05\) or \(0.10\), and choose a sample size such that power = \(1 - \beta \ge 0.80\).
For simple cases, we can find formulas for \(\alpha\) and \(\beta\).
2.2 MC Estimator of \(\alpha\)
Assume \(X_1, \dots, X_n \sim F(\theta_0)\) (i.e., assume \(H_0\) is true).
Then, we have the following hypothesis test:
\[\begin{align*} H_0&: \theta = \theta_0 \\ H_a&: \theta > \theta_0 \end{align*}\]
and the statistic \(T^*\) computed from the data. We reject \(H_0\) if \(T^*\) exceeds the critical value from the null distribution of the test statistic.
This leads to the following algorithm to estimate the Type I error of the test (\(\alpha\)):

1. Generate \(X_1, \dots, X_n \sim F(\theta_0)\) (i.e., simulate under \(H_0\)).
2. Compute the test statistic \(T^*_i\).
3. Record \(I_i = 1\) if \(H_0\) is rejected at level \(\alpha\) and \(I_i = 0\) otherwise.
4. Repeat steps 1–3 for \(i = 1, \dots, m\). Then \(\hat{\alpha} = \frac{1}{m}\sum_{i = 1}^m I_i\).
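A minimal sketch of this algorithm in R, using the one-sided t-test of Example 2.1 with \(N(5, 1)\) data so that \(H_0\) is true (m, n, and \(\alpha\) are arbitrary choices):

m <- 1000; n <- 20; alpha <- 0.05
reject <- logical(m)
for (i in 1:m) {
  x <- rnorm(n, mean = 5)                      # step 1: simulate under H0
  t_star <- (mean(x) - 5)/(sd(x)/sqrt(n))      # step 2: test statistic
  reject[i] <- t_star > qt(1 - alpha, n - 1)   # step 3: decision
}
mean(reject)  # step 4: alpha-hat, should be close to alpha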
Your Turn
Example 2.2 (Pearson’s moment coefficient of skewness) Let \(X \sim F\) where \(E(X) = \mu\) and \(Var(X) = \sigma^2\). Let \[ \sqrt{\beta_1} = E\left[\left(\frac{X - \mu}{\sigma}\right)^3\right]. \]
Then for a
- symmetric distribution, \(\sqrt{\beta_1} = 0\),
- positively skewed distribution, \(\sqrt{\beta_1} > 0\), and
- negatively skewed distribution, \(\sqrt{\beta_1} < 0\).
The following is an estimator for skewness
\[ \sqrt{b_1} = \frac{\dfrac{1}{n}\sum\limits_{i = 1}^n (X_i - \overline{X})^3}{\left[\dfrac{1}{n}\sum\limits_{i = 1}^n (X_i - \overline{X})^2\right]^{3/2}}. \]
Statistical theory shows that if \(X_1, \dots, X_n \sim N(\mu, \sigma^2)\), then as \(n \rightarrow \infty\), \[ \sqrt{b_1} \stackrel{\cdot}{\sim} N\left(0, \frac{6}{n}\right). \]
Thus we can test the following hypothesis \[\begin{align*} H_0&: \sqrt{\beta_1} = 0 \\ H_a&: \sqrt{\beta_1} \not= 0 \end{align*}\] by comparing \(\frac{\sqrt{b_1}}{\sqrt{\frac{6}{n}}}\) to a critical value from a \(N(0, 1)\) distribution.
In practice, convergence of \(\sqrt{b_1}\) to a \(N\left(0, \frac{6}{n}\right)\) distribution is slow. For finite samples, a better approximation to the variance is \(Var(\sqrt{b_1}) = \frac{6(n - 2)}{(n + 1)(n + 3)}\).
library(tidyverse)

# compare a symmetric and skewed distribution
data.frame(x = seq(0, 1, length.out = 1000)) %>%
  mutate(skewed = dbeta(x, 6, 2),
         symmetric = dbeta(x, 5, 5)) %>%
  gather(type, dsn, -x) %>%
  ggplot() +
  geom_line(aes(x, dsn, colour = type, lty = type))

## write a skewness function based on a sample x
skew <- function(x) {
  # one implementation (matches the skew() used in Section 2.4)
  xbar <- mean(x)
  mean((x - xbar)^3)/mean((x - xbar)^2)^1.5
}

## check skewness of some samples
n <- 100
## two skewed samples
a1 <- rbeta(n, 6, 2)
a2 <- rbeta(n, 2, 6)
## two symmetric samples
b1 <- rnorm(100)
b2 <- rnorm(100)

## fill in the skewness values
ggplot() + geom_histogram(aes(a1)) + xlab("Beta(6, 2)") + ggtitle(paste("Skewness =", round(skew(a1), 2)))
ggplot() + geom_histogram(aes(a2)) + xlab("Beta(2, 6)") + ggtitle(paste("Skewness =", round(skew(a2), 2)))
ggplot() + geom_histogram(aes(b1)) + xlab("N(0, 1)") + ggtitle(paste("Skewness =", round(skew(b1), 2)))
ggplot() + geom_histogram(aes(b2)) + xlab("N(0, 1)") + ggtitle(paste("Skewness =", round(skew(b2), 2)))

Assess the Type I error rate of a skewness test using the finite-sample corrected variance (one possible starting point is sketched below).
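A sketch of one way to begin; it reuses skew() from above, and the finite-sample variance formula is the one used again in Section 2.4:

m <- 1000; n <- 100; alpha <- 0.05
var_sqrt_b1 <- 6*(n - 2)/((n + 1)*(n + 3))              # finite-sample variance
crit_val <- qnorm(1 - alpha/2, 0, sqrt(var_sqrt_b1))    # two-sided critical value
reject <- replicate(m, abs(skew(rnorm(n))) > crit_val)  # normal data => H0 true
mean(reject)  # estimated Type I error, compare to alpha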
2.3 Power
Consider a hypothesis test about the parameter \(\theta\): \[\begin{align*} H_0: \theta &= \theta_0 \\ H_a: \theta &> \theta_0 \end{align*}\] We let \(\beta = P(\text{fail to reject } H_0 | H_0 \text{ is false}) = P(\text{Type II error})\), then Power = \(P(\text{reject } H_0 | H_0 \text{ is false}) = 1 - \beta\).
Power depends on the distance between the hypothesized value of the parameter \(\theta_0\) and the actual value \(\theta_1\), so we can write \(1 - \beta(\theta_1)\).
Why is power important?
For a few simple cases, you can derive a closed form expression of power.
So power is a function of the significance level \(\alpha\), the sample size \(n\), and the distance between the hypothesized value \(\theta_0\) and the true value \(\theta_1\).
2.4 MC Estimator of \(1 - \beta\)
Assume \(X_1, \dots, X_n \sim F(\theta_1)\) for some \(\theta_1 > \theta_0\) (i.e., assume \(H_a\) is true).
Then, we have the following hypothesis test:
\[\begin{align*} H_0&: \theta = \theta_0 \\ H_a&: \theta > \theta_0 \end{align*}\]
and the statistic \(T^*\) computed from the data. We reject \(H_0\) if \(T^*\) exceeds the critical value from the null distribution of the test statistic.
This leads to the following algorithm to estimate the power of the test (\(1 - \beta\)):

1. Generate \(X_1, \dots, X_n \sim F(\theta_1)\) (i.e., simulate under \(H_a\)).
2. Compute the test statistic \(T^*_i\).
3. Record \(I_i = 1\) if \(H_0\) is rejected at level \(\alpha\) and \(I_i = 0\) otherwise.
4. Repeat steps 1–3 for \(i = 1, \dots, m\). Then \(\widehat{1 - \beta}(\theta_1) = \frac{1}{m}\sum_{i = 1}^m I_i\).
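A minimal sketch in R, again using the t-test of Example 2.1, now generating data under a particular alternative \(\mu_1 = 5.5\) (an arbitrary choice; power will differ for other alternatives):

m <- 1000; n <- 20; alpha <- 0.05
mu1 <- 5.5                                     # a value under Ha
reject <- logical(m)
for (i in 1:m) {
  x <- rnorm(n, mean = mu1)                    # step 1: simulate under Ha
  t_star <- (mean(x) - 5)/(sd(x)/sqrt(n))      # step 2: test statistic
  reject[i] <- t_star > qt(1 - alpha, n - 1)   # step 3: decision
}
mean(reject)  # step 4: estimated power at theta_1 = 5.5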
Your Turn
Consider data generated from the following mixture distribution: \[ f(x) = (1 - \epsilon)f_1(x) + \epsilon f_2(x), \quad x \in \mathbb{R} \] where \(f_1\) is the pdf of a \(N(0, 1)\) distribution, \(f_2\) is the pdf of a \(N(0, 100)\) distribution, and \(\epsilon \in [0, 1]\).
r_noisy_normal <- function(n, epsilon) {
  # z = 1 indicates a draw from the N(0, 1) component, which occurs
  # with probability 1 - epsilon; otherwise draw from N(0, 100)
  z <- rbinom(n, 1, 1 - epsilon)
  z*rnorm(n, 0, 1) + (1 - z)*rnorm(n, 0, 10)
}

n <- 100
# compare samples at several contamination levels
data.frame(e = 0, sample = r_noisy_normal(n, 0)) %>%
  rbind(data.frame(e = 0.1, sample = r_noisy_normal(n, 0.1))) %>%
  rbind(data.frame(e = 0.6, sample = r_noisy_normal(n, 0.6))) %>%
  rbind(data.frame(e = 0.9, sample = r_noisy_normal(n, 0.9))) %>%
  ggplot() +
  geom_histogram(aes(sample)) +
  facet_wrap(.~e, scales = "free")
We will compare the power of various tests of normality. Let \(F_X\) be the distribution of a random variable \(X\). We will consider the following hypothesis test, \[ H_0: F_X \in N \qquad \text{vs.} \qquad H_a: F_X \not\in N, \] where \(N\) denotes the family of univariate Normal distributions.
Recall Pearson’s moment coefficient of skewness (see Example 2.2).
We will compare Monte Carlo estimates of power for different levels of contamination (\(0 \le \epsilon \le 1\)). We will use \(\alpha = 0.1\), \(n = 100\), and \(m = 100\).
# skewness statistic function
skew <- function(x) {
  xbar <- mean(x)
  num <- mean((x - xbar)^3)
  denom <- mean((x - xbar)^2)
  num/denom^1.5
}

# setup for MC
alpha <- .1
n <- 100
m <- 100
epsilon <- seq(0, 1, length.out = 200)
var_sqrt_b1 <- 6*(n - 2)/((n + 1)*(n + 3)) # adjusted variance for skewness test
crit_val <- qnorm(1 - alpha/2, 0, sqrt(var_sqrt_b1)) # crit value for the test
empirical_pwr <- rep(NA, length(epsilon)) # storage

# estimate power for each value of epsilon
for(j in 1:length(epsilon)) {
  # perform MC to estimate empirical power
  ## Your turn -- one possible implementation:
  test_stats <- replicate(m, skew(r_noisy_normal(n, epsilon[j])))
  empirical_pwr[j] <- mean(abs(test_stats) > crit_val)
}

## store empirical se
## Your Turn -- the binomial standard error of the MC rejection proportion:
empirical_se <- sqrt(empirical_pwr*(1 - empirical_pwr)/m)

## plot results --
## x axis = epsilon values
## y axis = empirical power
## use lines + add band of estimate +/- se

Compare the power with \(n = 100\) to the power with \(n = 10\). Make a plot to compare the two for many values of \(\epsilon\) (one approach is sketched below).
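One possible approach for the comparison (a sketch; est_power is a hypothetical helper that reuses skew(), r_noisy_normal(), and epsilon from above):

# hypothetical helper: MC power of the skewness test for one (n, epsilon) pair
est_power <- function(n, eps, m = 100, alpha = 0.1) {
  v <- 6*(n - 2)/((n + 1)*(n + 3))       # finite-sample variance
  cv <- qnorm(1 - alpha/2, 0, sqrt(v))   # two-sided critical value
  mean(replicate(m, abs(skew(r_noisy_normal(n, eps))) > cv))
}

data.frame(epsilon,
           `n = 100` = sapply(epsilon, function(e) est_power(100, e)),
           `n = 10`  = sapply(epsilon, function(e) est_power(10, e)),
           check.names = FALSE) %>%
  gather(n, power, -epsilon) %>%
  ggplot() +
  geom_line(aes(epsilon, power, colour = n))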