Monte Carlo integration is a statistical method that uses random sampling to approximate integrals. This section could alternatively be titled,
“Integrals are hard, how can we avoid doing them?”
1 A Tale of Two Approaches
Consider a one-dimensional integral, \(\int_a^b f(x)\, dx\).
The value of the integral can be derived analytically only for a few functions, \(f\). For the rest, numerical approximations are often useful.
Why is integration important to statistics?
1.1 Numerical Integration
Idea: Approximate \(\int_a^bf(x) dx\) via the sum of many polygons under the curve \(f(x)\).
To do this, we could partition the interval \([a, b]\) into \(m\) subintervals \([x_i, x_{i+1}]\) for \(i = 0, \dots, m - 1\) with \(x_0 = a\) and \(x_m = b\).
Within each subinterval \([x_i, x_{i+1}]\), insert \(k + 1\) nodes \(x^*_{ij}\), \(j = 0, \dots, k\); then \[ \int\limits_a^b f(x)dx = \sum\limits_{i = 0}^{m - 1}\int\limits_{x_i}^{x_{i + 1}} f(x) dx \approx \sum\limits_{i = 0}^{m - 1}\sum\limits_{j = 0}^{k}A_{ij}f(x_{ij}^*) \] for some set of constants, \(A_{ij}\).
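For illustration, here is a minimal R sketch of this idea using the trapezoidal rule, i.e. the special case \(k = 1\) with nodes at the endpoints of each subinterval and \(A_{ij}\) equal to half the subinterval width; the integrand \(e^{-x}\) is just an example choice.
trapezoid <- function(h, a, b, m = 100) {
  x <- seq(a, b, length.out = m + 1)        # x_0 = a, ..., x_m = b
  hx <- h(x)
  sum((hx[-1] + hx[-(m + 1)]) / 2 * diff(x)) # sum of trapezoid areas
}
trapezoid(function(x) exp(-x), 0, 1)          # exact value is 1 - exp(-1) = 0.6321206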
1.2 Monte Carlo Integration
How do we compute the mean of a distribution?
library(ggplot2)

# Uniform(a, b) density: constant at 1 / (b - a) on [a, b]
f <- function(x, a, b) rep(1 / (b - a), length(x))

# Uniform(0, 1): the density is 1 everywhere on [0, 1]
x <- seq(0, 1, length.out = 1000)
ggplot() +
  geom_line(aes(x, f(x, 0, 1))) +
  ylim(c(0, 1.5)) +
  ggtitle("Uniform(0, 1)")

# Uniform(10, 20): the density is 1/10 everywhere on [10, 20]
y <- seq(10, 20, length.out = 1000)
ggplot() +
  geom_line(aes(y, f(y, 10, 20))) +
  ylim(c(0, 1.5)) +
  ggtitle("Uniform(10, 20)")
Theory
1.2.1 Notation
\(\theta\)
\(\hat{\theta}\)
Distribution of \(\hat{\theta}\)
\(E[\hat{\theta}]\)
\(Var(\hat{\theta})\)
\(\hat{E}[{\hat{\theta}}]\)
\(\hat{Var}({\hat{\theta}})\)
\(se(\hat{\theta})\)
\(\hat{se}(\hat{\theta})\)
1.2.2 Monte Carlo Simulation
What is Monte Carlo simulation?
1.2.3 Monte Carlo Integration
To approximate \(\theta = E[X] = \int x f(x) dx\), we can obtain an iid random sample \(X_1, \dots, X_n\) from \(f\) and then approximate \(\theta\) via the sample average \[ \hat{\theta} = \bar{X} = \frac{1}{n}\sum\limits_{i = 1}^{n} X_i. \]
Now consider \(E[g(X)]\).
\[ \theta = E[g(X)] = \int\limits_{-\infty}^{\infty}g(x)f(x) dx. \]
The Monte Carlo approximation of \(\theta\) can then be obtained by sampling \(X_1, \dots, X_m\) iid from \(f\) and computing \[ \hat{\theta} = \frac{1}{m}\sum\limits_{i = 1}^{m} g(X_i). \]
Why the mean?
Let \(E[g(X)] = \theta\), then \[ E[\hat{\theta}] = E\left[\frac{1}{m}\sum\limits_{i = 1}^{m} g(X_i)\right] = \frac{1}{m}\sum\limits_{i = 1}^{m} E[g(X_i)] = \theta, \] so the estimator is unbiased, and, by the strong law of large numbers, \[ \hat{\theta} = \frac{1}{m}\sum\limits_{i = 1}^{m} g(X_i) \overset{a.s.}{\longrightarrow} \theta \quad \text{as } m \rightarrow \infty. \]
Monte Carlo integration converges slowly: even though the SLLN guarantees convergence, the error typically shrinks at rate \(1/\sqrt{m}\), so it may take many samples to get there.
But, Monte Carlo integration is a very powerful tool. While numerical integration methods are difficult to extend to multiple dimensions and work best with a smooth integrand, Monte Carlo does not suffer these weaknesses.
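For example, a minimal R sketch of the basic estimator, with the illustrative choices \(g(x) = x^2\) and \(X \sim N(0, 1)\) (so the true value is \(E[X^2] = 1\)):
set.seed(1)
m <- 10000
x <- rnorm(m)            # X_1, ..., X_m iid from f, here N(0, 1)
theta_hat <- mean(x^2)   # (1/m) * sum of g(X_i) with g(x) = x^2
theta_hat                # should be close to 1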
1.2.4 Algorithm
The approach to finding a Monte Carlo estimator for \(\int g(x)f(x) dx\) is as follows: sample \(X_1, \dots, X_m\) iid from the density \(f\), then compute \(\hat{\theta} = \frac{1}{m}\sum_{i = 1}^{m} g(X_i)\).
Another approach: to estimate \(\int_a^b g(x) dx\) when no density appears in the integrand, rewrite the integral as \((b - a)E[g(U)]\) with \(U \sim Uniform(a, b)\), sample \(U_1, \dots, U_m\) iid from \(Uniform(a, b)\), and compute \(\hat{\theta} = \frac{b - a}{m}\sum_{i = 1}^{m} g(U_i)\).
We will look at 3 methods to estimate \(\Phi(x)\), the standard Normal cdf, for \(x > 0\).
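As a sketch, here are two such Monte Carlo approaches for \(\Phi(1.5)\) (taking \(\Phi\) to be the standard Normal cdf; the value 1.5 is just an example):
set.seed(1)
x <- 1.5
m <- 10000

# (a) Sample Z ~ N(0, 1) and take the proportion of draws at or below x.
z <- rnorm(m)
mean(z <= x)

# (b) Write Phi(x) = 1/2 + integral_0^x dnorm(t) dt and estimate the integral
#     as x * E[dnorm(U)] with U ~ Uniform(0, x).
u <- runif(m, 0, x)
0.5 + x * mean(dnorm(u))

pnorm(x)   # compare with R's built-in value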
1.2.5 Inference for MC Estimators
The Central Limit Theorem implies that, for large \(m\), \(\hat{\theta}\) is approximately distributed as \(N\big(\theta, Var(\hat{\theta})\big)\).
So, we can construct approximate confidence intervals for our estimator of the form \[ \hat{\theta} \pm z_{1 - \alpha/2}\, \hat{se}(\hat{\theta}). \]
But we need to estimate \(Var(\hat{\theta})\). Since \(\hat{\theta}\) is the sample mean of \(g(X_1), \dots, g(X_m)\), \(Var(\hat{\theta}) = Var(g(X))/m\), which we can estimate by the sample variance of the \(g(X_i)\) divided by \(m\).
So, if \(m \uparrow\) then \(Var(\hat{\theta}) \downarrow\). How much does changing \(m\) matter?
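To see this in action, a sketch continuing the toy example \(g(x) = x^2\), \(X \sim N(0, 1)\), reporting the estimate with a 95% confidence interval for two values of \(m\):
set.seed(1)
mc_ci <- function(m) {
  gx <- rnorm(m)^2                  # g(X_i) = X_i^2
  theta_hat <- mean(gx)
  se_hat <- sd(gx) / sqrt(m)        # estimated standard error of the mean
  c(estimate = theta_hat,
    lower = theta_hat - 1.96 * se_hat,
    upper = theta_hat + 1.96 * se_hat)
}
mc_ci(100)
mc_ci(10000)   # 100x the samples gives roughly 1/10 the standard error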
Is there a better way to decrease the variance? Yes!
2 Importance Sampling
Can we do better than the simple Monte Carlo estimator of \[ \theta = E[g(X)] = \int g(x) f(x) dx \approx \frac{1}{m} \sum\limits_{i = 1}^m g(X_i) \] where the variables \(X_1, \dots, X_m\) are randomly sampled from \(f\)?
Yes!!
Goal: estimate integrals with lower variance than the simplest Monte Carlo approach.
To accomplish this, we will use importance sampling.
2.1 The Problem
If the event of interest occurs only rarely, then the naive Monte Carlo estimator will have large variance relative to the size of the quantity being estimated.
We want to improve precision by making rare events occur more frequently in the simulation than they would under the naive Monte Carlo sampling framework.
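To illustrate the problem, a sketch using the hypothetical rare event \(P(Z > 4)\) for \(Z \sim N(0, 1)\): almost every draw contributes 0, so the naive estimate is very imprecise.
set.seed(1)
m <- 10000
z <- rnorm(m)
mean(z > 4)                    # naive estimate; often exactly 0 for this m
pnorm(4, lower.tail = FALSE)   # true value, about 3.2e-05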
2.2 Algorithm
Consider a density function \(f(x)\) with support \(\mathcal{X}\). Consider the expectation of \(g(X)\), \[ \theta = E[g(X)] = \int_{\mathcal{X}}g(x)f(x)dx. \] Let \(\phi(x)\) be a density where \(\phi(x) > 0\) for all \(x \in \mathcal{X}\). Then the above statement can be rewritten as \[ \theta = \int_{\mathcal{X}}\frac{g(x)f(x)}{\phi(x)}\phi(x)dx = E_{\phi}\left[\frac{g(Y)f(Y)}{\phi(Y)}\right], \quad Y \sim \phi. \]
An estimator of \(\theta\) is given by the importance sampling algorithm: sample \(Y_1, \dots, Y_m\) iid from \(\phi\), then compute \[ \hat{\theta}_{IS} = \frac{1}{m}\sum\limits_{i = 1}^{m}\frac{g(Y_i)f(Y_i)}{\phi(Y_i)}. \]
For this strategy to be convenient, it must be easy to sample from \(\phi\) and easy to evaluate the weight \(f(x)/\phi(x)\) at the sampled points.
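A generic sketch of this estimator in R; the function importance_sample and its argument names are illustrative, and the \(t_5\) proposal in the example is just one possible choice of \(\phi\) for estimating \(E[X^2]\) with \(X \sim N(0, 1)\):
importance_sample <- function(g, dens_f, rand_phi, dens_phi, m = 10000) {
  y <- rand_phi(m)                        # Y_1, ..., Y_m iid from phi
  mean(g(y) * dens_f(y) / dens_phi(y))    # average of the weighted values
}
set.seed(1)
importance_sample(g = function(x) x^2,
                  dens_f = dnorm,
                  rand_phi = function(n) rt(n, df = 5),
                  dens_phi = function(y) dt(y, df = 5))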
2.3 Choosing \(\phi\)
In order for the estimators to avoid excessive variability, it is important that \(f(x)/\phi(x)\) is bounded and that \(\phi\) has heavier tails than \(f\).
A rare draw from \(\phi\) with much higher density under \(f\) than under \(\phi\) will receive a huge weight and inflate the variance of the estimate.
Strategy – choose a \(\phi\) that is easy to sample from, whose shape is close to that of \(g(x)f(x)\) on the region of interest, and whose tails are at least as heavy as those of \(f\).
The importance sampling estimator can be shown to converge to \(\theta\) under the SLLN so long as the support of \(\phi\) includes all of the support of \(f\).
2.4 Compare to Previous Monte Carlo Approach
Common goal – estimate \(\theta = \int g(x)f(x) dx = E[g(X)]\).
Step 1 Do some derivations.
Find an appropriate \(f\) and \(g\) to rewrite your integral as an expected value.
For importance sampling only,
Find an appropriate \(\phi\) to rewrite \(\theta\) as an expectation with respect to \(\phi\).
Step 2 Write pseudo-code (a plan) to define the estimator and set up the algorithm.
For Monte Carlo integration – sample \(X_1, \dots, X_m\) iid from \(f\) and compute \(\hat{\theta} = \frac{1}{m}\sum_{i = 1}^{m} g(X_i)\).
For importance sampling – sample \(Y_1, \dots, Y_m\) iid from \(\phi\) and compute \(\hat{\theta}_{IS} = \frac{1}{m}\sum_{i = 1}^{m}\frac{g(Y_i)f(Y_i)}{\phi(Y_i)}\).
Step 3 Program it.
2.5 Extended Example
In this example, we will estimate \(\theta = \int_0^1 \frac{e^{-x}}{1 + x^2} dx\) using MC integration and importance sampling with two different importance sampling distributions, \(\phi\).
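As a sketch of one possible implementation, using the illustrative choice \(\phi(x) = e^{-x}/(1 - e^{-1})\) on \((0, 1)\), which mimics the \(e^{-x}\) factor of the integrand:
set.seed(1)
m <- 10000
g <- function(x) exp(-x) / (1 + x^2)

# Plain Monte Carlo: f is the Uniform(0, 1) density, so theta = E[g(X)].
x <- runif(m)
mean(g(x))

# Importance sampling with phi(x) = exp(-x) / (1 - exp(-1)) on (0, 1),
# drawn via the inverse cdf.
u <- runif(m)
y <- -log(1 - u * (1 - exp(-1)))   # inverse-cdf draws from phi
phi_y <- exp(-y) / (1 - exp(-1))
mean(g(y) / phi_y)                 # f(y) = 1 on (0, 1), so the weight is 1 / phi(y)

integrate(g, 0, 1)$value           # deterministic check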
Your Turn
We want to use the following distribution for inference; we know the shape of the density, but not its full distributional form.
\[ f(x) = c\frac{\log(x)}{1 + x^2}, \quad x \in [1, \pi] \] What do we need for this to be a valid pdf?
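As a numerical sanity check (one possible sketch, not necessarily the intended solution): the shape \(\log(x)/(1 + x^2)\) is nonnegative on \([1, \pi]\), so a candidate value of \(c\) can be found by requiring the density to integrate to 1.
shape <- function(x) log(x) / (1 + x^2)
1 / integrate(shape, 1, pi)$value   # candidate value of c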