Monte Carlo integration is a statistical method that uses random sampling to approximate integrals. This section could alternatively be titled,

“Integrals are hard, how can we avoid doing them?”

https://xkcd.com/2117/

1 A Tale of Two Approaches

Consider a one-dimensional integral,

\[
\int\limits_a^b f(x) dx.
\]

The value of the integral can be derived analytically only for a few functions, \(f\). For the rest, numerical approximations are often useful.

Why is integration important to statistics?

Expectations, probabilities, and normalizing constants are all integrals: \(E[g(X)] = \int g(x) f(x) dx\), \(P(X \in A) = \int_A f(x) dx\), and a valid density must integrate to 1. Most quantities of interest in statistics therefore reduce to integrals, often ones with no closed form.

1.1 Numerical Integration

Idea: Approximate \(\int_a^bf(x) dx\) via the summed areas of many small polygons (e.g., rectangles or trapezoids) under the curve \(f(x)\).

To do this, we could partition the interval \([a, b]\) into \(m\) subintervals \([x_i, x_{i+1}]\) for \(i = 0, \dots, m - 1\) with \(x_0 = a\) and \(x_m = b\).

Within each subinterval, insert \(k + 1\) nodes: for \([x_i, x_{i+1}]\), let \(x^*_{ij}\), \(j = 0, \dots, k\), denote the nodes. Then \[ \int\limits_a^b f(x)dx = \sum\limits_{i = 0}^{m - 1}\int\limits_{x_i}^{x_{i + 1}} f(x) dx \approx \sum\limits_{i = 0}^{m - 1}\sum\limits_{j = 0}^{k}A_{ij}f(x_{ij}^*) \] for some set of constants, \(A_{ij}\), determined by the quadrature rule.
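
For instance, the composite trapezoidal rule is the case \(k = 1\) with nodes at the subinterval endpoints and weights \(A_{i0} = A_{i1} = (x_{i+1} - x_i)/2\). A minimal sketch in Python (the test function and number of subintervals are arbitrary choices):

```python
import numpy as np

def trapezoid(f, a, b, m):
    """Composite trapezoidal rule: k = 1 nodes per subinterval,
    weights A_i0 = A_i1 = (x_{i+1} - x_i) / 2 for equal widths."""
    x = np.linspace(a, b, m + 1)   # x_0 = a, ..., x_m = b
    fx = f(x)
    h = (b - a) / m                # common subinterval width
    return h * (fx[0] / 2 + fx[1:-1].sum() + fx[-1] / 2)

# Example: integrate x^2 on [0, 1]; exact value is 1/3
print(trapezoid(lambda x: x**2, 0.0, 1.0, m=1000))  # ~0.33333
```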

1.2 Monte Carlo Integration

How do we compute the mean of a distribution? If \(X\) has density \(f\), then \(E[X] = \int x f(x) dx\), which is itself an integral.

Example 1.1 Let \(X \sim Unif(0,1)\) and \(Y \sim Unif(10, 20)\).

Theory

By direct integration,
\[
E[X] = \int\limits_0^1 x \, dx = \frac{1}{2} \qquad \text{and} \qquad E[Y] = \int\limits_{10}^{20} \frac{y}{10} \, dy = 15.
\]

1.2.1 Notation

\(\theta\): the estimand, i.e., the true value of the integral or expectation we want to approximate.

\(\hat{\theta}\): an estimator of \(\theta\) computed from a random sample; it is itself a random variable.

Distribution of \(\hat{\theta}\): the sampling distribution of the estimator across repeated samples.

\(E[\hat{\theta}]\): the expected value of the estimator; if \(E[\hat{\theta}] = \theta\), the estimator is unbiased.

\(Var(\hat{\theta})\): the variance of the estimator across repeated samples.

\(\hat{E}[{\hat{\theta}}]\): an estimate of \(E[\hat{\theta}]\) computed from the observed sample.

\(\hat{Var}({\hat{\theta}})\): an estimate of \(Var(\hat{\theta})\) computed from the observed sample.

\(se(\hat{\theta})\): the standard error of the estimator, \(\sqrt{Var(\hat{\theta})}\).

\(\hat{se}(\hat{\theta})\): the estimated standard error, \(\sqrt{\hat{Var}(\hat{\theta})}\).


1.2.2 Monte Carlo Simulation

What is Monte Carlo simulation?

Monte Carlo simulation approximates a quantity of interest by drawing many random samples from a distribution, computing a summary of those draws, and using that summary as the estimate of the corresponding population quantity.

1.2.3 Monte Carlo Integration

To approximate \(\theta = E[X] = \int x f(x) dx\), we can obtain an iid random sample \(X_1, \dots, X_n\) from \(f\) and then approximate \(\theta\) via the sample average
\[
\hat{\theta} = \bar{X} = \frac{1}{n}\sum\limits_{i = 1}^n X_i.
\]

Example 1.2 Again, let \(X \sim Unif(0,1)\) and \(Y \sim Unif(10, 20)\). To estimate \(E[X]\) and \(E[Y]\) using a Monte Carlo approach, sample \(X_1, \dots, X_n \overset{iid}{\sim} Unif(0, 1)\) and \(Y_1, \dots, Y_n \overset{iid}{\sim} Unif(10, 20)\), then use the sample averages
\[
\hat{E}[X] = \frac{1}{n}\sum\limits_{i = 1}^n X_i \qquad \text{and} \qquad \hat{E}[Y] = \frac{1}{n}\sum\limits_{i = 1}^n Y_i.
\]
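
A minimal sketch of this in Python (the seed and \(n = 10^5\) are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 100_000                      # number of Monte Carlo samples

x = rng.uniform(0, 1, size=n)    # X_i ~ Unif(0, 1)
y = rng.uniform(10, 20, size=n)  # Y_i ~ Unif(10, 20)

print(x.mean())  # estimate of E[X]; theory gives 1/2
print(y.mean())  # estimate of E[Y]; theory gives 15
```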

Now consider \(E[g(X)]\).

\[ \theta = E[g(X)] = \int\limits_{-\infty}^{\infty}g(x)f(x) dx. \]

The Monte Carlo approximation of \(\theta\) could then be obtained by sampling \(X_1, \dots, X_m \overset{iid}{\sim} f\) and computing
\[
\hat{\theta} = \frac{1}{m}\sum\limits_{i = 1}^m g(X_i).
\]

Definition 1.1 Monte Carlo integration is the statistical estimation of the value of an integral using evaluations of an integrand at a set of points drawn randomly from a distribution with support over the range of integration.


Example 1.3







Why the mean?

Let \(E[g(X)] = \theta\), then the estimator \(\hat{\theta} = \frac{1}{m}\sum_{i=1}^m g(X_i)\) satisfies
\[
E[\hat{\theta}] = \frac{1}{m}\sum\limits_{i = 1}^m E[g(X_i)] = \theta,
\]
so \(\hat{\theta}\) is unbiased,

and, by the strong law of large numbers,
\[
\hat{\theta} \overset{a.s.}{\longrightarrow} \theta \quad \text{as } m \to \infty.
\]

Example 1.4 Let \(v(x) = (g(x) - \theta)^2\), where \(\theta = E[g(X)]\), and assume \(g(X)^2\) has finite expectation under \(f\). Then \[ Var(g(X)) = E[(g(X) - \theta)^2] = E[v(X)]. \] We can estimate this using a Monte Carlo approach.
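
A sketch of that Monte Carlo estimate in Python; the choices \(g(x) = x^2\) and \(f = Unif(0, 1)\) are purely illustrative, since the example leaves \(g\) and \(f\) general:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
m = 100_000

x = rng.uniform(0, 1, size=m)   # X_i ~ f = Unif(0, 1)
gx = x**2                       # illustrative g(x) = x^2

theta_hat = gx.mean()                     # estimates theta = E[g(X)] = 1/3
var_hat = ((gx - theta_hat)**2).mean()    # estimates Var(g(X)) = E[v(X)]
print(theta_hat, var_hat)                 # true values: 1/3 and 4/45 ~ 0.0889
```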

Monte Carlo integration converges slowly: even though the SLLN guarantees convergence, the error shrinks only at rate \(O(1/\sqrt{m})\), so it may take many samples to reach a desired precision.

But Monte Carlo integration is a very powerful tool. Numerical integration methods are difficult to extend to multiple dimensions and work best with smooth integrands; Monte Carlo integration suffers from neither weakness.



1.2.4 Algorithm

The approach to finding a Monte Carlo estimator for \(\int g(x)f(x) dx\) is as follows.

  1. Identify a density \(f\) and function \(g\) so that the integrand factors as \(g(x)f(x)\).

  2. Sample \(X_1, \dots, X_m \overset{iid}{\sim} f\).

  3. Compute \(\hat{\theta} = \frac{1}{m}\sum\limits_{i = 1}^m g(X_i)\).

Example 1.5 Estimate \(\theta = \int_0^1 h(x) dx\).

Example 1.6 Estimate \(\theta = \int_a^b h(x) dx\).

Rewrite the integral as an expectation with respect to \(Unif(a, b)\):
\[
\theta = (b - a)\int\limits_a^b h(x)\frac{1}{b - a} dx = (b - a)E[h(X)], \qquad X \sim Unif(a, b),
\]
so sample \(X_1, \dots, X_m \overset{iid}{\sim} Unif(a, b)\) and use \(\hat{\theta} = \frac{b - a}{m}\sum\limits_{i = 1}^m h(X_i)\). Example 1.5 is the special case \(a = 0, b = 1\); see the sketch below.
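
A minimal sketch in Python; \(h(x) = e^{-x}\) on \([1, 3]\) is an arbitrary test case with known answer \(e^{-1} - e^{-3} \approx 0.318\):

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def mc_integral(h, a, b, m=100_000):
    """Estimate int_a^b h(x) dx via (b - a) * mean(h(X_i)), X_i ~ Unif(a, b)."""
    x = rng.uniform(a, b, size=m)
    return (b - a) * h(x).mean()

# Test case: int_1^3 e^{-x} dx = e^{-1} - e^{-3} ~ 0.3181
print(mc_integral(lambda x: np.exp(-x), 1, 3))
```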

Another approach: substitute \(u = (x - a)/(b - a)\), so that \(\theta = (b - a)\int_0^1 h(a + (b - a)u)\, du\), reducing Example 1.6 to the \([0, 1]\) setting of Example 1.5.

Example 1.7 Monte Carlo integration for the standard Normal cdf. Let \(X \sim N(0, 1)\), then the pdf of \(X\) is \[\phi(x) = f(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right), \qquad -\infty < x < \infty\] and the cdf of \(X\) is \[\Phi(x) = F(x) = \int\limits_{-\infty}^x \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{t^2}{2}\right)dt.\]
We will look at 3 methods to estimate \(\Phi(x)\) for \(x > 0\).
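
The three methods are not spelled out here; as one natural candidate (an assumption, not necessarily one of the three), write \(\Phi(x) = \frac{1}{2} + \int_0^x \phi(t) dt = \frac{1}{2} + x\,E[\phi(U)]\) with \(U \sim Unif(0, x)\), and estimate the expectation by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(seed=4)

def phi_pdf(t):
    """Standard normal density."""
    return np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

def Phi_mc(x, m=100_000):
    """Estimate Phi(x) for x > 0 via Phi(x) = 1/2 + x * E[phi(U)], U ~ Unif(0, x)."""
    u = rng.uniform(0, x, size=m)
    return 0.5 + x * phi_pdf(u).mean()

print(Phi_mc(1.96))  # ~0.975
```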

1.2.5 Inference for MC Estimators

The Central Limit Theorem implies
\[
\frac{\hat{\theta} - \theta}{se(\hat{\theta})} \overset{d}{\longrightarrow} N(0, 1) \quad \text{as } m \to \infty,
\]
so for large \(m\), \(\hat{\theta}\) is approximately \(N\left(\theta, Var(g(X))/m\right)\).

So, we can construct confidence intervals for our estimator:
\[
\hat{\theta} \pm z_{1 - \alpha/2}\, \hat{se}(\hat{\theta}).
\]

But we need to estimate \(Var(\hat{\theta})\). Since \(Var(\hat{\theta}) = Var(g(X))/m\), a natural estimate is
\[
\hat{Var}(\hat{\theta}) = \frac{1}{m}\left[\frac{1}{m}\sum\limits_{i = 1}^m \left(g(X_i) - \hat{\theta}\right)^2\right], \qquad \hat{se}(\hat{\theta}) = \sqrt{\hat{Var}(\hat{\theta})}.
\]
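
A sketch of the resulting interval in Python, reusing the illustrative \(g(x) = x^2\), \(f = Unif(0, 1)\) setup (true \(\theta = 1/3\)):

```python
import numpy as np

rng = np.random.default_rng(seed=5)
m = 100_000

gx = rng.uniform(0, 1, size=m) ** 2   # g(X_i) with g(x) = x^2, X_i ~ Unif(0, 1)
theta_hat = gx.mean()
se_hat = gx.std() / np.sqrt(m)        # plug-in estimate of se(theta_hat)

z = 1.96                              # z_{0.975} for a 95% interval
print(theta_hat - z * se_hat, theta_hat + z * se_hat)  # should cover 1/3
```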

So, if \(m \uparrow\) then \(Var(\hat{\theta}) \downarrow\). How much does changing \(m\) matter?

Example 1.8 If the current \(se(\hat{\theta}) = 0.01\) based on \(m\) samples, how many more samples do we need to get \(se(\hat{\theta}) = 0.0001\)?

Since \(se(\hat{\theta}) = \sqrt{Var(g(X))/m} \propto 1/\sqrt{m}\), shrinking the standard error by a factor of \(100\) requires \(100^2 = 10{,}000\) times as many samples: \(10{,}000m\) total, or \(9{,}999m\) additional samples. Each additional significant digit of precision multiplies the required sample size by \(100\).

Is there a better way to decrease the variance? Yes!

2 Importance Sampling

Can we do better than the simple Monte Carlo estimator of \[ \theta = E[g(X)] = \int g(x) f(x) dx \approx \frac{1}{m} \sum\limits_{i = 1}^m g(X_i) \] where the variables \(X_1, \dots, X_m\) are randomly sampled from \(f\)?

Yes!!

Goal: estimate integrals with lower variance than the simplest Monte Carlo approach.


To accomplish this, we will use importance sampling.

2.1 The Problem

If we are estimating the probability of an event that occurs rarely, the naive Monte Carlo estimator will have high variance relative to the size of the quantity being estimated.

Example 2.1 Monte Carlo integration for the standard Normal cdf. Consider estimating \(\Phi(-3)\) or \(\Phi(3)\).

The naive estimator of \(\Phi(-3)\) is \(\hat{\theta} = \frac{1}{m}\sum_{i=1}^m I(X_i \le -3)\) with \(X_1, \dots, X_m \overset{iid}{\sim} N(0, 1)\). The true value is \(\Phi(-3) \approx 0.00135\), so only about 1 draw in 740 lands in the tail, and the standard error \(\sqrt{\theta(1 - \theta)/m}\) is large relative to \(\theta\) unless \(m\) is enormous.
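
A quick numerical sketch of the problem in Python (\(m = 10^4\) chosen to make the difficulty visible):

```python
import numpy as np

rng = np.random.default_rng(seed=6)
m = 10_000

x = rng.normal(size=m)          # X_i ~ N(0, 1)
hits = (x <= -3.0)              # indicator of the rare event
theta_hat = hits.mean()         # estimates Phi(-3) ~ 0.00135
se_hat = hits.std() / np.sqrt(m)

print(theta_hat, se_hat)        # se is roughly a quarter of theta itself
```
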
We want to improve accuracy by making rare events occur more frequently than they would under the naive Monte Carlo sampling framework, and then reweighting to undo the distortion, thereby enabling more precise estimation.

2.2 Algorithm

Consider a density function \(f(x)\) with support \(\mathcal{X}\). Consider the expectation of \(g(X)\), \[ \theta = E[g(X)] = \int_{\mathcal{X}}g(x)f(x)dx. \] Let \(\phi(x)\) be a density where \(\phi(x) > 0\) for all \(x \in \mathcal{X}\). Then the above statement can be rewritten as
\[
\theta = \int_{\mathcal{X}} g(x)\frac{f(x)}{\phi(x)}\phi(x) dx = E_\phi\left[g(X)\frac{f(X)}{\phi(X)}\right],
\]
an expectation with respect to \(\phi\) rather than \(f\).

An estimator of \(\theta\) is given by the importance sampling algorithm:

  1. Sample \(X_1, \dots, X_m \overset{iid}{\sim} \phi\).

  2. Compute the importance weights \(w(X_i) = f(X_i)/\phi(X_i)\).

  3. Set \(\hat{\theta} = \frac{1}{m}\sum\limits_{i = 1}^m g(X_i)\, w(X_i)\).

For this strategy to be convenient, it must be easy to sample from \(\phi\) and cheap to evaluate \(g(x)f(x)/\phi(x)\) at the sampled points.

Example 2.2 Suppose you have a fair six-sided die. We want to estimate the probability that a single die roll will yield a \(1\).
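
A sketch in Python: the target pmf \(f\) is the fair die and \(g(x) = I(x = 1)\), so \(\theta = 1/6\); the proposal \(\phi\) below, which loads extra mass on face 1, is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
m = 100_000

faces = np.arange(1, 7)
f = np.full(6, 1 / 6)                            # fair die: target pmf
phi = np.array([0.5, 0.1, 0.1, 0.1, 0.1, 0.1])   # proposal: face 1 oversampled

x = rng.choice(faces, size=m, p=phi)   # sample from phi, not f
w = f[x - 1] / phi[x - 1]              # importance weights f(x)/phi(x)
g = (x == 1).astype(float)             # g(x) = I(x = 1)

print((g * w).mean())                  # estimates P(X = 1) = 1/6 ~ 0.1667
```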

2.3 Choosing \(\phi\)

In order for the estimators to avoid excessive variability, it is important that \(f(x)/\phi(x)\) is bounded and that \(\phi\) has heavier tails than \(f\).

The variance-minimizing choice is \(\phi(x) \propto |g(x)| f(x)\), but sampling from it is generally as hard as the original problem, so in practice we choose a tractable \(\phi\) that roughly mimics the shape of \(|g(x)| f(x)\).

Example 2.3







Example 2.4







A rare draw from \(\phi\) with much higher density under \(f\) than under \(\phi\) will receive a huge weight and inflate the variance of the estimate.
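
A small illustration of this effect (a toy setup of my own, not from the notes): estimate \(E_f[X^2] = 1\) for \(f = N(0, 1)\) using normal proposals with heavier and lighter tails than \(f\):

```python
import numpy as np

rng = np.random.default_rng(seed=9)
m = 100_000

def is_estimate(sigma):
    """Importance sampling estimate of E_f[X^2] = 1, f = N(0,1), phi = N(0, sigma^2)."""
    y = rng.normal(0, sigma, size=m)
    w = sigma * np.exp(-y**2 / 2 + y**2 / (2 * sigma**2))  # weights f(y)/phi(y)
    vals = y**2 * w
    return vals.mean(), vals.std() / np.sqrt(m)

print(is_estimate(2.0))  # heavier-tailed phi: bounded weights, stable estimate
print(is_estimate(0.5))  # lighter-tailed phi: weights explode in the tails,
                         # the estimator has infinite variance and the se is unreliable
```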


Strategy – choose \(\phi\) with tails at least as heavy as those of \(f\) (so the weight \(f/\phi\) stays bounded) and with shape close to that of \(|g(x)|f(x)\) where the integrand contributes most.


Example 2.5

The importance sampling estimator can be shown to converge to \(\theta\) under the SLLN so long as the support of \(\phi\) includes all of the support of \(f\).

2.4 Compare to Previous Monte Carlo Approach

Common goal – estimate \(\theta = \int g(x) f(x) dx = E_f[g(X)]\) using averages of random draws.


Step 1 Do some derivations.

  1. Find an appropriate \(f\) and \(g\) to rewrite your integral as an expected value.

  2. For importance sampling only,

    Find an appropriate \(\phi\) to rewrite \(\theta\) as an expectation with respect to \(\phi\).



Step 2 Write pseudo-code (a plan) to define estimator and set-up the algorithm.

  • For Monte Carlo integration

    Sample \(X_1, \dots, X_m \overset{iid}{\sim} f\); return \(\hat{\theta} = \frac{1}{m}\sum\limits_{i = 1}^m g(X_i)\).

  • For importance sampling

    Sample \(X_1, \dots, X_m \overset{iid}{\sim} \phi\); return \(\hat{\theta} = \frac{1}{m}\sum\limits_{i = 1}^m g(X_i)\frac{f(X_i)}{\phi(X_i)}\).

Step 3 Program it.

2.5 Extended Example

In this example, we will estimate \(\theta = \int_0^1 \frac{e^{-x}}{1 + x^2} dx\) using MC integration and importance sampling with two different importance sampling distributions, \(\phi\).
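
A sketch of the comparison in Python; the single importance density used below, \(\phi(x) = e^{-x}/(1 - e^{-1})\) on \([0, 1]\), is an assumption for illustration (the example calls for two choices of \(\phi\), which are not specified here):

```python
import numpy as np

rng = np.random.default_rng(seed=8)
m = 100_000

integrand = lambda x: np.exp(-x) / (1 + x**2)

# Plain Monte Carlo: X ~ Unif(0, 1), theta_hat = mean(integrand(X))
x = rng.uniform(0, 1, size=m)
mc = integrand(x)

# Importance sampling with phi(x) = e^{-x} / (1 - e^{-1}) on [0, 1],
# sampled by inverting its cdf; the weighted values are integrand(y) / phi(y)
c = 1 - np.exp(-1)
y = -np.log(1 - c * rng.uniform(0, 1, size=m))   # Y ~ phi via inverse cdf
is_vals = c / (1 + y**2)                         # integrand(y) / phi(y)

print(mc.mean(), mc.std() / np.sqrt(m))          # ~0.5248, larger se
print(is_vals.mean(), is_vals.std() / np.sqrt(m))  # ~0.5248, smaller se
```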

Your Turn

We want to use the following distribution for inference; we know its shape, but not its normalizing constant.

\[ f(x) = c\frac{\log(x)}{1 + x^2}, \quad x \in [1, \pi] \] What do we need for this to be a valid pdf?