Statisticians (and other users of data) need to simulate data for many reasons.

For example, I simulate as a way to check whether a model is appropriate: if data generated from the model look similar to the observed data, that is one indication that the model may be a good one.

It is also sometimes useful to simulate data from a distribution when I need to estimate an expected value (approximate an integral).

R can already generate data from many (named) distributions:

set.seed(400) #reproducibility

rnorm(10) # 10 observations of a N(0,1) r.v.
##  [1] -1.0365488  0.6152833  1.4729326 -0.6826873 -0.6018386 -1.3526097
##  [7]  0.8607387  0.7203705  0.1078532 -0.5745512
rnorm(10, 0, 5) # 10 observations of a N(0,5^2) r.v.
##  [1] -4.5092359  0.4464354 -7.9689786 -0.4342956 -5.8546081  2.7596877
##  [7] -3.2762745 -2.1184014  2.8218477 -5.0927654
rexp(10) # 10 observations from an Exp(1) r.v.
##  [1] 0.67720831 0.04377997 5.38745038 0.48773005 1.18690322 0.92734297
##  [7] 0.33936255 0.99803323 0.27831305 0.94257810

But what about when we don’t have a function to do it?

1 Inverse Transform Method

Theorem 1.1 (Probability Integral Transform) If \(X\) is a continuous r.v. with cdf \(F_X\), then \(U = F_X(X) \sim \text{Uniform}[0, 1]\).



This leads to the following method for simulating data.




Inverse Transform Method:

First, generate \(u\) from Uniform\([0,1]\). Then, \(x = F_X^{-1}(u)\) is a realization from \(F_X\).


Note: this method requires that we can compute \(F_X^{-1}\), either in closed form or numerically.


1.1 Algorithm

  1. Derive the inverse function \(F_X^{-1}\).

  2. Write a function to compute \(x = F_X^{-1}(u)\).

  3. For each realization, generate \(u\) from Unif\((0,1)\) and compute \(x = F_X^{-1}(u)\).





Example 1.1 Simulate a random sample of size \(1000\) from the pdf \(f_X(x) = 3x^2, 0 \le x \le 1\).
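For this pdf, \(F_X(x) = \int_0^x 3t^2 \, dt = x^3\) for \(0 \le x \le 1\), so \(F_X^{-1}(u) = u^{1/3}\). A minimal sketch of the algorithm in R:

set.seed(400)
u <- runif(1000)      # generate u from Uniform[0, 1]
x <- u^(1/3)          # x = F_X^{-1}(u) is a realization from F_X
hist(x, freq = FALSE) # compare the sample to the target density
curve(3 * x^2, from = 0, to = 1, add = TRUE)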












1.2 Discrete RVs

If \(X\) is a discrete random variable and \(\cdots < x_{i-1} < x_i < \cdots\) are the points of discontinuity of \(F_X(x)\), then the inverse transform is \(F_X^{-1}(u) = x_i\) where \(F_X(x_{i-1}) < u \le F_X(x_i)\). This leads to the following algorithm:

  1. Generate a r.v. \(U\) from Unif\((0,1)\).

  2. Select \(x_i\) where \(F_X(x_{i-1}) < U \le F_X(x_{i})\).

Example 1.2 Generate 1000 samples from the following discrete distribution.
x      1.0   2.0   3.0
f(x)   0.1   0.2   0.7
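One possible implementation of the algorithm above (findInterval does the cdf lookup; left.open = TRUE matches the strict inequality \(F_X(x_{i-1}) < u\)):

set.seed(400)
x_vals <- c(1, 2, 3)
probs <- c(0.1, 0.2, 0.7)
cdf <- cumsum(probs)  # F = (0.1, 0.3, 1.0)
u <- runif(1000)      # step 1: U ~ Unif(0, 1)
x <- x_vals[findInterval(u, cdf, left.open = TRUE) + 1] # step 2
table(x) / 1000       # relative frequencies should be near (0.1, 0.2, 0.7)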

2 Acceptance-Reject Method

The goal is to generate realizations from a target density, \(f\).

Most cdfs cannot be inverted in closed form.

The Acceptance-Reject (or “Accept-Reject”) method samples from a distribution that is similar to \(f\) and then adjusts by accepting only a certain proportion of those samples.



The method is outlined below:

Let \(g\) denote another density from which we know how to sample and for which we can easily calculate \(g(x)\).

Let \(e(\cdot)\) denote an envelope, having the property \(e(x) = c g(x) \ge f(x)\) for all \(x \in \mathcal{X} = \{x:f(x) > 0\}\) for a given constant \(c \ge 1\).

The Accept-Reject method then follows by sampling \(Y \sim g\) and \(U \sim \text{Unif}(0,1)\).

If \(U < f(Y)/e(Y)\), accept \(Y\). Set \(X = Y\) and consider \(X\) to be an element of the target random sample.

Note: \(1/c\) is the expected proportion of candidates that are accepted.



2.1 Algorithm

  1. Find a suitable density \(g\) and envelope \(e\).

  2. Sample \(Y \sim g\).

  3. Sample \(U \sim \text{Unif}(0, 1)\).

  4. If \(U < f(Y)/e(Y)\), accept \(Y\).

  5. Repeat from Step 2 until you have generated your desired sample size.
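As a sketch, consider again the target \(f(x) = 3x^2\) on \([0, 1]\) from Example 1.1, with candidate \(g = \text{Unif}(0,1)\) and \(c = 3\) (my choice of envelope; any \(c \ge \sup_x f(x)/g(x)\) works):

set.seed(400)
f <- function(x) 3 * x^2 # target density on [0, 1]
c_env <- 3               # envelope constant: e(x) = c g(x) >= f(x)
n <- 1000
x <- numeric(n)
n_tries <- 0
i <- 0
while (i < n) {
  y <- runif(1)          # step 2: Y ~ g
  u <- runif(1)          # step 3: U ~ Unif(0, 1)
  n_tries <- n_tries + 1
  if (u < f(y) / (c_env * dunif(y))) { # step 4: accept if U < f(Y)/e(Y)
    i <- i + 1
    x[i] <- y
  }
}
n / n_tries              # observed acceptance rate, should be near 1/c = 1/3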




2.2 Why does this work?

Recall that we require \[ cg(y) \ge f(y) \qquad \forall y \in \{y: f(y) > 0\}. \] Thus, \[ 0 \le \frac{f(y)}{e(y)} = \frac{f(y)}{cg(y)} \le 1, \] so the acceptance criterion in Step 4 is a valid probability.






The larger the ratio \(\frac{f(y)}{cg(y)}\), the more the random variable \(Y\) looks like a random variable distributed with pdf \(f\) and the more likely \(Y\) is to be accepted.

2.3 Additional Resources

See pp. 69-70 of Rizzo (Statistical Computing with R) for a proof of the validity of the method.

3 Transformation Methods

We have already used one transformation method (the inverse transform method), but there are many other transformations we can apply to random variables. For example:

  1. If \(Z \sim N(0, 1)\), then \(V = Z^2 \sim \chi^2_1\).

  2. If \(U \sim \chi^2_{m}\) and \(V \sim \chi^2_{n}\) are independent, then \(F = \frac{U/m}{V/n} \sim F_{m, n}\).

  3. If \(Z \sim N(0, 1)\) and \(V \sim \chi^2_{n}\) are independent, then \(T = \frac{Z}{\sqrt{V/n}} \sim t_{n}\).

  4. If \(U \sim \text{Gamma}(r, \lambda)\) and \(V \sim \text{Gamma}(s, \lambda)\) are independent, then \(X = \frac{U}{U + V} \sim \text{Beta}(r, s)\).


Definition 3.1 A transformation is any function of one or more random variables.

Sometimes we want to transform random variables if observed data don’t fit a model that might otherwise be appropriate. Sometimes we want to perform inference about a new statistic.

Example 3.1 If \(X_1, \dots, X_n \stackrel{iid}{\sim} \text{Bernoulli}(p)\), what is the distribution of \(\sum_{i = 1}^n X_i\)?



Example 3.2 If \(X\sim N(0,1)\), what is the distribution of \(X + 5\)?



Example 3.3 For \(X_1, \dots, X_n\) iid random variables, what is the distribution of the median of \(X_1, \dots, X_n\)? What is the distribution of the \(i\)th order statistic, \(X_{(i)}\)?



There are many approaches to deriving the pdf of a transformed variable.

But the theory isn’t always available. What can we do?

3.1 Algorithm

Let \(X_1, \dots, X_p\) be a set of independent random variables with pdfs \(f_{X_1}, \dots, f_{X_p}\), respectively, and let \(g(X_1, \dots, X_p)\) be some transformation we are interested in simulating from.

  1. Simulate \(X_1 \sim f_{X_1}, \dots, X_p \sim f_{X_p}\).

  2. Compute \(G = g(X_1, \dots, X_p)\). This is one draw from the distribution of \(g(X_1, \dots, X_p)\).

  3. Repeat Steps 1-2 many times to simulate from the target distribution.



Example 3.4 It is possible to show for \(X_1, \dots, X_p \stackrel{iid}{\sim} N(0,1)\), \(Z = \sum_{i = 1}^p X_i^2 \sim \chi^2_p\). Imagine that we cannot use the rchisq function. How would you simulate \(Z\)?
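One possible sketch, with \(p = 4\) chosen for illustration:

set.seed(400)
p <- 4                              # degrees of freedom (illustrative choice)
n <- 1000
z <- matrix(rnorm(n * p), nrow = n) # each row holds p iid N(0, 1) draws
z_sq <- rowSums(z^2)                # each row sum is one draw of Z ~ chi^2_p
mean(z_sq)                          # should be near p
var(z_sq)                           # should be near 2p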


4 Mixture Distributions

The faithful dataset in R contains data on eruptions of the Old Faithful geyser in Yellowstone National Park.

##   eruptions waiting
## 1     3.600      79
## 2     1.800      54
## 3     3.333      74
## 4     2.283      62
## 5     4.533      85
## 6     2.883      55

What is the shape of these distributions?
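A quick look (a sketch; both histograms are bimodal, which suggests a mixture of two components):

par(mfrow = c(1, 2))
hist(faithful$eruptions, breaks = 20, xlab = "minutes", main = "eruptions")
hist(faithful$waiting, breaks = 20, xlab = "minutes", main = "waiting")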



Definition 4.1 A random variable \(Y\) is a discrete mixture if the distribution of \(Y\) is a weighted sum \(F_Y(y) = \sum \theta_i F_{X_i}(y)\) for some sequence of random variables \(X_1, X_2, \dots\) and \(\theta_i > 0\) such that \(\sum \theta_i = 1\).

For \(2\) r.v.s, \[ F_Y(y) = \theta F_{X_1}(y) + (1 - \theta) F_{X_2}(y), \qquad 0 < \theta < 1. \]

Example 4.1

4.1 Mixtures vs. Sums

Note that mixture distributions are not the same as the distribution of a sum of r.v.s.

Example 4.2 Let \(X_1 \sim N(0, 1)\) and \(X_2 \sim N(4, 1)\), independent.

\(S = \frac{1}{2}(X_1 + X_2) \sim N\left(2, \tfrac{1}{2}\right)\), since a linear combination of independent normals is normal. The density of \(S\) is unimodal.









\(Z\) such that \(f_Z(z) = 0.5f_{X_1}(z) + 0.5f_{X_2}(z)\) is instead bimodal, with modes near \(0\) and \(4\).

What about \(f_Z(z) = 0.7f_{X_1}(z) + 0.3f_{X_2}(z)\)?
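A quick simulation contrasting the two (a sketch; change the 0.5 in rbinom to 0.7 to get the second mixture):

set.seed(400)
n <- 1000
s <- 0.5 * (rnorm(n, 0, 1) + rnorm(n, 4, 1))       # sum: S ~ N(2, 1/2), unimodal
b <- rbinom(n, 1, 0.5)                             # mixture: component indicator
z <- b * rnorm(n, 0, 1) + (1 - b) * rnorm(n, 4, 1) # 50/50 mixture, bimodal
par(mfrow = c(1, 2))
hist(s, breaks = 30, main = "Sum")
hist(z, breaks = 30, main = "Mixture")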

4.2 Models for Count Data (refresher)

Recall that the Poisson\((\lambda)\) distribution is useful for modeling count data.

\[ f(x) = \frac{\lambda^x \exp\{-\lambda\}}{x!}, \quad x = 0, 1, 2, \dots \] where \(X\) is the number of events occurring in a fixed period of time or space.

When the mean \(\lambda\) is low, the data consist mostly of low values (i.e. \(0, 1, 2\), etc.), with higher values occurring less frequently.

As the mean count increases, the skewness goes away and the distribution becomes approximately normal.

With the Poisson distribution, \[E[X] = \text{Var}(X) = \lambda.\]
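A quick way to see the shape change (the \(\lambda\) values are illustrative):

set.seed(400)
par(mfrow = c(1, 2))
hist(rpois(1000, lambda = 2), breaks = 20, main = "Poisson(2)")   # right-skewed
hist(rpois(1000, lambda = 50), breaks = 20, main = "Poisson(50)") # approx. normal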

Example 4.3





Example 4.4 The Colorado Division of Parks and Wildlife has hired you to analyze their data on the number of fish caught in Horsetooth Reservoir by visitors. Each visitor was asked:

  - How long did you stay?
  - How many fish did you catch?
  - Other questions: how many people were in your group, were there children in your group, etc.

Some visitors do not fish, but there are no data on whether or not a visitor fished. Some visitors who did fish did not catch any fish.

Note: this example is modified from https://stats.idre.ucla.edu/r/dae/zip/.

A zero-inflated model assumes that the zero observations have two different origins: structural zeros (e.g. visitors who did not fish) and sampling zeros (e.g. visitors who fished but caught nothing).

Example 4.5









A zero-inflated model is a mixture model because the distribution is a weighted average of the sampling model (i.e. Poisson) and a point-mass at \(0\).

For \(Y \sim \text{ZIP}(\pi, \lambda)\),

\[ Y \sim \begin{cases} 0 & \text{with probability } \pi \\ \text{Poisson}(\lambda) & \text{with probability } 1-\pi \end{cases} \] So that,

\[ Y = (1 - B)\,X, \qquad B \sim \text{Bernoulli}(\pi) \text{ and } X \sim \text{Poisson}(\lambda) \text{ independent}. \]

To simulate from this distribution, we can draw the indicator \(B\) and the count \(X\) separately and combine them.
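A minimal sketch in R (the values \(\pi = 0.3\) and \(\lambda = 2\) are illustrative):

set.seed(400)
n <- 1000
pi0 <- 0.3                      # structural-zero probability (illustrative)
lambda <- 2                     # Poisson mean (illustrative)
b <- rbinom(n, 1, pi0)          # B = 1 flags a structural zero
y <- (1 - b) * rpois(n, lambda) # Y = (1 - B) X
mean(y == 0)                    # near pi0 + (1 - pi0) * exp(-lambda)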