Particle Markov-chain Monte Carlo with {idmodelr}

Akira Endo

library(idmodelr)
#> Warning: package 'idmodelr' was built under R version 3.6.2

Introduction

This vignette introduces {idmodelr} users to a simple worked example of the particle Markov-chain Monte Carlo (PMCMC) algorithm. We take simple deterministic and stochastic epidemic models with partial observation as model cases and show how such models can be coupled with PMCMC for efficient parameter estimation.

Infectious disease modelling often involves handling time-series datasets with partial (or otherwise modified) observation. A familiar example of partial observation is underreporting, where only a certain fraction of infected individuals are identified and recorded in the dataset. One can account for the observation process by explicitly incorporating it into the model, but this approach leads to a particular class of model called a “hidden Markov process” (HMP), whose likelihood is usually intractable.

HMPs are typically represented as a Markov process with hidden inner states and its observation. Namely, the hidden variable \(x_t\) evolves over time according to the time-evolution process \(f\) and only \(y_t\), the observed variable subject to the observation process \(g\), is available as the model output.

\[\begin{matrix} x_t & \sim & f(x_t;x_{t-1}) \\ y_t &\sim &g(y_t;x_t) \end{matrix} \]

The likelihood of this HMP given the data \(\{y_1,y_2,...,y_T\}\) can in principle be obtained by integrating out the hidden variables \(x_t\); in practice, however, this integration is computationally intractable due to its high dimensionality.

\[ \mathcal L=\int\int...\int \left( p(x_1)g(y_1;x_1)\prod_{t=2}^T {f(x_t;x_{t-1})g(y_t;x_t)} \right) dx_1dx_2...dx_T. \] PMCMC employs sequential Monte Carlo (SMC) to approximate this likelihood for better efficiency. The posterior distributions of the \(x_t\) are represented by a set of samples (referred to as “particles”), which are sequentially sampled for \(t=1,2,...,T\). For each time step \(t\), SMC simulates the current states \(\{x_t\}\) based on the previous state samples \(\{x_{t-1}\}\) and weights them based on the stepwise likelihood \(g(y_t;x_t)\), so that the resulting samples provide a Monte Carlo integration. The model parameters are then explored using MCMC based on the approximated likelihood.
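Concretely, SMC exploits the factorisation of the likelihood into one-step-ahead predictive densities, each of which is approximated by the average over \(J\) particles: \[ \mathcal L=\prod_{t=1}^T p(y_t\mid y_{1:t-1}), \qquad p(y_t\mid y_{1:t-1})\approx\frac{1}{J}\sum_{j=1}^J g(y_t;x_t^{(j)}). \]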

For further details and formal introduction to the PMCMC method, see Endo et al. (2019).

Deterministic SIR model

First, we take the deterministic Susceptible-Infectious-Recovered (SIR) model as an example. Although PMCMC is not necessary for fitting the deterministic SIR model to data, because its likelihood can be computed directly, we introduce it here for comparison with the stochastic SIR model example later, to which PMCMC is better suited.

Model specifications

The SIR model is a simple compartmental model widely used in infectious disease epidemiology. The model categorises the population into three classes: \(S\) (not yet infected but at risk), \(I\) (infected and currently infectious) and \(R\) (previously infected but not infectious anymore), and characterises the transitions between the classes by a set of ordinary differential equations.

\[\begin{cases} \frac{d}{dt}S(t)=-\beta S(t)I(t) \\ \frac{d}{dt}I(t)=\beta S(t)I(t) - \gamma I(t) \\ \frac{d}{dt}R(t)=\gamma I(t) \end{cases} \]
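As an illustration, this ODE system can be solved numerically, for instance with the {deSolve} package (a minimal sketch; the parameter values are arbitrary and purely illustrative):

library(deSolve)

# Right-hand side of the SIR ODEs defined above
sir_ode <- function(t, state, parms) {
    with(as.list(c(state, parms)), {
        dS <- -beta * S * I
        dI <- beta * S * I - gamma * I
        dR <- gamma * I
        list(c(dS, dI, dR))
    })
}

# Illustrative values only: beta is the per-pair transmission rate
traj <- ode(y = c(S = 999, I = 1, R = 0), times = seq(0, 10, by = 0.1),
    func = sir_ode, parms = c(beta = 0.003, gamma = 0.5))
head(traj)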

The SIR model runs in continuous time, and we can obtain the state of each compartment at any given time \(t\).

However, we usually do not observe infections in continuous time: outbreak data are usually reported as incidence at discrete time steps (e.g. weekly case counts). Therefore, we need to assume a certain observation process when we fit the model to the observed data.

One option is to assume that the reported number of cases follows a Poisson distribution whose mean equals the ascertainment probability \(p\) times the cumulative number of new infections (i.e., the inflow into the \(I\) compartment) in the SIR model. Here we assume \(p\) is an unknown constant. Let \(C_n\) be the number of cases observed over the period between \(t=T_{n-1}\) and \(t=T_{n}\). We then get

\[ \begin{matrix} C_n & \sim & \operatorname{Pois}(pc_n), \\ c_n & = & \int_{T_{n-1}}^{T_{n}}\beta I(t)S(t)dt \\ & = & S(T_{n-1})-S(T_{n}). \end{matrix} \]

We can consider this system as an HMP. For discrete time steps \(n=1,2,...,N\), \(S(T_n)\), \(I(T_n)\) and \(R(T_n)\) constitute the (hidden) states of the HMP. The ODEs provide the time-evolution process and the Poisson distributions are the observation process.

Although we introduced PMCMC as an inference tool for HMPs, we do not need it for this deterministic setting, because the likelihood is directly available without particle approximation. Given the parameters \(\beta\), \(\gamma\) and \(p\), the likelihood of observing \(C_1, C_2, ..., C_N\) cases is computed as below.
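\[ \mathcal L(\beta,\gamma,p)=\prod_{n=1}^N \operatorname{Pois}(C_n;p\,c_n)=\prod_{n=1}^N\frac{(p\,c_n)^{C_n}e^{-p\,c_n}}{C_n!},\qquad c_n=S(T_{n-1})-S(T_{n}). \]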

Parameters can then be estimated using this likelihood, e.g., by maximum likelihood estimation or Bayesian posterior sampling.
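For example, a maximum likelihood estimate could be obtained with optim() (a minimal sketch, assuming a hypothetical function loglik(beta, gamma, p) that evaluates the log-likelihood above; a concrete stepwise implementation is given in the next section):

# Hypothetical helper loglik(beta, gamma, p): the log-likelihood defined above
negloglik <- function(par) -loglik(par["beta"], par["gamma"], par["p"])
fit <- optim(par = c(beta = 1, gamma = 1, p = 0.5), fn = negloglik,
    method = "L-BFGS-B", lower = c(1e-6, 1e-6, 1e-6), upper = c(Inf, Inf, 1))
fit$par  # maximum likelihood estimates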

Model fitting by MCMC

In this section, we show how the deterministic SIR model can be fitted to the observed data. For better comparison with the stochastic SIR model in the later section, we evaluate the likelihood incrementally over the time steps rather than in one shot. Here, getSIR(n) returns the hidden state variables at step \(n\) (i.e. \(S(T_n)\), \(I(T_n)\) and \(R(T_n)\)) and observe(C_n, c_n, p) gives the incremental likelihood of observing \(C_n\) cases given \(c_n\) and \(p\). Assuming improper uniform priors on the positive real line, MCMC is implemented targeting this likelihood function.

# Stepwise likelihood computation and MCMC
getSIR <- function(n, SIRtraj) {
    # Extract the hidden states S(T_n), I(T_n), R(T_n) from the trajectory
    t_n <- T_n[n + 1]
    SIR <- unlist(SIRtraj[t_n, ])
    return(list(S = SIR["S"], I = SIR["I"], R = SIR["R"]))
}
observe <- function(C_n, c_n, p) {
    # Incremental log-likelihood: C_n ~ Pois(p * c_n)
    logl <- dpois(C_n, c_n * p, log = TRUE)
    return(logl)
}

# MCMC implementation with {LaplacesDemon}
library(LaplacesDemon)
n_iter <- 5000
Data <- list(parm.names = c("beta", "tau", "p"), mon.names = "logl", N = 1000)
Model <- function(parms, Data) {
    # Map the unconstrained MCMC parameters onto their supports
    beta <- abs(parms[1])
    tau <- abs(parms[2])
    p <- interval(parms[3], 0, 1)
    # Simulate the deterministic SIR trajectory, then accumulate the
    # stepwise log-likelihood over the observation periods
    traj <- sim_outbreak(c(beta, tau), 1000, times)
    prevS <- unlist(traj[1, "S"])
    logl <- 0
    for (n in 1:10) {
        currS <- getSIR(n, traj)$S
        logl <- logl + observe(cases[n], prevS - currS, p)
        prevS <- currS
    }
    return(list(LP = logl, Dev = -2 * logl, Monitor = logl, yhat = NULL,
        parm = c(beta, tau, p)))
}
Covar <- matrix(c(0.5, 0.45, 0.03, 0.45, 0.45, 0.03, 0.03, 0.03, 0.003), 3)  # Specified ad-hoc: need to use adaptive methods etc. in practice
set.seed(2020)
chain <- LaplacesDemon(Model, Data, Initial.Values = c(1, 1, 0.5), Covar = Covar, 
    Iterations = n_iter, Status = 1000, Algorithm = "RWM")
#> 
#> Laplace's Demon was called on Wed Jan 15 18:58:42 2020
#> 
#> Performing initial checks...
#> Suggestion: 1 possible instance(s) of for loops
#>      were found in the Model specification. Iteration speed will
#>      increase if for loops are vectorized in R or coded in a
#>      faster language such as C++ via the Rcpp package.
#> Algorithm: Random-Walk Metropolis 
#> 
#> Laplace's Demon is beginning to update...
#> Iteration: 1000,   Proposal: Multivariate,   LP: -23.9
#> Iteration: 2000,   Proposal: Multivariate,   LP: -23.4
#> Iteration: 3000,   Proposal: Multivariate,   LP: -24.9
#> Iteration: 4000,   Proposal: Multivariate,   LP: -23.5
#> Iteration: 5000,   Proposal: Multivariate,   LP: -26.3
#> 
#> Assessing Stationarity
#> Assessing Thinning and ESS
#> Creating Summaries
#> Estimating Log of the Marginal Likelihood
#> Creating Output
#> 
#> Laplace's Demon has finished.
chain$Summary2
#>                Mean          SD       MCSE       ESS          LB      Median
#> beta       3.567631  0.58522096 0.09101409  93.48947   2.7805671   3.4446113
#> tau        1.666154  0.69820241 0.11052192  89.21031   0.7351573   1.5256705
#> p          0.340713  0.06092527 0.00939460  86.13116   0.2677050   0.3237893
#> Deviance  53.826001 78.62832926 4.83391214 343.17428  46.2839755  48.6346771
#> logl     -26.913000 39.31416463 2.41695607 343.17428 -28.9875397 -24.3173385
#>                   UB
#> beta       5.1835019
#> tau        3.5933746
#> p          0.5110684
#> Deviance  57.9750794
#> logl     -23.1419877
for (c in 1:3) plot(chain$Posterior1[, c], type = "l", lty = 1, col = c, xlab = "Iterations", 
    ylab = c("beta", "tau", "p")[c])

Because the time-evolution process is deterministic, the incremental likelihood is computed without random fluctuation: getSIR(n) returns fixed values, and therefore observe(C_n, c_n, p) returns a fixed incremental likelihood. On the other hand, the time evolution fluctuates randomly in the stochastic SIR model (detailed in the next section). A possible approach to accounting for this stochasticity is to use many samples and approximate the likelihood by averaging over them, which is the key idea of PMCMC.

Stochastic SIR model

Next, we show the PMCMC implementation for the stochastic SIR model, which constitutes a typical HMP with an intractable likelihood function.

Model specifications

We adopt the Gillespie algorithm to incorporate stochasticity into the SIR model. The three variables \(S\), \(I\) and \(R\) are now all integer-valued, and we assume that each individual randomly transitions between compartments according to the following per-capita rates:

\[\begin{matrix} r_{S\rightarrow I}&=&\beta I(t), \\ r_{I\rightarrow R}&=&\gamma. \end{matrix}\] Assuming that transitions of more than one individual do not happen at exactly the same time, we can characterise the time evolution of the stochastic SIR model as a random sequence of the two events:

  - “new infection”: \((S,I,R)\rightarrow(S-1,I+1,R)\);
  - “new recovery”: \((S,I,R)\rightarrow(S,I-1,R+1)\).

Because the number of individuals in each compartment stays constant between two successive events, the transition rates are also constant during that interval. Note that the next event is whichever happens first: “one of the \(S\) susceptible individuals gets infected” or “one of the \(I\) infectious individuals recovers”. The time between two successive events (where the first event occurs at time \(t_k\)) therefore follows an exponential distribution with rate \[r_{\mathrm{Next}}=r_{S\rightarrow I}S(t_k)+r_{I\rightarrow R}I(t_k)=\beta S(t_k)I(t_k)+\gamma I(t_k).\]

The Gillespie algorithm simulates the epidemic by the following steps (a minimal sketch in R follows the list). For complete program code examples, see epirecipes or the {idmodelr} package repository.

  1. Start with the initial condition \(S(0),I(0),R(0), t_0=0.\)
  2. For \(k=1,2,...,\) sample the time of the \(k\)-th event by drawing the waiting time from an exponential distribution: \[t_k=t_{k-1}+\Delta_k,\quad \Delta_k\sim \mathrm{Exp}(r_{\mathrm{Next}}).\]
  3. Randomly determine whether the \(k\)-th event is “new infection” or “new recovery” according to the relative ratio between the total rates: \[\begin{matrix} p(``\text{new infection}")&=& \frac{r_{S\rightarrow I}S(t_{k-1})}{r_{S\rightarrow I}S(t_{k-1})+r_{I\rightarrow R}I(t_{k-1})}, \\ p(``\text{new recovery}")&=& \frac{r_{I\rightarrow R}I(t_{k-1})}{r_{S\rightarrow I}S(t_{k-1})+r_{I\rightarrow R}I(t_{k-1})}. \end{matrix} \]
  4. Update the state variables and obtain \((S(t_k),I(t_k),R(t_k))\) according to the type of event.
  5. Repeat 2-4 until \(t_k\) exceeds the simulation stop time \(t_{\mathrm{end}}\).
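Below is a minimal sketch of these steps in base R (the parameter values are arbitrary and purely illustrative; see epirecipes or the {idmodelr} repository for complete implementations):

# Minimal Gillespie simulation of the stochastic SIR model (illustrative sketch)
gillespie_sir <- function(beta, gamma, S0, I0, R0, t_end) {
    t <- 0; S <- S0; I <- I0; R <- R0
    out <- data.frame(time = t, S = S, I = I, R = R)
    while (t < t_end && I > 0) {
        rate_inf <- beta * S * I  # total rate of "new infection" events
        rate_rec <- gamma * I  # total rate of "new recovery" events
        t <- t + rexp(1, rate = rate_inf + rate_rec)  # waiting time ~ Exp(r_Next)
        if (runif(1) < rate_inf / (rate_inf + rate_rec)) {
            S <- S - 1; I <- I + 1  # new infection
        } else {
            I <- I - 1; R <- R + 1  # new recovery
        }
        out <- rbind(out, data.frame(time = t, S = S, I = I, R = R))
    }
    return(out)
}
set.seed(1)
head(gillespie_sir(beta = 0.003, gamma = 0.5, S0 = 999, I0 = 1, R0 = 0, t_end = 10))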

The output of the Gillespie algorithm, \(\{S(t_k),I(t_k),R(t_k)\}\) for \(k=1,2,...\), can then be translated into a continuous-time epidemic time series. {idmodelr} supports epidemic simulations with the stochastic SIR model.

As we did for the deterministic SIR model, we can incorporate an observation process (underreporting) into the model. \(C_n\), the number of observed cases between time \(T_{n-1}\) and \(T_n\), is given as \[\begin{matrix} C_n&\sim&\operatorname{Bin}(c_n,p), \\ c_n&=&S(T_{n-1})-S(T_n), \end{matrix}\] where \(c_n\) is the number of “new infection” events occurring between time \(T_{n-1}\) and \(T_n\). The system consisting of the hidden states \(S(T_n), I(T_n), R(T_n)\) and \(c_n\) for discrete time steps \(n=1,2,...,N\) is an HMP. Unlike in the case of the deterministic SIR model, the likelihood function for the observed data \(\{C_1,C_2,...,C_N\}\) is intractable, and PMCMC needs to be employed to efficiently estimate the model parameters with the approximated likelihood.

Model fitting by PMCMC

We implement PMCMC to estimate the parameters from the observed data \(\{C_1,C_2,...,C_N\}\). The function simulateSIR_stoch(n) simulates the stochastic SIR model from \(t=T_{n-1}\) to \(T_n\) and observe_stoch(C_n, c_n, p) gives the incremental likelihood corresponding to the observation \(C_n\). Improper uniform priors on the positive real line are assumed.

The SMC part of PMCMC starts with a set of initial particles, all corresponding to the initial state \(S(0)=999, I(0)=1, R(0)=0\). For each time step \(n\), the time evolution of the stochastic SIR model is simulated for each particle. The particles are then weighted by the incremental likelihood and resampled to yield samples that approximate the total likelihood. In the MCMC part, the parameter values are updated by the Metropolis-Hastings algorithm based on the approximated likelihood.

PMCMC for the stochastic SIR model works in the following steps (a minimal sketch of the SMC part follows the list):

  1. Set the initial value for the parameters \(\theta_0=(\beta_0,\tau_0,p_0)\).
  2. For \(m=1,2,...\), draw proposed parameter values \(\theta_m\) from the proposal distribution, conditioned on the current values \(\theta_{m-1}\).
  3. Run SMC with the proposed parameters \(\theta_m\) to approximate the marginal likelihood \(\mathcal L(\theta_{m})\):
    1. Let \(x_n=(S_n,I_n,R_n)\). Initialise \(J\) particles \(\{x_0^{(j)}\}(j=1,2,...,J)\) as \(x_0^{(j)}=(S_0,I_0,R_0)=(999,1,0)\).
    2. For \(n=1,2,...,N\), draw the current states \(x_n^{(j)}\) by simulating the stochastic SIR model (characterised by \(\theta_m\)) from the corresponding previous state \(x_{n-1}^{(j)}\).
    3. Calculate the weights for each particle \(x_n^{(j)}\) proportionally to \(g(y_n;x_n^{(j)})\), that is, \(w_n^{(j)}=\frac{g(y_n;x_n^{(j)})}{\sum_{j=1}^J g(y_n;x_n^{(j)})}\) (resample the particles/trajectories if necessary).
    4. Repeat 2-3 for \(n=1,2,...,N\) to obtain the approximated marginal likelihood as the product of the particle averages of the stepwise likelihoods: \(\hat L(\theta_m)=\prod_{n=1}^N\frac{1}{J}\sum_{j=1}^J g(y_n;x_n^{(j)})\).
  4. Accept/reject the proposed parameters \(\theta_m\) via the Metropolis-Hastings rule, based on the comparison between the approximated likelihoods \(\hat L(\theta_m)\) and \(\hat L(\theta_{m-1})\).
  5. Repeat 2-4 until convergence.
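Below is a minimal sketch of the SMC part (step 3), assuming a hypothetical variant simulateSIR_stoch(state, theta, n) of the helper above that propagates a single particle from \(T_{n-1}\) to \(T_n\) and returns the updated state together with \(c_n\), the number of “new infection” events in the interval; theta is a named parameter vector c(beta, tau, p):

# Sketch of the SMC approximation of the marginal likelihood
smc_loglik <- function(theta, cases, J = 100) {
    # Initialise J particles at the initial state (S, I, R) = (999, 1, 0)
    particles <- replicate(J, list(S = 999, I = 1, R = 0), simplify = FALSE)
    logl <- 0
    for (n in seq_along(cases)) {
        # Propagate each particle through the stochastic SIR model
        sims <- lapply(particles, simulateSIR_stoch, theta = theta, n = n)
        # Unnormalised weights from the observation process C_n ~ Bin(c_n, p)
        w <- vapply(sims, function(s) dbinom(cases[n], s$c_n, theta["p"]),
            numeric(1))
        if (all(w == 0)) return(-Inf)  # no particle consistent with the data
        logl <- logl + log(mean(w))  # stepwise contribution to the likelihood
        # Resample particles/trajectories proportionally to their weights
        idx <- sample.int(J, J, replace = TRUE, prob = w)
        particles <- lapply(sims[idx], function(s) s[c("S", "I", "R")])
    }
    return(logl)  # approximated marginal log-likelihood
}

The returned estimate of \(\log \hat L(\theta)\) then replaces the exact log-likelihood in the Metropolis-Hastings acceptance step of the MCMC part.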