
@(@\newcommand{\R}[1]{ {\rm #1} } \newcommand{\B}[1]{ {\bf #1} } \newcommand{\W}[1]{ \; #1 \; }@)@@(@ \newcommand{\dtheta}[1]{ \frac{\R{d}}{\R{d} \theta_{ #1}} } @)@
Laplace Approximation for Mixed Effects Models

Reference
Total Likelihood
Random Likelihood, f(theta, u)
     Assumption
     Random Effects Objective
Fixed Likelihood, g(theta)
Optimal Random Effects, u^(theta)
Objective
     Laplace Approximation, h(theta, u)
     Laplace Objective, r(theta)
     Fixed Effects Objective, L(theta)
Derivative of Optimal Random Effects
Derivative of Random Constraints
Derivative of Laplace Objective
Approximate Optimal Random Effects
     First Order, U(beta, theta, u)
     Second Order, W(beta, theta, u)
Approximate Laplace Objective, H(beta, theta, u)
Approximate Random Constraint Function, B(beta, theta, u)
Hessian of Laplace Objective
Hessian of Random Constraints
Sparse Observed Information

Reference
TMB: Automatic Differentiation and Laplace Approximation, Kasper Kristensen, Anders Nielsen, Casper W. Berg, Hans Skaug, Bradley M. Bell, Journal of Statistical Software, 70, 1-21, April 2016.

Total Likelihood
The reference above defines @(@ f( \theta, u) @)@ to be the total negative log-likelihood of @(@ z @)@, @(@ y @)@, @(@ u @)@, and @(@ \theta @)@; i.e., @[@ - \log [ \; \B{p} ( y | \theta, u ) \B{p} ( u | \theta ) \; \B{p} ( z | \theta ) \B{p} ( \theta ) \; ] @]@

Random Likelihood, f(theta, u)
We use @(@ f( \theta , u ) @)@ for the part of the likelihood that depends on the random effects @(@ u @)@; @[@ f( \theta, u ) = - \log [ \B{p} ( y | \theta, u ) \B{p} ( u | \theta ) ] @]@
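
For concreteness, the sketches in this section use a hypothetical Poisson mixed model: @(@ y_i \sim \R{Poisson} [ \exp( \theta + u_i ) ] @)@ with @(@ u_i \sim \B{N}(0, 1) @)@ (so @(@ \B{p} ( u | \theta ) @)@ does not actually depend on @(@ \theta @)@ here). The data vector y and the function f below are illustrations, not part of cppad_mixed:

    import numpy as np
    from scipy.special import gammaln

    y = np.array([1.0, 3.0, 2.0])   # hypothetical count data

    def f(theta, u):
        # - log p(y | theta, u) for y_i ~ Poisson( exp(theta + u_i) )
        neg_log_p_y = np.sum(np.exp(theta + u) - y * (theta + u) + gammaln(y + 1.0))
        # - log p(u | theta) for u_i ~ N(0, 1)
        neg_log_p_u = 0.5 * np.sum(u * u) + 0.5 * u.size * np.log(2.0 * np.pi)
        return neg_log_p_y + neg_log_p_u

    print(f(0.5, np.zeros(3)))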

Assumption
The function @(@ f(\theta, u) @)@ is assumed to be smooth. Furthermore, there are no constraints on the value of @(@ u @)@.

Random Effects Objective
Given a value for the fixed effects @(@ \theta @)@, the random effects objective is the random likelihood viewed as a function of just the random effects; i.e., @(@ f( \theta , \cdot ) @)@.

Fixed Likelihood, g(theta)
We use @(@ g( \theta ) @)@ for the part of the likelihood that only depends on the fixed effects @(@ \theta @)@; @[@ g( \theta ) = - \log [ \B{p} ( z | \theta ) \B{p} ( \theta ) ] @]@ The function @(@ g( \theta ) @)@ may not be smooth; to be specific, it can contain absolute values (corresponding to Laplace densities). Furthermore, there may be constraints on the value of @(@ \theta @)@.

Optimal Random Effects, u^(theta)
Given the fixed effects @(@ \theta @)@, we use @(@ \hat{u} ( \theta ) @)@ to denote the random effects that minimize the random likelihood; i.e., @[@ \hat{u} ( \theta ) = \R{argmin} \; f( \theta, u ) \; \R{w.r.t.} \; u @]@ Note that this definition agrees with the other definition for u^(theta) .
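
Continuing the hypothetical Poisson sketch, the first order condition @(@ f_u ( \theta , u ) = 0 @)@ decouples by component, so each component of @(@ \hat{u} ( \theta ) @)@ can be computed to high accuracy with a scalar root finder; because @(@ f( \theta , \cdot ) @)@ is strictly convex in this example, the stationary point is the minimizer:

    import numpy as np
    from scipy.optimize import brentq

    y = np.array([1.0, 3.0, 2.0])

    def u_hat(theta):
        # solve f_u(theta, u)_i = exp(theta + u_i) - y_i + u_i = 0 for each i
        return np.array([
            brentq(lambda ui: np.exp(theta + ui) - yi + ui, -20.0, 20.0)
            for yi in y
        ])

    print(u_hat(0.5))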

Objective

Laplace Approximation, h(theta, u)
Using the notation above, the Laplace approximation as a function of both the fixed and random effects is @[@ h( \theta, u ) = + \frac{1}{2} \log \det f_{u,u} ( \theta, u ) + f( \theta, u ) - \frac{n}{2} \log ( 2 \pi ) @]@ where @(@ n @)@ is the number of random effects.
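
In the Poisson sketch, @(@ f_{u,u} ( \theta , u ) @)@ is the diagonal matrix with entries @(@ \exp( \theta + u_i ) + 1 @)@, so the log-determinant in @(@ h( \theta , u ) @)@ reduces to a sum of logs (f repeats the earlier hypothetical definition):

    import numpy as np
    from scipy.special import gammaln

    y = np.array([1.0, 3.0, 2.0])

    def f(theta, u):
        return (np.sum(np.exp(theta + u) - y * (theta + u) + gammaln(y + 1.0))
                + 0.5 * np.sum(u * u) + 0.5 * u.size * np.log(2.0 * np.pi))

    def h(theta, u):
        # f_{u,u}(theta, u) is diagonal here, so log det is a sum of logs
        log_det = np.sum(np.log(np.exp(theta + u) + 1.0))
        return 0.5 * log_det + f(theta, u) - 0.5 * u.size * np.log(2.0 * np.pi)

    print(h(0.5, np.zeros(3)))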

Laplace Objective, r(theta)
We refer to @[@ r( \theta ) = h[ \theta , \hat{u} ( \theta ) ] \approx - \log \left[ \int_{-\infty}^{+\infty} \B{p} ( y | \theta, u ) \B{p} ( u | \theta ) \; \B{d} u \right] @]@ as the Laplace objective. This corresponds to equation (4) in the reference .
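
Because the integral above factors over observations in the Poisson sketch, @(@ r( \theta ) @)@ can be checked against one-dimensional numerical integration; f, h, and the root-finding computation of @(@ \hat{u} ( \theta ) @)@ repeat the earlier hypothetical definitions:

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import brentq
    from scipy.special import gammaln

    y = np.array([1.0, 3.0, 2.0])
    theta = 0.5

    def f(theta, u):
        return (np.sum(np.exp(theta + u) - y * (theta + u) + gammaln(y + 1.0))
                + 0.5 * np.sum(u * u) + 0.5 * u.size * np.log(2.0 * np.pi))

    def h(theta, u):
        log_det = np.sum(np.log(np.exp(theta + u) + 1.0))
        return 0.5 * log_det + f(theta, u) - 0.5 * u.size * np.log(2.0 * np.pi)

    u_opt = np.array([
        brentq(lambda ui: np.exp(theta + ui) - yi + ui, -20.0, 20.0) for yi in y
    ])
    r_laplace = h(theta, u_opt)

    def integrand(ui, yi):
        # p(y_i | theta, u_i) * p(u_i | theta)
        p_y = np.exp(yi * (theta + ui) - np.exp(theta + ui) - gammaln(yi + 1.0))
        p_u = np.exp(-0.5 * ui * ui) / np.sqrt(2.0 * np.pi)
        return p_y * p_u

    r_exact = -np.sum([
        np.log(quad(integrand, -10.0, 10.0, args=(yi,))[0]) for yi in y
    ])
    print(r_laplace, r_exact)   # close, but not equal: Laplace is an approximation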

Fixed Effects Objective, L(theta)
The fixed effects objective, as a function of just the fixed effects, is @[@ L ( \theta ) = r( \theta ) + g( \theta ) @]@

Derivative of Optimal Random Effects
Because @(@ f(\theta, u) @)@ is smooth, and @(@ \hat{u} ( \theta ) @)@ is optimal w.r.t @(@ u @)@, we obtain @[@ f_u [ \theta , \hat{u} ( \theta ) ] = 0 @]@ From this equation, and the implicit function theorem, it follows that @[@ \hat{u}_\theta ( \theta ) = - f_{u,u} \left[ \theta , \hat{u} ( \theta ) \right]^{-1} f_{u,\theta} \left[ \theta , \hat{u} ( \theta ) \right] @]@
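A numerical check of this formula in the Poisson sketch, where @(@ f_{u,u} @)@ is diagonal with entries @(@ \exp( \theta + \hat{u}_i ) + 1 @)@ and @(@ f_{u,\theta} @)@ has entries @(@ \exp( \theta + \hat{u}_i ) @)@ (all names are hypothetical):

    import numpy as np
    from scipy.optimize import brentq

    y = np.array([1.0, 3.0, 2.0])

    def u_hat(theta):
        return np.array([
            brentq(lambda ui: np.exp(theta + ui) - yi + ui, -20.0, 20.0,
                   xtol=1e-14)
            for yi in y
        ])

    theta = 0.5
    s = np.exp(theta + u_hat(theta))
    implicit = -s / (s + 1.0)   # - f_{u,u}^{-1} f_{u,theta} for the diagonal case
    eps = 1e-6
    finite_diff = (u_hat(theta + eps) - u_hat(theta - eps)) / (2.0 * eps)
    print(implicit)
    print(finite_diff)          # agrees with implicit to high accuracy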

Derivative of Random Constraints
The derivative of the random constraint function is given by @[@ \partial_\theta [ A \; \hat{u} ( \theta ) ] = A \; \hat{u}_\theta ( \theta ) @]@

Derivative of Laplace Objective
The derivative of the random part of the objective is given by @[@ r_\theta ( \theta ) = h_\theta [ \theta , \hat{u} ( \theta ) ] + h_u [ \theta , \hat{u} ( \theta ) ] \hat{u}_\theta ( \theta ) @]@ Thus the derivative of @(@ r ( \theta ) @)@ can be computed using the derivative of @(@ \hat{u} ( \theta ) @)@ and the partials of @(@ h( \theta , u ) @)@. Let @(@ \partial_k @)@ denote the partial with respect to the k-th component of the combined vector @(@ ( \theta , u ) @)@. @[@ \partial_k [ h( \theta , u ) ] = \partial_k [ f( \theta , u ) ] + \frac{1}{2} \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} f_{u,u} ( \theta , u )_{i,j}^{-1} \partial_k [ f_{u,u} ( \theta , u)_{i,j} ] @]@ where @(@ n @)@ is the number of random effects. Note that @(@ f_{u,u} ( \theta , u ) @)@ is often sparse and only non-zero components need be included in the summation. This is discussed in more detail near equation (8) in the reference . We also note that if @(@ k @)@ corresponds to a component of @(@ u @)@ then @[@ \partial_k ( f[ \theta , \hat{u} ( \theta ) ] ) = 0 @]@
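For the Poisson sketch, taking @(@ k @)@ to correspond to @(@ \theta @)@, the formula above gives @(@ h_\theta = f_\theta + \frac{1}{2} \sum_i s_i / ( s_i + 1 ) @)@ where @(@ s_i = \exp( \theta + u_i ) @)@; the sketch checks this against a finite difference of h (all names hypothetical):

    import numpy as np
    from scipy.special import gammaln

    y = np.array([1.0, 3.0, 2.0])

    def h(theta, u):
        f = (np.sum(np.exp(theta + u) - y * (theta + u) + gammaln(y + 1.0))
             + 0.5 * np.sum(u * u) + 0.5 * u.size * np.log(2.0 * np.pi))
        log_det = np.sum(np.log(np.exp(theta + u) + 1.0))
        return 0.5 * log_det + f - 0.5 * u.size * np.log(2.0 * np.pi)

    theta, u = 0.5, np.array([0.1, -0.2, 0.3])
    s = np.exp(theta + u)
    # partial_theta f + (1/2) sum_i (f_{u,u}^{-1})_{ii} partial_theta (f_{u,u})_{ii}
    partial = np.sum(s - y) + 0.5 * np.sum(s / (s + 1.0))
    eps = 1e-6
    finite_diff = (h(theta + eps, u) - h(theta - eps, u)) / (2.0 * eps)
    print(partial, finite_diff)   # the two agree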

Approximate Optimal Random Effects

First Order, U(beta, theta, u)
We define the function @[@ U ( \beta , \theta , u ) = u - f_{u,u} ( \theta , u )^{-1} f_u ( \beta , u ) @]@ It follows that @[@ U \left[ \theta , \theta , \hat{u} ( \theta ) \right] = \hat{u} ( \theta ) @]@ @[@ U_{\beta} [ \theta , \theta , \hat{u} ( \theta ) ] = \hat{u}_\theta ( \theta ) @]@
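
In the Poisson sketch, @(@ U @)@ is one Newton step for @(@ u @)@ that evaluates the gradient at @(@ ( \beta , u ) @)@ while holding the Hessian at @(@ ( \theta , u ) @)@; both displayed properties can be checked numerically (all names hypothetical):

    import numpy as np
    from scipy.optimize import brentq

    y = np.array([1.0, 3.0, 2.0])

    def f_u(beta, u):
        return np.exp(beta + u) - y + u       # gradient of f with respect to u

    def U(beta, theta, u):
        hess_diag = np.exp(theta + u) + 1.0   # f_{u,u}(theta, u) is diagonal
        return u - f_u(beta, u) / hess_diag

    def u_hat(theta):
        return np.array([
            brentq(lambda ui: np.exp(theta + ui) - yi + ui, -20.0, 20.0,
                   xtol=1e-14)
            for yi in y
        ])

    theta, eps = 0.5, 1e-6
    uh = u_hat(theta)
    print(U(theta, theta, uh) - uh)   # zero: U[theta, theta, u^(theta)] = u^(theta)
    U_beta = (U(theta + eps, theta, uh) - U(theta - eps, theta, uh)) / (2.0 * eps)
    uh_theta = (u_hat(theta + eps) - u_hat(theta - eps)) / (2.0 * eps)
    print(U_beta - uh_theta)          # near zero: U_beta = u^_theta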

Second Order, W(beta, theta, u)
We define the function @[@ W ( \beta , \theta , u ) = U( \beta , \theta , u ) - f_{u,u} ( \theta , u )^{-1} f_u [ \beta , U( \beta , \theta , u) ] @]@ It follows that @[@ W \left[ \theta , \theta , \hat{u} ( \theta ) \right] = \hat{u} ( \theta ) @]@ @[@ W_{\beta} [ \theta , \theta , \hat{u} ( \theta ) ] = \hat{u}_\theta ( \theta ) @]@ and for random effects indices @(@ i @)@, @[@ W^i_{\beta \beta} [ \theta , \theta , \hat{u} ( \theta ) ] = \hat{u}^i_{\theta , \theta} ( \theta ) @]@
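
The extra Newton step in @(@ W @)@ buys a second order match: in the sketch below the second derivative of @(@ W @)@ with respect to @(@ \beta @)@ agrees with the second derivative of @(@ \hat{u} @)@ with respect to @(@ \theta @)@, checked by finite differences (all names hypothetical):

    import numpy as np
    from scipy.optimize import brentq

    y = np.array([1.0, 3.0, 2.0])

    def f_u(beta, u):
        return np.exp(beta + u) - y + u

    def W(beta, theta, u):
        hess_diag = np.exp(theta + u) + 1.0
        u1 = u - f_u(beta, u) / hess_diag     # U(beta, theta, u)
        return u1 - f_u(beta, u1) / hess_diag

    def u_hat(theta):
        return np.array([
            brentq(lambda ui: np.exp(theta + ui) - yi + ui, -20.0, 20.0,
                   xtol=1e-14)
            for yi in y
        ])

    theta, eps = 0.5, 1e-4
    uh = u_hat(theta)
    W_bb = (W(theta + eps, theta, uh) - 2.0 * W(theta, theta, uh)
            + W(theta - eps, theta, uh)) / eps**2
    uh_tt = (u_hat(theta + eps) - 2.0 * u_hat(theta) + u_hat(theta - eps)) / eps**2
    print(W_bb)
    print(uh_tt)   # the two agree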

Approximate Laplace Objective, H(beta, theta, u)
Given these facts we define @[@ H( \beta , \theta , u) = + \frac{1}{2} \log \det f_{u,u} [ \beta, W( \beta , \theta , u) ] + f[ \beta, U( \beta , \theta , u) ] - \frac{n}{2} \log ( 2 \pi ) @]@ It follows that @[@ r_{\theta,\theta} ( \theta ) = H_{\beta,\beta} \left[ \theta , \theta , \hat{u} ( \theta ) \right] @]@
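
Putting the sketch's pieces together: with a scalar @(@ \theta @)@, a second central difference of @(@ r @)@ in @(@ \theta @)@ can be compared against a second central difference of @(@ H @)@ in @(@ \beta @)@ at @(@ [ \theta , \theta , \hat{u} ( \theta ) ] @)@ (all names hypothetical):

    import numpy as np
    from scipy.optimize import brentq
    from scipy.special import gammaln

    y = np.array([1.0, 3.0, 2.0])

    def f(beta, u):
        return (np.sum(np.exp(beta + u) - y * (beta + u) + gammaln(y + 1.0))
                + 0.5 * np.sum(u * u) + 0.5 * u.size * np.log(2.0 * np.pi))

    def f_u(beta, u):
        return np.exp(beta + u) - y + u

    def u_hat(theta):
        return np.array([
            brentq(lambda ui: np.exp(theta + ui) - yi + ui, -20.0, 20.0,
                   xtol=1e-14)
            for yi in y
        ])

    def H(beta, theta, u):
        hess_diag = np.exp(theta + u) + 1.0
        U = u - f_u(beta, u) / hess_diag
        W = U - f_u(beta, U) / hess_diag
        log_det = np.sum(np.log(np.exp(beta + W) + 1.0))   # det f_{u,u}(beta, W)
        return 0.5 * log_det + f(beta, U) - 0.5 * u.size * np.log(2.0 * np.pi)

    def r(theta):
        u = u_hat(theta)
        return H(theta, theta, u)   # U = W = u at the optimum, so this is h(theta, u)

    theta, eps = 0.5, 1e-4
    uh = u_hat(theta)
    H_bb = (H(theta + eps, theta, uh) - 2.0 * H(theta, theta, uh)
            + H(theta - eps, theta, uh)) / eps**2
    r_tt = (r(theta + eps) - 2.0 * r(theta) + r(theta - eps)) / eps**2
    print(H_bb, r_tt)   # the two agree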

Approximate Random Constraint Function, B(beta, theta, u)
We also define the approximate random constraint function @[@ B( \beta , \theta , u) = A \; W( \beta , \theta , u ) @]@

Hessian of Laplace Objective
Note that the Hessian of the Laplace objective @(@ r_{\theta,\theta} ( \theta ) @)@ is required when quasi_fixed is false. In this case, the representation @[@ r_{\theta,\theta} ( \theta ) = H_{\beta,\beta} \left[ \theta , \theta , \hat{u} ( \theta ) \right] @]@ is used to compute this Hessian.

Hessian of Random Constraints
In the case where quasi_fixed is false, we also need second derivatives of the random constraint function. We use @(@ A^i @)@ ( @(@ B^i @)@ ) to denote the i-th row of the random constraint matrix (of the approximate random constraint function). The Hessian of the random constraints can be computed using the formula @[@ \partial_\theta \partial_\theta [ A^i \; \hat{u} ( \theta ) ] = B^i_{\beta,\beta} \left[ \theta , \theta , \hat{u} ( \theta ) \right] @]@

Sparse Observed Information
Suppose that @(@ H @)@ is a sparse positive definite Hessian of a likelihood at the maximum likelihood estimate for its unknown parameters. The corresponding asymptotic posterior distribution for the parameters is normal with covariance @(@ H^{-1} @)@. A vector @(@ v @)@ with this covariance can be simulated as @[@ v = R w @]@ where @(@ R @)@ is defined by @(@ H^{-1} = R R^\R{T} @)@ and @(@ w @)@ is a normal random vector with mean zero and identity covariance. Suppose we have a sparse factorization of the form @[@ L D L^\R{T} = P H P^\R{T} @]@ where @(@ L @)@ is lower triangular, @(@ D @)@ is diagonal, and @(@ P @)@ is a permutation matrix. It follows that @[@ H = P^\R{T} L D L^\R{T} P @]@ @[@ H^{-1} = P^\R{T} L^{-\R{T}} D^{-1} L^{-1} P @]@ @[@ R = P^\R{T} L^{-\R{T}} D^{-1/2} @]@ @[@ v = P^\R{T} L^{-\R{T}} D^{-1/2} w @]@ If @(@ w @)@ is simulated as a normal random vector with mean zero and identity covariance, and @(@ v @)@ is computed using this formula, then @(@ v @)@ has mean zero and covariance @[@ \B{E}[ v v^\R{T} ] = H^{-1} @]@
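
Below is a dense numpy sketch of this simulation formula that takes @(@ P = I @)@ and derives @(@ L @)@ and @(@ D @)@ from a Cholesky factor; a real implementation would use a sparse @(@ L D L^\R{T} @)@ factorization with a fill-reducing permutation @(@ P @)@, and the matrix H below is a made-up example:

    import numpy as np
    from scipy.linalg import solve_triangular

    rng = np.random.default_rng(0)
    H = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])   # hypothetical positive definite Hessian

    C = np.linalg.cholesky(H)         # H = C C^T
    D_sqrt = np.diag(C).copy()        # square root of the diagonal of D
    L = C / D_sqrt                    # unit lower triangular with L D L^T = H

    def simulate_v():
        w = rng.standard_normal(3)
        # v = L^{-T} D^{-1/2} w, i.e. solve L^T v = D^{-1/2} w
        return solve_triangular(L.T, w / D_sqrt, lower=False)

    samples = np.array([simulate_v() for _ in range(200000)])
    print(np.cov(samples.T))          # approaches H^{-1}
    print(np.linalg.inv(H))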
Input File: omh/theory.omh