
@(@\newcommand{\B}[1]{ {\bf #1} } \newcommand{\R}[1]{ {\rm #1} } \newcommand{\W}[1]{ \; #1 \; }@)@
Simulating Posterior Distribution for Model Variables

Purpose
Lemma 1
     Proof
     Remark
Lemma 2
     Remark
     Proof
Simulation

Purpose
The sample command with method equal to simulate fits both the simulated random measurements in the data_sim_table and the simulated random priors in the prior_sim_table . The lemmas on this page prove that, in the linear Gaussian case, this procedure yields the proper statistics for the posterior distribution of the maximum likelihood estimate. Note that the dismod_at case is neither linear nor are all of its statistics Gaussian.

Lemma 1
Suppose we are given a matrix @(@ A \in \B{R}^{m \times n} @)@, a positive definite matrix @(@ V \in \B{R}^{m \times m} @)@, and a vector @(@ y \in \B{R}^{m \times 1} @)@. Further suppose that there is an unknown vector @(@ \bar{\theta} \in \B{R}^{n \times 1} @)@ such that @(@ y \sim \B{N} ( A \bar{\theta} , V ) @)@. The maximum likelihood estimator @(@ \hat{\theta} @)@ for @(@ \bar{\theta} @)@ given @(@ y @)@ has mean @(@ \B{E} [ \hat{\theta} ] = \bar{\theta} @)@ and variance @[@ \B{E} [ ( \hat{\theta} - \bar{\theta} ) ( \hat{\theta} - \bar{\theta} )^\R{T} ] = ( A^\R{T} V^{-1} A )^{-1} = - \; \left( \frac{ \partial^2 } { \partial \theta^2 } \log \B{p} ( y | \theta ) \right)^{-1} @]@

Proof
The probability density for @(@ y @)@ given @(@ \theta @)@ is @[@ \B{p} ( y | \theta ) = \det ( 2 \pi V )^{-1/2} \exp \left[ - \frac{1}{2} ( y - A \theta )^\R{T} V^{-1} ( y - A \theta ) \right] @]@ Dropping the determinant term, because it does not depend on @(@ \theta @)@, and taking the negative log, we obtain the objective @[@ f ( \theta ) = \frac{1}{2} ( y - A \theta )^\R{T} V^{-1} ( y - A \theta ) @]@ Maximizing the likelihood is therefore equivalent to minimizing @(@ f ( \theta ) @)@ with respect to @(@ \theta \in \B{R}^{n \times 1} @)@. The derivative @(@ f^{(1)} ( \theta ) \in \B{R}^{1 \times n} @)@ is given by @[@ f^{(1)} ( \theta ) = - ( y - A \theta )^\R{T} V^{-1} A @]@ It follows that @[@ - \frac{ \partial^2 } { \partial \theta^2 } \log \B{p} ( y | \theta ) = f^{(2)} ( \theta ) = A^\R{T} V^{-1} A @]@ This completes the proof of the equation for the second partial of @(@ \log \B{p} ( y | \theta ) @)@ in the statement of the lemma.
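As a numerical sanity check (not part of dismod_at), the identity @(@ f^{(2)} ( \theta ) = A^\R{T} V^{-1} A @)@ can be verified with a central finite difference approximation of the Hessian of @(@ f @)@. The Python sketch below uses numpy and arbitrary example values for @(@ A @)@, @(@ V @)@, and @(@ y @)@; these values are assumptions chosen only for illustration.

    import numpy as np

    # arbitrary example problem (illustration only)
    m, n = 5, 3
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, n))
    L = rng.standard_normal((m, m))
    V = L @ L.T + m * np.eye(m)              # positive definite
    y = rng.standard_normal(m)
    Vinv = np.linalg.inv(V)

    def f(theta):
        r = y - A @ theta
        return 0.5 * r @ Vinv @ r

    # central finite difference approximation of the Hessian of f
    theta0 = rng.standard_normal(n)
    h = 1e-4
    hess = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = h * np.eye(n)[i]
            ej = h * np.eye(n)[j]
            hess[i, j] = (
                f(theta0 + ei + ej) - f(theta0 + ei - ej)
                - f(theta0 - ei + ej) + f(theta0 - ei - ej)
            ) / (4.0 * h * h)

    # because f is quadratic, the agreement is exact up to roundoff
    print(np.max(np.abs(hess - A.T @ Vinv @ A)))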
   
The maximum likelihood estimate @(@ \hat{\theta} @)@ satisfies the equation @(@ f^{(1)} ( \hat{\theta} ) = 0 @)@; i.e., @[@ \begin{array}{rcl} 0 & = & - ( y - A \hat{\theta} )^\R{T} V^{-1} A \\ A^\R{T} V^{-1} y & = & A^\R{T} V^{-1} A \hat{\theta} \\ \hat{\theta} & = & ( A^\R{T} V^{-1} A )^{-1} A^\R{T} V^{-1} y \end{array} @]@ Defining @(@ e = y - A \bar{\theta} @)@, we have @(@ \B{E} [ e ] = 0 @)@ and @[@ \begin{array}{rcl} \hat{\theta} & = & ( A^\R{T} V^{-1} A )^{-1} A^\R{T} V^{-1} ( A \bar{\theta} + e ) \\ \hat{\theta} & = & \bar{\theta} + ( A^\R{T} V^{-1} A )^{-1} A^\R{T} V^{-1} e \end{array} @]@ This expresses the estimate @(@ \hat{\theta} @)@ as a deterministic function of the noise @(@ e @)@. It follows from the last equation for @(@ \hat{\theta} @)@ above, and the fact that @(@ \B{E} [ e ] = 0 @)@, that @(@ \B{E} [ \hat{\theta} ] = \bar{\theta} @)@. This completes the proof of the equation for the expected value of @(@ \hat{\theta} @)@ in the statement of the lemma.
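In code, it is numerically preferable to evaluate this closed form by solving the normal equations @(@ ( A^\R{T} V^{-1} A ) \hat{\theta} = A^\R{T} V^{-1} y @)@ rather than forming the matrix inverse explicitly. The minimal numpy sketch below is not part of dismod_at; the example values are assumptions chosen only for illustration.

    import numpy as np

    # arbitrary example problem (illustration only)
    m, n = 5, 3
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, n))
    V = np.diag(rng.uniform(0.5, 2.0, size=m))   # positive definite
    theta_bar = rng.standard_normal(n)
    y = rng.multivariate_normal(A @ theta_bar, V)

    # solve (A^T V^{-1} A) theta_hat = A^T V^{-1} y instead of inverting
    Vinv = np.linalg.inv(V)
    theta_hat = np.linalg.solve(A.T @ Vinv @ A, A.T @ Vinv @ y)
    print(theta_hat)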
   
It also follows, from the equation for @(@ \hat{\theta} @)@ above, together with @(@ \B{E} [ e e^\R{T} ] = V @)@, that @[@ \begin{array}{rcl} ( \hat{\theta} - \bar{\theta} ) ( \hat{\theta} - \bar{\theta} )^\R{T} & = & ( A^\R{T} V^{-1} A )^{-1} A^\R{T} V^{-1} e e^\R{T} V^{-1} A ( A^\R{T} V^{-1} A )^{-1} \\ \B{E} [ ( \hat{\theta} - \bar{\theta} ) ( \hat{\theta} - \bar{\theta} )^\R{T} ] & = & ( A^\R{T} V^{-1} A )^{-1} \end{array} @]@ This completes the proof of the equation for the covariance of @(@ \hat{\theta} - \bar{\theta} @)@ in the statement of the lemma.
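The two conclusions of Lemma 1 can also be checked by Monte Carlo simulation. The numpy sketch below (not part of dismod_at; arbitrary example values chosen for illustration) repeatedly simulates @(@ y @)@, computes @(@ \hat{\theta} @)@, and compares the sample mean and covariance of the estimates with @(@ \bar{\theta} @)@ and @(@ ( A^\R{T} V^{-1} A )^{-1} @)@.

    import numpy as np

    # Monte Carlo check of Lemma 1 (illustration only)
    m, n, n_sim = 5, 3, 20000
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, n))
    V = np.diag(rng.uniform(0.5, 2.0, size=m))
    theta_bar = rng.standard_normal(n)

    Vinv = np.linalg.inv(V)
    info = A.T @ Vinv @ A                      # information matrix
    fits = np.empty((n_sim, n))
    for k in range(n_sim):
        y = rng.multivariate_normal(A @ theta_bar, V)
        fits[k] = np.linalg.solve(info, A.T @ Vinv @ y)

    print('sample mean - theta_bar :', fits.mean(axis=0) - theta_bar)
    print('sample covariance       :\n', np.cov(fits.T))
    print('(A^T V^-1 A)^-1         :\n', np.linalg.inv(info))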

Remark
For the case in Lemma 1 , the second partial of @(@ \log \B{p} ( y | \theta ) @)@ with respect to @(@ \theta @)@ does not depend on @(@ \theta @)@, and @(@ A^\R{T} V^{-1} A @)@ is the information matrix.

Lemma 2
Suppose that, in addition to all the information in Lemma 1 , we have a matrix @(@ B \in \B{R}^{p \times n} @)@, a positive definite matrix @(@ P \in \B{R}^{p \times p} @)@, and a vector @(@ z \in \B{R}^{p \times 1} @)@ giving the independent prior information @(@ B \theta \sim \B{N}( z , P ) @)@. Further suppose @(@ B @)@ has rank @(@ n @)@. For this case we define @(@ \hat{\theta} @)@ as the maximizer of @(@ \B{p}( y | \theta ) \B{p}( \theta ) @)@. It follows that @[@ \B{E} [ ( \hat{\theta} - \B{E}[ \hat{\theta} ] ) ( \hat{\theta} - \B{E}[ \hat{\theta} ] )^\R{T} ] \prec ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} = - \; \left( \frac{ \partial^2 } { \partial \theta^2 } \log [ \B{p} ( y | \theta ) \B{p} ( \theta ) ] \right)^{-1} @]@ where @(@ \prec @)@ means less than in a positive definite matrix sense.

Remark
The posterior distribution for the maximum likelihood estimate, when including a prior, cannot be sampled by fitting simulated data alone. To see this, consider the case where column one of the matrix @(@ A @)@ is zero. In this case the data @(@ y @)@ does not depend on @(@ \theta_1 @)@, so @(@ \hat{\theta}_1 @)@ takes the same value no matter what the value of @(@ y @)@ and its sample variance over simulated data sets is zero. On the other hand, the posterior distribution for @(@ \theta_1 @)@, for this case, is the same as its prior distribution and has uncertainty.
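The numpy sketch below illustrates this remark with an arbitrary example (not a dismod_at case) in which the first column of @(@ A @)@ is zero, @(@ B @)@ is the identity, and @(@ P @)@ is diagonal; these choices are assumptions made only to keep the illustration simple. The prior value @(@ z @)@ is held fixed while the data are simulated, and the sample variance of @(@ \hat{\theta}_1 @)@ comes out zero even though the posterior variance of @(@ \theta_1 @)@ equals its prior variance.

    import numpy as np

    # data carries no information about theta_1 (illustration only)
    m, n, n_sim = 5, 3, 1000
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, n))
    A[:, 0] = 0.0                       # first column of A is zero
    V = np.eye(m)
    B = np.eye(n)
    P = np.diag([4.0, 1.0, 1.0])        # prior variance of theta_1 is 4
    theta_bar = rng.standard_normal(n)
    z = B @ theta_bar                   # prior values held fixed, not simulated

    Vinv, Pinv = np.linalg.inv(V), np.linalg.inv(P)
    H = A.T @ Vinv @ A + B.T @ Pinv @ B
    fits = np.empty((n_sim, n))
    for k in range(n_sim):
        y = rng.multivariate_normal(A @ theta_bar, V)
        fits[k] = np.linalg.solve(H, A.T @ Vinv @ y + B.T @ Pinv @ z)

    print('sample variance of theta_1 fits :', fits[:, 0].var())       # zero
    print('posterior variance of theta_1   :', np.linalg.inv(H)[0, 0]) # 4.0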

Proof
@[@ \B{p} ( y | \theta ) \B{p} ( \theta ) = \det ( 2 \pi V )^{-1/2} \det ( 2 \pi P )^{-1/2} \exp \left[ - \frac{1}{2} ( y - A \theta )^\R{T} V^{-1} ( y - A \theta ) - \frac{1}{2} ( B \theta - z )^\R{T} P^{-1} ( B \theta - z ) \right] @]@ The derivative, with respect to @(@ \theta @)@, of the negative log of this expression is @[@ - ( y - A \theta )^\R{T} V^{-1} A + ( B \theta - z )^\R{T} P^{-1} B @]@ From this point, the proof of the equation for the second partial is very similar to the proof of Lemma 1 and is left to the reader.
   
Setting this derivative to zero, the corresponding maximum likelihood estimate @(@ \hat{\theta} @)@ satisfies @[@ \begin{array}{rcl} ( y - A \hat{\theta} )^\R{T} V^{-1} A & = & ( B \hat{\theta} - z )^\R{T} P^{-1} B \\ y^\R{T} V^{-1} A + z^\R{T} P^{-1} B & = & \hat{\theta}^\R{T} A^\R{T} V^{-1} A + \hat{\theta}^\R{T} B^\R{T} P^{-1} B \\ \hat{\theta} & = & ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} ( A^\R{T} V^{-1} y + B^\R{T} P^{-1} z ) \\ \hat{\theta} & = & ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} ( A^\R{T} V^{-1} A \bar{\theta} + B^\R{T} P^{-1} z ) + ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} A^\R{T} V^{-1} e \end{array} @]@ The first term is deterministic and the second term has mean zero. Using @(@ \B{E} [ e e^\R{T} ] = V @)@, it follows that @[@ \begin{array}{rcl} \B{E} [ \hat{\theta} ] & = & ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} ( A^\R{T} V^{-1} A \bar{\theta} + B^\R{T} P^{-1} z ) \\ \B{E} [ ( \hat{\theta} - \B{E} [ \hat{\theta} ] ) ( \hat{\theta} - \B{E} [ \hat{\theta} ] )^\R{T} ] & = & ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} A^\R{T} V^{-1} A ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} \end{array} @]@ Since the matrix @(@ B^\R{T} P^{-1} B @)@ is positive definite, we have @[@ A^\R{T} V^{-1} A \prec A^\R{T} V^{-1} A + B^\R{T} P^{-1} B @]@ Replacing @(@ A^\R{T} V^{-1} A @)@ by @(@ A^\R{T} V^{-1} A + B^\R{T} P^{-1} B @)@ in the center of the previous expression for the variance of @(@ \hat{\theta} @)@, we obtain @[@ \B{E} [ ( \hat{\theta} - \B{E} [ \hat{\theta} ] ) ( \hat{\theta} - \B{E} [ \hat{\theta} ] )^\R{T} ] \prec ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} @]@ This completes the proof of this lemma.
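As with Lemma 1, the covariance formula and the bound above can be checked by Monte Carlo simulation. In the numpy sketch below (not part of dismod_at; the example values, and the choice of @(@ B @)@ equal to the identity, are assumptions for illustration) only @(@ y @)@ is simulated while @(@ z @)@ is held fixed, the sample covariance of @(@ \hat{\theta} @)@ is compared with the sandwich formula, and the eigenvalues of the difference from the bound come out positive up to Monte Carlo error.

    import numpy as np

    # Monte Carlo check of Lemma 2 (illustration only; z is fixed)
    m, n, n_sim = 6, 3, 20000
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, n))
    V = np.diag(rng.uniform(0.5, 2.0, size=m))
    B = np.eye(n)                                  # rank n, as the lemma requires
    P = np.diag(rng.uniform(1.0, 3.0, size=n))
    theta_bar = rng.standard_normal(n)
    z = rng.multivariate_normal(B @ theta_bar, P)  # one fixed prior value

    Vinv, Pinv = np.linalg.inv(V), np.linalg.inv(P)
    H = A.T @ Vinv @ A + B.T @ Pinv @ B
    fits = np.empty((n_sim, n))
    for k in range(n_sim):
        y = rng.multivariate_normal(A @ theta_bar, V)
        fits[k] = np.linalg.solve(H, A.T @ Vinv @ y + B.T @ Pinv @ z)

    cov_fit  = np.cov(fits.T)
    sandwich = np.linalg.inv(H) @ (A.T @ Vinv @ A) @ np.linalg.inv(H)
    print('max |cov_fit - sandwich|        :', np.abs(cov_fit - sandwich).max())
    print('eigenvalues of H^{-1} - cov_fit :', np.linalg.eigvalsh(np.linalg.inv(H) - cov_fit))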

Simulation
Suppose we simulate data @(@ y \sim \B{N}( A \bar{\theta}, V) @)@ and independent prior values @(@ z \sim \B{N}( B \bar{\theta}, P) @)@ where @(@ A @)@, @(@ V @)@ are as in Lemma 1 and @(@ B @)@, @(@ P @)@ are as in Lemma 2 . Furthermore, we define @(@ \hat{\theta} @)@ as the maximizer of @[@ \B{p}( y, z | \theta ) = \B{p} ( y | \theta ) \B{p} ( z | \theta ) @]@ We define @(@ w \in \B{R}^{(m + p) \times 1} @)@, @(@ C \in \B{R}^{ (m + p) \times n} @)@, and @(@ U \in \B{R}^{ (m + p) \times (m + p)} @)@ by @[@ w = \left( \begin{array}{c} y \\ z \end{array} \right) \W{,} C = \left( \begin{array}{c} A \\ B \end{array} \right) \W{,} U = \left( \begin{array}{cc} V & 0 \\ 0 & P \end{array} \right) @]@ so that @(@ w \sim \B{N}( C \bar{\theta} , U ) @)@. We can now apply Lemma 1 with @(@ y @)@ replaced by @(@ w @)@, @(@ A @)@ replaced by @(@ C @)@, and @(@ V @)@ replaced by @(@ U @)@. It follows from Lemma 1 that @(@ \B{E} [ \hat{\theta} ] = \bar{\theta} @)@ and @[@ \B{E} [ ( \hat{\theta} - \bar{\theta} ) ( \hat{\theta} - \bar{\theta} )^\R{T} ] = ( C^\R{T} U^{-1} C )^{-1} = ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} @]@ which is the proper posterior variance for the case in Lemma 2 .
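The numpy sketch below (not part of dismod_at; the example values and the choice of @(@ B @)@ are assumptions for illustration) carries out this simulation: both the measurements @(@ y @)@ and the prior values @(@ z @)@ are drawn at random before each fit, and the sample covariance of @(@ \hat{\theta} @)@ matches the posterior covariance @(@ ( A^\R{T} V^{-1} A + B^\R{T} P^{-1} B )^{-1} @)@ up to Monte Carlo error.

    import numpy as np

    # Monte Carlo check: simulate both the data and the prior values
    m, n, n_sim = 6, 3, 20000
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, n))
    V = np.diag(rng.uniform(0.5, 2.0, size=m))
    B = np.eye(n)
    P = np.diag(rng.uniform(1.0, 3.0, size=n))
    theta_bar = rng.standard_normal(n)

    Vinv, Pinv = np.linalg.inv(V), np.linalg.inv(P)
    H = A.T @ Vinv @ A + B.T @ Pinv @ B            # posterior precision
    fits = np.empty((n_sim, n))
    for k in range(n_sim):
        y = rng.multivariate_normal(A @ theta_bar, V)   # simulated measurements
        z = rng.multivariate_normal(B @ theta_bar, P)   # simulated prior values
        fits[k] = np.linalg.solve(H, A.T @ Vinv @ y + B.T @ Pinv @ z)

    print('sample mean - theta_bar :', fits.mean(axis=0) - theta_bar)
    print('sample covariance       :\n', np.cov(fits.T))
    print('posterior covariance    :\n', np.linalg.inv(H))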
Input File: omh/model/posterior.omh