@(@\newcommand{\B}[1]{ {\bf #1} }
\newcommand{\R}[1]{ {\rm #1} }
\newcommand{\W}[1]{ \; #1 \; }@)@This is dismod_at-20221105 documentation: Here is a link to its
current documentation
.
Goal
Use previous cumulative death data for a location to predict future
cumulative deaths.
Data File
The cumulative death data is the most reliable data available for the
Covid-19 epidemic. It is also the statistic we are most interested in predicting.
Our example data has the following columns:
day
day that this row of data corresponds to
death
cumulative covid_19 death as fraction of population
mobility
mobility shifted and scaled to be in [-1, 0.]
testing
testing shifted and scaled to be in [0., 1]
We assume that difference in the cumulative death data are independent; i.e.,
the amount added to the cumulative death is the actual measurement recorded
at the end of a time interval.
For the purpose of this example, we mimic a csv file with a string that
has this information.
ODE
We use @(@
Q
@)@ in place of @(@
S
@)@ in the SIR Model
to avoid confusion with the dismod meaning of @(@
S
@)@.
The SIR Model for an epidemic is the following differential equations:
@[@
\begin{array}{rcll}
\dot{Q} & = & - \beta Q I \\
\dot{I} & = & + \beta Q I & - ( \gamma + \chi ) I \\
\dot{R} & = & + \gamma I
\end{array}
@]@
@(@
Q(t)
@)@
susceptible fraction of the population
@(@
I(t)
@)@
infectious fraction of the population
@(@
R(t)
@)@
recovered fraction of the population
@(@
\beta(t)
@)@
infectious rate
@(@
\gamma(t)
@)@
recovery rate
@(@
\chi(t)
@)@
excess mortality rate (for this disease)
Data Model
The model for a measurement of the i-th difference in cumulative death is
@[@
d_i = e_i + \int_{a(i)}^{b(i)} \chi(t) I (t) dt
@]@
@(@
d_i
@)@
the i-th difference in cumulative death
@(@
e_i
@)@
noise in the i-th difference in cumulative death
@(@
a_i
@)@
start time for i-th difference in cumulative death
@(@
b_i
@)@
end time for i-th difference in cumulative death
Discussion
The model does not include births and deaths due to other causes, but this
would introduce even more unknowns.
Another problem with the model above is that it is not identifiable given just
measurements of cumulative death; i.e., there are two many unknown functions.
One approach to this problem is to assume we know
the rates @(@
\gamma(t)
@)@, @(@
\chi(t)
@)@ and
only estimate @(@
\beta(t)
@)@ and the initial conditions
@(@
Q(0)
@)@, @(@
I(0)
@)@, @(@
R(0)
@)@.
Another problem is that deaths are often under reported; e.g., they might only
be the deaths in the hospitals or for people who have been confirmed to have the
disease.
Dismod Model
The Dismod model for an epidemic is the following ODE:
@[@
\begin{array}{rcll}
\dot{S} & = & - \iota S & + \rho C \\
\dot{C} & = & + \iota S & - ( \rho + \chi ) C
\end{array}
@]@
together with the following cumulative death model
@[@
d_i = e_i + \int_{a(i)}^{b(i)} \chi(t) \frac{C(t)}{S(t) + C(t)} dt
@]@
Remission @(@
\rho
@)@ is @(@
\gamma
@)@ in the SIR model.
Prevalence @(@
C(t) / [ S(t) + C(t) ]
@)@
is to @(@
I(t)
@)@ in the SIR model.
Susceptible @(@
S(t)
@)@ is @(@
Q(t) + R(t)
@)@ in the SIR model.
Not all the members of S are susceptible to the disease; i.e., the members of R.
It would be nice to complete the immunity
wish list item so this would not be necessary.
However, because of under reporting of deaths, @(@
Q(t)
@)@ in the SIR model
also has members that are not susceptible to the disease.
Data
For this example, we get the cumulative death data and covariates from a
text string version of a CSV file with the following columns:
day
days since the first cumulative death value
death
cumulative death value for this day
mobility
a social mobility covariate
testing
level of testing for the disease covariate
The
mobility
covariate has been shifted and scaled to be in the
interval [-1, 0] with zero corresponding to normal mobility.
The
testing
covariate has been scaled to the interval [0, 1]
with zero corresponding to no testing.
Model Variables
There is only one location for this example, so there are no random effects.
If you had data from multiple locations you could use random effects to improve
the estimation.
Rates
The prior for the value of @(@
\iota(t)
@)@ is uniform with
lower limit 1e-6, upper limit 1.0, and mean 1e-3.
The mean is only used as a starting point and scaling point for the optimization.
The prior for the forward difference of @(@
\iota(t)
@)@ between grid points is
log Gaussian with mean zero, standard deviation 0.1 and the offset in the
log transformation is 1e-5. The corresponds to a difference of 10 percent between
days being a weighted residual of one.
The other rates are modeled as the following known constants:
@(@
\rho = 0.1
@)@ and @(@
\chi = 0.01
@)@.
Covariates
There are two covariate multipliers for this example, one for mobility
and the other for testing. These affect the rate @(@
\iota(t)
@)@.
They are both have one grid point (are constant in time) and
have a uniform prior with lower limit -1, upper limit +1, and mean zero.
It appears that the bounds on the covariate multipliers are active at the
solution; i.e., more effect would improve the objective; see the file
variable.csv
generated by the
db2csv command after the fit was done.
Predictions
To do predictions for the next week, we could fit a function to the previous
weeks baseline value of @(@
\iota(t)
@)@ (value without the covariates effects).
This gives us a prediction for the baseline value of @(@
\iota(t)
@)@
for the next week. We would then include some assumed covariate values for the next
and predict the differences in cumulative death for each day during the next week.
The subtitle point here is that the prior for @(@
\iota(t)
@)@ is a constant,
but the posterior is not. Actually @(@
\beta(t)
@)@ in the SIR model
is more likely to be constant and it might be better to fit the @(@
\beta(t)
@)@
that corresponds to the previous fit for @(@
I(t) = C(t) / [ S(t) + C(t) ]
@)@.
Display Fit
If this variable is true, some of the fit results will be printed and plotted: