The Data Table


This is dismod_at-20221105 documentation.
Discussion
data_id
data_name
integrand_id
density_id
     Nonsmooth
node_id
     Parent Data
     Child Data
subgroup_id
     group_id
     Nonsmooth
weight_id
     null
hold_out
meas_value
meas_std
eta
     null
nu
     null
age_lower
age_upper
time_lower
time_upper
Covariates
     Null
Example

Discussion
Each row of the data table corresponds to one measurement; see meas_value below.

data_id
This column has type integer and is the primary key for the data table. Its initial value is zero, and it increments by one for each row.

data_name
This column has type text and has a different value for every row; i.e., the names are unique and can act as substitutes for the primary key. The names are intended to be easier for a human to remember than the ids.
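The following is a minimal sketch of how the two key columns behave. It uses Python's standard sqlite3 module to build a hypothetical, reduced data table containing only data_id and data_name. The real table has many more columns, and this is not the schema dismod_at itself writes. Because data_name is declared unique, a repeated name is rejected, and a name can stand in for its id in a lookup. The helper names are illustrative only.

```python
import sqlite3

def make_reduced_data_table(names):
    """Create an in-memory table with data_id 0, 1, 2, ... and unique data_name values.

    Hypothetical reduced schema: only the two key columns are modeled.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE data (data_id INTEGER PRIMARY KEY, data_name TEXT UNIQUE)"
    )
    # data_id starts at zero and increments by one for each row
    con.executemany(
        "INSERT INTO data (data_id, data_name) VALUES (?, ?)",
        list(enumerate(names)),
    )
    return con

def data_id_for_name(con, name):
    """Use data_name as a substitute for the primary key; None if absent."""
    row = con.execute(
        "SELECT data_id FROM data WHERE data_name = ?", (name,)
    ).fetchone()
    return None if row is None else row[0]

con = make_reduced_data_table(["prevalence_a", "prevalence_b", "mtall_c"])
print(data_id_for_name(con, "mtall_c"))  # prints 2
```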

integrand_id
This column has type integer and is the integrand_id that identifies the integrand for this measurement.

density_id
This column has type integer and is the density_id that identifies the density function for the measurement noise. The density_name corresponding to density_id cannot be uniform (use hold_out to ignore data during fitting). This density may be replaced using the data_density_command.
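The uniform restriction can be checked with a simple lookup. In the sketch below, the list of density names and its ordering are hypothetical stand-ins for the density table. The real density_id values come from that table.

```python
# Hypothetical density table: position in the list plays the role of density_id.
# The names and their order are assumptions made for illustration.
DENSITY_NAMES = ["uniform", "gaussian", "laplace", "students",
                 "log_gaussian", "log_laplace", "log_students"]

def check_data_density(density_id):
    """Return the density_name for density_id; reject the uniform density."""
    if not 0 <= density_id < len(DENSITY_NAMES):
        raise ValueError(f"density_id {density_id} is not in the density table")
    name = DENSITY_NAMES[density_id]
    if name == "uniform":
        # uniform is not allowed for data; use hold_out = 1 to ignore a row instead
        raise ValueError("data density cannot be uniform; set hold_out = 1 instead")
    return name

print(check_data_density(1))  # prints gaussian
```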

Nonsmooth
If the density is nonsmooth, the average_integrand cannot depend on any of the random effects. For example, if node_id is the parent_node_id, the average integrand does not depend on the random effects. Likewise, if node_id corresponds to a child that has all its random effects constrained, the average integrand does not depend on the random effects. Each nonsmooth data point adds a hidden variable to the optimization problem, equal to the maximum of the residual and its negative (that is, the absolute value of the residual). Having many of these variables slows down the optimization.
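The hidden variable replaces the nondifferentiable |r| with a variable t that satisfies the two smooth constraints t >= r and t >= -r. The smallest such t is max(r, -r) = |r|. The sketch below shows that value and counts one hidden variable per nonsmooth data point.

The set of nonsmooth density names is an assumption made for illustration: laplace and log_laplace are the densities this page describes as not differentiable at zero. The authoritative definition is the density table's notion of nonsmooth.

```python
# Assumption for illustration: these are the nonsmooth density names.
NONSMOOTH = {"laplace", "log_laplace"}

def hidden_variable_value(residual):
    """Smallest t with t >= residual and t >= -residual, i.e. |residual|."""
    return max(residual, -residual)

def count_hidden_variables(density_names):
    """One hidden variable for each data point whose density is nonsmooth."""
    return sum(1 for name in density_names if name in NONSMOOTH)

print(hidden_variable_value(-1.5))  # prints 1.5
```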

node_id
This column has type integer and is the node_id that identifies the node for this measurement.

Parent Data
If node_id is the parent_node_id, this data is associated with the parent node and does not have any random effects in its model.

Child Data
If node_id is a child of the parent node, or a descendant of a child, the data is associated with the random effects for that child. In this case density_id cannot correspond to laplace or log_laplace: those densities are not differentiable at zero, so the Laplace approximation would not make sense.
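The parent/child distinction can be sketched by walking up the node tree until the parent node is reached. In the sketch below, the parent_of dictionary is a hypothetical stand-in for the parent column of the node table. A node that is neither the parent nor one of its descendants falls outside the two cases described above and is labeled "neither".

```python
def classify_data_node(node_id, parent_of, parent_node_id):
    """Return ("parent", None), ("child", child_id), or ("neither", None).

    parent_of maps each node_id to its parent node_id (None for the root);
    it is a hypothetical stand-in for the node table.
    """
    if node_id == parent_node_id:
        return ("parent", None)
    node = node_id
    while node is not None:
        parent = parent_of[node]
        if parent == parent_node_id:
            # node is the child of the parent node that node_id descends from
            return ("child", node)
        node = parent
    # not the parent node and not a descendant of it
    return ("neither", None)

def check_child_density(kind, density_name):
    """Child data cannot use laplace or log_laplace densities."""
    if kind == "child" and density_name in ("laplace", "log_laplace"):
        raise ValueError("child data cannot use a laplace or log_laplace density")

parent_of = {0: None, 1: 0, 2: 0, 3: 1, 4: 3}
print(classify_data_node(4, parent_of, parent_node_id=1))  # prints ('child', 3)
```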

subgroup_id
This column has type integer and is the subgroup_id that this data point corresponds to.

group_id
There is automatically a group_id corresponding to the subgroup_id, even though group_id does not appear in the data table (if it does appear, it is not used).
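In other words, group_id is derived from subgroup_id rather than read from the data row. The sketch below uses a small dictionary as a hypothetical stand-in for the subgroup table. Any group_id value present in the data row is deliberately ignored.

```python
# Hypothetical subgroup table rows: subgroup_id -> (subgroup_name, group_id)
SUBGROUP_TABLE = {
    0: ("north", 0),
    1: ("south", 0),
    2: ("east", 1),
}

def group_id_for(data_row):
    """Derive group_id from subgroup_id; a group_id in data_row is not used."""
    _, group_id = SUBGROUP_TABLE[data_row["subgroup_id"]]
    return group_id

print(group_id_for({"subgroup_id": 2, "group_id": 0}))  # prints 1
```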

Nonsmooth
If the density is nonsmooth , the subgroup_smooth_id corresponding to subgroup_id must be null.

weight_id
This column has type integer and is the weight_id that identifies the weighting used for this measurement.

null
If weight_id is null, the constant weighting is used for this data point.

hold_out
This column has type integer and has value zero or one. Only the rows where hold_out is zero are included in the objective optimized during a fit_command . See the fit command hold_out documentation.
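Selecting the rows that enter the objective reduces to a filter on hold_out. In this sketch, data rows are represented as plain dictionaries, and the function name is hypothetical.

```python
def rows_in_objective(data_rows):
    """Return only the rows with hold_out == 0; hold_out must be 0 or 1."""
    for row in data_rows:
        if row["hold_out"] not in (0, 1):
            raise ValueError(f"hold_out must be 0 or 1, got {row['hold_out']}")
    return [row for row in data_rows if row["hold_out"] == 0]

rows = [{"data_id": 0, "hold_out": 0}, {"data_id": 1, "hold_out": 1}]
print([row["data_id"] for row in rows_in_objective(rows)])  # prints [0]
```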

meas_value
This column has type real and is the measured value for each row of the data table; i.e., the measurement of the integrand, node, etc.

meas_std
This column has type real, has the same units as the data, and must be a positive number. It is not the only contribution to the standard deviation used in the data likelihood; see the minimum cv standard deviation $\Delta$, the transformed standard deviation $\sigma$, and the adjusted standard deviation $\delta ( \theta )$.
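The first two of these quantities can be sketched as follows. The sketch assumes, as we read the data likelihood definitions, that $\Delta = \max( s , c |y| )$, where $s$ is meas_std, $y$ is meas_value, and $c$ is the integrand's minimum_meas_cv. It also assumes that $\sigma = \Delta$ for linear densities and $\sigma = \log( y + \eta + \Delta ) - \log( y + \eta )$ for log-scaled densities. Both formulas and the function names are assumptions to check against the data likelihood section. The adjusted standard deviation $\delta ( \theta )$ depends on the model and is not shown.

```python
import math

def minimum_cv_std(meas_std, meas_value, minimum_meas_cv):
    """Delta (assumed form): meas_std raised to at least minimum_meas_cv * |meas_value|."""
    if meas_std <= 0:
        raise ValueError("meas_std must be positive")
    return max(meas_std, minimum_meas_cv * abs(meas_value))

def transformed_std(delta, meas_value, eta=None):
    """sigma (assumed form): Delta for linear densities, log-scaled when eta is given."""
    if eta is None:
        return delta
    return math.log(meas_value + eta + delta) - math.log(meas_value + eta)

delta = minimum_cv_std(meas_std=0.01, meas_value=0.5, minimum_meas_cv=0.1)
sigma = transformed_std(delta, meas_value=0.5, eta=1e-4)
```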

eta
This column has type real. If density_id corresponds to a log-scaled density, eta must be greater than or equal to zero and is the offset in the log transformation for this data point; see the log-scaled case in the definition of the weighted residual function. This offset may be replaced using the data_density_command.

null
If density_id does not correspond to log_gaussian, log_laplace, or log_students, eta can be null.
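The sketch below illustrates the role of the offset. It assumes the log-scaled weighted residual has the form (log(y + eta) - log(mu + eta)) / sigma, where y is meas_value and mu is the model's average integrand. Consult the weighted residual definition for the exact form. A positive eta keeps the logarithm finite when meas_value is zero. The function names are hypothetical.

```python
import math

# Log-scaled densities listed above; only these require eta.
LOG_SCALED = {"log_gaussian", "log_laplace", "log_students"}

def check_eta(density_name, eta):
    """eta may be null unless the density is log scaled; then it must be >= 0."""
    if density_name in LOG_SCALED and (eta is None or eta < 0):
        raise ValueError("log-scaled densities need eta >= 0")
    return eta

def log_weighted_residual(meas_value, model_value, eta, sigma):
    """Assumed form: (log(y + eta) - log(mu + eta)) / sigma."""
    return (math.log(meas_value + eta) - math.log(model_value + eta)) / sigma

# A zero measurement stays finite because eta > 0.
r = log_weighted_residual(0.0, 0.2, eta=0.01, sigma=0.5)
```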

nu
This column has type real. If density_id corresponds to students or log_students, nu must be greater than two and is the number of degrees of freedom in the distribution for this data point; see the definitions of the log-density for Student's-t and Log-Student's-t. The degrees of freedom may be replaced using the data_density_command.

null
If density_id does not correspond to students or log_students, nu can be null.
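One general statistical fact helps explain why nu must exceed two. A standard Student's-t variable with nu degrees of freedom has variance nu / (nu - 2), which is finite only when nu > 2. This is offered as background, not as a statement of how dismod_at uses nu internally. The validation helper below is hypothetical.

```python
import math

STUDENTS = {"students", "log_students"}

def check_nu(density_name, nu):
    """nu may be null unless the density is students or log_students; then nu > 2."""
    if density_name in STUDENTS and (nu is None or nu <= 2):
        raise ValueError("students densities need nu > 2")
    return nu

def standard_t_variance(nu):
    """Variance of a standard Student's-t; infinite (or undefined) when nu <= 2."""
    if nu <= 2:
        return math.inf
    return nu / (nu - 2)

print(standard_t_variance(4.0))  # prints 2.0
```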

age_lower
This column has type real and is the lower age limit for this measurement. It must be greater than or equal to the minimum age_table value.

age_upper
This column has type real and is the upper age limit for this measurement. It must be greater than or equal to the corresponding age_lower and less than or equal to the maximum age_table value.

time_lower
This column has type real and is the lower time limit for this measurement. It must be greater than or equal to the minimum time_table value.

time_upper
This column has type real and is the upper time limit for this measurement. It must be greater than or equal to the corresponding time_lower and less than or equal to the maximum time_table value.
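The four limit rules combine into two chained comparisons, one for age and one for time. In the sketch below, age_table and time_table are plain lists of values, standing in hypothetically for the age and time columns of those tables.

```python
def check_age_time_limits(row, age_table, time_table):
    """Raise ValueError unless the row's age and time intervals fit the tables.

    age_table and time_table are hypothetical stand-ins: plain lists of the
    age and time values in those tables.
    """
    age_min, age_max = min(age_table), max(age_table)
    time_min, time_max = min(time_table), max(time_table)
    if not age_min <= row["age_lower"] <= row["age_upper"] <= age_max:
        raise ValueError("need min age <= age_lower <= age_upper <= max age")
    if not time_min <= row["time_lower"] <= row["time_upper"] <= time_max:
        raise ValueError("need min time <= time_lower <= time_upper <= max time")

ages, times = [0.0, 20.0, 100.0], [1990.0, 2020.0]
check_age_time_limits(
    {"age_lower": 10.0, "age_upper": 30.0, "time_lower": 2000.0, "time_upper": 2000.0},
    ages, times,
)
```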

Covariates
The covariate columns have type real and column names that begin with the two characters x_. For each valid covariate_id , column x_covariate_id contains the value, for this measurement, of the covariate specified by covariate_id .

Null
The covariate value null is interpreted as the reference value for the corresponding covariate.
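The covariate columns can be collected by their x_ prefix, with a null value replaced by the covariate's reference value. In this sketch, a data row is a dictionary in which None stands for null. The reference dictionary is a hypothetical stand-in for the reference values of the covariate table.

```python
def covariate_values(row, reference):
    """Map covariate_id -> value using columns named x_<covariate_id>.

    A null (None) value is replaced by reference[covariate_id]; reference is a
    hypothetical stand-in for the covariate table's reference values.
    """
    values = {}
    for column, value in row.items():
        if column.startswith("x_"):
            covariate_id = int(column[2:])
            values[covariate_id] = reference[covariate_id] if value is None else value
    return values

row = {"data_id": 0, "meas_value": 0.1, "x_0": 1.5, "x_1": None}
print(covariate_values(row, reference={0: 0.0, 1: 2.0}))  # prints {0: 1.5, 1: 2.0}
```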

Example
The file data_table.py creates example data tables.

Input File: omh/table/data_table.omh