Create Csv Files that Summarize The Database

This is dismod_at-20221105 documentation.
Syntax
     As Program
     As Python Function
Convention
database
dir
fit_var, fit_data_subset
simulate_index
option.csv
log.csv
age_avg.csv
hes_fixed.csv
hes_random.csv
trace_fixed.csv
mixed_info.csv
variable.csv
     var_id
     var_type
     s_id
     m_id
     m_diff
     bound
     age
     time
     rate
     integrand
     covariate
     node
     group
     subgroup
     fixed
     depend
     fit_value
     start
     scale
     truth
     sam_avg
     sam_std
     res_value
     res_dage
     res_dtime
     lag_value
     lag_dage
     lag_dtime
     sim_v, sim_a, sim_t
     prior_info
data.csv
     data_id
     data_extra_columns
     child
     node
     group
     subgroup
     integrand
     weight
     age_lo
     age_up
     time_lo
     time_up
     d_out
     s_out
     density
     eta
     nu
     meas_std
     meas_stdcv
     meas_delta
     meas_value
     avgint
     residual
     sim_value
     Covariates
predict.csv
     avgint_id
     avgint_extra_columns
     s_index
     avgint
     age_lo
     age_up
     time_lo
     time_up
     integrand
     weight
     node
     group
     subgroup
     Covariates
Example
ihme_db.sh

Syntax

As Program
dismodat.py database db2csv

As Python Function
dismod_at.db2csv_command( database )

Convention
The null value in the database corresponds to an empty string in the csv files.
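The convention above can be undone when reading the files back. The following is a minimal sketch, using only the Python standard library, that maps empty strings to None. The helper name read_csv_rows and the two-row excerpt are hypothetical and are not part of dismod_at.

```python
import csv
import io


def read_csv_rows(text_stream):
    """Read CSV rows as dicts, turning empty strings back into None."""
    reader = csv.DictReader(text_stream)
    return [
        {key: (None if value == "" else value) for key, value in row.items()}
        for row in reader
    ]


# Hypothetical excerpt; a real variable.csv has many more columns.
sample = io.StringIO("var_id,fit_value,truth\n0,0.01,\n1,,0.5\n")
rows = read_csv_rows(sample)
```

All other values stay as strings; convert them with int or float as needed.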

database
is the path from the current directory to the database. This must be a dismod_at database and the init_command must have been run on the database.

dir
We use the notation dir for the directory where database is located.
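As a rough illustration of the program syntax and of dir, the sketch below builds the command line and the directory where the csv files would be written. The function name db2csv_plan and the path example/fit.db are hypothetical; the sketch only assembles arguments and does not run dismod_at.

```python
import os


def db2csv_plan(database):
    """Return the db2csv command line and the directory holding database."""
    argv = ["dismodat.py", database, "db2csv"]
    # An empty dirname means the database is in the current directory.
    out_dir = os.path.dirname(database) or "."
    return argv, out_dir


argv, out_dir = db2csv_plan("example/fit.db")  # hypothetical database path
variable_csv = os.path.join(out_dir, "variable.csv")
```

Assuming dismodat.py is on the search path, the command could then be run with subprocess.run(argv, check=True).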

fit_var, fit_data_subset
The log_table is used to determine if the previous fit command had a simulate_index. If so, the fit_var_table and fit_data_subset_table correspond to simulated data. Otherwise, if they exist, they correspond to the measured data.

simulate_index
If the previous fit command had a simulate_index that value is used for simulate_index below. Otherwise, zero is used for simulate_index below.

option.csv
The file dir/option.csv is written by this command. It is a CSV file with one row for each possible row in the option_table . The columns in option.csv are option_name and option_value . If a row does not appear in the option table, the corresponding default value is written to option.csv. If the parent_node_id appears in the option table, the parent_node_name row of option.csv is filled in with the corresponding node name.
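Because option.csv has exactly the two columns option_name and option_value, it loads naturally into a dictionary. This is a sketch under that assumption; the helper name read_options and the sample values are hypothetical.

```python
import csv
import io


def read_options(text_stream):
    """Map each option_name in option.csv to its option_value string."""
    return {
        row["option_name"]: row["option_value"]
        for row in csv.DictReader(text_stream)
    }


# Hypothetical excerpt; the real file has one row per possible option.
sample = io.StringIO(
    "option_name,option_value\n"
    "parent_node_name,world\n"
    "ode_step_size,10.0\n"
)
options = read_options(sample)
```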

log.csv
The file dir/log.csv is written by this command. It is a CSV file with one row for each message in the log_table . The columns in this table are message_type , table_name , row_id , unix_time , and message .

age_avg.csv
The file dir/age_avg.csv is written by this command. It is a CSV file with the contents of the age_avg table. The only column in this table is age . Note that a set_command may change the value of ode_step_size or age_avg_split but it will not write out the new age_avg table.

hes_fixed.csv
If the asymptotic sample command was executed, the contents of the hes_fixed_table are written to the CSV file dir/hes_fixed.csv . The columns in this table are row_var_id , col_var_id , hes_fixed_value .

hes_random.csv
If a fit both , fit random , or sample asymptotic command was executed, the contents of the hes_random_table are written to the CSV file dir/hes_random.csv . The columns in this table are row_var_id , col_var_id , hes_random_value .
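Both Hessian files list one matrix entry per row. One way to hold them in memory is a dictionary keyed by (row_var_id, col_var_id), as sketched below. The helper name read_hessian and the sample entries are hypothetical, and the sketch makes no assumption about which entries of the matrix the file contains.

```python
import csv
import io


def read_hessian(text_stream, value_column):
    """Return {(row_var_id, col_var_id): value} for a Hessian csv file."""
    entries = {}
    for row in csv.DictReader(text_stream):
        key = (int(row["row_var_id"]), int(row["col_var_id"]))
        entries[key] = float(row[value_column])
    return entries


# Hypothetical excerpt of hes_fixed.csv.
sample = io.StringIO(
    "row_var_id,col_var_id,hes_fixed_value\n"
    "0,0,2.0\n"
    "1,0,-0.5\n"
    "1,1,3.0\n"
)
hessian = read_hessian(sample, "hes_fixed_value")
```

For hes_random.csv, pass "hes_random_value" as value_column.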

trace_fixed.csv
If the fit fixed or fit both command has completed, the contents of the trace_fixed_table are written to the CSV file dir/trace_fixed.csv . The columns in this table have the same name as in the corresponding table with the exception that the column regularization_size is called reg_size .

mixed_info.csv
If the fit_command completed, the contents of the mixed_info_table are written to the CSV file dir/mixed_info.csv.

variable.csv
The file dir/variable.csv is written by this command. It is a CSV file with one row for each of the model_variables and has the following columns:

var_id
is the var_id .

var_type
is the var_type .

s_id
is the smooth_id for this variable. If the variable is a smoothing standard deviation multiplier, this is the smoothing that this multiplier affects. Otherwise, it is the smoothing where the prior for this variable comes from.

m_id
If this variable is a covariate multiplier, this is the corresponding mulcov_id .

m_diff
If this variable is a covariate multiplier, this is the corresponding max_cov_diff .

bound
If the upper and lower value limits in the value prior for this variable are not equal, this is a bound for the absolute value of this variable; see max_mulcov and bound_random .

age
is the age .

time
is the time .

rate
is the rate_name .

integrand
is the integrand_name .

covariate
is the covariate_name .

node
is the node_name .

group
This field is non-empty for group covariate multipliers .

subgroup
This field is non-empty for subgroup covariate multipliers .

fixed
is true if this variable is a fixed effect , otherwise it is false.

depend
If the depend_var_table exists, this has one of the following: none if neither the data nor the prior depends on this variable, data if only the data depends on this variable, prior if only the prior depends on this variable, both if both the data and the prior depend on this variable.

fit_value
If the fit_command has been run, this is the fit_var_value .

start
is the start_var_value for this variable.

scale
is the scale_var_value for this variable.

truth
If the truth_var table exists, this is the truth_var_value for this variable.

sam_avg
If the sample table exists, for each var_id this is the average with respect to sample_index of the var_value corresponding to this var_id.

sam_std
If the sample table exists, for each fixed var_id this is the estimated standard deviation with respect to sample_index of the var_value corresponding to this var_id. If there is only one sample_index in the sample table, this column is empty because the standard deviation cannot be estimated from one sample.
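The two summary columns can be illustrated with a small calculation. This sketch assumes the standard deviation is the usual sample estimate with an n - 1 denominator (statistics.stdev), which is consistent with the column being empty for a single sample but is not confirmed by this page. The function name and the sample rows are hypothetical.

```python
import statistics
from collections import defaultdict


def summarize_samples(sample_rows):
    """Return {var_id: (sam_avg, sam_std)} from (sample_index, var_id, var_value)."""
    by_var = defaultdict(list)
    for _sample_index, var_id, var_value in sample_rows:
        by_var[var_id].append(var_value)
    summary = {}
    for var_id, values in by_var.items():
        sam_avg = statistics.mean(values)
        # Assumption: n - 1 denominator; undefined (None) for one sample.
        sam_std = statistics.stdev(values) if len(values) > 1 else None
        summary[var_id] = (sam_avg, sam_std)
    return summary


# Hypothetical sample table with two sample_index values and two variables.
rows = [(0, 0, 1.0), (0, 1, 4.0), (1, 0, 3.0), (1, 1, 4.0)]
summary = summarize_samples(rows)
```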

res_value
If the fit_command has been run, this is the residual_value .

res_dage
If the fit_command has been run, this is the residual_dage ; see fit_var above.

res_dtime
If the fit_command has been run, this is the residual_dtime ; see fit_var above.

lag_value
If the fit_command has been run, this is the lagrange_value ; see fit_var above.

lag_dage
If the fit_command has been run, this is the lagrange_dage ; see fit_var above.

lag_dtime
If the fit_command has been run, this is the lagrange_dtime ; see fit_var above.

sim_v, sim_a, sim_t
If the simulate_command has been run, these are the values of prior_sim_value , prior_sim_dage , and prior_sim_dtime , for the simulate_index .

prior_info
There is a column named field_character for character equal to v, a, and t and for field equal to mean, lower, upper, std, eta, nu, and density.

The character v denotes this is the prior information for a value, a the prior information for an age difference, and t the prior information for a time difference.
The density has been mapped to the corresponding density_name.
If the corresponding value_prior_id is null, the const_value prior is displayed.
If a field is null, or has no effect, it is displayed as empty. Note that the fields eta_v are always displayed for fixed effects because they have a scaling effect.
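The prior column names follow the pattern field_character, so they can be enumerated directly. The order produced here is an arbitrary choice for illustration and is not necessarily the column order in variable.csv.

```python
fields = ["mean", "lower", "upper", "std", "eta", "nu", "density"]
characters = ["v", "a", "t"]  # value, age difference, time difference

# One column per (field, character) pair, e.g. mean_v or density_t.
prior_columns = [
    f"{field}_{character}" for character in characters for field in fields
]
```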

data.csv
The file dir/data.csv is written by this command. It is a CSV file with one row for each row in the data_subset_table and has the following columns:

data_id
is the data table data_id .

data_extra_columns
Each column specified by the data_extra_columns option is included in the data.csv file.

child
If this data row is associated with a child, this is the name of the child. Otherwise, this data is associated with the parent node .

node
is the node_name for this data row. This will correspond directly to the data table node_id .

group
is the group_name corresponding to the subgroup for this data row.

subgroup
is the subgroup_name for this data row. This will correspond directly to the data table subgroup_id .

integrand
is the integrand table integrand_name .

weight
is the weight_name .

age_lo
is the lower age used in the fits; i.e., the data table age_lower modified by the age compression interval in the compress_interval option.

age_up
is the upper age used in the fits; i.e., the data table age_upper modified by the age compression interval.

time_lo
is the lower time used in the fits; i.e., the data table time_lower modified by the time compression interval.

time_up
is the upper time used in the fits; i.e., the data table time_upper modified by the time compression interval.

d_out
is the value of hold_out in the data table.

s_out
is the value of hold_out in the data_subset table.

density
is the density_name for data_subset table density_id for this row.

eta
is the data_subset table eta for this row.

nu
is the data_subset table nu for this row.

meas_std
is the data table meas_std .

meas_stdcv
is the minimum cv standard deviation used to define the likelihood; see Delta .

meas_delta
If the previous fit command had a simulate_index, this column is empty. We use delta to denote the adjusted standard deviation for this row. If the density for this row is linear,

    meas_delta = delta

Otherwise, the density is log scaled and

    delta = log(meas_value + eta + meas_delta) - log(meas_value + eta)

The value delta is computed by dividing by the residual, which is plus infinity and not valid when the residual is zero. This value is reported as empty if the calculation for meas_delta is greater than the maximum Python float value.
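For the log scaled case, the relation between delta and meas_delta can be inverted algebraically: exponentiating both sides gives meas_delta = (meas_value + eta) * (exp(delta) - 1). The sketch below implements both directions. The function names and the numbers are hypothetical, and this is not the code db2csv itself uses.

```python
import math


def delta_from_meas_delta(meas_value, eta, meas_delta):
    """Log scaled case: delta as defined in the formula above."""
    return math.log(meas_value + eta + meas_delta) - math.log(meas_value + eta)


def meas_delta_from_delta(meas_value, eta, delta):
    """Invert the log scaled formula; expm1(x) is exp(x) - 1."""
    return (meas_value + eta) * math.expm1(delta)


# Hypothetical values for one data row.
delta = delta_from_meas_delta(meas_value=0.02, eta=1e-3, meas_delta=0.004)
recovered = meas_delta_from_delta(meas_value=0.02, eta=1e-3, delta=delta)
```

The round trip recovers the original meas_delta up to floating point error.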

meas_value
is the data table meas_value .

avgint
If the fit_command has been run, this is the avg_integrand for this row.

residual
If the fit_command has been run, this is the weighted_residual for this row; see fit_data_subset above.

sim_value
If the simulate_command has been run, this is the data_sim_value for this data_id and simulate_index in the previous fit command. If there is no simulate_index in the previous fit command, the value zero is used for the simulate_index .

Covariates
For each covariate in the covariate_table there is a column with the corresponding covariate_name. For each covariate column and measurement row, the value in the covariate column is the covariate value for this measurement minus the reference value for this covariate; i.e., the corresponding covariate difference x_ij in the model for the average integrand.

predict.csv
If the predict_command was executed, the CSV file dir/predict.csv is written. For each row of the predict_table there is a corresponding row in predict.csv.

avgint_id
is the avgint table avgint_id .

avgint_extra_columns
Each column specified by the avgint_extra_columns option is included in the predict.csv file.

s_index
This identifies the set of model variables corresponding to the last predict_command executed. If the source for the predict command was sample, the model variables correspond to the rows of the sample table with sample_index equal to s_index. Otherwise, s_index is empty and the model variables correspond to the fit_var or truth_var table, depending on the source for the last predict command executed.

avgint
is the average integrand @(@ A_i(u, \theta) @)@. The model variables @(@ (u, \theta) @)@ correspond to the s_index, and the measurement subscript @(@ i @)@ denotes the avgint_table information for this row of predict.csv; i.e., age_lo, age_up, ...
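When the predict source was sample, predict.csv holds one avgint value per avgint_id and s_index. A common follow-up is to average over s_index for each avgint_id, as in this sketch. The helper name and the sample rows are hypothetical, and only the three columns used here are included.

```python
import csv
import io
import statistics
from collections import defaultdict


def mean_avgint_by_id(text_stream):
    """Average the avgint column over s_index for each avgint_id."""
    values = defaultdict(list)
    for row in csv.DictReader(text_stream):
        values[int(row["avgint_id"])].append(float(row["avgint"]))
    return {
        avgint_id: statistics.mean(vals) for avgint_id, vals in values.items()
    }


# Hypothetical excerpt of predict.csv with two samples per avgint_id.
sample = io.StringIO(
    "avgint_id,s_index,avgint\n"
    "0,0,0.25\n"
    "1,0,1.0\n"
    "0,1,0.75\n"
    "1,1,3.0\n"
)
means = mean_avgint_by_id(sample)
```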

age_lo
is the avgint table age_lower .

age_up
is the avgint table age_upper .

time_lo
is the avgint table time_lower .

time_up
is the avgint table time_upper .

integrand
is the avgint table integrand_name .

weight
is the weight_name for this row.

node
is the node_name for this row.

group
is the group_name corresponding to the subgroup for this row.

subgroup
is the subgroup_name for this row. This will correspond directly to the avgint table subgroup_id.

Covariates
For each covariate in the covariate_table there is a column with the corresponding covariate_name. For each covariate column and measurement row, the value in the covariate column is the covariate value in the avgint_table minus the reference value for this covariate; i.e., the corresponding covariate difference x_ij in the model for the average integrand.

Example
The file db2csv_command.py contains an example and test using this command.

ihme_db.sh
The script ihme_db.sh can be used to run db2csv for a dismod_at database on the IHME cluster.

Input File: python/dismod_at/db2csv_command.py