
This is dismod_at-20221105 documentation.
Hold Out Command: Randomly Sub-sample The Data

Syntax
Purpose
database
integrand_name
max_fit
cov_name
cov_value_1
cov_value_2
Balancing
     Child Nodes
     Covariates
data_subset_table
Example

Syntax
dismod_at database hold_out integrand_name max_fit
dismod_at database hold_out integrand_name max_fit cov_name cov_value_1 cov_value_2
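The two forms above can be assembled programmatically. The following is a minimal sketch, not part of dismod_at: the helper name hold_out_args is hypothetical, and it only builds the argument list in the order given by the syntax; it does not run the command.

```python
def hold_out_args(database, integrand_name, max_fit,
                  cov_name=None, cov_value_1=None, cov_value_2=None):
    """Build the hold_out argument list (hypothetical helper, not a dismod_at API)."""
    # First form: dismod_at database hold_out integrand_name max_fit
    args = ["dismod_at", str(database), "hold_out", integrand_name, str(int(max_fit))]
    balance = (cov_name, cov_value_1, cov_value_2)
    if any(v is not None for v in balance):
        # Second form: the three covariate arguments must appear together
        if any(v is None for v in balance):
            raise ValueError("cov_name, cov_value_1 and cov_value_2 go together")
        # The covariate values are passed as string representations of doubles
        args += [cov_name, repr(float(cov_value_1)), repr(float(cov_value_2))]
    return args
```

Assuming dismod_at is on the search path, the resulting list could be passed to subprocess.run(args, check=True). The database name, integrand name, and covariate name used with this helper are whatever the caller's database defines.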

Purpose
This command is used to set a maximum number of data values that are included in subsequent fits. It is intended to make the initialization and fitting faster. The random choice of which values to include can be made repeatable using random_seed.

database
This is an http://www.sqlite.org/sqlite/ database containing the dismod_at input tables, which are not modified.

integrand_name
This is the integrand that we are sub-sampling.

max_fit
This is the maximum number of data points to fit for the specified integrand; i.e., the maximum number that are not held out. If for this integrand there are more than max_fit points with hold_out zero in the data table, points are randomly held out so that there are max_fit points fit for this integrand.
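The effect of max_fit can be illustrated with a short sketch. This is not the dismod_at implementation; the function name choose_hold_out and the use of Python's random module are assumptions made only to show the rule: if more than max_fit candidates exist, exactly max_fit are kept and the rest are held out.

```python
import random

def choose_hold_out(candidate_ids, max_fit, seed=None):
    """Return the set of ids to hold out so that at most max_fit remain (illustrative only)."""
    candidate_ids = list(candidate_ids)
    # Nothing is held out when the integrand already has max_fit or fewer points
    if len(candidate_ids) <= max_fit:
        return set()
    # A fixed seed makes the random choice repeatable
    rng = random.Random(seed)
    keep = set(rng.sample(candidate_ids, max_fit))
    return set(candidate_ids) - keep
```

Here candidate_ids stands for the data points of the integrand that have hold_out zero in the data table.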

cov_name
If this argument is present, it specifies a covariate column that will be balanced; see Covariates under Balancing below.

cov_value_1
If this argument is present, it specifies one of the covariate values for the balancing. This is a string representation of a double value.

cov_value_2
If this argument is present, it specifies the opposite covariate value for the balancing. This is a string representation of a double value.

Balancing

Child Nodes
The choice of which points to include in the fit tries to sample the same number of data points from each of the child nodes (and the parent node). If there are not sufficiently many data for one of these nodes, the others make up the difference.
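One way to picture this balancing is as an even split of max_fit across the nodes, with nodes that have spare data absorbing any shortfall. The sketch below is an assumption about how such a split could be computed, not the algorithm used by dismod_at; the function name balanced_counts and the node labels are hypothetical.

```python
def balanced_counts(available, max_fit):
    """Split max_fit across nodes as evenly as possible (illustrative only).

    available maps each node (children and parent) to its number of candidate points.
    """
    counts = {node: 0 for node in available}
    remaining = min(max_fit, sum(available.values()))
    while remaining > 0:
        # Nodes that still have unused data share what is left
        open_nodes = [n for n in available if counts[n] < available[n]]
        share = max(1, remaining // len(open_nodes))
        for n in open_nodes:
            take = min(share, available[n] - counts[n], remaining)
            counts[n] += take
            remaining -= take
            if remaining == 0:
                break
    return counts
```

For example, with two points at the parent, ten at each of two children, and max_fit equal to 12, the parent contributes both of its points and each child contributes five.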

Covariates
If cov_name is present, any sample that has the covariate value cov_value_1 or cov_value_2 will be paired with a sample from the opposite value (if possible).
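The pairing idea can be sketched as follows. This is only an illustration under assumed inputs, not the dismod_at algorithm: rows is a hypothetical list of (data_id, covariate value) tuples, and paired_sample is a hypothetical name. Samples with neither value are ignored here, and pairing stops when one side runs out, which is one reading of "if possible".

```python
import random

def paired_sample(rows, cov_value_1, cov_value_2, n_pairs, seed=None):
    """Pair ids having cov_value_1 with ids having cov_value_2 (illustrative only)."""
    rng = random.Random(seed)
    group_1 = [data_id for data_id, value in rows if value == cov_value_1]
    group_2 = [data_id for data_id, value in rows if value == cov_value_2]
    rng.shuffle(group_1)
    rng.shuffle(group_2)
    # The number of pairs is limited by the smaller group
    n = min(n_pairs, len(group_1), len(group_2))
    return list(zip(group_1[:n], group_2[:n]))
```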

data_subset_table
Only rows of the data_subset table that correspond to this integrand are modified. In those rows, hold_out is set to one (zero) if the corresponding data point is (is not) selected for hold out. Only points that have hold_out zero in the data table can have hold_out non-zero in the data_subset table. See the fit command hold_out documentation.
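After running the command, its effect can be checked by counting rows. The sketch below uses Python's sqlite3 module and assumes the following column names, which should be verified against the actual database: data_id and hold_out in the data_subset table, data_id, integrand_id and hold_out in the data table, and integrand_id and integrand_name in the integrand table. The function name count_hold_out is hypothetical.

```python
import sqlite3

def count_hold_out(connection, integrand_name):
    """Return (n_fit, n_held_out) for one integrand (column names are assumptions)."""
    query = """
        SELECT
            SUM(CASE WHEN d.hold_out = 0 AND s.hold_out = 0 THEN 1 ELSE 0 END),
            SUM(CASE WHEN d.hold_out = 0 AND s.hold_out <> 0 THEN 1 ELSE 0 END)
        FROM data_subset AS s
        JOIN data AS d ON d.data_id = s.data_id
        JOIN integrand AS i ON i.integrand_id = d.integrand_id
        WHERE i.integrand_name = ?
    """
    n_fit, n_held = connection.execute(query, (integrand_name,)).fetchone()
    # SUM over no rows yields NULL, which sqlite3 returns as None
    return (n_fit or 0, n_held or 0)
```

Points that already have hold_out non-zero in the data table are excluded from both counts, so n_fit should not exceed max_fit for the sub-sampled integrand.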

Example
The files user_hold_out_1.py and user_hold_out_2.py contain examples and tests using this command.
Input File: devel/cmd/hold_out_command.cpp