Parameter values are drawn from four sources: command line arguments, inserted parameter files, default parameter files, and the program itself. Initial models and regularizers cannot be specified on the command line. Several programs require certain parameters to be set; if they are not (or if parameters are specified incorrectly), a usage message is output to standard error.
Each parameter, including the initial model and regularizer, has a reasonable setting hardwired in the SAM code. These are the default values listed in Section 12. The default regularizer is actually two defaults: one for RNA or DNA and one for proteins.
These hardwired values can be overridden by a user-specific default
file or by command line specifications. The default file is chosen
from three alternatives. First, if the environment variable SAMRC is set, new
default values are read from the file it names. Second, if SAMRC
is not set and a .samrc file exists in the current directory,
that file is used as the default file. Third, if SAMRC is not set and
.samrc is not found in the current directory,
$HOME/.samrc is checked.
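The lookup order for the default file can be sketched in Python as follows. This is a hypothetical illustration of the three rules above, not SAM's own code: the function name find_default_file and the injectable exists check are assumptions made so the logic can be exercised without touching the real file system.

```python
import os
from typing import Callable, Mapping, Optional


def find_default_file(
    env: Mapping[str, str],
    cwd: str,
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return the default file to read, or None if no default file applies.

    Hypothetical sketch of the lookup order described in the text.
    """
    # 1. If SAMRC is set, it names the default file directly.
    if "SAMRC" in env:
        return env["SAMRC"]
    # 2. Otherwise, use .samrc in the current directory if it exists.
    local = os.path.join(cwd, ".samrc")
    if exists(local):
        return local
    # 3. Otherwise, fall back to $HOME/.samrc if it exists.
    home = env.get("HOME")
    if home is not None:
        candidate = os.path.join(home, ".samrc")
        if exists(candidate):
            return candidate
    # No default file: only the hardwired values apply.
    return None
```

Note that the first rule does not check whether the named file exists; the text only says values are read from that file when SAMRC is set.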
Parameter files can cause other parameter files to be read using the
insert directive. When this directive is used in a
default file such as .samrc, the settings in the inserted file are
likewise treated as defaults. Non-default files are specified on the
command line as, for example,
buildmodel test -alphabet RNA -insert trna.init
In this case, the alphabet is set to RNA, and the file
trna.init will override default parameters hardwired in the
program or specified in one of the .samrc files. If the file
contained, for example, the line alphabet DNA, the alphabet
would be switched to DNA with an appropriate warning message. Values
are set and insert files are read according to their position on the
command line or within a file.
Command line arguments are evaluated in the order they are presented to the program. If one of the command line arguments specifies an inserted parameter file, that file is processed before the next command line argument. If one file inserts another, the inserted file is processed before the rest of the original file. Thus, to override values specified in an inserted file, insert the file first on the command line and then specify the parameters to reset: the last specified value wins.
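The evaluation order can be modeled as a depth-first walk over (name, value) settings. The sketch below is a simplified, hypothetical model rather than SAM's parser: files are given as an in-memory dictionary, the name apply_settings and the parameter example_param are invented for illustration, and it omits the alphabet-change warning and any protection against a file that inserts itself.

```python
from typing import Dict, List, Tuple

Setting = Tuple[str, str]  # (parameter name, value)


def apply_settings(
    settings: List[Setting],
    files: Dict[str, List[Setting]],
    params: Dict[str, str],
) -> Dict[str, str]:
    """Apply settings in order; each insert is finished before moving on."""
    for name, value in settings:
        if name == "insert":
            # Depth-first: the inserted file, including anything it inserts,
            # is processed completely before the next setting is read.
            apply_settings(files[value], files, params)
        else:
            # A later assignment overwrites an earlier one: the last value wins.
            params[name] = value
    return params
```

For example, with files = {"trna.init": [("alphabet", "DNA"), ("example_param", "1")]}, the command-line sequence [("insert", "trna.init"), ("alphabet", "RNA")] leaves the alphabet set to RNA, because the explicit setting follows the insert.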
It is often important to conditionally specify initialization information. In addition to insert, three conditional insertion directives are available: insert_file_dna, insert_file_rna, and insert_file_protein. Each causes a file to be inserted only if the current alphabet matches the directive. If the alphabet is not yet set when one of these is encountered, a warning message is generated.
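The matching rule for the conditional directives can be sketched as a small predicate. This is an illustrative assumption, not SAM's implementation: the name should_insert is hypothetical, the alphabet comparison is assumed to be case-insensitive, and the text does not say whether the file is read when the alphabet is unset, so this sketch warns and skips it.

```python
import warnings
from typing import Optional

# Directive name -> alphabet it requires (directive names from the text).
CONDITIONAL_INSERTS = {
    "insert_file_dna": "DNA",
    "insert_file_rna": "RNA",
    "insert_file_protein": "protein",
}


def should_insert(directive: str, alphabet: Optional[str]) -> bool:
    """Decide whether a conditional insert applies to the current alphabet."""
    required = CONDITIONAL_INSERTS[directive]
    if alphabet is None:
        # Assumption: with no alphabet yet, warn and do not insert the file.
        warnings.warn(f"{directive} encountered before the alphabet was set")
        return False
    # Assumption: alphabet names compare case-insensitively.
    return alphabet.lower() == required.lower()
```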
Two parameter names have abbreviated forms: i can be used in place of insert, and a can be used in place of alphabet. The following will set the alphabet to RNA and read in the file named parameters.
buildmodel test -a RNA -i parameters
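The abbreviations amount to a lookup applied to each parameter name as it is read. The following sketch turns a command line like the one above into the ordered (name, value) pairs discussed earlier. It is hypothetical: parse_command_line is an invented name, and it assumes for simplicity that every -name is followed by exactly one value, which may not hold for every SAM option.

```python
from typing import List, Tuple

# The two abbreviated parameter names given in the text.
ALIASES = {"i": "insert", "a": "alphabet"}


def parse_command_line(argv: List[str]) -> Tuple[str, List[Tuple[str, str]]]:
    """Split arguments after the program name into a run name and settings."""
    run_name, rest = argv[0], argv[1:]
    if len(rest) % 2:
        raise ValueError("every -name must be followed by a value")
    pairs = []
    for flag, value in zip(rest[0::2], rest[1::2]):
        if not flag.startswith("-"):
            raise ValueError(f"expected -name, got {flag!r}")
        name = flag[1:]
        # Expand abbreviations; other names pass through unchanged.
        pairs.append((ALIASES.get(name, name), value))
    return run_name, pairs
```

For the command above, parse_command_line(["test", "-a", "RNA", "-i", "parameters"]) yields ("test", [("alphabet", "RNA"), ("insert", "parameters")]).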
The model output (such as test.mod, in the command line above) includes statistics about the run and a listing of all parameters that have been changed from their default values. Inserted file names are listed, but commented out, because their effect has already been recorded in the list of changed parameter values. Random number seeds generated from the process ID (pid) are also commented out so that new seeds will be created if the program is rerun on the file.
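The rules for this listing can be sketched as follows. This is a hypothetical illustration only: the function name, the "name value" line layout (borrowed from the alphabet DNA example earlier), the comment marker, and the seed parameter name are all assumptions, since the actual output format is not given here.

```python
from typing import Dict, List, Set


def changed_parameter_lines(
    params: Dict[str, str],
    defaults: Dict[str, str],
    inserted_files: List[str],
    pid_generated: Set[str],
    comment: str = "#",  # assumed comment marker, not taken from SAM
) -> List[str]:
    """List changed parameters the way the text describes."""
    # Inserted files are recorded but commented out: their effect is
    # already captured by the changed values listed below.
    lines = [f"{comment} insert {name}" for name in inserted_files]
    for name, value in params.items():
        if defaults.get(name) == value:
            continue  # unchanged from the default, so not listed
        line = f"{name} {value}"
        # pid-derived seeds are commented out so a rerun draws fresh seeds.
        lines.append(f"{comment} {line}" if name in pid_generated else line)
    return lines
```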
Models are usually specified using the insert file (-i)
command line argument. In this case, the model type (i.e., model,
regularizer, frequency counts, or null model, discussed in
Section 8.4) is read from the file. Alternatively,
a model_file, regularizer_file, or nullmodel_file can be specified, in which case the very first model
structure in that file (which could be a regularizer or frequency
count model, for example) is read in. These file names will override
any models present in the inserted files, even if the inserted file
occurs after the model_file parameter on the command line.
This option is particularly useful for discrimination training with
positive and negative examples, in which case a model generated by the
negative examples can be used as the null model. See Section 10.2.1.
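The precedence of the explicit file parameters over inserted models can be sketched as a final resolution step. In this hypothetical sketch, models are represented as opaque strings, the slot names model, regularizer, and null_model are invented labels, and read_first_model stands in for whatever routine reads the first model structure from a file.

```python
from typing import Callable, Dict

# Model slot -> explicit file parameter that controls it (parameter names from the text).
FILE_PARAMETERS = {
    "model": "model_file",
    "regularizer": "regularizer_file",
    "null_model": "nullmodel_file",
}


def resolve_models(
    from_inserts: Dict[str, str],
    params: Dict[str, str],
    read_first_model: Callable[[str], str],
) -> Dict[str, str]:
    """Combine models from inserted files with explicit *_file parameters."""
    resolved = dict(from_inserts)
    for slot, file_param in FILE_PARAMETERS.items():
        path = params.get(file_param)
        if path is not None:
            # The explicit file wins regardless of where the inserted files
            # appeared on the command line; only its first model is used.
            resolved[slot] = read_first_model(path)
    return resolved
```

Applying the override after all inserts have been processed is what makes its position on the command line irrelevant, matching the behavior described above.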
There are three special parameter names, db, id, and not_id, whose values form lists of strings. That is, when multiple database or sequence identifiers are found on the command line or in a parameter file, they are added to the current list of databases or sequence identifiers rather than replacing the previously specified value.
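The difference between list-valued and ordinary parameters can be sketched as a single assignment routine. The name set_parameter is hypothetical; only the three list-valued parameter names come from the text.

```python
from typing import Dict, List, Union

# Parameters whose repeated values accumulate instead of replacing.
LIST_PARAMETERS = {"db", "id", "not_id"}


def set_parameter(
    params: Dict[str, Union[str, List[str]]], name: str, value: str
) -> None:
    """Record one setting, appending for list-valued parameters."""
    if name in LIST_PARAMETERS:
        # Each occurrence adds to the list built so far.
        params.setdefault(name, []).append(value)
    else:
        # Ordinary parameters keep only the last value given.
        params[name] = value
```

Setting db twice and alphabet twice therefore leaves both databases in the list but only the last alphabet.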