

6 Parameter specification

Parameter values are drawn from four sources: command line arguments, inserted parameter files, default parameter files, and the program itself. Initial models and regularizers cannot be specified on the command line. Several programs require certain parameters to be set; if they are not (or if parameters are specified incorrectly), a usage message is output to standard error.

Each parameter, including the initial model and regularizer, has a reasonable setting hardwired in the SAM code. These are the default values listed in Section 12. The default regularizer is actually two defaults, one for RNA or DNA, and the other for proteins.

These hardwired values can be overridden by a user-specific default file or by command line specification. The default file is located in one of three ways. First, if the environment variable SAMRC is set, new default values are read from the file it names. Second, if SAMRC is not set and a .samrc file exists in the current directory, that file is used as the default. Third, if SAMRC is not set and no .samrc is found in the current directory, $HOME/.samrc is checked.
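The three-step lookup can be sketched in Python. This is only an illustration of the order described above, not SAM's own code: the function name find_default_file and its exists argument are hypothetical, and the sketch assumes that a set SAMRC is used without checking whether the named file exists.

```python
import os
from typing import Callable, Mapping, Optional


def find_default_file(
    env: Mapping[str, str],
    cwd: str,
    home: str,
    exists: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return the user default file chosen by the lookup order, or None."""
    # 1. If SAMRC is set, it names the default file (assumption: no existence check).
    samrc = env.get("SAMRC")
    if samrc:
        return samrc
    # 2. Otherwise .samrc in the current directory, then 3. $HOME/.samrc.
    for directory in (cwd, home):
        candidate = os.path.join(directory, ".samrc")
        if exists(candidate):
            return candidate
    # No user default file: only the hardwired values apply.
    return None


if __name__ == "__main__":
    print(find_default_file(os.environ, os.getcwd(), os.path.expanduser("~")))
```

Passing the environment and directories as arguments keeps the sketch easy to try without touching a real .samrc file.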

Parameter files can cause other parameter files to be read using the insert directive. When this directive is used in a default file such as .samrc, the inserted file is assumed to hold defaults as well. Non-default files are specified on the command line as, for example,


buildmodel test -alphabet RNA -insert trna.init
In this case, the alphabet is set to RNA, and the file trna.init will override default parameters hardwired in the program or specified in one of the .samrc files. If the file contained, for example, the line alphabet DNA, the alphabet would be switched to DNA with an appropriate warning message. Values are set and insert files are read according to their position on the command line or within a file.

Command line arguments are evaluated in the order they are presented to the program. If one of the command line arguments specifies an inserted parameter file, that file is processed before the next command line argument. If one file inserts another, the inserted file is processed before completing the original file. Thus, to override values specified in an inserted file, insert the file first on the command line, and then specify the parameters to reset: the last specified values win.
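The evaluation order can be sketched in Python as follows. This is a simplified model rather than SAM's parser: the function names are hypothetical, parameter files are held in a dictionary instead of on disk, each file line is assumed to have the form "name value", and every command line argument is assumed to take exactly one value (the run name, such as test, is assumed to have been removed already).

```python
from typing import Dict, List

# Abbreviated parameter names described in this section.
ABBREVIATIONS = {"i": "insert", "a": "alphabet"}


def set_parameter(name: str, value: str, files: Dict[str, str], params: Dict[str, str]) -> None:
    """Set one parameter, or process an inserted file immediately."""
    name = ABBREVIATIONS.get(name, name)
    if name == "insert":
        # The inserted file is processed in full before anything that follows it.
        apply_file(value, files, params)
    else:
        # A later setting replaces an earlier one.
        params[name] = value


def apply_file(filename: str, files: Dict[str, str], params: Dict[str, str]) -> None:
    """Process a parameter file top to bottom (assumed 'name value' lines)."""
    for line in files[filename].splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2:
            set_parameter(fields[0], fields[1].strip(), files, params)


def apply_command_line(args: List[str], files: Dict[str, str]) -> Dict[str, str]:
    """Process '-name value' pairs from left to right."""
    params: Dict[str, str] = {}
    for i in range(0, len(args) - 1, 2):
        set_parameter(args[i].lstrip("-"), args[i + 1], files, params)
    return params


if __name__ == "__main__":
    files = {"trna.init": "alphabet DNA\n"}
    print(apply_command_line(["-alphabet", "RNA", "-insert", "trna.init"], files))
    print(apply_command_line(["-insert", "trna.init", "-alphabet", "RNA"], files))
```

In the first call the file is read after the command line setting, so its alphabet line takes effect; in the second the command line setting comes last and wins.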

It is often important to conditionally specify initialization information. In addition to insert, three conditional insertion directives are available: insert_file_dna, insert_file_rna, and insert_file_protein. Each causes a file to be inserted only if the alphabet matches the directive. If the alphabet is not yet set when one of these is encountered, a warning message is generated.
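The decision made by a conditional directive can be sketched in Python. The function name should_insert is hypothetical, and two points are assumptions not stated in this section: alphabet names are compared without regard to case (with "protein" as the protein alphabet name), and the file is skipped when the alphabet is not yet set.

```python
from typing import List, Optional

# Alphabet required by each conditional directive.
CONDITIONAL_DIRECTIVES = {
    "insert_file_dna": "dna",
    "insert_file_rna": "rna",
    "insert_file_protein": "protein",
}


def should_insert(directive: str, alphabet: Optional[str], warnings: List[str]) -> bool:
    """Report whether the file named by a conditional directive should be read."""
    required = CONDITIONAL_DIRECTIVES[directive]
    if alphabet is None:
        # Alphabet not yet set: record a warning (assumption: file is skipped).
        warnings.append(f"{directive}: alphabet not yet set")
        return False
    # Assumption: alphabet names are matched case-insensitively.
    return alphabet.lower() == required


if __name__ == "__main__":
    messages: List[str] = []
    print(should_insert("insert_file_rna", "RNA", messages))
    print(should_insert("insert_file_dna", None, messages), messages)
```

Because directives are evaluated in order, the alphabet must be set earlier on the command line or in the file for a conditional insert to take effect.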

Two parameter names have abbreviated forms: i can be used in place of insert, and a can be used in place of alphabet. The following will set the alphabet to RNA and read in the file named parameters.


buildmodel test -a RNA -i parameters

The model output (such as test.mod, in the command line above) includes statistics about the run and a listing of all parameters that have been changed from their default values. Inserted file names are listed, but commented out, because their effect has been recorded in the list of all changed parameter values. Random number seeds created based on the pid are also commented out so that new seeds will be created if the program is rerun on the file.

Models are usually specified using the insert file (-i) command line argument. In this case, the model type (i.e., model, regularizer, frequency counts, or null model, discussed in Section 8.4) is read from the file. Alternatively, a model_file, regularizer_file, or nullmodel_file can be specified, in which case the very first model structure in that file (which could be a regularizer or frequency count model, for example) is read in. These file names will override any models present in the inserted files, even if the inserted file occurs after the model_file parameter on the command line. This option is particularly useful for discrimination training with positive and negative examples, in which case a model generated by the negative examples can be used as the null model. See Section 10.2.1.
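The precedence between inserted models and an explicit model_file can be sketched in Python. This is a deliberately narrow illustration: the function name and the file names are hypothetical, every inserted file is assumed to contain a model, and the last setting is assumed to win among several of the same kind.

```python
from typing import List, Optional, Tuple


def choose_model_source(settings: List[Tuple[str, str]]) -> Optional[str]:
    """Pick the file that supplies the model.

    settings is the ordered list of (kind, filename) pairs seen on the
    command line, where kind is 'insert' or 'model_file'.
    """
    inserted: Optional[str] = None
    explicit: Optional[str] = None
    for kind, filename in settings:
        if kind == "model_file":
            explicit = filename
        elif kind == "insert":
            # Assumption: the inserted file contains a model.
            inserted = filename
    # An explicit model_file overrides inserted models regardless of position.
    return explicit if explicit is not None else inserted


if __name__ == "__main__":
    # Hypothetical file names; model_file comes first but still wins.
    print(choose_model_source([("model_file", "start.mod"), ("insert", "other.mod")]))
```

The same reasoning would apply to regularizer_file and nullmodel_file, each tracked separately.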

There are three special parameter names, db, id, and not_id, that form lists of strings. That is, when multiple database or sequence identifiers are found on the command line or in a parameter file, they are added to the current list of databases or sequence identifiers, rather than replacing the previously specified value.
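The difference between list parameters and ordinary parameters can be sketched in Python. The function name set_value and the identifier values are hypothetical; only the three list parameter names come from the text above.

```python
from typing import Dict, List, Union

# Parameters that accumulate values instead of replacing them.
LIST_PARAMETERS = {"db", "id", "not_id"}

ParamValue = Union[str, List[str]]


def set_value(params: Dict[str, ParamValue], name: str, value: str) -> None:
    """Record one setting, appending for list parameters and replacing otherwise."""
    if name in LIST_PARAMETERS:
        # Each new value is added to the current list.
        params.setdefault(name, []).append(value)
    else:
        # Ordinary parameters keep only the last specified value.
        params[name] = value


if __name__ == "__main__":
    params: Dict[str, ParamValue] = {}
    # Hypothetical sequence identifiers.
    for name, value in [("id", "seq1"), ("alphabet", "RNA"), ("id", "seq2"), ("alphabet", "DNA")]:
        set_value(params, name, value)
    print(params)
```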


SAM
sam-info@cse.ucsc.edu
UCSC Computational Biology Group