Though Improbizer was originally designed as a Web CGI program, on large data sets it runs long enough that it is sometimes necessary to run it from the command line. The executable for Improbizer is called ameme. The same executable is used for both the Improbizer and the Motif Matcher web pages. The format of the command line reflects Improbizer's CGI origins. In general, command-line options are of the form:
cgiVar=someValue
The program writes out an html file to standard output, and also creates a gif file and a temporary file with the suffix .pfl. Normally you'll want to redirect the output to a file, and then view that file using the "Open File" or "Open" option of your web browser. A simple example of a command line would be:
ameme good=foreground.fa bad=background.fa numMotifs=4 >motifs.html
To invoke Motif Matcher rather than Improbizer include
motifMatcher=on
in the command line. A simple example of a Motif Matcher command line would be:
ameme motifMatcher=on seqFile=data.fa motifs=motifFile maxOcc=3 >some.html
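When many runs have to be scripted, it can help to assemble the name=value arguments programmatically. The following is a minimal sketch in Python, not part of Improbizer itself; the helper name build_ameme_command is hypothetical, and it assumes only what is described above: the executable is called ameme, options are written as name=value, and switches are turned on with name=on.

```python
import shlex

def build_ameme_command(options, executable="ameme"):
    """Return an argv list of the form [executable, 'name=value', ...]."""
    args = [executable]
    for name, value in options.items():
        if isinstance(value, bool):
            # Switches such as rcToo are written name=on; a False switch is left out.
            if value:
                args.append(f"{name}=on")
        else:
            args.append(f"{name}={value}")
    return args

cmd = build_ameme_command({
    "good": "foreground.fa",
    "bad": "background.fa",
    "numMotifs": 4,
    "rcToo": True,
})
# Show the equivalent shell command line.
print(shlex.join(cmd))
```

To actually run the program, the list can be handed to subprocess.run with stdout pointed at an open file, which plays the role of the >motifs.html redirection in the examples above.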
The order of arguments in the command line does not matter. Below is a table that lists all of the current command line options for Improbizer, whether they are required, and their default values. Following this is the corresponding table for Motif Matcher.
Var. Name | Description |
good | Name of foreground sequence file. Required. A fasta (.fa) format file containing DNA or RNA sequences that you suspect share a motif or three. Example: good=immunePromoters.fa |
bad | Background sequence file name. Highly recommended especially for 1st and 2nd order Markov background models. This file should ideally contain a large number of sequences in most ways like the "good" sequences, but not the motif you're looking for (or at least not high levels of the motif you're looking for). If you don't use this the background model will be created from the foreground sequence. Example: bad=mousePromoters.fa |
ignoreLocation | Controls whether the position of a motif is considered important. By default position is considered. To change this include in the command line: ignoreLocation=on |
numMotifs | The number of motifs to look for. By default this is 2. To look for only one, use: numMotifs=1 |
maxOcc | The maximum number of times you expect a single motif to occur in a sequence. Default is 1. |
rcToo | Set rcToo=on if you want to search both strands for motifs. |
tileSize | This sets the initial size (in nucleotides) of a motif. Generally motifs will grow and shrink to fit the data, but if you have some idea of the size you expect it can help to set this explicitly with something like: tileSize=13. By default tileSize is 7. |
constrainer | This controls the tendency of the motif size to grow. Set constrainer=1000 if you wish the motif to stay at tileSize. Set to zero for unconstrained growth (which is often not a bad thing on large data sets). The default value of 1.0 mildly constrains motif size. |
leftAlign | If your sequences aren't all the same size the shorter ones are padded so that the right ends all line up. If you set leftAlign=on then instead they'll be padded so that the left ends all line up. |
startScanLimit | This sets how many sequences are scanned for initial motifs. By default it is 20. Doubling this to 40 will make the program take nearly twice as long to run, but occasionally will result in a better motif. |
background | This controls the background (null) model. Possible values are: even (each base has a 25% chance); m0 (Markov 0: base probability depends on how many of that base are in background); m1 (Markov 1: base probability depends on base before); m2 (Markov 2: base probability depends on previous two bases); coding (three interleaved Markov 2 models, one for each frame of codon). By default background=m0 |
motifOutput | In addition to the .html and .gif files, the program will create a simple text file containing the motifs if this is set. Example: motifOutput=splicingMotifs.txt |
controlRun | If you include controlRun=on in the command line, a random set of sequences will be generated that match your foreground data set in size, and your background data set in nucleotide probabilities. The program will then look for motifs in this random set. If the scores you get in a real run are about the same as those you get in a control run, then the motifs Improbizer has found are probably not significant. |
html | Where to put html output (by default goes to standard output). Example: html=run1.html |
gif | Where to put gif output (by default goes to a cryptically named file). Example: gif=run1.gif |
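To make the m0 and m1 values of the background option more concrete, here is a small Python sketch of how order-0 and order-1 Markov base probabilities can be estimated from background sequences. It is an illustration of the general technique only, not Improbizer's actual code; the function names and the pseudocount smoothing are assumptions introduced for the example.

```python
BASES = "acgt"

def markov0(seqs):
    """Order-0 model: P(base) is that base's share of all background bases."""
    counts = {b: 0 for b in BASES}
    for seq in seqs:
        for base in seq.lower():
            if base in counts:          # skip N's and other symbols
                counts[base] += 1
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}

def markov1(seqs, pseudocount=1.0):
    """Order-1 model: P(base | previous base), smoothed with a pseudocount."""
    counts = {p: {b: pseudocount for b in BASES} for p in BASES}
    for seq in seqs:
        s = seq.lower()
        for prev, cur in zip(s, s[1:]):
            if prev in counts and cur in counts:
                counts[prev][cur] += 1
    model = {}
    for prev, row in counts.items():
        row_total = sum(row.values())
        model[prev] = {b: n / row_total for b, n in row.items()}
    return model

background = ["aacg"]                   # toy data, made up for the example
print(markov0(background))              # {'a': 0.5, 'c': 0.25, 'g': 0.25, 't': 0.0}
```

The higher-order models need many more observations per parameter than m0, which is consistent with the advice above to supply a large bad file when using m1 or m2.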
Var. Name | Description |
motifMatcher | Tells the program to just search for a predefined motif in the input sequences rather than to find a motif. Required for the program to behave as Motif Matcher rather than as Improbizer. Example: motifMatcher=on |
motifs | A file containing the motifs. This can be either a file you've gotten from using the motifOutput option with Improbizer, or any file containing one or more motifs as described in the Motif Matcher help. |
hits | Where to put motif hits in a simple tab-delimited format. The columns are: motif#, score, sequence, position. Example: hits=sl1.txt |
good | Fasta format file containing sequences to scan for motifs. Example: good=immunePromoters.fa |
bad | Background sequence file name. Highly recommended especially for 1st and 2nd order Markov background models. This file should ideally contain a large number of sequences in most ways like the "good" sequences, but not the motif you're looking for (or at least not high levels of the motif you're looking for). If you don't use this the background model will be created from the foreground sequence. Example: bad=mousePromoters.fa |
background | This controls the background (null) model. Possible values are: even (each base has a 25% chance); m0 (Markov 0: base probability depends on how many of that base are in background); m1 (Markov 1: base probability depends on base before); m2 (Markov 2: base probability depends on previous two bases); coding (three interleaved Markov 2 models, one for each frame of codon). By default background=m0 |
ignoreLocation | Controls whether the position of a motif is considered important. By default position is considered. To change this include in the command line: ignoreLocation=on |
maxOcc | The maximum number of times you expect a single motif to occur in a sequence. Default is 1. |
rcToo | Set rcToo=on if you want to search both strands for motifs. |
html | Where to put html output (by default goes to standard output). Example: html=run1.html |
gif | Where to put gif output (by default goes to a cryptically named file). Example: gif=run1.gif |
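The tab-delimited file written by the hits option is easy to post-process. The sketch below, in Python, reads the four columns listed above (motif#, score, sequence, position). The exact contents of each column are assumptions made for the example: it supposes there is no header line, that score is a decimal number, that position is an integer, and it keeps the motif and sequence columns as plain text. The sample rows are made up.

```python
import csv
import io
from typing import NamedTuple

class Hit(NamedTuple):
    motif: str       # motif number, kept as text
    score: float
    sequence: str
    position: int

def read_hits(handle):
    """Parse tab-delimited hit lines into Hit records, skipping short lines."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue                     # blank or malformed line
        motif, score, sequence, position = row[:4]
        hits.append(Hit(motif, float(score), sequence, int(position)))
    return hits

# Made-up sample standing in for a file such as sl1.txt.
sample = io.StringIO("1\t12.5\tseqA\t40\n2\t8.25\tseqB\t7\n")
hits = read_hits(sample)
best = max(hits, key=lambda h: h.score)
print(best.motif, best.score, best.sequence, best.position)
```

With a real run, the StringIO sample would be replaced by the opened hits file, for example open("sl1.txt", newline="").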