HMMER User's Guide
hmmbuild [options] hmmfile alignfile
hmmbuild reads a multiple sequence alignment file alignfile, builds a new
profile HMM, and saves the HMM in hmmfile.
alignfile may be in ClustalW, GCG MSF, or SELEX alignment format.
By default, the model is configured to find one or more nonoverlapping
alignments to the complete model. This is analogous to the behavior of
the hmmls program of HMMER 1. To configure the model for a single global
alignment, use the -g option; to configure the model for multiple local
alignments a la the old program hmmfs, use the -f option; and to configure
the model for a single local alignment (a la standard Smith/Waterman,
or the old hmmsw program), use the -s option.
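The four search configurations can be summarized as a small lookup table. This is an illustrative Python sketch, not part of HMMER: the option strings come from this page, but the CONFIGS name, the describe() helper, and the (hits per sequence, part of the model covered) encoding are hypothetical.

```python
# Hypothetical summary of hmmbuild's four search configurations.
# Key: command-line option (None means no option, i.e. the default).
# Value: (hits allowed per target sequence, part of the model each hit covers).
CONFIGS = {
    None: ("multiple", "the complete model"),  # default; analogous to HMMER 1 hmmls
    "-g": ("single", "the complete model"),    # Needleman/Wunsch-like; HMMER 1 hmms
    "-f": ("multiple", "any part of the model"),  # HMMER 1 hmmfs
    "-s": ("single", "any part of the model"),    # Smith/Waterman-like; HMMER 1 hmmsw
}


def describe(option=None):
    """Return a one-line description of the configuration selected by an option."""
    hits, scope = CONFIGS[option]
    return f"{hits} hit(s) per sequence, each covering {scope}"


print(describe())      # the default configuration
print(describe("-s"))  # single local alignment
```

Note that -g is also global with respect to the target sequence, which the two-field encoding does not capture.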
- [-f ] Configure the
model for finding multiple domains per sequence, where each domain can
be a local (fragmentary) alignment. This is analogous to the old hmmfs
program of HMMER 1.
- [-g ] Configure the model for finding a single global
alignment to a target sequence, analogous to the standard Needleman/Wunsch
algorithm or the old hmms program of HMMER 1.
- [-h ] Print brief help; includes
version number and summary of all options, including expert options.
- [-n <s> ] Name this HMM <s>. <s> can be any string of non-whitespace characters (i.e.
one "word"). There is no length limit (at least not one imposed by HMMER;
your shell will complain about command line lengths first).
- [-o <file> ] Re-save
the starting alignment to <file>, in SELEX format. The columns that were
assigned to match states will be marked with x's in an #=RF annotation
line. If the -hand or -fast construction option was chosen, the
alignment may have been slightly altered to be compatible with Plan 7
transitions, so saving the final alignment and comparing it to the starting
alignment lets you view these alterations. See the User's Guide for more
information on this arcane side effect.
- [-s ] Configure the model for finding
a single local alignment per target sequence. This is analogous to the
standard Smith/Waterman algorithm or the hmmsw program of HMMER 1.
- [-A ] Append this model to an existing hmmfile rather than creating hmmfile.
This is useful for building HMM libraries (like Pfam).
- [-F ] Force overwriting of
an existing hmmfile. Otherwise HMMER will refuse to clobber your existing
HMM files, for safety's sake.
- [-amino ] Force the sequence
alignment to be interpreted as amino acid sequences. Normally HMMER autodetects
whether the alignment is protein or DNA, but sometimes alignments are
so small that autodetection is ambiguous. See -nucleic.
- [-archpri <x> ] Set the
"architecture prior" used by MAP architecture construction to <x>, where
<x> is a probability between 0 and 1. This parameter governs a geometric
prior distribution over model lengths. As <x> increases, longer models are
favored a priori. As <x> decreases, it takes more residue conservation in
a column to make a column a "consensus" match column in the model architecture.
The 0.85 default has been chosen empirically as a reasonable setting.
- [-binary ] Write the HMM to hmmfile in HMMER binary format instead of readable ASCII
text.
- [-cfile <file> ] Save the observed emission and transition counts to
<file> after the architecture has been determined (i.e. after residues/gaps
have been assigned to match, delete, and insert states). This option is
used in HMMER development for generating data files useful for training
new Dirichlet priors. The format of count files is documented in the User's
Guide.
- [-fast ] Quickly and heuristically determine the architecture of the
model by assigning all columns with more than a certain fraction of gap
characters to insert states. By default this fraction is 0.5, and it can
be changed using the -gapmax option. The default construction algorithm
is a maximum a posteriori (MAP) algorithm, which is slower.
- [-gapmax <x> ] Controls the -fast model construction algorithm; it has no effect
unless -fast is also used. If a column has more than a fraction <x> of gap
symbols in it, it gets assigned to an insert column. <x> is a frequency
from 0 to 1, and by default is set to 0.5. Higher values of <x> mean more
columns get assigned to consensus, and models get longer; smaller values
of <x> mean fewer columns get assigned to consensus, and models get shorter.
- [-hand ] Specify the architecture of the model by hand: the alignment
file must be in SELEX format, and the #=RF annotation line is used to
specify the architecture. Any column marked with a non-gap symbol (such
as an 'x', for instance) is assigned as a consensus (match) column in the
model.
- [-idlevel <x> ] Controls both the determination of effective sequence
number and the behavior of the -wblosum weighting option. The sequence
alignment is clustered by percent identity, and the number of clusters
at a cutoff threshold of <x> is used to determine the effective sequence
number. Higher values of <x> give more clusters and higher effective sequence
numbers; lower values of <x> give fewer clusters and lower effective sequence
numbers. <x> is a fraction from 0 to 1, and by default is set to 0.62 (corresponding
to the clustering level used in constructing the BLOSUM62 substitution
matrix).
- [-noeff ] Turn off the effective sequence number calculation, and
use the true number of sequences instead. This will usually reduce the
sensitivity of the final model (so don't do it without good reason!)
- [-nucleic ] Force the alignment to be interpreted as nucleic acid sequence, either
RNA or DNA. Normally HMMER autodetects whether the alignment is protein
or DNA, but sometimes alignments are so small that autodetection is ambiguous.
See -amino.
- [-null <file> ] Read a null model from <file>. The default for protein
is to use average amino acid frequencies from Swissprot 34 and p1 = 350/351;
for nucleic acid, the default is to use 0.25 for each base and p1 = 1000/1001.
For documentation of the format of the null model file and further explanation
of how the null model is used, see the User's Guide.
- [-pam <file> ] Apply a
heuristic PAM- (substitution matrix-) based prior instead of the default
mixture Dirichlet. The substitution matrix is read from <file>. See -pamwgt.
- [-pamwgt <x> ] Controls the weight on a PAM-based prior. Only has effect if
-pam option is also in use. <x> is a positive real number, 20.0 by default.
<x> is the number of "pseudocounts" contributed by the heuristic prior.
Very high values of <x> can force a scoring system that is entirely driven
by the substitution matrix, so that HMMER roughly approximates Gribskov
profiles.
- [-prior <file> ] Read a Dirichlet prior from <file>, replacing the
default mixture Dirichlet. The format of prior files is documented in the
User's Guide, and an example is given in the Demos directory of the HMMER
distribution.
- [-swentry <x> ] Controls the total probability that is distributed
to local entries into the model, versus starting at the beginning of the
model as in a global alignment. <x> is a probability from 0 to 1, and by
default is set to 0.5. Higher values of <x> mean that hits that are fragments
on their left (N or 5'-terminal) side will be penalized less, but complete
global alignments will be penalized more. Lower values of <x> mean that fragments
on the left will be penalized more, and global alignments on this side
will be favored. This option only affects the configurations that allow
local alignments, i.e. -s and -f; unless one of these options is also activated,
this option has no effect. You have independent control over local/global
alignment behavior for the N/C (5'/3') termini of your target sequences
using -swentry and -swexit.
- [-swexit <x> ] Controls the total probability that
is distributed to local exits from the model, versus ending an alignment
at the end of the model as in a global alignment. <x> is a probability from
0 to 1, and by default is set to 0.5. Higher values of <x> mean that hits
that are fragments on their right (C or 3'-terminal) side will be penalized
less, but complete global alignments will be penalized more. Lower values
of <x> mean that fragments on the right will be penalized more, and global
alignments on this side will be favored. This option only affects the configurations
that allow local alignments, i.e. -s and -f; unless one of these options
is also activated, this option has no effect. You have independent control
over local/global alignment behavior for the N/C (5'/3') termini of your
target sequences using -swentry and -swexit.
- [-verbose ] Print additional information that may be useful,
such as the individual scores for each sequence in the alignment.
- [-wblosum ] Use the BLOSUM filtering algorithm to weight the sequences,
instead of the default. Cluster the sequences at a given percentage identity
(see -idlevel); assign each cluster a total weight of 1.0, distributed equally
amongst the members of that cluster.
- [-wgsc ] Use the Gerstein/Sonnhammer/Chothia
ad hoc sequence weighting algorithm. This is already the default, so this
option has no effect (unless it follows another option in the -w family,
in which case it overrides it).
- [-wme ] Use the Krogh/Mitchison maximum entropy
algorithm to "weight" the sequences. This supersedes the Eddy/Mitchison/Durbin
maximum discrimination algorithm, which gives almost identical weights
but is less robust. ME weighting seems to give a marginal increase in
sensitivity over the default GSC weights, but takes a fair amount of time.
- [-wnone ] Turn off all sequence weighting.
- [-wvoronoi ] Use the Sibbald/Argos
Voronoi sequence weighting algorithm in place of the default GSC weighting.
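To see why larger -archpri values favor longer models, assume the prior takes the simple geometric form P(M) = (1 - x) x^M over model lengths M, with x the -archpri value. The option description says only that the prior is geometric, so this exact parameterization and the function names below are assumptions for illustration. Under it, each additional consensus column multiplies the prior by x, i.e. costs -log2(x) bits that the column's residue conservation must pay back.

```python
import math


def length_prior(M, x):
    """Assumed geometric prior over model length M (M = 0, 1, 2, ...): P(M) = (1 - x) * x**M."""
    return (1.0 - x) * x ** M


def column_cost_bits(x):
    """Prior penalty, in bits, for making one more column a consensus column: -log2(x)."""
    return -math.log2(x)


if __name__ == "__main__":
    # Smaller x means a larger per-column penalty, so only more conserved
    # columns are worth promoting to consensus.
    for x in (0.5, 0.85, 0.95):
        print(f"archpri={x}: {column_cost_bits(x):.3f} bits per extra consensus column")
```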
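The -fast/-gapmax rule is simple enough to state as code. The sketch below marks each column 'x' (match) or '.' (insert), in the spirit of the #=RF line that -o writes and -hand reads. It illustrates the rule under assumptions and is not HMMER's implementation: counts are unweighted, '-' and '.' are assumed to be the gap characters, and the function name is hypothetical.

```python
def fast_architecture(alignment, gapmax=0.5, gap_chars="-."):
    """Return an RF-style mask: 'x' for match columns, '.' for insert columns.

    A column becomes an insert column if MORE than a fraction `gapmax` of its
    symbols are gaps; otherwise it is a consensus (match) column.
    """
    nseq = len(alignment)
    if nseq == 0:
        return ""
    ncol = len(alignment[0])
    if any(len(seq) != ncol for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    mask = []
    for col in range(ncol):
        gaps = sum(1 for seq in alignment if seq[col] in gap_chars)
        mask.append("." if gaps / nseq > gapmax else "x")
    return "".join(mask)


aln = ["AC-DE",
       "AC-D-",
       "A--DE",
       "ACGD-"]
print(fast_architecture(aln))              # xx.xx (column 5 is exactly 50% gaps, so it stays a match)
print(fast_architecture(aln, gapmax=0.4))  # xx.x.
```

Lowering gapmax turns the half-gapped fifth column into an insert column and yields a shorter model, as the -gapmax description says.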
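The following sketch shows how -idlevel and -wblosum fit together. It assumes single-linkage clustering and an identity measure computed only over columns where neither sequence has a gap; both choices, and all function names, are assumptions made for illustration. The number of clusters at the cutoff plays the role of the effective sequence number, and -wblosum gives each cluster a total weight of 1.0, split equally among its members.

```python
def percent_identity(a, b, gap_chars="-."):
    """Fraction of identical residues over columns where neither sequence has a gap (assumed definition)."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in gap_chars and y not in gap_chars]
    if not pairs:
        return 0.0
    return sum(1 for x, y in pairs if x == y) / len(pairs)


def single_linkage_clusters(seqs, idlevel=0.62):
    """Group sequences linked by any chain of pairs with identity >= idlevel."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent pointers to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if percent_identity(seqs[i], seqs[j]) >= idlevel:
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def effective_nseq(seqs, idlevel=0.62):
    """Effective sequence number: the number of clusters at the idlevel cutoff."""
    return len(single_linkage_clusters(seqs, idlevel))


def blosum_weights(seqs, idlevel=0.62):
    """-wblosum style weights: each cluster gets total weight 1.0, split equally among its members."""
    weights = [0.0] * len(seqs)
    for members in single_linkage_clusters(seqs, idlevel):
        for i in members:
            weights[i] = 1.0 / len(members)
    return weights


seqs = ["ACDEFGHIKL",
        "ACDEFGHIKM",  # 90% identical to the first sequence
        "WWWWWWWWWW"]
print(blosum_weights(seqs))                # [0.5, 0.5, 1.0]
print(effective_nseq(seqs))                # 2
print(effective_nseq(seqs, idlevel=0.95))  # 3: a stricter cutoff gives more clusters
```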
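The p1 values quoted for -null are easier to read as expected sequence lengths. Assuming, as we read these defaults, that p1 is the null model's probability of emitting another residue, so that sequence lengths under the null model are geometrically distributed, the expected length is p1 / (1 - p1). The function name is hypothetical.

```python
from fractions import Fraction


def expected_null_length(p1):
    """Mean of a geometric length distribution P(L) = p1**L * (1 - p1), L = 0, 1, 2, ..."""
    return p1 / (1 - p1)


print(expected_null_length(Fraction(350, 351)))    # 350 (protein default)
print(expected_null_length(Fraction(1000, 1001)))  # 1000 (nucleic acid default)
```

Exact fractions are used so the defaults come out as whole numbers rather than floating-point approximations.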
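How a pseudocount weight such as -pamwgt trades observed counts against a prior can be shown with the generic estimator (n_a + x q_a) / (N + x), where n_a are a column's observed residue counts, N is their total, x is the pseudocount weight, and q_a is a prior distribution over residues. In the -pam prior, q would be derived from the substitution matrix; how HMMER performs that derivation is not shown here, so q is simply passed in, and the function name is hypothetical.

```python
def add_pseudocounts(counts, q, x=20.0):
    """Estimate emission probabilities from observed counts plus x pseudocounts
    distributed according to the prior distribution q (q should sum to 1 and
    cover the whole alphabet; residues missing from q are ignored)."""
    total = sum(counts.get(a, 0.0) for a in q)
    return {a: (counts.get(a, 0.0) + x * q[a]) / (total + x) for a in q}


counts = {"A": 10}        # a column in which only A was observed
q = {"A": 0.5, "S": 0.5}  # toy prior; a real one would come from the matrix
print(add_pseudocounts(counts, q))         # x = 20: A = 20/30, S = 10/30
print(add_pseudocounts(counts, q, x=1e6))  # very large x: both close to 0.5
```

With a very large x the estimate converges to q regardless of the data, which is the "entirely driven by the substitution matrix" regime described under -pamwgt.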
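One way to picture -swentry: in the local configurations (-s and -f), entry into the model is split between the first match state, which keeps 1 - swentry, and the internal match states 2..M, which share swentry. The sketch below assumes that share is uniform and ignores entry via delete states, so treat it as an illustration of the trade-off rather than HMMER's exact parameterization; the function name is hypothetical. -swexit works analogously at the C-terminal (3') end of the model.

```python
def local_entry_probs(M, swentry=0.5):
    """Entry probabilities into match states 1..M of an M-state model (assumed uniform split).

    Match state 1 (a full-length, global start) gets 1 - swentry; the remaining
    swentry is shared equally by match states 2..M (local starts, i.e. hits that
    are fragmentary on their N-terminal / 5' side).
    """
    if not 0.0 <= swentry <= 1.0:
        raise ValueError("swentry must be a probability")
    if M < 1:
        raise ValueError("model must have at least one match state")
    if M == 1:
        return [1.0]
    return [1.0 - swentry] + [swentry / (M - 1)] * (M - 1)


print(local_entry_probs(5))               # [0.5, 0.125, 0.125, 0.125, 0.125]
print(local_entry_probs(5, swentry=0.8))  # global start less likely, fragments more likely
```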
Direct comments and questions to <eddy@genetics.wustl.edu>