hmmestimate
Hidden Markov model parameter estimates from emissions and states
Syntax
[TRANS,EMIS] = hmmestimate(seq,states)
hmmestimate(...,'Symbols',SYMBOLS)
hmmestimate(...,'Statenames',STATENAMES)
hmmestimate(...,'Pseudoemissions',PSEUDOE)
hmmestimate(...,'Pseudotransitions',PSEUDOTR)
Description
[TRANS,EMIS] = hmmestimate(seq,states) calculates the maximum likelihood estimate of the transition, TRANS, and emission, EMIS, probabilities of a hidden Markov model for sequence, seq, with known states, states.
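As a rough illustration of what the estimate amounts to when the states are known, the sketch below counts transitions and emissions by hand for a short made-up sequence and compares the row-normalized counts with the output of hmmestimate. The toy data and the names countTR, countE, manualTR, and manualE are assumptions introduced for illustration, and the elementwise division relies on implicit expansion (R2016b or later).

```matlab
% Toy data (illustrative only): two states, two symbols
seq    = [1 2 1 1 2 2];
states = [1 1 2 2 1 2];

% Count transitions states(t) -> states(t+1)
countTR = accumarray([states(1:end-1)' states(2:end)'], 1, [2 2]);
% Count emissions of seq(t) while the model is in states(t)
countE  = accumarray([states' seq'], 1, [2 2]);

% Normalize each row so that it sums to 1
manualTR = countTR ./ sum(countTR, 2);
manualE  = countE  ./ sum(countE, 2);

[TRANS, EMIS] = hmmestimate(seq, states);

% Largest absolute difference between the two approaches
disp(max(abs(TRANS(:) - manualTR(:))))
disp(max(abs(EMIS(:)  - manualE(:))))
```

If the hand-counted matrices and the function output agree up to rounding error, that supports reading the estimate as normalized transition and emission frequencies.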
hmmestimate(...,'Symbols',SYMBOLS) specifies the symbols that are emitted. SYMBOLS can be a numeric array, a string array, or a cell array of the names of the symbols. The default symbols are integers 1 through N, where N is the number of possible emissions.
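A minimal sketch of passing named symbols follows. The coin-flip labels and the toy data are made up for illustration, and the comment about column order assumes that the columns of EMIS follow the order in which the names are listed in SYMBOLS.

```matlab
% Toy data (illustrative only): emissions labeled 'H' and 'T'
seq    = {'H','T','H','H','T','T'};
states = [1 1 2 2 1 2];

% Assumed column order: column 1 of EMIS is 'H', column 2 is 'T'
[TRANS, EMIS] = hmmestimate(seq, states, 'Symbols', {'H','T'});
disp(EMIS)
```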
hmmestimate(...,'Statenames',STATENAMES) specifies the names of the states. STATENAMES can be a numeric array, a string array, or a cell array of the names of the states. The default state names are 1 through M, where M is the number of states.
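Named states can be passed in the same way. In the sketch below, the names 'fair' and 'loaded' and the toy data are assumptions chosen for illustration, and the comment about ordering assumes that the rows and columns of TRANS follow the order of the names in STATENAMES.

```matlab
% Toy data (illustrative only): state path given by name
seq    = [1 2 1 1 2 2];
states = {'fair','fair','loaded','loaded','fair','loaded'};

% Assumed ordering: row/column 1 is 'fair', row/column 2 is 'loaded'
[TRANS, EMIS] = hmmestimate(seq, states, 'Statenames', {'fair','loaded'});
disp(TRANS)
```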
hmmestimate(...,'Pseudoemissions',PSEUDOE) specifies pseudocount emission values in the matrix PSEUDOE. Use this argument to avoid zero probability estimates for emissions with very low probability that might not be represented in the sample sequence. PSEUDOE should be a matrix of size m-by-n, where m is the number of states in the hidden Markov model and n is the number of possible emissions. If state i never emits symbol k in seq, you can set PSEUDOE(i,k) to a positive number representing an estimate of the expected number of such emissions in the sequence seq.
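The effect can be seen on a small made-up sequence in which state 2 never emits symbol 2. The data and the names EMIS0 and EMIS1 are assumptions for illustration only.

```matlab
% Toy data (illustrative only): state 2 emits only symbol 1
seq    = [1 2 1 1 2 1];
states = [1 1 2 2 1 2];

% Without pseudocounts, the unseen emission gets probability 0
[~, EMIS0] = hmmestimate(seq, states);

% One pseudocount for the emission of symbol 2 from state 2
PSEUDOE = [0 0; 0 1];
[~, EMIS1] = hmmestimate(seq, states, 'Pseudoemissions', PSEUDOE);

disp(EMIS0(2,2))   % zero: the emission never occurs in seq
disp(EMIS1(2,2))   % positive once the pseudocount is supplied
```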
hmmestimate(...,'Pseudotransitions',PSEUDOTR) specifies pseudocount transition values. You can use this argument to avoid zero probability estimates for transitions with very low probability that might not be represented in the sample sequence. PSEUDOTR should be a matrix of size m-by-m, where m is the number of states in the hidden Markov model. If the transition from state i to state j does not occur in states, you can set PSEUDOTR(i,j) to a positive number representing an estimate of the expected number of such transitions in the sequence states.
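A comparable sketch for transitions uses a made-up state path that never returns from state 2 to state 1. The data and the names TRANS0 and TRANS1 are assumptions for illustration only.

```matlab
% Toy data (illustrative only): the path never goes from state 2 to state 1
seq    = [1 2 1 2 1 2];
states = [1 1 1 2 2 2];

% Without pseudocounts, the unseen transition gets probability 0
TRANS0 = hmmestimate(seq, states);

% One pseudocount for the transition from state 2 to state 1
PSEUDOTR = [0 0; 1 0];
TRANS1 = hmmestimate(seq, states, 'Pseudotransitions', PSEUDOTR);

disp(TRANS0(2,1))   % zero: the transition never occurs in states
disp(TRANS1(2,1))   % positive once the pseudocount is supplied
```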
Pseudotransitions and Pseudoemissions
If the probability of a specific transition or emission is very low, the transition might never occur in the sequence states, or the emission might never occur in the sequence seq. In either case, the algorithm returns a probability of 0 for the given transition or emission in TRANS or EMIS. You can compensate for the absence of a transition or emission with the 'Pseudotransitions' and 'Pseudoemissions' arguments.
The simplest way to do this is to set the corresponding entry of PSEUDOE or PSEUDOTR to 1. For example, if the transition from state i to state j does not occur in states, set PSEUDOTR(i,j) = 1. This forces TRANS(i,j) to be positive.
If you have an estimate for the expected number of transitions from state i to state j in a sequence of the same length as states, and the actual number of such transitions that occur in states is substantially less than what you expect, you can set PSEUDOTR(i,j) to the expected number. This increases the value of TRANS(i,j).
For transitions that do occur in states with the frequency you expect, set the corresponding entry of PSEUDOTR to 0, which does not increase the corresponding entry of TRANS.
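One way to read the cases above is sketched here under an assumption that this page does not state explicitly: that each pseudocount is added to the corresponding observed count before the rows are normalized. The symbol c_{ij} is introduced only for illustration and denotes the number of transitions from state i to state j observed in states; m is the number of states.

```latex
\mathrm{TRANS}(i,j) =
  \frac{c_{ij} + \mathrm{PSEUDOTR}(i,j)}
       {\sum_{k=1}^{m} \bigl( c_{ik} + \mathrm{PSEUDOTR}(i,k) \bigr)}
```

Under this reading, a pseudocount of 0 leaves the observed count unchanged, a positive pseudocount makes TRANS(i,j) positive even when c_{ij} = 0, and raising one entry of a row lowers the other estimates in that row because the row is renormalized. The same form would apply to EMIS with PSEUDOE and emission counts.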
If you do not know the sequence of states, use hmmtrain to estimate the model parameters.
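For comparison, a minimal sketch of the hmmtrain route is shown below. It uses only the emission sequence and requires initial guesses for both matrices; the guess values, the names trGuess, emisGuess, estTR, and estE, and the fixed random seed are assumptions chosen for illustration.

```matlab
rng('default')  % fix the random seed so the run is repeatable

% True model, used here only to generate data
trans = [0.95 0.05; 0.10 0.90];
emis  = [1/6  1/6  1/6  1/6  1/6  1/6;
         1/10 1/10 1/10 1/10 1/10 1/2];
seq = hmmgenerate(1000, trans, emis);

% Initial guesses (arbitrary illustrative values; each row sums to 1)
trGuess   = [0.85 0.15; 0.10 0.90];
emisGuess = [0.17 0.16 0.17 0.16 0.17 0.17;
             0.60 0.08 0.08 0.08 0.08 0.08];

% Estimate the parameters without using the state path
[estTR, estE] = hmmtrain(seq, trGuess, emisGuess);
```

Because hmmtrain iterates from the initial guesses, its result can depend on them, whereas hmmestimate works directly from the known state path.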
Examples
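% Transition and emission matrices for a two-state model with six symbols
trans = [0.95,0.05; 0.10,0.90];
emis = [1/6 1/6 1/6 1/6 1/6 1/6;
        1/10 1/10 1/10 1/10 1/10 1/2];
% Generate a sequence with known states, then estimate the parameters
[seq,states] = hmmgenerate(1000,trans,emis);
[estimateTR,estimateE] = hmmestimate(seq,states);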
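To get a sense of how close the estimates come to the generating model, the same example can be rerun with a comparison step. This is only a sketch: the fixed seed and the names errTR and errE are assumptions for illustration, and the size of the errors depends on the random sequence that is drawn.

```matlab
rng('default')  % fix the random seed so the run is repeatable

trans = [0.95,0.05; 0.10,0.90];
emis = [1/6 1/6 1/6 1/6 1/6 1/6;
        1/10 1/10 1/10 1/10 1/10 1/2];
[seq,states] = hmmgenerate(1000,trans,emis);
[estimateTR,estimateE] = hmmestimate(seq,states);

% Largest absolute deviation from the generating matrices
errTR = max(abs(estimateTR(:) - trans(:)));
errE  = max(abs(estimateE(:)  - emis(:)));
fprintf('Largest transition error: %.3f\n', errTR);
fprintf('Largest emission error:   %.3f\n', errE);
```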
References
[1] Durbin, R., S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge, UK: Cambridge University Press, 1998.
Version History
Introduced before R2006a