rustling.hmm
Hidden Markov Model.
Package Contents
- class rustling.hmm.SeqFeatureTemplate
A single feature template for sequence labeling.
Do not instantiate directly. Use the seq_obs() or seq_label() factory functions instead.
- class rustling.hmm.HiddenMarkovModel(*, n_states: int, n_iter: int = 100, tolerance: float = 1e-06, gamma: float = 1.0, random_seed: int | None = None, features: Sequence[rustling.seq_feature.SeqFeatureTemplate] | None = None)
A Hidden Markov Model.
Supports unsupervised training (Baum-Welch EM algorithm), supervised training (label counting with Lidstone smoothing), and semi-supervised training. Uses the Viterbi algorithm for decoding and the Forward algorithm for computing log-likelihoods.
- fit(sequences: Sequence[Sequence[str]], labels: Sequence[Sequence[str]] | None = None) -> None
Train the model.
When labels are provided, uses supervised counting with Lidstone smoothing (configurable via gamma). When labels is None, uses the Baum-Welch (EM) algorithm for unsupervised training.
Semi-supervised training is supported by calling fit twice: first with labels (supervised), then without labels (unsupervised). The second call uses the supervised model’s parameters as the EM initialization instead of random initialization, and extends the vocabulary with any new observations from the unlabeled data.
- Parameters:
sequences – A list of observation sequences. Each sequence is a list of observation strings.
labels – Optional list of label sequences, parallel to sequences. Each label sequence must have the same length as the corresponding observation sequence.
- Raises:
ValueError – If sequences and labels have mismatched lengths.
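The supervised path described above (label counting with Lidstone smoothing) can be illustrated in plain Python. This is a minimal sketch of the general technique for the emission table only, not rustling's implementation; the function name lidstone_emissions and the choice to smooth over the observed vocabulary are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def lidstone_emissions(sequences, labels, gamma=1.0):
    """Estimate P(observation | label) by counting, with Lidstone smoothing.

    Hypothetical helper for illustration; not part of rustling.
    """
    if len(sequences) != len(labels):
        raise ValueError("sequences and labels must have the same length")

    counts = defaultdict(Counter)  # label -> observation counts
    vocab = set()
    for obs_seq, label_seq in zip(sequences, labels):
        if len(obs_seq) != len(label_seq):
            raise ValueError("label sequence length must match its observation sequence")
        for obs, label in zip(obs_seq, label_seq):
            counts[label][obs] += 1
            vocab.add(obs)

    # Lidstone smoothing: add gamma to every count, then renormalize.
    probs = {}
    for label, counter in counts.items():
        total = sum(counter.values()) + gamma * len(vocab)
        probs[label] = {obs: (counter[obs] + gamma) / total for obs in vocab}
    return probs


sequences = [["the", "dog", "barks"], ["the", "cat"]]
labels = [["DET", "NOUN", "VERB"], ["DET", "NOUN"]]
emissions = lidstone_emissions(sequences, labels, gamma=1.0)
```

With gamma = 1.0 this reduces to Laplace (add-one) smoothing; smaller values keep the estimates closer to the raw relative frequencies.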
- predict(sequences: Sequence[Sequence[str]]) -> list[list[int]]
Decode the most likely hidden state sequences.
Uses the Viterbi algorithm to find the state sequence that maximizes the joint probability of the observations and states. Unknown observations (not seen during training) are assigned a uniform emission probability.
- Parameters:
sequences – A list of observation sequences.
- Returns:
A list of state index lists (0-based) corresponding to the most likely hidden state at each time step.
- Raises:
ValueError – If the model has not been fitted yet.
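To make the decoding step concrete, here is a minimal pure-Python sketch of the Viterbi algorithm in log space. It is not rustling's implementation: the function name, the toy parameters, and the value used for unknown observations (unknown_logp) are assumptions chosen for illustration, since the exact uniform probability the library assigns is not stated here.

```python
import math


def viterbi(obs, start_logp, trans_logp, emit_logp, unknown_logp):
    """Return the most likely 0-based state index sequence for obs."""
    n_states = len(start_logp)
    if not obs:
        return []

    # score[s]: best log-probability of any state path ending in state s.
    score = [start_logp[s] + emit_logp[s].get(obs[0], unknown_logp)
             for s in range(n_states)]
    backptrs = []
    for o in obs[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + trans_logp[p][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + trans_logp[best_prev][s]
                             + emit_logp[s].get(o, unknown_logp))
        score = new_score
        backptrs.append(ptr)

    # Follow the back-pointers from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy two-state model (assumed values, for illustration only).
start = [math.log(0.5), math.log(0.5)]
trans = [[math.log(0.8), math.log(0.2)],
         [math.log(0.2), math.log(0.8)]]
emit = [{"a": math.log(0.9), "b": math.log(0.1)},
        {"a": math.log(0.1), "b": math.log(0.9)}]
unknown = math.log(0.5)  # assumed uniform value for unseen observations

path = viterbi(["a", "a", "b", "b"], start, trans, emit, unknown)
```

Because an unknown observation receives the same emission score in every state, it does not favor any state on its own; the transition scores decide the path at that time step.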
- score(sequences: Sequence[Sequence[str]]) -> list[float]
Compute the log-likelihood of each observation sequence.
Uses the Forward algorithm to compute the total log-probability of each observation sequence under the model. Unknown observations (not seen during training) are assigned a uniform emission probability.
- Parameters:
sequences – A list of observation sequences.
- Returns:
A list of log-likelihoods (natural log).
- Raises:
ValueError – If the model has not been fitted yet.
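The Forward computation can be sketched in a few lines of pure Python. This is an illustration of the general algorithm in log space, not rustling's code; the names forward_loglik and logsumexp, the toy parameters, and the unknown_logp value are assumptions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(obs, start_logp, trans_logp, emit_logp, unknown_logp):
    """Natural-log likelihood of obs, summed over all hidden state paths."""
    n_states = len(start_logp)
    if not obs:
        return 0.0  # the empty sequence has probability 1

    # alpha[s]: log-probability of the prefix seen so far, ending in state s.
    alpha = [start_logp[s] + emit_logp[s].get(obs[0], unknown_logp)
             for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[p] + trans_logp[p][s] for p in range(n_states)])
            + emit_logp[s].get(o, unknown_logp)
            for s in range(n_states)
        ]
    return logsumexp(alpha)


# Toy two-state model (assumed values, for illustration only).
start = [math.log(0.5), math.log(0.5)]
trans = [[math.log(0.8), math.log(0.2)],
         [math.log(0.2), math.log(0.8)]]
emit = [{"a": math.log(0.9), "b": math.log(0.1)},
        {"a": math.log(0.1), "b": math.log(0.9)}]
unknown = math.log(0.5)  # assumed uniform value for unseen observations

loglik = forward_loglik(["a", "b"], start, trans, emit, unknown)
```

Unlike Viterbi, which keeps only the best predecessor at each step, the Forward recursion sums over all predecessors, so the result is the total probability of the observations rather than the probability of a single best path.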
- save(path: str | os.PathLike[str]) -> None
Save the model to a zstd-compressed FlatBuffers binary.
- Parameters:
path – The path where the model will be saved. The file extension .fb.zst is recommended.
- load(path: str | os.PathLike[str]) -> None
Load a model.
- Parameters:
path – The path where the model, stored as a zstd-compressed FlatBuffers binary, is located.
- Raises:
FileNotFoundError – If the file does not exist.
EnvironmentError – If the file cannot be read as an HMM model.
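The methods above can be combined into a semi-supervised workflow. The sketch below relies only on the fit, save, and load signatures documented on this page. The helper names fit_semi_supervised_and_save and restore are hypothetical, and the helpers take the model (or a factory for it) as an argument so the code does not assume anything beyond those signatures.

```python
def fit_semi_supervised_and_save(model, labeled, labels, unlabeled, path):
    """Train with labels first, refine with EM on unlabeled data, then save.

    Hypothetical helper; model is expected to expose the fit and save
    methods documented above.
    """
    model.fit(labeled, labels)  # supervised counting with Lidstone smoothing
    model.fit(unlabeled)        # Baum-Welch, initialized from the supervised parameters
    model.save(path)
    return model


def restore(model_factory, path):
    """Create a fresh model and load saved parameters into it.

    load is documented as returning None, so the populated model is the
    instance it was called on.
    """
    model = model_factory()
    model.load(path)
    return model


# Intended use, assuming rustling is installed:
#
#   from rustling.hmm import HiddenMarkovModel
#   model = HiddenMarkovModel(n_states=2)
#   fit_semi_supervised_and_save(model, labeled, labels, unlabeled, "model.fb.zst")
#   restored = restore(lambda: HiddenMarkovModel(n_states=2), "model.fb.zst")
```

The order of the two fit calls matters: calling fit without labels first would run EM from a random initialization, per the description of fit above.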