rustling.perceptron_pos_tagger#
Averaged perceptron part-of-speech tagging.
Package Contents#
- class rustling.perceptron_pos_tagger.SeqFeatureTemplate#
A single feature template for sequence labeling.
Do not instantiate directly. Use the seq_obs() or seq_label() factory functions instead.
- class rustling.perceptron_pos_tagger.AveragedPerceptron(*, frequency_threshold: int = 10, ambiguity_threshold: float = 0.95, n_iter: int = 5, random_seed: int | None = None, features: Sequence[rustling.seq_feature.SeqFeatureTemplate] | None = None)#
A part-of-speech tagger using an averaged perceptron model.
This implementation is adapted from the textblob-aptagger codebase (MIT license), originally written by Matthew Honnibal.
- predict(sequences: Sequence[Sequence[str]]) → list[list[str]]#
Predict tags for the sequences.
- Parameters:
sequences – A list of segmented sentences, where each sentence is a sequence of words.
- Returns:
A list of tag sequences, one per input sentence.
- fit(sequences: Sequence[Sequence[str]], tags: Sequence[Sequence[str]]) → None#
Fit a model.
- Parameters:
sequences – A list of segmented sentences for training, where each sentence is a sequence of words.
tags – A list of tag sequences corresponding to the sentences.
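The input shapes that fit() and predict() expect can be sketched as follows. The toy sentences, the tag names, and the check_aligned helper are illustrative assumptions and not part of rustling. The helper also assumes one tag per word, which the signature implies but does not state. The commented-out calls use only the signatures documented on this page.

```python
from typing import Sequence


def check_aligned(
    sequences: Sequence[Sequence[str]], tags: Sequence[Sequence[str]]
) -> None:
    """Raise ValueError if sentences and tag sequences do not line up.

    Hypothetical helper (not part of rustling): it checks the shape that
    the documented fit() signature implies, assuming one tag per word.
    """
    if len(sequences) != len(tags):
        raise ValueError(
            f"{len(sequences)} sentences but {len(tags)} tag sequences"
        )
    for i, (words, labels) in enumerate(zip(sequences, tags)):
        if len(words) != len(labels):
            raise ValueError(
                f"sentence {i}: {len(words)} words but {len(labels)} tags"
            )


# Toy training data; the tag names are made up for illustration.
train_sequences = [["the", "dog", "barks"], ["a", "cat", "sleeps"]]
train_tags = [["DET", "NOUN", "VERB"], ["DET", "NOUN", "VERB"]]
check_aligned(train_sequences, train_tags)

# With rustling installed, training and tagging would then look like:
#   from rustling.perceptron_pos_tagger import AveragedPerceptron
#   tagger = AveragedPerceptron(n_iter=5, random_seed=0)
#   tagger.fit(train_sequences, train_tags)
#   tagger.predict([["the", "cat", "barks"]])
```

Validating alignment before training is a design choice made here for clearer error messages; how rustling itself reacts to mismatched lengths is not documented on this page.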
- save(path: str | os.PathLike[str]) → None#
Save the model to a zstd-compressed FlatBuffers binary.
- Parameters:
path – The path where the model will be saved. The .fb.zst file extension is recommended.
- load(path: str | os.PathLike[str]) → None#
Load a model.
- Parameters:
path – The path where the model, stored as a zstd-compressed FlatBuffers binary, is located.
- Raises:
FileNotFoundError – If the file at the given path does not exist.
EnvironmentError – If the file cannot be read as a tagger model.
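The two documented exceptions can be handled as sketched below. In Python 3, EnvironmentError is an alias of OSError and FileNotFoundError is a subclass of it, so the more specific clause has to come first. The load_or_report helper and its return strings are hypothetical. The function accepts any object with a load(path) method, so it relies only on the signature shown above.

```python
import os
from typing import Union


def load_or_report(tagger, path: Union[str, os.PathLike]) -> str:
    """Call tagger.load(path) and report the outcome as a short string.

    Hypothetical helper: `tagger` is any object exposing the documented
    load(path) method, for example an AveragedPerceptron instance.
    """
    try:
        tagger.load(path)
    except FileNotFoundError:
        # Must precede OSError: FileNotFoundError is a subclass of it.
        return "missing"
    except OSError:
        # EnvironmentError is an alias of OSError in Python 3.
        return "unreadable"
    return "loaded"


# With rustling installed, a call might look like:
#   from rustling.perceptron_pos_tagger import AveragedPerceptron
#   tagger = AveragedPerceptron()
#   status = load_or_report(tagger, "model.fb.zst")
```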
- property weights: dict[str, dict[str, float]]#
Get the model’s weights dictionary.
- Returns:
A dictionary mapping features to their weight vectors.
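A nested dictionary of this type is typically used in perceptron taggers to score candidate tags by summing the weights of the active features. The sketch below assumes the inner dictionary is keyed by tag, which the type annotation suggests but this page does not confirm. The feature strings, weights, and function names are made up for illustration and are not rustling's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional

Weights = Dict[str, Dict[str, float]]


def score_tags(weights: Weights, features: Iterable[str]) -> Dict[str, float]:
    """Sum, per tag, the weights of every active feature."""
    scores: Dict[str, float] = defaultdict(float)
    for feature in features:
        # Features never seen in training contribute nothing.
        for tag, weight in weights.get(feature, {}).items():
            scores[tag] += weight
    return dict(scores)


def best_tag(weights: Weights, features: Iterable[str]) -> Optional[str]:
    """Return the highest-scoring tag, or None if no feature is known."""
    scores = score_tags(weights, features)
    if not scores:
        return None
    # Break ties by tag name so the result is deterministic.
    return max(scores, key=lambda tag: (scores[tag], tag))


# Illustrative weights in the assumed feature -> tag -> weight layout.
toy_weights = {
    "word=dog": {"NOUN": 2.0, "VERB": -0.5},
    "prev_tag=DET": {"NOUN": 1.0, "ADJ": 0.5},
}
active = ["word=dog", "prev_tag=DET", "suffix=og"]
print(score_tags(toy_weights, active))
print(best_tag(toy_weights, active))
```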