rustling.perceptron_pos_tagger

Averaged perceptron part-of-speech tagging.

Package Contents

class rustling.perceptron_pos_tagger.SeqFeatureTemplate

A single feature template for sequence labeling.

Do not instantiate this class directly. Use the seq_obs() or seq_label() factory functions instead.

class rustling.perceptron_pos_tagger.AveragedPerceptron(*, frequency_threshold: int = 10, ambiguity_threshold: float = 0.95, n_iter: int = 5, random_seed: int | None = None, features: Sequence[rustling.seq_feature.SeqFeatureTemplate] | None = None)

A part-of-speech tagger using an averaged perceptron model.

This implementation is adapted from the textblob-aptagger codebase (MIT license), originally implemented by Matthew Honnibal.
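To make the "averaged" part concrete, here is a minimal pure-Python sketch of an averaged perceptron for multiclass prediction. It illustrates the general technique only and is not rustling's implementation; the class name `TinyAveragedPerceptron` and the feature strings are hypothetical.

```python
from collections import defaultdict


class TinyAveragedPerceptron:
    """Illustrative averaged perceptron (hypothetical; not rustling's code)."""

    def __init__(self, classes):
        self.classes = sorted(classes)
        # weights[feature][tag] -> current weight
        self.weights = defaultdict(lambda: defaultdict(float))
        # Running sum of each weight over all update steps, used for averaging.
        self._totals = defaultdict(float)
        # Step at which each (feature, tag) weight was last changed.
        self._stamps = defaultdict(int)
        self._step = 0

    def predict(self, features):
        scores = {tag: 0.0 for tag in self.classes}
        for feat in features:
            for tag, w in self.weights.get(feat, {}).items():
                scores[tag] += w
        # Break ties deterministically by tag name.
        return max(self.classes, key=lambda tag: (scores[tag], tag))

    def update(self, truth, guess, features):
        self._step += 1
        if truth == guess:
            return
        for feat in features:
            for tag, delta in ((truth, 1.0), (guess, -1.0)):
                key = (feat, tag)
                w = self.weights[feat][tag]
                # Credit the old weight for the steps it stayed unchanged.
                self._totals[key] += (self._step - self._stamps[key]) * w
                self._stamps[key] = self._step
                self.weights[feat][tag] = w + delta

    def average(self):
        # Replace each weight by its mean value over all update steps.
        for feat, tag_weights in self.weights.items():
            for tag, w in tag_weights.items():
                key = (feat, tag)
                total = self._totals[key] + (self._step - self._stamps[key]) * w
                tag_weights[tag] = total / self._step if self._step else 0.0


model = TinyAveragedPerceptron({"DET", "NOUN"})
examples = [({"word=the"}, "DET"), ({"word=dog"}, "NOUN")]
for _ in range(3):
    for features, truth in examples:
        model.update(truth, model.predict(features), features)
model.average()
```

Averaging replaces each final weight with its mean over all training steps, which dampens the influence of late, noisy updates.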

predict(sequences: Sequence[Sequence[str]]) → list[list[str]]

Predict tags for the sequences.

Parameters:

sequences – A list of segmented sentences, where each sentence is a sequence of words.

Returns:

A list of tag sequences, one per input sentence.
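Since predict() returns one tag list per input sentence, a common next step is to pair each word with its tag. In this small sketch, `pair_words_with_tags` is a hypothetical helper (not part of rustling), and the tag values are made-up placeholders rather than real model output.

```python
def pair_words_with_tags(sentences, tag_sequences):
    """Zip each sentence with its tags into a list of (word, tag) pairs."""
    return [
        list(zip(words, tags))
        for words, tags in zip(sentences, tag_sequences)
    ]


sentences = [["the", "dog", "barks"], ["cats", "sleep"]]
# With a fitted tagger this would be: tag_sequences = tagger.predict(sentences)
# The values below are placeholders for illustration only.
tag_sequences = [["DET", "NOUN", "VERB"], ["NOUN", "VERB"]]

tagged = pair_words_with_tags(sentences, tag_sequences)
```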

fit(sequences: Sequence[Sequence[str]], tags: Sequence[Sequence[str]]) → None

Fit a model.

Parameters:
  • sequences – A list of segmented sentences for training, where each sentence is a sequence of words.

  • tags – A list of tag sequences corresponding to the sentences.
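Because fit() takes words and tags as two parallel nested sequences, it can help to check their alignment before training. In this sketch, `check_aligned` is a hypothetical helper, and the training call itself is left as a comment because it assumes rustling is installed.

```python
def check_aligned(sequences, tags):
    """Raise ValueError unless every sentence has exactly one tag per word."""
    if len(sequences) != len(tags):
        raise ValueError(
            f"{len(sequences)} sentences but {len(tags)} tag sequences"
        )
    for i, (words, sentence_tags) in enumerate(zip(sequences, tags)):
        if len(words) != len(sentence_tags):
            raise ValueError(
                f"sentence {i}: {len(words)} words but {len(sentence_tags)} tags"
            )


train_sentences = [["the", "dog", "barks"], ["cats", "sleep"]]
train_tags = [["DET", "NOUN", "VERB"], ["NOUN", "VERB"]]
check_aligned(train_sentences, train_tags)

# With rustling installed, training would then look like:
#   from rustling.perceptron_pos_tagger import AveragedPerceptron
#   tagger = AveragedPerceptron(n_iter=5, random_seed=42)
#   tagger.fit(train_sentences, train_tags)
```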

save(path: str | os.PathLike[str]) → None

Save the model to a zstd-compressed FlatBuffers binary.

Parameters:

path – The path where the model will be saved. The .fb.zst file extension is recommended.
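A round trip through save() and load() might look like the sketch below. `model_path` is a hypothetical helper that only builds a path ending in the recommended .fb.zst extension; the save() and load() calls are shown as comments because they assume rustling is installed and a tagger has already been fitted.

```python
from pathlib import Path


def model_path(directory, name):
    """Build a model path that ends in the recommended .fb.zst extension."""
    return Path(directory) / f"{name}.fb.zst"


path = model_path("models", "pos_tagger")

# With a fitted tagger (assumed to exist), a round trip would be:
#   tagger.save(path)
#   restored = AveragedPerceptron()
#   restored.load(path)
```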

load(path: str | os.PathLike[str]) → None

Load a model.

Parameters:

path – The path where the model, stored as a zstd-compressed FlatBuffers binary, is located.

Raises:

property weights: dict[str, dict[str, float]]

Get the model’s weights dictionary.

Returns:

A dictionary mapping features to their weight vectors.
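Given the documented shape dict[str, dict[str, float]], scoring a token amounts to summing, per tag, the weights of its active features. The sketch below uses a hand-written dictionary; the feature names are hypothetical and do not reflect rustling's actual feature strings.

```python
def score_tags(weights, active_features):
    """Sum the per-tag weights of all active features."""
    scores = {}
    for feature in active_features:
        for tag, weight in weights.get(feature, {}).items():
            scores[tag] = scores.get(tag, 0.0) + weight
    return scores


# Hand-written stand-in for tagger.weights (feature names are made up).
weights = {
    "suffix=ing": {"VERB": 1.5, "NOUN": 0.25},
    "prev_tag=DET": {"NOUN": 2.0, "VERB": -1.0},
}

scores = score_tags(weights, ["suffix=ing", "prev_tag=DET"])
best_tag = max(scores, key=scores.get)
```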

property tagdict: dict[str, str]

Get the tag dictionary.

Returns:

A dictionary mapping words to their most likely tags.
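In textblob-aptagger, from which this class is adapted, the tag dictionary holds frequent words that almost always carry the same tag. The sketch below shows one way such a dictionary can be built, assuming frequency_threshold and ambiguity_threshold play the same role as the corresponding constants there: a word needs at least that many occurrences, and its most frequent tag must account for at least that share of them. Whether rustling applies them identically is an assumption, and `build_tagdict` is a hypothetical name.

```python
from collections import Counter, defaultdict


def build_tagdict(sequences, tags, frequency_threshold=10, ambiguity_threshold=0.95):
    """Map frequent, near-unambiguous words to their most common tag."""
    counts = defaultdict(Counter)
    for words, sentence_tags in zip(sequences, tags):
        for word, tag in zip(words, sentence_tags):
            counts[word][tag] += 1

    tagdict = {}
    for word, tag_counts in counts.items():
        tag, mode = tag_counts.most_common(1)[0]
        total = sum(tag_counts.values())
        # Keep the word only if it is frequent enough and rarely ambiguous.
        if total >= frequency_threshold and mode / total >= ambiguity_threshold:
            tagdict[word] = tag
    return tagdict


# "the" is always DET; "run" is split evenly between VERB and NOUN.
sequences = [["the", "run"]] * 10 + [["run"]] * 10
tags = [["DET", "VERB"]] * 10 + [["NOUN"]] * 10
tagdict = build_tagdict(sequences, tags)
```

Here "the" qualifies (10 occurrences, all DET), while "run" is excluded because its most frequent tag covers only half of its occurrences.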

property classes: set[str]

Get the set of POS tag classes.

Returns:

A set of all tag classes in the model.