rustling.wordseg

Type stubs for rustling.wordseg.

Package Contents

class rustling.wordseg.LongestStringMatching(*, max_word_length: int)

Longest string matching segmenter.

This model segments an unsegmented sentence by scanning it from left to right and, at each position, taking the longest string that matches a known word, up to the limit set by max_word_length.

fit(sents: Sequence[Sequence[str]]) -> None

Train the model with the input segmented sentences.

No cleaning or preprocessing (e.g., normalizing upper/lowercase, tokenization) is performed on the training data.

Parameters:

sents – An iterable of segmented sentences (each sentence is a sequence of words).

predict(sent_strs: Sequence[str]) -> list[list[str]]

Segment the given unsegmented sentences.

Parameters:

sent_strs – An iterable of unsegmented sentences.

Returns:

A list of segmented sentences.
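
The left-to-right longest-match procedure described for this class can be sketched in plain Python. This is an illustrative reimplementation under stated assumptions, not rustling's actual code: the helper names `build_vocab` and `longest_match` are hypothetical, lengths are counted in characters, and the fallback to a single character when no known word matches is an assumption.

```python
from typing import Iterable, Sequence


def build_vocab(sents: Iterable[Sequence[str]]) -> set[str]:
    """Collect every word seen in the segmented training sentences."""
    return {word for sent in sents for word in sent}


def longest_match(sent_str: str, vocab: set[str], max_word_length: int) -> list[str]:
    """Greedily segment sent_str from left to right."""
    words = []
    i = 0
    while i < len(sent_str):
        # Try the longest candidate first, shrinking down to two characters.
        for length in range(min(max_word_length, len(sent_str) - i), 1, -1):
            if sent_str[i:i + length] in vocab:
                break
        else:
            # Assumption: when nothing matches, emit a single character.
            length = 1
        words.append(sent_str[i:i + length])
        i += length
    return words


# Hypothetical training data and input, for illustration only.
vocab = build_vocab([["this", "is"], ["a", "test"]])
segmented = longest_match("thisisatest", vocab, max_word_length=4)
```

Because the sketch is greedy, an early long match is never revisited, so a smaller max_word_length can change which words are found later in the sentence.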

class rustling.wordseg.RandomSegmenter(*, prob: float)

A random segmenter.

At each potential word boundary, a split is predicted independently at random with probability prob. No training is required.

fit(sents: Sequence[Sequence[str]]) -> None

Training is not required for RandomSegmenter.

Parameters:

sents – Unused.

Raises:

NotImplementedError – Always, since no training is needed.

predict(sent_strs: Sequence[str]) -> list[list[str]]

Segment the given unsegmented sentences.

Parameters:

sent_strs – An iterable of unsegmented sentences.

Returns:

A list of segmented sentences.
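
The independent per-boundary decision described for RandomSegmenter can be sketched in plain Python. This is a minimal illustration, not rustling's implementation: the function name `random_segment` is hypothetical, and both the explicit `rng` argument and the treatment of every gap between adjacent characters as a potential boundary are assumptions made for the example.

```python
import random


def random_segment(sent_str: str, prob: float, rng: random.Random) -> list[str]:
    """Split sent_str at each internal boundary with probability prob."""
    if not sent_str:
        return []
    words = []
    start = 0
    # Each position between two adjacent characters is a potential boundary.
    for i in range(1, len(sent_str)):
        if rng.random() < prob:
            words.append(sent_str[start:i])
            start = i
    # The remainder after the last chosen boundary is the final word.
    words.append(sent_str[start:])
    return words


# A seeded generator makes the illustration reproducible.
rng = random.Random(0)
segmented = random_segment("thisisatest", 0.3, rng)
```

In this sketch, a prob of 0.0 returns the sentence as a single word and a prob of 1.0 splits it into individual characters; values in between give a random mix.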