rustling.wordseg
Word segmentation.
Package Contents
- class rustling.wordseg.LongestStringMatching(*, max_word_length: int)
Longest string matching segmenter.
This model constructs predicted words by moving from left to right along an unsegmented sentence and finding the longest matching words, constrained by a maximum word length parameter.
- fit(sents: Sequence[Sequence[str]]) -> None
Train the model with the input segmented sentences.
No cleaning or preprocessing (e.g., normalizing upper/lowercase, tokenization) is performed on the training data.
- Parameters:
sents – An iterable of segmented sentences (each sentence is a sequence of words).
- predict(sent_strs: Sequence[str]) -> list[list[str]]
Segment the given unsegmented sentences.
- Parameters:
sent_strs – An iterable of unsegmented sentences.
- Returns:
A list of segmented sentences.
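The left-to-right longest-match procedure described for LongestStringMatching can be illustrated with a small pure-Python sketch. This is not the rustling implementation: the class name `LongestStringMatchingSketch`, the helper `_segment`, and the fallback of emitting a single character when no known word matches are assumptions made for illustration only.

```python
from collections.abc import Sequence


class LongestStringMatchingSketch:
    """Illustrative stand-in; NOT the rustling implementation."""

    def __init__(self, *, max_word_length: int) -> None:
        self.max_word_length = max_word_length
        self._words: set[str] = set()

    def fit(self, sents: Sequence[Sequence[str]]) -> None:
        # Collect every word seen in training, with no cleaning or normalization.
        for sent in sents:
            self._words.update(sent)

    def predict(self, sent_strs: Sequence[str]) -> list[list[str]]:
        return [self._segment(sent_str) for sent_str in sent_strs]

    def _segment(self, sent_str: str) -> list[str]:
        words: list[str] = []
        i = 0
        while i < len(sent_str):
            # Try the longest allowed candidate first, then shrink.
            longest = min(self.max_word_length, len(sent_str) - i)
            for length in range(longest, 0, -1):
                candidate = sent_str[i : i + length]
                # Assumption: fall back to a single character if nothing matches.
                if length == 1 or candidate in self._words:
                    words.append(candidate)
                    i += length
                    break
        return words


model = LongestStringMatchingSketch(max_word_length=4)
model.fit([["this", "is", "a", "sentence"], ["that", "is", "not", "a", "sentence"]])
print(model.predict(["thatisadog", "thisisnotacat"]))
# [['that', 'is', 'a', 'd', 'o', 'g'], ['this', 'is', 'not', 'a', 'c', 'a', 't']]
```

In this sketch, "sentence" is never produced because it is longer than `max_word_length`, and unseen material such as "dog" falls apart into single characters.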
- class rustling.wordseg.RandomSegmenter(*, prob: float)
A random segmenter.
Segmentation is predicted at random, independently at each potential word boundary, with a given probability. No training is required.
- fit(sents: Sequence[Sequence[str]]) -> None
Training is not required for RandomSegmenter.
- Parameters:
sents – Unused.
- Raises:
NotImplementedError – Always, since no training is needed.
- predict(sent_strs: Sequence[str]) -> list[list[str]]
Segment the given unsegmented sentences.
- Parameters:
sent_strs – An iterable of unsegmented sentences.
- Returns:
A list of segmented sentences.
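The random strategy described for RandomSegmenter can be sketched in a few lines of pure Python. This is an illustration rather than the rustling implementation: the function name `random_segment`, the explicit `rng` argument, and the reading of `prob` as the chance that a boundary is placed between two adjacent characters are all assumptions.

```python
import random


def random_segment(sent_str: str, prob: float, rng: random.Random) -> list[str]:
    """Hypothetical helper: split sent_str at each internal position with chance prob."""
    if not sent_str:
        return []
    words: list[str] = []
    start = 0
    # Positions 1 .. len-1 are the potential word boundaries.
    for boundary in range(1, len(sent_str)):
        # Each boundary is decided independently of the others.
        if rng.random() < prob:
            words.append(sent_str[start:boundary])
            start = boundary
    words.append(sent_str[start:])
    return words


rng = random.Random(0)  # seeded only to make the illustration repeatable
segmented = random_segment("thatisadog", 0.3, rng)
# Whatever boundaries are drawn, the pieces always rejoin to the input.
assert "".join(segmented) == "thatisadog"
```

Under these assumptions, a probability of 0 leaves each sentence as a single word, while higher values cut it into progressively shorter pieces.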