API Reference
Word Segmentation
- class rustling.wordseg.LongestStringMatching(max_word_length: int)
Greedy left-to-right longest match segmenter.
- Parameters:
max_word_length – Maximum word length to consider during segmentation.
- fit(sentences: list[tuple[str, ...]]) → None
Train the segmenter on a list of segmented sentences.
- Parameters:
sentences – A list of tuples, where each tuple contains the words of a sentence.
- predict(unsegmented: list[str]) → list[list[str]]
Segment a list of unsegmented strings.
- Parameters:
unsegmented – A list of unsegmented strings.
- Returns:
A list of segmented sentences, where each sentence is a list of words.
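The greedy matching described above can be sketched in pure Python. This is a hypothetical stand-in (`LongestStringMatchingSketch` is not part of rustling) that mirrors only the documented `fit`/`predict` data shapes. It assumes that `fit` simply collects the training words into a lexicon and that a character with no match falls back to a single-character word; rustling's Rust implementation may differ in such details.

```python
class LongestStringMatchingSketch:
    """Hypothetical pure-Python stand-in for LongestStringMatching."""

    def __init__(self, max_word_length: int) -> None:
        if max_word_length < 1:
            raise ValueError("max_word_length must be at least 1")
        self.max_word_length = max_word_length
        self.known_words: set[str] = set()

    def fit(self, sentences: list[tuple[str, ...]]) -> None:
        # Assumption: training only records every word seen as a known word.
        for sentence in sentences:
            self.known_words.update(sentence)

    def _segment(self, text: str) -> list[str]:
        words = []
        i = 0
        while i < len(text):
            # Try the longest candidate first (capped at max_word_length),
            # shrinking one character at a time until a known word matches.
            end = min(len(text), i + self.max_word_length)
            for j in range(end, i, -1):
                if text[i:j] in self.known_words:
                    break
            else:
                j = i + 1  # assumption: no match, so emit a single character
            words.append(text[i:j])
            i = j
        return words

    def predict(self, unsegmented: list[str]) -> list[list[str]]:
        return [self._segment(text) for text in unsegmented]


segmenter = LongestStringMatchingSketch(max_word_length=4)
segmenter.fit([("this", "is", "a", "test"), ("that", "is", "it")])
print(segmenter.predict(["thisisatest", "itisthat"]))
# [['this', 'is', 'a', 'test'], ['it', 'is', 'that']]
```

Because the match is greedy, the longest known word at the current position always wins, even when a shorter match would lead to a better segmentation of the rest of the string.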
- class rustling.wordseg.RandomSegmenter(prob: float)
Random baseline segmenter.
- Parameters:
prob – Probability of inserting a word boundary at each position between two adjacent characters.
- predict(unsegmented: list[str]) → list[list[str]]
Randomly segment a list of unsegmented strings.
- Parameters:
unsegmented – A list of unsegmented strings.
- Returns:
A list of segmented sentences, where each sentence is a list of words.
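A random baseline of this kind is simple enough to sketch directly. The function below is hypothetical (`random_segment` is not rustling's API). It assumes boundaries are drawn independently between adjacent characters, and its `seed` argument is an addition for reproducibility that the documented `RandomSegmenter(prob)` constructor does not expose.

```python
import random
from typing import Optional


def random_segment(
    unsegmented: list[str], prob: float, seed: Optional[int] = None
) -> list[list[str]]:
    """Hypothetical random-baseline segmenter (not rustling's code)."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be between 0 and 1")
    rng = random.Random(seed)  # private generator so a seed makes runs repeatable
    segmented = []
    for text in unsegmented:
        words = []
        start = 0
        # Assumption: candidate boundaries lie between adjacent characters,
        # and each one is cut independently with probability `prob`.
        for i in range(1, len(text)):
            if rng.random() < prob:
                words.append(text[start:i])
                start = i
        if text:  # an empty string yields no words
            words.append(text[start:])
        segmented.append(words)
    return segmented


print(random_segment(["abcd"], prob=0.0))  # [['abcd']]
print(random_segment(["abcd"], prob=1.0))  # [['a', 'b', 'c', 'd']]
```

The two extremes bound the baseline: `prob=0.0` returns every string as a single word, and `prob=1.0` makes every character its own word.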
Part-of-Speech Tagging
- class rustling.taggers.AveragedPerceptronTagger
Averaged perceptron part-of-speech tagger.
- fit(sentences: list[list[tuple[str, str]]]) → None
Train the tagger on a list of tagged sentences.
- Parameters:
sentences – A list of sentences, where each sentence is a list of (word, tag) tuples.
- predict(sentences: list[list[str]]) → list[list[tuple[str, str]]]
Predict tags for a list of sentences.
- Parameters:
sentences – A list of sentences, where each sentence is a list of words.
- Returns:
A list of tagged sentences, where each sentence is a list of (word, tag) tuples.
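To illustrate what an averaged perceptron tagger does with these data shapes, here is a minimal pure-Python sketch. `AveragedPerceptronSketch` is hypothetical and is not rustling's implementation: the feature set, the `n_epochs` parameter, conditioning on the gold previous tag during training, and greedy left-to-right decoding are all assumptions made for the example. The averaging uses the common lazy-timestamp trick so that each final weight is its average value over all training steps.

```python
from collections import defaultdict


class AveragedPerceptronSketch:
    """Hypothetical averaged perceptron POS tagger (not rustling's code)."""

    def __init__(self, n_epochs: int = 5) -> None:
        self.n_epochs = n_epochs
        self.tags: list[str] = []
        self.weights: dict[tuple[str, str], float] = {}

    @staticmethod
    def _features(words: list[str], i: int, prev_tag: str) -> list[str]:
        # A small, illustrative feature set for the word at position i.
        word = words[i].lower()
        prev_word = words[i - 1].lower() if i > 0 else "<s>"
        next_word = words[i + 1].lower() if i + 1 < len(words) else "</s>"
        return [
            "bias",
            "word=" + word,
            "suffix3=" + word[-3:],
            "prev_word=" + prev_word,
            "next_word=" + next_word,
            "prev_tag=" + prev_tag,
        ]

    def _best_tag(self, feats: list[str], weights: dict) -> str:
        # Score each tag as the sum of its feature weights; ties go to the larger tag name.
        scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in self.tags}
        return max(self.tags, key=lambda t: (scores[t], t))

    def fit(self, sentences: list[list[tuple[str, str]]]) -> None:
        self.tags = sorted({tag for sent in sentences for _, tag in sent})
        weights: defaultdict = defaultdict(float)  # current weights
        totals: defaultdict = defaultdict(float)   # weight value summed over steps
        stamps: defaultdict = defaultdict(int)     # step of each weight's last change
        step = 0

        def update(key: tuple[str, str], delta: float) -> None:
            # Credit the old value for the steps it was in effect, then change it.
            totals[key] += (step - stamps[key]) * weights[key]
            stamps[key] = step
            weights[key] += delta

        for _ in range(self.n_epochs):
            for sent in sentences:
                words = [word for word, _ in sent]
                prev_tag = "<s>"
                for i, (_, gold) in enumerate(sent):
                    feats = self._features(words, i, prev_tag)
                    guess = self._best_tag(feats, weights)
                    if guess != gold:  # perceptron update only on mistakes
                        for f in feats:
                            update((f, gold), 1.0)
                            update((f, guess), -1.0)
                    step += 1
                    prev_tag = gold  # assumption: condition on the gold tag in training

        # Final weights are each weight's average value over all training steps.
        denom = max(step, 1)
        self.weights = {
            key: (totals[key] + (step - stamps[key]) * w) / denom
            for key, w in weights.items()
        }

    def predict(self, sentences: list[list[str]]) -> list[list[tuple[str, str]]]:
        tagged = []
        for words in sentences:
            prev_tag, out = "<s>", []
            for i, word in enumerate(words):
                tag = self._best_tag(self._features(words, i, prev_tag), self.weights)
                out.append((word, tag))
                prev_tag = tag  # greedy: feed the predicted tag forward
            tagged.append(out)
        return tagged


tagger = AveragedPerceptronSketch()
tagger.fit([[("dogs", "N"), ("run", "V")]])
print(tagger.predict([["cats", "run"]]))  # [[('cats', 'N'), ('run', 'V')]]
```

Greedy decoding feeds each predicted tag into the next position's features, so an early mistake can propagate. Averaging the weights over all training steps, rather than keeping only the final ones, is what distinguishes this from a plain perceptron and tends to make the result less sensitive to the order of training examples.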