API Reference

Word Segmentation

class rustling.wordseg.LongestStringMatching(max_word_length: int)

Greedy, left-to-right, longest-match segmenter.

Parameters:

max_word_length – Maximum word length to consider during segmentation.

fit(sentences: list[tuple[str, ...]]) -> None

Train the segmenter on a list of segmented sentences.

Parameters:

sentences – A list of tuples, where each tuple contains the words of a sentence.

predict(unsegmented: list[str]) -> list[list[str]]

Segment a list of unsegmented strings.

Parameters:

unsegmented – A list of unsegmented strings.

Returns:

A list of segmented sentences, where each sentence is a list of words.
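
The greedy longest-match idea can be sketched in pure Python as follows. This is a minimal illustration, not the rustling implementation: the class name LongestMatchSketch is hypothetical, and the fallback of emitting an unknown single character as its own word is an assumption. Only the fit/predict argument shapes are taken from the signatures above.

```python
class LongestMatchSketch:
    """Hypothetical stand-in that mirrors the fit/predict shapes documented above."""

    def __init__(self, max_word_length: int) -> None:
        if max_word_length < 1:
            raise ValueError("max_word_length must be at least 1")
        self.max_word_length = max_word_length
        self.vocabulary = set()

    def fit(self, sentences: list[tuple[str, ...]]) -> None:
        # Collect every training word that is short enough to ever be matched.
        for sentence in sentences:
            for word in sentence:
                if len(word) <= self.max_word_length:
                    self.vocabulary.add(word)

    def predict(self, unsegmented: list[str]) -> list[list[str]]:
        return [self._segment(text) for text in unsegmented]

    def _segment(self, text: str) -> list[str]:
        words = []
        i = 0
        while i < len(text):
            # Try the longest candidate first and shrink until a known word is found.
            longest = min(self.max_word_length, len(text) - i)
            for length in range(longest, 0, -1):
                candidate = text[i:i + length]
                # Assumption: an unknown single character becomes its own word.
                if length == 1 or candidate in self.vocabulary:
                    words.append(candidate)
                    i += length
                    break
        return words


segmenter = LongestMatchSketch(max_word_length=4)
segmenter.fit([("this", "is", "a", "test"), ("that", "is", "it")])
result = segmenter.predict(["thisisatest", "thatisit"])
print(result)  # [['this', 'is', 'a', 'test'], ['that', 'is', 'it']]
```

Because the match is greedy, an early long match is never revisited, so the output depends on both the training vocabulary and max_word_length.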

class rustling.wordseg.RandomSegmenter(prob: float)

Random baseline segmenter.

Parameters:

prob – Probability of inserting a word boundary at each character position.

predict(unsegmented: list[str]) -> list[list[str]]

Randomly segment a list of unsegmented strings.

Parameters:

unsegmented – A list of unsegmented strings.

Returns:

A list of segmented sentences, where each sentence is a list of words.
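
A random baseline of this kind can be sketched with the standard library alone. The function random_segment and the explicit rng argument are hypothetical and exist only for illustration; the sketch also assumes that boundaries are considered only between adjacent characters, which the description above does not specify.

```python
import random


def random_segment(text: str, prob: float, rng: random.Random) -> list[str]:
    """Split text by inserting a boundary between adjacent characters with probability prob."""
    if not text:
        return []
    words = []
    start = 0
    # Assumption: candidate boundaries lie between characters, not before the first or after the last.
    for i in range(1, len(text)):
        if rng.random() < prob:
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words


rng = random.Random(0)  # a fixed seed keeps the sketch reproducible
inputs = ["thisisatest", "thatisit"]
segmented = [random_segment(s, prob=0.3, rng=rng) for s in inputs]
print(segmented)
```

Under these assumptions, prob=0.0 returns each string as a single word and prob=1.0 splits it into individual characters, which gives two easy sanity checks for a baseline.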

Part-of-Speech Tagging

class rustling.taggers.AveragedPerceptronTagger

Averaged perceptron part-of-speech tagger.

fit(sentences: list[list[tuple[str, str]]]) -> None

Train the tagger on a list of tagged sentences.

Parameters:

sentences – A list of sentences, where each sentence is a list of (word, tag) tuples.

predict(sentences: list[list[str]]) -> list[list[tuple[str, str]]]

Predict tags for a list of sentences.

Parameters:

sentences – A list of sentences, where each sentence is a list of words.

Returns:

A list of tagged sentences, where each sentence is a list of (word, tag) tuples.
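
The data shapes accepted by fit and returned by predict can be illustrated without the tagger itself. In the sketch below, strip_tags and token_accuracy are hypothetical helpers, not part of rustling, and the predicted list is a hand-written stand-in for what a predict call might return.

```python
def strip_tags(tagged: list[list[tuple[str, str]]]) -> list[list[str]]:
    """Turn fit-style input (word, tag) into predict-style input (word only)."""
    return [[word for word, _tag in sentence] for sentence in tagged]


def token_accuracy(
    gold: list[list[tuple[str, str]]],
    predicted: list[list[tuple[str, str]]],
) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for gold_sentence, predicted_sentence in zip(gold, predicted):
        for (_, gold_tag), (_, predicted_tag) in zip(gold_sentence, predicted_sentence):
            correct += gold_tag == predicted_tag
            total += 1
    return correct / total if total else 0.0


# Shape expected by fit: a list of sentences, each a list of (word, tag) tuples.
gold = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN")],
]

# Shape expected by predict: a list of sentences, each a list of words.
untagged = strip_tags(gold)

# Hand-written stand-in for a predict result, with one deliberate tagging error.
predicted = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "NOUN")],
    [("a", "DET"), ("cat", "NOUN")],
]

accuracy = token_accuracy(gold, predicted)
print(untagged)
print(accuracy)
```

Keeping the word in each output tuple means predictions can be compared against gold annotations position by position, as token_accuracy does here.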