rustling.ngram#
Tools for keeping track of and counting ngrams.
Package Contents#
- class rustling.ngram.Ngrams(n: int, *, min_n: int | None = None)#
A counter for storing n-grams efficiently and counting their frequencies.
Accumulates n-gram counts from sequences of elements. N-grams do not cross sequence boundaries.
- count(seq: Sequence[str]) -> None#
Count n-grams from a single sequence.
- Parameters:
seq – A sequence of elements to extract n-grams from.
- count_seqs(seqs: Sequence[Sequence[str]]) -> None#
Count n-grams from multiple sequences.
- Parameters:
seqs – A sequence of sequences. Each inner sequence is counted separately, so n-grams never span two sequences.
- get(ngram: Sequence[str]) -> int#
Return the count for a specific n-gram.
- Parameters:
ngram – The n-gram to look up.
- Returns:
The count, or 0 if not observed.
- most_common(n: int | None = None, *, order: int | None = None) -> list[tuple[tuple[str, ...], int]]#
Return the n most common n-grams with their counts.
- Parameters:
n – Number of top entries to return. If None, returns all n-grams sorted by count (descending).
order – If specified, only return n-grams of this specific order. Must be between min_n and n (inclusive).
- Returns:
A list of (ngram_tuple, count) pairs sorted by count in descending order.
- Raises:
ValueError – If order is out of range.
- items(*, order: int | None = None) -> list[tuple[tuple[str, ...], int]]#
Return all (n-gram, count) pairs.
- Parameters:
order – If specified, only return n-grams of this specific order. Must be between min_n and n (inclusive).
- Returns:
A list of (ngram_tuple, count) pairs.
- Raises:
ValueError – If order is out of range.
- total(*, order: int | None = None) -> int#
Return the total number of n-gram tokens counted.
- Parameters:
order – If specified, return total for this specific order only. Must be between min_n and n (inclusive). If None, returns the sum across all orders.
- Returns:
Total count.
- Raises:
ValueError – If order is out of range.
- to_counter(*, order: int | None = None) -> collections.Counter[tuple[str, ...]]#
Convert to a collections.Counter.
- Parameters:
order – If specified, only include n-grams of this specific order. Must be between min_n and n (inclusive). If None, defaults to the highest order (n).
- Returns:
A Counter mapping n-gram tuples to their counts.
- Raises:
ValueError – If order is out of range.
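The counting rules described above can be illustrated with a small pure-Python stand-in built on collections.Counter. This is a sketch of the documented behaviour only, not rustling's implementation: the class name NgramsSketch and the helper _check_order are hypothetical, and it assumes that min_n defaults to n when left as None, which the reference above does not state.

```python
from collections import Counter
from typing import Optional, Sequence


class NgramsSketch:
    """Hypothetical stand-in mirroring the documented semantics of Ngrams."""

    def __init__(self, n: int, *, min_n: Optional[int] = None) -> None:
        # Assumption: with min_n=None only n-grams of order n are tracked.
        self.n = n
        self.min_n = n if min_n is None else min_n
        # One Counter per tracked order, from min_n up to n inclusive.
        self._counts = {k: Counter() for k in range(self.min_n, self.n + 1)}

    def _check_order(self, order: int) -> None:
        # Orders outside [min_n, n] are rejected, as the reference describes.
        if not self.min_n <= order <= self.n:
            raise ValueError(f"order must be between {self.min_n} and {self.n}")

    def count(self, seq: Sequence[str]) -> None:
        # Windows are taken within this one sequence only, so no n-gram
        # can span the boundary between two sequences.
        for k, counter in self._counts.items():
            for i in range(len(seq) - k + 1):
                counter[tuple(seq[i:i + k])] += 1

    def count_seqs(self, seqs: Sequence[Sequence[str]]) -> None:
        for seq in seqs:
            self.count(seq)

    def get(self, ngram: Sequence[str]) -> int:
        counter = self._counts.get(len(ngram))
        # Unseen n-grams (or untracked orders) report a count of 0.
        return counter[tuple(ngram)] if counter is not None else 0

    def total(self, *, order: Optional[int] = None) -> int:
        if order is None:
            # No order given: sum the token counts across all orders.
            return sum(sum(c.values()) for c in self._counts.values())
        self._check_order(order)
        return sum(self._counts[order].values())

    def to_counter(self, *, order: Optional[int] = None) -> Counter:
        # No order given: default to the highest order, n.
        order = self.n if order is None else order
        self._check_order(order)
        return Counter(self._counts[order])


ngrams = NgramsSketch(2, min_n=1)
ngrams.count_seqs([["a", "b", "c"], ["c", "a", "b"]])

bigram_ab = ngrams.get(["a", "b"])   # occurs once in each sequence
bigram_cc = ngrams.get(["c", "c"])   # would exist only across the boundary
total_all = ngrams.total()           # unigram tokens plus bigram tokens
total_bi = ngrams.total(order=2)     # bigram tokens only
top_bigrams = ngrams.to_counter().most_common(1)
```

In this sketch the bigram ("c", "c") is never counted even though the first sequence ends in "c" and the second begins with it, which is the boundary rule in action. Calling total(order=3) raises ValueError because 3 lies outside the tracked range of 1 to 2.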