rustling.ngram

Tools for keeping track of and counting ngrams.

Package Contents

class rustling.ngram.Ngrams(n: int, *, min_n: int | None = None)

A counter that stores n-grams efficiently and counts their frequencies.

Accumulates n-gram counts from sequences of elements. N-grams do not cross sequence boundaries.
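As a rough illustration of what "accumulates n-gram counts" means, the pure-Python sketch below models the extraction step with a `collections.Counter`. It is not rustling's implementation: `extract_ngrams` is a hypothetical helper, and it assumes (as the `order` parameters documented below suggest) that every order from `min_n` through `n` is counted.

```python
from collections import Counter
from collections.abc import Sequence


def extract_ngrams(seq: Sequence[str], n: int, min_n: int) -> list[tuple[str, ...]]:
    """Hypothetical helper: every contiguous k-gram of seq for k = min_n..n."""
    grams = []
    for k in range(min_n, n + 1):
        # A sequence of length L has max(L - k + 1, 0) windows of size k;
        # a negative range bound simply yields no windows.
        for i in range(len(seq) - k + 1):
            grams.append(tuple(seq[i : i + k]))
    return grams


counts = Counter(extract_ngrams(["the", "cat", "sat"], n=2, min_n=1))
print(counts[("the", "cat")])  # 1
print(len(counts))  # 5: three unigrams and two bigrams
```

N-grams are represented as tuples so they can serve as hashable dictionary keys, matching the tuple keys in the return types below.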

count(seq: Sequence[str]) → None

Count n-grams from a single sequence.

Parameters:

seq – A sequence of elements to extract n-grams from.

count_seqs(seqs: Sequence[Sequence[str]]) → None

Count n-grams from multiple sequences.

Parameters:

seqs – A sequence of sequences. Each inner sequence is counted separately, so no n-gram spans two sequences.
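The boundary rule can be checked with a small pure-Python model. This is a sketch only: `count_seqs` here is a hypothetical stand-in written with `collections.Counter`, not the library method, and it counts a single order `n` for brevity.

```python
from collections import Counter
from collections.abc import Sequence


def count_seqs(seqs: Sequence[Sequence[str]], n: int) -> Counter:
    """Hypothetical model: count n-grams per sequence so none spans two sequences."""
    counts: Counter = Counter()
    for seq in seqs:
        # Windows are taken inside each sequence only.
        counts.update(tuple(seq[i : i + n]) for i in range(len(seq) - n + 1))
    return counts


per_sequence = count_seqs([["a", "b"], ["c", "d"]], n=2)
concatenated = count_seqs([["a", "b", "c", "d"]], n=2)
print(per_sequence[("b", "c")])  # 0: this bigram would cross the boundary
print(concatenated[("b", "c")])  # 1
```

Passing two sequences is therefore not the same as passing their concatenation: the bigram that would straddle the join is never counted.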

get(ngram: Sequence[str]) → int

Return the count for a specific n-gram.

Parameters:

ngram – The n-gram to look up.

Returns:

The count, or 0 if not observed.
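The lookup semantics resemble a `Counter` keyed by tuples. The sketch below is a hypothetical model, not rustling code; it assumes that any `Sequence[str]` (list or tuple) identifies the same n-gram, as the signature suggests.

```python
from collections import Counter
from collections.abc import Sequence

# Hypothetical model of stored counts: n-gram tuples mapped to counts.
counts = Counter({("the", "cat"): 2, ("cat", "sat"): 1})


def get(ngram: Sequence[str]) -> int:
    # Normalize the sequence to a tuple key; unseen n-grams give 0, not KeyError.
    return counts.get(tuple(ngram), 0)


print(get(["the", "cat"]))  # 2
print(get(("the", "dog")))  # 0
```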

most_common(n: int | None = None, *, order: int | None = None) → list[tuple[tuple[str, ...], int]]

Return the n most common n-grams with their counts.

Parameters:
  • n – Number of top entries to return. If None, returns all n-grams sorted by count (descending).

  • order – If specified, only return n-grams of this specific order. Must be between min_n and the n property (the maximum n-gram order, not the n parameter above), inclusive.

Returns:

A list of (ngram_tuple, count) pairs sorted by count in descending order.

Raises:

ValueError – If order is out of range.
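A minimal pure-Python model of this behavior, for illustration only. The function is hypothetical: it renames the top-k parameter to `n_top` and passes the instance's order as `max_n` to avoid the two meanings of "n", and it assumes that `order=None` ranks n-grams of all orders together, which the text above does not state explicitly.

```python
from __future__ import annotations

from collections import Counter


def most_common(
    counts: Counter,
    n_top: int | None = None,
    *,
    order: int | None = None,
    min_n: int,
    max_n: int,
) -> list[tuple[tuple[str, ...], int]]:
    """Hypothetical model of Ngrams.most_common over a Counter of n-gram tuples."""
    if order is not None:
        # Mirror the documented ValueError for an out-of-range order.
        if not min_n <= order <= max_n:
            raise ValueError(f"order must be between {min_n} and {max_n}, got {order}")
        # An n-gram's order is the length of its tuple.
        counts = Counter({gram: c for gram, c in counts.items() if len(gram) == order})
    # Counter.most_common(None) returns every entry, highest count first.
    return counts.most_common(n_top)


stored = Counter({("a",): 3, ("b",): 1, ("a", "b"): 2})
print(most_common(stored, 2, min_n=1, max_n=2))  # [(('a',), 3), (('a', 'b'), 2)]
print(most_common(stored, order=1, min_n=1, max_n=2))  # [(('a',), 3), (('b',), 1)]
```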

items(*, order: int | None = None) → list[tuple[tuple[str, ...], int]]

Return all (n-gram, count) pairs.

Parameters:

order – If specified, only return n-grams of this specific order. Must be between min_n and n (inclusive).

Returns:

A list of (ngram_tuple, count) pairs.

Raises:

ValueError – If order is out of range.

total(*, order: int | None = None) → int

Return the total number of n-gram tokens counted.

Parameters:

order – If specified, return total for this specific order only. Must be between min_n and n (inclusive). If None, returns the sum across all orders.

Returns:

Total count.

Raises:

ValueError – If order is out of range.
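Because n-grams do not cross sequence boundaries, the total can be predicted from the sequence lengths alone: a sequence of length L contributes max(L - k + 1, 0) tokens of order k. The helper below is a hypothetical worked calculation, not part of rustling, and assumes every order from min_n through n is counted.

```python
from __future__ import annotations

from collections.abc import Sequence


def expected_total(
    seq_lengths: Sequence[int], n: int, min_n: int, order: int | None = None
) -> int:
    """Hypothetical helper: n-gram tokens produced, by the window-counting formula."""
    orders = range(min_n, n + 1) if order is None else [order]
    # Each sequence of length L yields max(L - k + 1, 0) k-grams.
    return sum(max(length - k + 1, 0) for k in orders for length in seq_lengths)


print(expected_total([3, 1], n=2, min_n=1))  # 6 = 4 unigrams + 2 bigrams
print(expected_total([3, 1], n=2, min_n=1, order=2))  # 2
```

The length-1 sequence contributes a unigram but no bigram, which is why the bigram total is 2 rather than 3.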

property n: int

The n-gram order, i.e. the highest order counted.

property min_n: int

The minimum n-gram order.

to_counter(*, order: int | None = None) → collections.Counter[tuple[str, ...]]

Convert to a collections.Counter.

Parameters:

order – If specified, only include n-grams of this specific order. Must be between min_n and n (inclusive). If None, defaults to the highest order (n).

Returns:

A Counter mapping n-gram tuples to their counts.

Raises:

ValueError – If order is out of range.
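Note that the default differs from total(): with order=None, total() sums across all orders, while to_counter() keeps only order n. The sketch below models that default over a plain `Counter`; the function and its `stored` input are hypothetical, not rustling code.

```python
from __future__ import annotations

from collections import Counter


def to_counter(counts: Counter, n: int, min_n: int, order: int | None = None) -> Counter:
    """Hypothetical model: with order=None, keep only the highest order n, as documented."""
    if order is None:
        order = n
    if not min_n <= order <= n:
        raise ValueError(f"order must be between {min_n} and {n}, got {order}")
    # An n-gram's order is the length of its tuple key.
    return Counter({gram: c for gram, c in counts.items() if len(gram) == order})


stored = Counter({("the",): 2, ("cat",): 1, ("the", "cat"): 1})
print(to_counter(stored, n=2, min_n=1))  # Counter({('the', 'cat'): 1})
print(to_counter(stored, n=2, min_n=1, order=1))  # Counter({('the',): 2, ('cat',): 1})
```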

clear() → None

Clear all counts.