TextGrid (Praat)#

The rustling.textgrid module provides tools for parsing Praat TextGrid annotation files.

A TextGrid file contains one or more tiers, each holding either time-aligned intervals or time-stamped points:

File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 2.3
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 2.3
        intervals: size = 2
            intervals [1]:
                xmin = 0
                xmax = 1.5
                text = "hello"
            intervals [2]:
                xmin = 1.5
                xmax = 2.3
                text = "world"

Both the normal “text” format and the compact “short text” format are supported.
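
To make the layout of the long "text" format concrete, here is a minimal standard-library sketch that pulls the intervals out of the sample above with a regular expression. It is only an illustration of the file structure, not how rustling parses TextGrid files; the names SAMPLE, INTERVAL_RE, and extract_intervals are hypothetical, and the short text format is not handled.

```python
import re

# The sample TextGrid shown above, in the long "text" format.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 2.3
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 2.3
        intervals: size = 2
            intervals [1]:
                xmin = 0
                xmax = 1.5
                text = "hello"
            intervals [2]:
                xmin = 1.5
                xmax = 2.3
                text = "world"
'''

# One interval: the "intervals [n]:" header followed by xmin, xmax, and text.
# Praat escapes a double quote inside a label by doubling it ("").
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = (?P<xmin>[-\d.eE+]+)\s*'
    r'xmax = (?P<xmax>[-\d.eE+]+)\s*'
    r'text = "(?P<text>(?:[^"]|"")*)"'
)


def extract_intervals(source):
    """Return (xmin, xmax, text) tuples for every interval in the source."""
    return [
        (float(m["xmin"]), float(m["xmax"]), m["text"].replace('""', '"'))
        for m in INTERVAL_RE.finditer(source)
    ]


intervals = extract_intervals(SAMPLE)
for xmin, xmax, text in intervals:
    print(xmin, xmax, text)
```

A regex is enough for a quick look at a single well-formed file; a real parser also has to handle point tiers, the short text format, and alternative encodings.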

Loading Data#

read_textgrid()#

The quickest way to load TextGrid data is with read_textgrid(). It accepts a file path, directory, ZIP archive, git URL, or HTTP URL and figures out the right loading strategy automatically:

import rustling

# From a local .TextGrid file
tg = rustling.read_textgrid("path/to/recording.TextGrid")

# From a directory (recursively finds all .TextGrid files)
tg = rustling.read_textgrid("path/to/corpus/")

# From a ZIP archive
tg = rustling.read_textgrid("path/to/corpus.zip")

# From a git repository
tg = rustling.read_textgrid("https://github.com/user/corpus.git")

# From a URL (ZIP files are automatically detected and extracted)
tg = rustling.read_textgrid("https://example.com/corpus.zip")

Using the class methods directly#

If you need finer control (for example, to pass specific files, filter by regex, change the file extension, control caching, or parse in-memory strings), use the TextGrid class methods directly:

from rustling.textgrid import TextGrid

From specific files:

tg = TextGrid.from_files(["path/to/file1.TextGrid", "path/to/file2.TextGrid"])

From a directory with a regex filter:

tg = TextGrid.from_dir("path/to/corpus/", match=r"speaker_01")

The extension parameter controls which file extension to look for (default: ".TextGrid").
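
As a rough mental model of what the match and extension parameters select, the following standard-library sketch lists files recursively and filters them. It is an assumption-laden illustration, not rustling's implementation: find_textgrids is a hypothetical helper, and applying the regex with re.search to the path relative to the directory is a guess about the matching rule.

```python
import re
import tempfile
from pathlib import Path


def find_textgrids(root, match=None, extension=".TextGrid"):
    """Recursively list files under root that end with the given extension.

    If match is given, keep only files whose path relative to root contains
    a match for the regex (assumption: re.search semantics).
    """
    root = Path(root)
    pattern = re.compile(match) if match else None
    paths = sorted(p for p in root.rglob(f"*{extension}") if p.is_file())
    if pattern is not None:
        paths = [p for p in paths if pattern.search(p.relative_to(root).as_posix())]
    return paths


# Build a throwaway corpus to try the helper on.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["speaker_01/a.TextGrid", "speaker_02/b.TextGrid", "speaker_01/notes.txt"]:
        path = Path(tmp, name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")

    all_names = [p.name for p in find_textgrids(tmp)]
    filtered = [p.name for p in find_textgrids(tmp, match=r"speaker_01")]

print(all_names)
print(filtered)
```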

From a ZIP archive:

tg = TextGrid.from_zip("path/to/corpus.zip")

From a git repository:

tg = TextGrid.from_git("https://github.com/user/corpus.git")

From a URL (ZIP files are automatically detected and extracted):

tg = TextGrid.from_url("https://example.com/corpus.zip")

From in-memory strings:

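# textgrid_string_1 and textgrid_string_2 are str objects, each holding the contents of one TextGrid file
tg = TextGrid.from_strs([textgrid_string_1, textgrid_string_2])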

Parallel processing#

All loading methods accept a parallel parameter (default: True) that enables parallel parsing of multiple files; pass parallel=False to parse files one at a time.

Accessing Tiers and Annotations#

Each TextGrid file contains tiers that can be either interval tiers or point tiers. Call tiers() to get a list of lists, one per file, where each inner list contains IntervalTier and/or TextTier objects:

import rustling
from rustling.textgrid import IntervalTier, TextTier

tg = rustling.read_textgrid("path/to/corpus/")

for file_tiers in tg.tiers():
    for tier in file_tiers:
        print(tier.name, tier.tier_class)
        if isinstance(tier, IntervalTier):
            for interval in tier.intervals:
                print(f"  [{interval.xmin}-{interval.xmax}] {interval.text}")
        elif isinstance(tier, TextTier):
            for point in tier.points:
                print(f"  [{point.number}] {point.mark}")

An IntervalTier has:

  • name – Tier name.

  • xmin – Start time in seconds.

  • xmax – End time in seconds.

  • intervals – List of Interval objects.

  • tier_class – Always "IntervalTier".

An Interval has:

  • xmin – Start time in seconds.

  • xmax – End time in seconds.

  • text – The annotation text.

A TextTier has:

  • name – Tier name.

  • xmin – Start time in seconds.

  • xmax – End time in seconds.

  • points – List of Point objects.

  • tier_class – Always "TextTier".

A Point has:

  • number – Time in seconds.

  • mark – The annotation text.
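
The attribute lists above can be summarized as a small data model. The sketch below uses plain dataclasses as stand-ins with the same attribute names; these are not the rustling classes themselves, and the "events" tier, its "click" point, and the labels helper are hypothetical illustration data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interval:
    xmin: float  # start time in seconds
    xmax: float  # end time in seconds
    text: str    # annotation text


@dataclass
class IntervalTier:
    name: str
    xmin: float
    xmax: float
    intervals: List[Interval] = field(default_factory=list)
    tier_class: str = "IntervalTier"


@dataclass
class Point:
    number: float  # time in seconds
    mark: str      # annotation text


@dataclass
class TextTier:
    name: str
    xmin: float
    xmax: float
    points: List[Point] = field(default_factory=list)
    tier_class: str = "TextTier"


def labels(tier):
    """Return the annotation strings of either tier type."""
    if tier.tier_class == "IntervalTier":
        return [interval.text for interval in tier.intervals]
    return [point.mark for point in tier.points]


words = IntervalTier("words", 0.0, 2.3, [Interval(0.0, 1.5, "hello"), Interval(1.5, 2.3, "world")])
events = TextTier("events", 0.0, 2.3, [Point(0.7, "click")])

word_labels = labels(words)
event_labels = labels(events)
print(word_labels, event_labels)
```

Because both tier types expose tier_class, code that only needs the labels can branch on that string instead of using isinstance.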

Converting to ELAN#

A TextGrid reader can convert its data to ELAN format.

import rustling

tg = rustling.read_textgrid("recording.TextGrid")

# Convert to an ELAN object
elan = tg.to_elan()

# Or get EAF XML strings
eaf_strs = tg.to_elan_strs()

# Or write .eaf files directly
tg.to_elan_files("output_dir/")

Mapping:

  • Each IntervalTier becomes an ELAN tier with alignable annotations.

  • TextTiers are skipped (point annotations have no duration for ELAN).

  • Empty-text intervals are skipped.

  • Times are converted from seconds to milliseconds.
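
The two interval-level rules in this mapping (skip empty text, convert seconds to milliseconds) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's code: to_elan_annotations is a hypothetical helper, rounding to the nearest millisecond is an assumption, and the third input interval is made-up data standing in for a silent stretch.

```python
def to_elan_annotations(intervals):
    """Map (xmin, xmax, text) tuples in seconds to (start_ms, end_ms, text)."""
    annotations = []
    for xmin, xmax, text in intervals:
        if not text:
            # Empty-text intervals are skipped.
            continue
        # Assumption: times are rounded to the nearest whole millisecond.
        annotations.append((round(xmin * 1000), round(xmax * 1000), text))
    return annotations


intervals = [(0.0, 1.5, "hello"), (1.5, 2.3, "world"), (2.3, 3.0, "")]
annotations = to_elan_annotations(intervals)
print(annotations)
```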

Converting to CHAT#

A TextGrid reader can convert its data to CHAT format for use with CHILDES / TalkBank tools.

import rustling

tg = rustling.read_textgrid("recording.TextGrid")

# Convert to a CHAT object
chat = tg.to_chat()

# Or get CHAT-formatted strings
chat_strs = tg.to_chat_strs()

# Or write .cha files directly
tg.to_chat_files("output_dir/")

Participant selection:

By default, only IntervalTiers with a 3-character name are treated as CHAT main tiers, which matches the conventional three-letter CHAT speaker codes such as CHI or MOT. To override this, pass the participants keyword argument:

chat = tg.to_chat(participants=["words", "phones"])
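
The selection rule can be pictured as a simple filter over tier names. The sketch below is a hypothetical stand-in, not rustling's code: select_main_tiers and the example names are invented, the input is assumed to contain only IntervalTier names, and treating participants as an exact list of tier names to keep is an assumption.

```python
def select_main_tiers(tier_names, participants=None):
    """Pick the interval tier names that would become CHAT main tiers."""
    if participants is not None:
        # Explicit override: keep exactly the requested tier names.
        wanted = set(participants)
        return [name for name in tier_names if name in wanted]
    # Default rule: only tiers whose name is 3 characters long.
    return [name for name in tier_names if len(name) == 3]


names = ["CHI", "MOT", "words", "phones"]
default_selection = select_main_tiers(names)
override_selection = select_main_tiers(names, participants=["words", "phones"])
print(default_selection)
print(override_selection)
```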

Converting to SRT#

A TextGrid reader can convert its data to SRT (SubRip Subtitle) format.

import rustling

tg = rustling.read_textgrid("recording.TextGrid")

# Convert to an SRT object
srt = tg.to_srt()

# Or get SRT-formatted strings
srt_strs = tg.to_srt_strs()

# Or write .srt files directly
tg.to_srt_files("output_dir/")

Participant selection works the same as for CHAT conversion above.
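
For orientation, an SRT cue consists of a sequence number, a line with start and end timestamps in HH:MM:SS,mmm form, and the subtitle text. The following sketch shows how one interval could be rendered that way; srt_timestamp and srt_cue are hypothetical helpers, and the exact output produced by rustling (numbering, blank lines, rounding) may differ.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def srt_cue(index, xmin, xmax, text):
    """Render one interval as an SRT cue: number, time range, then text."""
    return f"{index}\n{srt_timestamp(xmin)} --> {srt_timestamp(xmax)}\n{text}\n"


cue = srt_cue(1, 0.0, 1.5, "hello")
print(cue)
```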

Collection Operations#

A TextGrid reader behaves like a collection of files. You can iterate, slice, combine, and modify it:

import rustling

tg = rustling.read_textgrid("path/to/corpus/")

# File count and paths
print(tg.n_files)
print(tg.file_paths)

# Iteration and slicing
for single_file in tg:
    print(single_file.n_files)  # 1

subset = tg[0:3]

# Combining (tg1, tg2, and tg3 are TextGrid readers loaded the same way as tg)
combined = tg1 + tg2
tg1 += tg2

# Appending and extending
tg1.append(tg2)
tg1.extend([tg2, tg3])

# Removing
last = tg.pop()
first = tg.pop_left()
tg.clear()