TextGrid (Praat)
The rustling.textgrid module provides tools for parsing
Praat TextGrid annotation files.
A TextGrid file contains one or more tiers, each holding either time-aligned intervals or time-stamped points:
File type = "ooTextFile"
Object class = "TextGrid"
xmin = 0
xmax = 2.3
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 2.3
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 1.5
            text = "hello"
        intervals [2]:
            xmin = 1.5
            xmax = 2.3
            text = "world"
Both the normal “text” format and the compact “short text” format are supported.
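To make the long text layout concrete, here is a minimal sketch that pulls (xmin, xmax, text) triples out of it with a regular expression. It is not how rustling.textgrid parses files: the name extract_intervals is hypothetical, only the long text format is handled, and the sketch assumes Praat's convention of doubling quotation marks inside labels.

```python
import re

# Matches one "intervals [n]:" entry in the long text format.
# Inside a label, a doubled quote ("") stands for one literal quote.
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = (?P<xmin>[-\d.eE+]+)\s*'
    r'xmax = (?P<xmax>[-\d.eE+]+)\s*'
    r'text = "(?P<text>(?:[^"]|"")*)"'
)


def extract_intervals(textgrid_str):
    """Return (xmin, xmax, text) tuples for every interval in the string."""
    return [
        (float(m["xmin"]), float(m["xmax"]), m["text"].replace('""', '"'))
        for m in INTERVAL_RE.finditer(textgrid_str)
    ]


SAMPLE = '''\
File type = "ooTextFile"
Object class = "TextGrid"
xmin = 0
xmax = 2.3
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 2.3
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 1.5
            text = "hello"
        intervals [2]:
            xmin = 1.5
            xmax = 2.3
            text = "world"
'''

print(extract_intervals(SAMPLE))
# [(0.0, 1.5, 'hello'), (1.5, 2.3, 'world')]
```

The tier-level xmin and xmax lines are ignored because the pattern only starts matching at an "intervals [n]:" header.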
Loading Data
read_textgrid()
The quickest way to load TextGrid data is with read_textgrid().
It accepts a file path, directory, ZIP archive, git URL, or HTTP URL
and figures out the right loading strategy automatically:
import rustling
# From a local .TextGrid file
tg = rustling.read_textgrid("path/to/recording.TextGrid")
# From a directory (recursively finds all .TextGrid files)
tg = rustling.read_textgrid("path/to/corpus/")
# From a ZIP archive
tg = rustling.read_textgrid("path/to/corpus.zip")
# From a git repository
tg = rustling.read_textgrid("https://github.com/user/corpus.git")
# From a URL (ZIP files are automatically detected and extracted)
tg = rustling.read_textgrid("https://example.com/corpus.zip")
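The rules read_textgrid() uses to choose a strategy are not spelled out here. The following sketch shows one plausible way such a dispatch could work. The function guess_strategy and its rules are assumptions made for illustration, not the library's logic.

```python
import os


def guess_strategy(source):
    """Classify a source string; the rules here are illustrative guesses."""
    lowered = source.lower()
    if lowered.startswith(("http://", "https://")):
        # Assumption: a remote URL ending in ".git" is a git repository.
        return "git" if lowered.endswith(".git") else "url"
    if lowered.endswith(".zip"):
        return "zip"
    if os.path.isdir(source):
        return "dir"
    # Anything else is treated as a single local file.
    return "file"


for src in [
    "path/to/recording.TextGrid",
    "path/to/corpus.zip",
    "https://github.com/user/corpus.git",
    "https://example.com/corpus.zip",
]:
    print(src, "->", guess_strategy(src))
```

Checking for a remote scheme first matters in this sketch: a URL ending in .zip should be downloaded before extraction rather than opened as a local archive.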
Using the class methods directly
If you need finer control — for example, to pass specific files,
filter by regex, change the file extension, control caching, or parse
in-memory strings — use the TextGrid class methods directly:
from rustling.textgrid import TextGrid
From specific files:
tg = TextGrid.from_files(["path/to/file1.TextGrid", "path/to/file2.TextGrid"])
From a directory with a regex filter:
tg = TextGrid.from_dir("path/to/corpus/", match=r"speaker_01")
The extension parameter controls which file extension to look for (default: ".TextGrid").
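As a rough illustration of how a regex filter and an extension filter can combine during a recursive directory scan, consider the sketch below. The helper find_files is hypothetical, and it assumes the pattern is searched against each file's path relative to the directory; whether from_dir() matches against the path or only the file name is not stated here.

```python
import re
import tempfile
from pathlib import Path


def find_files(root, match=None, extension=".TextGrid"):
    """Recursively list files under root that have the given extension.

    Assumption: `match` is a regex searched against the relative path.
    """
    pattern = re.compile(match) if match else None
    found = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix != extension:
            continue
        relative = path.relative_to(root).as_posix()
        if pattern is None or pattern.search(relative):
            found.append(relative)
    return found


# Build a throwaway corpus to demonstrate the filters.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["speaker_01/a.TextGrid", "speaker_02/b.TextGrid", "speaker_01/notes.txt"]:
        target = Path(tmp, name)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("")
    print(find_files(tmp))
    # ['speaker_01/a.TextGrid', 'speaker_02/b.TextGrid']
    print(find_files(tmp, match=r"speaker_01"))
    # ['speaker_01/a.TextGrid']
```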
From a ZIP archive:
tg = TextGrid.from_zip("path/to/corpus.zip")
From a git repository:
tg = TextGrid.from_git("https://github.com/user/corpus.git")
From a URL (ZIP files are automatically detected and extracted):
tg = TextGrid.from_url("https://example.com/corpus.zip")
From in-memory strings:
tg = TextGrid.from_strs([textgrid_string_1, textgrid_string_2])
Parallel processing
All loading methods accept a parallel parameter (default: True)
to enable parallel parsing of multiple files.
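How the library parallelizes parsing internally is not described here. As a generic sketch of what a parallel switch like this means, the example below maps a stand-in parser over several in-memory strings, either sequentially or through a worker pool. The names count_intervals and parse_all are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def count_intervals(textgrid_str):
    """Stand-in for a real parser: count 'intervals [n]:' entries."""
    return textgrid_str.count("intervals [")


def parse_all(strs, parallel=True):
    """Apply the parser to each string, optionally using a worker pool."""
    if not parallel or len(strs) < 2:
        return [count_intervals(s) for s in strs]
    with ThreadPoolExecutor() as pool:
        # map() yields results in input order, whatever order tasks finish in.
        return list(pool.map(count_intervals, strs))


docs = ["intervals [1]:\nintervals [2]:\n", "intervals [1]:\n", ""]
print(parse_all(docs, parallel=True))   # [2, 1, 0]
print(parse_all(docs, parallel=False))  # [2, 1, 0]
```

Both modes return results in the same order, so passing parallel=False is a safe way to rule out concurrency when debugging. The thread pool here only illustrates the control flow; pure-Python parsing would not necessarily run faster this way.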
Accessing Tiers and Annotations
Each TextGrid file contains tiers that can be either interval tiers or point tiers.
Call tiers() to get a list of lists, one inner list per file, each containing
IntervalTier and/or TextTier objects:
import rustling
from rustling.textgrid import IntervalTier, TextTier
tg = rustling.read_textgrid("path/to/corpus/")
for file_tiers in tg.tiers():
    for tier in file_tiers:
        print(tier.name, tier.tier_class)
        if isinstance(tier, IntervalTier):
            for interval in tier.intervals:
                print(f" [{interval.xmin}-{interval.xmax}] {interval.text}")
        elif isinstance(tier, TextTier):
            for point in tier.points:
                print(f" [{point.number}] {point.mark}")
An IntervalTier has:
name – Tier name.
xmin – Start time in seconds.
xmax – End time in seconds.
intervals – List of Interval objects.
tier_class – Always "IntervalTier".
An Interval has:
xmin – Start time in seconds.
xmax – End time in seconds.
text – The annotation text.
A TextTier has:
name – Tier name.
xmin – Start time in seconds.
xmax – End time in seconds.
points – List of Point objects.
tier_class – Always "TextTier".
A Point has:
number – Time in seconds.
mark – The annotation text.
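Code written against these attributes can be tried out without a TextGrid file by using small stand-in objects. The dataclasses below only mirror the attribute names listed above; they are not the rustling classes. The helper labelled_duration is a hypothetical example of working with them.

```python
from dataclasses import dataclass, field


@dataclass
class Interval:  # stand-in with the documented attributes
    xmin: float
    xmax: float
    text: str


@dataclass
class IntervalTier:  # stand-in, not the rustling class
    name: str
    xmin: float
    xmax: float
    intervals: list = field(default_factory=list)
    tier_class: str = "IntervalTier"


def labelled_duration(tier):
    """Sum the durations (in seconds) of intervals with non-empty text."""
    return sum(i.xmax - i.xmin for i in tier.intervals if i.text)


words = IntervalTier("words", 0.0, 2.5, [
    Interval(0.0, 1.5, "hello"),
    Interval(1.5, 2.0, ""),       # silence: excluded from the total
    Interval(2.0, 2.5, "world"),
])
print(labelled_duration(words))  # 2.0
```

Because the helper relies only on the intervals, xmin, xmax, and text attributes, it should work unchanged on any object that exposes them.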
Converting to ELAN
A TextGrid reader can convert its data to ELAN format:
import rustling
tg = rustling.read_textgrid("recording.TextGrid")
# Convert to an ELAN object
elan = tg.to_elan()
# Or get EAF XML strings
eaf_strs = tg.to_elan_strs()
# Or write .eaf files directly
tg.to_elan_files("output_dir/")
Mapping:
Each IntervalTier becomes an ELAN tier with alignable annotations.
TextTiers are skipped (point annotations have no duration for ELAN).
Empty-text intervals are skipped.
Times are converted from seconds to milliseconds.
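The mapping rules above can be sketched in a few lines. This is an illustration of the rules rather than the code behind to_elan(). The function name is hypothetical, and two details are assumptions: whitespace-only text is treated as empty, and times are rounded to the nearest millisecond.

```python
def intervals_to_elan_annotations(intervals):
    """Map (xmin, xmax, text) in seconds to (start_ms, end_ms, text)."""
    annotations = []
    for xmin, xmax, text in intervals:
        if not text.strip():
            # Assumption: whitespace-only text counts as empty and is skipped.
            continue
        # round() avoids off-by-one milliseconds from float error (2.3 * 1000).
        annotations.append((round(xmin * 1000), round(xmax * 1000), text))
    return annotations


words = [(0.0, 1.5, "hello"), (1.5, 2.3, "world"), (2.3, 3.0, "")]
print(intervals_to_elan_annotations(words))
# [(0, 1500, 'hello'), (1500, 2300, 'world')]
```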
Converting to CHAT
A TextGrid reader can convert its data to CHAT format
for use with CHILDES / TalkBank tools:
import rustling
tg = rustling.read_textgrid("recording.TextGrid")
# Convert to a CHAT object
chat = tg.to_chat()
# Or get CHAT-formatted strings
chat_strs = tg.to_chat_strs()
# Or write .cha files directly
tg.to_chat_files("output_dir/")
Participant selection:
By default, only IntervalTiers with a 3-character name are treated as
CHAT main tiers. To override this, pass the participants keyword argument:
chat = tg.to_chat(participants=["words", "phones"])
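The selection rule can be pictured as a filter over tier names. The sketch below restates the rule described above; the helper select_main_tiers is hypothetical and is not part of the library.

```python
def select_main_tiers(tier_names, participants=None):
    """Pick the tier names that would be treated as CHAT main tiers."""
    if participants is not None:
        # Explicit override: keep only the requested tiers, in file order.
        wanted = set(participants)
        return [name for name in tier_names if name in wanted]
    # Default rule: tiers whose name has exactly three characters.
    return [name for name in tier_names if len(name) == 3]


names = ["CHI", "MOT", "words", "phones"]
print(select_main_tiers(names))
# ['CHI', 'MOT']
print(select_main_tiers(names, participants=["words", "phones"]))
# ['words', 'phones']
```

The three-character default lines up with CHAT speaker codes such as CHI and MOT, which is why tiers named "words" or "phones" need the explicit override.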
Converting to SRT
A TextGrid reader can convert its data to SRT (SubRip Subtitle) format:
import rustling
tg = rustling.read_textgrid("recording.TextGrid")
# Convert to an SRT object
srt = tg.to_srt()
# Or get SRT-formatted strings
srt_strs = tg.to_srt_strs()
# Or write .srt files directly
tg.to_srt_files("output_dir/")
Participant selection works the same as for CHAT conversion above.
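For orientation, an SRT cue consists of a running index, a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm, and the subtitle text. The sketch below formats one interval as a cue. The helpers srt_timestamp and srt_cue are hypothetical, and the output of to_srt_strs() may differ in detail.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index, xmin, xmax, text):
    """Build one SRT cue: index line, time range line, then the text."""
    return f"{index}\n{srt_timestamp(xmin)} --> {srt_timestamp(xmax)}\n{text}\n"


print(srt_cue(1, 0.0, 1.5, "hello"))
print(srt_cue(2, 1.5, 2.3, "world"))
```

Note that SRT separates seconds from milliseconds with a comma rather than a period.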
Collection Operations
A TextGrid reader behaves like a collection of files.
You can iterate, slice, combine, and modify it:
import rustling
tg = rustling.read_textgrid("path/to/corpus/")
# File count and paths
print(tg.n_files)
print(tg.file_paths)
# Iteration and slicing
for single_file in tg:
    print(single_file.n_files)  # 1
subset = tg[0:3]
# Combining (tg1, tg2, and tg3 stand for readers loaded as above)
combined = tg1 + tg2
tg1 += tg2
# Appending and extending
tg1.append(tg2)
tg1.extend([tg2, tg3])
# Removing
last = tg.pop()
first = tg.pop_left()
tg.clear()