SRT (SubRip Subtitle)
The rustling.srt module provides tools for parsing
SubRip subtitle (.srt) files.
An .srt file is a plain-text file in which each subtitle block has a
sequence number, a time range, and one or more lines of text; blocks are
separated by a blank line:
1
00:02:16,612 --> 00:02:19,376
Senator, we're making
our final approach into Coruscant.

2
00:02:19,482 --> 00:02:21,609
Very good, Lieutenant.
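To make the block structure concrete, here is a minimal pure-Python sketch that splits SRT text into blocks and converts the timestamps to milliseconds. It is not rustling's parser (in practice you would call read_srt()); the helper names timestamp_to_ms and parse_srt are hypothetical, and the sketch assumes well-formed input with HH:MM:SS,mmm timestamps and nothing after the end time on the time-range line.

```python
import re

# An SRT timestamp: hours:minutes:seconds,milliseconds (e.g. "00:02:16,612").
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def timestamp_to_ms(stamp: str) -> int:
    """Convert an SRT timestamp such as '00:02:16,612' to milliseconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_srt(text: str) -> list[tuple[int, tuple[int, int], str]]:
    """Split SRT text into (index, (start_ms, end_ms), line) tuples."""
    # Normalize Windows line endings and drop a leading byte-order mark, if any.
    text = text.replace("\r\n", "\n").lstrip("\ufeff")
    blocks = []
    # Subtitle blocks are separated by one or more blank lines.
    for chunk in re.split(r"\n\s*\n", text.strip()):
        lines = chunk.splitlines()
        if len(lines) < 2:
            continue  # skip fragments that lack an index or a time range
        start, end = lines[1].split("-->")
        time_marks = (timestamp_to_ms(start), timestamp_to_ms(end))
        # Keep multiline subtitle text, joined with "\n".
        blocks.append((int(lines[0]), time_marks, "\n".join(lines[2:])))
    return blocks


SAMPLE = """1
00:02:16,612 --> 00:02:19,376
Senator, we're making
our final approach into Coruscant.

2
00:02:19,482 --> 00:02:21,609
Very good, Lieutenant.
"""

for block in parse_srt(SAMPLE):
    print(block)
```

Each tuple lines up with the index, time_marks, and line properties described under Accessing Subtitle Data.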
Loading Data
read_srt()
The quickest way to load SRT data is with read_srt().
It accepts a file path, directory, ZIP archive, git URL, or HTTP URL
and figures out the right loading strategy automatically:
import rustling
# From a local .srt file
srt = rustling.read_srt("path/to/movie.srt")
# From a directory (recursively finds all .srt files)
srt = rustling.read_srt("path/to/subtitles/")
# From a ZIP archive
srt = rustling.read_srt("path/to/subtitles.zip")
# From a git repository
srt = rustling.read_srt("https://github.com/user/corpus.git")
# From a URL (ZIP files are automatically detected and extracted)
srt = rustling.read_srt("https://example.com/subtitles.zip")
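read_srt() performs this detection for you. As a rough mental model only, the decision could look like the hypothetical guess_loader() below, which maps each of the five source kinds above to the SRT class method of the same kind. The order of the checks and the trailing-slash rule for directories are assumptions for illustration, not rustling's actual logic.

```python
from pathlib import Path


def guess_loader(source: str) -> str:
    """Name the SRT class method a source string most plausibly calls for.

    Illustrative heuristic only; rustling's own detection may differ.
    """
    if source.endswith(".git"):
        return "from_git"  # checked before generic URLs, since repos are often https://
    if source.startswith(("http://", "https://")):
        return "from_url"  # remote ZIPs are left to the URL loader
    if source.lower().endswith(".zip"):
        return "from_zip"
    if source.endswith(("/", "\\")) or Path(source).is_dir():
        return "from_dir"
    return "from_files"


for source in [
    "path/to/movie.srt",
    "path/to/subtitles/",
    "path/to/subtitles.zip",
    "https://github.com/user/corpus.git",
    "https://example.com/subtitles.zip",
]:
    print(f"{source!r:40} -> {guess_loader(source)}")
```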
Using the class methods directly
If you need finer control (for example, to pass specific files,
filter by regex, change the file extension, control caching, or parse
in-memory strings), use the SRT class methods directly:
from rustling.srt import SRT
From specific files:
srt = SRT.from_files(["path/to/file1.srt", "path/to/file2.srt"])
From a directory with a regex filter:
srt = SRT.from_dir("path/to/subtitles/", match=r"episode_01")
The extension parameter controls which file extension to look for (default: ".srt").
From a ZIP archive:
srt = SRT.from_zip("path/to/subtitles.zip")
From a git repository:
srt = SRT.from_git("https://github.com/user/corpus.git")
From a URL (ZIP files are automatically detected and extracted):
srt = SRT.from_url("https://example.com/subtitles.zip")
From in-memory strings:
srt = SRT.from_strs([srt_string_1, srt_string_2])
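The match and extension filters of from_dir() can be pictured with the hypothetical find_files() helper below, built on pathlib and re. It is a sketch, not rustling's implementation. In particular, it assumes that match is searched for (re.search) within each file's path relative to the directory; whether rustling matches against the file name, the relative path, or the full path is not specified here.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Optional


def find_files(root: str, match: Optional[str] = None, extension: str = ".srt") -> List[str]:
    """Return paths, relative to root, of files under root ending in extension.

    If match is given, keep only paths containing a match for that regex
    (an assumed reading of the `match` parameter).
    """
    pattern = re.compile(match) if match is not None else None
    found = []
    for path in sorted(Path(root).rglob("*")):  # rglob recurses into subdirectories
        if not path.is_file() or not path.name.endswith(extension):
            continue
        relative = path.relative_to(root).as_posix()
        if pattern is not None and pattern.search(relative) is None:
            continue
        found.append(relative)
    return found


# Demo on a throwaway directory tree with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["episode_01.srt", "episode_02.srt", "notes.txt", "extras/episode_01_commentary.srt"]:
        target = Path(tmp) / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("", encoding="utf-8")
    all_srt = find_files(tmp)
    episode_01 = find_files(tmp, match=r"episode_01")
    notes = find_files(tmp, extension=".txt")

print(all_srt)
print(episode_01)
```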
Parallel processing
All loading methods accept a parallel parameter (default: True)
to enable parallel parsing of multiple files.
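The general shape of a parallel flag like this one is a fan-out over files whose results are collected back in input order, so switching it off would be expected to change speed rather than results. The sketch below shows that pattern with concurrent.futures; count_blocks() is a hypothetical stand-in for real parsing, and because CPython threads do not speed up pure-Python CPU work, it illustrates the structure only, not how rustling parallelizes internally.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import List


def count_blocks(text: str) -> int:
    """Stand-in for real parsing work: count blank-line-separated blocks."""
    return len([chunk for chunk in re.split(r"\n\s*\n", text.strip()) if chunk.strip()])


def parse_all(texts: List[str], parallel: bool = True) -> List[int]:
    """Process one string per file, optionally fanning out across worker threads."""
    if not parallel or len(texts) < 2:
        return [count_blocks(text) for text in texts]
    with ThreadPoolExecutor() as executor:
        # map() yields results in input order, however the workers are scheduled.
        return list(executor.map(count_blocks, texts))


files = [
    "1\n00:00:00,000 --> 00:00:01,000\nHi.\n",
    "1\n00:00:00,000 --> 00:00:01,000\nA.\n\n2\n00:00:01,000 --> 00:00:02,000\nB.\n",
]
print(parse_all(files), parse_all(files, parallel=False))
```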
Accessing Subtitle Data
Call utterances() to get a flat list of all
subtitle blocks across all files:
import rustling
srt = rustling.read_srt("movie.srt")
for utterance in srt.utterances():
    print(utterance.index, utterance.time_marks, utterance.line)
An Utterance has the following properties:
- index: 1-based sequence number from the SRT file.
- line: the subtitle text (multiline text is preserved, with lines separated by \n).
- time_marks: start and end time in milliseconds, as a tuple[int, int].
Utterance objects can also be constructed directly:
from rustling.srt import Utterance
utt = Utterance(index=1, line="Hello world.", time_marks=(0, 1500))
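Because time_marks holds plain integers, turning it back into SRT notation is simple arithmetic. The helpers below are a hypothetical convenience sketch, not part of rustling.srt; they accept any (start_ms, end_ms) pair, such as utterance.time_marks from the loop above.

```python
from typing import Tuple


def ms_to_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    if ms < 0:
        raise ValueError("SRT timestamps cannot be negative")
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def describe(time_marks: Tuple[int, int]) -> str:
    """Render a (start_ms, end_ms) pair as an SRT time range plus its duration."""
    start, end = time_marks
    return f"{ms_to_timestamp(start)} --> {ms_to_timestamp(end)} ({end - start} ms)"


print(describe((136612, 139376)))  # 00:02:16,612 --> 00:02:19,376 (2764 ms)
```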
Converting to CHAT
An SRT reader can convert its data to CHAT format
for use with CHILDES / TalkBank tools.
import rustling
srt = rustling.read_srt("recording.srt")
# Convert to a CHAT object
chat = srt.to_chat()
# Or get CHAT-formatted strings
chat_strs = srt.to_chat_strs()
# Or write .cha files directly
srt.to_chat_files("output_dir/")
Since SRT files have no participant information, a default participant code
"SPK" (Speaker) is used. Multiline subtitle text is joined with a space
in the CHAT output (CHAT utterances are single-line).
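The joining rule can be sketched as a small, hypothetical helper that turns one subtitle block into a CHAT-style main-tier line. Beyond the documented SPK code and the space join, two details are assumptions: the tab after the speaker code and the time bullet, which follows the CHAT convention of wrapping start_end milliseconds in U+0015 characters. rustling's real output may differ, for example in header lines and punctuation handling.

```python
from typing import Tuple

# CHAT encloses media time bullets ("start_end" in milliseconds) in U+0015 characters.
BULLET = "\x15"


def to_chat_line(text: str, time_marks: Tuple[int, int], speaker: str = "SPK") -> str:
    """Build a CHAT-style main-tier line from one subtitle block (hypothetical helper)."""
    start, end = time_marks
    # CHAT utterances are single-line, so subtitle lines are joined with a space.
    joined = " ".join(part.strip() for part in text.splitlines() if part.strip())
    return f"*{speaker}:\t{joined} {BULLET}{start}_{end}{BULLET}"


line = to_chat_line("Senator, we're making\nour final approach into Coruscant.", (136612, 139376))
print(repr(line))
```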
Converting to ELAN
An SRT reader can convert its data to ELAN format.
import rustling
srt = rustling.read_srt("recording.srt")
# Convert to an ELAN object
elan = srt.to_elan()
# Or get EAF XML strings
eaf_strs = srt.to_elan_strs()
# Or write .eaf files directly
srt.to_elan_files("output_dir/")
The conversion creates a single alignable tier named "SPK" (Speaker)
with one annotation per subtitle block.
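To picture the target structure, the hypothetical builder below uses Python's xml.etree.ElementTree to assemble a bare-bones EAF-style tree: a TIME_ORDER of time slots and one alignable tier named SPK with one annotation per block. It omits parts a complete .eaf file carries (the HEADER, LINGUISTIC_TYPE definitions, document attributes), and the slot and annotation ID scheme and the "default-lt" type name are assumptions; use to_elan() or to_elan_strs() for real files.

```python
import xml.etree.ElementTree as ET


def to_eaf_skeleton(blocks, tier_id="SPK"):
    """Build a minimal EAF-like tree: one alignable tier, one annotation per block.

    blocks: iterable of (text, (start_ms, end_ms)) pairs.
    """
    root = ET.Element("ANNOTATION_DOCUMENT")
    time_order = ET.SubElement(root, "TIME_ORDER")
    tier = ET.SubElement(root, "TIER", TIER_ID=tier_id, LINGUISTIC_TYPE_REF="default-lt")
    for n, (text, (start, end)) in enumerate(blocks, start=1):
        # Each annotation points at two time slots: its start and its end.
        ts_start, ts_end = f"ts{2 * n - 1}", f"ts{2 * n}"
        ET.SubElement(time_order, "TIME_SLOT", TIME_SLOT_ID=ts_start, TIME_VALUE=str(start))
        ET.SubElement(time_order, "TIME_SLOT", TIME_SLOT_ID=ts_end, TIME_VALUE=str(end))
        annotation = ET.SubElement(tier, "ANNOTATION")
        alignable = ET.SubElement(
            annotation,
            "ALIGNABLE_ANNOTATION",
            ANNOTATION_ID=f"a{n}",
            TIME_SLOT_REF1=ts_start,
            TIME_SLOT_REF2=ts_end,
        )
        ET.SubElement(alignable, "ANNOTATION_VALUE").text = text
    return root


root = to_eaf_skeleton([
    ("Senator, we're making our final approach into Coruscant.", (136612, 139376)),
    ("Very good, Lieutenant.", (139482, 141609)),
])
print(ET.tostring(root, encoding="unicode"))
```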
Converting to TextGrid
An SRT reader can convert its data to TextGrid format for use with Praat.
import rustling
srt = rustling.read_srt("recording.srt")
# Convert to a TextGrid object
textgrid = srt.to_textgrid()
# Or get TextGrid-formatted strings
textgrid_strs = srt.to_textgrid_strs()
# Or write .TextGrid files directly
srt.to_textgrid_files("output_dir/")
The conversion creates a single IntervalTier named "SPK" (Speaker)
with one interval per subtitle block.
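One detail worth knowing when reading the result: TextGrid times are in seconds, and a Praat interval tier must cover its whole time range without gaps. How rustling fills the silences between subtitles is not stated here; the hypothetical sketch below assumes empty-text filler intervals, gives each subtitle block one labeled interval, and assumes blocks do not overlap.

```python
from typing import Iterable, List, Optional, Tuple


def to_interval_tier(
    blocks: Iterable[Tuple[str, Tuple[int, int]]],
    xmin: float = 0.0,
    xmax: Optional[float] = None,
) -> List[Tuple[float, float, str]]:
    """Turn (text, (start_ms, end_ms)) blocks into gap-free (xmin, xmax, text) intervals.

    Gaps before, between, and after subtitles become intervals with empty text.
    """
    intervals = []
    cursor = xmin
    for text, (start_ms, end_ms) in sorted(blocks, key=lambda block: block[1]):
        start, end = start_ms / 1000, end_ms / 1000  # TextGrid times are in seconds
        if start > cursor:
            intervals.append((cursor, start, ""))  # fill the silence before this block
        intervals.append((start, end, text))
        cursor = end
    if xmax is not None and xmax > cursor:
        intervals.append((cursor, xmax, ""))
    return intervals


tier = to_interval_tier([
    ("Senator, we're making our final approach into Coruscant.", (136612, 139376)),
    ("Very good, Lieutenant.", (139482, 141609)),
])
for interval in tier:
    print(interval)
```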
Collection Operations
An SRT reader behaves like a collection of files.
You can iterate, slice, combine, and modify it:
import rustling
srt = rustling.read_srt("path/to/subtitles/")
# File count and paths
print(srt.n_files)
print(srt.file_paths)
# Iteration and slicing
for single_file in srt:
    print(single_file.n_files)  # 1
subset = srt[0:3]
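# Other readers to combine with (placeholder paths)
srt1 = rustling.read_srt("path/to/part1/")
srt2 = rustling.read_srt("path/to/part2/")
srt3 = rustling.read_srt("path/to/part3/")
# Combining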
combined = srt1 + srt2
srt1 += srt2
# Appending and extending
srt1.append(srt2)
srt1.extend([srt2, srt3])
# Removing
last = srt.pop()
first = srt.pop_left()
srt.clear()
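These operations behave much like a list of per-file entries. The toy FileCollection class below is a hypothetical sketch of that behavior, not rustling's implementation: it stores (path, data) pairs in a deque so that pop_left() is cheap, and it assumes that iteration, integer indexing, pop(), and pop_left() all return single-file collections. The "# 1" comment above supports this for iteration; the other return types are guesses.

```python
from collections import deque


class FileCollection:
    """Toy list-of-files container (hypothetical; not rustling's SRT class)."""

    def __init__(self, files=None):
        # (path, data) pairs; a deque makes removal at the left end O(1).
        self._files = deque(files or [])

    @property
    def n_files(self):
        return len(self._files)

    @property
    def file_paths(self):
        return [path for path, _ in self._files]

    def __iter__(self):
        # Iterating yields one single-file collection per file.
        return (FileCollection([entry]) for entry in self._files)

    def __getitem__(self, key):
        selected = list(self._files)[key]
        # A slice gives a list of entries; an integer index gives one entry.
        return FileCollection(selected if isinstance(key, slice) else [selected])

    def __add__(self, other):
        # "+" builds a new collection and leaves both operands unchanged.
        return FileCollection([*self._files, *other._files])

    def __iadd__(self, other):
        self.append(other)
        return self

    def append(self, other):
        # Add every file of another collection, in place.
        self._files.extend(list(other._files))

    def extend(self, others):
        for other in others:
            self.append(other)

    def pop(self):
        return FileCollection([self._files.pop()])

    def pop_left(self):
        return FileCollection([self._files.popleft()])

    def clear(self):
        self._files.clear()


part1 = FileCollection([("a.srt", ""), ("b.srt", "")])
part2 = FileCollection([("c.srt", "")])
combined = part1 + part2
print(combined.n_files, combined.file_paths)
```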