Format Conversion
The CHAT class can convert CHAT data to other annotation formats.
CHAT to ELAN
to_elan() converts CHAT data to an ELAN object.
Each CHAT file produces one ELAN file.
Tier mapping:
Each CHAT participant (e.g., *CHI:, *MOT:) becomes an alignable (time-aligned, parent) tier in ELAN, with the tier ID set to the participant code (e.g., CHI, MOT).
Each CHAT dependent tier (e.g., %mor, %gra, %gpx) becomes a reference annotation (child) tier in ELAN, with the tier ID {tier}@{participant} (e.g., mor@CHI, gra@MOT).
If the CHAT file has an @Media header, an ELAN MEDIA_DESCRIPTOR element is included.
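The tier naming scheme can be illustrated with a small standalone sketch. The helper elan_tier_ids below is hypothetical and not part of rustling; it only reproduces the ID convention described above, assuming dependent tier names are passed without the leading %.

```python
def elan_tier_ids(participants, dependent_tiers):
    """Return (tier_id, parent_id) pairs following the naming convention.

    participants: list of participant codes, e.g. ["CHI", "MOT"].
    dependent_tiers: maps a participant code to its dependent tier names
    (without the leading "%"), e.g. {"CHI": ["mor", "gra"]}.
    """
    tiers = []
    for code in participants:
        # Participant tier: alignable, so it has no parent.
        tiers.append((code, None))
        for name in dependent_tiers.get(code, []):
            # Dependent tier: child of the participant tier.
            tiers.append((f"{name}@{code}", code))
    return tiers


tiers = elan_tier_ids(["CHI", "MOT"], {"CHI": ["mor", "gra"], "MOT": ["gra"]})
for tier_id, parent in tiers:
    print(tier_id, "(parent:", parent, ")")
```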
Example:
import rustling
chat = rustling.read_chat("path/to/your/data.cha")
# Convert to an ELAN object
elan = chat.to_elan()
# Write .eaf files to a directory
chat.to_elan_files("output_dir/")
# With custom filenames
chat.to_elan_files("output_dir/", filenames=["alice.eaf", "bob.eaf"])
To get EAF XML strings in memory (e.g., for inspection or further processing),
use to_elan_strs():
eaf_strings = chat.to_elan_strs()
The resulting ELAN object (or .eaf files) can be opened in ELAN or further processed with rustling.elan.ELAN.
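For a quick check of the converted tiers, a string returned by to_elan_strs() can be parsed with Python's standard xml.etree.ElementTree. The sketch below assumes each string is a complete EAF XML document. Here sample_eaf is a hand-written, heavily trimmed stand-in rather than real rustling output, and list_tiers is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for one EAF string. A real EAF document also carries
# a header, time slots, annotations, and linguistic type definitions.
sample_eaf = """<?xml version="1.0" encoding="UTF-8"?>
<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="CHI"/>
  <TIER TIER_ID="mor@CHI" PARENT_REF="CHI"/>
</ANNOTATION_DOCUMENT>
"""


def list_tiers(eaf_xml):
    """Return (tier_id, parent_id) for every TIER element in an EAF string."""
    # Encode to bytes so the XML declaration's encoding is honored by the parser.
    root = ET.fromstring(eaf_xml.encode("utf-8"))
    return [(tier.get("TIER_ID"), tier.get("PARENT_REF")) for tier in root.iter("TIER")]


tier_info = list_tiers(sample_eaf)
for tier_id, parent in tier_info:
    print(tier_id, "(parent:", parent, ")")
```

Tiers without a PARENT_REF attribute are top-level tiers, which corresponds to the participant tiers in the mapping above.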
CHAT to SRT
to_srt() converts CHAT data to an SRT object.
Each CHAT file produces one SRT file.
Mapping:
Each CHAT utterance with time marks becomes one subtitle block.
Utterances without time marks are skipped (SRT requires time ranges).
When multiple participants are present, the subtitle text is prefixed with the participant code (e.g., "CHI: more cookie ."). For a single participant, no prefix is added.
Participant selection:
By default, all participants are included.
To select specific participants, pass the participants keyword argument:
import rustling
chat = rustling.read_chat("path/to/your/data.cha")
# Convert to an SRT object
srt = chat.to_srt()
# Only include specific participants
srt = chat.to_srt(participants=["CHI"])
# Write .srt files to a directory
chat.to_srt_files("output_dir/")
# With custom filenames
chat.to_srt_files("output_dir/", filenames=["child.srt"])
To get SRT strings in memory (e.g., for inspection or further processing),
use to_srt_strs():
srt_strings = chat.to_srt_strs()
The resulting SRT object (or .srt files)
can be opened in any media player or subtitle editor.
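The SRT mapping rules in this section (one block per timed utterance, untimed utterances skipped, participant prefix only with several participants) can be sketched in plain Python. This is an illustration, not rustling's implementation: ms_to_srt_time and to_srt_text are hypothetical helpers, utterances are modeled as simple tuples, and the sketch assumes "multiple participants" is decided over all utterances passed in.

```python
def ms_to_srt_time(ms):
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt_text(utterances):
    """Build SRT text from (participant, text, time_marks) tuples.

    time_marks is (start_ms, end_ms), or None when the utterance is untimed.
    """
    # Assumption: the prefix rule looks at all utterances, timed or not.
    multiple = len({code for code, _, _ in utterances}) > 1
    # SRT needs a time range, so untimed utterances are dropped.
    timed = [u for u in utterances if u[2] is not None]
    blocks = []
    for index, (code, text, (start, end)) in enumerate(timed, start=1):
        line = f"{code}: {text}" if multiple else text
        blocks.append(
            f"{index}\n{ms_to_srt_time(start)} --> {ms_to_srt_time(end)}\n{line}\n"
        )
    # Blocks are separated by a blank line.
    return "\n".join(blocks)


utterances = [
    ("CHI", "more cookie .", (0, 1500)),
    ("MOT", "no more .", None),  # no time marks, so it is skipped
    ("MOT", "all gone .", (2000, 3250)),
]
srt_text = to_srt_text(utterances)
print(srt_text)
```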
CHAT to TextGrid
to_textgrid() converts CHAT data to a TextGrid object.
Each CHAT file produces one TextGrid file.
Mapping:
Each CHAT participant becomes an IntervalTier (tier name = participant code).
Utterances without time marks are skipped.
Times are converted from milliseconds to seconds.
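As a rough illustration of these three rules, the sketch below groups timed utterances into one interval list per participant and converts milliseconds to seconds. The function to_interval_tiers is hypothetical and the tuple layout is an assumption made for the example; rustling's own TextGrid object may be structured differently.

```python
from collections import defaultdict


def to_interval_tiers(utterances):
    """Group (participant, text, time_marks) tuples into interval tiers.

    Returns {participant_code: [(start_s, end_s, text), ...]}.
    time_marks is (start_ms, end_ms), or None when the utterance is untimed.
    """
    tiers = defaultdict(list)
    for code, text, time_marks in utterances:
        if time_marks is None:
            continue  # utterances without time marks are skipped
        start_ms, end_ms = time_marks
        # CHAT time marks are in milliseconds; TextGrid times are in seconds.
        tiers[code].append((start_ms / 1000, end_ms / 1000, text))
    return dict(tiers)


utterances = [
    ("CHI", "more cookie .", (0, 1500)),
    ("MOT", "no more .", None),
    ("MOT", "all gone .", (2000, 3250)),
]
interval_tiers = to_interval_tiers(utterances)
for tier_name, intervals in interval_tiers.items():
    print(tier_name, intervals)
```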
Participant selection:
By default, all participants are included.
To select specific participants, pass the participants keyword argument:
import rustling
chat = rustling.read_chat("path/to/your/data.cha")
# Convert to a TextGrid object
textgrid = chat.to_textgrid()
# Only include specific participants
textgrid = chat.to_textgrid(participants=["CHI"])
# Write .TextGrid files to a directory
chat.to_textgrid_files("output_dir/")
# With custom filenames
chat.to_textgrid_files("output_dir/", filenames=["child.TextGrid"])
To get TextGrid strings in memory, use to_textgrid_strs():
textgrid_strings = chat.to_textgrid_strs()
The resulting TextGrid object (or .TextGrid files)
can be opened in Praat.
CHAT to CoNLL-U
to_conllu() converts CHAT data to a CoNLLU object.
Each CHAT file produces one CoNLL-U file.
Mapping:
Each CHAT utterance becomes one CoNLL-U sentence.
Token.word maps to FORM.
Token.pos (from %mor) maps to UPOS.
Token.mor (from %mor) maps to LEMMA.
Token.gra (from %gra) maps to HEAD and DEPREL.
Fields without a direct mapping (XPOS, FEATS, DEPS, MISC) are set to _.
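To make the field mapping concrete, the sketch below assembles the ten tab-separated CoNLL-U columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) for each token. It is a standalone illustration rather than rustling's code: token_to_conllu is a hypothetical helper, tokens are modeled as plain tuples with gra given as a (head, relation) pair, and the tag values are made up for the example.

```python
def token_to_conllu(index, word, pos, mor, gra):
    """Return one CoNLL-U token line following the mapping above."""
    head, deprel = gra
    columns = [
        str(index),  # ID: 1-based position in the sentence
        word,        # FORM   <- Token.word
        mor,         # LEMMA  <- Token.mor
        pos,         # UPOS   <- Token.pos
        "_",         # XPOS: no direct mapping
        "_",         # FEATS: no direct mapping
        str(head),   # HEAD   <- Token.gra
        deprel,      # DEPREL <- Token.gra
        "_",         # DEPS: no direct mapping
        "_",         # MISC: no direct mapping
    ]
    return "\t".join(columns)


# Illustrative tokens: (word, pos, mor, (head, relation))
tokens = [
    ("more", "qn", "more", (2, "QUANT")),
    ("cookie", "n", "cookie", (0, "ROOT")),
]
lines = [token_to_conllu(i, *token) for i, token in enumerate(tokens, start=1)]
# A CoNLL-U sentence is its token lines followed by a blank line.
sentence = "\n".join(lines) + "\n\n"
print(sentence, end="")
```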
Example:
import rustling
chat = rustling.read_chat("path/to/your/data.cha")
# Convert to a CoNLL-U object
conllu = chat.to_conllu()
# Write .conllu files to a directory
chat.to_conllu_files("output_dir/")
# With custom filenames
chat.to_conllu_files("output_dir/", filenames=["output.conllu"])
To get CoNLL-U strings in memory, use to_conllu_strs():
conllu_strings = chat.to_conllu_strs()
The resulting CoNLLU object (or .conllu files)
can be used with Universal Dependencies tools.