Format Conversion

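The CHAT class can convert CHAT data to four other formats: ELAN, SRT, TextGrid, and CoNLL-U. Each conversion can return an in-memory object, write files to a directory, or return strings.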

CHAT to ELAN

to_elan() converts CHAT data to an ELAN object. Each CHAT file produces one ELAN file.

Tier mapping:

  • Each CHAT participant (e.g., *CHI:, *MOT:) becomes an alignable (time-aligned, parent) tier in ELAN, with the tier ID set to the participant code (e.g., CHI, MOT).

  • Each CHAT dependent tier (e.g., %mor, %gra, %gpx) becomes a reference annotation (child) tier in ELAN, with the tier ID {tier}@{participant} (e.g., mor@CHI, gra@MOT).

  • If the CHAT file has an @Media header, an ELAN MEDIA_DESCRIPTOR element is included.
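
The tier naming scheme above can be sketched in plain Python. This is only an illustration of the naming rule, not rustling's implementation: the function elan_tier_ids and its input shapes are hypothetical and not part of the rustling API.

```python
def elan_tier_ids(participants, dependent_tiers):
    """Return ELAN tier IDs following the naming rule described above.

    participants: participant codes, e.g. ["CHI", "MOT"]
    dependent_tiers: maps a participant code to its dependent tier names
        without the leading "%", e.g. {"CHI": ["mor", "gra"]}
    (Both input shapes are assumptions made for this sketch.)
    """
    # One alignable (parent) tier per participant; the ID is the code itself.
    parent_ids = list(participants)

    # One reference (child) tier per dependent tier, named {tier}@{participant}.
    # The dict value records which parent tier each child tier hangs off.
    child_ids = {
        f"{tier}@{code}": code
        for code in participants
        for tier in dependent_tiers.get(code, [])
    }
    return parent_ids, child_ids


parents, children = elan_tier_ids(
    ["CHI", "MOT"], {"CHI": ["mor", "gra"], "MOT": ["mor"]}
)
print(parents)
print(children)
```

Here the child tier mor@CHI is attached to the parent tier CHI, mirroring how a %mor line depends on the *CHI: utterance above it.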

Example:

import rustling

chat = rustling.read_chat("path/to/your/data.cha")

# Convert to an ELAN object
elan = chat.to_elan()

# Write .eaf files to a directory
chat.to_elan_files("output_dir/")

# With custom filenames
chat.to_elan_files("output_dir/", filenames=["alice.eaf", "bob.eaf"])

To get EAF XML strings in memory (e.g., for inspection or further processing), use to_elan_strs():

eaf_strings = chat.to_elan_strs()

The resulting ELAN object (or .eaf files) can be opened in ELAN or further processed with rustling.elan.ELAN.

CHAT to SRT

to_srt() converts CHAT data to an SRT object. Each CHAT file produces one SRT file.

Mapping:

  • Each CHAT utterance with time marks becomes one subtitle block.

  • Utterances without time marks are skipped, because every SRT subtitle block requires a start and end time.

  • When multiple participants are present, the subtitle text is prefixed with the participant code (e.g., "CHI: more cookie ."). For a single participant, no prefix is added.
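
As a rough illustration of the three rules above, the following sketch builds SRT text from a list of utterances. It is not rustling's implementation: the function names and the (participant, text, time_marks) tuple shape are hypothetical, and the sketch assumes "multiple participants" is decided over the utterances passed in.

```python
def ms_to_srt_timestamp(ms):
    """Format integer milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def utterances_to_srt(utterances):
    """Build SRT text from (participant, text, time_marks) tuples.

    time_marks is (start_ms, end_ms), or None for an untimed utterance.
    (This tuple shape is an assumption made for the sketch.)
    """
    # Prefix the text with the participant code only when more than one
    # participant appears in the input.
    use_prefix = len({participant for participant, _, _ in utterances}) > 1

    blocks = []
    for participant, text, time_marks in utterances:
        if time_marks is None:
            continue  # untimed utterances cannot become subtitle blocks
        start_ms, end_ms = time_marks
        body = f"{participant}: {text}" if use_prefix else text
        index = len(blocks) + 1  # SRT blocks are numbered from 1
        start = ms_to_srt_timestamp(start_ms)
        end = ms_to_srt_timestamp(end_ms)
        blocks.append(f"{index}\n{start} --> {end}\n{body}")

    # Blocks are separated by one blank line.
    return "\n\n".join(blocks) + "\n" if blocks else ""


print(utterances_to_srt([
    ("CHI", "more cookie .", (0, 1500)),
    ("MOT", "you want more ?", None),  # skipped: no time marks
    ("MOT", "okay .", (2000, 3250)),
]))
```

Subtitle numbering counts only the utterances that are kept, so a skipped utterance leaves no gap in the sequence.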

Participant selection:

By default, all participants are included. To select specific participants, pass the participants keyword argument:

import rustling

chat = rustling.read_chat("path/to/your/data.cha")

# Convert to an SRT object
srt = chat.to_srt()

# Only include specific participants
srt = chat.to_srt(participants=["CHI"])

# Write .srt files to a directory
chat.to_srt_files("output_dir/")

# With custom filenames
chat.to_srt_files("output_dir/", filenames=["child.srt"])

To get SRT strings in memory (e.g., for inspection or further processing), use to_srt_strs():

srt_strings = chat.to_srt_strs()

The resulting .srt files can be opened in media players and subtitle editors that support the SRT format.

CHAT to TextGrid

to_textgrid() converts CHAT data to a TextGrid object. Each CHAT file produces one TextGrid file.

Mapping:

  • Each CHAT participant becomes an IntervalTier (tier name = participant code).

  • Utterances without time marks are skipped.

  • Times are converted from milliseconds to seconds.
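
The mapping above can be sketched as a small grouping step. This is an illustration only, not rustling's implementation: the function to_interval_tiers and the (participant, text, time_marks) tuple shape are hypothetical.

```python
def to_interval_tiers(utterances):
    """Group timed utterances into per-participant interval lists.

    utterances: (participant, text, time_marks) tuples, where time_marks
    is (start_ms, end_ms) or None. (Assumed shape for this sketch.)
    Returns {participant: [(start_sec, end_sec, text), ...]}.
    """
    tiers = {}
    for participant, text, time_marks in utterances:
        if time_marks is None:
            continue  # utterances without time marks are skipped
        start_ms, end_ms = time_marks
        # CHAT time marks are in milliseconds; TextGrid times are in seconds.
        interval = (start_ms / 1000, end_ms / 1000, text)
        tiers.setdefault(participant, []).append(interval)
    return tiers


tiers = to_interval_tiers([
    ("CHI", "more cookie .", (0, 1500)),
    ("MOT", "you want more ?", None),
    ("MOT", "okay .", (2000, 3250)),
])
print(tiers)
```

In this sketch a participant with no timed utterances gets no tier at all; whether to_textgrid() emits an empty tier in that case is not stated here.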

Participant selection:

By default, all participants are included. To select specific participants, pass the participants keyword argument:

import rustling

chat = rustling.read_chat("path/to/your/data.cha")

# Convert to a TextGrid object
textgrid = chat.to_textgrid()

# Only include specific participants
textgrid = chat.to_textgrid(participants=["CHI"])

# Write .TextGrid files to a directory
chat.to_textgrid_files("output_dir/")

# With custom filenames
chat.to_textgrid_files("output_dir/", filenames=["child.TextGrid"])

To get TextGrid strings in memory, use to_textgrid_strs():

textgrid_strings = chat.to_textgrid_strs()

The resulting TextGrid object (or .TextGrid files) can be opened in Praat.

CHAT to CoNLL-U

to_conllu() converts CHAT data to a CoNLLU object. Each CHAT file produces one CoNLL-U file.

Mapping:

  • Each CHAT utterance becomes one CoNLL-U sentence.

  • Token.word maps to FORM.

  • Token.pos (from %mor) maps to UPOS.

  • Token.mor (from %mor) maps to LEMMA.

  • Token.gra (from %gra) maps to HEAD and DEPREL.

  • Fields without a direct mapping (XPOS, FEATS, DEPS, MISC) are set to _.
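
To make the column layout concrete, the sketch below assembles one 10-column CoNLL-U token line from the fields listed above. It is an illustration, not rustling's implementation: the function token_to_conllu_line is hypothetical, and it assumes the %gra information has already been split into a head index and a relation label.

```python
def token_to_conllu_line(index, word, pos, mor, head, deprel):
    """Build one tab-separated CoNLL-U token line.

    Column order defined by CoNLL-U:
    ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
    A missing value (None) is written as "_".
    """
    def cell(value):
        return "_" if value is None else str(value)

    columns = [
        cell(index),   # ID
        cell(word),    # FORM   <- Token.word
        cell(mor),     # LEMMA  <- Token.mor
        cell(pos),     # UPOS   <- Token.pos
        "_",           # XPOS   (no direct mapping)
        "_",           # FEATS  (no direct mapping)
        cell(head),    # HEAD   <- Token.gra
        cell(deprel),  # DEPREL <- Token.gra
        "_",           # DEPS   (no direct mapping)
        "_",           # MISC   (no direct mapping)
    ]
    return "\t".join(columns)


print(token_to_conllu_line(1, "cookies", "noun", "cookie", 2, "OBJ"))
```

A token from a file with no %gra tier would pass None for head and deprel in this sketch, which yields _ in both columns.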

Example:

import rustling

chat = rustling.read_chat("path/to/your/data.cha")

# Convert to a CoNLL-U object
conllu = chat.to_conllu()

# Write .conllu files to a directory
chat.to_conllu_files("output_dir/")

# With custom filenames
chat.to_conllu_files("output_dir/", filenames=["output.conllu"])

To get CoNLL-U strings in memory, use to_conllu_strs():

conllu_strings = chat.to_conllu_strs()

The resulting CoNLLU object (or .conllu files) can be used with Universal Dependencies tools.