<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
<title>Tilde Language resources and tools</title>
<link href="http://hdl.handle.net/20.500.12574/73" rel="alternate"/>
<subtitle/>
<id>http://hdl.handle.net/20.500.12574/73</id>
<updated>2026-01-10T18:20:41Z</updated>
<dc:date>2026-01-10T18:20:41Z</dc:date>
<entry>
<title>Evaluation and development data sets for speech translation for meetings</title>
<link href="http://hdl.handle.net/20.500.12574/74" rel="alternate"/>
<author>
<name>Pinnis, Mārcis</name>
</author>
<author>
<name>Pole, Megija</name>
</author>
<author>
<name>Kapočiūtė-Dzikienė, Jurgita</name>
</author>
<author>
<name>Nicmanis, Dāvis</name>
</author>
<author>
<name>Salimbajevs, Askars</name>
</author>
<author>
<name>Skadiņš, Raivis</name>
</author>
<author>
<name>Miķelsons, Mārtiņš</name>
</author>
<author>
<name>Lizanders, Kristaps</name>
</author>
<author>
<name>Bērziņš, Aivars</name>
</author>
<author>
<name>Vasiļevskis, Artūrs</name>
</author>
<author>
<name>Rozis, Roberts</name>
</author>
<author>
<name>Kornikaite, Nida</name>
</author>
<id>http://hdl.handle.net/20.500.12574/74</id>
<updated>2022-12-20T11:59:54Z</updated>
<published>2022-12-09T00:00:00Z</published>
<summary type="text">Evaluation and development data sets for speech translation for meetings
Pinnis, Mārcis; Pole, Megija; Kapočiūtė-Dzikienė, Jurgita; Nicmanis, Dāvis; Salimbajevs, Askars; Skadiņš, Raivis; Miķelsons, Mārtiņš; Lizanders, Kristaps; Bērziņš, Aivars; Vasiļevskis, Artūrs; Rozis, Roberts; Kornikaite, Nida
The evaluation and development data sets for speech translation for meetings were created within the microproject "Multi-layer evaluation sets for speech translation of web-based meetings" of the project "HumanE AI Network".&#13;
&#13;
The data sets comprise recordings of various public-domain meetings (organised by public administration and publicly disseminated) in English, Latvian, and Lithuanian, together with their transcriptions and their translations into Latvian and English.&#13;
&#13;
The data sets feature multiple layers of annotation: raw orthographic transcription, normalised transcription (spoken-language words and phrases replaced with written-language equivalents), truncated transcription (spoken-language elements with no written-language equivalent deleted), reordered transcription (words reordered to better follow the syntactic norms of written language), and translation.&#13;
&#13;
The English and Latvian data were annotated by linguistics students. The Lithuanian data were annotated by a professional linguist.&#13;
&#13;
The data are intended for developing and evaluating speech translation systems and the individual components of pipeline-based speech translation systems (speaker diarisation, speech segmentation, automatic speech recognition, punctuation restoration, spoken language normalisation, and machine translation).
</summary>
<dc:date>2022-12-09T00:00:00Z</dc:date>
</entry>
</feed>
