
 
dc.contributor.author Darģis, Roberts
dc.contributor.author Znotiņš, Artūrs
dc.contributor.author Auziņa, Ilze
dc.contributor.author Rābante-Buša, Guna
dc.date.accessioned 2024-03-25T14:44:20Z
dc.date.available 2024-03-25T14:44:20Z
dc.date.issued 2024-03
dc.identifier.uri http://hdl.handle.net/20.500.12574/99
dc.description A Latvian speech corpus for the development (validation), testing and comparison of ASR models. The audio data is segmented and aligned with the corresponding orthographic transcriptions, which are human-verified. The LATE-media subset contains both verbatim (raw) and formatted transcriptions (with punctuation, capitalisation, numbers, abbreviations, etc.), while the LATE-conversations subset currently contains only verbatim transcriptions (no punctuation, capitalisation, etc.). The dataset consists of: (1) 5 hours of broadcast media recordings, covering both spontaneous and prepared speech (2.5h dev set, 2.5h test set); (2) 5 hours of conversational, spontaneous speech recordings (2.5h dev set, 2.5h test set).
dc.language.iso lav
dc.publisher AiLab IMCS UL
dc.relation.isreferencedby https://korpuss.lv/id/LATE-mediji
dc.relation.isreferencedby https://korpuss.lv/id/LATE-sarunas
dc.rights CLARIN ACA
dc.rights.uri https://www.kielipankki.fi/wp-content/uploads/CLARIN_ACA_AFFIL-EDU_NC_NORED_en.html
dc.rights.label ACA
dc.source.uri http://www.digitalhumanities.lv/projects/vpp-late/
dc.subject ASR
dc.title LATE Dev&Test Set V1 for Latvian ASR
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType audio
has.files yes
branding CLARIN Centre of Latvian language resources and tools
demo.uri https://late.ailab.lv
contact.person Ilze Auziņa ilze.auzina@lumii.lv IMCS at University of Latvia
sponsor Ministry of Education and Science VPP-LETONIKA-2021/1-0006 Research on Modern Latvian Language and Development of Language Technology nationalFunds
size.info 10 hours
files.size 1002359686
files.count 4


Files in this item

Download all files in this item (955.92 MB)
This item is for Academic Use and is licensed under: CLARIN ACA (Noncommercial)

Name: late-media-v1-test.zip
Size: 234.72 MB
Format: application/zip
Description: A test set of orthographically transcribed speech segments from media content. The verbatim and formatted transcriptions are stored in a self-explanatory JSON file.
MD5: b312198af484c23a5cbc74e04388991e

Name: late-conversations-v1-test.zip
Size: 241.42 MB
Format: application/zip
Description: A test set of orthographically transcribed conversational speech segments. The verbatim and formatted transcriptions are stored in a self-explanatory JSON file.
MD5: ed9085d33f715ec6a8d2ab60a0816218

Name: late-conversations-v1-dev.zip
Size: 244.9 MB
Format: application/zip
Description: A development set of orthographically transcribed conversational speech segments. The verbatim and formatted transcriptions are stored in a self-explanatory JSON file.
MD5: e5c32717cf925d04989876426e51fb9c

Name: late-media-v1-dev.zip
Size: 234.88 MB
Format: application/zip
Description: A development set of orthographically transcribed speech segments from media content. The verbatim and formatted transcriptions are stored in a self-explanatory JSON file.
MD5: fd0632512e4fea6b503c3cb734862722
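
The MD5 values listed above can be used to check that each archive was downloaded intact. The following is a minimal sketch in Python, assuming the four archives were saved under their listed names in a single directory. The helper names `md5_of_file` and `verify_downloads` are illustrative and are not part of the corpus distribution.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file list in this record.
EXPECTED_MD5 = {
    "late-media-v1-test.zip": "b312198af484c23a5cbc74e04388991e",
    "late-conversations-v1-test.zip": "ed9085d33f715ec6a8d2ab60a0816218",
    "late-conversations-v1-dev.zip": "e5c32717cf925d04989876426e51fb9c",
    "late-media-v1-dev.zip": "fd0632512e4fea6b503c3cb734862722",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected archive name to 'ok', 'mismatch', or 'missing'.

    Hypothetical helper: assumes the archives sit directly in `directory`.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    for name, status in verify_downloads().items():
        print(f"{name}: {status}")
```

Reading in chunks keeps memory use low for archives of this size (roughly 235 to 245 MB each). A "mismatch" status most likely indicates an incomplete or corrupted download, in which case the file should be fetched again.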
