dc.contributor.author |
Darģis, Roberts |
dc.contributor.author |
Znotiņš, Artūrs |
dc.contributor.author |
Auziņa, Ilze |
dc.contributor.author |
Rābante-Buša, Guna |
dc.date.accessioned |
2024-03-25T14:44:20Z |
dc.date.available |
2024-03-25T14:44:20Z |
dc.date.issued |
2024-03 |
dc.identifier.uri |
http://hdl.handle.net/20.500.12574/99 |
dc.description |
A Latvian speech corpus for the development (validation), testing and comparison of automatic speech recognition (ASR) models.
The audio data is segmented and aligned with the corresponding human-verified orthographic transcriptions. The LATE-media subset contains both verbatim (raw) transcriptions and formatted transcriptions (with punctuation, capitalisation, numbers, abbreviations, etc.), while the LATE-conversations subset currently contains only verbatim transcriptions (no punctuation, capitalisation, etc.).
The dataset consists of:
- 5 hours of broadcast media recordings, both spontaneous and prepared speech (2.5h dev set, 2.5h test set);
- 5 hours of conversational speech recordings, spontaneous speech (2.5h dev set, 2.5h test set). |
dc.language.iso |
lav |
dc.publisher |
AiLab IMCS UL |
dc.relation.isreferencedby |
https://korpuss.lv/id/LATE-mediji |
dc.relation.isreferencedby |
https://korpuss.lv/id/LATE-sarunas |
dc.rights |
CLARIN ACA |
dc.rights.uri |
https://www.kielipankki.fi/wp-content/uploads/CLARIN_ACA_AFFIL-EDU_NC_NORED_en.html |
dc.rights.label |
ACA |
dc.source.uri |
http://www.digitalhumanities.lv/projects/vpp-late/ |
dc.subject |
ASR |
dc.title |
LATE Dev&Test Set V1 for Latvian ASR |
dc.type |
corpus |
metashare.ResourceInfo#ContentInfo.mediaType |
audio |
has.files |
yes |
branding |
CLARIN Centre of Latvian language resources and tools |
demo.uri |
https://late.ailab.lv |
contact.person |
Ilze Auziņa ilze.auzina@lumii.lv IMCS at University of Latvia |
sponsor |
Ministry of Education and Science VPP-LETONIKA-2021/1-0006 Research on Modern Latvian Language and Development of Language Technology nationalFunds |
size.info |
10 hours |
files.size |
1002359686 |
files.count |
4 |