dc.contributor.author | Darģis, Roberts |
dc.contributor.author | Znotiņš, Artūrs |
dc.contributor.author | Auziņa, Ilze |
dc.contributor.author | Rābante-Buša, Guna |
dc.date.accessioned | 2024-03-25T14:44:20Z |
dc.date.available | 2024-03-25T14:44:20Z |
dc.date.issued | 2024-03 |
dc.identifier.uri | http://hdl.handle.net/20.500.12574/99 |
dc.description | A Latvian speech corpus for the development (validation), testing and comparison of ASR models. The audio data is segmented and aligned with the corresponding orthographic transcriptions, which are human-verified. The LATE-media subset contains both verbatim (raw) and formatted transcriptions (with punctuation, capitalisation, numbers, abbreviations, etc.), while the LATE-conversations subset currently contains only verbatim transcriptions (no punctuation, capitalisation, etc.). The dataset consists of: - 5 hours of broadcast media recordings, both spontaneous and prepared speech (2.5h dev set, 2.5h test set); - 5 hours of conversational speech recordings, spontaneous speech (2.5h dev set, 2.5h test set). |
dc.language.iso | lav |
dc.publisher | AiLab IMCS UL |
dc.relation.isreferencedby | https://korpuss.lv/id/LATE-mediji |
dc.relation.isreferencedby | https://korpuss.lv/id/LATE-sarunas |
dc.rights | CLARIN ACA |
dc.rights.uri | https://www.kielipankki.fi/wp-content/uploads/CLARIN_ACA_AFFIL-EDU_NC_NORED_en.html |
dc.rights.label | ACA |
dc.source.uri | http://www.digitalhumanities.lv/projects/vpp-late/ |
dc.subject | ASR |
dc.title | LATE Dev&Test Set V1 for Latvian ASR |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | audio |
has.files | yes |
branding | CLARIN Centre of Latvian language resources and tools |
demo.uri | https://late.ailab.lv |
contact.person | Ilze Auziņa; ilze.auzina@lumii.lv; IMCS at University of Latvia |
sponsor | Ministry of Education and Science; VPP-LETONIKA-2021/1-0006; Research on Modern Latvian Language and Development of Language Technology; nationalFunds |
size.info | 10 hours |
files.size | 1002359686 |
files.count | 4 |
Files in this item
Download all files in this item (955.92 MB)
- Name
- late-media-v1-test.zip
- Size
- 234.72 MB
- Format
- application/zip
- Description
- A test set of orthographically transcribed speech segments from media content. The verbatim and formatted transcriptions are stored in a self-explanatory JSON file.
- MD5
- b312198af484c23a5cbc74e04388991e
- Name
- late-conversations-v1-test.zip
- Size
- 241.42 MB
- Format
- application/zip
- Description
- A test set of orthographically transcribed conversational speech segments. The verbatim transcriptions are stored in a self-explanatory JSON file.
- MD5
- ed9085d33f715ec6a8d2ab60a0816218
- Name
- late-conversations-v1-dev.zip
- Size
- 244.9 MB
- Format
- application/zip
- Description
- A development set of orthographically transcribed conversational speech segments. The verbatim transcriptions are stored in a self-explanatory JSON file.
- MD5
- e5c32717cf925d04989876426e51fb9c
- Name
- late-media-v1-dev.zip
- Size
- 234.88 MB
- Format
- application/zip
- Description
- A development set of orthographically transcribed speech segments from media content. The verbatim and formatted transcriptions are stored in a self-explanatory JSON file.
- MD5
- fd0632512e4fea6b503c3cb734862722
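
The MD5 values listed for the four archives can be used to check that a download is complete and uncorrupted. The following is a minimal sketch, not part of the dataset itself: it assumes the archives were saved under their original names in a single local directory, and the helper names `md5_of_file` and `verify_downloads` are hypothetical, introduced here only for illustration.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing in this record.
EXPECTED_MD5 = {
    "late-media-v1-test.zip": "b312198af484c23a5cbc74e04388991e",
    "late-conversations-v1-test.zip": "ed9085d33f715ec6a8d2ab60a0816218",
    "late-conversations-v1-dev.zip": "e5c32717cf925d04989876426e51fb9c",
    "late-media-v1-dev.zip": "fd0632512e4fea6b503c3cb734862722",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected archive name to 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumption: the archives sit in the current working directory.
    for name, status in verify_downloads().items():
        print(f"{name}: {status}")
```

An archive reported as `mismatch` should be downloaded again before use. MD5 is used here only because it is the checksum this record publishes; it detects transfer errors but is not a safeguard against deliberate tampering.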