CLARIN-LV digital library at IMCS, University of Latvia
The CLARIN-LV digital repository system captures, stores, indexes, preserves, and distributes digital research material.
https://repository.clarin.lv:443/repository/xmlui
Feed updated: 2024-03-25T15:29:10Z

LATE Dev&Test Set for Latvian ASR
Handle: http://hdl.handle.net/20.500.12574/99 (issued 2024-03-01; updated 2024-03-25T14:44:20Z)
Darģis, Roberts; Znotiņš, Artūrs; Auziņa, Ilze; Rābante-Buša, Guna
A Latvian speech corpus for the validation, testing and comparison of ASR models.
The audio data is segmented and aligned with the corresponding human-verified orthographic transcriptions.
The dataset consists of:
- 5 hours of broadcast media recordings (2.5h dev set, 2.5h test set);
- 5 hours of conversational speech recordings (2.5h dev set, 2.5h test set).

SELMA Latvian NER Dataset
Handle: http://hdl.handle.net/20.500.12574/98 (issued 2022-03-01; updated 2024-03-25T12:55:24Z)
Rābante-Buša, Guna; Grūzītis, Normunds; Bārzdiņš, Guntis; Mendes, Afonso
A dataset of hierarchically annotated named entities in Latvian news articles (provided by the Latvian Information Agency LETA) for the development and evaluation of transition-based parsers for named entity recognition (NER).

SELMA Open Source Platform (UC0)
Handle: http://hdl.handle.net/20.500.12574/97 (issued 2024-02-01; updated 2024-02-09T11:46:06Z)
Goško, Didzis; Bārzdiņš, Guntis
The SELMA Open-Source Software (OSS) offers an effective way to test and compare the performance of various language models used in multilingual media monitoring and content production. The SELMA OSS Platform (also referred to as Use Case 0, UC0, or the Basic Testing and Configuration Interface) provides:
* automatic speech recognition (ASR) from audio/video files,
* punctuation and capitalization of the transcribed text,
* machine translation (MT) into a target language,
* text-to-speech synthesis (TTS) and voice-over generation.
To provide this functionality, the demonstrator release uses the following multilingual open-source models: OpenAI Whisper (ASR), Meta MMS (ASR and TTS), and Meta M2M-100 (MT). It thus provides easy access to these open models.
The SELMA Platform can be used not only by developers to combine and test alternative language models before integrating them into end-user applications, but also as an entry-level application by journalists and media producers to transcribe their recordings, generate subtitles and voice-overs, or create a podcast from input text.
The SELMA OSS Platform demonstrator requires neither registration nor authentication, and it does not store any content, original or generated, after the user closes the session.

Dictionary of Latvian Literary Language (LLVV) (2024-01)
Handle: http://hdl.handle.net/20.500.12574/96 (issued 2024-01-01; updated 2024-02-02T11:38:52Z)
Ceplītis, Laimdots; Spektors, Andrejs
In the 20th century, the Latvian Language Institute of the University of Latvia (UL), formerly the Institute of Language and Literature of the Academy of Sciences, produced the largest lexicographic source of the Latvian language. It was digitized (2001–2022) by the UL Institute of Mathematics and Computer Science. The dictionary covers standard Latvian vocabulary used from the 1870s until the end of the 20th century, when the dictionary was compiled (1972–1996). Its words and example sentences were drawn from fiction, scientific texts, newswire, and folklore.

Dictionary of Contemporary Latvian Language (MLVV) (2023-12-22)
Handle: http://hdl.handle.net/20.500.12574/95 (issued 2024-01-01; updated 2024-02-02T11:35:40Z)
Kuplā, Ieva; Lejniece, Gunta; Migla, Ilga; Oldere, Laimdota; Ozola, Ārija; Požarnova, Vija; Roze, Anitra; Šmidebergs, Imants; Šnē, Dorisa; Šnē, Māra; Zuicena, Ieva; Pretkalniņa, Lauma; Auziņa, Ieva; Briede, Santa; Timuška, Agris; Jansone, Irēna Ilga; Rapa, Sanda
The “Dictionary of Contemporary Latvian Language” (MLVV), developed by the UL Latvian Language Institute, is a new explanatory dictionary based on Latvian language material collected over the last decade. The analysis of the vocabulary draws on the MLVV card files, internet sources, and encyclopaedias and dictionaries from the last decade. Part of the dictionary content is machine-readable.

Dictionary of Contemporary Latvian Language (MLVV) (2023-09-21)
Handle: http://hdl.handle.net/20.500.12574/94 (issued 2023-01-01; updated 2024-02-02T11:35:40Z)
Jērāne, Santa; Kuplā, Ieva; Lejniece, Gunta; Migla, Ilga; Oldere, Laimdota; Ozola, Ārija; Požarnova, Vija; Roze, Anitra; Šmidebergs, Imants; Šnē, Dorisa; Šnē, Māra; Zuicena, Ieva; Pretkalniņa, Lauma; Auziņa, Ieva; Briede, Santa; Timuška, Agris
The “Dictionary of Contemporary Latvian Language” (MLVV), developed by the UL Latvian Language Institute, is a new explanatory dictionary based on Latvian language material collected over the last decade. The analysis of the vocabulary draws on the MLVV card files, internet sources, and encyclopaedias and dictionaries from the last decade. Part of the dictionary content is machine-readable.

Corpus of Latvian PhD Theses (Disertācijas)
Handle: http://hdl.handle.net/20.500.12574/93 (issued 2022-01-01; updated 2023-12-21T16:02:37Z)
Darģis, Roberts
The corpus consists of PhD theses and their abstracts published at the University of Latvia, Riga Technical University, Rīga Stradiņš University, and Liepāja University up to 2020.

Tēzaurs.lv 2023 (Autumn Edition)
Handle: http://hdl.handle.net/20.500.12574/92 (issued 2023-09-01; updated 2023-11-24T10:12:22Z)
Spektors, Andrejs; Pretkalniņa, Lauma; Grūzītis, Normunds; Paikens, Pēteris; Rituma, Laura; Saulīte, Baiba; Nešpore-Bērzkalne, Gunta; Lokmane, Ilze; Klints, Agute; Stāde, Madara; Grasmanis, Mikus; Auziņa, Ilze; Znotiņš, Artūrs; Darģis, Roberts; Bārzdiņš, Guntis
Tēzaurs.lv is the largest open machine-readable dictionary of Latvian. This version contains more than 397,000 entries compiled from 346 sources. The dictionary is enriched with phonetic, morphological, derivational, semantic, and other annotations, as well as inflection tables and corpus examples, and it is integrated with the Latvian WordNet data.
This dataset is available as open data in TEI/XML and LMF/XML formats. To obtain the corresponding PostgreSQL database dump, please send a request to info@tezaurs.lv.

LVTB - Latvian Treebank v2.13 (2023-11-15)
Handle: http://hdl.handle.net/20.500.12574/91 (issued 2023-11-15; updated 2023-12-21T13:54:53Z)
Rituma, Laura; Pretkalniņa, Lauma; Saulīte, Baiba; Nešpore-Bērzkalne, Gunta; Grūzītis, Normunds
The Latvian Treebank (LVTB) has been under development since 2010. It is manually annotated according to a hybrid dependency-constituency grammar model. This version of LVTB contains the data used to derive the corresponding version of the Latvian UD Treebank (UDLV-LVTB).

The Corpus of Early Written Latvian (2022)
Handle: http://hdl.handle.net/20.500.12574/90 (issued 2022-01-01; updated 2023-12-21T16:07:52Z)
Andronova, Everita; Spektors, Andrejs; Vanags, Pēteris; Baltiņa, Maija; Trumpa, Anta; Trumpa, Edmunds; Grūzītis, Normunds; Siliņa-Piņķe, Renāte; Frīdenberga, Anna; Skrūzmane, Elga; Ķauķīte, Sintija; Pretkalniņa, Lauma
The Corpus of Early Written Latvian ‘SENIE’ provides access to written Latvian texts of the 16th–18th centuries. It aims to facilitate studies of early Latvian in general (e.g. the lexis, morphology, and syntax of the texts) and to serve as the basis for “The Historical Dictionary of Latvian (16th–17th cc.)”. The corpus was first launched in January 2003, and its development is still ongoing.