What's New

corpus
Description:
The CSV dataset contains sentence pairs for a text-to-text transformation task: given a sentence that contains 0..n abbreviations, rewrite (normalize) the sentence in full words (word forms). Training dataset: 64,665 ...
 This item contains 1 file (6.73 MB).
 
Publicly Available
corpus
Description:
The Balanced Corpus of Modern Latvian contains unique texts not yet included in the balanced corpora developed so far (LVK2013 and LVK2018). The corpus is primarily based on the design principles of previous ...
 This item contains no files.
corpus
Description:
The corpus contains texts from the magazine "Karogs" published from 1940 to 1994.
 This item contains no files.
 
Publicly Available

Most Viewed Items

Top Last Week
lexicalConceptualResource
Description:
Tezaurs.lv is the largest open machine-readable dictionary for Latvian. This version contains nearly 390,000 entries compiled from more than 330 sources. The dictionary is enriched with phonetic, morphological, semantic ...
 This item contains 1 file (24.76 MB).
 
Publicly Available
corpus
Description:
A text corpus of orthographic transcriptions of a Latvian medical speech corpus. It consists of 900 transcripts (documents) from a ~35-hour radiology speech corpus. Modalities covered: CT, MR, MG, CR, US.
 This item contains 1 file (267.18 KB).
 
Publicly Available
toolService
Description:
LVBERT is the first publicly available monolingual BERT language model pre-trained for Latvian. For training, we used the original TensorFlow implementation of BERT with whole-word masking and the next sentence ...
 This item contains 3 files (1.51 GB).
 
Publicly Available