What's New
corpus

Description:
The CSV dataset contains sentence pairs for a text-to-text transformation task: given a sentence that contains 0..n abbreviations, rewrite (normalize) the sentence in full words (word forms).
Training dataset: 64,665 ...
This item contains 1 file (6.73 MB).
Publicly Available
corpus

Description:
The Balanced Corpus of Modern Latvian contains unique texts not yet included in the previously developed balanced corpora (LVK2013 and LVK2018). The corpus is primarily based on the design principles of previous ...
This item contains no files.
corpus

Description:
The corpus contains texts from the magazine "Karogs" published from 1940 to 1994.
This item contains no files.
Publicly Available
Most Viewed Items
Most popular in the past week
lexicalConceptualResource

Description:
Tezaurs.lv is the largest open machine-readable dictionary for Latvian. This version contains nearly 390,000 entries compiled from more than 330 sources. The dictionary is enriched with phonetic, morphological, semantic ...
This item contains 1 file (24.76 MB).
Publicly Available
corpus

Description:
A text corpus of orthographic transcriptions from a Latvian medical speech corpus. It consists of 900 transcripts (documents) from a ~35-hour radiology speech corpus. Modalities covered: CT, MR, MG, CR, US.
This item contains 1 file (267.18 KB).
Publicly Available
toolService

Description:
LVBERT is the first publicly available monolingual BERT language model pre-trained for Latvian. For training, we used the original TensorFlow implementation of BERT with whole-word masking and the next sentence ...
This item contains 3 files (1.51 GB).
Publicly Available