
 
dc.contributor.author Levāne-Petrova, Kristīne
dc.contributor.author Darģis, Roberts
dc.contributor.author Pokratniece, Kristīne
dc.contributor.author Lasmanis, Viesturs Jūlijs
dc.date.accessioned 2023-04-06T09:09:33Z
dc.date.available 2023-04-06T09:09:33Z
dc.date.issued 2023
dc.identifier.uri http://hdl.handle.net/20.500.12574/84
dc.description The Balanced Corpus of Modern Latvian contains unique texts not included in the previously developed balanced corpora (LVK2013 and LVK2018). The corpus is primarily based on the design principles of these earlier balanced corpora. It contains authentic contemporary texts (mostly created after 2000) of various genres, with metadata. Unlike its predecessors, this balanced corpus contains translations as well as texts in the original language. To select web texts for the corpus, all current pages from a given domain were first collected, and the content relevant to the corpus was extracted. In the next processing step, the text was divided into paragraphs, and duplicate paragraphs and paragraphs irrelevant to the corpus (texts in foreign languages, tables, etc.) were deleted. To comply with contractual obligations to publishing companies, the paragraphs in some fiction documents have been rearranged in alphabetical order. The balanced corpus was compiled from the processed documents in the following genre proportions: journalism (60%), fiction (10%), scientific texts (10%), Wikipedia (7%), legal texts (7%), parliamentary transcripts (3%), and subtitles (3%).
dc.language.iso lav
dc.publisher AiLab IMCS UL
dc.source.uri https://korpuss.lv/id/LVK2022
dc.subject text
dc.subject general
dc.subject representative
dc.subject morphology
dc.title Balanced Corpus of Modern Latvian (LVK2022)
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files no
branding CLARIN Centre of Latvian language resources and tools
demo.uri https://nosketch.korpuss.lv/#dashboard?corpname=LVK2022
contact.person Kristīne Levāne-Petrova kristine.levanepetrova@lumii.lv AiLab IMCS UL
sponsor Latvian Language Agency, grant agreement No. 4.6/2019-029, "Enlargement and Development of the Latvian National Text Corpus" (nationalFunds)
size.info 122877749 tokens
files.size 0
files.count 0
featuredService.nosketch search|https://nosketch.korpuss.lv/#dashboard?corpname=LVK2022
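The processing steps listed in dc.description (paragraph splitting, removal of duplicate and irrelevant paragraphs, alphabetical reordering of some fiction documents, and genre proportions) can be illustrated with a minimal Python sketch. It is not the AiLab IMCS UL pipeline: the blank-line paragraph delimiter, the is_relevant heuristic, and all function names are hypothetical, and genre_quotas assumes the genre percentages apply to token counts, which the record does not state.

```python
# Genre proportions as listed in dc.description (percent of the corpus).
GENRE_SHARES = {
    "journalism": 60,
    "fiction": 10,
    "scientific": 10,
    "Wikipedia": 7,
    "legal": 7,
    "parliamentary transcripts": 3,
    "subtitles": 3,
}


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines (an assumed paragraph delimiter) and trim whitespace."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def is_relevant(paragraph: str) -> bool:
    """Hypothetical filter standing in for the foreign-language and table
    detection the description mentions; it only drops paragraphs that look
    like table rows (containing tab or pipe characters)."""
    return "\t" not in paragraph and "|" not in paragraph


def clean_document(text: str, seen: set[str]) -> list[str]:
    """Keep relevant paragraphs that have not already been seen in the corpus.

    `seen` is shared across documents, so duplicates are removed corpus-wide.
    """
    kept = []
    for p in split_paragraphs(text):
        if p in seen or not is_relevant(p):
            continue
        seen.add(p)
        kept.append(p)
    return kept


def reorder_alphabetically(paragraphs: list[str]) -> list[str]:
    """Sort paragraphs alphabetically, as done for some fiction documents.

    Uses Python's default code-point ordering, not Latvian collation.
    """
    return sorted(paragraphs)


def genre_quotas(total_tokens: int) -> dict[str, int]:
    """Illustrative token budget per genre (floor of each share)."""
    assert sum(GENRE_SHARES.values()) == 100
    return {g: total_tokens * share // 100 for g, share in GENRE_SHARES.items()}


if __name__ == "__main__":
    seen: set[str] = set()
    doc = "Pirmā rindkopa.\n\nOtrā rindkopa.\n\nPirmā rindkopa.\n\na | b | c"
    print(reorder_alphabetically(clean_document(doc, seen)))
    # → ['Otrā rindkopa.', 'Pirmā rindkopa.']
    print(genre_quotas(122877749)["journalism"])
    # → 73726649
```

The duplicate "Pirmā rindkopa." and the table-like row are dropped before reordering. The quota figures derived from the size.info token count are an arithmetic illustration under the token-share assumption, not the corpus's published per-genre counts.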

