dc.contributor.author Grasmanis, Mikus
dc.contributor.author Valkovska, Baiba
dc.contributor.author Levāne-Petrova, Kristīne
dc.date.accessioned 2025-12-19T13:53:53Z
dc.date.available 2025-12-19T13:53:53Z
dc.date.issued 2025-12-19
dc.identifier.uri http://hdl.handle.net/20.500.12574/148
dc.description This frequency list contains the 25,000 most frequent Latvian lemmas, obtained from 18 morphologically annotated corpora totalling 1.5 billion tokens from the Latvian National Corpora Collection (Korpuss.lv) and Tēzaurs.lv. Supporting academic and practical applications, including language teaching, machine translation, and speech technologies, the list provides a broader and more representative view of the modern Latvian lexicon and usage trends.
dc.language.iso lav
dc.publisher AiLab IMCS UL
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri http://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label PUB
dc.subject frequency data
dc.title Latvian word frequency dataset
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN Centre of Latvian language resources and tools
contact.person Normunds Grūzītis normundsg@ailab.lv IMCS UL
sponsor Ministry of Education and Science VPP-IZM-DH-2022/1-0002 Towards Development of Open and FAIR Digital Humanities Ecosystem in Latvia (DHELI) nationalFunds
sponsor EU Recovery and Resilience Facility 2.3.1.1.i.0/1/22/I/CFLA/002 Language Technology Initiative euFunds
size.info 25000 words
files.size 527519
files.count 1


 Files in this item

This item is publicly available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name frequencies-lv-25K.tsv
Size 515.16 KB
Format Unknown
Description Word frequencies per mille in TSV format (lemma, part of speech, and frequency). Part of speech (POS) is annotated according to the Latvian tagset: https://korpuss.lv/static/media/LV_TagSet_v.2.2.4_20250301.pdf.
MD5 d1e8e4b6d75e74c02a03280ca603c336
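
The file details above (a three-column TSV and an MD5 checksum) suggest a simple way to check and read a downloaded copy. The following is a minimal sketch, not an official loader. It assumes the columns appear in the order given in the description (lemma, part of speech, frequency) and that the file is UTF-8 encoded. Whether the file has a header row and which decimal separator it uses are not stated in the record, so the parser skips any row whose third column is not numeric and accepts either a dot or a comma. The function names are illustrative.

```python
import csv
import hashlib
import io

# MD5 checksum as listed in the item record.
EXPECTED_MD5 = "d1e8e4b6d75e74c02a03280ca603c336"


def md5_of_bytes(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def parse_frequencies(text: str):
    """Parse TSV text into (lemma, pos, frequency_per_mille) tuples.

    Assumption: columns are lemma, POS tag, frequency, in that order.
    Rows with fewer than three columns or a non-numeric third column
    (for example a possible header row) are skipped.
    """
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) < 3:
            continue
        lemma, pos, raw = fields[0], fields[1], fields[2]
        try:
            # Assumption: the decimal separator may be "." or ",".
            freq = float(raw.replace(",", "."))
        except ValueError:
            continue
        rows.append((lemma, pos, freq))
    return rows


def load_verified(path: str):
    """Read the file, check it against EXPECTED_MD5, and parse it."""
    with open(path, "rb") as handle:
        data = handle.read()
    if md5_of_bytes(data) != EXPECTED_MD5:
        raise ValueError("MD5 mismatch: the file differs from the published one")
    return parse_frequencies(data.decode("utf-8"))
```

With the file downloaded to the working directory, `load_verified("frequencies-lv-25K.tsv")` would return the parsed rows, or raise an error if the checksum does not match the one listed in this record.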