| dc.contributor.author | Grasmanis, Mikus |
| dc.contributor.author | Valkovska, Baiba |
| dc.contributor.author | Levāne-Petrova, Kristīne |
| dc.date.accessioned | 2025-12-19T13:53:53Z |
| dc.date.available | 2025-12-19T13:53:53Z |
| dc.date.issued | 2025-12-19 |
| dc.identifier.uri | http://hdl.handle.net/20.500.12574/148 |
| dc.description | This frequency list contains the 25,000 most frequent Latvian lemmas, obtained from 18 morphologically annotated corpora totalling 1.5 billion tokens from the Latvian National Corpora Collection (Korpuss.lv) and Tēzaurs.lv. Supporting academic and practical applications, including language teaching, machine translation, and speech technologies, the list provides a broader and more representative view of the modern Latvian lexicon and usage trends. |
| dc.language.iso | lav |
| dc.publisher | AiLab IMCS UL |
| dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
| dc.rights.uri | http://creativecommons.org/licenses/by-sa/4.0/ |
| dc.rights.label | PUB |
| dc.subject | frequency data |
| dc.title | Latvian word frequency dataset |
| dc.type | lexicalConceptualResource |
| metashare.ResourceInfo#ContentInfo.detailedType | wordList |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
| has.files | yes |
| branding | CLARIN Centre of Latvian language resources and tools |
| contact.person | Normunds Grūzītis normundsg@ailab.lv IMCS UL |
| sponsor | Ministry of Education and Science VPP-IZM-DH-2022/1-0002 Towards Development of Open and FAIR Digital Humanities Ecosystem in Latvia (DHELI) nationalFunds |
| sponsor | EU Recovery and Resilience Facility 2.3.1.1.i.0/1/22/I/CFLA/002 Language Technology Initiative euFunds |
| size.info | 25000 words |
| files.size | 527519 |
| files.count | 1 |
Files in this item
This item is publicly available and licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
- Name: frequencies-lv-25K.tsv
- Size: 515.16 KB
- Format: Unknown
- Description: Word frequencies per mille in TSV format (lemma, part of speech, and frequency). Parts of speech are annotated according to the Latvian tagset: https://korpuss.lv/static/media/LV_TagSet_v.2.2.4_20250301.pdf
- MD5: d1e8e4b6d75e74c02a03280ca603c336
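
The file description above names three tab-separated fields (lemma, part of speech, frequency per mille) and an MD5 checksum. A minimal Python sketch of how a downloaded copy might be verified and read follows. It rests on assumptions that the record does not confirm: that the file is UTF-8 encoded, that the columns appear in the listed order, that frequencies use a decimal point, and that any header row can be recognised by a non-numeric third column. The names `Entry`, `md5_hex`, `parse_rows`, and `load` are hypothetical helpers introduced here for illustration.

```python
import csv
import hashlib
from typing import Iterable, Iterator, List, NamedTuple


class Entry(NamedTuple):
    """One row of the list (hypothetical container, not part of the dataset)."""
    lemma: str
    pos: str          # tag from the Latvian tagset referenced in the description
    per_mille: float  # frequency per mille, as stated in the description


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def parse_rows(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield entries from tab-separated lines (assumed order: lemma, POS, frequency)."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        try:
            freq = float(row[2])
        except ValueError:
            continue  # skip a header row, if the file has one (assumption)
        yield Entry(row[0], row[1], freq)


def load(path: str, expected_md5: str) -> List[Entry]:
    """Read the file, check it against the published MD5, and parse it."""
    with open(path, "rb") as fh:
        raw = fh.read()
    if md5_hex(raw) != expected_md5:
        raise ValueError("MD5 mismatch: the download may be corrupted")
    # UTF-8 is an assumption; the record lists the format as Unknown.
    return list(parse_rows(raw.decode("utf-8").splitlines()))
```

With the file in the working directory, `load("frequencies-lv-25K.tsv", "d1e8e4b6d75e74c02a03280ca603c336")` would return the parsed entries or raise if the checksum differs from the one listed above. Checking the digest before parsing keeps a truncated download from silently producing a shorter list.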