| dc.contributor.author | Grasmanis, Mikus |
| dc.contributor.author | Valkovska, Baiba |
| dc.contributor.author | Levāne-Petrova, Kristīne |
| dc.date.accessioned | 2025-12-19T13:53:53Z |
| dc.date.available | 2025-12-19T13:53:53Z |
| dc.date.issued | 2025-12-19 |
| dc.identifier.uri | http://hdl.handle.net/20.500.12574/148 |
| dc.description | This frequency list contains the 25,000 most frequent Latvian lemmas, obtained from 18 morphologically annotated corpora totalling 1.5 billion tokens from the Latvian National Corpora Collection (Korpuss.lv) and Tēzaurs.lv. Supporting academic and practical applications, including language teaching, machine translation, and speech technologies, the list provides a broader and more representative view of the modern Latvian lexicon and usage trends. |
| dc.language.iso | lav |
| dc.publisher | AiLab IMCS UL |
| dc.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
| dc.rights.uri | http://creativecommons.org/licenses/by-sa/4.0/ |
| dc.rights.label | PUB |
| dc.subject | frequency data |
| dc.title | Latvian word frequency dataset |
| dc.type | lexicalConceptualResource |
| metashare.ResourceInfo#ContentInfo.detailedType | wordList |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
| has.files | yes |
| branding | CLARIN Centre of Latvian language resources and tools |
| contact.person | Normunds Grūzītis normundsg@ailab.lv IMCS UL |
| sponsor | Ministry of Education and Science VPP-IZM-DH-2022/1-0002 Towards Development of Open and FAIR Digital Humanities Ecosystem in Latvia (DHELI) nationalFunds |
| sponsor | EU Recovery and Resilience Facility 2.3.1.1.i.0/1/22/I/CFLA/002 Language Technology Initiative euFunds |
| size.info | 25000 words |
| files.size | 527519 |
| files.count | 1 |
Files in this item
This item is Publicly Available and is licensed under: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
- Name
- frequencies-lv-25K.tsv
- Size
- 515.16 KB
- Format
- Unknown
- Description
- Word frequencies per mille in TSV format (lemma, part of speech, and frequency). POS is annotated according to the Latvian tagset: https://korpuss.lv/static/media/LV_TagSet_v.2.2.4_20250301.pdf.
- MD5
- d1e8e4b6d75e74c02a03280ca603c336