Show simple item record

 
dc.contributor.author Štekeļs, Jorens
dc.date.accessioned 2026-05-12T12:35:02Z
dc.date.available 2026-05-12T12:35:02Z
dc.date.issued 2026-05-11
dc.identifier.uri http://hdl.handle.net/20.500.12574/158
dc.description ConLoan-LV is a multi-purpose contrastive dataset designed for the classification and analysis of Latvian language loanwords, code-switching, and named entities. Replicating and extending the ConLoan methodology, the dataset contains 353 manually validated sentences in the baseline version and 676 in the extended version, with all sentences sourced from the LVK2022 corpus. Each entry is enriched with labels for material borrowings (LOAN), while the extended version adds labels for code-switching (CS) and named entities (NE). Furthermore, the dataset includes native-language semantic equivalents for loanwords and English translations, providing a parallel structure for comparative analysis. This resource is intended for training and benchmarking language models in identifying non-native lexical elements within Latvian language texts.
dc.language.iso lav
dc.publisher University of Latvia
dc.relation.isreferencedby https://doi.org/10.18653/v1/2025.acl-long.1453
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/jorenchik/conloan-tools
dc.subject loanwords
dc.subject named entities
dc.subject code-switching
dc.title ConLoan-LV: A Contrastive Dataset for Latvian Language Loanwords, Code-switching, and Named Entities
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN Centre of Latvian language resources and tools
contact.person Jorens Štekeļs js18194@edu.lu.lv University of Latvia
size.info 676 sentences
files.size 2043009
files.count 3


 Files in this item

This item is Publicly Available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name: Latvian.json
Size: 714.65 KB
Format: Unknown
Description: Baseline dataset
MD5: b92804615ea9193b080f9a10bb468578

Name: Latvian_ext.json
Size: 1.25 MB
Format: Unknown
Description: Extended dataset
MD5: c6619713238b37f4fe87a02754a48a4a

Name: README.md
Size: 2.35 KB
Format: Unknown
Description: Dataset description
MD5: 67fb2a6ad6d46ad9e139d4eecdc67832
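
The MD5 checksums listed for the three files can be used to confirm that a download arrived intact. The following is a minimal sketch, not part of the dataset's own tooling: the file names and checksums are copied from this record, while the helper names `md5_of_file` and `verify_downloads` and the assumption that the files sit in one local directory are illustrative only.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in this record for each file.
EXPECTED_MD5 = {
    "Latvian.json": "b92804615ea9193b080f9a10bb468578",
    "Latvian_ext.json": "c6619713238b37f4fe87a02754a48a4a",
    "README.md": "67fb2a6ad6d46ad9e139d4eecdc67832",
}


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until an empty bytes object signals EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True if it exists and its MD5 matches.

    The directory layout is an assumption: all three files are expected
    to be saved under `directory` with their original names.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'MISSING OR MISMATCH'}")
```

Run from the directory that holds the downloaded files; a file reported as `MISSING OR MISMATCH` is either absent or differs from the copy described in this record.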
