| dc.contributor.author | Štekeļs, Jorens |
| dc.date.accessioned | 2026-05-12T12:35:02Z |
| dc.date.available | 2026-05-12T12:35:02Z |
| dc.date.issued | 2026-05-11 |
| dc.identifier.uri | http://hdl.handle.net/20.500.12574/158 |
| dc.description | ConLoan-LV is a multi-purpose contrastive dataset designed for the classification and analysis of Latvian language loanwords, code-switching, and named entities. Replicating and extending the ConLoan methodology, the dataset contains 353 manually validated sentences in the baseline version and 676 in the extended version, with all sentences sourced from the LVK2022 corpus. Each entry is labeled for material borrowings (LOAN); the extended version adds labels for code-switching (CS) and named entities (NE). The dataset also includes native-language semantic equivalents for the loanwords, together with English translations, providing a parallel structure for comparative analysis. This resource is intended for training and benchmarking language models in identifying non-native lexical elements in Latvian texts. |
| dc.language.iso | lav |
| dc.publisher | University of Latvia |
| dc.relation.isreferencedby | https://doi.org/10.18653/v1/2025.acl-long.1453 |
| dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ |
| dc.rights.label | PUB |
| dc.source.uri | https://github.com/jorenchik/conloan-tools |
| dc.subject | loanwords |
| dc.subject | named entities |
| dc.subject | code-switching |
| dc.title | ConLoan-LV: A Contrastive Dataset for Latvian Language Loanwords, Code-switching, and Named Entities |
| dc.type | corpus |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
| has.files | yes |
| branding | CLARIN Centre of Latvian language resources and tools |
| contact.person | Jorens Štekeļs (js18194@edu.lu.lv), University of Latvia |
| size.info | 676 sentences |
| files.size | 2043009 |
| files.count | 3 |
Files in this item
This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
| Name | Size | Format | Description | MD5 |
| --- | --- | --- | --- | --- |
| Latvian.json | 714.65 KB | Unknown | Baseline dataset | b92804615ea9193b080f9a10bb468578 |
| Latvian_ext.json | 1.25 MB | Unknown | Extended dataset | c6619713238b37f4fe87a02754a48a4a |
| README.md | 2.35 KB | Unknown | Dataset description | 67fb2a6ad6d46ad9e139d4eecdc67832 |
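The MD5 values listed for the files in this item can be used to check that a download arrived intact. The following is a minimal sketch in Python, not part of the dataset or its tooling: the file names and checksums are copied from this record, while the helper names `md5_of` and `verify_download` and the idea of a single local download directory are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# File names and MD5 checksums as listed in this record.
EXPECTED_MD5 = {
    "Latvian.json": "b92804615ea9193b080f9a10bb468578",
    "Latvian_ext.json": "c6619713238b37f4fe87a02754a48a4a",
    "README.md": "67fb2a6ad6d46ad9e139d4eecdc67832",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(directory, expected=EXPECTED_MD5):
    """Map each expected file name to True (checksum matches),
    False (checksum differs), or None (file not found).

    `directory` is a hypothetical local folder holding the downloaded files.
    """
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = md5_of(path) == checksum if path.is_file() else None
    return results
```

Calling `verify_download("path/to/download")` on the folder that holds the three files returns one entry per file; any `False` or `None` value suggests the file should be downloaded again.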