Simple item record

| Field | Value |
|---|---|
| dc.contributor.author | Štekeļs, Jorens |
| dc.date.accessioned | 2026-05-12T12:35:02Z |
| dc.date.available | 2026-05-12T12:35:02Z |
| dc.date.issued | 2026-05-11 |
| dc.identifier.uri | http://hdl.handle.net/20.500.12574/158 |
| dc.description | ConLoan-LV is a multi-purpose contrastive dataset designed for the classification and analysis of loanwords, code-switching, and named entities in Latvian. Replicating and extending the ConLoan methodology, the dataset contains 353 manually validated sentences in the baseline version and 676 in the extended version, all sourced from the LVK2022 corpus. Each entry is labeled for material borrowings (LOAN), and the extended version adds labels for code-switching (CS) and named entities (NE). The dataset also includes native-language semantic equivalents for the loanwords and English translations, providing a parallel structure for comparative analysis. The resource is intended for training and benchmarking language models on identifying non-native lexical elements in Latvian texts. |
| dc.language.iso | lav |
| dc.publisher | University of Latvia |
| dc.relation.isreferencedby | https://doi.org/10.18653/v1/2025.acl-long.1453 |
| dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ |
| dc.rights.label | PUB |
| dc.source.uri | https://github.com/jorenchik/conloan-tools |
| dc.subject | loanwords |
| dc.subject | named entities |
| dc.subject | code-switching |
| dc.title | ConLoan-LV: A Contrastive Dataset for Latvian Language Loanwords, Code-switching, and Named Entities |
| dc.type | corpus |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
| has.files | yes |
| branding | CLARIN Centre of Latvian language resources and tools |
| contact.person | Jorens Štekeļs js18194@edu.lu.lv University of Latvia |
| size.info | 676 sentences |
| files.size | 2043009 bytes |
| files.count | 3 |
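
The description gives 353 sentences for the baseline version and 676 for the extended version. The sketch below checks downloaded files against those counts. It relies on one assumption: the top-level JSON value in `Latvian.json` and `Latvian_ext.json` is either a list with one element per sentence or a dict keyed by sentence ID. The record does not state the schema, so consult the bundled README.md before relying on this. The helper name `count_entries` is hypothetical.

```python
import json
from pathlib import Path

# Sentence counts stated in the item description.
EXPECTED_COUNTS = {"Latvian.json": 353, "Latvian_ext.json": 676}


def count_entries(path):
    """Return the number of top-level entries in a JSON file.

    Assumption (not confirmed by the record): the top-level value is a list
    with one element per sentence, or a dict keyed by sentence ID.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, (list, dict)):
        return len(data)
    raise ValueError(f"unexpected top-level JSON type: {type(data).__name__}")


if __name__ == "__main__":
    # Only checks files that are present in the current directory.
    for name, expected in EXPECTED_COUNTS.items():
        if Path(name).exists():
            n = count_entries(name)
            status = "OK" if n == expected else "MISMATCH"
            print(f"{name}: {n} entries (expected {expected}) {status}")
```

A mismatch does not necessarily mean the download is corrupt. It may simply mean the schema differs from the assumption above, for example if entries are token-level or nested under a wrapper key.
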
Files in this item
Total size of all files: 1.95 MB.

This item is publicly available and is licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).

| Name | Size | Format | Description | MD5 |
|---|---|---|---|---|
| Latvian.json | 714.65 KB | Unknown | Baseline dataset | b92804615ea9193b080f9a10bb468578 |
| Latvian_ext.json | 1.25 MB | Unknown | Extended dataset | c6619713238b37f4fe87a02754a48a4a |
| README.md | 2.35 KB | Unknown | Dataset description | 67fb2a6ad6d46ad9e139d4eecdc67832 |
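
The listed MD5 checksums can be used to confirm that the downloaded files are intact. The sketch below assumes the three files were saved under their original names in one directory. It reads each file in chunks so that large files need not be held in memory. The helper names `md5_of` and `verify` are hypothetical and belong to this illustration only.

```python
import hashlib
from pathlib import Path

# MD5 checksums listed for the files in this item.
EXPECTED_MD5 = {
    "Latvian.json": "b92804615ea9193b080f9a10bb468578",
    "Latvian_ext.json": "c6619713238b37f4fe87a02754a48a4a",
    "README.md": "67fb2a6ad6d46ad9e139d4eecdc67832",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.exists() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "CHECKSUM MISMATCH", None: "missing"}
    for name, ok in verify().items():
        print(f"{name}: {labels[ok]}")
```

MD5 is adequate for detecting accidental corruption during download. It is not a defense against deliberate tampering.
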