Language resources of the Faculty of Science and Technology, University of Latvia

Collection: http://hdl.handle.net/20.500.12574/134 (updated 2026-07-26T01:25:40Z)

Embedding Model Fine-Tuning Dataset
Deksne, Daiga
http://hdl.handle.net/20.500.12574/136 (issued 2025-08-27; updated 2025-09-16T14:56:25Z)

The Dataset for Embedding Model Fine-Tuning was created within the framework of the State Research Programme project "Analysis of the Applicability of Artificial Intelligence Methods in the Field of European Union Fund Projects". For this project, we fine-tuned the bge-m3 embedding model developed by BAAI (Chen et al., 2024). To build the fine-tuning data, we collected procurement documents in DOCX format from the Electronic Procurement System (https://www.eis.gov.lv/EKEIS/Supplier/) and allocated 7,083 files to the training set and 50 files to the validation set. The text of these documents was extracted and split into segments. For each segment, we used the OpenAI gpt-4o model to generate statements or questions whose correctness can be verified against that specific segment.

License and Attribution
The dataset is distributed under the CC BY-NC-SA 4.0 license: https://creativecommons.org/licenses/by-nc-sa/4.0/. When using this dataset, please cite it as: Project "Analysis of the Applicability of Artificial Intelligence Methods in the Field of European Union Fund Projects" (VPP-CFLA-Mākslīgais intelekts-2024/1-0003). Dataset for Embedding Model Fine-Tuning. Licensed under CC BY-NC-SA 4.0.
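The description does not state how the released statement/segment pairs are stored. The following is a minimal sketch, assuming each generated statement or question is kept together with the segment it was generated from. It shows how such pairs could become retrieval training records in the `{"query", "pos", "neg"}` JSON Lines layout documented for BAAI's FlagEmbedding fine-tuning scripts, and how files could be split at the document level, as the 7,083/50 file allocation suggests. The `Segment` class, its field names, the choice of negatives, and `split_by_document` are all hypothetical and illustrate the idea only. They are not the project's actual pipeline.

```python
import json
import random
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One text segment of a procurement document (hypothetical structure)."""
    doc_id: str
    text: str
    # Statements/questions generated for this segment (e.g. by gpt-4o).
    questions: list[str] = field(default_factory=list)


def split_by_document(doc_ids, n_val, seed=0):
    """Split at the document level so segments of one file never land in both sets."""
    ids = sorted(set(doc_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    return set(ids[n_val:]), set(ids[:n_val])


def to_training_records(segments):
    """Pair every generated question with the segment it was generated from.

    The source segment is the positive passage. Segments of *other* documents
    serve as negatives here purely for illustration (an assumption; the
    published dataset may contain no negatives at all).
    """
    records = []
    for seg in segments:
        negatives = [s.text for s in segments if s.doc_id != seg.doc_id]
        for question in seg.questions:
            records.append({"query": question, "pos": [seg.text], "neg": negatives})
    return records


def write_jsonl(records, path):
    """Write one JSON object per line; ensure_ascii=False keeps Latvian characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Made-up example segments, for illustration only.
    demo = [
        Segment("doc-1", "Tenders are submitted through the e-procurement system.",
                ["How are tenders submitted?"]),
        Segment("doc-2", "The contract term is 12 months.",
                ["The contract term is 12 months."]),
    ]
    for record in to_training_records(demo):
        print(json.dumps(record, ensure_ascii=False))
```

For instance, calling `split_by_document` with `n_val=50` on 7,133 distinct document IDs yields 7,083 training and 50 validation documents, matching the counts above. Splitting by file rather than by segment keeps near-duplicate text from one document out of both sets.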

Procurement Validation Dataset
Deksne, Daiga; Skadiņš, Raivis; Hohbergs, Andris; Jaunzars, Rūdolfs; Petrovs, Andrejs; Rūdule, Justīne; Pinnis, Mārcis
http://hdl.handle.net/20.500.12574/135 (issued 2025-08-27; updated 2025-09-16T14:52:44Z)

The Procurement Validation Dataset was created within the framework of the State Research Programme project "Analysis of the Applicability of Artificial Intelligence Methods in the Field of European Union Fund Projects". It covers 30 procurements whose documents were evaluated by experts of the CFCA (Central Finance and Contracting Agency). The procurement checklists prepared by these experts have been converted into machine-readable form: for each procurement, 168 questions are asked about its compliance with legislation, and each question is paired with an answer. The dataset is divided into two subsets: a development set (10 procurements) and an evaluation set (20 procurements). It contains: 1) questions based on checklist S.7.1.-PL-21 (edition of 09.12.2019); 2) a labeled dataset covering the 30 procurements evaluated by the CFCA.

License and Attribution
The dataset is distributed under the CC BY-NC-SA 4.0 license: https://creativecommons.org/licenses/by-nc-sa/4.0/. When using this dataset, please cite it as: Project "Analysis of the Applicability of Artificial Intelligence Methods in the Field of European Union Fund Projects" (VPP-CFLA-Mākslīgais intelekts-2024/1-0003). Procurement Validation Dataset. Licensed under CC BY-NC-SA 4.0.
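With 168 questions per procurement, the 30 procurements yield 30 × 168 = 5,040 question-answer pairs. That is 1,680 in the development subset (10 procurements) and 3,360 in the evaluation subset (20 procurements). The description does not specify how the labeled answers are encoded or how systems should be scored. The sketch below assumes gold and predicted answers are keyed by a (procurement ID, question ID) pair and compared as short strings, and shows how a system could be scored on either subset. The key layout, the answer strings, and the case-insensitive matching rule are assumptions, not part of the dataset.

```python
from collections import defaultdict

QUESTIONS_PER_PROCUREMENT = 168  # stated in the dataset description


def expected_pairs(n_procurements):
    """Number of (procurement, question) pairs a subset should contain."""
    return n_procurements * QUESTIONS_PER_PROCUREMENT


def _norm(answer):
    """Hypothetical matching rule: ignore surrounding whitespace and letter case."""
    return answer.strip().lower() if answer is not None else None


def accuracy(gold, predicted):
    """Share of gold (procurement_id, question_id) keys answered correctly.

    Missing predictions count as wrong.
    """
    if not gold:
        raise ValueError("gold answers are empty")
    correct = sum(_norm(predicted.get(key)) == _norm(ans) for key, ans in gold.items())
    return correct / len(gold)


def per_question_accuracy(gold, predicted):
    """Accuracy of each checklist question across all procurements in the subset."""
    hits, totals = defaultdict(int), defaultdict(int)
    for (proc_id, q_id), ans in gold.items():
        totals[q_id] += 1
        hits[q_id] += _norm(predicted.get((proc_id, q_id))) == _norm(ans)
    return {q_id: hits[q_id] / totals[q_id] for q_id in totals}
```

Reporting `accuracy` separately for the development and evaluation subsets would keep the 20 evaluation procurements unseen while prompts or models are tuned on the other 10. `per_question_accuracy` can reveal which checklist questions are systematically hard.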
