<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
<title>Language resources of the Faculty of Science and Technology, University of Latvia</title>
<link href="http://hdl.handle.net/20.500.12574/134" rel="alternate"/>
<subtitle/>
<id>http://hdl.handle.net/20.500.12574/134</id>
<updated>2026-04-15T03:00:28Z</updated>
<dc:date>2026-04-15T03:00:28Z</dc:date>
<entry>
<title>Embedding Model Fine-Tuning Dataset</title>
<link href="http://hdl.handle.net/20.500.12574/136" rel="alternate"/>
<author>
<name>Deksne, Daiga</name>
</author>
<id>http://hdl.handle.net/20.500.12574/136</id>
<updated>2025-09-16T14:56:25Z</updated>
<published>2025-08-27T00:00:00Z</published>
<summary type="text">Embedding Model Fine-Tuning Dataset
Deksne, Daiga
The Embedding Model Fine-Tuning Dataset was created within the framework of the State Research Programme project "Analysis of the Applicability of Artificial Intelligence Methods in the Field of European Union Fund Projects". For this project, we fine-tuned the bge-m3 embedding model developed by BAAI (Chen et al., 2024).&#13;
&#13;
For fine-tuning, we collected DOCX procurement documents from the Electronic Procurement System (https://www.eis.gov.lv/EKEIS/Supplier/), allocating 7,083 files to the training set and 50 files to the validation set. We extracted the text from these documents and split it into segments. For each segment, we used the OpenAI gpt-4o model to generate statements or questions whose correctness can be verified against that specific segment.&#13;
&#13;
License and Attribution&#13;
&#13;
The dataset is distributed under the CC BY-NC-SA 4.0 license: https://creativecommons.org/licenses/by-nc-sa/4.0/. When using this dataset, please cite it as:&#13;
&#13;
Project "Analysis of the Applicability of Artificial Intelligence Methods in the Field of European Union Fund Projects" (VPP-CFLA-Mākslīgais intelekts-2024/1-0003). Dataset for Embedding Model Fine-Tuning. Licensed under CC BY-NC-SA 4.0.
</summary>
<dc:date>2025-08-27T00:00:00Z</dc:date>
</entry>
<entry>
<title>Procurement Validation Dataset</title>
<link href="http://hdl.handle.net/20.500.12574/135" rel="alternate"/>
<author>
<name>Deksne, Daiga</name>
</author>
<author>
<name>Skadiņš, Raivis</name>
</author>
<author>
<name>Hohbergs, Andris</name>
</author>
<author>
<name>Jaunzars, Rūdolfs</name>
</author>
<author>
<name>Petrovs, Andrejs</name>
</author>
<author>
<name>Rūdule, Justīne</name>
</author>
<author>
<name>Pinnis, Mārcis</name>
</author>
<id>http://hdl.handle.net/20.500.12574/135</id>
<updated>2025-09-16T14:52:44Z</updated>
<published>2025-08-27T00:00:00Z</published>
<summary type="text">Procurement Validation Dataset
Deksne, Daiga; Skadiņš, Raivis; Hohbergs, Andris; Jaunzars, Rūdolfs; Petrovs, Andrejs; Rūdule, Justīne; Pinnis, Mārcis
The Procurement Validation Dataset was created within the framework of the State Research Programme project "Analysis of the Applicability of Artificial Intelligence Methods in the Field of European Union Fund Projects". &#13;
&#13;
The dataset consists of 30 procurement documents evaluated by experts of the Central Finance and Contracting Agency (CFCA). The procurement checklists prepared by the experts have been converted into machine-readable form. For each procurement, the dataset contains 168 questions about its compliance with legislation, each accompanied by an answer.&#13;
&#13;
The dataset is divided into two subsets: a development set (10 procurements) and an evaluation set (20 procurements). It consists of:&#13;
1) questions based on the checklist S.7.1.-PL-21 (09.12.2019 edition);  &#13;
2) a labeled dataset corresponding to 30 procurements evaluated by CFCA.&#13;
&#13;
The dataset is distributed under the CC BY-NC-SA 4.0 license: https://creativecommons.org/licenses/by-nc-sa/4.0/&#13;
When using this dataset, please cite it as:&#13;
&#13;
Project "Analysis of the Applicability of Artificial Intelligence Methods in the Field of European Union Fund Projects" (VPP-CFLA-Mākslīgais intelekts-2024/1-0003). Procurement Validation Dataset. Licensed under CC BY-NC-SA 4.0.
</summary>
<dc:date>2025-08-27T00:00:00Z</dc:date>
</entry>
</feed>
