Files in this item

All files in this item: 11.8 MB
This item is publicly available and is licensed under:
Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Name: lv-amr-train.txt
Size: 9.14 MB
Format: Text file
Description: Training data
MD5: 1cfff686c00b5cae11340398518048e7
File preview
# ::id a-n136-p21s1
# ::snt -LRB- 10 -RRB- Directive 80/987/EEC should be amended accordingly .
# ::snt_lv (10) Attiecīgi ir jāgroza Direktīva 80/987/EEK.
# ::alignments 0-1|0.4 1-2|0.3 2-3|0.2 3-4|0.1.0 4-5|0.1 5-6|0 7-8|0.0 8-9|0.0.0 
# ::alignments_translation 0-0 1-1 2-2 6-3 7-4 4-5 5-6 5-7 3-8 8-9
(v5 / recommend-01
    :ARG1 (v6 / amend-01
        :manner (v7 / accordingly))
    :time (v4 / genericconcept
        :mod (v3 / directive))
    :ARG1 (v2 / -rrb-)
    :li 10
    :condition (v1 / -lrb-))

# ::id a-p5878-p1s1
# ::snt The family of David and Victoria Beckham live in Los Angeles , but it looks like they could move to England soon .
# ::snt_lv Deivida un Viktorijas Bekhemu ģimene mitinās Losandželosā, bet izskatās, ka drīzumā viņi varētu pārvākties uz Angliju.
# ::alignments 1-2|0.3 3-4|0.2.0+0.2.0.0+0.2.0.0.0+0.2.0.1 4-5|0.2 5-7|0.0.0+0.0.0.0+0.0.0.0.0+0.0.0.0.1 7-8|0.0 9-11|0.0.1+0.0.1.0+0.0.1.0.0+0.0.1.0.1+0.0.1.1 12-13|0 14-15|0.1 15-16|0.1.0 17-18|0.1.0.0 18-19|0.1.0.0 . . .
                                            
Name: lv-amr-dev.txt
Size: 1.43 MB
Format: Text file
Description: Development data
MD5: b0676cffe4f613578c91a3b33f8d70bc
File preview
# ::id a-p3754-p5s1
# ::snt The company started the project in 2001 .
# ::snt_lv Uzņēmums projekta realizāciju sāka 2001.gadā.
# ::alignments 1-2|0.0 2-3|0 4-5|0.1 6-7|0.1.0+0.1.0.0 
# ::alignments_translation 0-0 0-1 3-2 2-3 1-4 5-5 4-6 6-6 7-7
(v2 / start-01
    :ARG0 (v1 / company)
    :ARG1 (v3 / project
        :time (v4 / date-entity
            :year 2001)))

# ::id a-p3754-p5s2
# ::snt Since then , more than seven million lats have been invested in landfill development .
# ::snt_lv Kopš tā laika poligona attīstībā ieguldīti vairāk nekā septiņi miljoni latu.
# ::alignments 0-1|0.1 3-4|0.0.0 7-8|0.0 10-11|0 12-13|0.2.0 13-14|0.2 
# ::alignments_translation 0-0 1-1 2-1 6-2 6-3 7-4 8-5 9-6 10-7 5-8 5-9 5-10 4-11 3-12 4-13 11-14
(v4 / invest-01
    :ARG1 (v3 / lat
        :quant (v2 / more))
    :op1 (v1 / since)
    :ARG2 (v6 / develop-02
        :ARG1 (v5 / landfill)
        :ARG0 v3))

# ::id a-d62-p38s1
# ::snt `` We wo n't be let in there , we do n't have such a fine key ! ''
# . . .
                                            
Name: lv-amr-test.txt
Size: 1.22 MB
Format: Text file
Description: Test data
MD5: 118bfbf48eb1d1536091a94c0cb790e0
File preview
# ::id a-p3753-p48s1
# ::snt FOR INFORMATION
# ::snt_lv INFORMĀCIJAI
# ::alignments 1-2|0 
# ::alignments_translation 0-0 0-1
(v1 / information)

# ::id a-p15196-p5s1
# ::snt However , he summarized that unpopular decisions in the education sector would have the `` back '' of the Prime Minister if they were rational .
# ::snt_lv Tomēr viņš rezumēja, ka nepopulāriem lēmumiem izglītības nozarē būs premjera "aizmugure", ja tie būs racionāli.
# ::alignments 3-4|0 5-6|0.0.0+0.0.0.0+0.0.0.0.0 6-7|0.0 9-10|0.0.1.0 10-11|0.0.1 14-15|0.0.2 15-16|0.0.2.0 16-17|0.0.3 19-20|0.0.3.0.2 20-21|0.0.3.0+0.0.3.0.0 24-25|0.0.3.0.1 
# ::alignments_translation 0-0 1-1 1-2 2-3 3-4 4-4 5-5 6-6 8-7 7-8 7-9 8-10 9-11 9-12 10-13 11-14 12-15 13-16 10-17 10-18 10-19 10-20 14-21 15-21 16-22 17-23 18-24 19-25
(v1 / summarize-01
    :ARG1 (v4 / decide-01
        :ARG1 (v2 / thing
            :degree-of (v3 / popular-02
                :polarity -))
        :location (v6 / sector
            :mod (v5 / educate-01)) . . .
                                            
Name: README.txt
Size: 1.64 KB
Format: Text file
Description: Data format description of the automatically derived AMR annotation layer of the FullStack-LV multi-layer text corpus. See also: https://github.com/LUMII-AILab/FullStack/tree/master/AMR.
MD5: ab87e5b03a6721f7e6e82beffb577290
File preview
DATA FORMAT

METADATA

id - the sentence ID according to the Latvian UD Treebank data.
snt_lv - the original sentence (Latvian).
snt - the machine-translated sentence (English).
alignments_translation - token-level alignment between the Latvian and English sentences.
alignments - alignment between tokens (English) and nodes of the AMR graph (see the alignment format: https://github.com/lil-lab/amr/blob/07b581e7d160fa2625eeefa86ae5e9fe5c589be2/utils/jamr/docs/Alignment_Format.md).
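The two alignment fields above can be read with a few lines of Python. This is a minimal sketch, not part of the dataset tooling: the function names are hypothetical, and treating each span as a half-open token range [start, end) is an assumption inferred from the file previews and the linked alignment format. The README does not say which side of an alignments_translation pair is the Latvian index, so the sketch returns plain integer pairs without naming the sides.

```python
def parse_amr_alignments(value):
    """Parse an 'alignments' value into (start, end, [node addresses]) tuples.

    Each item looks like '6-7|0.1.0+0.1.0.0': a token span, then one or more
    '+'-separated node addresses in the AMR graph.
    """
    spans = []
    for item in value.split():
        span, _, nodes = item.partition("|")
        start, _, end = span.partition("-")
        spans.append((int(start), int(end), nodes.split("+")))
    return spans


def parse_translation_alignments(value):
    """Parse an 'alignments_translation' value into (left, right) index pairs."""
    pairs = []
    for item in value.split():
        left, _, right = item.partition("-")
        pairs.append((int(left), int(right)))
    return pairs


# Values taken from the a-p3754-p5s1 record in the development preview.
tokens = "The company started the project in 2001 .".split()
amr = parse_amr_alignments("1-2|0.0 2-3|0 4-5|0.1 6-7|0.1.0+0.1.0.0")
pairs = parse_translation_alignments("0-0 0-1 3-2 2-3 1-4 5-5 4-6 6-6 7-7")

for start, end, nodes in amr:
    # Assumption: spans are half-open ranges over the English tokens.
    print(tokens[start:end], nodes)
```

The page previews are truncated with ". . .", which these functions would reject; they are meant for values read from the full files.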

AMR GRAPH

The English translation encoded as an AMR graph in the PENMAN notation (see the language specification: https://amr.isi.edu/language.html).
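To give a feel for the notation, the sketch below pulls the (variable, concept) pairs out of a PENMAN string with a regular expression. It is only an illustration under simplifying assumptions: the names INSTANCE and list_instances are hypothetical, the pattern is not a full PENMAN parser, and it would be confused by parentheses or slashes inside quoted string literals. A dedicated PENMAN parser is the safer choice for real processing.

```python
import re

# Matches '(variable / concept' at the start of each node.
INSTANCE = re.compile(r"\(\s*([^\s()/]+)\s*/\s*([^\s()]+)")


def list_instances(penman_text):
    """Return (variable, concept) pairs in the order they appear in the text."""
    return INSTANCE.findall(penman_text)


# Graph taken from the a-p3754-p5s1 record in the development preview.
graph = """(v2 / start-01
    :ARG0 (v1 / company)
    :ARG1 (v3 / project
        :time (v4 / date-entity
            :year 2001)))"""

found = list_instances(graph)
print(found)
```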

Each record in the dataset is separated by an empty line. The dataset is split into subsets for training, development and testing using the same split as for the Latvian UD Treebank (v2.5).
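The record layout described above (metadata lines starting with "# ::", then the graph, then an empty line) can be read roughly as follows. This is a sketch under stated assumptions: read_records is a hypothetical helper, and it assumes every metadata line has the form "# ::key value" as in the previews.

```python
import re


def read_records(text):
    """Split dataset text into records with a metadata dict and a graph string."""
    records = []
    # Records are separated by an empty line (possibly containing whitespace).
    for block in re.split(r"\n\s*\n", text.strip()):
        meta, graph_lines = {}, []
        for line in block.splitlines():
            if line.startswith("# ::"):
                key, _, value = line[4:].partition(" ")
                meta[key] = value.strip()
            elif line.startswith("#"):
                continue  # any other comment line is skipped
            elif line.strip():
                graph_lines.append(line)
        records.append({"meta": meta, "graph": "\n".join(graph_lines)})
    return records


# Two records copied from the test and development previews.
sample = """# ::id a-p3753-p48s1
# ::snt FOR INFORMATION
# ::snt_lv INFORMĀCIJAI
# ::alignments 1-2|0
# ::alignments_translation 0-0 0-1
(v1 / information)

# ::id a-p3754-p5s1
# ::snt The company started the project in 2001 .
# ::snt_lv Uzņēmums projekta realizāciju sāka 2001.gadā.
# ::alignments 1-2|0.0 2-3|0 4-5|0.1 6-7|0.1.0+0.1.0.0
# ::alignments_translation 0-0 0-1 3-2 2-3 1-4 5-5 4-6 6-6 7-7
(v2 / start-01
    :ARG0 (v1 / company)
    :ARG1 (v3 / project
        :time (v4 / date-entity
            :year 2001)))
"""

records = read_records(sample)
for record in records:
    print(record["meta"]["id"], "->", record["graph"].splitlines()[0])
```

For the real files, the same function could be applied to the contents of lv-amr-train.txt, lv-amr-dev.txt, or lv-amr-test.txt opened with UTF-8 encoding.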

EXAMPLE

# ::id a-p3315-p3s6
# ::snt Sister Ilze is also a model .
# ::snt_lv Māsa Ilze arī ir modelētāja.
# ::alignments 0-1|0.2+0.2.0+0.2.0.0 1-2|0 . . .