Latvian AMR Sembank

Name: Latvian AMR Sembank
License: http://creativecommons.org/licenses/by-sa/4.0/

Znotiņš, Artūrs; Paikens, Pēteris; Grūzītis, Normunds

dc.contributor.author	Znotiņš, Artūrs
dc.contributor.author	Paikens, Pēteris
dc.contributor.author	Grūzītis, Normunds
dc.date.accessioned	2021-05-21T15:51:22Z
dc.date.available	2021-05-21T15:51:22Z
dc.date.issued	2020
dc.identifier.uri	http://hdl.handle.net/20.500.12574/40
dc.description	An automatically derived AMR annotation layer of the FullStack multi-layer text corpus of Latvian. First, the Latvian UD Treebank (v2.5) sentences were translated into English using a state-of-the-art Latvian-English neural MT system (Hugo.lv). Second, a state-of-the-art AMR parser for English (AMREager) was applied to the machine-translated sentences. Additionally, alignment information is provided for both the Latvian-English translations and the English-AMR parses.
dc.language.iso	eng
dc.language.iso	lav
dc.publisher	AiLab IMCS UL
dc.relation.isreferencedby	http://www.lrec-conf.org/proceedings/lrec2018/pdf/935.pdf
dc.rights	Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri	http://creativecommons.org/licenses/by-sa/4.0/
dc.rights.label	PUB
dc.source.uri	https://github.com/LUMII-AILab/FullStack
dc.subject	AMR
dc.subject	Abstract Meaning Representation
dc.title	Latvian AMR Sembank
dc.type	corpus
metashare.ResourceInfo#ContentInfo.detailedType	other
metashare.ResourceInfo#ContentInfo.mediaType	text
has.files	yes
branding	CLARIN Centre of Latvian language resources and tools
demo.uri	https://github.com/LUMII-AILab/FullStack/tree/master/AMR
contact.person	Normunds Grūzītis, normundsg@ailab.lv, IMCS UL
sponsor	European Regional Development Fund; project 1.1.1.1/16/A/219 "Full Stack of Language Resources for Natural Language Understanding and Generation in Latvian"; funding type: nationalFunds
size.info	12,691 sentences
files.size	12370336
files.count	4

Files in this item

This item is publicly available and licensed under:
Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Name: lv-amr-train.txt
Size: 9.14 MB
Format: Text file
Description: Training data
MD5: 1cfff686c00b5cae11340398518048e7

File Preview

# ::id a-n136-p21s1
# ::snt -LRB- 10 -RRB- Directive 80/987/EEC should be amended accordingly .
# ::snt_lv (10) Attiecīgi ir jāgroza Direktīva 80/987/EEK.
# ::alignments 0-1|0.4 1-2|0.3 2-3|0.2 3-4|0.1.0 4-5|0.1 5-6|0 7-8|0.0 8-9|0.0.0 
# ::alignments_translation 0-0 1-1 2-2 6-3 7-4 4-5 5-6 5-7 3-8 8-9
(v5 / recommend-01
    :ARG1 (v6 / amend-01
        :manner (v7 / accordingly))
    :time (v4 / genericconcept
        :mod (v3 / directive))
    :ARG1 (v2 / -rrb-)
    :li 10
    :condition (v1 / -lrb-))

# ::id a-p5878-p1s1
# ::snt The family of David and Victoria Beckham live in Los Angeles , but it looks like they could move to England soon .
# ::snt_lv Deivida un Viktorijas Bekhemu ģimene mitinās Losandželosā, bet izskatās, ka drīzumā viņi varētu pārvākties uz Angliju.
# ::alignments 1-2|0.3 3-4|0.2.0+0.2.0.0+0.2.0.0.0+0.2.0.1 4-5|0.2 5-7|0.0.0+0.0.0.0+0.0.0.0.0+0.0.0.0.1 7-8|0.0 9-11|0.0.1+0.0.1.0+0.0.1.0.0+0.0.1.0.1+0.0.1.1 12-13|0 14-15|0.1 15-16|0.1.0 17-18|0.1.0.0 18-19|0.1.0.0 . . .

Name: lv-amr-dev.txt
Size: 1.43 MB
Format: Text file
Description: Development data
MD5: b0676cffe4f613578c91a3b33f8d70bc

File Preview

# ::id a-p3754-p5s1
# ::snt The company started the project in 2001 .
# ::snt_lv Uzņēmums projekta realizāciju sāka 2001.gadā.
# ::alignments 1-2|0.0 2-3|0 4-5|0.1 6-7|0.1.0+0.1.0.0 
# ::alignments_translation 0-0 0-1 3-2 2-3 1-4 5-5 4-6 6-6 7-7
(v2 / start-01
    :ARG0 (v1 / company)
    :ARG1 (v3 / project
        :time (v4 / date-entity
            :year 2001)))

# ::id a-p3754-p5s2
# ::snt Since then , more than seven million lats have been invested in landfill development .
# ::snt_lv Kopš tā laika poligona attīstībā ieguldīti vairāk nekā septiņi miljoni latu.
# ::alignments 0-1|0.1 3-4|0.0.0 7-8|0.0 10-11|0 12-13|0.2.0 13-14|0.2 
# ::alignments_translation 0-0 1-1 2-1 6-2 6-3 7-4 8-5 9-6 10-7 5-8 5-9 5-10 4-11 3-12 4-13 11-14
(v4 / invest-01
    :ARG1 (v3 / lat
        :quant (v2 / more))
    :op1 (v1 / since)
    :ARG2 (v6 / develop-02
        :ARG1 (v5 / landfill)
        :ARG0 v3))

# ::id a-d62-p38s1
# ::snt `` We wo n't be let in there , we do n't have such a fine key ! ''
# . . .

Name: lv-amr-test.txt
Size: 1.22 MB
Format: Text file
Description: Test data
MD5: 118bfbf48eb1d1536091a94c0cb790e0

File Preview

# ::id a-p3753-p48s1
# ::snt FOR INFORMATION
# ::snt_lv INFORMĀCIJAI
# ::alignments 1-2|0 
# ::alignments_translation 0-0 0-1
(v1 / information)

# ::id a-p15196-p5s1
# ::snt However , he summarized that unpopular decisions in the education sector would have the `` back '' of the Prime Minister if they were rational .
# ::snt_lv Tomēr viņš rezumēja, ka nepopulāriem lēmumiem izglītības nozarē būs premjera "aizmugure", ja tie būs racionāli.
# ::alignments 3-4|0 5-6|0.0.0+0.0.0.0+0.0.0.0.0 6-7|0.0 9-10|0.0.1.0 10-11|0.0.1 14-15|0.0.2 15-16|0.0.2.0 16-17|0.0.3 19-20|0.0.3.0.2 20-21|0.0.3.0+0.0.3.0.0 24-25|0.0.3.0.1 
# ::alignments_translation 0-0 1-1 1-2 2-3 3-4 4-4 5-5 6-6 8-7 7-8 7-9 8-10 9-11 9-12 10-13 11-14 12-15 13-16 10-17 10-18 10-19 10-20 14-21 15-21 16-22 17-23 18-24 19-25
(v1 / summarize-01
    :ARG1 (v4 / decide-01
        :ARG1 (v2 / thing
            :degree-of (v3 / popular-02
                :polarity -))
        :location (v6 / sector
            :mod (v5 / educate-01)) . . .

Name: README.txt
Size: 1.64 KB
Format: Text file
Description: Data format description of the automatically derived AMR annotation layer of the FullStack-LV multi-layer text corpus. See also: https://github.com/LUMII-AILab/FullStack/tree/master/AMR.
MD5: ab87e5b03a6721f7e6e82beffb577290
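
The MD5 values listed for each file can be used to check a download for corruption. The following is a minimal sketch, assuming the four files were saved locally under the names shown on this page; the function names `md5_of_file` and `verify_download` are hypothetical helpers, not part of the dataset.

```python
import hashlib

# Checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "lv-amr-train.txt": "1cfff686c00b5cae11340398518048e7",
    "lv-amr-dev.txt": "b0676cffe4f613578c91a3b33f8d70bc",
    "lv-amr-test.txt": "118bfbf48eb1d1536091a94c0cb790e0",
    "README.txt": "ab87e5b03a6721f7e6e82beffb577290",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, name):
    """Compare a local file against the checksum listed for `name` (assumed local copy)."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify_download("lv-amr-dev.txt", "lv-amr-dev.txt")` would return `True` only if the local copy matches the listed checksum.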

File Preview

DATA FORMAT

METADATA

id - the sentence ID according to the Latvian UD Treebank data.
snt_lv - the original sentence (Latvian).
snt - the machine-translated sentence (English).
alignments_translation - token-level alignment between the Latvian and English sentences.
alignments - alignment between tokens (English) and nodes of the AMR graph (see the alignment format: https://github.com/lil-lab/amr/blob/07b581e7d160fa2625eeefa86ae5e9fe5c589be2/utils/jamr/docs/Alignment_Format.md).
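
The two alignment fields described above can be split into structured values with a few lines of code. This is a sketch under assumptions: it reads each `alignments` item as `start-end|node+node...` (a token span followed by one or more node addresses, as the linked JAMR alignment format and the file previews suggest) and each `alignments_translation` item as a pair of token indices. Which side of a translation pair is Latvian and which is English is not stated in the README, so the sketch does not label them. The function names are hypothetical.

```python
from typing import List, Tuple


def parse_amr_alignments(field: str) -> List[Tuple[int, int, List[str]]]:
    """Parse an '::alignments' value into (start, end, node_addresses) tuples."""
    spans = []
    for item in field.split():
        if "|" not in item:
            # Skip anything that is not a span, e.g. truncation dots in a preview.
            continue
        span, _, nodes = item.partition("|")
        start, end = span.split("-")
        spans.append((int(start), int(end), nodes.split("+")))
    return spans


def parse_translation_alignments(field: str) -> List[Tuple[int, int]]:
    """Parse an '::alignments_translation' value into pairs of token indices."""
    pairs = []
    for item in field.split():
        left, _, right = item.partition("-")
        if left.isdigit() and right.isdigit():
            pairs.append((int(left), int(right)))
    return pairs


amr_spans = parse_amr_alignments("1-2|0.0 2-3|0 4-5|0.1 6-7|0.1.0+0.1.0.0 ")
token_pairs = parse_translation_alignments("0-0 0-1 3-2 2-3 1-4 5-5 4-6 6-6 7-7")
```

In the development-set sample "The company started the project in 2001 .", the span `1-2` points at node `0.0` and `2-3` at the root node `0`, which is consistent with spans whose end index is exclusive and node addresses that are child-index paths from the root.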

AMR GRAPH

The English translation encoded as an AMR graph in the PENMAN notation (see the language specification: https://amr.isi.edu/language.html).

Each record in the dataset is separated by an empty line. The dataset is split into subsets for training, development and testing using the same split as for the Latvian UD Treebank (v2.5).
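
Because records are separated by empty lines and metadata lines start with `# ::`, a file can be read with a small loop. The following is a minimal sketch, not the authors' tooling; `read_records` is a hypothetical helper, and the sample text is copied from the file previews on this page. The graph is returned as a raw PENMAN string and is not parsed further.

```python
import re
from typing import Dict, Iterator, Tuple


def read_records(text: str) -> Iterator[Tuple[Dict[str, str], str]]:
    """Yield (metadata, penman_graph) for each blank-line-separated record."""
    for block in re.split(r"\n\s*\n", text.strip()):
        meta: Dict[str, str] = {}
        graph_lines = []
        for line in block.splitlines():
            if line.startswith("# ::"):
                # Metadata line: '# ::key value'
                key, _, value = line[4:].partition(" ")
                meta[key] = value.strip()
            elif line.startswith("#"):
                continue  # any other comment line is ignored
            else:
                graph_lines.append(line)
        yield meta, "\n".join(graph_lines)


SAMPLE = """# ::id a-p3753-p48s1
# ::snt FOR INFORMATION
# ::snt_lv INFORMĀCIJAI
# ::alignments 1-2|0
# ::alignments_translation 0-0 0-1
(v1 / information)

# ::id a-p3754-p5s1
# ::snt The company started the project in 2001 .
# ::snt_lv Uzņēmums projekta realizāciju sāka 2001.gadā.
# ::alignments 1-2|0.0 2-3|0 4-5|0.1 6-7|0.1.0+0.1.0.0
# ::alignments_translation 0-0 0-1 3-2 2-3 1-4 5-5 4-6 6-6 7-7
(v2 / start-01
    :ARG0 (v1 / company)
    :ARG1 (v3 / project
        :time (v4 / date-entity
            :year 2001)))
"""

records = list(read_records(SAMPLE))
```

To process a whole file, the same function can be applied to the result of `open(path, encoding="utf-8").read()`.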

EXAMPLE

# ::id a-p3315-p3s6
# ::snt Sister Ilze is also a model .
# ::snt_lv Māsa Ilze arī ir modelētāja.
# ::alignments 0-1|0.2+0.2.0+0.2.0.0 1-2|0 . . .
