Public Release of Open MRP
In November 2021, we are (finally) releasing those parts of the MRP 2019 and 2020 data sets that can be shared publicly, which include the training, validation, and evaluation splits for all EDS, DRG, and UCCA graphs, as well as the Czech PTG graphs. Please watch this page for further download instructions.
English Data
The table below summarizes the English training, validation, and evaluation data provided for the cross-framework track of the shared task. The task operates under what is at times called a closed training regime, i.e. participants are constrained in which additional data and pre-trained models are legitimate to use in system development; see below. While some of the semantic graph frameworks in the task continue to evolve and periodically release revised and extended data, we anticipate that these selections will provide stable reference points for empirical comparison for at least a few years after the task.
| | EDS | PTG | UCCA | AMR | DRG |
| --- | --- | --- | --- | --- | --- |
| Training Data | | | | | |
| Text Type | newspaper | newspaper | mixed | mixed | mixed |
| Sentences | 37,192 | 42,024 | 6,872 | 57,885 | 6,605 |
| Tokens | 725,165 | 861,719 | 145,536 | 915,791 | 36,394 |
| Validation Data | | | | | |
| Text Type | mixed | mixed | mixed | mixed | mixed |
| Sentences | 3,302 | 1,664 | 1,585 | 3,560 | 885 |
| Tokens | 55,360 | 33,994 | 22,085 | 57,542 | 4,473 |
| Evaluation Data | | | | | |
| Text Type | mixed | newspaper | wikipedia | mixed | mixed |
| Sentences | 4,040 | 2,507 | 600 | 2,457 | 898 |
| Tokens | 58,406 | 49,228 | 15,405 | 42,852 | 4,913 |
The training data for EDS and PTG draws from a homogeneous source: the venerable WSJ text first annotated in the Penn Treebank (PTB), specifically Sections 00–20. As a common point of reference, a small sample of WSJ sentences annotated in all five frameworks is available for public download.
UCCA training annotations cover web review text from the English Web Treebank and English Wikipedia articles on celebrities. While UCCA structures are in principle not confined to a single sentence (about 0.18% of edges cross sentence boundaries), passages are split into individual sentences, discarding the relations between them, to create a uniform setting across the frameworks.
AMR annotations are drawn from a wide variety of texts, with the majority of sentences coming from on-line discussion forums. The training corpus also contains newswire, folktales, fiction, and Wikipedia articles.
The texts annotated in the DRG framework are sourced from a wide range of genres, including Tatoeba, News-Commentary, Recognizing Textual Entailment, Sherlock Holmes stories, and the Bible.
Because some of the semantic graph banks involved in the shared task were originally released by the Linguistic Data Consortium (LDC), we rely on the LDC to distribute the training data to participants under no-cost evaluation licenses. Registration for the task will be a prerequisite for data access. Upon completion of the competition, we will package all task data (including system submissions and evaluation results) for general release by the LDC, as well as make available those subsets that are copyright-free for public, open-source download.
Additional Languages
Transcending its 2019 predecessor shared task, MRP 2020 introduces an additional track on cross-lingual meaning representation parsing. This track provides training and evaluation data in one additional language for four of the five frameworks represented in the English-only cross-framework track (but regrettably not EDS), albeit with a different language for each framework (owing to the scarcity of gold-standard semantic annotations across languages). Cross-lingual training data will be made available to task participants toward the end of May 2020.
| | PTG | UCCA | AMR | DRG |
| --- | --- | --- | --- | --- |
| Language | Czech | German | Chinese | German |
| Training Data | | | | |
| Text Type | newspaper | mixed | mixed | mixed |
| Sentences | 43,955 | 4,125 | 18,365 | 1,157 |
| Tokens | 637,084 | 81,915 | 428,055 | 7,479 |
| Evaluation Data | | | | |
| Sentences | 5,476 | 444 | 1,713 | 444 |
| Tokens | 79,464 | 8,714 | 39,228 | 1,970 |
Companion Data
At a technical level, training (and evaluation) data is distributed in two formats: (a) as sequences of ‘raw’ sentence strings and (b) in pre-tokenized, PoS-tagged, and lemmatized form. For the latter, we provide high-quality morpho-syntactic analyses to participants, obtained by training a state-of-the-art dependency parser (the most recent development version of UDPipe; Straka, 2018) on the union of available syntactic training data for each language and using jack-knifing (where required) to avoid overlap of the morpho-syntactic training data with the texts underlying the semantic graph banks of the task; see the sketch below. In the context of MRP 2020, these parser outputs are referred to as morpho-syntactic companion trees. Whether merely as a source of high-quality OntoNotes-style tokenization (the convention also used in Universal Dependencies) or as a vantage point for approaches to meaning representation parsing that start from explicit syntactic structure, we hope this optional resource will offer value to the community in its own right. The underlying parsing models and software will become publicly available upon completion of the shared task.

Additionally, the companion package will include automatically generated reference anchorings (commonly called ‘alignments’ in AMR parsing) for the English AMR graphs in the training data (obtained from the JAMR and ISI tools of Flanigan et al., 2016, and Pourdamghani et al., 2014), as well as companion anchorings for the English and German DRG annotations.
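As a sketch of the jack-knifing scheme mentioned above (this is not the organizers' actual pipeline; the `train_parser` and `parse` callables are hypothetical stand-ins for UDPipe training and inference), each fold of the corpus is annotated by a model trained on the remaining folds, so no sentence is ever analyzed by a model that saw it during training:

```python
def jackknife(sentences, k, train_parser, parse):
    """Annotate each fold with a model trained on the other k-1 folds."""
    folds = [sentences[i::k] for i in range(k)]      # round-robin split
    annotated = []
    for i, held_out in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model = train_parser(train)                  # model never sees fold i
        annotated.extend(parse(model, held_out))     # annotate held-out fold
    return annotated
```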
For reasons of comparability and fairness, the MRP 2020 shared task constrains which additional data or pre-trained models (e.g. corpora, word embeddings, lexica, or other annotations) can legitimately be used besides the resources distributed by the task organizers. The guiding principle is that all participants should be able to use the same range of data. However, the organizers expect to keep such constraints to the minimum required and invite participants to suggest relevant data or models. To make precise which resources can be used in system development in addition to the data provided by the task organizers, there is an official ‘white-list’ of legitimate resources. The organizers welcome suggestions for additions to the white-list; if you anticipate wanting to use resources that are not currently on the MRP white-list, please contact the organizers no later than June 15, 2020. The list will be closed and frozen after that date.
Evaluation Data
For all five frameworks, there will be held-out (‘unseen’) test sets, for which only the parser inputs are made available to participants at the start of the evaluation phase. For EDS, PTG, and UCCA (where training data is relatively homogeneous), the test data will comprise both ‘in-domain’ and ‘out-of-domain’ text, i.e. sentences that are either abstractly similar or dissimilar to the text types represented in the training data. Furthermore, the task organizers will prepare a new (small) test set with gold-standard annotations in all frameworks. The instructions for prospective participants provide further information on the nature and scope of evaluation data for MRP 2020.
The evaluation data will be published in the same file format as the training and companion data, viz. the JSON-based uniform MRP interchange format. The target graphs (i.e. the `nodes`, `edges`, and `tops` fields) will of course not be available until completion of the evaluation period, but high-quality tokenization, PoS tags, lemmatization, and syntactic dependency trees will be provided for the evaluation data in the same manner as through the morpho-syntactic companion trees for the training data.
Uniform Graph Interchange Format
Besides differences in anchoring, the frameworks also vary in how they label nodes and edges, and in the degree to which they allow multiple edges between two nodes, multiple outgoing edges with the same label, or multiple instances of the same property on a node. Node labels in Flavor (0) graphs (present in the MRP 2019 task but not in 2020) typically are lemmas, optionally combined with a (morpho-syntactic) part of speech and a (syntactico-semantic) sense or frame identifier. Node labels in the other graph flavors tend to be more abstract, i.e. are interpreted as concept or relation identifiers (though for the vast majority there is, of course, a systematic relationship to lemmas, lexical categories, and (sub-)senses). Graph nodes in UCCA are formally unlabeled, and anchoring is used to relate the leaf nodes of these graphs to input sub-strings. Edge labels, in contrast, in all cases come from a fixed and relatively small inventory of (semantic) argument names, though there is stark variation in label granularity (ranging from about a dozen labels in UCCA to around 90 and 100 in PTG and AMR, respectively). For the shared task, we have for the first time repackaged the five graph banks into a uniform and normalized abstract representation with a common serialization format.
The common interchange format for semantic graphs implements the abstract model of Kuhlmann & Oepen (2016) as a JSON-based serialization for graphs across frameworks. The format describes general directed graphs, with structured node and edge labels and optional anchoring and ordering of nodes. JSON is easily manipulated in all programming languages and offers parser developers the option of ‘in situ’ augmentation of the graph representations from the task with system-specific additional information, e.g. by adding private properties to the JSON objects. The MRP serialization is based on the JSON Lines format, where a stream of objects is serialized with line breaks as the separator character.
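Because each graph occupies exactly one line, MRP files can be streamed one object at a time with nothing more than a standard JSON library; a minimal sketch (the file name here is hypothetical):

```python
import json

# Stream an MRP file (JSON Lines): one complete graph per line.
with open("training.mrp", encoding="utf-8") as stream:
    for line in stream:
        graph = json.loads(line)
        print(graph["id"], graph["framework"], len(graph["nodes"]))
```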
Each MRP graph is represented as a JSON object with top-level properties `tops`, `nodes`, and `edges`; these are discussed in more detail below. Additionally, the `input` property on all graphs presents the ‘raw’ surface string corresponding to the graph; thus, parser inputs for the task are effectively assumed to be sentence-segmented but not pre-tokenized. Further information about each graph is provided through the properties `id` (a string), `flavor` (an integer in the range `0`–`2`), `framework` (a string), `version` (a decimal number), and `time` (a string in YYYY-MM-DD form, encoding when the graph was serialized). Optionally, graphs can use string-valued `provenance` and `source` properties to record metadata about the underlying resource from which the MRP encoding has been derived. The `nodes` and `edges` values on graphs are each list-valued, but the order among list elements is only meaningful for the `nodes` of Flavor (0) graphs.
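Schematically, the top level of a graph object thus combines identifying metadata with the graph proper; in the hypothetical skeleton below (all values are illustrative), the `nodes` and `edges` lists are left empty and are fleshed out over the remainder of this section:

```json
{"id": "1000", "flavor": 1, "framework": "eds",
 "version": 1.1, "time": "2020-06-01",
 "input": "The dog barked.",
 "tops": [2], "nodes": [], "edges": []}
```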
Node objects have an obligatory `id` property (an integer) and optional properties called `label`, `properties` and `values`, as well as `anchors`. The `label` (a string) has a distinguished status in evaluation; the `properties` and `values` are both list-valued, such that elements of the two lists correspond by position. Together, the two lists present a framework-specific, non-recursive attribute–value matrix (where duplicate properties are in principle allowed). The `anchors` list, if present, contains pairs of `from`–`to` sub-string indices into the `input` string of the graph.
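For concreteness, a hypothetical node object might look as follows (the label, property, and index values are illustrative only): its parallel `properties` and `values` lists encode a single attribute–value pair, and its `anchors` tie the node to two discontinuous sub-strings of the `input`:

```json
{"id": 3,
 "label": "_look_v_up",
 "properties": ["pos"],
 "values": ["VBD"],
 "anchors": [{"from": 8, "to": 14}, {"from": 20, "to": 22}]}
```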
Finally, the edge objects in the top-level `edges` list all have two integer-valued properties, `source` and `target`, which encode the start and end nodes, respectively, to which the edge is incident. For all frameworks except DRG, all edges in the MRP collection further have a (string-valued) `label` property, although formally this is considered optional. Parallel to graph nodes, edges can carry framework-specific `attributes` and `values` lists; in MRP 2020, only the PTG and UCCA frameworks make use of edge attributes. Starting in June 2020, version 1.1 of the MRP serialization also (optionally) allows `id` and `anchors` fields on edges and introduces a third, order-coded `anchorings` array on nodes (to record anchors for individual node properties, separately from the node anchoring at large).
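Fleshing out the hypothetical running example from above, a complete graph object could then be serialized as the following single JSON line (pretty-printed here for readability; all labels and anchor indices are illustrative rather than drawn from the actual EDS graph bank):

```json
{"id": "1000", "flavor": 1, "framework": "eds",
 "version": 1.1, "time": "2020-06-01",
 "input": "The dog barked.",
 "tops": [2],
 "nodes": [{"id": 0, "label": "_the_q", "anchors": [{"from": 0, "to": 3}]},
           {"id": 1, "label": "_dog_n_1", "anchors": [{"from": 4, "to": 7}]},
           {"id": 2, "label": "_bark_v_1", "anchors": [{"from": 8, "to": 14}]}],
 "edges": [{"source": 0, "target": 1, "label": "BV"},
           {"source": 2, "target": 1, "label": "ARG1"}]}
```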
Graph Analysis Software
For format conversion, graph analysis, visualization, and evaluation tasks in the MRP 2020 context, we provide the mtool software (the Swiss Army Knife of Meaning Representation), which is hosted in a public GitHub repository to stimulate community engagement.
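By way of illustration (the exact flags and file names below are recalled from the mtool documentation and should be checked against the repository README), scoring a system output against gold-standard graphs with the cross-framework MCES metric might look like:

```
./main.py --read mrp --score mces --gold gold.mrp system.mrp
```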