Frameworks and Sample Graphs
The shared task combines five frameworks for graph-based meaning representation, each with its specific formal and linguistic assumptions. Definitions of basic graph-theoretic terminology and of relevant formal properties are available as a separate page. The following example graphs are for sentence #20209013 from the venerable Wall Street Journal Corpus (WSJ):
A similar technique is almost impossible to apply to other crops, such as cotton, soybeans and rice.
The example exhibits some interesting linguistic complexity, including a so-called tough adjective (impossible), an arguably scopal adverb (almost), a tripartite coordinate structure, and apposition. The example graphs below are presented in order of (arguably) increasing ‘abstraction’ from the surface string, i.e. ranging from anchored Flavor (1) semantic graphs to unanchored Flavor (2) graphs.
EDS: Elementary Dependency Structures
The EDS graphs originally derive from the underspecified logical forms computed by the English Resource Grammar (Flickinger et al., 2017; Copestake et al., 2005; Flickinger, 2000), which Flickinger et al. (2015) dub English Resource Semantics (ERS). Elementary Dependency Structures (EDS; Oepen & Lønning, 2016) encode English Resource Semantics in a variable-free semantic dependency graph, where graph nodes correspond to logical predications and edges to labeled argument positions. The EDS conversion from underspecified logical forms to variable-free graphs discards the partial information on semantic scope from the full ERS, which makes these graphs abstractly (if not linguistically) quite similar to Abstract Meaning Representation (see below).
Nodes in EDS are in principle independent of surface lexical units, but for each node there is an explicit, many-to-many anchoring onto sub-strings of the underlying sentence. Thus, EDS instantiates Flavor (1) in our hierarchy of different formal types of semantic graphs. In the EDS analysis for the running example, nodes representing covert quantifiers (on bare nominals, labeled udef_q), the two-place such+as_p relation, as well as the implicit_conj(unction) relation (which reflects recursive decomposition of the coordinate structure into binary predications) do not correspond to individual surface tokens (but are anchored on larger spans, overlapping with anchors from other nodes). Conversely, the meaning contribution of the (inherently comparative) similar is lexically decomposed into two nodes (even though the ARG2 of the comp relation remains unexpressed in this usage), both anchored to the same surface token.
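The anchoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the shared task's data format: the Node class, the character-span encoding, the placeholder labels "similar" and "cotton", and the span chosen for udef_q are all assumptions made for the example. Only the labels comp and udef_q, and the fact that two nodes share the anchor of similar, come from the description above.

```python
from dataclasses import dataclass, field

SENTENCE = (
    "A similar technique is almost impossible to apply to other crops, "
    "such as cotton, soybeans and rice."
)


@dataclass
class Node:
    label: str
    # Character spans (start, end) into SENTENCE; hypothetical encoding.
    # An empty list would mean the node is unanchored.
    anchors: list = field(default_factory=list)


def span_of(substring):
    """Return the (start, end) span of the first occurrence in SENTENCE."""
    start = SENTENCE.index(substring)
    return (start, start + len(substring))


def overlaps(a, b):
    """True if two half-open character spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def is_anchored(nodes):
    """A graph counts as anchored here if every node has at least one anchor."""
    return all(node.anchors for node in nodes)


similar = span_of("similar")
nodes = [
    Node("similar", [similar]),  # placeholder label for the lexical predicate
    Node("comp", [similar]),     # second node anchored to the same token
    # Covert quantifier on a larger span; the exact span is an assumption.
    Node("udef_q", [span_of("cotton, soybeans and rice")]),
    Node("cotton", [span_of("cotton")]),  # placeholder label
]


def nodes_at(span):
    """Labels of all nodes that carry exactly this anchor."""
    return [node.label for node in nodes if span in node.anchors]
```

In this sketch, nodes_at(similar) returns both nodes decomposed from similar, and overlaps applied to the anchors of the udef_q and cotton nodes is true, which mirrors the many-to-many character of the anchoring.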
PTG: Prague Tectogrammatical Graphs
These graphs present a conversion from the multi-layered (and somewhat richer) annotations in the tradition of Prague Functional Generative Description (FGD; Sgall et al., 1986), as adopted (among others) in the Prague Czech–English Dependency Treebank (PCEDT; Hajič et al., 2012). PTG graphs essentially recast core predicate–argument structure in the form of mostly anchored dependency graphs, albeit introducing ‘empty’ (or generated, in FGD terminology) nodes, for which there is no corresponding surface token. Thus, these representations instantiate Flavor (1) in our hierarchy of different formal types of semantic graphs.
Although most nodes in PTG correspond to surface tokens, these graphs are neither fully connected nor rooted trees, i.e. some tokens from the underlying sentence remain structurally isolated, for some nodes there are multiple incoming edges, and ‘empty’ nodes have no overt surface correspondence. In the example PTG graph below, there are two generated nodes to represent the unexpressed BEN(efactive) of the impossible relation as well as the unexpressed ACT(or) argument of the three-place apply relation, respectively; these nodes are related by an edge indicating grammatical coreference. In this graph, the indefinite determiner, infinitival to, and the vacuous preposition marking the deep object of apply can be argued to not have a semantic contribution of their own.
Unlike some of the other frameworks represented in the shared task, the PTG graph for our running example analyzes the predicative copula as semantically contentful and does not treat almost as ‘scoping’ over the entire graph. While the conversion to PTG discards some additional information from the original FGD annotations, it also makes explicit what are called effective dependencies, i.e. enhances the graph structure. For our running example, the PTG graph has recursively propagated argument dependencies to both elements of the apposition and to all members of the coordinate structure.
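The three ways in which PTG graphs depart from rooted trees (isolated tokens, nodes with multiple incoming edges, and generated nodes) can be checked mechanically. The following is a rough sketch over a toy fragment of the running example. The node identifiers, the "coref" edge label, and the direction of the coreference edge are assumptions for illustration; only the BEN and ACT roles and the two generated nodes are taken from the description above.

```python
from collections import Counter

# Toy fragment of the running example; tokens are indexed by position.
TOKENS = ["impossible", "to", "apply"]

# Node id -> index of the anchoring token, or None for a generated node.
# The identifiers are hypothetical.
NODES = {
    "impossible": 0,
    "apply": 2,
    "gen_ben": None,  # unexpressed BEN(efactive) of impossible
    "gen_act": None,  # unexpressed ACT(or) of apply
}

# (source, target, label); label and direction of the last edge are assumed.
EDGES = [
    ("impossible", "gen_ben", "BEN"),
    ("apply", "gen_act", "ACT"),
    ("gen_act", "gen_ben", "coref"),
]


def generated_nodes(nodes):
    """Nodes without any surface token."""
    return sorted(n for n, anchor in nodes.items() if anchor is None)


def uncovered_tokens(tokens, nodes):
    """Tokens that no node is anchored to, i.e. structurally isolated ones."""
    covered = {anchor for anchor in nodes.values() if anchor is not None}
    return [tok for i, tok in enumerate(tokens) if i not in covered]


def reentrant_nodes(edges):
    """Nodes with more than one incoming edge, which rules out a tree."""
    indegree = Counter(target for _, target, _ in edges)
    return sorted(n for n, degree in indegree.items() if degree > 1)
```

On this fragment, infinitival to is the only uncovered token, and the generated benefactive node is reentrant because it is the target of both the BEN edge and the assumed coreference edge.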
UCCA: Universal Conceptual Cognitive Annotation
A second instance of Flavor (1) semantic graphs in the shared task is presented by the Universal Conceptual Cognitive Annotation framework (UCCA; Abend & Rappoport, 2013). UCCA is a comparatively recent initiative, which targets a level of semantic granularity that abstracts away from syntactic paraphrases in a typologically-motivated, cross-linguistic fashion—without relying on language-specific resources—while setting a low threshold for annotator training. UCCA has been the subject of a recently completed parsing task at SemEval 2019. The UCCA parser by Hershcovich et al. (2018) is one of the few instances of multi-task learning across meaning representation frameworks to date.
A UCCA analysis of a text passage is a directed acyclic graph over semantic elements called units, which are the nodes of the graph. A unit corresponds to (is anchored by) one or more tokens and can be related to its parent unit with one or more semantic categories (i.e. edges). The basic unit of annotation is the scene, denoting a situation mentioned in the sentence, typically involving a predicate, participants, and potentially modifiers. Linguistically, UCCA adopts a notion of semantic constituency that transcends pure dependency graphs, in the sense of introducing separate, unlabeled nodes (the units). The shared task limits itself to UCCA annotations at what is called the foundational layer, where there is a comparatively coarse inventory of different relations.
The UCCA graph for the running example (see below) includes a single scene, whose main relation is the Process (P) evoked by apply. It also contains a secondary relation labeled Adverbial (D), almost impossible, which is broken down into its Center (C) and Elaborator (E), as well as two complex arguments, labeled as Participants (A). Unlike the other frameworks in the task, UCCA integrates all surface tokens into the graph, as the targets of semantically bleached Function (F) and Punctuation (U) edges. UCCA graphs need not be rooted trees: argument sharing across units will give rise to reentrant nodes much like in the other frameworks. For example, technique is both a (remote) Participant in the scene evoked by similar and a Center in the parent unit. UCCA in principle also supports implicit (unexpressed) units which do not correspond to any tokens, but these are currently excluded from parsing evaluation and, thus, suppressed in the UCCA graphs distributed in the context of the shared task.
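The interplay of acyclicity and reentrancy can be illustrated with a small sketch of the unit containing similar and technique. The unit identifiers and the category "S" on the edge to similar are hypothetical. The sketch also assumes that edges other than remote ones form a tree, which the description above does not state explicitly.

```python
from collections import Counter
from graphlib import CycleError, TopologicalSorter
from typing import NamedTuple


class Edge(NamedTuple):
    parent: str
    child: str
    category: str
    remote: bool = False


# Hypothetical unit ids; categories E, C and A follow the description above,
# while "S" is an assumed placeholder.
EDGES = [
    Edge("unit_np", "unit_similar", "E"),
    Edge("unit_np", "technique", "C"),
    Edge("unit_similar", "similar", "S"),
    Edge("unit_similar", "technique", "A", remote=True),
]


def is_acyclic(edges):
    """Check the DAG property by attempting a topological sort."""
    predecessors = {}
    for edge in edges:
        predecessors.setdefault(edge.parent, set())
        predecessors.setdefault(edge.child, set()).add(edge.parent)
    try:
        list(TopologicalSorter(predecessors).static_order())
        return True
    except CycleError:
        return False


def reentrant_nodes(edges):
    """Nodes (units or tokens) with more than one parent."""
    parents = Counter(edge.child for edge in edges)
    return sorted(node for node, count in parents.items() if count > 1)


def primary(edges):
    """Drop remote edges, keeping only the primary ones."""
    return [edge for edge in edges if not edge.remote]
```

Here technique is the only reentrant node, and it stops being reentrant once the remote Participant edge is set aside, which is the sense in which argument sharing (rather than the constituency backbone) makes the graph a non-tree.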
AMR: Abstract Meaning Representation
The task further includes Abstract Meaning Representation (AMR; Banarescu et al., 2013), which in our hierarchy of different formal types of semantic graphs is simply unanchored, i.e. represents what we call Flavor (2). The AMR framework backgrounds notions of compositionality and derivation and, accordingly, declines to make explicit how elements of the graph correspond to the surface utterance. Although most AMR parsing research presupposes a pre-processing step that aligns graph nodes with (possibly discontinuous) sets of tokens in the underlying input, these correspondences are not part of the meaning representation proper. At the same time, AMR frequently invokes lexical decomposition and normalization towards verbal senses, such that AMR graphs quite generally appear to ‘abstract’ furthest from the surface signal. Since the first general release of an AMR graph bank in 2014, the framework has provided a popular target for data-driven meaning representation parsing and has been the subject of two consecutive tasks at SemEval 2016 and 2017.
The AMR example graph below has a topology broadly comparable to EDS, with notable differences. The nodes corresponding to similar and such as, for example, in AMR are analyzed as derived from the resemble-01 and exemplify-01 verbal senses. Furthermore, the AMR representation of the coordinate structure is flat, the directionality of :mod(ifier) edges reversed in comparison to EDS, and there is no meaning contribution annotated for the determiner a (let alone for covert determiners in bare nominals).
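The contrast between flat and recursively binary coordination can be made concrete with a short sketch. The edge labels used here (:opN for the flat variant, L-INDEX and R-INDEX for the binary one) and the node label "and" are assumptions about the respective conventions, not taken from the text above, and the right-branching shape of the binary variant is likewise assumed.

```python
def flat_coordination(conjuncts):
    """One coordination node with a numbered edge to every conjunct."""
    return [("and", f":op{i}", c) for i, c in enumerate(conjuncts, start=1)]


def binary_coordination(conjuncts):
    """Right-branching decomposition into two-place predications.

    All but the last predication are implicit_conj nodes; the last one
    stands for the overt conjunction.
    """
    n = len(conjuncts)
    if n < 2:
        raise ValueError("coordination needs at least two conjuncts")

    def head(i):
        # Number the implicit nodes to keep them distinct.
        return f"implicit_conj#{i}" if i < n - 2 else "and"

    edges = []
    for i in range(n - 1):
        edges.append((head(i), "L-INDEX", conjuncts[i]))
        right = conjuncts[i + 1] if i == n - 2 else head(i + 1)
        edges.append((head(i), "R-INDEX", right))
    return edges


CONJUNCTS = ["cotton", "soybeans", "rice"]
```

For the three conjuncts of the running example, the flat variant yields three edges from a single node, whereas the binary variant yields four edges spread over two nodes, one of which takes the other as its right argument.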
DRG: Discourse Representation Graphs
Finally, Discourse Representation Graphs (DRG) provide a graph encoding of Discourse Representation Structure (DRS), the meaning representations at the core of Discourse Representation Theory (DRT; Kamp, 1984; Van der Sandt, 1992; Kamp and Reyle, 1993; Asher, 1993). DRSs can model many challenging semantic phenomena, for example, quantifiers, negation, pronoun resolution, presupposition accommodation, and discourse structure. Moreover, they are translatable into first-order logic formulas to account for logical inference.
DRGs are derived from the DRS annotations in the Parallel Meaning Bank (PMB; Abzianidze et al., 2017; Bos et al., 2017). Concepts are represented by WordNet 3.0 senses and semantic roles by an adapted version of VerbNet roles. Although the annotations in the PMB are compositionally derived from lexical semantics, anchoring information is not explicit in its DRSs; thus, (like AMR) the DRG framework formally instantiates our Flavor (2) of meaning representation graphs. A different (clause- rather than graph-based) encoding of DRSs was used in the 2019 DRS parsing shared task.
The top node in the DRG of the sample sentence corresponds to the main clause headed by the copula is, but there is another node that is structurally a root of the graph, corresponding to the presupposition triggered by other. The node of the embedded clause with apply is subordinated to both these roots, depicted by incoming edges labeled with discourse relations. The square shape of some nodes serves as a visual aid: these nodes represent scopal contexts, and their shape is inspired by that of DRSs. Similarly, different node shapes are applied to visually differentiate nodes corresponding to discourse referents (oval shapes) vs. ones for reifications of binary relations (diamonds; different DRG node types are directly recoverable from the fixed set of predefined binary relations). For example, the two nodes with the same label of crop.n.01 represent two distinct discourse referents (one of them introduced by the presupposition) since an inequality relation (NEQ) is asserted between them. Finally, entities which semantically behave as constants are represented as nodes labeled with quoted strings, for instance the indexical "now", modelling the time of speech.
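The difference between the designated top node and a node that is merely a structural root can be sketched as follows. The node identifiers and the edge labels REL_1 and REL_2 are hypothetical placeholders, since the actual discourse relations are not named above; only the configuration of two roots above the embedded clause follows the description.

```python
# Hypothetical ids for the three scopal contexts discussed above.
NODES = {"box_main", "box_presupposition", "box_apply"}

# (source, target, label); the labels stand in for unnamed discourse relations.
EDGES = [
    ("box_main", "box_apply", "REL_1"),
    ("box_presupposition", "box_apply", "REL_2"),
]

TOP = "box_main"  # the designated top node (main clause)


def structural_roots(nodes, edges):
    """Nodes without incoming edges."""
    targets = {target for _, target, _ in edges}
    return sorted(node for node in nodes if node not in targets)


def extra_roots(nodes, edges, top):
    """Structural roots other than the designated top node."""
    return [node for node in structural_roots(nodes, edges) if node != top]
```

In this toy configuration the top node is one of two structural roots, and extra_roots singles out the presupposition context as the other one.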