1 / 13

An Integrated Annotation DB in OntoNotes

An Integrated Annotation DB in OntoNotes. Sameer Pradhan, Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel. http://www.bbn.com/NLP/OntoNotes. Goals. Capture multiple layers of annotation and modeling Syntax Propositions Word sense Ontology Coreference

sophie
Download Presentation

An Integrated Annotation DB in OntoNotes

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. An Integrated Annotation DB in OntoNotes Sameer Pradhan, Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel http://www.bbn.com/NLP/OntoNotes

  2. Goals • Capture multiple layers of annotation and modeling • Syntax • Propositions • Word sense • Ontology • Coreference • Names • Using an integrated relational database representation • Enforces consistency across the different annotations • Supports integrated models that can combine evidence from different layers

  3. Unified Representation • Provide a bare-bones representation independent of the individual semantics that can • Efficiently capture intra- and inter- layer semantics • Maintain component independence • Provide mechanism for flexible integration • Integrate information at the lowest level of granularity • A Relational Database

  4. Corpus Senses Trees Propositions Coreference Names Unified Relational Representation

  5. Example: DB Representation of Syntax • Treebank tokens (stored in the Token table) provide the common base • The Tree table stores the recursive tree nodes, each with its span • Subsidiary tables define the sets of function tags, phase types, etc.

  6. Advantages of an Integrated Representation • Each layer translates into a common representation • Clean, consistent layers • Resolve the inconsistencies and problems that this reveals • Well defined relationships • Database schema defines the merged structure efficiently • Original representations available as predefined views • Treebank, PropBank, etc. • SQL queries can extract examples based on multiple layers or define new views • Python Object-oriented API allows for programmatic access to tables and queries

  7. major reductions and realignments of troops in central Europe S NP SYNTAX NP PP PP IN NP JJ NNS CC NNS IN NP NNS JJ NNP ... major reductions and realignments of troops in central Europe – ... Syntax Layer • Identifies meaningful phrases in the text • Lays out the structure of how they are related Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon .

  8. S NP ARG2 NP PP PP ARGM-LOC ARG1 IN NP JJ NNS CC NNS IN NP NNS JJ NNP ... major reductions and realignments of troops in central Europe – ... Propositional Structure • Tells who did what to whom • For both verbs and nouns Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon . Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon .

  9. Predicate Frames • Predicate frames define the meanings of the numbered arguments Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon . Predicate Frames Predicate Frames aim reduction aim.01 – Plan reduce.01 – Make less reduce.01 – Make less ARG0 – Agent ARG1 – Thing falling ARG2 – Amount fallen ARG3 – Starting point ARG4 – Ending point ARG0 – Aimer ARG1 – Action aim.02 – Directed motion aim.02 – Directed motion ARG0 – Aimer ARG1 – Thing in motion ARG2 – Target

  10. Word Sense Word Sense aim register • Point or direct object, weapon, at something ... • Wish, purpose or intend to achieve something • Enter into an official record • Be aware of, enter into someone’s conciousness • Indicate a measurement • Show in one’s face Word Sense and Ontology • Meaning of nouns and verbs are specified • All the senses are annotatable at 90% inter-annotator agreement • Catalog of possible meanings supplied in the sense inventory files • Ontology links (currently being added) will capture similarities between related senses of different words Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon . Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon . • Enter into an official record • Wish, purpose or intend to achieve something

  11. President Bush conventional arms talk Vienna talks – which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops the Pentagon in central Europe the Pentagon He Pentagon e0 e0 e0 Coreference • Identifies different mentions of the same entity in text – especially links definite, referring noun phrases, and pronouns in text • Two types – Identity as well as Attributive coreference tagged. Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon .

  12. Example of DB Query Function What is the distribution of named entities that are ARG0s of the predicate “say”? if (proposition.lemma == “say”): for a_proposition in a_proposition_bank: if(a_proposition.lemma != "say"): arg_in_p_q = "select * from argument where proposition_id = '%s';" % (a_proposition.id) a_cursor.execute(arg_in_p_query) argument_rows = a_cursor.fetchall() for a_argument_row in argument_rows: a_argument_id = a_argument_row["id"] a_argument_type = a_argument_row["type"] if(a_argument_type != "ARG0"): n_in_arg_q = "select * from argument_node where argument_id = '%s';" % (a_argument_id) a_cursor.execute(n_in_arg_q) argument_node_rows = a_cursor.fetchall() for a_argument_node_row in argument_node_rows: a_node_id = a_argument_node_row["node_id"] a_ne_node_query = "select * from name_entity where subtree_id = '%s';" % (a_node_id) a_cursor.execute(a_ne_node_query) ne_rows = a_cursor.fetchall() for a_ne_row in ne_rows: a_ne_type = a_ne_row["type"] ne_hash[a_ne_type] = ne_hash[a_ne_type] + 1 a_tree = a_tree_document.get_tree(a_tree_id) a_node = a_tree.get_subtree(a_node_id) for a_child in a_node.subtrees(): a_ne_subtree_query = "select * from name_entity where subtree_id = '%s';" % (a_child.id) subtree_ne_rows = a_cursor.execute(a_ne_subtree_query) ne_subtree_rows = a_cursor.fetchall() for a_ne_subtree_row in ne_subtree_rows: a_subtree_ne_type = a_ne_subtree_row["type"] ne_hash[a_subtree_ne_type] = ne_hash[a_subtree_ne_type] + 1 query = “select * from argument where proposition_id = '%s';” .. if (argument_type == "ARG0"): for child in node.subtrees():

  13. Conclusion • Integrating the annotation layers using a relational schema • Improves consistency • Allows predictive features that combine evidence from multiple layers • Easily Accessible • Through Python API • SQL queries

More Related