AeroDAML: Applying Information Extraction to Generate DAML Annotations
Dr. Paul Kogut
Lockheed Martin Management & Data Systems
What is Information Extraction?
[Diagram. Labeled elements: Information Extraction; Linguistic Knowledge; Text or web pages; Entities; Relationships; Co-references; Events.]
Extraction and Semantic Annotation
• Consumer-side extraction - 3rd party text -> database
  • Advantages:
    • Applicable to raw documents (most of the web)
  • Disadvantages:
    • Must deal with full complexity of natural language
• Semantic annotation proposed to overcome difficulty of consumer-side extraction - but annotation is labor intensive
• Producer-side extraction - authored text -> annotation
  • Advantages:
    • Partial automation - reduces manual effort
    • Human-assisted disambiguation
    • Domain customization for intranets and B2B e-commerce
  • Disadvantages:
    • Requires manual effort to correct and add rich set of relationships
    • Domain customization requires up-front effort from the author/webmaster
• Both types of extraction will coexist.
AeroDAML Architecture
[Architecture diagram. Labeled elements: DAML Annotator; UBOT Annotation Editor; Text Extraction; Extraction to DAML Translation; DAML Ontologies. Labeled flows: text or web pages; basic annotation; refined annotation; DAML annotated text or web pages.]
Client-Server AeroDAML
• Users:
  • personnel who routinely produce documents (e.g., intelligence analysts)
  • personnel who have a large collection of legacy documents
Web-based AeroDAML
• Users:
  • novice/infrequent DAML annotators
  • people who want to do quick/simple annotation of a web page
AeroDAML Output: Entities
<aac:ABSOLUTEDATE rdf:about="December19,1997">
  <daml:label><![CDATA[December 19, 1997]]></daml:label>
</aac:ABSOLUTEDATE>
<aac:AIRCRAFT rdf:about="Dash8Series400">
  <daml:label><![CDATA[Dash 8 Series 400]]></daml:label>
</aac:AIRCRAFT>
<aac:MEASURE rdf:about="61-foot">
  <daml:label><![CDATA[61-foot]]></daml:label>
</aac:MEASURE>
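The entity output above follows a regular pattern: the element name is the entity type, `rdf:about` is the extracted text with whitespace removed, and `daml:label` carries the original text. A minimal Python sketch of that pattern follows. The helper names `make_id` and `entity_to_daml` are hypothetical, and the whitespace-removal rule is inferred from the three samples rather than taken from AeroDAML itself.

```python
from xml.sax.saxutils import quoteattr


def make_id(label: str) -> str:
    """Derive an identifier by removing all whitespace (rule inferred from the samples)."""
    return "".join(label.split())


def entity_to_daml(entity_type: str, label: str) -> str:
    """Render one extracted entity as a DAML fragment in the style shown on the slide."""
    # quoteattr() adds the surrounding quotes and escapes special characters.
    # Note: a label containing "]]>" would need extra handling inside CDATA.
    return (
        f"<aac:{entity_type} rdf:about={quoteattr(make_id(label))}>\n"
        f"  <daml:label><![CDATA[{label}]]></daml:label>\n"
        f"</aac:{entity_type}>"
    )


# Hypothetical extractor output: (entity type, text span) pairs from the slide.
entities = [
    ("ABSOLUTEDATE", "December 19, 1997"),
    ("AIRCRAFT", "Dash 8 Series 400"),
    ("MEASURE", "61-foot"),
]

annotation = "\n".join(entity_to_daml(t, text) for t, text in entities)
print(annotation)
```

The fragments omit the enclosing `rdf:RDF` element and namespace declarations, as the slide does.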
AeroDAML Output: Relationships
<aac:NATION rdf:about="Austria">
  <daml:label><![CDATA[Austria]]></daml:label>
</aac:NATION>
<aac:ORGANIZATION rdf:about="TyroleanAirways">
  <aac:OrgToLoc rdf:resource="Austria"/>
  <daml:label><![CDATA[Tyrolean Airways]]></daml:label>
</aac:ORGANIZATION>
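In the relationship output above, a relationship appears as a property element (here `aac:OrgToLoc`) nested inside the source entity, with `rdf:resource` pointing at the identifier of the target entity. The sketch below reproduces that shape in Python. The function `entity_with_relations` and its `relations` argument are hypothetical names for illustration, not part of AeroDAML.

```python
def make_id(label: str) -> str:
    """Derive an identifier by removing all whitespace (rule inferred from the samples)."""
    return "".join(label.split())


def entity_with_relations(entity_type, label, relations=()):
    """Render an entity plus its outgoing relationships.

    relations is a sequence of (property name, target label) pairs,
    for example ("OrgToLoc", "Austria").
    """
    lines = [f'<aac:{entity_type} rdf:about="{make_id(label)}">']
    for prop, target_label in relations:
        # The target is referenced by identifier, not repeated in full.
        lines.append(f'  <aac:{prop} rdf:resource="{make_id(target_label)}"/>')
    lines.append(f"  <daml:label><![CDATA[{label}]]></daml:label>")
    lines.append(f"</aac:{entity_type}>")
    return "\n".join(lines)


nation = entity_with_relations("NATION", "Austria")
org = entity_with_relations(
    "ORGANIZATION", "Tyrolean Airways", [("OrgToLoc", "Austria")]
)
print(nation)
print(org)
```

Referencing the target by identifier keeps each entity described once, so several organizations can point at the same `aac:NATION` element.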
AeroDAML Output: Co-reference
<aac:PERSON rdf:about="PierreLortie">
  <aac:PersToOrg rdf:resource="BombardierRegionalAircraft"/>
  <daml:equivalentTo rdf:resource="Lortie"/>
  <daml:label><![CDATA[Pierre Lortie]]></daml:label>
</aac:PERSON>
<aac:PERSON rdf:about="Lortie">
  <daml:label><![CDATA[Lortie]]></daml:label>
</aac:PERSON>
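The co-reference output above links two mentions of the same person with `daml:equivalentTo`. The slides do not say how AeroDAML decides that "Lortie" and "Pierre Lortie" co-refer, so the Python sketch below substitutes a deliberately naive heuristic (a one-word mention equal to the last word of a longer mention) purely to show how such links could be emitted. The names `link_coreferences` and `person_to_daml` are hypothetical.

```python
def make_id(label: str) -> str:
    """Derive an identifier by removing all whitespace (rule inferred from the samples)."""
    return "".join(label.split())


def link_coreferences(mentions):
    """Toy heuristic, NOT AeroDAML's method: map each multi-word mention
    to the other mentions that equal its last word."""
    links = {}
    for full in mentions:
        tokens = full.split()
        if len(tokens) < 2:
            continue
        aliases = [m for m in mentions if m != full and m == tokens[-1]]
        if aliases:
            links[full] = aliases
    return links


def person_to_daml(label, aliases=()):
    """Render a PERSON entity with one daml:equivalentTo per co-referent mention."""
    lines = [f'<aac:PERSON rdf:about="{make_id(label)}">']
    for alias in aliases:
        lines.append(f'  <daml:equivalentTo rdf:resource="{make_id(alias)}"/>')
    lines.append(f"  <daml:label><![CDATA[{label}]]></daml:label>")
    lines.append("</aac:PERSON>")
    return "\n".join(lines)


mentions = ["Pierre Lortie", "Lortie"]
links = link_coreferences(mentions)
for mention in mentions:
    print(person_to_daml(mention, links.get(mention, ())))
```

A surname match like this would wrongly merge two different people who share a last name, which is one reason the producer-side workflow keeps a human in the loop for disambiguation.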
AeroDAML Plans
• Integrate with annotation editor
• Improve Web-based AeroDAML
• Allow user to select other ontologies besides the current AeroDAML default ontology for annotation generation:
  • OpenCyc or Cyc Upper Ontology
  • CIA World Fact Book
  • IEEE Standard Upper Ontology
  • Dublin Core
  • UNSPSC...
• Try AeroDAML!
  • http://ubot.lockheedmartin.com/ubot/hotdaml/aerodaml.html