
Semantic issues for (Web) Data


sklar

Presentation Transcript


  1. Semantic issues for (Web) Data
     • Aggregation of sparse data
       • Obama isCommanderInChiefOf “US Army”
       • Obama isChildOf “Obama Sr.”
       • Obama differentFrom “SportLocal Primul Loc”
       • Obama expiry “2008-01-09”
       • Obama friend “EscolaDeAikido”
     • Event discovery and recognition
       • { Water in pot ; pot on the hot stove ; egg in water } ∴ ???
     • Meaning as a relational thing
       • Does “Egg” have a meaning independently of its relations? No, if meaning is what someone does (or might do) with something.
       • cf. Dewey, Bridgman, Fillmore, Minsky, Schank, Brian Smith, Gibson, ...
     • Dynamics (induction, abduction) of general and domain relations
       • Cook(Egg, Stove) as Absorb_heat(Entity, Heat_source) as Becoming(Entity, Cause)
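
A minimal sketch of the two ideas on this slide, under stated assumptions: the Obama statements are kept as plain (subject, predicate, object) tuples, and the Cook example is read as a chain from a domain relation to more general ones. The `GENERALIZES_TO` table and both function names are hypothetical, introduced only for illustration.

```python
from collections import defaultdict

# Triples copied from the slide; object values are kept as plain strings.
TRIPLES = [
    ("Obama", "isCommanderInChiefOf", "US Army"),
    ("Obama", "isChildOf", "Obama Sr."),
    ("Obama", "differentFrom", "SportLocal Primul Loc"),
    ("Obama", "expiry", "2008-01-09"),
    ("Obama", "friend", "EscolaDeAikido"),
]

# Hypothetical table for the Cook example: each relation points to the
# next, more general relation together with its argument roles.
GENERALIZES_TO = {
    "Cook": ("Absorb_heat", ("Entity", "Heat_source")),
    "Absorb_heat": ("Becoming", ("Entity", "Cause")),
}


def aggregate(triples):
    """Group (predicate, object) pairs under their subject."""
    by_subject = defaultdict(list)
    for subject, predicate, obj in triples:
        by_subject[subject].append((predicate, obj))
    return dict(by_subject)


def generalization_chain(relation):
    """Follow GENERALIZES_TO from a relation up to the most general one."""
    chain = [relation]
    while chain[-1] in GENERALIZES_TO:
        chain.append(GENERALIZES_TO[chain[-1]][0])
    return chain


if __name__ == "__main__":
    for predicate, obj in aggregate(TRIPLES)["Obama"]:
        print(predicate, "->", obj)
    print(" as ".join(generalization_chain("Cook")))
```

Grouping by subject makes the sparseness visible: five statements about one entity, none of which shares a frame with the others.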

  2. Expertise patterns
     • Evidence that units of expertise are larger than what we get from average linked data triples or from ontology learning
       • “Blinking” effects in reacting to events, in evaluating the actions and theories of others, in understanding context, in interpreting news and ads, etc.
     • How to create a boundary unifying multiple triples/axioms?
     • “Competency questions” try to express these units as requirements
       • Which objects take part in a certain event?
       • Which tasks should be executed in order to achieve a certain goal?
       • What’s the function of that artifact?
       • Does this behaviour conform to a certain rule?
       • What norms are applicable to a certain case?
       • Which norm is superordinate among these?
       • What inflammation is active in what body part with what morphology?
     • But how can we draft the design of an ontology directly from text? Cf. Gentner’s “uniform relational representation is a hallmark of expertise”

  3. Manifestation of frames
     • In knowledge engineering
       • Content/domain/semantic patterns
       • CLIB Components
       • Competency questions
     • In linguistic resources
       • Sentences
       • Sub-categorization frames
       • Lexico-syntactic patterns
       • Lexical frames (FrameNet, VerbNet, etc.)
       • Question patterns
       • Factoids
     • In data
       • Data patterns
       • Query types and views
       • Microformats, schema.org
       • Infoboxes
       • Encyclopedic patterns
       • Tagging patterns
     • In interaction
       • Interaction patterns
       • Lenses
       • HTML templates

  4. How many?
     • We are collecting, reengineering, and aligning frames from different knowledge formats
       • Web data: microformats, microdata, HTML templates
       • Linked data: induced schemas, data patterns
       • (Semantic web) ontologies: extracted modules, query patterns
       • Linguistics: FrameNet, VerbNet, NLP-extracted selectional restrictions, etc.
       • Text: what this presentation is about

  5. Ontology learning: what structures?
     • State of the art: what do we learn?
       • instances (NER): BarackObama
       • classes (sense tagging): Person, or even TelevisionActor
       • relations between instances: CabezaDeVaca departed Spain
       • axioms: mostly taxonomic or disjointness, with some work also on learning restrictions: TelevisionActor disjointWith TVSeries
       • entire ontologies, but only as collections of independently learnt axioms

  6. Kinds of text analyses
     • Lead example:
       • “In early 1527, Cabeza De Vaca departed Spain as the treasurer of the Narvaez royal expedition to occupy the mainland of North America. After landing near Tampa Bay, Florida on April 15, 1528, Cabeza De Vaca and three other men would be the only survivors of the expedition party of 600 men.”

  7. Shallow parsing (Alchemy)
     • Language recognition
       • Language: english
     • Relation (action) extraction
     • Named entity recognition
       • Person (1): the only survivors
       • GeographicFeature (1): Tampa Bay
       • Continent (1): North America
       • Country (1): Spain
       • StateOrCounty (1): Florida
     • Concept tagging
       • Concept Tags (8): United States; Florida; Gulf of Mexico; North America; Tampa, Florida; Narváez expedition; American Civil War; Tampa Bay
     • Subject classification
       • Category: Culture & Politics (confidence: 0.6598)

  8. Entity resolution (Wikifier)
     • Wikipedia-based topic detection
     • Named entity resolution

  9. Multimedia enrichment (Zemanta)
     • Related images
     • Related links and geo-referencing
     • Related news

  10. Linked data positioning (RelFinder)

  11. Deep parsing
     • Usage of syntactic parse trees
     • Formal semantic interpretation on top of syntax

  12. Categorial parsing (C&C)
     w(1, 1, 'In', 'in', 'IN', 'I-PP', 'O', '(S[X]/S[X])/NP').
     w(1, 2, 'early', 'early', 'JJ', 'I-NP', 'I-DAT', 'N/N').
     w(1, 3, '1527', '1527', 'CD', 'I-NP', 'I-DAT', 'N').
     w(1, 4, ',', ',', ',', 'O', 'I-DAT', ',').
     w(1, 5, 'Cabeza', 'Cabeza', 'NNP', 'I-NP', 'I-ORG', 'N/N').
     w(1, 6, 'De', 'De', 'NNP', 'I-NP', 'I-ORG', 'N/N').
     w(1, 7, 'Vaca', 'Vaca', 'NNP', 'I-NP', 'I-ORG', 'N').
     w(1, 8, 'departed', 'depart', 'VBD', 'I-VP', 'O', '((S[dcl]\NP)/PP)/NP').
     w(1, 9, 'Spain', 'Spain', 'NNP', 'I-NP', 'O', 'N').
     w(1, 10, 'as', 'as', 'IN', 'I-PP', 'O', 'PP/NP').
     w(1, 11, 'the', 'the', 'DT', 'I-NP', 'O', 'NP[nb]/N').
     w(1, 12, 'treasurer', 'treasurer', 'NN', 'I-NP', 'O', 'N').
     w(1, 13, 'of', 'of', 'IN', 'I-PP', 'O', '(NP\NP)/NP').
     w(1, 14, 'the', 'the', 'DT', 'I-NP', 'O', 'NP[nb]/N').
     w(1, 15, 'Narvaez', 'Narvaez', 'NNP', 'I-NP', 'I-ORG', 'N/N').
     w(1, 16, 'royal', 'royal', 'NN', 'I-NP', 'O', 'N/N').
     w(1, 17, 'expedition', 'expedition', 'NN', 'I-NP', 'O', 'N').
     w(1, 18, 'to', 'to', 'TO', 'I-VP', 'O', '(S[to]\NP)/(S[b]\NP)').
     w(1, 19, 'occupy', 'occupy', 'VB', 'I-VP', 'O', '(S[b]\NP)/NP').
     w(1, 20, 'the', 'the', 'DT', 'I-NP', 'O', 'NP[nb]/N').
     w(1, 21, 'mainland', 'mainland', 'NN', 'I-NP', 'O', 'N').
     w(1, 22, 'of', 'of', 'IN', 'I-PP', 'O', '(NP\NP)/NP').
     w(1, 23, 'North', 'North', 'NNP', 'I-NP', 'I-LOC', 'N/N').
     w(1, 24, 'America', 'America', 'NNP', 'I-NP', 'I-LOC', 'N').
     w(1, 25, '.', '.', '.', 'O', 'O', '.').
     w(1, 26, 'After', 'after', 'IN', 'I-PP', 'O', '(S[X]/S[X])/NP').
     w(1, 27, 'landing', 'landing', 'NN', 'I-NP', 'O', 'N').
     w(1, 28, 'near', 'near', 'IN', 'I-PP', 'O', '(NP\NP)/NP').
     w(1, 29, 'Tampa', 'Tampa', 'NNP', 'I-NP', 'I-LOC', 'N/N').
     w(1, 30, 'Bay', 'Bay', 'NNP', 'I-NP', 'I-LOC', 'N').
     w(1, 31, ',', ',', ',', 'O', 'O', ',').
     w(1, 32, 'Florida', 'Florida', 'NNP', 'I-NP', 'I-LOC', 'N').
     w(1, 33, 'on', 'on', 'IN', 'I-PP', 'O', '(NP\NP)/NP').
     w(1, 34, 'April', 'April', 'NNP', 'I-NP', 'I-DAT', 'N/N[num]').
     w(1, 35, '15', '15', 'CD', 'I-NP', 'I-DAT', 'N[num]').
     w(1, 36, ',', ',', ',', 'I-NP', 'I-DAT', ',').
     w(1, 37, '1528', '1528', 'CD', 'I-NP', 'I-DAT', 'N\N').
     w(1, 38, ',', ',', ',', 'O', 'O', ',').
     w(1, 39, 'Cabeza', 'Cabeza', 'NNP', 'I-NP', 'I-ORG', 'N/N').
     w(1, 40, 'De', 'De', 'NNP', 'I-NP', 'I-ORG', 'N/N').
     w(1, 41, 'Vaca', 'Vaca', 'NNP', 'I-NP', 'I-ORG', 'N').
     w(1, 42, 'and', 'and', 'CC', 'O', 'O', 'conj').
     w(1, 43, 'three', 'three', 'CD', 'I-NP', 'O', 'N/N').
     w(1, 44, 'other', 'other', 'JJ', 'I-NP', 'O', 'N/N').
     w(1, 45, 'men', 'man', 'NNS', 'I-NP', 'O', 'N').
     w(1, 46, 'would', 'would', 'MD', 'I-VP', 'O', '(S[dcl]\NP)/(S[b]\NP)').
     w(1, 47, 'be', 'be', 'VB', 'I-VP', 'O', '(S[b]\NP)/NP').
     w(1, 48, 'the', 'the', 'DT', 'I-NP', 'O', 'NP[nb]/N').
     w(1, 49, 'only', 'only', 'JJ', 'I-NP', 'O', 'N/N').
     w(1, 50, 'survivors', 'survivor', 'NNS', 'I-NP', 'O', 'N').
     w(1, 51, 'of', 'of', 'IN', 'I-PP', 'O', '(NP\NP)/NP').
     w(1, 52, 'the', 'the', 'DT', 'I-NP', 'O', 'NP[nb]/N').
     w(1, 53, 'expedition', 'expedition', 'NN', 'I-NP', 'O', 'N/N').
     w(1, 54, 'party', 'party', 'NN', 'I-NP', 'O', 'N').
     w(1, 55, 'of', 'of', 'IN', 'I-PP', 'O', '(NP\NP)/NP').
     w(1, 56, '600', '600', 'CD', 'I-NP', 'O', 'N/N').
     w(1, 57, 'men', 'man', 'NNS', 'I-NP', 'O', 'N').
     w(1, 58, '.', '.', '.', 'O', 'O', '.').
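
The `w(...)` facts above can be read back into structured tokens with a short script. This is a sketch, not part of C&C itself: the field names (`sent_id`, `index`, `word`, `lemma`, `pos`, `chunk`, `ner`, `category`) are an assumed reading of the eight positions in each fact, and `parse_facts` and `named_entities` are hypothetical helper names.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    sent_id: int    # first integer in the fact (assumed: sentence/unit id)
    index: int      # token position
    word: str
    lemma: str
    pos: str        # part-of-speech tag
    chunk: str      # chunk tag, e.g. I-NP
    ner: str        # named-entity tag, 'O' when outside an entity
    category: str   # CCG lexical category


# One fact: two integers, then exactly six single-quoted fields.
FACT = re.compile(r"w\((\d+),\s*(\d+),\s*((?:'[^']*'(?:,\s*)?){6})\)\.")
QUOTED = re.compile(r"'([^']*)'")


def parse_facts(text):
    """Turn every w(...) fact found in the text into a Token."""
    tokens = []
    for match in FACT.finditer(text):
        fields = QUOTED.findall(match.group(3))
        tokens.append(Token(int(match.group(1)), int(match.group(2)), *fields))
    return tokens


def named_entities(tokens):
    """Join adjacent tokens that share the same non-'O' NER tag."""
    spans, current, tag = [], [], "O"
    for token in tokens:
        if token.ner != "O" and token.ner == tag:
            current.append(token.word)
            continue
        if current:
            spans.append((tag, " ".join(current)))
        current, tag = ([token.word], token.ner) if token.ner != "O" else ([], "O")
    if current:
        spans.append((tag, " ".join(current)))
    return spans


# Facts 2-9 from the slide; a raw string keeps the backslashes in categories.
SAMPLE = r"""
w(1, 2, 'early', 'early', 'JJ', 'I-NP', 'I-DAT', 'N/N').
w(1, 3, '1527', '1527', 'CD', 'I-NP', 'I-DAT', 'N').
w(1, 4, ',', ',', ',', 'O', 'I-DAT', ',').
w(1, 5, 'Cabeza', 'Cabeza', 'NNP', 'I-NP', 'I-ORG', 'N/N').
w(1, 6, 'De', 'De', 'NNP', 'I-NP', 'I-ORG', 'N/N').
w(1, 7, 'Vaca', 'Vaca', 'NNP', 'I-NP', 'I-ORG', 'N').
w(1, 8, 'departed', 'depart', 'VBD', 'I-VP', 'O', '((S[dcl]\NP)/PP)/NP').
w(1, 9, 'Spain', 'Spain', 'NNP', 'I-NP', 'O', 'N').
"""

if __name__ == "__main__":
    for tag, text in named_entities(parse_facts(SAMPLE)):
        print(tag, text)
```

Note that the tagger output on the slide labels “Cabeza De Vaca” as I-ORG and leaves “Spain” untagged, so any downstream step that relies on these tags inherits those errors.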

  13. Relation extraction with OIE (Etzioni et al.)
     • Extraction: Cabeza De Vaca ___departed___Spain
       • Confidence: 0.9059944442645228
       • Sentence: In early 1527 , Cabeza De Vaca departed Spain as the treasurer of the Narvaez royal expedition to occupy the mainland of North America .
       • POS tags: IN JJ CD , NNP NNP NNP VBD NNP IN DT NN IN DT NNP JJ NN TO VB DT NN IN NNP NNP .
       • Chunk tags: B-PP B-NP I-NP O B-NP I-NP I-NP B-VP B-NP B-PP B-NP I-NP I-NP I-NP I-NP I-NP I-NP B-VP I-VP B-NP I-NP I-NP I-NP I-NP O
       • Normalized: cabeza de vaca___depart___spain
     • Extraction: three other men___would be the only survivors of___the expedition party of 600 men
       • Confidence: 0.36592485864619345
       • Sentence: After landing near Tampa Bay , Florida on April 15 , 1528 , Cabeza De Vaca and three other men would be the only survivors of the expedition party of 600 men .
       • POS tags: IN NN IN NNP NNP , NNP IN NNP CD , CD , NNP NNP NNP CC CD JJ NNS MD VB DT JJ NNS IN DT NN NN IN CD NNS .
       • Chunk tags: B-PP B-NP B-PP B-NP I-NP O B-NP B-PP B-NP I-NP I-NP I-NP O B-NP I-NP I-NP O B-NP I-NP I-NP B-VP I-VP B-NP I-NP I-NP I-NP I-NP I-NP I-NP I-NP I-NP I-NP O
       • Normalized: # other men___be survivor of___the expedition party of # men
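
As a rough illustration of how such output might be consumed, the sketch below splits the `arg1___rel___arg2` strings from the slide and keeps only extractions above a confidence threshold. The `Extraction` class, the function names, and the 0.5 threshold are assumptions made for this example, not part of the OIE tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    arg1: str
    rel: str
    arg2: str
    confidence: float


def parse_extraction(line, confidence):
    """Split an 'arg1___rel___arg2' string into its three parts."""
    arg1, rel, arg2 = (part.strip() for part in line.split("___"))
    return Extraction(arg1, rel, arg2, confidence)


def confident(extractions, threshold=0.5):
    """Keep extractions at or above the (hypothetical) threshold."""
    return [e for e in extractions if e.confidence >= threshold]


# Extraction strings and confidence scores copied from the slide.
EXTRACTIONS = [
    parse_extraction("Cabeza De Vaca ___departed___Spain", 0.9059944442645228),
    parse_extraction(
        "three other men___would be the only survivors of___"
        "the expedition party of 600 men",
        0.36592485864619345,
    ),
]

if __name__ == "__main__":
    for e in confident(EXTRACTIONS):
        print(e.arg1, "|", e.rel, "|", e.arg2)
```

With this threshold only the “departed” relation survives; the second extraction, which also drops Cabeza De Vaca from its subject, is filtered out.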

  14. DRT
     • Discourse Representation Theory
       • Hans Kamp: “A theory of truth and semantic representation”, 1981
     • The meaning of a sentence is taken to be an update operation on a context
     • DRT represents “the discourse context” as a discourse representation structure (or DRS). A DRS includes:
       • A set of referents: the entities which have been introduced into the context
       • A set of conditions: the predicates which are known to hold of these entities
     • Basically (a fragment of) FOL
       • Simplified quantification
     • DRT bound to visible (“in praesentia”) linguistic context
       • “Every look he gives you I get sicker and sicker.” (Tristan to Isolde)
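
The definition above (a DRS as referents plus conditions, and sentence meaning as a context update) can be sketched in a few lines. This is a toy model under stated assumptions: it covers only flat DRSs without embedded boxes, the class and method names are hypothetical, and the pronoun in the second sentence is resolved by hand rather than by any algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class DRS:
    referents: list = field(default_factory=list)   # discourse referents
    conditions: list = field(default_factory=list)  # (predicate, args) pairs

    def update(self, new_referents, new_conditions):
        """Return the context extended with one sentence's contribution."""
        referents = self.referents + [
            r for r in new_referents if r not in self.referents
        ]
        return DRS(referents, self.conditions + list(new_conditions))

    def to_fol(self):
        """Existentially close a flat DRS into a first-order formula."""
        body = " ∧ ".join(
            f"{pred}({', '.join(args)})" for pred, args in self.conditions
        )
        prefix = "".join(f"∃{r}." for r in self.referents)
        return f"{prefix}({body})"


# "John McCarthy died. He invented LISP."
# The pronoun "He" is resolved to x manually for this illustration.
context = DRS()
context = context.update(["x"], [("john_mccarthy", ("x",)), ("die", ("x",))])
context = context.update(["y"], [("lisp", ("y",)), ("invent", ("x", "y"))])

if __name__ == "__main__":
    print(context.to_fol())
```

The second update reuses the referent `x` introduced by the first sentence, which is the sense in which the context, rather than the isolated sentence, carries the meaning.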

  15. DRT notation

  16. Boxer
     • Implementation of computational semantics (J. Bos) with DRT output and Davidsonian predicates: reification of n-ary relations
       • E.g. preparing a coffee: agent, instrument, mix, place, time, method, ...
     • Semantic role labelling with VerbNet or FrameNet roles
     • Pragmatic grasp, with statistical NER and sense tagging (M. Ciaramita’s SST), tense logic, co-reference, presupposition, sentence integration, entailment, ...
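
To make “reification of n-ary relations” concrete, here is a minimal sketch of the Davidsonian move: an event gets its own identifier, and each participant is attached to it through a binary role. The `reify` function and all individual names (`prepare_1`, `barista_1`, and so on) are hypothetical; this does not reproduce Boxer's actual output format.

```python
def reify(event_type, event_id, roles):
    """Rewrite one n-ary event as a typed event node plus binary role facts.

    roles maps a role name to its filler, e.g. {"agent": "barista_1"}.
    """
    facts = [(event_id, "type", event_type)]
    facts.extend((event_id, role, filler) for role, filler in roles.items())
    return facts


# Preparing a coffee, with a few of the roles listed on the slide.
FACTS = reify(
    "Prepare",
    "prepare_1",
    {
        "agent": "barista_1",
        "instrument": "moka_1",
        "place": "kitchen_1",
        "time": "morning_1",
    },
)

if __name__ == "__main__":
    for subject, relation, obj in FACTS:
        print(subject, relation, obj)
```

Because every role is a separate binary fact, further roles (method, mix, ...) can be added later without changing the arity of any predicate.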

  17. DRT semantic parsing + semantic role labeling in Boxer

  18. Issues when porting (Boxer) DRT to RDF
     • Discourse referent variables: implicit or explicit?
     • No terminology recognition/extraction
     • No term compositionality
     • No periphrastic relations for properties (e.g. survivorOf)
     • Redundant variables
     • Missing definitional pragmatics
     • Implicit local restrictions
     • Many types of redundant “boxing” for embedded propositions, non-standard negation, etc.
     • How to effectively map to RDF/OWL?

  19. Using FRED (EKAW2012) http://wit.istc.cnr.it/stlab-tools/fred/

  20. FRED RDF graph output

  21. Another example
     • “The New York Times reported that John McCarthy died. He invented the programming language LISP.”
     • Here the interesting part is resolving co-reference across the two sentences, and dealing with the “surface” meta-level (the declarative proposition of reporting)

  22. Heuristics to map Boxer DRT to RDF (1/2)
     • Default predicates
       • H_pred1: special vocabulary for defaults, e.g. boxer:Per rdf:type owl:Class
       • H_pred2: mappings to existing vocabularies, e.g. foaf:Person ; dul:associatedWith ; time:Interval
     • Default roles (SRL binary predicates)
       • H_rol: VerbNet/FrameNet vocabularies, e.g. verbnet:agent rdf:type owl:ObjectProperty
     • Domain predicates
       • H_dom: default namespace, customizable, e.g. domain:Expedition ; domain:of
     • Discourse referent variables: implicit or explicit?
       • H_ref: only referents of predicates are materialized, e.g. domain:survivor_1 rdf:type domain:Survivor ; domain:depart_1 rdf:type domain:Depart
     • Terminology recognition/extraction
       • H_term: co-referential predicate merging, e.g. domain:ExpeditionParty
     • Term compositionality
       • H_termc: create taxonomies by “unmerging” predicates, e.g. domain:ExpeditionParty rdfs:subClassOf domain:Party
     • Periphrastic relations
       • H_peri: generate “webby” properties, e.g. domain:survivorOf rdf:type owl:ObjectProperty
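
Three of these heuristics (H_ref, H_termc, H_peri) are simple enough to sketch as string-building functions that reproduce the examples on the slide. This is an illustration only: the function names and the CamelCase-splitting rule are assumptions, and FRED's real implementation is not shown in the slides.

```python
import re


def h_ref(predicate, index):
    """H_ref: materialize a referent as a typed individual."""
    name = predicate.lower()
    return (f"domain:{name}_{index}", "rdf:type", f"domain:{name.capitalize()}")


def h_termc(class_name):
    """H_termc: 'unmerge' a CamelCase compound into a subclass axiom.

    Assumption: the head of the compound is its last CamelCase part.
    """
    parts = re.findall(r"[A-Z][a-z0-9]*", class_name)
    if len(parts) < 2:
        return []
    return [(f"domain:{class_name}", "rdfs:subClassOf", f"domain:{parts[-1]}")]


def h_peri(noun, preposition):
    """H_peri: build a 'webby' property name from a noun and a preposition."""
    name = noun.lower() + preposition.capitalize()
    return (f"domain:{name}", "rdf:type", "owl:ObjectProperty")


def to_line(triple):
    """Render a triple as one Turtle-like statement (prefixes not declared)."""
    return " ".join(triple) + " ."


if __name__ == "__main__":
    triples = [h_ref("survivor", 1), h_ref("depart", 1)]
    triples += h_termc("ExpeditionParty")
    triples.append(h_peri("survivor", "of"))
    for triple in triples:
        print(to_line(triple))
```

The printed lines use the same prefixed names as the slide; a real serialization would also need the corresponding `@prefix` declarations.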

  23. Heuristics to map Boxer DRT to RDF (2/2)
     • Getting rid of redundant variables
       • H_red: unify variables based on local Unique Name Assumption
     • Boolean constructs
       • H_boo: special relations between domain predicates or individuals, e.g. domain:man_1 owl:differentFrom domain:man_2
     • Propositions as arguments
       • H_pro: special relation between events, e.g. domain:report_1 vn:theme domain:depart_1
     • Definitional pragmatics
       • H_def: bypassing discourse referents and forcing universal quantification, e.g. domain:WindInstrument rdfs:subClassOf domain:MusicalInstrument
     • Implicit local restrictions (optional)
       • H_res: inducing anonymous classes, e.g. domain:WindInstrument rdfs:subClassOf (locationOf some domain:Contain)
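
The H_boo, H_pro and H_def examples can be reproduced with a similarly small sketch. The function names are hypothetical, and asserting `owl:differentFrom` pairwise over a list of individuals is an assumption about how the heuristic generalizes beyond the two-individual example on the slide.

```python
from itertools import combinations


def h_boo(individuals):
    """H_boo: state that each pair of listed individuals is distinct."""
    return [(a, "owl:differentFrom", b) for a, b in combinations(individuals, 2)]


def h_pro(outer_event, inner_event, role="vn:theme"):
    """H_pro: link an event to the embedded event it takes as argument."""
    return (outer_event, role, inner_event)


def h_def(defined_class, genus_class):
    """H_def: turn a definition into a class-level subclass axiom."""
    return (defined_class, "rdfs:subClassOf", genus_class)


if __name__ == "__main__":
    triples = h_boo(["domain:man_1", "domain:man_2"])
    triples.append(h_pro("domain:report_1", "domain:depart_1"))
    triples.append(h_def("domain:WindInstrument", "domain:MusicalInstrument"))
    for triple in triples:
        print(" ".join(triple), ".")
```

H_def is the one case here that produces a statement about classes rather than individuals, which is what “bypassing discourse referents” amounts to in this sketch.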

  24. Wikipedia typing
     • Typing Wikipedia entities is not easy
       • A lot of resources, limited alignment, limited coverage, different granularity
     • We implemented a new approach by
       • extracting NL definitions
       • producing RDF with FRED
       • refactoring FRED output to distill types
       • linking to DBpedia entities
       • linking to WordNet 3.0
       • linking to SuperSenses and DOLCE+DnS (DUL)

  25. Tìpalo: a FRED app http://wit.istc.cnr.it/stlab-tools/tipalo/

  26. Definition of chaise longue
     • “A chaise longue is an upholstered sofa in the shape of a chair that is long enough to support the legs.”

  27. References
     • Alchemy (shallow parsing): http://www.alchemyapi.com/api/demo.html
     • Wikifier (entity resolution): http://wit.istc.cnr.it/stlab-tools/wikifier
     • Zemanta (multimedia enrichment): http://www.zemanta.com/demo/
     • RelFinder (linked data visualization): http://www.visualdataweb.org/relfinder/demo.swf
     • ReVerb (relation extraction): http://openie.cs.washington.edu/
     • C&C (categorial grammar): http://svn.ask.it.usyd.edu.au/trac/candc/wiki/Demo
     • Boxer (DRT deep parsing): http://svn.ask.it.usyd.edu.au/trac/candc/wiki/boxer
     • FRED (DRT to RDF): http://wit.istc.cnr.it/stlab-tools/fred/
     • Tìpalo (Wikipedia definitions in RDF): http://wit.istc.cnr.it/stlab-tools/tipalo
     • Aemoo (serendipitous search): http://aemoo.org/
     • ontologydesignpatterns.org portal: http://www.ontologydesignpatterns.org
     • Talmy’s fictive motion: http://en.wikipedia.org/wiki/Fictive_motion
     • Pustejovsky’s dot objects: http://pages.cs.brandeis.edu/~jamesp/publications.php
     • Knowledge Architecture paper: http://ceur-ws.org/Vol-782/PresuttiEtAl_COLD2011.pdf
     • FRED paper: http://ekaw2012.ekaw.org/node/137
     • EKP paper: http://www.stlab.istc.cnr.it/documents/papers/wikilinkpatterns.pdf
