Building and Using Knowledge Bases. Steffen Staab, Saqib Mir – European Bioinformatics Institute, Ermelinda d'Oro, Massimo Ruffolo – Univ. Calabria, Italy & WeST Team. Institut WeST – Web Science & Technologies. Semantic Web. Web Retrieval. Social Web. Multimedia Web.
Building and Using Knowledge Bases. Steffen Staab, Saqib Mir – European Bioinformatics Institute, Ermelinda d'Oro, Massimo Ruffolo – Univ. Calabria, Italy & WeST Team
Institut WeST – Web Science & Technologies: Semantic Web, Web Retrieval, Social Web, Multimedia Web, Software Web, GESIS
PhD thesis trauma 17 years ago: „Nach dem Auspacken der LPS 105 präsentiert sich dem Betrachter ein stabiles Laufwerk, das genauso geringe Außenmaße besitzt wie die Maxtor.“ Word-by-word rendering: "Having unwrapped the LPS 105 – reveals itself to the onlooker – a stable disk drive, which has similarly small volume as the Maxtor."
General Motivation: The general motivation is not information extraction, but solving tasks!
General objective: Extracting to LOD useAsExample hasLivedIn • Crucial to know: Ontologies nowadays reflect this structure • Ontologies are • Modular (vs one to rule them all) • Distributed (vs defined in one place) • Connected (vs isolated templates) • Extensible (vs claimed to be finished) • Lightweight (vs computationally intractable) • Popular ones are used more often (vs people disagreeing) • Ontologies – LEGO style
Most famous applications • Steve Macbeth (Microsoft), in a discussion on Schema.org: “about 7% of pages we crawl have mark-up” • http://www.w3.org/2012/06/06-schema-minutes.html • LOD Cloud • Google Knowledge Graph • Bing gets its own knowledge graph: http://searchengineland.com/bing-britannica-partnership-123930
Example ontology-based application 1: Analysis of urban parameters
General objective: Analysing LOD useAsExample hasLivedIn
http://lisa.west.uni-koblenz.de/lisa-demo/ Family's analysis of Koblenz: LOD + OpenStreetMap data
http://lisa.west.uni-koblenz.de/lisa-demo/ Entrepreneur's analysis of Koblenz: LOD + OpenStreetMap data. 1st Prize, German Linked Open Gov Data Competition 2012
Example ontology-based application: Faceted multimedia exploration
Making Web 2.0 More Accessible [figure labels: Links, Location, low- to mid-level features, Persons, Knowledge, Tags, GeoNames] [Schenk et al; JoWS 2009]
Choosing between Koblenz – and Koblenz. Video at: http://vimeo.com/2057249
Persons – Celebrities, FOAFers & Flickr Users. Billion Triples Challenge, 1st Prize 2008 [Schenk et al; JoWS 2009]
Now on to information extraction: Observations on Information Extraction
Challenges & Opportunities for IE: Not all web pages are created equal
Challenges & Opportunities for IE: Some challenges are the same, e.g. finding type instances
Challenges & Opportunities for IE: Some challenges are the same, e.g. finding relation instances
Challenges & Opportunities for IE: Some contain concepts and their descriptions, some don't. No types here, few relation types
Challenges & Opportunities for IE: Knowing that they are instances and of which type • Positional indication • Textual indication
Challenges & Opportunities for IE: To some extent, positional and layout indications work across languages and sites
Challenges & Opportunities for IE: We should not only think about Web pages, but about Web sites (owl:sameAs)
Challenges & Opportunities for IE: We should not only think about Web pages, but about Web sites (owl:sameAs)
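One way to read the owl:sameAs label on this slide: items extracted from different pages of one site may denote the same entity, so page-level items can be grouped into site-level entities. The following is a minimal sketch of that grouping using union-find. It is an illustration only, not the method of the talk; the item identifiers are hypothetical.

```python
def same_as_clusters(pairs):
    """Group identifiers connected by (assumed) owl:sameAs pairs."""
    parent = {}

    def find(x):
        # Register unseen identifiers as their own representative.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical items found on three pages of one site, plus an unrelated pair.
links = [
    ("page1#item", "page2#item"),
    ("page2#item", "page3#item"),
    ("page7#other", "page9#other"),
]
clusters = same_as_clusters(links)
```

Each resulting cluster stands for one site-level entity whose page-level descriptions can then be merged.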
Comparing related work to our objectives. Related work objectives • IE on Web pages • Acquiring instances and relationship instances • IE based on linear text. Our objectives • IE on Web sites • Acquiring items • Classifying items into • Instances • Concepts • Relation instances • Relationships • IE also based on spatial position. There is overlap and of course there are exceptions in related work
Outline The Bio-Case The Social Media Case • Motivation • State-of-the-Art • Core idea of SXPath • Implementation • Evaluation [Oro et al; VLDB 2010]
Presentation-oriented documents • HTML DOM structure is site-specific • Spatial arrangements are rarely explicit • Spatial layout is hidden in complex nesting of layout elements • Intricate DOM tree structures are conceptually difficult to query for the user (or a tool!)
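A small sketch of the problem, assuming a made-up page fragment: the value sits visually right next to its label, yet the query has to spell out the whole site-specific nesting of layout tables. The markup and the path are invented for illustration; the path uses the XPath subset supported by Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: a label/value pair buried in nested layout tables.
PAGE = (
    "<html><body><div><table><tr><td>"
    "<table><tr><td>Price</td><td>42 EUR</td></tr></table>"
    "</td></tr></table></div></body></html>"
)

root = ET.fromstring(PAGE)

# The path mirrors the nesting, not the visual arrangement, and breaks
# as soon as the site wraps the inner table in one more layout element.
cell = root.find("./body/div/table/tr/td/table/tr/td[2]")
value = cell.text
```

A spatial query would instead ask for the cell to the right of the one containing "Price", independent of how deeply it is nested.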
Related Work. Web query languages • XPath 1.0 and XQuery 1.0 • Established • Too difficult to use for scraping from intricate DOM structures. Visual languages • Spatial Graph Grammars [Kong et al.] are quite complex in terms of both usability and efficiency • Algebras for creating and querying multimedia interactive presentations (e.g. ppt) [Subrahmanian et al.]. Web wrapper induction exploiting visual interface [Gottlob et al.] [Sahuguet et al.] • generate XPath location paths of DOM nodes • can benefit from using Spatial XPath
Querying for Relations Among Nodes • Rectangular Cardinal Relations (RCR), e.g. r1 E:NE r2 • Spatial models allow for expressing disjunctive relations among regions • Topological Relations
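To make a relation such as r1 E:NE r2 concrete, here is a minimal sketch that computes which of the nine tiles around a reference rectangle another rectangle overlaps. It assumes screen coordinates (y grows downward, so north means smaller y), rectangles with positive area given as (left, top, right, bottom), and an output order of tile names chosen for this sketch. The formal model in the paper may differ in these details.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)

# Output order of tile names; an arbitrary choice made for this sketch.
TILE_ORDER = ["B", "N", "S", "W", "E", "NW", "NE", "SW", "SE"]


def cardinal_relation(a: Rect, ref: Rect) -> str:
    """Tiles of ref's nine-tile grid that rectangle a overlaps, e.g. 'E:NE'."""
    ax1, ay1, ax2, ay2 = a
    rx1, ry1, rx2, ry2 = ref

    cols = []
    if ax1 < rx1:
        cols.append("W")   # part of a lies west of ref
    if ax1 < rx2 and ax2 > rx1:
        cols.append("")    # part of a lies in ref's column
    if ax2 > rx2:
        cols.append("E")   # part of a lies east of ref

    rows = []
    if ay1 < ry1:
        rows.append("N")   # part of a lies above ref
    if ay1 < ry2 and ay2 > ry1:
        rows.append("")    # part of a lies in ref's row
    if ay2 > ry2:
        rows.append("S")   # part of a lies below ref

    # A rectangle is a product of intervals, so it overlaps exactly the
    # combinations of its occupied rows and columns; "B" is ref's own box.
    tiles = {(row + col) or "B" for row in rows for col in cols}
    return ":".join(t for t in TILE_ORDER if t in tiles)


r2 = (0, 10, 10, 20)   # reference region
r1 = (12, 5, 18, 15)   # to the right of r2, sticking out above it
relation = cardinal_relation(r1, r2)
```

A multi-tile result such as E:NE describes a single region spanning several tiles, which is why one relation can cover layouts that do not line up exactly.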
From XPath 1.0 towards Spatial Querying with SXPath SXPath features • adopts intuitive path notation: • axis::nodetest [pred]* • adds to XPath • spatial axes • spatial position functions • natural semantics for spatial querying
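The path notation axis::nodetest [pred]* can be illustrated with a small sketch that splits one location step into its three parts. The example step uses the standard XPath 1.0 axis `descendant`; the names of the spatial axes are not given on the slide and are therefore not guessed here. The helper `parse_step` is hypothetical and does not handle nested brackets inside predicates.

```python
import re

# axis :: nodetest, followed by zero or more [predicate] groups.
STEP_RE = re.compile(r"^([A-Za-z-]+)::([^\[\]]+?)\s*((?:\[[^\[\]]*\])*)$")


def parse_step(step: str):
    """Split 'axis::nodetest[pred]*' into (axis, nodetest, [predicates])."""
    match = STEP_RE.match(step.strip())
    if match is None:
        raise ValueError(f"not a location step: {step!r}")
    axis, nodetest, preds = match.groups()
    return axis, nodetest, re.findall(r"\[([^\[\]]*)\]", preds)


parsed = parse_step("descendant::td[@class='label'][1]")
```

Under this reading, a spatial extension only needs to accept additional axis names and position functions in the same three-part shape.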
Complexity Results • Formal model defined in the paper [Oro et al; VLDB 2010]
Outline The Bio-Case • Motivation • The (Biochemical) Deep Web • Contributions • Page-level wrapper induction • Site-wide wrapper generation • Error Correction by Mutual Reinforcement • Conclusions and Future Directions The Social Media Case • Motivation • State-of-the-Art • Core idea of SXPath • SXPath Language • Spatial Data Model • Syntax & Semantics • Complexity • Implementation • Evaluation