Swoogle: search and metadata for the semantic web
Presented by eBiquity group, UMBC
CIKM’04, Nov 12, 2004
Partial research support was provided by DARPA contract F30602-00-0591 and by NSF awards NSF-ITR-IIS-0326460 and NSF-ITR-IDM-0219649.
Outline
• Motivation
• Concepts
• Demo
• Architecture
  • document discovery
  • metadata creation
  • ontology rank
• Status
• Summary
http://swoogle.umbc.edu/
Swoogle, cikm'04 -- http://swoogle.umbc.edu/
Motivation
• (Google + Web) has made us all smarter
• Something similar is needed by people and software agents for information on the semantic web
Motivation – Common Questions
• Find an ontology
  • What are the ontologies about “time”?
  • Shall I use an existing ontology or create one?
• Find instance data
  • Show me the instances of a class “http://foo.com/Person”.
  • Gather relevant information for my application.
• Characterize the Semantic Web
  • How many RDF documents are online?
  • What are the most popular ontologies?
  • What graph properties does the semantic web have?
  • Does a namespace URI link to the corresponding ontology?
The Role of Swoogle in the Semantic Web
[Figure: Swoogle as a Directory/Digest Service. Software agents and applications use and search it as a Service Finder and Data Finder; it digests Semantic Web services, SW data services, RDF documents, databases, and (Web) documents.]
Related work
• Ontology based annotation & search
  • Annotate web documents: SHOE (UMCP, 1997), Ontobroker (AIFB, Karlsruhe, 1998), WebKB (Martin & Eklund, 1999), QuizRDF (BT, 2002)
  • Annotate proper reference & relations: CREAM (AIFB, 2003)
• Ontology repositories
  • Ontology level: DAML Ontology Library, Schema Web, SemWebCentral
  • Term level: W3C’s Ontaria (2004)
• Ontology management systems
  • Stanford’s Ontolingua
  • IBM’s Snobase
Swoogle aims to be a Google-like online ontology repository
• Based on both ontology and instance documents
• Automated discovery
• Search and rank ontologies and terms
• Digest but not store
• Create metadata based on RDF and OWL semantics
• Provide services to both human and software agents
Concepts
• Document
  • A Semantic Web Document (SWD) is an online document written in a semantic web language (i.e. RDF or OWL).
  • An ontology document (SWO) is a SWD that contains mostly term definitions (i.e. classes and properties). It corresponds to the T-Box in Description Logic.
  • An instance document (SWI or SWDB) is a SWD that contains mostly class individuals. It corresponds to the A-Box in Description Logic.
• Term
  • A term is a non-anonymous RDF resource which is the URI reference of either a class or a property.
  • Example: foaf:Person rdf:type rdfs:Class
• Individual
  • An individual is a non-anonymous RDF resource which is the URI reference of a class member.
  • Example: http://.../foaf.rdf#finin rdf:type foaf:Person
In Swoogle, a document D is a valid SWD iff JENA* correctly parses D and produces at least one triple.
*JENA is a Java framework for writing Semantic Web applications. http://www.hpl.hp.com/semweb/jena2.htm
Concepts Example
[Figure: an example SWD split into an ontology part (SWO) and an instance part (SWI)]
• SWO http://xmlns.com/foaf/1.0/ (terms)
  • Class: foaf:Person rdf:type rdfs:Class
  • foaf:Person rdfs:subClassOf wordNet:Agent
  • Property: foaf:mbox rdf:type rdf:Property
  • foaf:mbox rdfs:domain foaf:Person
• SWI http://foo.com/foaf.rdf (individuals)
  • Individual: http://foo.com/foaf.rdf#finin rdf:type foaf:Person
  • http://foo.com/foaf.rdf#finin foaf:mbox finin@umbc.edu
NOTE: Qualified Names (QNames) are used to shorten well-known namespaces as follows
• rdf: => http://www.w3.org/1999/02/22-rdf-syntax-ns#
• rdfs: => http://www.w3.org/2000/01/rdf-schema#
• foaf: => http://xmlns.com/foaf/1.0/
• wordNet: => http://xmlns.com/wordnet/1.6/
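The QName shorthand in the note can be illustrated with a small sketch. The prefix table is the one listed on the slide (the rdfs namespace is written with its trailing "#"); the function name expand_qname is hypothetical and not part of Swoogle.

```python
# Prefix table taken from the slide's note; expand_qname is an illustrative name.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/1.0/",
    "wordNet": "http://xmlns.com/wordnet/1.6/",
}


def expand_qname(qname, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full URI reference.

    Strings whose prefix is not in the table (for example full http URIs)
    are returned unchanged.
    """
    prefix, sep, local = qname.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return qname
```

With this table, foaf:Person expands to http://xmlns.com/foaf/1.0/Person, while http://foo.com/foaf.rdf#finin is left as it is.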
Demo
1. Find “Time” Ontology (Swoogle Search)
2. Digest “Time” Ontology
  • Document view
  • Term view
3. Find Term “Person” (Ontology Dictionary)
4. Digest Term “Person”
  • Class properties
  • (Instance) properties
5. Swoogle Statistics
Demo1 Find “Time” Ontology
We can use a set of keywords to search for an ontology. For example, “time”, “before”, and “after” are basic concepts of a “Time” ontology.
Usage of Terms in SWD
[Figure: the same terms as used in instance documents and as defined in the ontology]
• Instance documents http://www.cs.umbc.edu/~finin/foaf.rdf and http://foo.com/foaf.rdf
  • defined Individual: http://foo.com/foaf.rdf#finin
  • populated Class: foaf:Person (object of rdf:type)
  • populated Property: foaf:mbox (value finin@umbc.edu)
• Ontology http://xmlns.com/foaf/1.0/
  • defined Class: foaf:Person (rdf:type rdfs:Class, rdfs:subClassOf wordNet:Agent)
  • defined Property: foaf:mbox (rdf:type rdf:Property, rdfs:domain foaf:Person)
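The defined/populated distinction can be sketched over plain triples. This is a minimal illustration, not Swoogle's implementation: triples are (subject, predicate, object) tuples in QName form, and treating rdfs:Class/owl:Class and rdf:Property as the only defining types is an assumption made for the example.

```python
# Assumption: these type objects mark a definition; everything else is usage.
CLASS_TYPES = {"rdfs:Class", "owl:Class"}
PROPERTY_TYPES = {"rdf:Property"}
LANGUAGE_PREFIXES = ("rdf:", "rdfs:", "owl:")


def term_usage(triples):
    """Classify how one document uses terms: defined versus populated."""
    usage = {
        "defined_classes": set(),
        "defined_properties": set(),
        "populated_classes": set(),
        "populated_properties": set(),
    }
    for s, p, o in triples:
        if p == "rdf:type":
            if o in CLASS_TYPES:
                usage["defined_classes"].add(s)
            elif o in PROPERTY_TYPES:
                usage["defined_properties"].add(s)
            else:
                # An individual is declared to be a member of class o.
                usage["populated_classes"].add(o)
        elif not p.startswith(LANGUAGE_PREFIXES):
            # A non-language predicate is instantiated on some resource.
            usage["populated_properties"].add(p)
    return usage


# Triples from the example slide.
ONTOLOGY = [
    ("foaf:Person", "rdf:type", "rdfs:Class"),
    ("foaf:Person", "rdfs:subClassOf", "wordNet:Agent"),
    ("foaf:mbox", "rdf:type", "rdf:Property"),
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
]
INSTANCE = [
    ("http://foo.com/foaf.rdf#finin", "rdf:type", "foaf:Person"),
    ("http://foo.com/foaf.rdf#finin", "foaf:mbox", "finin@umbc.edu"),
]
```

Running term_usage on ONTOLOGY reports foaf:Person and foaf:mbox as defined; running it on INSTANCE reports the same two terms as populated.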
Demo2(a) Digest “Time” Ontology (term view)
Terms shown: TimeZone, before, …, intAfter
Document Metadata
• Web document metadata
  • When/how discovered/fetched
  • Suffix of URL
  • Last modified time
  • Document size
• SWD metadata
  • Language features: OWL species, RDF encoding
  • Statistical features: defined/used terms, declared/used namespaces, Ontology Ratio, Ontology Rank
  • Ontology annotation: Label, Version, Comment
• Related relational metadata
  • Links to other SWDs: imported SWDs, referenced SWDs, extended SWDs, prior version
  • Links to terms: classes/properties defined/used
Demo2(b) Digest “Time” Ontology (document view)
Demo3 Find Term “Person” Not capitalized! URIref is case sensitive!
Term Metadata: An integrated definition
[Figure: triples about foaf:Person spread across Onto 1, Onto 2, and SWD3: rdf:type owl:Class, rdfs:subClassOf foaf:Agent, rdfs:label “Person”, rdfs:domain from foaf:mbox and foaf:name, and an instance with foaf:name “Tim Finin”]
foaf:Person
• Class Definition
  • rdfs:subClassOf -- foaf:Agent
  • rdfs:label -- “Person”
• Properties (from SWO)
  • foaf:mbox
  • foaf:name
• Properties (from SWI)
  • foaf:name
  • dc:title
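How such an integrated definition could be assembled is sketched below. This is an illustration under assumptions: the split of triples between Onto 1 and Onto 2, the subject ex:finin, and the dc:title value are made up for the example, and integrate_class is a hypothetical name.

```python
def integrate_class(cls, documents):
    """Merge what several documents say about one class.

    documents maps a document name to a list of (subject, predicate, object)
    triples. Properties from ontologies come from rdfs:domain statements;
    properties from instance data are the predicates used on members of cls.
    """
    meta = {
        "subClassOf": set(),
        "label": set(),
        "properties_from_swo": set(),
        "properties_from_swi": set(),
    }
    for triples in documents.values():
        # Individuals that this document declares to be members of cls.
        members = {s for s, p, o in triples if p == "rdf:type" and o == cls}
        for s, p, o in triples:
            if s == cls and p == "rdfs:subClassOf":
                meta["subClassOf"].add(o)
            elif s == cls and p == "rdfs:label":
                meta["label"].add(o)
            elif p == "rdfs:domain" and o == cls:
                meta["properties_from_swo"].add(s)
            elif s in members and p != "rdf:type":
                meta["properties_from_swi"].add(p)
    return meta


# Hypothetical assignment of the slide's triples to the three documents.
DOCUMENTS = {
    "Onto 1": [
        ("foaf:mbox", "rdfs:domain", "foaf:Person"),
        ("foaf:name", "rdfs:domain", "foaf:Person"),
    ],
    "Onto 2": [
        ("foaf:Person", "rdf:type", "owl:Class"),
        ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
        ("foaf:Person", "rdfs:label", "Person"),
    ],
    "SWD3": [
        ("ex:finin", "rdf:type", "foaf:Person"),
        ("ex:finin", "foaf:name", "Tim Finin"),
        ("ex:finin", "dc:title", "Professor"),  # made-up value
    ],
}
```

Calling integrate_class("foaf:Person", DOCUMENTS) yields the same four groups as the bullets above.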
Demo4 Digest Term “Person” 167 different properties 562 different properties
Demo5 Swoogle Statistics
Swoogle Architecture
[Figure: architecture diagram. Stages: SWD discovery, metadata creation, data analysis, interface. Components: The Web, Web Crawler, Candidate URLs, SWD Reader, SWD Cache, SWD Metadata, IR analyzer, SWD analyzer, Web Server, Web Service, Agent Service.]
1. SWD Discovery
• Swoogle uses three crawlers to discover likely SWD URLs
  • A Google Crawler uses Google to find URLs using
    • keywords: http://www.w3.org/2000/01/rdf-schema,...
    • file type suffixes: .rdf, .owl
  • A Focused Crawler crawls through HTML files recursively within a given website.
  • A SWD Crawler crawls through SWDs and discovers URLs according to term semantics.
• To determine the likely SWD URLs:
  • Non-SWD extension filter: .jpg, .mp3, etc.
  • Protocol filter: file://, urn:, etc.
  • Namespace of RDF resources in SWD
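The extension and protocol filters could look like the sketch below. Only the extensions and schemes named on the slide are listed; the real filter lists are longer ("etc."), and the function name is hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# Only the examples named on the slide; the full lists are not given there.
NON_SWD_EXTENSIONS = {".jpg", ".mp3"}
REJECTED_SCHEMES = {"file", "urn"}


def is_candidate_swd_url(url):
    """Return True if the URL passes the protocol and extension filters."""
    parts = urlparse(url)
    if parts.scheme.lower() in REJECTED_SCHEMES:
        return False
    # The fragment (#finin) is not part of parts.path, so it cannot
    # hide the extension.
    extension = posixpath.splitext(parts.path)[1].lower()
    return extension not in NON_SWD_EXTENSIONS
```

A URL that passes is only a candidate; whether it is a SWD is decided later by parsing it.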
2. Metadata Creation
• Document metadata
  • General metadata
  • SWD metadata
  • Ontology metadata
• Term metadata (definition)
  • Class property
  • (Instance) property: i.e. class-property bond
• Relational metadata
2.1 Ontology Ratio
• Why?
  • The distinction between ontology and instance documents is fuzzy
• Given a SWD foo, let
  • C(foo): the set of classes defined in foo
  • P(foo): the set of properties defined in foo
  • I(foo): the set of instances defined in foo
• Ontology Ratio as a heuristic for the classification: $R(foo) = \frac{|C(foo)| + |P(foo)|}{|C(foo)| + |P(foo)| + |I(foo)|}$
  • 0: pure SWI
  • 1: pure SWO
  • > 0.8: foo is said to be an ontology.
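A minimal sketch of the ratio, assuming it is the share of defined classes and properties among everything the document defines, (|C| + |P|) / (|C| + |P| + |I|), which matches the endpoints 0 (pure SWI) and 1 (pure SWO). Returning 0.0 for a document that defines nothing is a choice made for the example.

```python
def ontology_ratio(classes, properties, instances):
    """Share of term definitions among all definitions in a SWD."""
    terms = len(classes) + len(properties)
    total = terms + len(instances)
    if total == 0:
        return 0.0  # assumption: an empty document is not an ontology
    return terms / total


def is_ontology(ratio, threshold=0.8):
    """Apply the slide's cut-off: strictly above 0.8 counts as an ontology."""
    return ratio > threshold
```

For example, a document defining 8 classes, 1 property, and 1 instance has ratio 0.9 and is classified as an ontology.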
2.2 Relational Metadata
• Inter-document relation
  • rdfs:seeAlso
  • IMport (IM), e.g. owl:imports
  • Similar/Equal SWD
• Inter-term relation
  • EXtension (EX), e.g. rdfs:subClassOf
  • use-TerM (TM), e.g. rdfs:range
  • use-INdividual (IN), e.g. owl:sameAs
  • Prior Version (PV, IPV, CPV)
• Generalized inter-document relations
  • Generalized from individual-level relations
  • Capture more relations with less complexity
• Usage
  • Link SWDs
  • Ontology rank
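Generalizing term-level relations to document-level links can be sketched as follows. The predicate-to-relation table holds only the examples from the slide plus rdf:type as a TM link (as in the ontology rank example) and owl:priorVersion for PV; the namespace heuristic and the function names are assumptions for the illustration.

```python
# Assumption: a small predicate table; Swoogle's full mapping is not on the slide.
PREDICATE_RELATION = {
    "owl:imports": "IM",
    "rdfs:subClassOf": "EX",
    "rdfs:range": "TM",
    "rdf:type": "TM",
    "owl:sameAs": "IN",
    "owl:priorVersion": "PV",
}
DOCUMENT_LEVEL = {"IM", "PV"}  # the object is already a document URL


def namespace_of(uri):
    """Guess the document that defines a term: cut at '#', else at the last '/'."""
    if "#" in uri:
        return uri.split("#", 1)[0]
    return uri.rsplit("/", 1)[0] + "/"


def document_relations(doc_url, triples):
    """Lift term-level triples to (source document, relation, target document)."""
    relations = set()
    for _subject, predicate, obj in triples:
        kind = PREDICATE_RELATION.get(predicate)
        if kind is None:
            continue
        target = obj if kind in DOCUMENT_LEVEL else namespace_of(obj)
        if target != doc_url:  # ignore links a document makes to itself
            relations.add((doc_url, kind, target))
    return relations
```

Many term-level triples collapse into a few document-level links, which is what keeps the resulting graph small enough to rank.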
3. Data analysis: Ranking SWD
[Figure: kinds of web content: audio files, video files, images, HTML documents, SWOs, SWIs]
• Why?
  • Ranking captures page importance and popularity
  • Ranking has proven useful in HTML search
  • A SWD is different from an HTML page and carries more semantics
  • So, a new SWD ranking mechanism is needed!
• Related ideas?
  • Google’s PageRank
  • Kleinberg’s HITS
3.1 Random surfer model (PageRank)
• How is PageRank computed?
  • Page A’s rank is $PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \dots + \frac{PR(T_n)}{C(T_n)} \right)$
  • where
    • {T_i} are the pages that link to A
    • C(X): number of page X’s out-links
    • d is a damping factor (e.g., 0.85)
  • Compute by iterating until convergence
• A uniform probability of following any link is the convention on the Web but does not fit the SW
  • Links have semantics that influence the probability of following them
  • Rational users read an ontology and all ontologies it references
[Figure: random surfer flow chart. Jump to a random page, read the page, then ask "bored?": if yes, jump to a random page again; if no, follow a random link.]
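The iteration can be sketched in a few lines. This follows the original PageRank formulation, PR(A) = (1 - d) + d * sum(PR(T_i) / C(T_i)); the function name, the starting value of 1.0, and the stopping tolerance are choices made for the example.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=1000):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) until it stops changing.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    inbound = {p: [] for p in pages}
    for source, targets in links.items():
        for target in targets:
            inbound[target].append(source)
    out_count = {p: len(links.get(p, ())) for p in pages}

    rank = {p: 1.0 for p in pages}
    for _ in range(max_iter):
        new_rank = {
            p: (1 - d) + d * sum(rank[t] / out_count[t] for t in inbound[p])
            for p in pages
        }
        change = max(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank
```

In this sketch every out-link of a page is followed with the same probability 1 / C(T), which is exactly the uniform assumption the slide questions for the semantic web.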
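3.2 Rational Random Surfer Model
• Weighted random behavior (step 1): the probability of following a link depends on the link's semantics
• Rational behavior (step 2): after reading a SWO, the surfer also reads the SWOs it references
• Rank of a SWI: $PR(A) = rawPR(A)$
• Rank of a SWO: $PR(A) = \sum_{x \in TC(A)} rawPR(x)$, where TC(A) is the transitive closure of SWOs referencing A.
[Figure: rational random surfer flow chart. Jump to a random page, read the page, then ask "SWO?": if yes, read the referenced SWOs (2). Then ask "bored?": if yes, jump to a random page; if no, follow a random link (1).]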
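The weighted random step can be sketched as a PageRank variant in which each link type carries a weight. The formula used here is an assumption, since the slide text does not show it: rawPR(A) = (1 - d) + d * sum over documents x linking to A of rawPR(x) * f(x, A) / f(x), where f(x, A) is the total weight of links from x to A and f(x) the total weight of all links leaving x. The weight values in the usage note are hypothetical.

```python
def raw_rank(links, weights, d=0.85, iterations=100):
    """Weighted PageRank sketch.

    links is a list of (source, relation, target) tuples; weights maps a
    relation type such as "EX" or "TM" to a positive number.
    """
    pages = {page for source, _rel, target in links for page in (source, target)}
    pair_weight = {}                        # f(x, a): weight of links x -> a
    total_weight = {p: 0.0 for p in pages}  # f(x): weight of all links leaving x
    for source, relation, target in links:
        w = weights[relation]
        pair_weight[(source, target)] = pair_weight.get((source, target), 0.0) + w
        total_weight[source] += w

    raw = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_raw = {p: 1 - d for p in pages}
        for (source, target), w in pair_weight.items():
            new_raw[target] += d * raw[source] * w / total_weight[source]
        raw = new_raw
    return raw
```

With weights such as {"EX": 3, "TM": 1}, a surfer leaving a document follows an extension link three times as often as a term-use link; with all weights equal, the sketch reduces to the uniform random surfer.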
3.3 Ontology Rank Example
[Figure: four documents linked by term-level relations]
• http://www.cs.umbc.edu/~finin/foaf.rdf: an individual with rdf:type foaf:Person and foaf:mbox finin@umbc.edu
• http://xmlns.com/foaf/1.0/: foaf:Person rdf:type rdfs:Class (TM) and foaf:Person rdfs:subClassOf wordNet:Person (EX)
• http://xmlns.com/wordnet/1.6/: wordNet:Person rdf:type rdfs:Class (TM) and wordNet:Person rdfs:subClassOf wordNet:Individual
• http://www.w3.org/2000/01/rdf-schema: defines rdfs:Class, rdfs:subClassOf, and rdf:Property as used above
3.3 Ontology Rank Example (cont’d)
• http://www.w3.org/2000/01/rdf-schema: rawPR = 300, PR = 403
• http://xmlns.com/wordnet/1.6/: rawPR = 3, PR = 103
• http://xmlns.com/foaf/1.0/: rawPR = 100, PR = 100
• http://www.cs.umbc.edu/~finin/foaf.rdf: rawPR = 0.2, PR = 0.2
Links between the documents:
• wordnet/1.6/ TM rdf-schema
• foaf/1.0/ EX wordnet/1.6/
• foaf/1.0/ TM rdf-schema
• ~finin/foaf.rdf TM foaf/1.0/
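The PR values in this example can be reproduced from the rawPR values with a short sketch: a SWI keeps its rawPR, and a SWO sums rawPR over itself and every SWO that references it directly or transitively. Counting the SWO itself in the closure is an assumption, but it is the reading that matches the numbers (403 = 300 + 3 + 100). Function and variable names are hypothetical.

```python
def referencing_closure(target, references, ontologies):
    """Return target plus every SWO that references it directly or transitively."""
    closure = {target}
    stack = [target]
    while stack:
        current = stack.pop()
        for source, targets in references.items():
            if current in targets and source in ontologies and source not in closure:
                closure.add(source)
                stack.append(source)
    return closure


def ontology_rank(raw, references, ontologies):
    """SWI: PR = rawPR. SWO: PR = sum of rawPR over its referencing closure."""
    ranks = {}
    for doc, value in raw.items():
        if doc in ontologies:
            closure = referencing_closure(doc, references, ontologies)
            ranks[doc] = sum(raw[x] for x in closure)
        else:
            ranks[doc] = value
    return ranks


RDFS = "http://www.w3.org/2000/01/rdf-schema"
WORDNET = "http://xmlns.com/wordnet/1.6/"
FOAF = "http://xmlns.com/foaf/1.0/"
FININ = "http://www.cs.umbc.edu/~finin/foaf.rdf"

RAW = {RDFS: 300, WORDNET: 3, FOAF: 100, FININ: 0.2}
REFERENCES = {FOAF: {WORDNET, RDFS}, WORDNET: {RDFS}, FININ: {FOAF}}
ONTOLOGIES = {RDFS, WORDNET, FOAF}
```

Here ontology_rank(RAW, REFERENCES, ONTOLOGIES) gives 403 for rdf-schema, 103 for wordnet, 100 for foaf, and 0.2 for the instance document, which is not counted in any SWO's sum because it is not an ontology.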
Current Status
• Swoogle Watch reported (Nov 7, 2004)
  • 40 M triples
  • 270 K SWDs: 4 K ontologies
  • 144 K terms: 91 K classes & 51 K properties
• Ongoing work
  • Ontology Dictionary
  • Swoogle Statistics
  • Web Service interface (see Swoogle website)
  • IR with the Semantic Web (Content search)
    • Character N-Grams
    • Bag of URIrefs
    • Swangling
Summary
• Swoogle (Mar, 2004)
  • Automated SWD discovery
  • SWD metadata creation and search
  • Ontology rank (rational surfer model)
  • Swoogle watch
  • Web Interface
• Swoogle2 (Sep, 2004)
  • Ontology dictionary
  • Swoogle statistics
  • Web service interface (WSDL)
  • Bag of URIref IR search
• Swoogle3 (2005)
  • Better crawl & refresh strategies
  • More metadata (ontology mapping)
  • More IR features
  • Better web service interfaces
  • Capture and store all triples
  • More reasoning
The End
• Website: http://swoogle.umbc.edu
• Slides at: http://ebiquity.umbc.edu/v2.1/resource/html/id/66/
• Demo: http://ebiquity.umbc.edu/v2.1/resource/html/id/65/
Questions?