
Swoogle Tutorial (Part I: Swoogle R & D)



Presentation Transcript


  1. Swoogle Tutorial (Part I: Swoogle R & D)
     Presented by eBiquity Lab, CSEE, UMBC
     • A brief introduction to Swoogle
     • An overview of Swoogle research
     • A summary of Swoogle development

  2. 1. Introduction
     • Motivation
     • Swoogle in the Semantic Web
     • Glossary
     • Swoogle Architecture

  3. Motivation
     • (Google + Web) has made us all smarter
     • Something similar is needed by people and software agents for information on the Semantic Web

  4. The Role of Swoogle in the Semantic Web
     [Figure: software agents and applications use and search Swoogle, which acts as a directory/digest service with a service finder and a data finder. Swoogle digests Semantic Web services and Semantic Web data: RDF documents, SW data services, databases, and (Web) documents.]

  5. Concepts Explained
     [Figure: an example RDF graph.
      • SWI (instance document): the individual http://foo.com/foaf.rdf#finin has rdf:type foaf:Person and foaf:mbox finin@umbc.edu.
      • SWO (ontology document) http://xmlns.com/foaf/1.0/: defines the terms foaf:Person (a class: rdf:type rdfs:Class, rdfs:subClassOf wordNet:Agent) and foaf:mbox (a property: rdf:type rdf:Property, with an rdfs:domain link).
      • SWD covers both kinds of document.]
     NOTE: Qualified Names (QNames) are used to shorten well-known namespaces as follows:
       rdf:     => http://www.w3.org/1999/02/22-rdf-syntax-ns#
       rdfs:    => http://www.w3.org/2000/01/rdf-schema#
       foaf:    => http://xmlns.com/foaf/1.0/
       wordNet: => http://xmlns.com/wordnet/1.6/
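The QName convention in the NOTE can be sketched in a few lines of Python. This is only an illustration: the function names `expand_qname` and `shorten_uri` are hypothetical, the prefix table is copied from the slide, and the rdfs: entry is written with its trailing '#' as in the W3C namespace.

```python
# Prefix table taken from the slide's NOTE (rdfs: written with its trailing '#').
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/1.0/",
    "wordNet": "http://xmlns.com/wordnet/1.6/",
}


def expand_qname(qname, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full URIref; return other strings unchanged."""
    prefix, sep, local = qname.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return qname


def shorten_uri(uri, prefixes=PREFIXES):
    """Inverse of expand_qname: use the longest matching namespace, if any."""
    best = None
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is None:
        return uri
    return best[0] + ":" + uri[len(best[1]):]
```

For example, `expand_qname("foaf:Person")` yields the full URIref under the foaf: namespace, while a full URL such as http://foo.com/foaf.rdf#finin is returned unchanged because "http" is not a registered prefix.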

  6. Glossary
     • Document
       • A Semantic Web Document (SWD) is an online document written in Semantic Web languages (e.g. RDF and OWL).
       • An ontology document (SWO) is an SWD that contains mostly term definitions (i.e. classes and properties). It corresponds to the T-Box in Description Logic.
       • An instance document (SWI or SWDB) is an SWD that contains mostly class individuals. It corresponds to the A-Box in Description Logic.
     • Term
       • A term is a non-anonymous RDF resource that is the URI reference of either a class or a property.
     • Individual
       • An individual is a non-anonymous RDF resource that is the URI reference of a class member.
     In Swoogle, a document D is a valid SWD iff Jena* parses D correctly and produces at least one triple.
     *Jena is a Java framework for writing Semantic Web applications. http://www.hpl.hp.com/semweb/jena2.htm
     [Figure: foaf:Person rdf:type rdfs:Class (a term); http://.../foaf.rdf#finin rdf:type foaf:Person (an individual).]
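The glossary's distinctions can be illustrated with a small sketch over already-parsed triples. Several things here are assumptions rather than Swoogle's actual rules: the function name `classify_document`, the sets of type URIs treated as class or property declarations, the "_:" marker for anonymous resources, and the reading of "mostly" as a simple majority. Triples are plain (subject, predicate, object) tuples written with QNames.

```python
# Types whose instances are treated as terms (assumed, not taken from Swoogle).
CLASS_TYPES = {"rdfs:Class", "owl:Class"}
PROPERTY_TYPES = {"rdf:Property", "owl:ObjectProperty", "owl:DatatypeProperty"}


def classify_document(triples):
    """Return 'not an SWD', 'SWO', or 'SWI' for a list of (s, p, o) triples."""
    if not triples:
        # The slide requires at least one triple for a valid SWD.
        return "not an SWD"
    terms, individuals = set(), set()
    for subject, predicate, obj in triples:
        if predicate != "rdf:type" or subject.startswith("_:"):
            continue  # only typed, non-anonymous resources are counted
        if obj in CLASS_TYPES or obj in PROPERTY_TYPES:
            terms.add(subject)
        else:
            individuals.add(subject)
    # Assumed majority rule; ties fall back to 'SWI'.
    return "SWO" if len(terms) > len(individuals) else "SWI"
```

In Swoogle itself the validity check is performed by parsing with Jena; this sketch starts from triples that are assumed to have been parsed already.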

  7. Swoogle Architecture
     [Figure: four groups of components.
      • SWD discovery: Web Crawler, SWD Reader, Candidate URLs (from the Web)
      • Metadata creation: SWD Cache, SWD Metadata
      • Data analysis: IR analyzer, SWD analyzer
      • Interface: Web Server, Web Service, Agent Service]

  8. 2. Swoogle Research
     • Discovery
     • Digest
     • Search & Navigation
     • Rank
     • Statistics

  9. Discovery -- research
     • Discovering URLs of possible SWDs automatically
       • Google-crawler
       • Focused-crawler
       • Semantic-Web-crawler, e.g. scutter
     • Revisiting URLs

  10. Discovery -- results
      • Crawler performance
        • The Google crawler performs best
        • The focused crawler needs to be improved
      • Verified pure SWDs make up only 1/3 of discovered URLs
      • Some NSWDs contain embedded RDF graphs
      Source: Swoogle (2005-Jan-05)
      SELECT `discovered_by`, sum(isRDF), sum(1-isRDF), count(*) FROM `digest_url` WHERE 1 GROUP BY discovered_by
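To see what the query on this slide computes, it can be run against a tiny in-memory table. The sketch below uses Python's sqlite3 module; the table layout (`url`, `discovered_by`, `isRDF`), the helper name `crawler_summary`, and the sample rows in the usage note are all assumptions, since the slide shows only the query. An ORDER BY is added so the output order is stable.

```python
import sqlite3


def crawler_summary(rows):
    """rows: iterable of (url, discovered_by, isRDF) with isRDF in {0, 1}.

    Returns (discovered_by, SWD count, non-SWD count, total) per crawler.
    """
    conn = sqlite3.connect(":memory:")
    # Assumed schema: only discovered_by and isRDF appear in the slide's query.
    conn.execute(
        "CREATE TABLE digest_url (url TEXT, discovered_by TEXT, isRDF INTEGER)"
    )
    conn.executemany("INSERT INTO digest_url VALUES (?, ?, ?)", rows)
    query = """
        SELECT `discovered_by`, sum(isRDF), sum(1 - isRDF), count(*)
        FROM `digest_url`
        WHERE 1
        GROUP BY discovered_by
        ORDER BY discovered_by
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

Each result row gives, for one crawler, the number of URLs that turned out to be RDF, the number that did not, and the total, which is the basis for comparing crawler performance.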

  11. Digest -- research
      • Document metadata
        • Annotative
          • General metadata
          • SWD metadata
          • Ontology metadata
        • Inter-document relations
        • Document-term relations
      • Term metadata
        • Term Definition
        • Inter-term Relation
          • Class-property bond (C-P bond): rdfs:domain
          • Property-Class bond (P-C bond): rdfs:range

  12. Document Metadata
      • Web document metadata
        • When/how discovered/fetched
        • Suffix of URL
        • Last modified time
        • Document size
      • SWD metadata
        • Language features: OWL species, RDF encoding
        • Statistical features: # of defined/used terms, # of declared/used namespaces, Ontology Ratio, Ontology Rank
      • Ontology annotation
        • Label
        • Version
        • Comment
      • Relations
        • Links to other SWDs: imported SWDs, referenced SWDs, extended SWDs, prior version
        • Links to terms: classes/properties defined, classes/properties used
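One way to picture the metadata listed on this slide is as a single record per document. The dataclass below is a hypothetical layout, not Swoogle's schema: every field name and type is an assumption chosen to mirror the slide's headings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DocumentMetadata:
    # Web document metadata
    url: str
    discovered_by: Optional[str] = None
    url_suffix: Optional[str] = None
    last_modified: Optional[str] = None
    size_bytes: Optional[int] = None
    # SWD metadata: language and statistical features
    owl_species: Optional[str] = None
    rdf_encoding: Optional[str] = None
    ontology_ratio: Optional[float] = None
    ontology_rank: Optional[float] = None
    # Ontology annotation
    label: Optional[str] = None
    version: Optional[str] = None
    comment: Optional[str] = None
    # Relations to other SWDs and to terms
    imported_swds: List[str] = field(default_factory=list)
    referenced_swds: List[str] = field(default_factory=list)
    terms_defined: List[str] = field(default_factory=list)
    terms_used: List[str] = field(default_factory=list)
```

Using `default_factory` gives each record its own lists, so adding a used term to one document does not leak into another.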

  13. Demo2(a) Digest “Time” Ontology (document view)

  14. Document-Term Relation
      [Figure: two instance documents, http://www.cs.umbc.edu/~finin/foaf.rdf and http://foo.com/foaf.rdf, each describe an individual (e.g. http://foo.com/foaf.rdf#finin) with rdf:type foaf:Person and foaf:mbox finin@umbc.edu. The ontology http://xmlns.com/foaf/1.0/ states foaf:Person rdf:type rdfs:Class, foaf:Person rdfs:subClassOf wordNet:Agent, foaf:mbox rdf:type rdf:Property, and an rdfs:domain link for foaf:mbox.
       Relation labels: defined Class, defined Property, defined Individual, populated Class, populated Property.]
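The relations named on this slide can be derived from a document's triples roughly as follows. This is a simplified sketch, not Swoogle's digest procedure: the function name, the dictionary keys (adapted from the slide's labels), and the rule that any predicate other than rdf:type counts as a populated property are assumptions.

```python
META_CLASSES = {"rdfs:Class", "owl:Class"}
META_PROPERTIES = {"rdf:Property"}


def document_term_relations(triples):
    """Group the terms and individuals of one document by relation type."""
    relations = {
        "definesClass": set(),
        "definesProperty": set(),
        "definesIndividual": set(),
        "populatesClass": set(),
        "populatesProperty": set(),
    }
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            if obj in META_CLASSES:
                relations["definesClass"].add(subject)
            elif obj in META_PROPERTIES:
                relations["definesProperty"].add(subject)
            else:
                # An instance of an ordinary class: the class is populated.
                relations["populatesClass"].add(obj)
                relations["definesIndividual"].add(subject)
        else:
            # Any other predicate is a property being used (populated).
            relations["populatesProperty"].add(predicate)
    return relations
```

Applied to the instance document in the figure, this reports foaf:Person as a populated class and foaf:mbox as a populated property; applied to the ontology, it reports them as a defined class and a defined property.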

  15. Demo2(b) Digest “Time” Ontology (term view)

  16. Term Metadata: foaf:Person
      • Term Definition
        • rdfs:subClassOf -- foaf:Agent
        • rdfs:label -- “Person”
      • C-P bond (from SWO)
        • foaf:mbox
        • foaf:name
      • C-P bond (from SWI)
        • foaf:name
        • dc:title
      [Figure: Onto 1 and Onto 2 contribute the definition of foaf:Person (rdf:type owl:Class, rdfs:subClassOf foaf:Agent, rdfs:label “Person”) and rdfs:domain links from foaf:mbox and foaf:name; SWD3 contains an instance with rdf:type foaf:Person and foaf:name “Tim Finin”.]
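The two sources of C-P bonds on this slide can be sketched as two small functions. The ontology side follows the slide set's definition (a C-P bond comes from rdfs:domain). The instance side is an assumption about what "from SWI" means: a property is bonded to a class when it is used on a resource typed with that class. Function names are hypothetical.

```python
from collections import defaultdict


def cp_bonds_from_ontology(triples):
    """Class -> properties declared with that class as their rdfs:domain."""
    bonds = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "rdfs:domain":
            bonds[obj].add(subject)
    return dict(bonds)


def cp_bonds_from_instances(triples):
    """Class -> properties actually used on instances of that class (assumed rule)."""
    types = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            types[subject].add(obj)
    bonds = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate != "rdf:type":
            for cls in types.get(subject, ()):
                bonds[cls].add(predicate)
    return dict(bonds)
```

Comparing the two results for a class such as foaf:Person shows which properties are declared for it and which are used with it in practice, for example dc:title appearing only on the instance side.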

  17. Demo4 Digest Term “Person”

  18. Term Distribution (grouped by local name)

  19. Digest -- result
      Ontological Term Distribution (populated, defined)
      Source: Swoogle (2005-Jan-05)
      SELECT res_type, sign(cnt_instance_populate>0), sign(cnt_swd_def>0), count(*), sum(cnt_instance_populate) FROM `digest_term` WHERE 1 GROUP BY res_type, sign(cnt_instance_populate>0), sign(cnt_swd_def>0)
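The grouping in this query splits terms by type and by whether they are populated and whether they are defined. A minimal way to try it is an in-memory SQLite table, as sketched below. The table layout, the helper name `term_distribution`, and the added ORDER BY are assumptions. Because older SQLite builds lack a built-in `sign()`, the sketch registers one explicitly.

```python
import sqlite3


def _sign(value):
    """Return -1, 0, or 1, matching the SQL sign() used in the slide's query."""
    return (value > 0) - (value < 0)


def term_distribution(rows):
    """rows: iterable of (uri, res_type, cnt_instance_populate, cnt_swd_def)."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("sign", 1, _sign)  # portable across SQLite versions
    # Assumed schema: only these three non-URI columns appear in the query.
    conn.execute(
        "CREATE TABLE digest_term (uri TEXT, res_type TEXT, "
        "cnt_instance_populate INTEGER, cnt_swd_def INTEGER)"
    )
    conn.executemany("INSERT INTO digest_term VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT res_type, sign(cnt_instance_populate > 0), sign(cnt_swd_def > 0),
               count(*), sum(cnt_instance_populate)
        FROM `digest_term`
        WHERE 1
        GROUP BY res_type, sign(cnt_instance_populate > 0), sign(cnt_swd_def > 0)
        ORDER BY 1, 2, 3
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

Each output row is (term type, populated flag, defined flag, number of terms, total instance count), so a row with flags (1, 0) counts terms that are used by instances but never defined in any SWD.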

  20. Search & Navigation -- research
      The Semantic Web is not the Web
      • Search service
        • Document search – an RDF document is not free text
        • Term search – URIref and compound local name
      • Navigation service
        • The RDF graph – typed links
        • The web of RDF documents – few hyperlinks
        • The social network of agents – trust & provenance

  21. Demo1: Find “Time” Ontology
      A set of keywords can be used to search for an ontology. For example, “time”, “before”, and “after” are basic concepts of a “Time” ontology.

  22. Demo3: Find Term “Person”
      Note the capitalization: URIrefs are case sensitive.
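The case-sensitivity point can be illustrated with a toy term lookup by local name. This is a sketch, not Swoogle's search implementation: the helper names are hypothetical, the local name is assumed to be whatever follows the last '#' or '/', and the example.org URIref in the usage note is made up.

```python
def split_uriref(uriref):
    """Split a URIref into (namespace, local name) at the last '#' or '/'."""
    cut = max(uriref.rfind("#"), uriref.rfind("/"))
    return uriref[:cut + 1], uriref[cut + 1:]


def find_terms(local_name, urirefs):
    """Return the URIrefs whose local name matches exactly (case sensitive)."""
    return [u for u in urirefs if split_uriref(u)[1] == local_name]
```

With an index containing http://xmlns.com/foaf/1.0/Person, http://xmlns.com/wordnet/1.6/Person, and http://example.org/onto#person, a search for "Person" returns the first two and a search for "person" returns only the third.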

  23. Current Swoogle Navigation Model
      • A URIref refers to
        • A term, i.e. an instance of RDFS class/property
        • An individual, i.e. populated terms
      • An SWD could be
        • SWO: term definition
        • SWI: individuals
      • Observations
        • RDF resources are semantically linked in the RDF graph
        • SWDs are poorly linked due to the absence of an explicit hyperlink concept
        • Ontologies are more interesting
      • Approach
        • Build inter-document relations
        • Rational surfing model
      [Figure: SWOs and SWIs shown alongside other Web content: HTML documents, images, audio files, video files.]

  24. Semantic Web Navigation Model (new!)
      [Figure: navigation paths among URIref, Resource, Namespace, URL, RDF Document, and Ontology. Labels, grouped by apparent role:
       • Entry points: Term Search, Document Search, RDF Graph Navigation
       • Resource and namespace links: sameNamespace, sameLocalname, usesNamespace, isUsedBy, isDefinedBy
       • Document-to-term links: definesClass, definesProperty, populatesClass, populatesProperty, refersClass, refersProperty
       • Ontology types: rdfsOntology, owldlOntology
       • Inter-document links: owl:imports, owl:priorVersion, owl:backwardCompatibleWith, owl:incompatibleWith, rdfs:seeAlso, rdfs:isDefinedBy
       • Also shown: rdfs:subClassOf]

  25. Ranking -- research
      • Surfing models
      • Ranking method
      • PageRank variation
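The slides name a PageRank variation without giving its formula, so the following is a generic weighted PageRank by power iteration, offered only as a sketch of the family of methods involved. It is not the rational surfing model itself. The damping factor, the link weights, and the handling of documents without outgoing links are all assumptions.

```python
def weighted_pagerank(links, damping=0.85, iterations=50):
    """links: {source: {target: weight}}; returns {node: rank}, ranks sum to 1."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same share from random jumps.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, {})
            total = sum(targets.values())
            if total == 0:
                # No outgoing links: spread this node's rank evenly (assumed policy).
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
            else:
                # Follow outgoing links in proportion to their weights.
                for target, weight in targets.items():
                    new_rank[target] += damping * rank[node] * weight / total
        rank = new_rank
    return rank
```

A weighted variant like this lets some link types (for example, a link to an imported ontology) pass on more rank than others; which weights Swoogle assigns is not stated in the slides.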

  26. Ranking with Rational Surfing Model: An Example
      [Figure: four linked documents, with inter-document links labeled TM and EX.
       • http://www.cs.umbc.edu/~finin/foaf.rdf: an individual with rdf:type foaf:Person and foaf:mbox finin@umbc.edu
       • http://xmlns.com/foaf/1.0/: foaf:Person rdf:type rdfs:Class; foaf:Person rdfs:subClassOf wordNet:Person
       • http://xmlns.com/wordnet/1.6/: wordNet:Person rdf:type rdfs:Class; wordNet:Person rdfs:subClassOf wordNet:Individual
       • http://www.w3.org/2000/01/rdf-schema: rdfs:Class; rdfs:subClassOf rdf:type rdf:Property]

  27. Demo6: Swoogle’s top 10
      Swoogle uses a PageRank-like algorithm to rank Semantic Web documents. Well-known ontologies are highly ranked. This report is generated dynamically from the latest data and takes 5 to 10 seconds.

  28. Statistics -- research
      • Summarize the dataset collected by Swoogle
      • Swoogle Watch
        • Swoogle Today
          • Distribution of visited URLs
          • Document discovery log
          • Term discovery log
        • Semantic Web Watch
          • SWD distribution by last-modified month
          • SWD distribution by website
          • SWD distribution by suffix
        • Ontology Watch
          • Term (class/property) usage
          • Namespace usage

  29. Demo5(a) Swoogle Today

  30. Demo5(b) Swoogle Statistics
      [Chart labels: FOAF, Trustix, W3C, Stanford]

  31. Demo5(c) Swoogle Statistics

  32. Miscellaneous
      • Submit URL for focused crawler
      • Swoogle Web Service (delivered in Sept.): http://swoogle.umbc.edu/webservice/
        • Search document
        • Search term
        • Term digest

  33. Demo7: Submit URL for focused crawler
      If you can’t find your ontologies in Swoogle, they may not have been indexed yet. Please submit their URLs to increase their visibility.
      The submission page can be reached when your query fails or from the site map.

  34. 3. Summary
      • Summary
      • Current Status

  35. Summary
      • Swoogle (Mar, 2004)
        • Automated SWD discovery
        • SWD metadata creation and search
        • Ontology rank (rational surfer model)
        • Swoogle watch
        • Web interface
      • Swoogle2 (Sep, 2004)
        • Ontology dictionary
        • Swoogle statistics
        • Web service interface (WSDL)
        • Bag of URIref IR search
      • Swoogle3 (2005)
        • Better discovery & revisit strategies
        • Better navigation models
        • Semantic web dataset
        • Index instance data
        • More metadata (ontology mapping)
        • Better web service interfaces

  36. Current Status
      • Swoogle Watch reported (Jan 6, 2005)
        • 46.7 M triples
        • 336 K SWDs, including 4 K ontologies
        • 153 K terms: 94 K classes & 59 K properties
      • Ongoing work
        • Research
          • Self-adaptive SWD Discovery
          • Efficient SWD digest and RDF Graph Abstract
          • Semantic Web navigation model
        • Engineering
          • Enhancing Web Service interface
