260 likes | 802 Views
Swoogle. Semantic Search Engine Web-enhanced Information Management Bin Wang. Outline. Background Introduction Semantic Web Semantic Search Swoogle – Semantic Search Engine Swoogle Architecture Semantic Web documents Finding SWDs Ranking SWDs
E N D
Swoogle Semantic Search Engine Web-enhanced Information Management Bin Wang
Outline • Background Introduction • Semantic Web • Semantic Search • Swoogle – Semantic Search Engine • Swoogle Architecture • Semantic Web documents • Finding SWDs • Ranking SWDs • Swoogle Indexing and Retrieval • Conclusion
Background Introduction • What is Semantic Web? • An evolving development of WWW. • The semantics of information and services in the web is well-defined. • It makes it possible for web to understand and satisfy the requests of people and machines to use the web content.
Background Introduction • What is Semantic Web? The Semantic Web Layers
Background Introduction • What is Semantic Search? • A set of techniques on the management of documents, especially semantically supported document retrieval. • Two forms of Search: Navigational Search, Research Search; Semantic Search belongs to the second category. • It attempts to augment and improve traditional search results by using data from Semantic Web.
Swoogle – Semantic Search Engine • Swoogle – A crawler-based indexing and retrieval system for semantic web – RDF and OWL documents encoded in XML and N3 • It automatically discovers SWDs, indexes the metadata and answers queries about it. • SWDs are characterized by semantic annotation and meaningful references to other SWDs; conventional search engines do not take advantage of these features.
Swoogle Search Interface Developed by UMD
Activities that Swoogle can do • Finding appropriate ontologies It allows users to query for ontologies that contain specified terms anywhere in the document. The ontologies returned are ranked by Ontology Rank algorithm. • Finding instance data It helps users to integrate data distributed on the web. • Characterizing the Semantic Web It reveals interesting structural properties about the semantic web by extracting metadata and especially inter-document relations.
Swoogle Architecture • Four main components: SWD discovery, metadata creation, data analysis and interface
SwoogleArchiteture • SWD discovery component: • discovers potential SWDs throughout the web • keeps up-to-date information about SWDs. • Metadata creation component: • generates objective metadata about SWDs at both the syntax level and the semantic level. • Data analysis component: • derives analytical reports, such as classification of SWOs and SWDBs, rank of SWDs and IR index of SWDs • Interface component:
Semantic Web Documents(SWDs) • A SWD is a document in a semantic web language(based on RDF, e.g. RDFS, DAML+OIL, and OWL) that is online and accessible to web users and software agents. • There are two kinds of documents in SWDs: • SWOs (Semantic Web Ontology) • SWDBs (Semantic Web Databases)
Semantic Web Documents(SWDs) • SWOs(Semantic Web Ontology) A SWD with a significant proportion of the statements it makes define new terms or extend the definitions of terms defined in other SWDs. • SWDBs(Semantic Web Databases) A SWD without defining or extending a significant number of terms.
Finding SWDs • Develop a Google Crawler to search URLs using the Google Web Service. • starts with type extensions(e.g. .rdf, .owl, .daml, and .n3, good SWD indicators ) • Develop a Focused Crawler to crawl documents within a given website. • only crawls URLs relative to the given base URL • invites SW community to submit the URLs
Finding SWDs • Develop the JENA2 based Swoogle Crawler. • It verifies if a document is a SWD or not • It revisits discovered URLs to check updates • Some heuristics are used to discover new SWDs through semantic relations. --A URIref is highly likely to be the URL of a SWD. --OWL: imports links to an external ontology, which is a SWD. --etc. .
SWD Metadata • It is collected to make SWD search more efficient and effective. • It is derived from the content of SWDs as well as the relations among SWDs. • Swoogle identifies three categories of metadata: • Basic metadata; • Relations; • Analytical results such as SWO/SWDB classification, and SWD ranking.
SWD Metadata • 1. Basic metadata It considers the syntactic and semantic features of a SWD. • Language feature It refers to the properties describing the syntactic or semantic features of a SWD. • RDF statistics It refers to the properties summarizing node distribution of the RDF graph of a SWD. • Ontology annotation It refers to the properties that describe a SWD as an ontology.
SWD Metadata • 2.Relations among SWDs Swoogle focuses on SWD level relations which generalize RDF node level relations. The following relations are captured: • TM/IN: captures term reference relations between two SWDs; • IM: shows that an ontology imports another ontology; • EX: shows that an ontology extends another ontology; • PV: shows that an ontology is a prior version of another; • CPV: shows that an ontology is a prior version of and is compatible with another; • IPV: shows that an ontology is a prior version of and is incompatible with another.
Ranking SWDs • Rational Random Surfer • A user will arrive at a given page ->by directly addressing it ->by following one of the links pointing to it; • Different links may stand for different relations, thus have different weights. Explore all linked SWOs Jump to arandom page SWO? yes no bored? yes no Follow arandom link
Ranking SWDs • Rational Random Surfer - raw rank T(x): the set of SWDs that x links to; L(a): the set of SWDs that links to a; d: a damping factor, typically set to 0.85.
Ranking SWDs • Rational Random Surfer – final rank TC(A) is the transitive closure of SWOs imported by a. • Swoogle computes the rank for SWDBs using the first one, and computes the rank for SWOs using the sec one.
Swoogle Indexing and Retrival • Swoogle adapts the Sire, a custom indexing and retrieval engine: • It employs a TF/IDF model with a standard cosine similarity metric. • It indexes discovered documents by using either character N-Gram or URIrefs as keywords to find relevant documents and to compute the similarity among a set of documents.
Conclusion • Introduces a prototype crawler-based indexing and retrieval system for Semantic Web documents. • One of the interesting properties computed for each SWD is its ontology rank. Here it uses the rational surfing model, different from what is used in conventional search engine.
References • Li Ding , Tim Finin , Anupam Joshi , Rong Pan , R. Scott Cost , YunPeng , PavanReddivari , VishalDoshi , Joel Sachs, Swoogle: a search and metadata engine for the semantic web, Proceedings of the thirteenth ACM international conference on Information and knowledge management, November 08-13, 2004, Washington, D.C., USA • R. Guha , Rob McCool , Eric Miller, Semantic search, Proceedings of the 12th international conference on World Wide Web, May 20-24, 2003, Budapest, Hungary • Berners-Lee, Tim; James Hendler and OraLassila (May 17, 2001). "The Semantic Web". Scientific American Magazine.