Varish Mulwad
Research Problems in Semantic Web Search

Agenda
• Introduction
• Swoogle
• Swoogle's Competition:
  • Sindice
  • Semantic Web Search Engine (SWSE)
  • Watson
  • Falcon
• Research Problems and Issues with Swoogle
• References

Introduction
[Figure labels: Web, Your Agent, Dr. Finin's FOAF Profile]
• Possible because the data is in a machine-understandable form such as RDF or OWL
• But how will the agent find all this data? Search engines?

Introduction
[Figure panels: Traditional Search Engine Results; Semantic Web Search Engine Results]

Swoogle
• Swoogle is a crawler-based indexing and retrieval system for the Semantic Web
• Swoogle crawls and discovers documents written in RDF and OWL
• Swoogle classifies a Semantic Web Document (SWD) as one of:
  • Semantic Web Ontology (SWO): defines new terms
  • Semantic Web Database (SWDB): makes assertions about individuals

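The SWO/SWDB distinction can be illustrated with a minimal sketch. It assumes triples are plain (subject, predicate, object) string tuples; the function name `classify_swd`, the 0.5 threshold, and the heuristic itself (the share of typed resources that are declared as classes or properties) are illustrative assumptions, not Swoogle's documented method.

```python
RDF_TYPE = "rdf:type"
# Types whose instances are term definitions (classes or properties).
DEFINING_TYPES = {
    "rdfs:Class", "owl:Class", "rdf:Property",
    "owl:ObjectProperty", "owl:DatatypeProperty",
}

def classify_swd(triples, threshold=0.5):
    """Return 'SWO' if most typed resources are term definitions, else 'SWDB'."""
    typed = [(s, o) for s, p, o in triples if p == RDF_TYPE]
    if not typed:
        return "SWDB"  # assumption: no definitions at all means instance data
    defined = sum(1 for _, o in typed if o in DEFINING_TYPES)
    return "SWO" if defined / len(typed) >= threshold else "SWDB"

# A document that defines new terms.
ontology = [
    ("ex:Person", "rdf:type", "owl:Class"),
    ("ex:knows", "rdf:type", "owl:ObjectProperty"),
]
# A document that makes assertions about an individual.
profile = [
    ("ex:finin", "rdf:type", "foaf:Person"),
    ("ex:finin", "foaf:name", "Dr. Finin"),
]

print(classify_swd(ontology))
print(classify_swd(profile))
```

A real classifier would parse the document with an RDF library and would probably need a finer-grained measure, since many documents mix definitions and assertions.
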
Swoogle
• Swoogle demo

Swoogle Architecture

Swoogle Architecture
• SWD Discovery Component
  • Google crawler using the Google web service
    • Searches for file types with extensions ".rdf", ".owl", ".n3"
    • Google returns at most 1000 results per query
  • A focused crawler
    • Crawls documents within a given website
    • Extension and focus constraints
  • A Swoogle crawler
    • Jena-based crawler
    • Explores semantic links between SWDs

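The extension and focus constraints of the focused crawler can be sketched as a simple URL filter. This is only an illustration: the function names and the example URLs are hypothetical, and the real crawler would also fetch pages and follow links.

```python
from urllib.parse import urlparse

# Extensions listed on the slide for Semantic Web documents.
SWD_EXTENSIONS = (".rdf", ".owl", ".n3")

def has_swd_extension(url):
    """Extension constraint: the URL path ends in a known SWD extension."""
    path = urlparse(url).path.lower()
    return path.endswith(SWD_EXTENSIONS)

def focused_filter(urls, site):
    """Focus constraint: keep only SWD candidates hosted on `site`."""
    return [u for u in urls
            if urlparse(u).netloc == site and has_swd_extension(u)]

candidates = [
    "http://example.org/people/foaf.rdf",
    "http://example.org/index.html",
    "http://example.org/onto/wine.OWL",
    "http://other.example.com/data.n3",
]

print(focused_filter(candidates, "example.org"))
```

Lowercasing the path before the comparison keeps files such as `wine.OWL` in scope; whether Swoogle treats extensions case-insensitively is not stated in the slides.
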
Swoogle Architecture
• Metadata Creation
  • Basic metadata
    • Encoding: "RDF/XML", "N-Triple", "N3"
    • Language: RDF, RDFS, OWL, DAML+OIL
    • OWL species: OWL-LITE, OWL-DL, OWL-FULL
  • Relations among SWDs
    • Reference relationships among SWDs
    • Inter-ontology relationships

Swoogle Architecture
• Data analysis component
  • Classification of SWD as SWO or SWDB
  • Compute rank of SWD
• Web-based interface
  • Human user interface: http://swoogle.umbc.edu
  • Web services using a REST interface
  • Agent service

Sindice
• Created at the Digital Enterprise Research Institute (DERI)
• Key features of Sindice include:
  • Sindice collects SWDs and indexes them on resource URIs, Inverse Functional Properties (IFPs) and keywords
  • Sindice uses the Hadoop parallel architecture

Sindice
• Inverse Functional Property (IFP): an OWL property whose value uniquely identifies its subject (for example, foaf:mbox)
• Sindice uses three indexes:
  • URI index
  • IFP index
  • Keyword index
• Benefit: faster retrieval of data

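A minimal in-memory sketch of the three indexes follows. The class name `TripleIndex`, the fixed `IFPS` set, and the example documents are assumptions made for illustration; Sindice's actual index layout is not described in these slides.

```python
from collections import defaultdict

# Assumption: a fixed set of properties treated as inverse functional.
IFPS = {"foaf:mbox", "foaf:homepage"}

class TripleIndex:
    def __init__(self):
        self.uri = defaultdict(set)      # resource URI -> documents
        self.ifp = defaultdict(set)      # (property, value) -> documents
        self.keyword = defaultdict(set)  # lowercased token -> documents

    def add(self, doc, triples):
        for s, p, o in triples:
            self.uri[s].add(doc)
            if p in IFPS:
                self.ifp[(p, o)].add(doc)
            elif o.startswith("http://"):
                self.uri[o].add(doc)
            else:
                # Literal value: index each whitespace-separated token.
                for token in o.lower().split():
                    self.keyword[token].add(doc)

doc1 = "http://a.example/foaf.rdf"
doc2 = "http://b.example/people.rdf"

index = TripleIndex()
index.add(doc1, [
    ("http://a.example/#me", "foaf:mbox", "mailto:finin@example.org"),
    ("http://a.example/#me", "foaf:name", "Dr. Finin"),
])
index.add(doc2, [
    ("http://b.example/p1", "foaf:mbox", "mailto:finin@example.org"),
])

# Both documents describe the same person under different URIs;
# the shared IFP value is what connects them.
print(sorted(index.ifp[("foaf:mbox", "mailto:finin@example.org")]))
```

The IFP lookup finds both documents even though they use different subject URIs, which is the case a plain URI index would miss.
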
Sindice
• The Hadoop architecture is used in the following manner:
  • Sindice employs Hadoop/Nutch to distribute the crawling job across multiple machines
  • Collected data is stored in the HBase distributed column-based store
  • Large datasets are handled efficiently across the cluster using a MapReduce implementation

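The MapReduce model mentioned above can be illustrated in plain Python. This sketch does not use the Hadoop API; the job (counting crawled documents per host), the function names, and the URLs are hypothetical, and everything runs in a single process.

```python
from collections import defaultdict
from urllib.parse import urlparse

def map_phase(url):
    """Map: emit (host, 1) for one crawled document URL."""
    yield urlparse(url).netloc, 1

def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts for one host."""
    return key, sum(values)

def run_job(urls):
    pairs = (pair for url in urls for pair in map_phase(url))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

crawled = [
    "http://a.example/foaf.rdf",
    "http://a.example/onto.owl",
    "http://b.example/people.rdf",
]
counts = run_job(crawled)
print(counts)
```

On a cluster, the map and reduce calls would run on different machines and the grouping step would be handled by the framework; only the two phase functions are job-specific.
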
Sindice
• Sindice demo

SWSE
• The Semantic Web Search Engine (SWSE) was also created at the Digital Enterprise Research Institute (DERI)
• SWSE uses a "Multicrawler": a pipelined architecture for crawling

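A pipelined crawler can be sketched with chained generators, where each stage consumes the output of the previous one. The stage names (`fetch`, `parse`, `index`) and the in-memory `PAGES` dictionary are assumptions for illustration; the slides do not list the actual Multicrawler stages.

```python
# Hypothetical in-memory stand-in for the Web: URL -> one-triple document.
PAGES = {
    "http://a.example/1.rdf": "ex:a ex:knows ex:b",
    "http://a.example/2.rdf": "ex:b ex:knows ex:c",
}

def fetch(urls):
    """Stage 1: yield (url, body) for every URL that can be retrieved."""
    for url in urls:
        if url in PAGES:
            yield url, PAGES[url]

def parse(documents):
    """Stage 2: turn each body into a (subject, predicate, object) triple."""
    for url, body in documents:
        s, p, o = body.split()
        yield url, (s, p, o)

def index(parsed, store):
    """Stage 3: record which documents mention each subject."""
    for url, triple in parsed:
        store.setdefault(triple[0], []).append(url)
    return store

urls = list(PAGES) + ["http://a.example/missing.rdf"]
store = index(parse(fetch(urls)), {})
print(store)
```

Because the stages are generators, each document flows through all three steps before the next one is fetched; in a real pipeline the stages would typically run concurrently with queues between them.
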
Watson
• Created at the Knowledge Media Institute (KMi) at the UK Open University
• Major design principles:
  • Considers explicit and implicit relations between ontologies
  • Ranks ontologies with a focus on quality over popularity

Watson
• Watson demo

Falcon
• Falcon is a Semantic Web search engine created at the Institute of Web Science in China
• Falcon allows keyword-based queries on:
  • Objects
  • Concepts
  • Documents
• Falcon performs class subsumption reasoning

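Class subsumption reasoning can be sketched as a transitive walk over subclass links: an object typed with a subclass also counts as an instance of every superclass. The function names and the small class hierarchy below are hypothetical, and this is not presented as Falcon's implementation.

```python
def superclasses(cls, subclass_of):
    """All classes that subsume `cls`, following subclass links transitively."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def instances_of(target, types, subclass_of):
    """Objects whose asserted class is `target` or is subsumed by it."""
    return sorted(obj for obj, cls in types.items()
                  if cls == target or target in superclasses(cls, subclass_of))

# Hypothetical hierarchy: class -> direct superclasses.
SUBCLASS_OF = {
    "ex:Professor": ["ex:Faculty"],
    "ex:Faculty": ["ex:Person"],
    "ex:Student": ["ex:Person"],
}
# Hypothetical instance data: object -> asserted class.
TYPES = {
    "ex:alice": "ex:Professor",
    "ex:bob": "ex:Student",
    "ex:campus": "ex:University",
}

print(instances_of("ex:Person", TYPES, SUBCLASS_OF))
print(instances_of("ex:Faculty", TYPES, SUBCLASS_OF))
```

A query for `ex:Person` returns both `ex:alice` and `ex:bob` even though neither is typed as `ex:Person` directly, which is what lets a search engine filter objects by a general class.
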
Falcon
• Falcon demo

Summary
Swoogle and others
• Sindice: indexes on URI, IFP and keywords; use of the Hadoop architecture
• SWSE: pipelined architecture for crawling
• Watson: implicit relations between SWDs
• Falcon: class subsumption reasoning
• Keyword-based search
• Searches ontologies and instance data

Issues
• Crawling
  • Swoogle's crawler runs as a single thread on one machine
  • This limits the number of SWDs discovered and revisited
• Possible solutions
  • Use of the Hadoop architecture
  • Use of Grub

Other Issues
• Crawling large structured datasets like DBpedia
• More reasoning
• More services

References
• Li Ding et al., "Swoogle: A Search and Metadata Engine for the Semantic Web", Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management, November 2004.
• P. Mika, G. Tummarello, "Web Semantics in the Clouds", IEEE Intelligent Systems, Volume 23, Issue 5, September 2008.
• E. Oren, R. Delbru, M. Catasta, R. Cyganiak, H. Stenzhorn, G. Tummarello, "Sindice.com: A document-oriented lookup index for open linked data", International Journal of Metadata, Semantics and Ontologies, 3(1), 2008.
• Mathieu d'Aquin et al., "Watson: A Gateway for the Semantic Web", Poster session of the European Semantic Web Conference, ESWC 2007.
• Gong Cheng, Weiyi Ge, Honghan Wu, Yuzhong Qu, "Searching Semantic Web Objects Based on Class Hierarchies", WWW 2008 Workshop on Linked Data on the Web, 2008.

Questions?