SEMANTIC KNOWLEDGE ACQUISITION OF INFORMATION FOR SYNTACTIC WEB
By G. Nagarajan and K. K. Thyagharajan, International Journal of Web & Semantic Technology
Presented by Raef Mchaymech
Outline
1. Introduction
2. The Problem
3. The Proposed Solution
4. The Proposed Architecture
5. Conclusion
6. Critique
Introduction
• People use the web for everything

The First Problem
• Search engines return billions of results, both informative and non-informative
Current Solutions
• Semantic Web: the introduction of the semantic web in 2000 encouraged researchers to develop the concept of semantic search engines
• Semantic search engines: they are widely adopted by developers and engineers
• Expected result: querying the semantic web with semantic search engines returns the expected results
So what is the problem now?
The Second Problem
• There are not enough resources to search
• Searches and queries are very domain-dependent, e.g.:
  • DBpedia to search Wikipedia
  • LinkedMDB to search IMDb
Current Solutions vs. Proposed Solution
[Diagram. Labels: Semantic Web, Current Web, Write ontologies (twice), Transformation]
The Proposed Architecture
[Architecture diagram. Components: WWW, Web Crawler, List of URLs, Filtering, Conversion to XML, Conversion to RDF/OWL, Ontology Repository]
About the Crawler
[Diagram labels: Web Crawler, Templates]
• A web crawler starts with a list of URLs to visit, called the seeds.
• As the crawler visits these URLs, it identifies all the hyperlinks in each page and adds them to the list of URLs to visit.
• URLs from the list are recursively visited according to a set of policies.
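The seed-and-frontier loop described on this slide can be sketched roughly as follows. This is a minimal illustration, not the crawler from the paper: the `crawl` function, its `fetch` and `allow` parameters, and the in-memory `PAGES` dictionary are all assumptions made for the example, and a simple "stay on one site" filter stands in for the unspecified set of policies.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, allow=lambda url: True, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were visited."""
    frontier = deque(seeds)   # the list of URLs still to visit
    seen = set(seeds)         # avoids queueing the same URL twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:      # page could not be retrieved
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)          # resolve relative links
            if link not in seen and allow(link):
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory "web" standing in for real HTTP requests.
PAGES = {
    "http://example.org/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.org/a": '<a href="/b">B</a> <a href="http://other.test/">X</a>',
    "http://example.org/b": '<a href="/">home</a>',
}

order = crawl(
    ["http://example.org/"],
    fetch=PAGES.get,
    allow=lambda u: u.startswith("http://example.org/"),  # assumed policy
)
print(order)
```

Passing `fetch` as a parameter keeps the sketch runnable without network access; a real crawler would replace it with an HTTP client and add politeness rules such as robots.txt handling and rate limiting.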
HTML to XML Conversion
1. The conversion is based on Natural Language Processing (NLP), specifically on Named Entity Recognition (NER)
2. NER can classify an entity as: Person Name, Organization Name, Location…
HTML to XML Conversion: The Proposed Framework
[Framework diagram. Components: HTML Web Page, HTML Document Preprocessing, Entity Recognition, Lexicon & Pattern Repository, Domain Hierarchy, Corresponding XML File]
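To make the preprocessing, entity recognition, and XML output stages concrete, here is a toy sketch. It is not the authors' implementation: a plain dictionary lookup (the hypothetical `LEXICON`) stands in for a real NER component and for the lexicon and pattern repository, and the `<document>` wrapper element and the function name `html_to_xml` are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Preprocessing step: keep visible text, drop tags, scripts and styles."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


# Hypothetical lexicon: entity surface form -> entity class.
LEXICON = {
    "Tim Berners-Lee": "Person",
    "W3C": "Organization",
    "Geneva": "Location",
}


def html_to_xml(html, lexicon=LEXICON):
    """Return an XML string with one element per lexicon entity found in the page text."""
    extractor = TextExtractor()
    extractor.feed(html)
    text = " ".join(extractor.chunks)

    root = ET.Element("document")
    for name, label in lexicon.items():
        # Match the name only when it is not embedded in a longer word.
        if re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", text):
            entity = ET.SubElement(root, label)
            entity.text = name
    return ET.tostring(root, encoding="unicode")


SAMPLE = (
    "<html><body><h1>About</h1>"
    "<p>Tim Berners-Lee founded the W3C.</p>"
    "<script>var x = 'Geneva';</script></body></html>"
)
xml_out = html_to_xml(SAMPLE)
print(xml_out)
```

In the sample page, "Geneva" appears only inside a script, so the preprocessing step discards it before entity recognition runs. A statistical or rule-based NER tool would replace the dictionary lookup in practice.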
XML to RDFS/OWL Conversion
• Two main definitions should take place:
  • RDFS, which provides the rules of the web page
  • OWL, which defines the conceptual ontology of the web page
• Two techniques should be used:
  • Syntactic Analysis
  • Semantic Analysis
Syntactic Analysis
• A simple mapping between XML elements and OWL elements
• For more rules, please refer to the paper
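The slide does not reproduce the paper's mapping rules, so the sketch below uses three assumed rules that are common in XML-to-OWL conversions: an element with child elements becomes an `owl:Class`, a leaf element becomes an `owl:DatatypeProperty` of its parent, and a class nested inside another class yields an `owl:ObjectProperty`. The function name, the `has` prefix, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_to_owl_triples(xml_text):
    """Map XML structure to OWL-style triples under the assumed rules."""
    triples = []

    def visit(element, parent_class=None):
        children = list(element)
        if children:
            # Assumed rule 1: an element with child elements becomes a class.
            triples.append((element.tag, "rdf:type", "owl:Class"))
            if parent_class is not None:
                # Assumed rule 3: a class nested in a class gives an object property.
                prop = "has" + element.tag
                triples.append((prop, "rdf:type", "owl:ObjectProperty"))
                triples.append((prop, "rdfs:domain", parent_class))
                triples.append((prop, "rdfs:range", element.tag))
            for child in children:
                visit(child, element.tag)
        elif parent_class is not None:
            # Assumed rule 2: a leaf element becomes a datatype property.
            triples.append((element.tag, "rdf:type", "owl:DatatypeProperty"))
            triples.append((element.tag, "rdfs:domain", parent_class))

    visit(ET.fromstring(xml_text))
    # Repeated sibling elements produce duplicates; keep the first occurrence.
    return list(dict.fromkeys(triples))


# Hypothetical input document.
SAMPLE_XML = (
    "<Movie><title>Example Film</title>"
    "<Director><name>Jane Doe</name></Director></Movie>"
)
triples = xml_to_owl_triples(SAMPLE_XML)
for triple in triples:
    print(triple)
```

Because the mapping looks only at document structure, every element is translated, including any that carry layout rather than meaning.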
Semantic Analysis
• Strongly based on NLP techniques:
  • The analyzer identifies nouns, verbs, etc.
  • A probability reasoner is used to separate concepts from relations
  • Relationships also include is-a and part-of
• The TBox and ABox are used to define logic and rules:
  • The TBox provides the classes and properties
  • The ABox provides the instances
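The TBox/ABox split can be illustrated with a very small knowledge base. The sketch below is an assumption-laden toy, not a description-logic reasoner and not the paper's system: the class names, the `directed` property, the individuals, and both helper functions are invented for the example.

```python
# TBox: terminology (classes, the is-a hierarchy, property signatures).
TBOX = {
    "subclass_of": {"Actor": "Person", "Director": "Person"},
    "properties": {"directed": ("Director", "Movie")},  # property: (domain, range)
}

# ABox: assertions about individuals.
ABOX = {
    "types": {"alice": "Director", "film1": "Movie"},
    "facts": [("alice", "directed", "film1")],
}


def classes_of(individual, tbox=TBOX, abox=ABOX):
    """Return the asserted class plus every superclass reached via is-a."""
    result = []
    cls = abox["types"].get(individual)
    while cls is not None:
        result.append(cls)
        cls = tbox["subclass_of"].get(cls)
    return result


def check_facts(tbox=TBOX, abox=ABOX):
    """List ABox facts whose subject or object violates the TBox signature."""
    problems = []
    for subject, prop, obj in abox["facts"]:
        domain, range_ = tbox["properties"][prop]
        if domain not in classes_of(subject, tbox, abox):
            problems.append((subject, prop, obj))
        elif range_ not in classes_of(obj, tbox, abox):
            problems.append((subject, prop, obj))
    return problems


print(classes_of("alice"))
print(check_facts())
```

Here the is-a chain lets the ABox individual `alice` be recognized as a `Person` even though only `Director` was asserted, which is the kind of inference the TBox makes possible.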
Conclusion
• An intelligent information retrieval system
• The reusability concept is applied here:
  • The authors reuse the existing HTML pages
  • They convert them to ontologies
Critique
• The English is a complete disaster
• The authors did not show any real example:
  • They did not convert syntactic pages to semantic ones
  • An example of the HTML to XML conversion is provided, but none of the XML to OWL conversion
• The mapping from XSD elements to OWL is inefficient and error-prone: irrelevant elements could easily be inserted into the ontology
Critique
• The authors did not benefit from the expressive power of ontologies (restrictions, types of properties…)
• They wrote exactly:
• They talked about the architecture of the syntactic/semantic conversion, but no search engine was designed
• No evaluation at all: speed of the solution, amount of resource consumption…