Inside Semantic Web search engines: between semantic annotation and Natural Language Processing (original Italian title: Dentro i motori di ricerca semantici: tra annotazione semantica ed elaborazione della lingua naturale). ISKO Italia meeting, Torino, 3 April 2009. Talk by Mela Bosch, melabosch@europe.com.
Terminology on Web Search Engines
• Text search engine: based on lexical analysis. The main aim of lexical analysis is to divide the text into paragraphs, sentences and words, and to recognize entities such as e-mail addresses or URLs. All these elements are known as tokens. The search engine parses them using statistical parameters to produce a list of links in response to a query.
• Latent semantic indexing (LSI): based on latent semantic analysis (LSA). LSI is a Natural Language Processing (NLP) technique that uses an indexed database of documents to find similar terms. It can find a synonym of a query term and then return the best-matching websites for the query. LSI does not require exact word matches to rank results.
• Semantic Web search engines: take the sense of a word into account as a ranking factor, or offer the user a choice among the senses of a word or phrase.
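The lexical analysis step can be illustrated with a small tokenizer. This is only a sketch under stated assumptions: the regular expressions and the function name `tokenize` are invented for this example and are far simpler than what a real search engine uses. It covers words, e-mail addresses and URLs, not paragraph or sentence splitting.

```python
import re

# Hypothetical, simplified patterns. Order matters: URLs and e-mail
# addresses are tried before plain words.
TOKEN_PATTERN = re.compile(
    r"(?P<url>https?://\S+)"
    r"|(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
    r"|(?P<word>\w+)"
)


def tokenize(text):
    """Return a list of (kind, value) tokens found in the text."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        kind = match.lastgroup  # name of the alternative that matched
        value = match.group()
        if kind == "url":
            # Drop sentence punctuation that got attached to the URL.
            value = value.rstrip(".,;:!?)")
        tokens.append((kind, value))
    return tokens


for token in tokenize("Write to info@example.org or see http://example.org/page. Thanks"):
    print(token)
```

A statistical ranking stage would then work on these tokens, for example by counting how often each word occurs in each document.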
Semantic Web search engines, or third-generation search engines
Three types:
• User-oriented Semantic Web search engines: return links to web pages. Internally they can use both Semantic Web technologies and LSI. Examples: True Knowledge, Hakia and PowerSet.
• Semantic Web Services-oriented engines: return links to ontologies, OWL files and RDF instances. They are not suited to end users; the idea is to provide ways for businesses to interoperate across domains or services. Examples: SOWL, WSE, Watson, Falcons, Sindice and Swoogle.
• Social-semantic Web-oriented engines: the socio-semantic web (s2w) uses classification and ontologies in very practical situations. S2w search engines aim to complement the formal Semantic Web vision by adding a pragmatic collaborative tagging (folksonomy) approach. The main interest is to enable users to share knowledge. Example: http://www.stumpedia.com/
The components of Semantic Web search engines
Semantic Web search engines. What are all these differences for?
• Natural Language Processing (NLP)
• Annotation
“Semantic Web means many things to different people:
It is about artificial intelligence, computer programs solving complex optimization problems.
It is about web services, in terms of end user value.
It is the web of data, where information is represented in RDF or microformats and OWL.”
See: http://www.readwriteweb.com/archives/semantic_web_patterns_a_guide_redux.php
Annotation
• Free-text annotation: annotations can be comments, notes, explanations, references, examples, advice, corrections or any other type of external remark that can be attached to or embedded in a Web document or a selected part of the document. See: http://www.ncb.ernet.in/groups/dake/annotate/intro.shtml
• Semantic annotation in general: semantic annotation is the association of a data entity with an element from a classification scheme, ontology or other knowledge repository.
• Examples of semantic annotation:
  • the assignment of MeSH descriptors to citations in MEDLINE
  • the assignment of Gene Ontology terms to gene products in UniProt
Semantic Web Annotation
Semantic Web annotation is the technique of publishing machine-understandable data on the Web by creating metadata through semantic tagging. It is crucial to the fulfillment of the Semantic Web because it gives useful meaning to data or to unstructured text.
• A semantic annotation is a formal annotation in which the predicate is an ontological term and the object conforms to an ontological definition.
• The term “annotation” can denote both the process of annotating and the result of that process.
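The definition of a semantic annotation given here can be read as a check on a subject-predicate-object statement. The sketch below is a toy illustration only: the `ex:` identifiers, the two lookup tables and the function `is_semantic` are all hypothetical, and a real system would consult an actual ontology (for example one expressed in OWL) rather than Python dictionaries.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: each predicate maps to the class
# that its object is expected to belong to.
ONTOLOGY_PREDICATES = {
    "ex:hasAuthor": "ex:Person",
    "ex:heldIn": "ex:City",
}

# Hypothetical instance data: the class of each known entity.
ENTITY_CLASSES = {
    "ex:Alice": "ex:Person",
    "ex:Torino": "ex:City",
}


@dataclass(frozen=True)
class Annotation:
    subject: str    # the annotated resource, e.g. a document
    predicate: str  # should be a term defined in the ontology
    obj: str        # should conform to the predicate's expected class


def is_semantic(annotation):
    """True if the predicate is an ontology term and the object fits it."""
    expected_class = ONTOLOGY_PREDICATES.get(annotation.predicate)
    if expected_class is None:
        return False  # free-text or unknown predicate
    return ENTITY_CLASSES.get(annotation.obj) == expected_class


print(is_semantic(Annotation("ex:talk", "ex:hasAuthor", "ex:Alice")))
print(is_semantic(Annotation("ex:talk", "ex:comment", "nice slides")))
```

The second statement fails the check because `ex:comment` is not a term of the toy ontology, which makes it a free-text annotation rather than a semantic one.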
Semantic Web Annotation
The Semantic Web annotation process includes three components:
• an ontology, which describes the domain of interest
• a data instance recognition process, which discovers all instances of interest in the target web documents based on the defined ontology
• an annotation generation process, which creates a semantic meaning disclosure file for each annotated document
Through the semantic meaning disclosure file, any ontology-aware machine agent can understand the target document. See: http://www.deg.byu.edu/ding/research/SemanticAnnotation.html
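The three components can be sketched end to end in a few lines. Everything here is an assumption made for illustration: the toy ontology, the exact-string matching used for recognition, the function names and the JSON layout standing in for the "semantic meaning disclosure file" are not taken from the cited project.

```python
import json
import re

# Component 1: a toy ontology (hypothetical concepts and labels).
ONTOLOGY = {
    "ex:Organization": ["ISKO Italia"],
    "ex:City": ["Torino", "Helsinki"],
}


def recognize_instances(text, ontology):
    """Component 2: find every occurrence of an ontology label in the text."""
    found = []
    for concept, labels in ontology.items():
        for label in labels:
            for match in re.finditer(re.escape(label), text):
                found.append({
                    "concept": concept,
                    "text": match.group(),
                    "start": match.start(),
                    "end": match.end(),
                })
    return sorted(found, key=lambda item: item["start"])


def generate_annotation(doc_uri, text, ontology):
    """Component 3: build one annotation record per document."""
    return {"document": doc_uri, "instances": recognize_instances(text, ontology)}


text = "ISKO Italia meeting, Torino, 3 April 2009"
record = generate_annotation("http://example.org/doc1", text, ONTOLOGY)
print(json.dumps(record, indent=2))
```

Real recognizers go well beyond exact string matching, which is where the NLP techniques discussed later in the talk come in.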
Types of semantic annotation tools
Annotations can be generated manually, automatically or semi-automatically. The process of annotating requires semantic annotation tools:
• Inline annotation means that the original document is augmented with metadata information. It focuses on annotating the information on pages using RDF so that it is machine readable. Also called semantic authoring or the bottom-up approach.
[Diagram: <html> … <annot> … </html> (embedded metadata)]
Types of semantic annotation tools
• Standoff annotation means that the metadata is stored separately from the original document. It is generally preferable from the point of view of interoperability. The annotations are stored in a database that is made available to users via websites and sometimes via web services. Also called the top-down approach. Its focus is leveraging information from existing web pages to derive meaning automatically.
[Diagram: <html> … </html> with a separate annotation (attached metadata)]
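The difference between the two storage styles can be shown on one sentence. This is a minimal sketch, not a standard format: the `<annot>` element mirrors the placeholder used in the slide diagram, and the field names, the `ex:` types and the function `to_inline` are hypothetical. It assumes the annotations do not overlap.

```python
def to_inline(text, annotations):
    """Embed standoff annotations (character offsets) into the text as markup."""
    parts = []
    cursor = 0
    for ann in sorted(annotations, key=lambda a: a["start"]):
        parts.append(text[cursor:ann["start"]])  # untouched text before the span
        span = text[ann["start"]:ann["end"]]
        parts.append('<annot type="%s">%s</annot>' % (ann["type"], span))
        cursor = ann["end"]
    parts.append(text[cursor:])  # remaining text after the last span
    return "".join(parts)


text = "ISKO Italia met in Torino."

# Standoff form: the document is left as it is and the metadata is kept
# separately, pointing into the text by character offsets.
standoff = [
    {"start": 0, "end": 11, "type": "ex:Organization"},
    {"start": 19, "end": 25, "type": "ex:City"},
]

# Inline form: the same metadata embedded in the document itself.
inline = to_inline(text, standoff)
print(inline)
```

With standoff storage several independent annotation sets can refer to the same unchanged document, which is one reason it is easier to exchange between tools.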
The components of Semantic Web search engines
• Natural Language Processing (NLP)
Initially: NLP was conceived as a support for linguistic studies, a powerful method for the investigation and evaluation of human language itself, i.e. the enhanced study of large corpora of texts. It aims at using computers to interpret and manipulate words as part of a language.
Then: Artificial Intelligence defined NLP as the use of computers to process written and spoken languages for some practical purpose, such as translating languages or carrying on conversations with machines.
The components of Semantic Web search engines
• Natural Language Processing (NLP)
Now: thanks to NLP techniques, algorithms such as chunking, clustering, parsing, spellchecking, tagging and word sense disambiguation are used to handle text intelligently and to extract information from the Web or from text data banks in order to answer questions. Since the Web explosion, NLP has been used to develop natural language understanding systems that convert samples of human language into more formal representations that are easier for computer programs to manipulate.
Conclusion
• The two methodologies, semantic annotation and NLP, are now being combined:
  • Semantic Web search engines need many pages to be annotated, which requires an enormous effort, so NLP becomes an important aid to automatic or semi-automatic annotation.
  • At the same time, the precision of text analysis can be improved by means of assignment techniques applied by users and professionals.
In conclusion, the trend is toward collective knowledge systems that improve as more people participate, since they are based on human contributions. All of this will possibly be complemented by NLP algorithms.
References
Iskold, Alex. (2006) Semantic Web Patterns: A Guide to Semantic Technologies. http://www.readwriteweb.com/archives/semantic_web_patterns_a_guide_redux.php
Kiryakov, Atanas et al. (2005) Semantic Annotation, Indexing, and Retrieval. Ontotext Lab. http://www.ontotext.com/publications/SemAIR_ISWC169.pdf
Vehvilainen, A. et al. (2006) Semi-Automatic Semantic Annotation and Authoring Tool for a Library Help Desk Service. Helsinki University. http://www.seco.tkk.fi/publications/2006/vehvilainen-hyvonen-alm-semi-automatic-semantic-annotation-and-authoring-tool.pdf
Maynard, Diana. (2005) Benchmarking ontology-based annotation tools for the Semantic Web. Department of Computer Science, University of Sheffield, UK. http://gate.ac.uk/sale/ahm05/ahm.pdf
Good, Benjamin M.; Kawas, Edward; Wilkinson, Mark. (2007) Bridging the gap between social tagging and semantic annotation: E.D. the Entity Describer. http://precedings.nature.com/documents/945/version/2/html
Useful links:
http://www.semanticfocus.com/
http://logic.stanford.edu/oem/projects.html#_Coordinating_Collective_Work
http://semantic-mediawiki.org/wiki/Semantic_MediaWiki