NLP And The Semantic Web
Dainis Kiusals, dvk2102@columbia.edu
COMS E6125, Spring 2010
Natural Language Processing
• 1950s and 1960s – researchers began developing techniques for using computers to process natural language.
• The ability to capture context was studied by Noam Chomsky. His theory is based on generative grammars: constructs that describe how a sentence is formed. These can be used to create formal grammars against which an input stream of words is parsed, as a first step toward extracting its meaning. [1]
• Grammar constituents (from the slide diagram): [sentence]; [noun phrase], [verb phrase]; [determiner], [noun], [verb], [article], [adjective], [adverb]
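The idea of parsing a word stream against a formal grammar can be sketched in a few lines of Python. This is a minimal illustration, not Chomsky's formalism in full: the rules in GRAMMAR, the tiny LEXICON, and the function names are all assumptions made up for this example, reusing the constituent names from the diagram.

```python
# Toy generative grammar. The rules and the lexicon are illustrative assumptions.
GRAMMAR = {
    "sentence": [["noun phrase", "verb phrase"]],
    "noun phrase": [["determiner", "adjective", "noun"], ["determiner", "noun"]],
    "verb phrase": [["verb", "noun phrase"], ["verb", "adverb"], ["verb"]],
}
LEXICON = {
    "the": "determiner", "a": "determiner",
    "quick": "adjective",
    "dog": "noun", "ball": "noun",
    "chases": "verb", "runs": "verb",
    "quickly": "adverb",
}


def parse(symbol, words, pos=0):
    """Yield (tree, next_pos) for every way `symbol` can be derived from words[pos:]."""
    if symbol not in GRAMMAR:
        # Word category: succeeds only if the next word has that category.
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            yield (symbol, words[pos]), pos + 1
        return
    for rule in GRAMMAR[symbol]:
        for children, end in match_sequence(rule, words, pos):
            yield (symbol, children), end


def match_sequence(symbols, words, pos):
    """Yield (trees, next_pos) for matching every symbol of a rule in order."""
    if not symbols:
        yield [], pos
        return
    for tree, mid in parse(symbols[0], words, pos):
        for trees, end in match_sequence(symbols[1:], words, mid):
            yield [tree] + trees, end


def parse_sentence(text):
    """Return the first parse tree covering the whole input, or None."""
    words = text.lower().split()
    for tree, end in parse("sentence", words):
        if end == len(words):
            return tree
    return None


if __name__ == "__main__":
    print(parse_sentence("The dog chases a ball"))
    print(parse_sentence("dog the chases"))  # not derivable from the grammar
```

The resulting tree only records structure. Extracting meaning from it is a separate, later step.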
NLP Issues / Challenges
• Morphology – different forms of a word (singular/plural, tense), e.g. runs, ran, running
• Syntax – grammatical structure (verbs, nouns), e.g. "The company is ready to sell."
• Spelling – variant spellings (and misspellings) of words, e.g. color/colour, organize/organise
• Text Segmentation – identifying word boundaries
• Word Sense Disambiguation – multiple word meanings, e.g. bow (bend forward, weapon, ribbon, front of ship)?
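Word sense disambiguation for the "bow" example can be sketched with a simplified Lesk-style overlap count: pick the sense whose description shares the most words with the surrounding sentence. The sense descriptions, the stopword list, and the function names below are hypothetical and written only for this illustration. A real system would draw glosses from a dictionary or ontology.

```python
import re

# Hypothetical sense descriptions for "bow", written for this sketch only.
SENSES = {
    "bend forward": "bend the head or body forward as a greeting or sign of respect",
    "weapon": "weapon that shoots arrows used by an archer for hunting",
    "ribbon": "decorative knot tied with ribbon on a gift or in hair",
    "front of ship": "front part of a ship or boat that cuts through water",
}
STOPWORDS = {"a", "an", "the", "of", "or", "as", "in", "on", "with",
             "that", "by", "for", "to", "and", "his", "her", "is"}


def tokens(text):
    """Lower-case word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(sentence, senses=SENSES):
    """Return the sense label with the largest word overlap, or None if no overlap."""
    context = tokens(sentence)
    scores = {label: len(context & tokens(gloss)) for label, gloss in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(disambiguate("The archer drew his bow and shot two arrows"))
    print(disambiguate("She tied a bow with red ribbon on the gift"))
```

Overlap counting is brittle, since it fails whenever the context shares no words with any description, but it shows why context is needed to choose among meanings.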
Semantic Web [2]
• Proposed by Tim Berners-Lee (W3C Director) as a method for adding concepts to Web content via semantic annotation.
• The W3C is standardizing RDF and OWL, the data model and ontology language used for this annotation.
• At the lowest level, concepts are stored as triples, which are defined at higher levels by ontologies. [3]
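A triple is a (subject, predicate, object) statement. The sketch below stores a few triples as plain Python tuples and queries them by pattern. It is only an illustration of the data shape: the identifiers are made-up short names rather than real RDF IRIs, and the match helper is a hypothetical stand-in for a real query language.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Illustrative statements. The short names are assumptions, not real RDF IRIs.
TRIPLES = [
    Triple("TimBernersLee", "proposed", "SemanticWeb"),
    Triple("W3C", "standardizes", "RDF"),
    Triple("W3C", "standardizes", "OWL"),
    Triple("OWL", "isA", "OntologyLanguage"),
]


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    # "What does the W3C standardize?"
    for t in match(TRIPLES, subject="W3C", predicate="standardizes"):
        print(t.obj)
```

An ontology adds a layer on top of such statements by saying what the subjects, predicates, and objects mean and how they relate.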
Keyword Search
Queries are processed only as a statistical analysis of keyword appearance in documents, with some advanced logical features.
Keyword search does not distinguish between different interpretations of a word in context within the searched data (corpus), so search results might contain different uses of a word. [4]
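The limitation can be seen in a minimal keyword scorer that ranks documents by how often the query terms appear. The three sample documents and the keyword_search function are invented for this sketch, and real engines use more refined statistics than a raw count.

```python
import re
from collections import Counter

# Invented sample corpus: each document uses "bow" in a different sense.
DOCS = {
    "archery": "The archer strung the bow and aimed. A good bow is made of yew.",
    "etiquette": "Performers bow to the audience after the show.",
    "sailing": "Spray came over the bow of the boat.",
}


def keyword_search(query, docs=DOCS):
    """Rank documents by the total number of occurrences of the query terms."""
    terms = re.findall(r"[a-z]+", query.lower())
    results = []
    for name, text in docs.items():
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        score = sum(counts[t] for t in terms)
        if score:
            results.append((name, score))
    # sorted() is stable, so ties keep their corpus order.
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    print(keyword_search("bow"))
```

All three documents are returned for "bow", although each uses the word with a different meaning.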
NLP / Semantic Search
• Increased relevancy of results vs. keyword search.
• Longer query phrases and questions yield better results.
• Makes use of semantic information to attain better results.
• Users need to change their habits (they are used to keywords).
• NLP search pages need to encourage the use of complex queries.
• Web Search vs. Enterprise Search?
  • NLP search may be better suited to smaller domains.
• Top-Down or Bottom-Up approach?
  • The Top-Down approach relies more on NLP processing.
  • Creating the Semantic Web (Bottom-Up) will be more costly.
NLP / Semantic Web Relationship
• NLP and the Semantic Web complement each other and will grow together.
• As Semantic Web (RDF and OWL) annotation is added to Web pages, NLP search engines can take advantage of this information.
• NLP processes can be used to automate the generation of content that populates new Semantic Web annotation.
• Global and domain-specific ontologies (which represent concepts and their relationships) combined with NLP techniques define the search process.
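As a rough sketch of how text processing could feed annotation, the following turns simple sentences into (subject, predicate, object) triples using surface patterns. The two patterns, the predicate names, and extract_triples are assumptions for illustration. A real pipeline would rely on a full parser and an ontology rather than regular expressions.

```python
import re

# Hypothetical surface patterns, each mapped to an invented predicate name.
PATTERNS = [
    (re.compile(r"^(?P<s>.+?) is an? (?P<o>.+?)\.?$"), "isA"),
    (re.compile(r"^(?P<s>.+?) proposed (?:the )?(?P<o>.+?)\.?$"), "proposed"),
]


def extract_triples(text):
    """Return one triple per sentence that matches a known pattern."""
    triples = []
    # Split after each full stop that is followed by whitespace.
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        for pattern, predicate in PATTERNS:
            m = pattern.match(sentence)
            if m:
                triples.append((m.group("s"), predicate, m.group("o")))
                break
    return triples


if __name__ == "__main__":
    text = "OWL is an ontology language. Tim Berners-Lee proposed the Semantic Web."
    for triple in extract_triples(text):
        print(triple)
```

Sentences that match no pattern are skipped, so any real use would need far broader coverage than this.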
Case Study: Powerset
• Founded in San Francisco in 2005 with the goal of creating an NLP search engine.
• In 2007, obtained exclusive rights to several decades of Xerox/PARC NLP research.
• Launched its first public software beta in May 2008: an NLP search website covering approx. 2.5 million Wikipedia web pages (it also referenced Freebase).
• Created an innovative user interface that leveraged NLP/semantic search results (e.g., highlighting of relevant phrases/sentences within a larger document).
• Two months after the public beta, the company was acquired by Microsoft in order to be incorporated into the Bing search engine.
Resources
• An Executive's Guide to Information Technology: Principles, Business Models and Terminology, by Robert Plant and Stephen Murrell, Cambridge University Press, 2007
• Enterprise 2.0 Implementation, Chapter 13, by Aaron C. Newman and Jeremy Thomas, McGraw-Hill/Osborne, 2009
• Encyclopedia of Knowledge Management, RDF and OWL, by David G. Schwartz (ed.), IGI Global, 2006
• Semantic Knowledge Management: An Ontology-Based Framework, by Antonio Zilli (ed.) et al., IGI Global, 2009
• http://www.parc.com/work/focus-area/NLP/
• http://www.powerset.com/
Resources/information taken from Full Paper submitted 3/12/10.