190 likes | 304 Views
Seminar: Efficient Natural Language Processing. Overview of Topics. Overview. The following is a short ! o verview of each topic to give you a rough idea Exact definition of the problem : your task Some topics leave room for suggestions from your side
E N D
Seminar: Efficient Natural Language Processing Overviewof Topics
Overview • The followingis a short!overviewofeachtopictogiveyou a roughidea • Exactdefinitionoftheproblem: yourtask • Sometopicsleaveroomforsuggestionsfromyourside • In general: weareinterested in solutionsthatprovidereasonablequality AND efficiency (speed)
1. POS Tagging • POS = part-of-speech (“word category”) • POS tagging: assign each word in a sentence its word-category • Example:
2. Text Chunking • Task of chunking a text into phrases that contain “syntactically related words” • syntactically related words: noun-phrases, verb-phrases, … • Example: • “(SBARWhile) (PPin) (NPPrague) (NPhe) (VPmet) (NPAlbert Einstein) (PPfor) (NPthefirsttime) (O.)“
3. ClauseIdentification • Task of “dividing text into clauses” • Clausesusuallycontainsubjectandpredicate • Clauses form a hierarchicalstructure (a clausecancontainanotherclause) • Example: • “(Coach them in (handling complaints) (so that (they can resolve problems immediately)).)”
4. Entity Recognition • Identify entites(persons, locations, organizations …) in a text, based on a list of known entities • Example: • “[ENT1 Steve Jobs] was the CEO of [ENT2 Apple], situated in [ENT3 California].”
5. SemanticRoleLabeling • “recognize arguments of verbs in a sentence, and label them with their semantic role” • Example: • “[A0Chaplin] added [A1Eric Campbell, Henry Bergman, and Albert Austin] [A2to his stock company].” • Arguments of“added“: • Subject (A0): “Chaplin“ • Object (A1): “Eric Campbell, Henry Bergman, and Albert Austin” • IndirectObject(A2): “to his stock company”
6. ConstituentParsing • Recursivelyidentify all grammaticalparts/constituentsof a sentence • Output thesyntaxtree (constituenttree) • Example:
7. DependencyParsing • Foreachwordof a sentenceidentifyits „head“ - theword in thesentenceitdepends on • Output is a (di)graph • Example:
8. Link GrammarParsing • Link Grammar: a specialkindofgrammarexpressingrelationshipsbetweenthewordsof a sentence • Example: • “MV connects verbs (and adjectives) to modifying phrases like adverbs” -> “added” modified by “to his stock company” +--------------------------------------Xp-----------------------------------------+ | +-----------------MVp-----------------+--------Jp-------+ | | +------Os-----+ | +------D*u-----+ | +---Wd---+---Ss--+ +---G--+ | | +---AN---+ | | | | | | | | | | | LEFT-WALL Chaplin added.v Eric Campbell and Albert Austin to his stock.ncompany.n .
9. LTAG Parsing • LTAG = LexicalizedTree-AdjoiningGrammar • Tree-adjoininggrammar: a grammarthatconsistsoftrees • Parsing: treeoperations • Resultsimilarto a constituent parse
10. Text Simplification • Simplifycomplexsentencebyapplyinglexicalandsyntacticaloperations • Main motivation: maketextreadableforhumanswithreadingproblems (aphasics) • Other motivation: shortersentencesareeasier/fasterto parse “Mr. Anthony, who runs an employment agency, decries program trading, but he isn’t sure it should be strictly regulated.” “Mr Anthony runs an employment agency. MrAnthony decries program trading. But he isn’t sure it should be strictly regulated.”
11. Machine Learning • Importantmachinelearningtechniquesfor NLP • especially Support Vector Machines • Mapinstancesintovectorspace • Find hyperplane separatingclasses • andperceptrons • Train a neuralnetworktoclassifyexamples
12. QuestionAnswering • Answer a questiongiven in naturallanguage • WhatisthecapitalofMongolia? DEMO
13. EntityRetrieval • Answer a factoidquestionwith a listofentities • What cities in Germany have more than 100.000 inhabitants? DEMO
14. Apache UIMA • UIMA = Unstructured Information Management Applications • Aframeworkfor NLP applicationsproviding well-definedinterfaces • Managesthedataflowbetweencomponents • Usedby“Watson“, theIBM computerplayingtheJeapordy! Competition
15. NLP Applications • Anyapplicationrelyingheavily on NLP, likeContentus • Contentus: a researchprojectsponsoredbythegermangovernment • Goal: • Digitaliseandanalyzeinformation (text, pictures, audio, video … archived in libraries) • Providing an efficientwayofthematicresearch indiscovered information
List of Topics • POS tagging • Text chunking • Clauseidentification • Entityrecognition • Semanticrolelabeling • Constituentparsinging • Dependencyparsing • Link grammarparsing • LTAG parsing • Text simplification • Machinelearning • Questionanswering • Entityretrieval • Apache UIMA • NLP application
References • Barbella, D., Benzaid, S., Christensen, J. M., Jackson, B., Qin, X. V., and Musicant, D. R. 2009. Understanding support vector machine classifications via a recommender system-like approach. In Int. Conf. on Data Mining. 305–311. • Burges, C. 1999. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery. • Carreras, X. 2005. Learning and inference in phrase recognition: A filteringranking architecture using perceptron. Ph.D. thesis, UniversitatPolitècnicade Catalunya. • Carreras, X. and Màrques, L. 2004. Introduction to the conll-2004 shared task: Semantic role labeling. In Proceedings of CoNLL-2004. Boston, MA, USA, 89–97. • Chandrasekar, R., Doran, C., and Srinivas, B. 1996. Motivations and methods for text simplification. In Proceedings of the 16th conference on Computational linguistics - Volume 2. COLING ’96. Association for Computational Linguistics, Stroudsburg, PA, USA, 1041–1044. • Sang, E. F. T. K. and Déjean, H. 2001. Introduction to the conll-2001 shared task: Clause identification. Computing Research Repository cs.CL/0107. • Siddharthan, A. 2003. Syntactic simplification and text cohesion. Ph.D. thesis. • Tjong Kim Sang, E. F. and Buchholz, S. 2000. Introduction to the conll-2000 shared task: chunking. In Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7. ConLL ’00. Association for Computational Linguistics, Stroudsburg, PA, USA, 127–132.