Natural Language Processing Lecture 1 : Introduction to NLP. Winter 2014-15 Lecturer: Prof. Roi Reichart. Course Information. Lecture: Roi Reichart. TA: Ira Leviant Class: Thursday 11:30 - 14:30, TA: Thursday 10:30 - 11:30. Course Prerequisites.
Natural Language Processing. Lecture 1: Introduction to NLP. Winter 2014-15. Lecturer: Prof. Roi Reichart
Course Information • Lecture: Roi Reichart. TA: Ira Leviant • Class: Thursday 11:30 - 14:30, TA: Thursday 10:30 - 11:30
Course Prerequisites • Basic Linear Algebra, Probability, Algorithms • Machine learning will help (but some of it will be reviewed in class and TA sessions) • Programming skills
Course Grading • Class and TA attendance: 5% (mandatory, contact me if you cannot attend) • Student lecture: 20% (last hour every week) • Four homework assignments: 40% (one programming question plus a few theoretical ones) • Final project: 35% (?)
Talk Outline • Language and Language Technology • Language as a test case for scientific modeling • What is NLP ? • Basic NLP tasks (focus of this course) • NLP Applications • Why is NLP hard (ambiguity)
Modeling Language Learning and Processing. The four components of a scientific description of a phenomenon: • Theory – many layers of language analysis are hidden (i.e., they do not exist in the world) • Model – since many of the layers are hidden, there can be strong disagreement on how to model them • Parameter estimation, a.k.a. learning – as models contain hidden layers, this can be complicated • Prediction, a.k.a. inference – as models contain hidden layers, this can be complicated
What is in NLP? • Machine learning and optimization (especially structured prediction) • Statistical modeling (parameter estimation, inference) • Linguistics • Cognitive science
Basic NLP Tasks • Basic input analysis (word segmentation):
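The slide's own example is not preserved here, so the following is only a minimal sketch of one classic baseline for word segmentation: greedy left-to-right maximum matching against a fixed vocabulary. The function name `max_match`, the toy vocabulary, and the unspaced input string are all hypothetical and chosen for illustration; they are not taken from the lecture.

```python
def max_match(text, vocabulary):
    """Greedy left-to-right longest-match segmentation (a simple baseline)."""
    max_len = max(len(word) for word in vocabulary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a vocabulary word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocabulary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary word starts at position i: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy vocabulary and input (illustration only).
vocab = {"i", "gave", "the", "ball", "to", "itamar"}
print(max_match("igavetheballtoitamar", vocab))
```

Greedy matching commits to the longest word at each position and never backtracks, so it can fail when a shorter word would have allowed a better overall segmentation. That is one small instance of the ambiguity that makes even basic input analysis non-trivial.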
Basic NLP Tasks • Word level analysis (morphological segmentation):
Basic NLP Tasks • Sentence level analysis: Syntax, Part-of-Speech (POS) tagging:
I/PRP gave/V the/DT smaller/JJ balls/N to/IN Itamar/N
Tags: PRP – Personal Pronoun, V – Verb, DT – Determiner, JJ – Adjective, N – Noun, IN – Preposition
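To make the tagging task concrete, here is a minimal sketch of a most-frequent-tag baseline, which assigns each word the tag it received most often in training data. It is not the method taught in the course (the course treats tagging as sequence learning); the function names, the one-sentence training set, the fallback tag, and the unseen name "Dana" are assumptions made for illustration. The tag set is the one on the slide (PRP, V, DT, JJ, N, IN).

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Learn, for every word, the tag it was seen with most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}


def tag(words, model, default_tag="N"):
    """Tag each word independently; unseen words get default_tag (an assumption)."""
    return [(word, model.get(word.lower(), default_tag)) for word in words]


# Toy training data: the single tagged sentence from the slide.
training = [[("I", "PRP"), ("gave", "V"), ("the", "DT"), ("smaller", "JJ"),
             ("balls", "N"), ("to", "IN"), ("Itamar", "N")]]
model = train_most_frequent_tag(training)
print(tag("I gave the balls to Dana".split(), model))
```

This baseline ignores context entirely, so it cannot resolve words whose tag depends on their neighbors. Sequence models address exactly that limitation.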
Basic NLP Tasks • Sentence level analysis: Syntax, syntactic parsing:
I/PRP gave/VBD the/DT ball/NN to/TO Itamar/NNP
Basic NLP Tasks • Sentence level analysis: lexical semantics:
Basic NLP Tasks • Word meaning (semantics):
Basic NLP Tasks • Multilingual Processing:
Text Processing: A powerful car bomb exploded today in Baghdad inside the holiest Shiite shrine. As many as 95 people were killed in the event, according to sources in Washington. The blast came only two days after another car bomb exploded in a crowded street in Mosul in the northern part of Iraq, killing 13 pedestrians, in an attack carried out by Al Qaeda. Together with the shooting in Najaf three weeks ago that killed 15 American soldiers, violence seemed to spike to its highest level. The bombing today in the capital of Iraq …
Text Processing – Field Labeling: A powerful car bomb exploded today in Baghdad inside the holiest Shiite shrine. As many as 95 people were killed in the event, according to sources in Washington. The blast came only two days after another car bomb exploded in a crowded street in Mosul in the northern part of Iraq, killing 13 pedestrians, in an attack carried out by Al Qaeda. Together with the shooting in Najaf three weeks ago that killed 15 American soldiers, violence seemed to spike to its highest level. The bombing today in the capital of Iraq …
Text Processing - Event Segmentation
• Event 1: A powerful car bomb exploded today in Baghdad inside the holiest Shiite shrine. As many as 95 people were killed in the event, according to sources in Washington. The blast came …
• Event 2: … only two days after another car bomb exploded in a crowded street in Mosul in the northern part of Iraq, killing 13 pedestrians, in an attack carried out by Al Qaeda.
• Event 3: Together with the shooting in Najaf three weeks ago that killed 15 American soldiers, violence seemed to spike to its highest level.
• Event 1: The bombing today in the capital of Iraq …
Textual Entailment
• T1: “At the end of the year, all solid companies pay dividends.”
• H1: “At the end of the year, all solid insurance companies pay dividends.”
• H2: “At the end of the year, all solid companies pay cash dividends.”
• T1 entails H1 (solid insurance companies are a subset of solid companies); T1 does not entail H2 (the dividends need not be paid in cash).
Textual Entailment • Things can be much more challenging … Context-sensitive paraphrasing: can “speak” replace “command”?
• The general commanded his troops. / The general spoke to his troops.
• The soloist commanded attention. / The soloist spoke to attention.
This Course • This course will focus on the basic NLP tasks: • Language modeling • Tagging tasks (sequence learning) • Syntax (as the prototype of complex structure learning) • Semantics (with an emphasis on lexical semantics) • We will take a data-driven, machine-learning-based approach
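As a small illustration of what a data-driven approach means for the first task listed, language modeling, here is a hedged sketch of a bigram language model with add-one smoothing. Parameter estimation is nothing more than counting, and prediction is scoring a sentence. The function names, the boundary symbols `<s>` and `</s>`, the choice of add-one smoothing, and the two-sentence training corpus (reusing example sentences from the paraphrasing slide) are assumptions for illustration, not the course's implementation.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary symbols


def train_bigram(sentences):
    """Parameter estimation: collect context counts and bigram counts."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = [BOS] + sentence.split() + [EOS]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
        vocab.update(tokens[1:])                 # tokens that can be predicted
    return contexts, bigrams, vocab


def log_prob(sentence, contexts, bigrams, vocab):
    """Prediction: add-one smoothed log-probability of a sentence.

    Simplification: words never seen in training are scored like unseen
    bigrams instead of being mapped to a dedicated unknown-word symbol.
    """
    tokens = [BOS] + sentence.split() + [EOS]
    size = len(vocab)
    return sum(math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + size))
               for prev, cur in zip(tokens, tokens[1:]))


corpus = ["the general spoke to his troops", "the soloist commanded attention"]
contexts, bigrams, vocab = train_bigram(corpus)
seen = log_prob("the general spoke to his troops", contexts, bigrams, vocab)
scrambled = log_prob("troops his to spoke general the", contexts, bigrams, vocab)
print(seen > scrambled)
```

The model prefers the word order it saw in training over a scrambled version of the same words, which is the basic behavior a language model is expected to show.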
This Course • Why do we take this approach? • It allows us to develop a principled, model-based approach to NLP • It allows us to develop the fundamental machine learning tools of NLP • In particular, it allows us to learn about predicting structure • It focuses on the core of NLP: the models we will develop make NLP applications possible