Lecture 1: Introduction

  1. Lecture 1: Introduction. Methods in Computational Linguistics II, Queens College

  2. Methods in Computational Linguistics II • 2nd semester of a two-semester course providing instruction in: • the basics of computer science and programming (via Python) • an introduction to techniques in computational linguistics

  3. My background • Research • Speech synthesis and recognition • Prosody (intonation) • Speech segmentation • Non-native speech • Political speech and other paralinguistics • Computer Science professor at Queens College and the CUNY Graduate Center • Worked at IBM and Google

  4. Your Background • Name • What are your research interests in linguistics? • How do you expect computational linguistics to fit into your work? • Are there techniques or applications that you are particularly looking to learn? • Programming background? • 1 semester? More? • Are you simultaneously taking Language Technologies?

  5. Outline • NLTK • Overview • Major Capabilities • Searching and Sorting • Linear (Sequential) Search • Binary Search • Insertion Sort • MergeSort • Course Policies • Syllabus Review

  6. NLTK: the Natural Language Toolkit. A set of Python utilities that facilitate the processing of text.

  7. NLTK Functionality • Accessing corpora • String processing • Collocation discovery • Part-of-speech tagging • Classification and clustering • Evaluation metrics • Chunking • Parsing

  8. NLTK Functionality (continued) • Semantic interpretation • first-order logic, lambda calculus, model checking • Probability and estimation • WordNet browsing • Chatbots

  9. NLTK as a Resource • This range of functionality is broad and not necessarily cohesive. • However, NLTK provides the resources and tools (functions and objects) that underpin most major computational linguistics tasks.

  10. Major Computational Linguistics Tasks • Syntax • Tagging • Parsing • Semantics • Information Extraction • Semantic Role Labeling • Phonology • Sentence Processing • Segmentation • Summarization • Speech Recognition • Speech Synthesis • Information Retrieval • Sentiment Analysis • Authorship studies • Co-reference resolution

  11. NLTK Resources • NLTK also includes data: corpora and lexical resources • Project Gutenberg • WordNet • Penn Treebank (subset) • Named Entity Recognition data • Inaugural addresses • Sentiment data • Names corpus • Switchboard (subset) • TIMIT • Webtext

  12. Quick Assignment • Methods I used NLTK. • Homework 0 • Make sure that NLTK is installed and working correctly. • Install matplotlib to use NLTK’s graphing functions. • “Due” ASAP.
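
A quick way to check the first part of Homework 0 is a few lines of standard-library Python. This is a minimal sketch, not part of the course materials: check_packages is a hypothetical helper name, and it only confirms that Python can find the nltk and matplotlib packages, not that NLTK's corpora have been downloaded.

```python
import importlib.util

def check_packages(names=("nltk", "matplotlib")):
    """Hypothetical helper: map each package name to True if Python can find it."""
    # find_spec returns None when a top-level package is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, ok in check_packages().items():
        print(f"{name}: {'installed' if ok else 'MISSING'}")
```

If nltk is reported as installed, running `import nltk` and then `nltk.download()` in an interactive session opens NLTK's downloader for its corpora and other data.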

  13. One-Question Pop Quiz • Solve for p

  14. Math • Computational linguistics requires a not-quite-trivial amount of math. • Statistics and probabilistic modeling are the pillars underlying these computational techniques. • This involves counting and algebra. • Machine learning governs the classification and clustering techniques that CL makes heavy use of. • This requires calculus, statistics, and linear algebra.

  15. Math in this course • Overview of probability. • Next class • Algebra for evaluation, some common features • Statistics for Naïve Bayes classification • Entropy in Decision Trees

  16. Outline • NLTK • Overview • Major Capabilities • Searching and Sorting • Linear (Sequential) Search • Binary Search • Insertion Sort • MergeSort • Course Policies • Syllabus Review

  17. Data Structures, Algorithms, etc. • In computer science, there is a tight relationship between data structures and algorithms. • In general, the more complex the data structure: • the more general or flexible the data and relationships it can represent • the faster the algorithms that exploit that structure can run (e.g., a sorted list supports binary search)

  18. Searching and Sorting • Searching and sorting are a classic example of the relationship between algorithm runtime and data structure. • Search: identify the location of a value, x, in a list, A. • Sort: rearrange a list A so that its values are in increasing (more precisely, non-decreasing) order: A[i] <= A[i+1] for every i.
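
The sort condition can be written directly as an executable check. This is a small sketch; is_sorted is an illustrative name, not something defined in the course. Note that `<=` allows repeated values, so a list with ties still counts as sorted.

```python
def is_sorted(A):
    """True if every adjacent pair satisfies A[i] <= A[i+1]."""
    return all(A[i] <= A[i + 1] for i in range(len(A) - 1))

print(is_sorted([1, 2, 2, 5]))  # True: ties are allowed by <=
print(is_sorted([5, 2, 4]))     # False
print(is_sorted([]))            # True: an empty list is trivially sorted
```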

  19. Sequential Search
      def search(A, x):
          # Check each position in turn; return the index of the first match.
          for i, value in enumerate(A):
              if value == x:
                  return i
          return -1   # x does not occur in A

  20. How long does sequential search take to run? Best case? Worst case? Average case?
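
One way to explore these questions empirically is to count comparisons. The sketch below is the same sequential search with a counter added; search_with_count is our name for it, not one from the slides. The average is taken over searches for each element of the list, so it assumes every present value is equally likely to be requested.

```python
def search_with_count(A, x):
    """Sequential search that also returns the number of comparisons made."""
    comparisons = 0
    for i in range(len(A)):
        comparisons += 1
        if A[i] == x:
            return i, comparisons
    return -1, comparisons

A = list(range(100))
print(search_with_count(A, 0))    # (0, 1): best case, x is first
print(search_with_count(A, 99))   # (99, 100): x is last
print(search_with_count(A, -5))   # (-1, 100): worst case, x is absent
average = sum(search_with_count(A, x)[1] for x in A) / len(A)
print(average)                    # 50.5, i.e. (n + 1) / 2
```

Best case is 1 comparison; worst case is n; the average over present values is (n + 1) / 2, which still grows linearly in n.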

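  21. Binary Search
      def search(A, x):
          # Assumes A is sorted in increasing order.
          # Invariant: if x is in A, it lies in the half-open range A[bottom:top].
          top = len(A)
          bottom = 0
          while bottom < top:
              mid = (top + bottom) // 2   # integer division (Python 3)
              if A[mid] < x:
                  bottom = mid + 1        # x can only be to the right of mid
              elif A[mid] > x:
                  top = mid               # x can only be to the left of mid
              else:
                  return mid
          return -1
      If the list A is in increasing order, large chunks of the list can be ignored: each comparison discards about half of the remaining range.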

  22. How long does binary search take to run? Best Case? Worst Case? Average Case?
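
To see the logarithmic behavior, the sketch below reuses the binary search logic from the previous slide with an added iteration counter (binary_search_steps is a hypothetical name). It searches for every value in a sorted list plus one absent value on each side, and compares the worst observed count with floor(log2 n) + 1.

```python
import math

def binary_search_steps(A, x):
    """Binary search (same logic as the slide) that also counts loop iterations."""
    top, bottom, steps = len(A), 0, 0
    while bottom < top:
        steps += 1
        mid = (top + bottom) // 2
        if A[mid] < x:
            bottom = mid + 1
        elif A[mid] > x:
            top = mid
        else:
            return mid, steps
    return -1, steps

for n in (16, 1024, 4096):
    A = list(range(n))
    # Every present value, plus one absent value below and one above the list.
    worst = max(binary_search_steps(A, x)[1] for x in range(-1, n + 1))
    print(n, worst, math.floor(math.log2(n)) + 1)
# 16 5 5
# 1024 11 11
# 4096 13 13
```

The best case is a single iteration (x sits exactly at the first midpoint). Each iteration shrinks the range from size s to at most floor(s / 2), which is why the worst case never exceeds floor(log2 n) + 1.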

  23. Improvement of Binary Search • Binary search is a significant improvement over sequential search • log n < n • However, binary search requires that A be sorted. • How long does it take to sort an array, and how does this affect the total runtime?
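
In practice, Python's standard library already provides binary search on sorted lists through the bisect module. This sketch pays the sorting cost once with sorted() and then looks words up with bisect_left; sorted_search is a hypothetical wrapper that returns -1 for a miss, matching the convention on the earlier slides.

```python
from bisect import bisect_left

def sorted_search(A_sorted, x):
    """Binary search via bisect; A_sorted must already be sorted."""
    i = bisect_left(A_sorted, x)  # leftmost position where x could be inserted
    if i < len(A_sorted) and A_sorted[i] == x:
        return i
    return -1

words = ["the", "cat", "sat", "on", "the", "mat"]
vocab = sorted(set(words))         # sort once...
print(vocab)                       # ['cat', 'mat', 'on', 'sat', 'the']
print(sorted_search(vocab, "mat")) # 1: ...then each lookup is O(log n)
print(sorted_search(vocab, "dog")) # -1
```

Unlike the slide's version, bisect_left always returns the leftmost matching index when a value occurs more than once.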

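  24. Insertion Sort
      def insertionSort(A):
          # Sorts A in place. Before each pass, A[0:j] is already sorted.
          for j in range(1, len(A)):
              key = A[j]
              i = j - 1
              # Shift larger elements of the sorted prefix one slot to the right.
              while i >= 0 and A[i] > key:
                  A[i + 1] = A[i]
                  i = i - 1
              A[i + 1] = key   # drop key into the gap
      Sort the list [5, 2, 4, 6, 1, 3].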

  25. How long does Insertion sort take to run? Best Case? Worst Case? Average Case?
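
A sketch for exploring these cases: the same insertion sort, run on a copy of the input, counting how many times an element is shifted right (insertion_sort_shifts is our name, not the slides'). Each shift fixes exactly one out-of-order pair, so the count equals the number of inversions in the input.

```python
import random

def insertion_sort_shifts(A):
    """Insertion sort on a copy of A; returns (sorted copy, number of shifts)."""
    A = list(A)
    shifts = 0
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]
            shifts += 1
            i -= 1
        A[i + 1] = key
    return A, shifts

print(insertion_sort_shifts([5, 2, 4, 6, 1, 3]))  # ([1, 2, 3, 4, 5, 6], 9)

n = 100
print(insertion_sort_shifts(range(n))[1])         # 0: best case, already sorted
print(insertion_sort_shifts(range(n, 0, -1))[1])  # 4950 = n(n-1)/2: worst case, reversed
random.seed(0)
print(insertion_sort_shifts(random.sample(range(n), n))[1])  # random order: roughly n(n-1)/4
```

So the best case is linear (n - 1 comparisons, no shifts), while both the worst and the average case grow quadratically.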

  26. Can we sort faster? Yes. One way uses recursion. We’ll come back to recursion later, but here is a first example.

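  27. Merge Sort
      def mergeSort(A):
          # Base case: a list of length 0 or 1 is already sorted.
          if len(A) <= 1:
              return A
          mid = len(A) // 2
          Abottom = mergeSort(A[:mid])   # first half: A[0] .. A[mid-1]
          Atop = mergeSort(A[mid:])      # second half: A[mid] .. A[len(A)-1]
          return merge(Abottom, Atop)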

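  28. Merge
      def merge(A, B):
          # Merge two sorted lists A and B into one sorted list C.
          # The float('inf') sentinels assume numeric values.
          n = len(A) + len(B)          # number of real elements, counted before adding sentinels
          A = A + [float('inf')]       # copies, so the caller's lists are not modified
          B = B + [float('inf')]
          C = []
          i = 0
          j = 0
          for k in range(n):
              if A[i] <= B[j]:         # <= takes ties from A first, keeping the sort stable
                  C.append(A[i])
                  i = i + 1
              else:
                  C.append(B[j])
                  j = j + 1
          return C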
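
Because the float('inf') sentinels only work for numbers, sorting words (a common task in this course) needs a merge that checks for an exhausted list instead. This is a sketch of that variant, folded into a single function; merge_sort is a hypothetical name, and the logic otherwise follows the two slides above.

```python
def merge_sort(A):
    """Merge sort without sentinels, so it works for any comparable values."""
    if len(A) <= 1:
        return list(A)
    mid = len(A) // 2
    left = merge_sort(A[:mid])
    right = merge_sort(A[mid:])
    C, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal items in their original order (stable)
            C.append(left[i])
            i += 1
        else:
            C.append(right[j])
            j += 1
    C.extend(left[i:])           # at most one of these two slices is non-empty
    C.extend(right[j:])
    return C

print(merge_sort([5, 2, 4, 6, 1, 3]))                  # [1, 2, 3, 4, 5, 6]
print(merge_sort(["the", "cat", "sat", "on", "mat"]))  # ['cat', 'mat', 'on', 'sat', 'the']
```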

  29. How long does Merge Sort take to run? Hint: This is a (much) harder question. Best Case? Worst Case? Average Case?
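
One standard way to approach this question is a recurrence, sketched here under simplifying assumptions: n is a power of two, and merging m elements in total costs at most c·m for some constant c (the merge loop runs exactly once per element). Each call does two half-size recursive calls plus one linear merge:

```latex
\begin{aligned}
T(1) &= c, \qquad T(n) = 2\,T(n/2) + c\,n \\
T(n) &= 2^{k}\,T(n/2^{k}) + k\,c\,n \qquad \text{(after unrolling } k \text{ levels)} \\
T(n) &= c\,n + c\,n\log_2 n \qquad \text{(at } k = \log_2 n\text{)}, \quad \text{so } T(n) = O(n \log n)
\end{aligned}
```

Because merge sort splits the list the same way regardless of its contents, the best, worst, and average cases all grow as n log n.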

  30. Comparison of Run Times • How much searching do you need to do to make it worth sorting?
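
A rough way to answer this is to compare comparison counts. The cost model below is our own simplifying assumption, not from the slides, and it ignores constant factors: k sequential searches cost about k·n/2 comparisons on average, while sorting once and then binary searching costs about n·log2(n) + k·log2(n). break_even_searches is a hypothetical helper that returns the smallest k for which sorting wins.

```python
import math

def break_even_searches(n):
    """Smallest number of searches k for which sorting first is cheaper,
    under the rough model:
      unsorted: k * n/2                    (average sequential search)
      sorted:   n*log2(n) + k*log2(n)      (sort once, then binary search)
    Returns None if sorting never pays off under this model."""
    log_n = math.log2(n)
    saved_per_search = n / 2 - log_n  # comparisons saved by each search after sorting
    if saved_per_search <= 0:
        return None
    # Need k * saved_per_search > n * log2(n): take the smallest integer strictly above.
    return math.floor(n * log_n / saved_per_search) + 1

for n in (16, 1024, 1_000_000):
    print(n, break_even_searches(n))
# 16 17
# 1024 21
# 1000000 40
```

For large n the break-even point approaches about 2·log2(n) searches, so under this model sorting pays off after only a few dozen searches even for a million items.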

  31. Class Structure and Policies • Course website: http://eniac.cs.qc.cuny.edu/andrew/methods2/syllabus.html • Email list • Banner does not have an email function. • Put your email address on the sign-up sheet.
