LING 408/508: Computational Techniques for Linguists


  1. LING 408/508: Computational Techniques for Linguists Lecture 7 9/5/2012

  2. Outline • Sets • For loops

  3. Creating sets
     >>> s1 = {1,2,3}
     >>> s1
     {1, 2, 3}
     >>> s2 = set()            # empty set ({} would create an empty dict)
     >>> s2
     set()
     >>> s3 = set([1,2,3])     # call set constructor on a list
     >>> s3
     {1, 2, 3}
     >>> list(s3)              # convert back to list
     [1, 2, 3]

  4. Properties of sets
     • Elements of a set are unordered; the order in which they are printed doesn’t matter
       >>> {1, 2, 3} == {3, 2, 1}
       True
     • A set represents each member only once
       >>> {1, 1, 2, 2, 3, 3} == {1, 2, 3}
       True
     • Doesn’t necessarily display in sorted order (the exact order can vary between Python versions)
       >>> {6,65,4,21,3,4,7,1}
       {1, 3, 4, 6, 65, 7, 21}
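A minimal sketch of why the "each member only once" property matters for linguists: converting a list of word tokens to a set leaves the distinct word types. The sentence is a made-up example, not from the lecture.

```python
# Hypothetical example sentence, split into word tokens on whitespace.
tokens = "the cat saw the dog and the dog saw the cat".split()

# A set keeps each distinct word only once, giving the word types.
types = set(tokens)

print(len(tokens))     # number of tokens (with repeats)
print(len(types))      # number of distinct types
print(sorted(types))   # sorted() gives a stable display order
```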

  5. Set operators
     >>> A = {1,2,3,4}
     >>> B = {3,4,5,6}
     >>> A & B      # intersection: in A and B
     {3, 4}
     >>> A | B      # union: in A or B
     {1, 2, 3, 4, 5, 6}
     >>> A - B      # difference: remove B from A
     {1, 2}
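The same three operators work on sets of strings. The sketch below applies them to the words of two invented sentences; the variable names and data are illustrative assumptions only.

```python
# Hypothetical example data: word types from two short sentences.
sent_a = set("the cat sat on the mat".split())
sent_b = set("the dog sat on the log".split())

shared = sent_a & sent_b      # intersection: words in both sentences
combined = sent_a | sent_b    # union: words in either sentence
only_a = sent_a - sent_b      # difference: words in A but not in B

# sorted() is used only to get a predictable display order.
print(sorted(shared))
print(sorted(combined))
print(sorted(only_a))
```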

  6. in operator and len function
     >>> S = {1,2,3}
     >>> 3 in S     # in: membership operator
     True
     >>> 4 in S
     False
     >>> len(S)     # built-in length function
     3

  7. Set methods (not covering most of them)
     >>> S = {1, 2}
     >>> S.add(3)              # add an element to the set
     >>> S
     {1, 2, 3}
     >>> S.remove(3)           # remove element from set
     >>> S
     {1, 2}
     >>> S.update([2,3,4])     # add multiple elements to a set
     >>> S
     {1, 2, 3, 4}

  8. Membership in sets vs. lists through the in operator
     >>> myset = {1,2,3}
     >>> 2 in myset
     True
     >>> mylist = [1,2,3]
     >>> 2 in mylist
     True
     • Constant O(1) time (on average) to search for a value in a set
       • Through a computational technique called hashing
       • Discussed in Miller & Ranum
     • Linear O(N) time to search for a value in a list
       • Compares starting at the first element in the list
       • Goes through the list until a match is found (the entire list if there is none)

  9. import time
     mylist = list(range(10000000))
     myset = set(range(10000000))

     start = time.perf_counter()     # high-resolution timer
     print(9999999 in mylist)        # worst case: last element of the list
     end = time.perf_counter()
     print(end-start, 'seconds')

     start = time.perf_counter()
     print(9999999 in myset)         # hash lookup
     end = time.perf_counter()
     print(end-start, 'seconds')

     # Output:
     #
     # True
     # 0.7740879078206859 seconds
     # True
     # 0.010139137795448505 seconds

  10. Searching for values in lists or sets
      • If a list is sorted, it is fastest to use binary search: O(log N)
      • If a list is not sorted, and you will perform many searches, it is best to convert it to a set, and then search for values in the set
        • Each search in a set is O(1)
      • Common error for novice Python programmers: searching the same list over and over instead of converting it to a set first
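Both strategies above can be sketched briefly. The binary search uses bisect_left from the standard-library bisect module; the helper name binary_search and the sample data are assumptions for illustration, not code from the lecture.

```python
from bisect import bisect_left

def binary_search(sorted_list, target):
    """Return True if target occurs in sorted_list (must already be sorted)."""
    i = bisect_left(sorted_list, target)   # leftmost position where target could go
    return i < len(sorted_list) and sorted_list[i] == target

# Case 1: the list is sorted, so binary search applies.
sorted_values = [2, 3, 5, 7, 11, 13]
found_7 = binary_search(sorted_values, 7)
found_8 = binary_search(sorted_values, 8)
print(found_7, found_8)

# Case 2: the list is unsorted and searched many times,
# so pay for one conversion to a set, then search the set.
unsorted_values = [13, 2, 11, 5, 3, 7]
lookup = set(unsorted_values)
results = [q in lookup for q in [7, 8, 13]]
print(results)
```

The conversion to a set costs O(N) once, so it only pays off when several searches follow.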

  11. Example application
      • Vocabulary of a language: 10,000s of words
      • Often want to take an input sentence and determine whether each word is in the vocabulary
      • If the vocabulary is represented as a list: O(N) for each search
      • If the vocabulary is represented as a set: O(1) for each search
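A minimal sketch of this application, assuming a toy five-word vocabulary and simple whitespace tokenization. The function name out_of_vocabulary is hypothetical.

```python
# Hypothetical toy vocabulary; a real one would hold tens of thousands of words.
vocabulary = {"the", "cat", "sat", "on", "mat"}

def out_of_vocabulary(sentence, vocab):
    """Return the words of sentence (lowercased, split on whitespace) not in vocab."""
    unknown = []
    for word in sentence.lower().split():
        if word not in vocab:        # O(1) on average because vocab is a set
            unknown.append(word)
    return unknown

unknown_words = out_of_vocabulary("The cat sat on the sofa", vocabulary)
print(unknown_words)
```

If vocabulary were a list instead, the same code would still run, but each membership test would scan the list.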

  12. Outline • Sets • For loops

  13. while loop
      i = 0                  # initialize loop counter
      while <condition>:     # Boolean test
          <statements>
          i += 1             # increment loop counter
      <statements>

  14. for loop
      • Syntax:
        for <item> in <iterable>:
            <statements>
      • Iterable: list, set, iterator
      • Example:
        >>> for i in [0,1,2,3]:
        ...     print(i)
        ...
        0
        1
        2
        3

  15. Compare for loop and while loop
      • Any loop can be coded with either while or for
      • for loop combines initialization and increment of the loop counter in one line

      for i in [0,1,2,3]:    # i takes on values
          print(i)           # of 0, 1, 2, and 3

      i = 0
      while i <= 3:          # i takes on values
          print(i)           # of 0, 1, 2, and 3
          i += 1

  16. Iterate over values of a list: list indexing, and directly over contents of list
      >>> L = ['a', 'b', 'c']
      >>> list(range(len(L)))     # indices of L
      [0, 1, 2]
      >>> for i in range(len(L)):
      ...     print(L[i])
      ...
      a
      b
      c
      >>> for x in L:             # x takes on values of
      ...     print(x)            # items in the list
      ...
      a
      b
      c

  17. 4 ways to sum a list
      # 1. call built-in sum function
      s = sum(L)

      # 2. while loop
      s = 0
      i = 0
      while i < len(L):
          s += L[i]
          i += 1

      # 3. for loop over list indices
      s = 0
      for i in range(len(L)):
          s += L[i]

      # 4. for loop over list contents
      s = 0
      for x in L:
          s += x

  18. Advantage of explicit index: control over loop counter
      >>> L = ['A','B','C','D','E','F','G','H','I']
      >>> # print every 3rd item, beginning at index 1
      >>> # needs control over the loop counter
      >>> for i in range(1, len(L), 3):
      ...     print(L[i])
      ...
      B
      E
      H

  19. Advantage of explicit index: index multiple lists
      • Write a function to compute the dot product of two vectors a and b.
      • Let a = [a1, a2, …, an] and b = [b1, b2, …, bn]. The dot product of a and b is a1*b1 + a2*b2 + … + an*bn.

      def dot_product(a, b):
          dp = 0.0
          for i in range(len(a)):
              dp += a[i] * b[i]
          return dp
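A usage sketch for the function above. The length check is an added assumption (the slide does not say what should happen when the vectors differ in length), and the input vectors are arbitrary examples.

```python
def dot_product(a, b):
    """Dot product of two equal-length vectors, indexing both lists with i."""
    if len(a) != len(b):      # assumption: mismatched lengths are treated as an error
        raise ValueError("vectors must have the same length")
    dp = 0.0
    for i in range(len(a)):   # the same index i addresses both lists
        dp += a[i] * b[i]
    return dp

result = dot_product([1, 2, 3], [4, 5, 6])   # 1*4 + 2*5 + 3*6
print(result)
```

Without the check, a longer b would be silently truncated and a shorter b would raise an IndexError partway through the loop.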

  20. range function and iterators
      • range returns a lazy range object (loosely, an iterator), which, in the context of a for loop, produces a value in each iteration of the loop
      • Why: lower overhead (i.e., less computation and memory)
      • Otherwise, to get a list from range, must call the list constructor

      for x in range(5):                   # okay
      for x in list(range(5)):             # no need to convert to list; use the previous version
      for x in range(5000000000):          # okay
      for x in list(range(5000000000)):    # bad: will (try to) construct a huge list
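The overhead difference can be observed directly with sys.getsizeof. This is a rough sketch: the byte counts depend on the Python build, and getsizeof reports only the container itself, not the integers a list refers to.

```python
import sys

lazy = range(1000000)          # no list is built; values are produced on demand
full = list(range(1000000))    # a real list holding one million references

lazy_size = sys.getsizeof(lazy)   # small, does not grow with the range length
full_size = sys.getsizeof(full)   # grows with the number of elements
print(lazy_size, full_size)

# A range object still supports len() and the in operator without building a list.
print(len(lazy), 999999 in lazy)
```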
