LING 408/508: Computational Techniques for Linguists Lecture 7 9/5/2012
Outline • Sets • For loops
Creating sets
>>> s1 = {1,2,3}
>>> s1
{1, 2, 3}
>>> s2 = set()           # empty set
>>> s2
set()
>>> s3 = set([1,2,3])    # call set constructor on a list
>>> s3
{1, 2, 3}
>>> list(s3)             # convert back to list
[1, 2, 3]
Properties of sets
• Elements of a set are unordered; the order in which they are printed doesn’t matter
>>> {1, 2, 3} == {3, 2, 1}
True
• A set represents each member only once
>>> {1, 1, 2, 2, 3, 3} == {1, 2, 3}
True
• A set doesn’t necessarily display in sorted order
>>> {6,65,4,21,3,4,7,1}
{1, 3, 4, 6, 65, 7, 21}
Set operators
>>> A = {1,2,3,4}
>>> B = {3,4,5,6}
>>> A & B        # intersection: in A and B
{3, 4}
>>> A | B        # union: in A or B
{1, 2, 3, 4, 5, 6}
>>> A - B        # difference: remove B from A
{1, 2}
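As a small illustration of the three operators on linguistic data, the sketch below compares the word types of two sentences. The sentences and variable names are made up for this example, and `sorted` is used only so the printed order is predictable.

```python
# Two toy sentences (hypothetical data), split into sets of word types
doc1 = set("the cat sat on the mat".split())
doc2 = set("the dog sat on the log".split())

shared = doc1 & doc2      # words in both sentences
all_words = doc1 | doc2   # words in either sentence
only_doc1 = doc1 - doc2   # words in doc1 but not in doc2

# sorted() turns each set into an alphabetized list for display
print(sorted(shared))     # ['on', 'sat', 'the']
print(sorted(all_words))  # ['cat', 'dog', 'log', 'mat', 'on', 'sat', 'the']
print(sorted(only_doc1))  # ['cat', 'mat']
```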
in operator and len function
>>> S = {1,2,3}
>>> 3 in S       # in: membership operator
True
>>> 4 in S
False
>>> len(S)       # built-in length function
3
Set methods (not covering most of them)
>>> S = {1, 2}
>>> S.add(3)             # add an element to the set
>>> S
{1, 2, 3}
>>> S.remove(3)          # remove element from set
>>> S
{1, 2}
>>> S.update([2,3,4])    # add multiple elements to a set
>>> S
{1, 2, 3, 4}
Membership in sets vs. lists through the in operator
>>> myset = {1,2,3}
>>> 2 in myset
True
>>> mylist = [1,2,3]
>>> 2 in mylist
True
• Constant O(1) time, on average, to search for a value in a set
  • Through a computational technique called hashing
  • Discussed in Miller & Ranum
• Linear O(N) time to search for a value in a list
  • Compares starting at first element in list
  • Goes through the list until a match is found (the entire list if there is none)
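To make the O(N) behavior of list search concrete, here is a minimal sketch that counts comparisons. The helper `linear_search` is a hypothetical name written for this illustration; it mimics, but is not, the actual implementation behind `in` for lists.

```python
def linear_search(items, target):
    """Scan items from the front; return (found, number of comparisons)."""
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == target:
            return True, comparisons   # stop at the first match
    return False, comparisons          # no match: every element was compared

mylist = [10, 20, 30, 40, 50]
print(linear_search(mylist, 10))   # (True, 1)   best case: first element
print(linear_search(mylist, 50))   # (True, 5)   worst case: last element
print(linear_search(mylist, 99))   # (False, 5)  absent: whole list scanned
```

The number of comparisons grows with the position of the match, and with the length of the list when there is no match, which is why the cost is linear in N.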
import time

mylist = list(range(10000000))
myset = set(range(10000000))

# time.perf_counter() is a high-resolution timer for measuring elapsed time
start = time.perf_counter()
print(9999999 in mylist)
end = time.perf_counter()
print(end-start, 'seconds')

start = time.perf_counter()
print(9999999 in myset)
end = time.perf_counter()
print(end-start, 'seconds')

# Output (timings vary by machine):
#
# True
# 0.7740879078206859 seconds
# True
# 0.010139137795448505 seconds
Searching for values in lists or sets
• If a list is sorted, it is fastest to use binary search: O(log2 N)
• If a list is not sorted, and you will perform many searches, it is best to convert it to a set, and then search for values in the set
  • Each search in a set is O(1)
• Searching a list over and over instead of converting it to a set is a common error for novice Python programmers!
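The two strategies can be sketched as follows. This is one possible approach, not necessarily the one intended in the lecture: it uses the standard-library `bisect` module for the binary search, and `sorted_contains` is a hypothetical helper name.

```python
import bisect

def sorted_contains(sorted_list, target):
    """Binary-search membership test; assumes sorted_list is in ascending order."""
    i = bisect.bisect_left(sorted_list, target)   # leftmost position where target could go
    return i < len(sorted_list) and sorted_list[i] == target

data = [3, 8, 15, 21, 42]          # already sorted
print(sorted_contains(data, 21))   # True
print(sorted_contains(data, 4))    # False

# Unsorted list, many searches: convert to a set once, then search the set
unsorted = [42, 3, 21, 8, 15]
lookup = set(unsorted)
print(8 in lookup)                 # True
print(9 in lookup)                 # False
```

Building the set costs O(N) once, so the conversion pays off only when several searches follow.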
Example application
• Vocabulary of a language: 10,000s of words
• Often want to take an input sentence, determine if a word is in the vocabulary
• If the vocabulary is represented as a list: O(N) for each search
• If the vocabulary is represented as a set: O(1) for each search
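A minimal sketch of this application, assuming a toy five-word vocabulary and whitespace tokenization (both are simplifications made for this example; `unknown_words` is a hypothetical helper name):

```python
# Toy vocabulary stored as a set, so each lookup is O(1) on average
vocabulary = {"the", "cat", "sat", "on", "mat"}

def unknown_words(sentence, vocab):
    """Return the words of sentence that are not in vocab, in order of appearance."""
    unknown = []
    for word in sentence.split():   # split the sentence on whitespace
        if word not in vocab:       # set membership test
            unknown.append(word)
    return unknown

print(unknown_words("the cat sat on the sofa", vocabulary))   # ['sofa']
```

With a realistic vocabulary of tens of thousands of words, the same code works unchanged; only the cost of each `in` test differs between a list and a set.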
Outline • Sets • For loops
while loop
i = 0                  # initialize loop counter
while <condition>:     # Boolean test
    <statements>
    i += 1             # increment loop counter
    <statements>
for loop
• Syntax:
for <item> in <iterable>:
    <statements>
• Iterable: list, set, iterator
• Example:
>>> for i in [0,1,2,3]:
        print(i)
0
1
2
3
Compare for loop and while loop
• Any loop can be coded with either while or for
• for loop combines initialization and increment of the loop counter in one line
for i in [0,1,2,3]:    # i takes on values
    print(i)           # of 0, 1, 2, and 3

i = 0
while i <= 3:          # i takes on values
    print(i)           # of 0, 1, 2, and 3
    i += 1
Iterate over values of a list: list indexing, and directly over contents of list
>>> L = ['a', 'b', 'c']
>>> list(range(len(L)))
[0, 1, 2]
>>> for i in range(len(L)):
        print(L[i])
a
b
c
>>> for x in L:        # x takes on values of
        print(x)       # items in the list
a
b
c
4 ways to sum a list
# 1. call built-in sum function
s = sum(L)

# 2. while loop
s = 0
i = 0
while i < len(L):
    s += L[i]
    i += 1

# 3. for loop over list indices
s = 0
for i in range(len(L)):
    s += L[i]

# 4. for loop over list contents
s = 0
for x in L:
    s += x
Advantage of explicit index: control over loop counter
>>> L = ['A','B','C','D','E','F','G','H','I']
>>> # print every 3rd item, begin at index 1
>>> # easy to do with a loop counter
>>> for i in range(1, len(L), 3):
        print(L[i])
B
E
H
Advantage of explicit index: index multiple lists
• Write a function to compute the dot product of two vectors a and b.
• Let a = [a1, a2, …, an] and b = [b1, b2, …, bn]. The dot product of a and b is a1*b1 + a2*b2 + … + an*bn.
def dot_product(a, b):
    dp = 0.0
    for i in range(len(a)):
        dp += a[i] * b[i]
    return dp
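A worked call may help check the definition. The function below repeats the index-based version from the slide so the example runs on its own; the input vectors are made up, and the code assumes both lists have the same length.

```python
def dot_product(a, b):
    # One index i selects the matching elements of both lists
    dp = 0.0
    for i in range(len(a)):
        dp += a[i] * b[i]
    return dp

a = [1, 2, 3]
b = [4, 5, 6]
print(dot_product(a, b))   # 1*4 + 2*5 + 3*6 = 32.0
```

If b were shorter than a, b[i] would raise an IndexError, so a more defensive version might compare len(a) and len(b) first.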
range function and iterators
• range returns a lazy range object rather than a list; in the context of a for loop, it produces a value in each iteration of the loop
• Why: lower overhead (i.e., less computation & memory)
• Otherwise, to get a list from range, must call list constructor
• for x in range(5):
  • okay
• for x in list(range(5)):
  • no need to convert to list; use previous version
• for x in range(5000000000):
  • ok
• for x in list(range(5000000000)):
  • Bad, will (try to) construct a big huge list
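The memory difference can be observed directly. The sketch below uses `sys.getsizeof`, which reports the size in bytes of the object itself; exact numbers depend on the Python build, so none are shown, and the sizes chosen here are small enough to run quickly.

```python
import sys

r = range(5000000)           # no list is built; values are produced on demand
as_list = list(range(1000))  # a real list holding 1000 elements

print(len(r))                # 5000000
print(sys.getsizeof(r) < sys.getsizeof(as_list))   # True: the range object stays small

# Convert to a list only when a list is actually needed
print(list(range(5)))        # [0, 1, 2, 3, 4]
```

Even though r describes five million values, it takes less memory than a list of only one thousand elements, which is why looping over range directly is preferred.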