Learning to Program With Python: Advanced Class 8
Topics • The standard library!
Topics • Going to look at a bunch of modules! • re (regular expressions), pprint (pretty print), copy, time, pickle, csv.
First up: re (regular expressions) • re is Python's regex module. • Before we dive into it, let's introduce raw strings. • Python expresses regex patterns as ordinary strings; there are no forward slashes as delimiters, unlike /pattern/ in JavaScript or Perl.
Raw strings • Imagine the regular expression \\, which matches a single backslash character. • If we write it as the ordinary Python string "\\", Python treats the first backslash as an escape character, so the regex engine receives just a lone \. That's an error, because the engine also needs the backslash escaped in order to match it literally: it must receive \\, which means the Python string has to be "\\\\".
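Here is a small sketch of the double-escaping problem described above, using only the standard re module. The sample string "C:\Users" is made up for illustration, and since the exact wording of the error message varies between Python versions, the code just prints whatever it says.

```python
import re

# The Python literal "\\" is ONE backslash character once Python processes it.
print(len("\\"))  # 1

# Handing that lone backslash to the regex engine is an error.
raised = False
try:
    re.compile("\\")
except re.error as exc:
    raised = True
    print("re.error:", exc)

# "\\\\" becomes the two characters \\ , which the engine reads as "a literal backslash".
pattern = re.compile("\\\\")
match = pattern.search("C:\\Users")  # the string C:\Users
print(match.group() == "\\", match.start())  # True 2
```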
Raw strings • The solution is to use Python's raw strings, in which the backslash is just another character with no special meaning to Python. • Raw strings look like this: r"text" • The raw string r"\n" is not a newline; it's two characters: a backslash and an 'n'.
Raw strings • This way Python's own use of the \ character doesn't interfere with our regexes; we just write them as we normally would. • So, just to be clear: ALWAYS USE RAW STRINGS WITH REGULAR EXPRESSIONS IN PYTHON.
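A quick check of the raw-string claims above. The Windows-style path and the "room 42" text are made-up sample data.

```python
import re

# r"\n" keeps both characters; "\n" is a single newline character.
print(len(r"\n"), len("\n"))  # 2 1

# The raw string r"\\" and the ordinary string "\\\\" are identical.
print(r"\\" == "\\\\")  # True

# So the "match a backslash" regex can be written the natural way.
path = r"C:\Users\alice"  # made-up sample path
print(len(re.findall(r"\\", path)))  # 2

# Other escapes like \d (a digit) also reach the regex engine untouched.
print(re.search(r"\d+", "room 42").group())  # 42
```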
The re module • The first common function is re.search(pattern, text):

    import re

    text = 'purple alice-b@google.com monkey dishwasher'
    match = re.search(r'[\w.-]+@[\w.-]+', text)  # look for an email address
    if match:
        print(match.group())  # alice-b@google.com
    else:
        print("No match found.")
re.search(pattern, text) • The search function returns a Match object if a match is found, and None otherwise. That's why we check match before using it: if it holds a Match, we call its .group() method, which returns the matched text. • If it doesn't, match is None, which the if statement treats as False.
The Match object • The .group() method returns the matched text. • The .start() method returns the index in the original text where the match begins. • The .end() method returns the index just past where the match ends, so text[match.start():match.end()] is the matched text.
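A small sketch of the three Match methods, using a made-up sample string. It also shows .span(), which returns the (start, end) pair in a single call.

```python
import re

text = "Contact: alice-b@google.com"
m = re.search(r"[\w.-]+@[\w.-]+", text)

print(m.group())   # alice-b@google.com
print(m.start())   # 9
print(m.end())     # 27  (one past the last matched character)
print(m.span())    # (9, 27)
print(text[m.start():m.end()] == m.group())  # True
```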
Note on re.search() • re.search() returns only the first match it finds; any later matches are ignored. • If you want a Match object for every non-overlapping match, use re.finditer(pattern, text), which returns an iterator of Match objects. It's best used like this:

    for match in re.finditer(pattern, text):
        print(match.start(), match.group())  # do something with each Match object
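Here is a concrete version of that finditer loop, using a made-up sentence with two email addresses. Unlike re.search, it reaches the second match too, and each match comes with its position.

```python
import re

text = "alice@example.com wrote to bob@example.org"
results = []
for match in re.finditer(r"[\w.-]+@[\w.-]+", text):
    # Each iteration gets a full Match object, so positions are available.
    results.append((match.group(), match.start(), match.end()))
    print(match.group(), match.start(), match.end())
# alice@example.com 0 17
# bob@example.org 27 42
```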
re.findall(pattern, text) • This is really straightforward: it returns a list of strings, one for each non-overlapping match of pattern in text. (If the pattern contains capturing groups, the list holds the captured groups instead.) • No Match objects are involved, so there's no way to find the locations of the matches with this function; re.finditer() may be more appropriate, depending on the situation.
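A sketch of re.findall on the same made-up sample sentence. It also shows how capturing groups change what comes back: one group gives a list of strings, two or more give a list of tuples.

```python
import re

text = "alice@example.com wrote to bob@example.org"

emails = re.findall(r"[\w.-]+@[\w.-]+", text)
print(emails)  # ['alice@example.com', 'bob@example.org']

# With one capturing group, findall returns only that group's text.
users = re.findall(r"([\w.-]+)@[\w.-]+", text)
print(users)  # ['alice', 'bob']

# With two or more groups, each item is a tuple of the groups.
pairs = re.findall(r"([\w.-]+)@([\w.-]+)", text)
print(pairs)  # [('alice', 'example.com'), ('bob', 'example.org')]
```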
re.sub(pattern, replacement, text) • This function finds all non-overlapping occurrences of pattern within text and replaces them with replacement (another string). It returns the result as a new string; the original text is unchanged.
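A few re.sub sketches on made-up strings. The last one uses a backreference (\2, \1) in the replacement, which refers to the pattern's captured groups. That's why the replacement is written as a raw string too.

```python
import re

text = "alice@example.com wrote to bob@example.org"

redacted = re.sub(r"[\w.-]+@[\w.-]+", "[email]", text)
print(redacted)  # [email] wrote to [email]
print(text)      # the original string is unchanged

# Collapse any run of whitespace into a single space.
squeezed = re.sub(r"\s+", " ", "too    many \t spaces")
print(squeezed)  # too many spaces

# Backreferences like \1 in the replacement reuse the captured groups.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "bob@home")
print(swapped)  # home at bob
```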
re.compile(pattern) • For reasons no longer clear to me, this was a concept I initially found immensely challenging. • The way the re module works, a regular expression is "compiled" into an object that is used for matching. • Every time you use re.search, re.findall, re.match, etc., the pattern you provide has to be compiled before the engine can use it. (The module does keep a cache of recently compiled patterns, so repeating the same pattern string often skips that step.)
re.compile(pattern) • That work adds up if it's repeated many times. • So if you need to search for a particular pattern over and over, do this instead:

    pattern = re.compile(r"whatever")
    match = pattern.search("text to be searched")
The pattern object • That's what re.compile returns. • The pattern object has method versions of the module functions we just discussed: search, findall, sub, finditer. They work exactly the same, except of course they don't take a pattern as an argument. • This way, the pattern isn't compiled every single time; you just reuse the compiled pattern.
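A sketch of one compiled pattern being reused through its method versions of search, findall, sub, and finditer. The email regex and sample sentence are the same made-up ones used earlier.

```python
import re

email = re.compile(r"[\w.-]+@[\w.-]+")  # compiled once, reused below
text = "alice@example.com wrote to bob@example.org"

first = email.search(text).group()
print(first)                       # alice@example.com
print(email.findall(text))         # ['alice@example.com', 'bob@example.org']
print(email.sub("[email]", text))  # [email] wrote to [email]
starts = [m.start() for m in email.finditer(text)]
print(starts)                      # [0, 27]
```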
Next: The pretty print module (pprint) • This is mainly for neatly printing nested data structures. • Try executing:

    print([[[2, 3, 4, 5], [5, 6, 7, 8], [4, 5, 6, 7]], [[2, 3, 4, 5], [5, 6, 7, 8], [4, 5, 6, 7]], [[2, 3, 4, 5], [5, 6, 7, 8], [4, 5, 6, 7]]])

• It's ugly.
Pretty print • But try:

    import pprint
    pprint.pprint([[[2, 3, 4, 5], [5, 6, 7, 8], [4, 5, 6, 7]], [[2, 3, 4, 5], [5, 6, 7, 8], [4, 5, 6, 7]], [[2, 3, 4, 5], [5, 6, 7, 8], [4, 5, 6, 7]]])

• It's so pretty! Each inner group gets its own line. • Very useful for displaying this kind of output. So readable!
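Pretty print • There are a bunch of extra optional parameters to the pprint function, such as width (maximum line width), indent (extra spaces per nesting level), and depth (how many levels to show). • The module has a few other related functions, like pprint.pformat(), which returns the formatted text as a string instead of printing it, but that's mainly it.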
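A sketch of the width, indent, and depth parameters and of pprint.pformat(). The small grid is sample data in the style of the earlier slide. pprint also sorts dictionary keys by default, which the last example shows.

```python
import pprint

grid = [[2, 3, 4, 5], [5, 6, 7, 8], [4, 5, 6, 7]]

# width: maximum characters per line (default 80); a narrow width forces line breaks.
pprint.pprint(grid, width=20)

# indent: extra spaces added per nesting level (default 1).
pprint.pprint(grid, width=20, indent=4)

# depth: levels deeper than this are shown as [...].
shallow_view = pprint.pformat([1, [2, [3, [4]]]], depth=2)
print(shallow_view)  # [1, [2, [...]]]

# pformat() returns the pretty text as a string instead of printing it.
text = pprint.pformat({"b": 1, "a": 2})
print(text)  # {'a': 2, 'b': 1}  (dict keys are sorted by default)
```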
The copy module • Remember our discussion of mutability from Advanced Class 2? • The copy module can be used to make a true copy of most objects. Immutable values like ints, floats, and bools never need copying; copy.copy() just hands back the same object. • These slides assume you have retained your knowledge from Advanced Class 2.
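Shallow copies with the copy.copy() function

    >>> import copy
    >>> m = [1, 2, 3]
    >>> x = copy.copy(m)
    >>> m.append(4)
    >>> x
    [1, 2, 3]
    >>> # Two separate lists: changing m doesn't affect x.

• A shallow copy only copies the outer object, though. If m contained other lists, m and x would still share those inner lists.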
Deep copies with the copy.deepcopy() function • The deepcopy function not only copies the object itself, but also copies all its member objects, so if you have a list containing lists of dictionaries, ALL of them are copied. • The deepcopy function eliminates all ties to the original object and gives you a true and complete copy.

    >>> import copy
    >>> m = [1, 2, 3]
    >>> x = copy.deepcopy(m)  # very simple to use; nothing to it
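The flat list on the slide looks the same under either function. A nested list shows the real difference. This is a minimal sketch with made-up data.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

original[0].append(99)   # mutate an inner list
original.append([5, 6])  # mutate the outer list

print(shallow)  # [[1, 2, 99], [3, 4]]  (inner lists are shared)
print(deep)     # [[1, 2], [3, 4]]      (fully independent)
print(shallow[0] is original[0])  # True
print(deep[0] is original[0])     # False
```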
The pickle module • One thing to talk about first….
Object Serialization • To “serialize” an object is to render it into a storable form in a file or database. While it is of course straightforward to write a string or int or list to a text file, how would you write an instance of our Deck class? • You would have to devise a format and a whole system of reading and writing it.
The pickle module • Luckily, Python does this for you with the pickle module. • It "pickles" objects so they can be saved long-term. • It's extremely easy to use.
The pickle module • Let's assume we're using my code for the Deck class, saved in a module called cards.

    >>> import pickle
    >>> from cards import Deck
    >>> deck = Deck()
    >>> with open('cards.pickle', 'wb') as f:
    ...     pickle.dump(deck, f)
    ...
    >>> # Done! It's been saved to a .pickle file. We can close the shell and walk away.
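The pickle module • Now let's open the shell again and load that object back out of the file. (The cards module must still be importable, because pickle stores a reference to the Deck class, not the class's code.)

    >>> import pickle
    >>> with open('cards.pickle', 'rb') as f:
    ...     deck = pickle.load(f)
    ...
    >>> deck.age
    5
    >>> # It has exactly the same data as the object we saved!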
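The course's cards module isn't shown here, so this self-contained sketch uses a hypothetical stand-in Deck class and writes to a throwaway temp file. The round trip is the same as on the slides. The loaded object holds the same data, but it is a newly built object rather than the identical one.

```python
import os
import pickle
import tempfile


class Deck:
    """Hypothetical stand-in for the course's Deck class (not the real one)."""

    def __init__(self):
        ranks = "A23456789TJQK"
        suits = "SHDC"
        self.cards = [rank + suit for suit in suits for rank in ranks]


deck = Deck()
path = os.path.join(tempfile.mkdtemp(), "cards.pickle")  # throwaway location

# Save: pickle writes bytes, so the file is opened in binary write mode.
with open(path, "wb") as f:
    pickle.dump(deck, f)

# Load: binary read mode. Python looks up the Deck class by name to rebuild it.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(len(restored.cards))           # 52
print(restored.cards == deck.cards)  # True: same data...
print(restored is deck)              # False: ...in a brand-new object
```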
The pickle module • What if we have a bunch of different objects we want to store? Do we make a different pickle file for each one? • http://stackoverflow.com/questions/20716812/saving-and-loading-multiple-objects-in-python-pickle-file • Tim Peters is a Python core developer, one of the people who build the language. He gives magnificently clear answers.
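Here are two common ways to keep several objects in one pickle, sketched with made-up objects. The linked answer may differ in its details. An in-memory io.BytesIO buffer stands in for a real file opened in 'wb'/'rb' mode. The file name stuff.pickle in the comment is hypothetical.

```python
import io
import pickle

objects = [{"name": "alice"}, [1, 2, 3], "hello"]

# Approach 1: wrap everything in one container and pickle it once.
blob = pickle.dumps(objects)
print(pickle.loads(blob) == objects)  # True

# Approach 2: call dump() repeatedly on one file, then load() until EOFError.
f = io.BytesIO()  # in-memory stand-in for open('stuff.pickle', 'wb')
for obj in objects:
    pickle.dump(obj, f)

f.seek(0)  # rewind, as if reopening the file with 'rb'
loaded = []
while True:
    try:
        loaded.append(pickle.load(f))
    except EOFError:  # raised once every pickled object has been read
        break

print(loaded)  # [{'name': 'alice'}, [1, 2, 3], 'hello']
```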
pickle vs. json • There's another module called json, which is for storing and loading data in the JSON format. • JSON ("JavaScript Object Notation") is a plain-text data format that has been steadily replacing XML for many uses, such as web APIs. It serves the same purpose as pickle with a different format, and many consider it superior. Since it's plain text, it's certainly more readable.
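pickle vs. json • Some benchmarks have found the json module considerably faster than pickle, though results depend on the data and the Python version. • Some would argue you should just use JSON in all cases. Keep in mind that json only handles basic types (dicts, lists, strings, numbers, booleans, and None), so a Deck would first have to be converted into those. • JSON can be read by programs written in virtually any language, whereas pickle is only for Python. • Also, never unpickle data from a source you don't trust: loading a pickle can run arbitrary code. • It's worth looking further into.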
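A sketch of a json round trip on made-up data, plus what happens when you hand json an arbitrary object (a hypothetical Deck stand-in). json.dumps/json.loads work with strings. Their file counterparts, json.dump(obj, f) and json.load(f), work the same way on an open text file.

```python
import json

data = {"name": "alice", "scores": [90, 85], "active": True, "nickname": None}

text = json.dumps(data)
print(text)  # {"name": "alice", "scores": [90, 85], "active": true, "nickname": null}
print(json.loads(text) == data)  # True


class Deck:  # hypothetical stand-in; json has no idea how to encode it
    pass


raised = False
try:
    json.dumps(Deck())
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```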
The csv module • This is for reading and writing Comma-Separated Values (CSV) files. • So easy to use!
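A minimal csv sketch, assuming a throwaway temp file and made-up rows. Files are opened with newline="", as the csv docs recommend, so the module controls line endings itself. Every value comes back as a string.

```python
import csv
import os
import tempfile

rows = [["name", "score"], ["alice", "90"], ["bob", "85"]]
path = os.path.join(tempfile.mkdtemp(), "scores.csv")  # throwaway location

# Writing: csv.writer turns each list into one comma-separated line.
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# Reading row by row: each row comes back as a list of strings.
with open(path, newline="") as f:
    read_back = list(csv.reader(f))
print(read_back == rows)  # True

# DictReader uses the first row as keys for every following row.
with open(path, newline="") as f:
    for record in csv.DictReader(f):
        print(record["name"], record["score"])
# alice 90
# bob 85
```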
The time module • All sorts of functionality for interacting with time. • Two very useful functions are: • time.localtime(): returns a struct_time object with attributes for the year, month, day, hour, minute, and second (tm_year, tm_mon, tm_mday, tm_hour, tm_min, tm_sec), plus the day of the week (tm_wday, where 0 is Monday), the day of the year (tm_yday, such as the 212th day), and whether daylight saving time is in effect (tm_isdst). • time.sleep(x): pause the program for x seconds.
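A short sketch of both functions. The printed values depend on when and where it runs, so there are no fixed expected outputs. time.monotonic() is used only to measure how long the sleep took, and the 0.2-second delay is an arbitrary choice. time.strftime() formats the struct_time as text.

```python
import time

now = time.localtime()  # a time.struct_time for the current local time
print(now.tm_year, now.tm_mon, now.tm_mday)
print(now.tm_hour, now.tm_min, now.tm_sec)
print("weekday (0 = Monday):", now.tm_wday)
print("day of the year:", now.tm_yday)
print("daylight saving time:", now.tm_isdst)  # 1 if in effect, else 0
print(time.strftime("%Y-%m-%d %H:%M:%S", now))

start = time.monotonic()
time.sleep(0.2)  # do nothing for 0.2 seconds
elapsed = time.monotonic() - start
print(f"slept for about {elapsed:.2f} seconds")
```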