Session 2 Wharton Summer Tech Camp 1: Basic Python 2: Regex
Announcement If you did not get an email from me saying that the slides have been uploaded, please email me and I’ll add you to the list
Why Python? • Has many great packages useful for us (scientific computing, machine learning, NLP, scraping, etc.) • One of the easiest and most concise languages, yet powerful • Memory consumption was often "better than Java and not much worse than C or C++" • Has IDLE ("Interactive DeveLopment Environment") • Read-Eval-Print Loop (REPL) • It’s very similar to R • Great OOP (compared to other comparable languages, say PERL. bless() those who use it) • Highly scalable • Easy incorporation of other languages (Cython, Jython). Wrappers. • Named after Monty Python • Used by many companies as a prototyping and "duct-tape" language as well as the main language: Wall Street, Con Edison, Yahoo, CERN, NASA, Google, etc. YouTube and Dropbox are also written in Python!
Bit More Background on Python • Does a few things EXCELLENTLY (OOP, scientific computing, etc.) and is generally good for a lot of things • Guido van Rossum – late 1980s • Programmer oriented (easy to write and read). Use of white space. • Automatic memory management • Can be interpreted or compiled (PyPy – just-in-time compiler) • Direct opposite of PERL when it comes to programming philosophy • PERL "there is more than one way to do it" -> Super fun when writing your own code. Rage when you debug other people’s PERL code (there is even an Obfuscated PERL contest) • Python "there should be one-- and preferably only one --obvious way to do it" -> Writing your own & reading others’ = Fun • Would you like to know more? • http://www.youtube.com/watch?v=ugqu10JV7dk • Van Rossum talks about the history of Python for 110 min!
Editor for Python • Recommendation • idlex – more advanced IDLE. • http://idlex.sourceforge.net/download.html • Spyder and Canopy IDE also have some good reviews • IDEs are usually heavy • Great for big projects and professional developers, but for simple scripting I’d stick with IDLE/idlex • If you want to feel like a badass programmer/hacker, you can learn to use the Emacs or Vim editor. Both take some time to learn.
Installing Packages for Python • The Enthought distribution includes many packages, but you will need to download additional packages later on. • easy_install • https://pypi.python.org/pypi/setuptools/0.9.8#installing-and-using-setuptools • pip • http://www.pip-installer.org/en/latest/installing.html# • These are equivalent to "install.packages()" in R • Mac users • Open a terminal and type either command to see whether it is installed • If it is, you can automatically download and install Python packages using • sudo easy_install packagename • sudo pip install packagename
Let’s start coding in Python! • I’ll quickly go through the basic ideas • Don’t try to get everything in the first pass • Just try to get the overarching theme • The point is to get exposed to this multiple times before it settles in • It’s good to get an overview when you first learn it, before you jump into the tutorial • Followed by a 10 min in-class lab with Q&A • Go home and do the extensive interactive tutorial • Fire up your IDLE(X). Load the file called basicpython.py from the camp website
Basic Data Types • All the standard types • Integers, floats • 2, 2.2, 3.14, etc. • Strings • "Hi, I am a string" • Booleans • True • False
Hello World & Arithmetic
Helloworld.py
>>> print "hello, world!"  # that's it
# <- used for commenting
Simple Arithmetic (+ - * ** / %)
>>> 1+1
>>> 5**2
Booleans (operators: and, or, not, >, <, <=, ==, !=, etc.)
>>> True
>>> False
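Strings
string = "hello"
string + string   # 'hellohello' (concatenation)
string * 3        # 'hellohellohello' (repetition)
string[0]         # 'h' (first character)
string[-1]        # 'o' (last character)
string[1:4]       # 'ell' (slice: index 1 up to, not including, 4)
len(string)       # 5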
Lists, Tuples, and Dictionaries Data structures – there are many, but these 4 are the most commonly used. Each has pros and cons. • Lists – an ordered, mutable list of values • Sets – set(list). You can do set operations, which can be faster than going through a list one element at a time. • Tuples – just like lists but immutable and of fixed size. Also, style-wise, lists usually consist of homogeneous items while tuples can hold heterogeneous items that form some sort of structure, e.g. (firstname, lastname), (name, age) • Dictionaries – hash lookup table. Index of stuff. Basic bookkeeping "Key->Value". Fast lookup, O(1) on average.
Lists, Tuples, and Dictionaries
• List – []
>>> TPlayersList=["Federer","Nadal","Murray", "Djokovic"]
range(), append(), pop(), insert(), reverse(), sort() e.g. TPlayersList.sort()
• Tuples – ()
>>> TPlayersTuple=("Federer","Nadal","Murray", "Djokovic")
• Dictionaries – {}
>>> TPlayersDict={ "Federer": 5, "Nadal": 4, "Murray":2, "Djokovic":1}
>>> TPlayersDict["Ferrer"]=3
>>> TPlayersDict["Ferrer"]
>>> del TPlayersDict["Ferrer"]
Let d be a dictionary; then d.keys(), d.values(), d.items()
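Sets are mentioned in these slides but never shown. A minimal sketch of the set operations, using made-up lists (`clay_winners` and `grass_winners` are illustrative names and data, not from the slides). Each print call takes a single argument, so the snippet should run under both Python 2, as used in these slides, and Python 3:

```python
# Hypothetical data for illustration only.
clay_winners = ["Nadal", "Federer", "Djokovic", "Nadal"]
grass_winners = ["Federer", "Murray", "Djokovic", "Federer"]

clay = set(clay_winners)      # building a set drops duplicates
grass = set(grass_winners)

both = clay & grass           # intersection: in both sets
either = clay | grass         # union: in at least one set
clay_only = clay - grass      # difference: in clay but not in grass

print(sorted(both))           # ['Djokovic', 'Federer']
print(sorted(clay_only))      # ['Nadal']
print("Murray" in grass)      # True (membership test is a hash lookup)
```

Sets are unordered, which is why the results are passed through sorted() before printing.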
Lists, Tuples, and Dictionaries • When you are first reading in data • Think carefully about what you want to do with the data • Then decide what data structures to use • It is common to have things like • List of lists • List of tuples – e.g., list of names • Dictionary of lists – e.g., ID -> items bought • Dictionary of dictionaries – e.g., ID -> View -> Product_id or ID -> Purchase -> Product_id • Dictionary with tuple keys • However, once you need things like a dictionary of dictionaries of dictionaries of lists or similarly ridiculous structures, consider using object-oriented programming • Look up Python classes (http://docs.python.org/2/tutorial/classes.html)
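Two of the nested structures listed above can be sketched as follows. The purchase log and all names (`purchases`, `items_by_id`, `counts`) are hypothetical, and `collections.defaultdict` is just one convenient way to build them:

```python
from collections import defaultdict

# Hypothetical purchase log: (customer ID, product ID) pairs.
purchases = [(101, "A7"), (102, "B2"), (101, "C3"), (101, "A7")]

# Dictionary of lists: ID -> items bought.
items_by_id = defaultdict(list)
for customer_id, product_id in purchases:
    items_by_id[customer_id].append(product_id)

# Dictionary with tuple keys: (ID, product) -> number of purchases.
counts = defaultdict(int)
for pair in purchases:
    counts[pair] += 1

print(items_by_id[101])      # ['A7', 'C3', 'A7']
print(counts[(101, "A7")])   # 2
```

Tuples work as dictionary keys because they are immutable; a list in the same position would raise a TypeError.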
Basic Control Flow • Boils down to • if (elif, else) • while • for • Python has better syntactic sugar for control flow to iterate through different data structures
Basic Control Flow • True Things • True • Any non-zero number • Any non-empty string or data structure • False Things • False • 0 • "" • Empty data structures
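These truthiness rules can be checked directly with bool(). A small sketch; the `values` list is made up for illustration, and it also includes None, which is not listed on the slide but is likewise treated as false:

```python
# Sample values chosen for illustration.
values = [True, 1, -2.5, "hi", [0], False, 0, 0.0, "", [], {}, (), None]

for value in values:
    print("%r -> %s" % (value, bool(value)))

truthy = [v for v in values if v]
falsy = [v for v in values if not v]

print(len(truthy))  # 5
print(len(falsy))   # 8
```

Note that [0] is true: a list is judged by whether it is empty, not by what it contains.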
If and while
if True:
    print "everything is good"
else:
    print "?! HUHHHHH?"

i=1
while (i<=5):
    print "Hellodoctornamecontinueyesterdaytomorrow"
    i+=1
    if i>5:
        print "good morning dr. chandra"
Basic Control Flow - for
for player in TPlayersList:
    print player
for player in sorted(TPlayersList):
    print player
for index, player in enumerate(TPlayersList):
    print index, player
for i in xrange(1,10,2):
    print i
for key, value in TPlayersDict.iteritems():
    print key, value
continue and break • While running loops, you may need to skip or stop at some point, look up • continue • break
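A minimal sketch of both keywords, reusing the player names from the earlier slides (the `visited` list is an illustrative addition):

```python
players = ["Federer", "Nadal", "Murray", "Djokovic"]

visited = []
for player in players:
    if player == "Nadal":
        continue      # skip the rest of this iteration, move to the next player
    if player == "Djokovic":
        break         # leave the loop entirely
    visited.append(player)

print(visited)        # ['Federer', 'Murray']
```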
Defining a function
def fib(n):  # write Fibonacci series up to n
    """Print a Fibonacci series up to n."""
    a, b = 0, 1
    while a < n:
        print a,
        a, b = b, a+b
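Functions more often return a value than print one. A sketch of a returning variant, under the hypothetical name `fib_list` (not from the slides), written so it should run under both Python 2 and Python 3:

```python
def fib_list(n):
    """Return a list of the Fibonacci numbers below n."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result

print(fib_list(50))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Returning the list lets the caller reuse the numbers, for example len(fib_list(50)), instead of only seeing them on screen.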
Importing Libraries • Import library • E.g. “import sys” • Some useful libraries • sys • re • csv • scipy • numpy • http://wiki.python.org/moin/UsefulModules#Useful_Modules.2C_Packages_and_Libraries
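The common import forms, sketched with standard-library modules only (the sample sentence and the path are made up for illustration):

```python
import re                         # import a whole module
from collections import Counter   # import one name from a module
import os.path as osp             # import under a shorter alias

words = re.findall(r"[a-z]+", "basic python and regex, more python")
print(Counter(words).most_common(1))        # [('python', 2)]
print(osp.basename("/tmp/basicpython.py"))  # basicpython.py
```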
File IO • Reading data files into memory • open() – returns a file object that can read or write files • open(filename, mode) • filehandle = open(filename, mode) • filehandle.readline() • Mode: r = read, w = write, a = append, rb = read in binary (Windows makes that distinction)
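A short sketch of the three text modes. The file name is hypothetical and lives in a temporary directory. The `with` statement, not shown on the slide, closes the file automatically when the block ends:

```python
import os
import tempfile

# Hypothetical file inside a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "players.txt")

# "w" creates the file, or truncates it if it already exists.
with open(path, "w") as filehandle:
    filehandle.write("Federer\nNadal\n")

# "a" appends to the end of the file.
with open(path, "a") as filehandle:
    filehandle.write("Murray\n")

# "r" reads; readline() returns one line, including its trailing newline.
with open(path, "r") as filehandle:
    first = filehandle.readline()
    rest = [line.strip() for line in filehandle]  # iterate over remaining lines

print(first.strip())  # Federer
print(rest)           # ['Nadal', 'Murray']
```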
Python Example 1 • Reading a CSV and saving each row in a list • Dealing with CSV can be very painful. • Sometimes different character encodings cause problems when reading a CSV • If CSV reading just doesn’t work, suspect that you have an encoding issue. Look up encodings (ISO-8859-1/latin1 to UTF-8) • This is why no serious programs really use CSV as a storage mechanism • Fire up csvRead.py
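The contents of csvRead.py are not reproduced in these slides, so the following is only a sketch of the idea, not that script. It assumes Python 3, where open() accepts an encoding argument (the csv module in Python 2 handles encodings differently). The file and its rows are made up so the example is self-contained:

```python
import csv
import os
import tempfile

# Create a small Latin-1 encoded CSV to stand in for real data (hypothetical rows).
path = os.path.join(tempfile.mkdtemp(), "players.csv")
with open(path, "w", encoding="latin1", newline="") as f:
    f.write("name,country\r\nFederer,Switzerland\r\nNadal,Espa\xf1a\r\n")

# Read it back, naming the encoding explicitly, and save each row in a list.
with open(path, "r", encoding="latin1", newline="") as f:
    rows = [row for row in csv.reader(f)]

print(rows[0])    # ['name', 'country']
print(len(rows))  # 3
```

Opening the same file with encoding="utf-8" would raise a UnicodeDecodeError on the last row, which is the kind of failure the slide warns about.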
Lab • Do the interactive tutorials at home at http://www.codecademy.com/en/tracks/python and http://www.learnpython.org/ • For now, do this for 10 minutes