Python Crash Course: File I/O. 3rd year Bachelors, V1.0, dd 04-09-2013, Hour 1.
File I/O
• Types of input/output available
  • Interactive
    • Keyboard
    • Screen
  • Files
    • Ascii/text
      • txt
      • csv
    • Binary
    • Structured
      • FITS > pyFITS, astropy.io.fits
  • URL
  • Pipes
Interactive I/O, fancy output
    >>> s = 'Hello, world.'
    >>> str(s)
    'Hello, world.'
    >>> repr(s)
    "'Hello, world.'"
    >>> str(1.0/7.0)
    '0.142857142857'
    >>> repr(1.0/7.0)
    '0.14285714285714285'
    >>> x = 10 * 3.25
    >>> y = 200 * 200
    >>> s = 'The value of x is ' + repr(x) + ', and y is ' + repr(y) + '...'
    >>> print s
    The value of x is 32.5, and y is 40000...
    >>> # The repr() of a string adds string quotes and backslashes:
    ... hello = 'hello, world\n'
    >>> hellos = repr(hello)
    >>> print hellos
    'hello, world\n'
    >>> # The argument to repr() may be any Python object:
    ... repr((x, y, ('spam', 'eggs')))
    "(32.5, 40000, ('spam', 'eggs'))"
Interactive I/O, fancy output
Old string formatting
    >>> import math
    >>> print 'The value of PI is approximately %5.3f.' % math.pi
    The value of PI is approximately 3.142.
New string formatting
    >>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 7678}
    >>> for name, phone in table.items():
    ...     print '{0:10} ==> {1:10d}'.format(name, phone)
    ...
    Jack       ==>       4098
    Dcab       ==>       7678
    Sjoerd     ==>       4127
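For readers on Python 3, where print is a function, the two formatting styles can be sketched as follows. This is only an illustration: the names pi_text and rows are not from the slides, and the rows are sorted so that the order is predictable.

```python
import math

# Old-style (% operator) formatting: minimum width 5, 3 decimals.
pi_text = 'The value of PI is approximately %5.3f.' % math.pi

# New-style str.format: the name is padded to 10 characters (left-aligned),
# the number is right-aligned in a field of 10 characters.
table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 7678}
rows = ['{0:10} ==> {1:10d}'.format(name, phone)
        for name, phone in sorted(table.items())]

print(pi_text)
for row in rows:
    print(row)
```

Strings are left-aligned and numbers right-aligned by default, which is why the two columns line up.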
Formatting I/O
A conversion specifier contains two or more characters and has the following components, which must occur in this order:
• The "%" character, which marks the start of the specifier.
• Mapping key (optional), consisting of a parenthesised sequence of characters (for example, (somename)).
• Conversion flags (optional), which affect the result of some conversion types.
• Minimum field width (optional). If specified as an "*" (asterisk), the actual width is read from the next element of the tuple in values, and the object to convert comes after the minimum field width and optional precision.
• Precision (optional), given as a "." (dot) followed by the precision. If specified as "*" (an asterisk), the actual precision is read from the next element of the tuple in values, and the value to convert comes after the precision.
• Length modifier (optional).
• Conversion type.
    >>> print '%(language)s has %(#)03d quote types.' % \
    ...       {'language': "Python", "#": 2}
    Python has 002 quote types.
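The components listed above can be tried out one at a time. The sketch below is written for Python 3 (the % operator behaves the same way in Python 2); the variable names and the key count are illustrative.

```python
import math

# Mapping key: values are looked up by name in a dictionary.
by_key = '%(language)s has %(count)03d quote types.' % {
    'language': 'Python', 'count': 2}

# '*' for width and precision: both are read from the tuple,
# in that order, before the value to convert.
star = '%*.*f' % (8, 3, math.pi)

# Conversion flag '-': left-justify within the minimum field width.
left = '%-6s|' % 'ab'

print(by_key)       # Python has 002 quote types.
print(repr(star))   # '   3.142'
print(repr(left))   # 'ab    |'
```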
Interactive I/O
    >>> print "Python is great,", "isn't it?"
    Python is great, isn't it?
    >>> text = raw_input("Enter your input: ")   # returns the typed line as a string
    Enter your input: Hello Python
    >>> print "Received input is:", text
    Received input is: Hello Python
    >>> text = input("Enter your input: ")       # evaluates the typed text as a Python expression
    Enter your input: [x*5 for x in range(2,10,2)]
    >>> print "Received input is:", text
    Received input is: [10, 20, 30, 40]
If the readline module was loaded, raw_input() will use it to provide elaborate line editing and history features.
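In Python 3, input() behaves like the raw_input() shown here and always returns a string. If the typed text should become a Python value, one cautious option is ast.literal_eval, which accepts literals only. The helper name parse_reply is hypothetical and this is a sketch, not the course's method.

```python
import ast

def parse_reply(text):
    """Turn typed text into a Python value (hypothetical helper)."""
    # ast.literal_eval accepts only literals (numbers, strings, lists,
    # tuples, dicts, ...). Expressions such as function calls or list
    # comprehensions are rejected with ValueError.
    return ast.literal_eval(text)

print(parse_reply('[10, 20, 30, 40]'))
```

In an interactive program this could be combined as parse_reply(input("Enter your input: ")).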
File I/O
    >>> fname = 'myfile.dat'
    >>> f = open(fname)
    >>> lines = f.readlines()        # list with all lines of the file
    >>> f.close()
    >>> f = open(fname)
    >>> firstline = f.readline()
    >>> secondline = f.readline()
    >>> f.close()
    >>> f = open(fname)
    >>> for l in f:
    ...     print l.split()[1]       # second whitespace-separated column
    ...
    >>> f.close()
    >>> outfname = 'myoutput'
    >>> outf = open(outfname, 'w')   # second argument denotes writable
    >>> outf.write('My very own file\n')
    >>> outf.close()
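A minimal Python 3 sketch of the same write, readlines and line-iteration pattern, using with blocks so that every file is closed automatically. The temporary directory and the two sample lines are assumptions made so that the example is self-contained.

```python
import os
import tempfile

# Work in a temporary directory so that no existing file is touched.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'myfile.dat')

# 'w' opens the file for writing; the with block closes it on exit.
with open(path, 'w') as outf:
    outf.write('a 1.0\n')
    outf.write('b 2.0\n')

# readlines() returns all lines, including their newline characters.
with open(path) as f:
    lines = f.readlines()

# Iterating over the file object yields one line at a time.
with open(path) as f:
    second_column = [line.split()[1] for line in f]

print(lines)
print(second_column)
```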
Read File I/O
    >>> f = open("test.txt")
    >>> # Read everything into single string:
    >>> content = f.read()
    >>> len(content)
    >>> print content
    >>> f.read()   # At End Of File: returns ''
    >>> f.close()
    >>> # f.read(20) reads (at most) 20 bytes
Using with block:
    >>> with open('test.txt', 'r') as f:
    ...     content = f.read()
    ...
    >>> f.closed
    True
CSV file:
    >>> import csv
    >>> ifile = open('photoz.csv', "r")
    >>> reader = csv.reader(ifile)
    >>> for row in reader:
    ...     print row,
    ...
    >>> ifile.close()
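The behaviour of read() with and without a size, and at the end of the file, can be checked with a small Python 3 sketch. The file content (30 characters) and the temporary path are assumptions for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as f:
    f.write('0123456789' * 3)   # 30 characters in total

with open(path) as f:
    first = f.read(20)    # at most 20 characters
    rest = f.read()       # everything that is left
    at_eof = f.read()     # empty string once the end is reached

# The with block has closed the file by the time we get here.
closed_after = f.closed

print(len(first), repr(rest), repr(at_eof), closed_after)
```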
Read and write text file
    >>> from numpy import loadtxt
    >>> data = loadtxt("myfile.txt")                   # myfile.txt contains 4 columns of numbers
    >>> t,z = data[:,0], data[:,3]                     # data is a 2D numpy array, t is 1st col, z is 4th col
    >>> t,x,y,z = loadtxt("myfile.txt", unpack=True)   # to automatically unpack all columns
    >>> t,z = loadtxt("myfile.txt", usecols = (0,3), unpack=True)   # to select just a few columns
    >>> data = loadtxt("myfile.txt", skiprows = 7)     # to skip 7 rows from top of file
    >>> data = loadtxt("myfile.txt", comments = '!')   # use '!' as comment char instead of '#'
    >>> data = loadtxt("myfile.txt", delimiter=';')    # use ';' as column separator instead of whitespace
    >>> data = loadtxt("myfile.txt", dtype = int)      # file contains integers instead of floats

    >>> from numpy import savetxt, transpose
    >>> savetxt("myfile.txt", data)                    # data is 2D array
    >>> savetxt("myfile.txt", x)                       # if x is 1D array then get 1 column in file.
    >>> savetxt("myfile.txt", (x,y))                   # x,y are 1D arrays. 2 rows in file.
    >>> savetxt("myfile.txt", transpose((x,y)))        # x,y are 1D arrays. 2 columns in file.
    >>> savetxt("myfile.txt", transpose((x,y)), fmt='%6.3f')   # use new format instead of '%.18e'
    >>> savetxt("myfile.txt", data, delimiter = ';')   # use ';' to separate columns instead of space
String formatting for output
    >>> sigma = 6.76/2.354
    >>> print('sigma is %5.3f metres' % sigma)
    sigma is 2.872 metres
    >>> d = {'bob': 1.87, 'fred': 1.768}
    >>> for name, height in d.items():
    ...     print('%s is %.2f metres tall' % (name.capitalize(), height))
    ...
    Bob is 1.87 metres tall
    Fred is 1.77 metres tall
    >>> nsweets = range(100)
    >>> calories = [i * 2.345 for i in nsweets]
    >>> fout = open('sweetinfo.txt', 'w')
    >>> for i in range(len(nsweets)):   # loop over the indices, not over the list itself
    ...     fout.write('%5i %8.3f\n' % (nsweets[i], calories[i]))
    ...
    >>> fout.close()
File I/O, CSV files
• CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases.
• Functions
  • csv.reader
  • csv.writer
  • csv.register_dialect
  • csv.unregister_dialect
  • csv.get_dialect
  • csv.list_dialects
  • csv.field_size_limit
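The dialect functions in this list are easiest to understand in combination. The Python 3 sketch below registers a dialect, uses it for writing and reading, and removes it again. The dialect name 'semicolon' and the sample rows are made up for illustration, and io.StringIO stands in for a real file.

```python
import csv
import io

# Register a named dialect: fields separated by ';', minimal quoting.
csv.register_dialect('semicolon', delimiter=';', quoting=csv.QUOTE_MINIMAL)
registered = 'semicolon' in csv.list_dialects()

# Write two rows into an in-memory text buffer using that dialect.
buffer = io.StringIO()
writer = csv.writer(buffer, dialect='semicolon')
writer.writerow(['id', 'comment'])
writer.writerow([1, 'a;b'])       # contains the delimiter, so it gets quoted
text = buffer.getvalue()

# Read the rows back with the same dialect; all fields come back as strings.
rows = list(csv.reader(io.StringIO(text), dialect='semicolon'))

# Remove the dialect from the registry again.
csv.unregister_dialect('semicolon')

print(repr(text))
print(rows)
```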
File I/O, CSV files
• Reading CSV files
    import csv                       # imports the csv module
    f = open('data1.csv', 'rb')      # opens the csv file
    try:
        reader = csv.reader(f)       # creates the reader object
        for row in reader:           # iterates the rows of the file in order
            print row                # prints each row
    finally:
        f.close()                    # closing
• Writing CSV files
    import csv
    ifile = open('test.csv', "rb")
    reader = csv.reader(ifile)
    ofile = open('ttest.csv', "wb")
    writer = csv.writer(ofile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow(row)
    ifile.close()
    ofile.close()
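The 'rb' and 'wb' modes above are what the Python 2 csv module expects. In Python 3 the csv module works on text files, and its documentation recommends opening them with newline=''. A minimal sketch of the same comma-to-tab conversion, assuming a small input file that is created first in a temporary directory:

```python
import csv
import os
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'test.csv')
dst = os.path.join(workdir, 'ttest.csv')

# Create a small comma-separated input file (sample data, made up).
with open(src, 'w', newline='') as f:
    csv.writer(f).writerows([['name', 'mag'], ['M31', '3.4']])

# Copy it row by row into a tab-separated file with every field quoted.
with open(src, newline='') as ifile, open(dst, 'w', newline='') as ofile:
    reader = csv.reader(ifile)
    writer = csv.writer(ofile, delimiter='\t', quotechar='"',
                        quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow(row)

with open(dst, newline='') as f:
    converted = f.read()

print(repr(converted))
```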
File I/O, CSV files
• The csv module contains the following quoting options:
  • csv.QUOTE_ALL: quote everything, regardless of type.
  • csv.QUOTE_MINIMAL: quote only fields that contain special characters, such as the delimiter or the quote character.
  • csv.QUOTE_NONNUMERIC: quote all fields that are not integers or floats.
  • csv.QUOTE_NONE: do not quote anything on output.
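The effect of the four quoting options is easiest to see on one row that mixes plain text, text containing the delimiter, and numbers. The helper render and the sample row are hypothetical. Note that with csv.QUOTE_NONE the writer needs an escapechar to be able to write a field that contains the delimiter; the backslash used here is an arbitrary choice.

```python
import csv
import io

def render(row, quoting):
    """Write one row with the given quoting option and return the text."""
    buffer = io.StringIO()
    options = {'quoting': quoting}
    if quoting == csv.QUOTE_NONE:
        # Without quoting, special characters must be escaped instead.
        options['escapechar'] = '\\'
    csv.writer(buffer, **options).writerow(row)
    return buffer.getvalue().rstrip('\r\n')

row = ['abc', 'x,y', 12, 3.5]

quote_all = render(row, csv.QUOTE_ALL)
quote_minimal = render(row, csv.QUOTE_MINIMAL)
quote_nonnumeric = render(row, csv.QUOTE_NONNUMERIC)
quote_none = render(row, csv.QUOTE_NONE)

for line in (quote_all, quote_minimal, quote_nonnumeric, quote_none):
    print(line)
```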
File I/O, Pickle
• Pickle: powerful algorithm for serializing and de-serializing a Python object structure
  • can transform a complex object into a byte stream
  • can transform the byte stream into an object with the same internal structure
  • most obvious thing to do with these byte streams is to write them onto a file
  • also conceivable to send them across a network or store them in a database
• The following types can be pickled:
  • None, True, and False
  • integers, long integers, floating point numbers, complex numbers
  • normal and Unicode strings
  • tuples, lists, sets, and dictionaries containing only picklable objects
  • functions defined at the top level of a module
  • built-in functions defined at the top level of a module
  • classes that are defined at the top level of a module
  • instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see the section "The pickle protocol" in the Python documentation for details).
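A quick way to get a feeling for this list is to try a few objects. The Python 3 sketch below uses a hypothetical helper can_pickle; a lambda serves as the counterexample, because it is not a function that pickle can look up by name at the top level of a module.

```python
import pickle

def can_pickle(obj):
    """Return True if pickle.dumps accepts obj (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

nested = {'a': [1, 2.0, 4 + 6j], 'b': ('text', None, True), 'c': {1, 2}}

results = {
    'nested containers': can_pickle(nested),
    'built-in function': can_pickle(len),
    'lambda': can_pickle(lambda x: x),
    'list holding a lambda': can_pickle([lambda x: x]),
}

for label, ok in results.items():
    print(label, ok)
```

The last entry shows that a container is only picklable if everything inside it is.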
File I/O, Pickle
• Example save
    import pickle

    data1 = {'a': [1, 2.0, 3, 4+6j],
             'b': ('string', u'Unicode string'),
             'c': None}

    selfref_list = [1, 2, 3]
    selfref_list.append(selfref_list)

    output = open('data.pkl', 'wb')

    # Pickle dictionary using protocol 0.
    pickle.dump(data1, output)

    # Pickle the list using the highest protocol available.
    pickle.dump(selfref_list, output, -1)

    output.close()
File I/O, Pickle
Contents of data.pkl (protocol 0 is plain text; the second dump that follows it is binary):
    (dp0 S'a' p1 (lp2 I1 aF2.0 aI3 ac__builtin__ complex p3 (F4.0 F6.0 tp4 Rp5 asS'c' p6 NsS'b' p7 (S'string' p8 VUnicode string p9 tp10 s.
• Example load
    import pprint, pickle

    pkl_file = open('data.pkl', 'rb')

    data1 = pickle.load(pkl_file)
    pprint.pprint(data1)

    data2 = pickle.load(pkl_file)
    pprint.pprint(data2)

    pkl_file.close()
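The save and load steps can be checked end to end in memory. The Python 3 sketch below uses io.BytesIO as a stand-in for the data.pkl file (an assumption made to keep the example self-contained) and verifies that the self-referencing list keeps its structure.

```python
import io
import pickle

data1 = {'a': [1, 2.0, 3, 4 + 6j],
         'b': ('string', 'Unicode string'),
         'c': None}

selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)   # the list now contains itself

stream = io.BytesIO()               # stands in for open('data.pkl', 'wb')

# Two objects are written one after the other into the same stream.
pickle.dump(data1, stream, 0)                                # protocol 0
pickle.dump(selfref_list, stream, pickle.HIGHEST_PROTOCOL)   # same as -1

# Rewind and load them back in the order in which they were written.
stream.seek(0)
loaded1 = pickle.load(stream)
loaded2 = pickle.load(stream)

print(loaded1 == data1)
print(loaded2[3] is loaded2)
```

Each call to pickle.load reads exactly one pickled object, so the number and order of load calls must match the dump calls.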
File I/O, Pickle
• Real-life example: AstroWise cluster job submission
  • client-server model
  • exchanging code & data for remote processing
    import pickle

    def dpu_packit(*args):
        return pickle.dumps(args)

    def dpu_unpackit(data):
        return pickle.loads(data)

    # Sender side:
    def submitremotejobs(self, key, zip=None, jobs=[], env=None):
        if not len(jobs):
            return False
        return self.senddata(key, dpu_packit((zip, env), jobs))

    # Receiver side:
    data = self.get_data()
    ((code, env), jobdictlist) = dpu_unpackit(data)
    make_code_file(key, code)
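The pack and unpack helpers can be exercised without a client or server. In the Python 3 sketch below the payload values (a code string, an environment dictionary and two job dictionaries) are invented for illustration; only dpu_packit and dpu_unpackit come from the slide.

```python
import pickle

def dpu_packit(*args):
    # All positional arguments are pickled together as one tuple.
    return pickle.dumps(args)

def dpu_unpackit(data):
    return pickle.loads(data)

# Sender side: pack a (code, env) pair together with a list of jobs.
payload = dpu_packit(('print("hi")', {'HOME': '/tmp'}),
                     [{'job': 1}, {'job': 2}])

# Receiver side: the same nesting comes back out.
(code, env), jobdictlist = dpu_unpackit(payload)

print(type(payload).__name__, len(jobdictlist))
```

Since unpickling can execute arbitrary code, pickle.loads should only be used on data from a trusted sender.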
URL
• URLs can be used for reading
    >>> import urllib2
    >>> url = 'http://python4astronomers.github.com/_downloads/data.txt'
    >>> response = urllib2.urlopen(url)
    >>> data = response.read()
    >>> print data
    RAJ        DEJ                          Jmag   e_Jmag
    2000 (deg) 2000 (deg) 2MASS             (mag)  (mag)
    ---------- ---------- ----------------- ------ ------
    010.684737 +41.269035 00424433+4116085   9.453  0.052
    010.683469 +41.268585 00424403+4116069   9.321  0.022
    010.685657 +41.269550 00424455+4116103  10.773  0.069
    010.686026 +41.269226 00424464+4116092   9.299  0.063
    010.683465 +41.269676 00424403+4116108  11.507  0.056
    010.686015 +41.269630 00424464+4116106   9.399  0.045
    010.685270 +41.267124 00424446+4116016  12.070  0.035
URL
• URLs sometimes need input data, such as POST data for a form
    import urllib
    import urllib2

    url = 'http://www.someserver.com/cgi-bin/register.cgi'
    values = {'name' : 'Michael Foord',
              'location' : 'Northampton',
              'language' : 'Python' }

    data = urllib.urlencode(values)
    req = urllib2.Request(url, data)
    response = urllib2.urlopen(req)
    the_page = response.read()
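In Python 3 the same functionality lives in urllib.parse and urllib.request, and the POST data must be bytes. The sketch below only builds the request and inspects it. Nothing is sent, because the server name on the slide is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}

# Encode the form fields, then convert the text to bytes for the request body.
data = urlencode(values).encode('ascii')

# Supplying data turns the request into a POST.
req = Request(url, data=data)
method = req.get_method()

print(method, data)
# urllib.request.urlopen(req) would send the request; it is left out here.
```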
URL
• And for GET-type parameter passing:
    >>> import urllib
    >>> import urllib2
    >>> data = {}
    >>> data['name'] = 'Somebody Here'
    >>> data['location'] = 'Northampton'
    >>> data['language'] = 'Python'
    >>> url_values = urllib.urlencode(data)
    >>> print url_values  # The order may differ.
    name=Somebody+Here&language=Python&location=Northampton
    >>> url = 'http://www.example.com/example.cgi'
    >>> full_url = url + '?' + url_values
    >>> handler = urllib2.urlopen(full_url)
Note that the full URL is created by adding a ? to the URL, followed by the encoded values.
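The construction of the full URL can be verified without opening a connection. This Python 3 sketch builds the URL as described and then splits it again with urllib.parse to show that the values survive the round trip. The decoding step is an addition for illustration and is not part of the slide.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

data = {'name': 'Somebody Here',
        'location': 'Northampton',
        'language': 'Python'}

# Spaces become '+', pairs are joined with '&'.
url_values = urlencode(data)

url = 'http://www.example.com/example.cgi'
full_url = url + '?' + url_values

# Split the URL again and decode the query string back into a dictionary.
query = parse_qs(urlsplit(full_url).query)

print(full_url)
print(query)
```

parse_qs returns a list for every key, because a query string may repeat a parameter.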
End Introduction to language