Python Crash Course: File I/O. Sterrenkundig Practicum 2, V1.0, dated 08-01-2014, Hour 5. Types of input/output available: interactive (keyboard, screen); files (ASCII/text: txt, csv; binary; structured: FITS > pyFITS, astropy.io.fits); URL; pipes. Interactive I/O, fancy output.
Python Crash Course: File I/O
Sterrenkundig Practicum 2
V1.0, dated 08-01-2014
Hour 5
File I/O
• Types of input/output available
  • Interactive
    • Keyboard
    • Screen
  • Files
    • Ascii/text
      • txt
      • csv
    • Binary
    • Structured
      • FITS > pyFITS, astropy.io.fits
  • URL
  • Pipes
Interactive I/O, fancy output
>>> s = 'Hello, world.'
>>> str(s)
'Hello, world.'
>>> repr(s)
"'Hello, world.'"
>>> str(1.0/7.0)
'0.142857142857'
>>> repr(1.0/7.0)
'0.14285714285714285'
>>> x = 10 * 3.25
>>> y = 200 * 200
>>> s = 'The value of x is ' + repr(x) + ', and y is ' + repr(y) + '...'
>>> print s
The value of x is 32.5, and y is 40000...
>>> # The repr() of a string adds string quotes and backslashes:
... hello = 'hello, world\n'
>>> hellos = repr(hello)
>>> print hellos
'hello, world\n'
>>> # The argument to repr() may be any Python object:
... repr((x, y, ('spam', 'eggs')))
"(32.5, 40000, ('spam', 'eggs'))"
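The difference between str() and repr() shown above can be sketched as a small script. This is a minimal illustration written for Python 3, where print is a function (an assumption; the slides use Python 2), and the variable names are made up for the example.

```python
# Sketch for Python 3 (assumption: the slides themselves use Python 2).
greeting = 'hello, world\n'

readable = str(greeting)       # the text itself, newline included
unambiguous = repr(greeting)   # quotes added, newline shown as backslash-n

print(readable, end='')        # prints the text and moves to the next line
print(unambiguous)             # prints the quoted, escaped form

# For many built-in types, eval(repr(obj)) gives back an equal object.
record = (32.5, 40000, ('spam', 'eggs'))
round_trip = eval(repr(record))
print(round_trip == record)
```

As a rule of thumb, str() is meant for output a person reads, while repr() is meant for output that shows exactly what the object is.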
Interactive I/O, fancy output
Old string formatting
>>> import math
>>> print 'The value of PI is approximately %5.3f.' % math.pi
The value of PI is approximately 3.142.
New string formatting
>>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 7678}
>>> for name, phone in table.items():
...     print '{0:10} ==> {1:10d}'.format(name, phone)
...
Jack       ==>       4098
Dcab       ==>       7678
Sjoerd     ==>       4127
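The str.format() call above can be explored a little further. The following is a sketch for Python 3 (an assumption; the slides use Python 2); sorting the names and the header row are additions for illustration, not part of the original slide.

```python
table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 7678}

# Sort the names so the output order does not depend on dictionary ordering.
lines = ['{0:10} ==> {1:10d}'.format(name, table[name]) for name in sorted(table)]

# Fields can also be named, and '<' or '>' choose the alignment explicitly.
header = '{name:<10} ==> {phone:>10}'.format(name='Name', phone='Phone')

print(header)
for line in lines:
    print(line)
```

By default strings are left-aligned and numbers right-aligned within the field width, which is why the names and phone numbers line up in columns.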
Formatting I/O
A conversion specifier contains two or more characters and has the following components, which must occur in this order:
• The "%" character, which marks the start of the specifier.
• Mapping key (optional), consisting of a parenthesised sequence of characters (for example, (somename)).
• Conversion flags (optional), which affect the result of some conversion types.
• Minimum field width (optional). If specified as an "*" (asterisk), the actual width is read from the next element of the tuple in values, and the object to convert comes after the minimum field width and optional precision.
• Precision (optional), given as a "." (dot) followed by the precision. If specified as "*" (an asterisk), the actual precision is read from the next element of the tuple in values, and the value to convert comes after the precision.
• Length modifier (optional).
• Conversion type.
>>> print '%(language)s has %(#)03d quote types.' % \
...       {'language': "Python", "#": 2}
Python has 002 quote types.
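The components listed above can be tried one at a time. This is a small sketch for Python 3 (an assumption; the slides use Python 2), and the variable names and the 'name'/'count' keys are invented for the example.

```python
import math

# Flag '0' (zero padding), minimum width 8, precision 3, conversion type 'f'.
padded = '%08.3f' % math.pi

# Flag '-' left-aligns the value inside the field; '|' marks the field's end.
left = '%-8.3f|' % math.pi

# '*' takes the width and the precision from the tuple, before the value.
star = '%*.*f' % (8, 2, math.pi)

# Mapping keys pick values out of a dictionary instead of a tuple.
keyed = '%(name)s has %(count)03d items' % {'name': 'list', 'count': 7}

for text in (padded, left, star, keyed):
    print(text)
```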
Interactive I/O
>>> print "Python is great,", "isn't it?"
>>> text = raw_input("Enter your input: ")
Enter your input: Hello Python
>>> print "Received input is: ", text
Received input is:  Hello Python
>>> # input() evaluates what you type as a Python expression
>>> text = input("Enter your input: ")
Enter your input: [x*5 for x in range(2,10,2)]
>>> print "Received input is: ", text
Received input is:  [10, 20, 30, 40]
If the readline module was loaded, raw_input() will use it to provide elaborate line editing and history features.
File I/O
>>> fname = 'myfile.dat'
>>> f = open(fname)
>>> lines = f.readlines()   # list with all lines of the file
>>> f.close()
>>> f = open(fname)
>>> firstline = f.readline()
>>> secondline = f.readline()
>>> f.close()
>>> f = open(fname)
>>> for l in f:
...     print l.split()[1]   # second whitespace-separated column
...
>>> f.close()
>>> outfname = 'myoutput'
>>> outf = open(outfname, 'w')   # second argument denotes writable
>>> outf.write('My very own file\n')
>>> outf.close()
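The write-then-read pattern above can be put together in one self-contained script. This is a sketch for Python 3 (an assumption; the slides use Python 2); the file contents are invented, and the file lives in a temporary directory so nothing real is overwritten.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory.
fname = os.path.join(tempfile.mkdtemp(), 'myfile.dat')

# 'w' opens for writing; the with block closes the file even if an error occurs.
with open(fname, 'w') as outf:
    outf.write('star1 12.5\n')
    outf.write('star2 13.1\n')

# Iterating over the file object yields one line at a time.
second_column = []
with open(fname) as f:
    for line in f:
        second_column.append(line.split()[1])

print(second_column)
```

Note that the values come back as strings; convert them with float() if numbers are needed.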
Read File I/O
>>> f = open("test.txt")
>>> # Read everything into single string:
>>> content = f.read()
>>> len(content)
>>> print content
>>> f.read()   # At End Of File
''
>>> f.close()
>>> # f.read(20) reads (at most) 20 bytes
Using with block:
>>> with open('test.txt', 'r') as f:
...     content = f.read()
...
>>> f.closed
True
CSV file:
>>> import csv
>>> ifile = open('photoz.csv', "r")
>>> reader = csv.reader(ifile)
>>> for row in reader:
...     print row,
...
>>> ifile.close()
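The behaviour of read() at different points in a file can be checked with a short script. This is a sketch for Python 3 (an assumption; the slides use Python 2); the file is a made-up 30-character text file in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as f:
    f.write('0123456789' * 3)   # 30 characters in total

with open(path) as f:
    first = f.read(20)   # at most 20 characters (text mode counts characters)
    rest = f.read()      # everything that is left
    at_eof = f.read()    # an empty string signals end of file

closed_after = f.closed  # the with block has closed the file

print(len(first), len(rest), repr(at_eof), closed_after)
```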
Read and write text file
>>> from numpy import *
>>> data = loadtxt("myfile.txt")   # myfile.txt contains 4 columns of numbers
>>> t,z = data[:,0], data[:,3]   # data is a 2D numpy array, t is 1st col, z is 4th col
>>> t,x,y,z = loadtxt("myfile.txt", unpack=True)   # to automatically unpack all columns
>>> t,z = loadtxt("myfile.txt", usecols = (0,3), unpack=True)   # to select just a few columns
>>> data = loadtxt("myfile.txt", skiprows = 7)   # to skip 7 rows from top of file
>>> data = loadtxt("myfile.txt", comments = '!')   # use '!' as comment char instead of '#'
>>> data = loadtxt("myfile.txt", delimiter=';')   # use ';' as column separator instead of whitespace
>>> data = loadtxt("myfile.txt", dtype = int)   # file contains integers instead of floats

>>> from numpy import *
>>> savetxt("myfile.txt", data)   # data is 2D array
>>> savetxt("myfile.txt", x)   # if x is 1D array then get 1 column in file.
>>> savetxt("myfile.txt", (x,y))   # x,y are 1D arrays. 2 rows in file.
>>> savetxt("myfile.txt", transpose((x,y)))   # x,y are 1D arrays. 2 columns in file.
>>> savetxt("myfile.txt", transpose((x,y)), fmt='%6.3f')   # use new format instead of '%.18e'
>>> savetxt("myfile.txt", data, delimiter = ';')   # use ';' to separate columns instead of space
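A round trip through savetxt and loadtxt shows how the options above fit together. This is a sketch for Python 3 that assumes NumPy is installed; the arrays and the file name columns.txt are invented for the example.

```python
import os
import tempfile

import numpy as np

t = np.array([0.0, 1.0, 2.0])
z = np.array([10.0, 20.0, 30.0])

# Hypothetical output file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'columns.txt')

# transpose((t, z)) turns the two 1D arrays into two columns.
np.savetxt(path, np.transpose((t, z)), fmt='%6.3f', delimiter=';')

# The same delimiter must be given when reading the file back.
t_back, z_back = np.loadtxt(path, delimiter=';', unpack=True)

print(t_back, z_back)
```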
String formatting for output
>>> sigma = 6.76/2.354
>>> print('sigma is %5.3f metres' % sigma)
sigma is 2.872 metres
>>> d = {'bob': 1.87, 'fred': 1.768}
>>> for name, height in d.items():
...     print('%s is %.2f metres tall' % (name.capitalize(), height))
...
Bob is 1.87 metres tall
Fred is 1.77 metres tall
>>> nsweets = range(100)
>>> calories = [i * 2.345 for i in nsweets]
>>> fout = open('sweetinfo.txt', 'w')
>>> for i in range(len(nsweets)):   # loop over the indices of the list
...     fout.write('%5i %8.3f\n' % (nsweets[i], calories[i]))
...
>>> fout.close()
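Writing fixed-width columns to a file, as in the last example above, can be checked by reading the file back. This is a sketch for Python 3 (an assumption; the slides use Python 2) with a shortened table of five rows, written to a temporary directory.

```python
import os
import tempfile

nsweets = list(range(5))
calories = [n * 2.345 for n in nsweets]

path = os.path.join(tempfile.mkdtemp(), 'sweetinfo.txt')

# zip() pairs the two lists, so no index variable is needed.
with open(path, 'w') as fout:
    for n, cal in zip(nsweets, calories):
        fout.write('%5i %8.3f\n' % (n, cal))

with open(path) as fin:
    rows = fin.read().splitlines()

for row in rows:
    print(row)
```

Every row has the same length (5 + 1 + 8 characters), which keeps the columns aligned however large the numbers get within those widths.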
File I/O, CSV files
• CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases.
• Functions
  • csv.reader
  • csv.writer
  • csv.register_dialect
  • csv.unregister_dialect
  • csv.get_dialect
  • csv.list_dialects
  • csv.field_size_limit
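The dialect functions in the list above work together roughly as follows. This is a sketch for Python 3 (an assumption; the slides use Python 2); the dialect name 'semicolon' and the sample text are made up for the example.

```python
import csv
import io

# 'semicolon' is a made-up dialect name used only in this sketch.
csv.register_dialect('semicolon', delimiter=';', quoting=csv.QUOTE_MINIMAL)
registered = 'semicolon' in csv.list_dialects()

# A reader can then refer to the dialect by name.
text = io.StringIO('name;vmag\nNGC1001;11.1\n')
rows = list(csv.reader(text, dialect='semicolon'))

delimiter = csv.get_dialect('semicolon').delimiter
csv.unregister_dialect('semicolon')
still_registered = 'semicolon' in csv.list_dialects()

# Called without arguments, field_size_limit() returns the current limit.
limit = csv.field_size_limit()

print(rows, delimiter, limit)
```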
File I/O, CSV files
• Reading CSV files
import csv                      # imports the csv module
f = open('data1.csv', 'rb')     # opens the csv file
try:
    reader = csv.reader(f)      # creates the reader object
    for row in reader:          # iterates over the rows of the file in order
        print row               # prints each row
finally:
    f.close()                   # closes the file
• Writing CSV files
import csv
ifile = open('test.csv', "rb")
reader = csv.reader(ifile)
ofile = open('ttest.csv', "wb")
writer = csv.writer(ofile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
for row in reader:
    writer.writerow(row)
ifile.close()
ofile.close()
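In Python 3 the csv module expects text-mode files opened with newline='' instead of the 'rb' and 'wb' modes used above. The copy example might then look like this sketch; the input rows are invented, and both files live in a temporary directory.

```python
import csv
import os
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'test.csv')
dst = os.path.join(workdir, 'ttest.csv')

# Create a small input file so that the sketch is self-contained.
with open(src, 'w', newline='') as f:
    csv.writer(f).writerows([['id', 'name'], ['1', 'NGC1001']])

# Copy it row by row, switching to tab-separated, fully quoted output.
with open(src, newline='') as ifile, open(dst, 'w', newline='') as ofile:
    reader = csv.reader(ifile)
    writer = csv.writer(ofile, delimiter='\t', quotechar='"',
                        quoting=csv.QUOTE_ALL)
    for row in reader:
        writer.writerow(row)

with open(dst, newline='') as f:
    copied = f.read()

print(copied)
```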
File I/O, CSV files
• The csv module contains the following quoting options.
  • csv.QUOTE_ALL Quote everything, regardless of type.
  • csv.QUOTE_MINIMAL Quote fields with special characters
  • csv.QUOTE_NONNUMERIC Quote all fields that are not integers or floats
  • csv.QUOTE_NONE Do not quote anything on output
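The effect of the four quoting options can be compared on a single row. This is a sketch for Python 3 (an assumption; the slides use Python 2); write_row is a hypothetical helper defined here, and the escapechar passed with QUOTE_NONE is needed because one field contains the delimiter.

```python
import csv
import io

def write_row(row, **options):
    """Hypothetical helper: format one row with csv.writer and return the text."""
    buffer = io.StringIO()
    csv.writer(buffer, **options).writerow(row)
    return buffer.getvalue().rstrip('\r\n')

row = ['abc', 'a,b', 12, 3.5]

quoted_all = write_row(row, quoting=csv.QUOTE_ALL)
quoted_minimal = write_row(row, quoting=csv.QUOTE_MINIMAL)
quoted_nonnumeric = write_row(row, quoting=csv.QUOTE_NONNUMERIC)
# Without quotes, the comma inside 'a,b' has to be escaped instead.
quoted_none = write_row(row, quoting=csv.QUOTE_NONE, escapechar='\\')

for text in (quoted_all, quoted_minimal, quoted_nonnumeric, quoted_none):
    print(text)
```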
Handling FITS files - PyFITS
http://www.stsci.edu/resources/software_hardware/pyfits
• Read, write and manipulate all aspects of FITS files
  • extensions
  • headers
  • images
  • tables
• Low-level interface for details
• High-level functions for quick and easy use
PyFITS - reading
>>> import pyfits
>>> imgname = "testimage.fits"
>>> img = pyfits.getdata(imgname)
>>> img
array([[2408, 2408, 1863, ..., 3660, 3660, 4749],
       [2952, 2408, 1863, ..., 3660, 3115, 4204],
       [2748, 2748, 2204, ..., 4000, 3455, 4000],
       ...,
       [2629, 2901, 2357, ..., 2261, 2806, 2261],
       [2629, 2901, 3446, ..., 1717, 2261, 1717],
       [2425, 2697, 3242, ..., 2942, 2125, 1581]], dtype=int16)
>>> img.mean()
4958.4371977768678
>>> img[img > 2099].mean()
4975.1730909593043
>>> import numpy
>>> numpy.median(img)
4244.0
PyFITS – reading FITS images
>>> x = 348; y = 97
>>> delta = 5
>>> print img[y-delta:y+delta+1,
...           x-delta:x+delta+1].astype(int)
[[5473 5473 3567 3023 3295 3295 3839 4384 4282 4282 3737]
 [3295 4384 3567 3023 3295 3295 3295 3839 3737 3737 4282]
 [2478 3567 4112 3023 3295 3295 3295 3295 3397 4486 4486]
 [3023 3023 3023 3023 2750 2750 3839 3839 3397 4486 3941]
 [3295 3295 3295 3295 3295 3295 3839 3839 3397 3941 3397]
 [3295 3295 2750 2750 3295 3295 2750 2750 2852 3397 4486]
 [2887 2887 2887 2887 3976 3431 3159 2614 3125 3669 4758]
 [2887 2887 3431 3431 3976 3431 3159 2614 3669 4214 4214]
 [3159 3703 3159 3703 3431 2887 3703 3159 3941 4486 3669]
 [3703 3159 2614 3159 3431 2887 3703 3159 3397 3941 3669]
 [3431 3431 2887 2887 3159 3703 3431 2887 3125 3669 3669]]
row = y = first index
column = x = second index
numbering runs as normal (e.g. in ds9) BUT zero indexed!
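The index convention above (row = y first, column = x second, counting from zero) can be illustrated without any FITS file. This is a sketch for Python 3 using a plain nested list as a stand-in image; the cutout helper and the pixel values are invented, and the sketch assumes the cutout stays inside the image.

```python
# Stand-in "image": each value encodes its position as 10*row + column.
image = [[10 * row + col for col in range(6)] for row in range(5)]

def cutout(img, x, y, delta):
    """Return the (2*delta+1)-wide square centred on column x, row y (zero-indexed)."""
    return [line[x - delta:x + delta + 1] for line in img[y - delta:y + delta + 1]]

# A viewer that counts pixels from 1 needs 1 subtracted from each coordinate.
x_viewer, y_viewer = 4, 3
x, y = x_viewer - 1, y_viewer - 1

centre = image[y][x]                 # row (y) first, then column (x)
patch = cutout(image, x, y, delta=1)

print(centre)
for line in patch:
    print(line)
```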
PyFITS – reading FITS tables
>>> tblname = 'data/N891PNdata.fits'
>>> d = pyfits.getdata(tblname)
>>> d.names
('x0', 'y0', 'rah', 'ram', 'ras', 'decd', 'decm', 'decs', 'wvl', 'vel',
 'vhel', 'dvel', 'dvel2', 'xL', 'yL', 'xR', 'yR', 'ID', 'radeg', 'decdeg',
 'x', 'y')
>>> d.x0
array([ 928.7199707 , 532.61999512, 968.14001465, 519.38000488,…
       1838.18994141, 1888.26000977, 1516.2199707 ], dtype=float32)
>>> d.field('x0')   # case-insensitive
array([ 928.7199707 , 532.61999512, 968.14001465, 519.38000488,…
       1838.18994141, 1888.26000977, 1516.2199707 ], dtype=float32)
>>> select = d.x0 < 200
>>> dsel = d[select]   # can select rows all together
>>> print dsel.x0
[ 183.05000305 165.55000305 138.47999573 158.02999878 140.96000671
  192.58000183 157.02999878 160.1499939 161.1000061 136.58999634
  175.19000244]
PyFITS – reading FITS headers
>>> h = pyfits.getheader(imgname)
>>> print h
SIMPLE  = T                     /FITS header
BITPIX  = 16                    /No.Bits per pixel
NAXIS   = 2                     /No.dimensions
NAXIS1  = 1059                  /Length X axis
NAXIS2  = 1059                  /Length Y axis
EXTEND  = T                     /
DATE    = '05/01/11 '           /Date of FITS file creation
ORIGIN  = 'CASB -- STScI '      /Origin of FITS image
PLTLABEL= 'E30 '                /Observatory plate label
PLATEID = '06UL '               /GSSS Plate ID
REGION  = 'XE295 '              /GSSS Region Name
DATE-OBS= '22/12/49 '           /UT date of Observation
UT      = '03:09:00.00 '        /UT time of observation
EPOCH   = 2.0499729003906E+03   /Epoch of plate
PLTRAH  = 1                     /Plate center RA
PLTRAM  = 26                    /
PLTRAS  = 5.4441800000000E+00   /
PLTDECSN= '+ '                  /Plate center Dec
PLTDECD = 30                    /
PLTDECM = 45                    /
>>> h['KMAGZP']
>>> h['REGION']
'XE295'
# Use h.items() to iterate through all header entries
PyFITS – writing FITS images
>>> # sky, gain, rd_noise and ext are assumed to be defined already
>>> from numpy import sqrt
>>> newimg = sqrt((sky+img)/gain + rd_noise**2) * gain
>>> newimg[(sky+img) < 0.0] = 1e10
>>> hdr = h.copy()   # copy header from original image
>>> hdr.add_comment('Calculated noise image')
>>> filename = 'sigma.fits'
>>> pyfits.writeto(filename, newimg, hdr)   # create new file
>>> pyfits.append(imgname, newimg, hdr)   # add a new FITS extension
>>> pyfits.update(filename, newimg, hdr, ext)   # update a file
# specifying a header is optional,
# if omitted automatically adds minimum header
PyFITS – writing FITS tables
>>> import pyfits
>>> import numpy as np
>>> # create data
>>> a1 = np.array(['NGC1001', 'NGC1002', 'NGC1003'])
>>> a2 = np.array([11.1, 12.3, 15.2])
>>> # make list of pyfits Columns
>>> cols = []
>>> cols.append(pyfits.Column(name='target', format='20A', array=a1))
>>> cols.append(pyfits.Column(name='V_mag', format='E', array=a2))
>>> # create HDU and write to file
>>> tbhdu = pyfits.new_table(cols)
>>> tbhdu.writeto('table.fits')
# these examples are for a simple FITS file containing just one
# table or image but with a couple more steps can create a file
# with any combination of extensions (see the PyFITS manual online)