COSC 1306—COMPUTER SCIENCE AND PROGRAMMING PYTHON FUNCTIONS Jehan-François Pâris jfparis@uh.edu
Module Overview • We will learn how to read, create and modify files • Pay special attention to pickled files • They are very easy to use!
The file system • Provides long-term storage of information • Stores data in stable storage (disk) • Cannot be RAM because: • Dynamic RAM loses its contents when powered off • Static RAM is too expensive • System crashes can corrupt the contents of main memory
Overall organization • Data managed by the file system are grouped in user-defined data sets called files • The file system must provide a mechanism for naming these data • Each file system has its own set of conventions • All modern operating systems use a hierarchical directory structure
Windows solution • Each device and each disk partition is identified by a letter • A: and B: were used by the floppy drives • C: is the first disk partition of the hard drive • If the hard drive has no other disk partition, D: denotes the DVD drive • Each device and each disk partition has its own hierarchy of folders
Windows solution • [Figure: separate folder hierarchies for C: (containing Windows, Users and Program Files), a second disk D: and a flash drive F:]
UNIX/LINUX organization • Each device and disk partition has its own directory tree • Disk partitions are glued together through the mount operation to form a single tree • Typical user does not know where her files are stored
UNIX/LINUX organization • [Figure: the root partition / contains usr and bin; "the magic mount" attaches the other partition at usr] • Second partition can be accessed as /usr
Mac OS organization • Similar to Windows • Disk partitions are not merged • Represented by separate icons on the desktop
Accessing a file (I) • Your Python programs are stored in a folder AKA directory • On my home PC it is C:\Users\Jehan-Francois Paris\Documents\Courses\1306\Python • All files in that directory can be directly accessed through their names • "myfile.txt"
Accessing a file (II) • Files in subdirectories can be accessed by specifying first the subdirectory • Windows style: • "test\\sample.txt" • Note the double backslash • Linux/Unix/Mac OS X style: • "test/sample.txt" • Generally works on Windows as well
Why the double backslash? • The backslash is an escape character in Python • Combines with its successor to represent non-printable characters • '\n' represents a newline • '\t' represents a tab • Must use '\\' to represent a plain backslash
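One way to see what the escape sequences do is to look at string lengths. This is a minimal sketch; the variable names and the file name "data\new.txt" are made up for illustration.

```python
# Each escape sequence takes two characters in the source code
# but produces a single character in the string.
newline = '\n'
tab = '\t'
backslash = '\\'
print(len(newline), len(tab), len(backslash))   # 1 1 1

# The doubled backslash stands for one backslash in the path.
path = "test\\sample.txt"
print(path)         # test\sample.txt
print(len(path))    # 15

# Hypothetical file name: with a single backslash, "\n" becomes a newline.
broken = "data\new.txt"
print('\n' in broken)   # True
```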
Accessing a file (III) • For other files, must use full pathname • Windows Style: • "C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt"
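Python also has raw string literals (prefix r), in which a backslash is not treated as an escape character. The slides do not use them; the sketch below only shows that both spellings of the path above give the same string.

```python
# Same path written twice: with doubled backslashes and as a raw string.
escaped = "C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt"
raw = r"C:\Users\Jehan-Francois Paris\Documents\Courses\1306\Python\myfile.txt"

print(escaped == raw)        # True
print(escaped.count('\\'))   # 7 backslashes in the path
```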
Accessing file contents • Two step process: • First we open the file • Then we access its contents • Read • Write • When we are done, we close the file.
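The open, access, close sequence can be sketched as follows. The sketch writes a throwaway file in a temporary directory so that it runs anywhere; the file name demo.txt is an assumption. It also shows the "with" form, which is not covered in the slides and closes the file automatically.

```python
import os
import tempfile

# Work in a temporary directory so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmpdir:
    name = os.path.join(tmpdir, "demo.txt")   # hypothetical file name

    fh = open(name, "w")      # step 1: open
    fh.write("hello\n")       # step 2: access the contents (here, write)
    fh.close()                # step 3: close

    # A "with" block closes the file automatically when the block ends.
    with open(name, "r") as fh:
        contents = fh.read()

    closed_after_with = fh.closed

print(repr(contents))        # 'hello\n'
print(closed_after_with)     # True
```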
What happens at open() time? • The system verifies • That you are an authorized user • That you have the right permission • Read permission • Write permission • Execute permission exists but doesn't apply • Then returns a file handle / file descriptor
The file handle • Gives the user • Direct access to the file • No directory lookups • Authority to execute the file operations whose permissions have been requested
Python open() • open(name, mode = 'r', buffering = -1) where • name is the name of the file • mode is the permission requested • Default is 'r' for read only • buffering specifies the buffer size • Use system default value (code -1)
The modes • Can request • 'r' for read-only • 'w' for write-only • Always overwrites the file • 'a' for append • Writes at the end • 'r+' or 'a+' for updating (read + write/append)
Examples • f1 = open("myfile.txt") same as f1 = open("myfile.txt", "r") • f2 = open("test\\sample.txt", "r") • f3 = open("test/sample.txt", "r") • f4 = open("C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt")
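The effect of each mode can be checked on a scratch file. This is a minimal sketch using a temporary directory; the file name modes.txt and the text written are assumptions.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    name = os.path.join(tmpdir, "modes.txt")   # hypothetical file name

    with open(name, "w") as fh:    # 'w' creates the file
        fh.write("first\n")
    with open(name, "w") as fh:    # 'w' again: the previous contents are lost
        fh.write("second\n")
    with open(name, "a") as fh:    # 'a': writes go to the end of the file
        fh.write("third\n")
    with open(name, "r") as fh:    # 'r': read only
        after_append = fh.read()

    with open(name, "r+") as fh:   # 'r+': read and write, starting at the beginning
        fh.write("SECOND")         # replaces the first six characters in place
    with open(name) as fh:         # mode defaults to 'r'
        after_update = fh.read()

print(repr(after_append))   # 'second\nthird\n'
print(repr(after_update))   # 'SECOND\nthird\n'
```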
Reading a file • Three ways: • Global reads • Line by line • Pickled files
Global reads • fh.read() • Returns whole contents of file specified by file handle fh • File contents are stored in a single string that might be very large
Example
f2 = open("test\\sample.txt", "r")
bigstring = f2.read()
print(bigstring)
f2.close() # not required
Output of example
To be or not to be that is the question
Now is the winter of our discontent
• Exact contents of file 'test\sample.txt'
Line-by-line reads
for line in fh : # do not forget the colon
    # anything you want
fh.close() # not required
Example
f3 = open("test/sample.txt", "r")
for line in f3 : # do not forget the colon
    print(line)
f3.close() # not required
Output
To be or not to be that is the question
Now is the winter of our discontent
• With one or more extra blank lines
Why? • Each line ends with an end-of-line marker • print(…) adds an extra end-of-line
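The extra end-of-line becomes visible when each line is printed with repr(). In this sketch io.StringIO stands in for the opened file so that the code is self-contained; the text mirrors the two-line sample, and the use of end='' is an alternative that the slides do not show.

```python
import io

# io.StringIO stands in for the opened file; the last line has no EOL.
fh = io.StringIO("To be or not to be that is the question\n"
                 "Now is the winter of our discontent")

lines = list(fh)          # the same lines a "for line in fh" loop would see
for line in lines:
    print(repr(line))     # repr() makes the trailing '\n' visible

for line in lines:
    print(line, end='')   # end='' stops print() from adding its own EOL
print()                   # finish the last line, which had no EOL
```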
Trying to remove blank lines
print('----------------------------------------------------')
f5 = open("test/sample.txt", "r")
for line in f5 : # do not forget the colon
    print(line[:-1]) # remove last char
f5.close() # not required
print('-----------------------------------------------------')
The output
----------------------------------------------------
To be or not to be that is the question
Now is the winter of our disconten
-----------------------------------------------------
• The last line did not end with an EOL!
A smarter solution (I) • Only remove the last character if it is an EOL
if line[-1] == '\n' :
    print(line[:-1])
else :
    print(line)
A smarter solution (II)
print('----------------------------------------------------')
fh = open("test/sample.txt", "r")
for line in fh : # do not forget the colon
    if line[-1] == '\n' :
        print(line[:-1]) # remove last char
    else :
        print(line)
print('-----------------------------------------------------')
fh.close() # not required
It works!
----------------------------------------------------
To be or not to be that is the question
Now is the winter of our discontent
-----------------------------------------------------
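A shorter alternative to the explicit test, not used in the slides, is the string method rstrip('\n'), which removes trailing newlines only when they are present. The helper name strip_eol is made up for this sketch.

```python
def strip_eol(line):
    """Return line without its trailing newline, if it has one."""
    return line.rstrip('\n')

print(strip_eol("To be or not to be that is the question\n"))
print(strip_eol("Now is the winter of our discontent"))   # unchanged: no EOL to remove
```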
Making sense of file contents • Most files contain more than one data item per line • COSC 713-743-3350 • UHPD 713-743-3333 • Must split lines • mystring.split(sepchar) where sepchar is a separation character • returns a list of items
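Applied to the first sample line above, split() could be used as follows. This is only a sketch; splitting the phone number on '-' is an extra step added for illustration.

```python
line = "COSC 713-743-3350"

fields = line.split()        # no argument: split on runs of whitespace
print(fields)                # ['COSC', '713-743-3350']

name, phone = fields
print(phone.split('-'))      # ['713', '743', '3350']
```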
Splitting strings
>>> text = "Four score and seven years ago"
>>> text.split()
['Four', 'score', 'and', 'seven', 'years', 'ago']
>>> record = "1,'Baker, Andy', 83, 89, 85"
>>> record.split(',')
['1', "'Baker", " Andy'", ' 83', ' 89', ' 85']
Not what we wanted!
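One possible way to get the intended fields is the standard csv module, which the slides do not use here. The sketch assumes that the single quote is the quoting character and that spaces after commas should be ignored.

```python
import csv

record = "1,'Baker, Andy', 83, 89, 85"

# csv.reader accepts any iterable of lines; here a one-element list.
reader = csv.reader([record], quotechar="'", skipinitialspace=True)
fields = next(reader)
print(fields)   # ['1', 'Baker, Andy', '83', '89', '85']
```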
Example
# how2split.py
print('----------------------------------------------------')
f5 = open("test/sample.txt", "r")
for line in f5 :
    words = line.split()
    for xxx in words :
        print(xxx)
f5.close() # not required
print('-----------------------------------------------------')
Output
----------------------------------------------------
To
be
…
of
our
discontent
-----------------------------------------------------
Other separators (I) • Commas • CSV Excel format • Values are separated by commas • Strings are stored without quotes • Unless they contain a comma • "Doe, Jane", freshman, 90, 90 • Quotes within strings are doubled
Other separators (II) • Tabs ('\t') • Advantages: • Your fields will appear nicely aligned • Spaces, commas, … are not an issue • Disadvantage: • You do not see them • They look like spaces
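These quoting rules are the ones applied by default by csv.writer from the standard library. The sketch below writes into an in-memory buffer instead of a file; the second row ("senior", 85, 80) is invented sample data.

```python
import csv
import io

out = io.StringIO()            # in-memory stand-in for an output file
writer = csv.writer(out)

writer.writerow(["Doe, Jane", "freshman", 90, 90])
writer.writerow(['Aguirre, "Lalo" Eduardo', "senior", 85, 80])   # invented row

rows = out.getvalue().splitlines()
print(rows[0])   # "Doe, Jane",freshman,90,90
print(rows[1])   # "Aguirre, ""Lalo"" Eduardo",senior,85,80
```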
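With tabs as separators, a field may contain spaces and commas without confusing split(). A minimal sketch on an invented record:

```python
line = "Doe, Jane\tfreshman\t90\t90\n"     # invented tab-separated record

fields = line.rstrip('\n').split('\t')     # drop the EOL, then split on tabs
print(fields)        # ['Doe, Jane', 'freshman', '90', '90']
print(len(fields))   # 4
```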
Why it is important • When you must pick your file format, you should decide how the data inside the file will be used: • People will read them • Other programs will use them • Will be used by people and machines
An exercise • Converting our output to CSV format • Replacing tabs by commas • Easy • Will use string replace function
First attempt
fh_in = open('grades.txt', 'r') # the 'r' is optional
buffer = fh_in.read()
newbuffer = buffer.replace('\t', ',')
fh_out = open('grades0.csv', 'w')
fh_out.write(newbuffer)
fh_in.close()
fh_out.close()
print('Done!')
The output
Alice 90 90 90 90 90
Bob 85 85 85 85 85
Carol 75 75 75 75 75
becomes
Alice,90,90,90,90,90
Bob,85,85,85,85,85
Carol,75,75,75,75,75
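The first attempt breaks down as soon as a field itself contains a comma. A short sketch on an invented record shows the problem that the next slides address:

```python
line = "Baker, Alice\t90\t90"          # invented tab-separated record

converted = line.replace('\t', ',')
print(converted)                       # Baker, Alice,90,90

print(len(line.split('\t')))           # 3 fields before the conversion
print(len(converted.split(',')))       # 4 fields after: the name was cut in two
```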
Dealing with commas (I) • Work line by line • For each line • split input into fields using TAB as separator • store fields into a list • Alice 90 90 90 90 90 becomes ['Alice', '90', '90', '90', '90', '90']
Dealing with commas (II) • Put within double quotes any entry containing one or more commas • Output list entries separated by commas • ['"Baker, Alice"', 90, 90, 90, 90, 90] becomes "Baker, Alice",90,90,90,90,90
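For a single line, the two steps above could look like this. It is a sketch only: the input record is invented, and it uses ','.join() to insert the commas, which may differ from the way the course program builds its output string.

```python
line = "Baker, Alice\t90\t90\t90\t90\t90\n"   # invented input line

items = line.rstrip('\n').split('\t')          # split into fields on TAB
wrapped = []
for item in items:
    if ',' in item:                            # entry contains a comma:
        item = '"' + item + '"'                # put it within double quotes
    wrapped.append(item)

csv_line = ','.join(wrapped)                   # entries separated by commas
print(csv_line)    # "Baker, Alice",90,90,90,90,90
```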
Dealing with commas (III) • Our troubles are not over: • Must store somewhere all lines until we are done • Store them in a list
Dealing with double quotes • Before wrapping items that contain commas in double quotes, replace all double quotes by pairs of double quotes • 'Aguirre, "Lalo" Eduardo' becomes 'Aguirre, ""Lalo"" Eduardo' then '"Aguirre, ""Lalo"" Eduardo"'
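The two transformations can be combined in a small helper. The function name quote_item is made up for this sketch, and it follows the rule on this slide: it wraps an item only when the item contains a comma.

```python
def quote_item(item):
    """Double any embedded double quotes, then wrap items containing a comma."""
    item = item.replace('"', '""')     # " becomes ""
    if ',' in item:
        item = '"' + item + '"'        # wrap the whole item in double quotes
    return item

print(quote_item('Aguirre, "Lalo" Eduardo'))   # "Aguirre, ""Lalo"" Eduardo"
print(quote_item('90'))                        # 90
```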
General organization (I)
linelist = [ ]
for line in file
    itemlist = line.split(…)
    linestring = '' # empty string
    for each item in itemlist
        remove any trailing newline
        double all double quotes
        if item contains comma, wrap
        add to linestring
General organization (II)
for line in file
    …
    for each item in itemlist
        double all double quotes
        if item contains comma, wrap
        add to linestring
    append linestring to linelist
General organization (III)
for line in file
    …
    remove last comma of linestring
    add newline at end of linestring
    append linestring to linelist
for linestring in linelist
    write linestring into output file
The program (I)
# betterconvert2csv.py
""" Convert tab-separated file to csv
"""
fh = open('grades.txt','r') # input file
linelist = [ ] # global data structure
for line in fh : # outer loop
    itemlist = line.split('\t')
    # print(str(itemlist)) # just for debugging
    linestring = '' # start afresh