COSC 1306—COMPUTER SCIENCE AND PROGRAMMING PYTHON FUNCTIONS

Jehan-François Pâris jfparis@uh.edu

Module Overview • We will learn how to read, create and modify files • Pay special attention to pickled files • They are very easy to use!

The file system • Provides long-term storage of information • Stores data in stable storage (disk) • Cannot be RAM because: • Dynamic RAM loses its contents when powered off • Static RAM is too expensive • System crashes can corrupt the contents of main memory

Overall organization • Data managed by the file system are grouped in user-defined data sets called files • The file system must provide a mechanism for naming these data • Each file system has its own set of conventions • All modern operating systems use a hierarchical directory structure

Windows solution • Each device and each disk partition is identified by a letter • A: and B: were used by the floppy drives • C: is the first disk partition of the hard drive • If the hard drive has no other disk partition, D: denotes the DVD drive • Each device and each disk partition has its own hierarchy of folders

Windows solution • [Figure: separate folder hierarchies for C: (Windows, Users, Program Files), a second disk D: and a flash drive F:]

UNIX/LINUX organization • Each device and disk partition has its own directory tree • Disk partitions are glued together through the mount operation to form a single tree • The typical user does not know where her files are stored

UNIX/LINUX organization • [Figure: the root partition / with its directories usr and bin; "the magic mount" attaches another partition at usr] • The second partition can be accessed as /usr

Mac OS organization • Similar to Windows • Disk partitions are not merged • Represented by separate icons on the desktop

Accessing a file (I) • Your Python programs are stored in a folder AKA directory • On my home PC it is C:\Users\Jehan-Francois Paris\Documents\Courses\1306\Python • All files in that directory can be directly accessed through their names • "myfile.txt"

Accessing a file (II) • Files in subdirectories can be accessed by specifying first the subdirectory • Windows style: • "test\\sample.txt" • Note the double backslash • Linux/Unix/Mac OS X style: • "test/sample.txt" • Generally works on Windows too

Why the double backslash? • The backslash is an escape character in Python • Combines with its successor to represent non-printable characters • '\n' represents a newline • '\t' represents a tab • Must use '\\' to represent a plain backslash
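A minimal sketch of the point above: the doubled backslash in the source text stands for a single backslash character in the string. The raw-string form (the r prefix) is not mentioned on the slides; it is shown here only as a standard Python alternative.

```python
# Escape sequences: the backslash combines with the character that follows it.
newline = '\n'                 # one character: a newline
plain = '\\'                   # one character: a single backslash
path = "test\\sample.txt"      # the string holds ONE backslash
raw_path = r"test\sample.txt"  # raw string: backslashes are taken literally

print(len(newline))            # length of the newline string
print(len(plain))              # length of the backslash string
print(path)                    # shows the path as Python stores it
print(path == raw_path)        # both spellings give the same string
```

Both spellings produce the same 15-character string, which is why either can be passed to open().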

Accessing a file (III) • For other files, must use full pathname • Windows Style: • "C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt"

Accessing file contents • Two-step process: • First we open the file • Then we access its contents • Read • Write • When we are done, we close the file

What happens at open() time? • The system verifies • That you are an authorized user • That you have the right permission • Read permission • Write permission • Execute permission exists but doesn't apply • The system then returns a file handle / file descriptor

The file handle • Gives the user • Direct access to the file • No directory lookups • Authority to execute the file operations whose permissions have been requested

Python open() • open(name, mode = 'r', buffering = -1) where • name is the name of the file • mode is the permission requested • Default is 'r' for read only • buffering specifies the buffer size • Use the system default value (code -1)

The modes • Can request • 'r' for read-only • 'w' for write-only • Always overwrites the file • 'a' for append • Writes at the end • 'r+' or 'a+' for updating (read + write/append)
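The difference between 'w' and 'a' is easiest to see by running them in sequence. The sketch below works in a temporary directory; the file name modes_demo.txt is made up for the illustration.

```python
import os
import tempfile

# Work in a throwaway directory so no real file is touched.
workdir = tempfile.mkdtemp()
name = os.path.join(workdir, "modes_demo.txt")  # hypothetical file name

fh = open(name, "w")   # 'w' creates the file (or erases an existing one)
fh.write("first\n")
fh.close()

fh = open(name, "w")   # opening with 'w' again discards "first"
fh.write("second\n")
fh.close()

fh = open(name, "a")   # 'a' keeps the contents and writes at the end
fh.write("third\n")
fh.close()

fh = open(name, "r")   # 'r' (the default) only allows reading
contents = fh.read()
fh.close()

print(contents)        # "first" is gone; "second" and "third" remain
```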

Examples • f1 = open("myfile.txt") same as f1 = open("myfile.txt", "r") • f2 = open("test\\sample.txt", "r") • f3 = open("test/sample.txt", "r") • f4 = open("C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt")

Reading a file • Three ways: • Global reads • Line by line • Pickled files

Global reads • fh.read() • Returns whole contents of file specified by file handle fh • File contents are stored in a single string that might be very large

Example
    f2 = open("test\\sample.txt", "r")
    bigstring = f2.read()
    print(bigstring)
    f2.close()  # not required

Output of example
    To be or not to be that is the question
    Now is the winter of our discontent
• Exact contents of file 'test\sample.txt'
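The slides mark close() as not required. A common alternative, not used on the slides, is the with statement, which closes the file as soon as its block ends. The sketch first builds its own sample file in a temporary directory (an assumed location) so that it runs anywhere.

```python
import os
import tempfile

# Build a small sample file first so the sketch is self-contained.
sample = os.path.join(tempfile.mkdtemp(), "sample.txt")  # hypothetical location
with open(sample, "w") as out:
    out.write("To be or not to be that is the question\n")
    out.write("Now is the winter of our discontent")

# The with statement closes the file when the indented block ends.
with open(sample, "r") as f2:
    bigstring = f2.read()   # global read: whole file in one string

print(bigstring)
print(f2.closed)            # the handle reports whether it has been closed
```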

Line-by-line reads
    for line in fh :  # do not forget the colon
        # anything you want
    fh.close()  # not required

Example
    f3 = open("test/sample.txt", "r")
    for line in f3 :  # do not forget the colon
        print(line)
    f3.close()  # not required

Output
    To be or not to be that is the question

    Now is the winter of our discontent
• With one or more extra blank lines

Why? • Each line ends with an end-of-line marker • print(…) adds an extra end-of-line
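To make the two end-of-line markers visible, the sketch below prints a line into an in-memory buffer and inspects the result with repr(). The buffer is only a stand-in for the screen; the sample line is the first line of the file used above.

```python
import io

line = "To be or not to be that is the question\n"  # as read from the file

buffer = io.StringIO()     # in-memory stand-in for the screen
print(line, file=buffer)   # print() appends its own end-of-line
printed = buffer.getvalue()

print(repr(line))          # repr() shows the '\n' stored in the line
print(repr(printed))       # what print() actually sent out
```

The printed text ends with two newline characters: the one read from the file and the one print() adds.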

Trying to remove blank lines
    print('----------------------------------------------------')
    f5 = open("test/sample.txt", "r")
    for line in f5 :  # do not forget the colon
        print(line[:-1])  # remove last char
    f5.close()  # not required
    print('-----------------------------------------------------')

The output
    ----------------------------------------------------
    To be or not to be that is the question
    Now is the winter of our disconten
    -----------------------------------------------------
• The last line did not end with an EOL!

A smarter solution (I) • Only remove the last character if it is an EOL
    if line[-1] == '\n' :
        print(line[:-1])
    else :
        print(line)

A smarter solution (II)
    print('----------------------------------------------------')
    fh = open("test/sample.txt", "r")
    for line in fh :  # do not forget the colon
        if line[-1] == '\n' :
            print(line[:-1])  # remove last char
        else :
            print(line)
    print('-----------------------------------------------------')
    fh.close()  # not required

It works!
    ----------------------------------------------------
    To be or not to be that is the question
    Now is the winter of our discontent
    -----------------------------------------------------
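The same result can be obtained with the string method rstrip('\n'), which removes trailing newline characters only when they are present. This is an alternative to the if/else test, not the method used on the slides; io.StringIO stands in for test/sample.txt so the sketch runs without the file.

```python
import io

# In-memory stand-in for test/sample.txt (last line has no end-of-line).
fh = io.StringIO("To be or not to be that is the question\n"
                 "Now is the winter of our discontent")

cleaned = []
for line in fh:
    stripped = line.rstrip('\n')  # drops a trailing newline if there is one
    cleaned.append(stripped)
    print(stripped)
```

Another option is print(line, end=''), which tells print() not to add its own end-of-line.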

Making sense of file contents • Most files contain more than one data item per line
    COSC 713-743-3350
    UHPD 713-743-3333
• Must split lines • mystring.split(sepchar) where sepchar is a separation character • Returns a list of items

Splitting strings
    >>> text = "Four score and seven years ago"
    >>> text.split()
    ['Four', 'score', 'and', 'seven', 'years', 'ago']
    >>> record = "1,'Baker, Andy', 83, 89, 85"
    >>> record.split(',')
    ['1', "'Baker", " Andy'", ' 83', ' 89', ' 85']
Not what we wanted!
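One way to get the intended fields is the csv module of the standard library, which understands quoted items. This is a sketch, not the approach taken on the slides. It assumes that the single quote is the quoting character, as in the record above, and that blanks after a comma should be ignored.

```python
import csv

record = "1,'Baker, Andy', 83, 89, 85"

# csv.reader expects an iterable of lines, so the record goes inside a list.
reader = csv.reader([record], quotechar="'", skipinitialspace=True)
fields = next(reader)   # fields of the first (and only) line

print(fields)
```

The comma inside 'Baker, Andy' is now treated as part of the name instead of as a separator.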

Example
    # how2split.py
    print('----------------------------------------------------')
    f5 = open("test/sample.txt", "r")
    for line in f5 :
        words = line.split()
        for xxx in words :
            print(xxx)
    f5.close()  # not required
    print('-----------------------------------------------------')

Output
    ----------------------------------------------------
    To
    be
    …
    of
    our
    discontent
    -----------------------------------------------------

Other separators (I) • Commas • CSV Excel format • Values are separated by commas • Strings are stored without quotes • Unless they contain a comma • "Doe, Jane", freshman, 90, 90 • Quotes within strings are doubled

Other separators (II) • Tabs ('\t') • Advantages: • Your fields will appear nicely aligned • Spaces, commas, … are not an issue • Disadvantage: • You do not see them • They look like spaces

Why it is important • When you must pick your file format, you should decide how the data inside the file will be used: • People will read them • Other programs will use them • Will be used by people and machines

An exercise • Converting our output to CSV format • Replacing tabs by commas • Easy • Will use string replace function

First attempt
    fh_in = open('grades.txt', 'r')  # the 'r' is optional
    buffer = fh_in.read()
    newbuffer = buffer.replace('\t', ',')
    fh_out = open('grades0.csv', 'w')
    fh_out.write(newbuffer)
    fh_in.close()
    fh_out.close()
    print('Done!')

The output
    Alice   90  90  90  90  90
    Bob     85  85  85  85  85
    Carol   75  75  75  75  75
becomes
    Alice,90,90,90,90,90
    Bob,85,85,85,85,85
    Carol,75,75,75,75,75

Dealing with commas (I) • Work line by line • For each line • Split input into fields using TAB as separator • Store fields into a list • Alice 90 90 90 90 90 becomes ['Alice', '90', '90', '90', '90', '90']

Dealing with commas (II) • Put within double quotes any entry containing one or more commas • Output list entries separated by commas • ['"Baker, Alice"', 90, 90, 90, 90, 90] becomes "Baker, Alice",90,90,90,90,90

Dealing with commas (III) • Our troubles are not over: • Must store somewhere all lines until we are done • Store them in a list

Dealing with double quotes • Before wrapping items that contain commas in double quotes, replace • All double quotes by pairs of double quotes • 'Aguirre, "Lalo" Eduardo' becomes 'Aguirre, ""Lalo"" Eduardo' then '"Aguirre, ""Lalo"" Eduardo"'
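The two rules can be sketched as a small helper. The name csv_quote is made up for this illustration and does not appear on the slides; the order of the two steps matters, because the wrapping quotes must not be doubled.

```python
def csv_quote(item):
    """Hypothetical helper: apply the two quoting rules to one field."""
    item = item.replace('"', '""')   # first double every double quote
    if ',' in item:
        item = '"' + item + '"'      # then wrap fields that contain a comma
    return item


print(csv_quote('Aguirre, "Lalo" Eduardo'))
print(csv_quote('Alice'))            # no comma, no quote: left unchanged
```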

General organization (I) • linelist = [ ] • for line in file • itemlist = line.split(…) • linestring = '' # empty string • for each item in itemlist • remove any trailing newline • double all double quotes • if item contains comma, wrap • add to linestring

General organization (II) • for line in file • … • for each item in itemlist • double all double quotes • if item contains comma, wrap • add to linestring • append linestring to linelist

General organization (III) • for line in file • … • remove last comma of linestring • add newline at end of linestring • append linestring to linelist • for linestring in linelist • write linestring into output file

The program (I)
    # betterconvert2csv.py
    """ Convert tab-separated file to csv
    """
    fh = open('grades.txt','r')  # input file
    linelist = [ ]  # global data structure
    for line in fh :  # outer loop
        itemlist = line.split('\t')
        # print(str(itemlist))  # just for debugging
        linestring = ''  # start afresh
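The rest of the program is not part of this transcript. The sketch below is one possible way to follow the general organization end to end, not the author's code. It assumes that each item gets a comma appended when it is added to linestring, and it uses in-memory buffers with made-up sample data in place of grades.txt and the output file.

```python
import io

# In-memory stand-ins for the input and output files (assumed sample data).
fh = io.StringIO('Baker, Alice\t90\t90\n'
                 'Aguirre, "Lalo" Eduardo\t85\t85\n')
fh_out = io.StringIO()

linelist = []                              # all converted lines
for line in fh:                            # outer loop: one input line
    itemlist = line.split('\t')
    linestring = ''                        # start afresh
    for item in itemlist:                  # inner loop: one field
        item = item.rstrip('\n')           # remove any trailing newline
        item = item.replace('"', '""')     # double all double quotes
        if ',' in item:
            item = '"' + item + '"'        # wrap items containing a comma
        linestring += item + ','           # add to linestring
    linestring = linestring[:-1] + '\n'    # drop last comma, add newline
    linelist.append(linestring)

for linestring in linelist:                # write everything out at the end
    fh_out.write(linestring)

result = fh_out.getvalue()
print(result)
```

With real files, the two io.StringIO objects would be replaced by open() calls in 'r' and 'w' mode.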
