COS 131: Computing for Engineers Chapter 8: File Input and Output. Douglas R. Sizemore, Ph.D. Professor of Computer Science. This lecture was given in Fall 2008 by Professor Sizemore and refers to an older version of MATLAB than R2011a.
Introduction • Addresses three levels of capability for reading and writing files in MATLAB • Saving and restoring the workspace • High-level functions for accessing files in specific formats • Low-level file access programs for general-purpose file processing • Consider conditions under which each is appropriate
Introduction • Consider three types of activities that read and write data files. • MATLAB has a basic ability to save your workspace (or parts of it) to a file and restore it later for further processing • MATLAB has high-level functions that take the name of a file in any one of a number of popular formats and produce an internal representation of the data from that file in a form ready for processing • Need to deal with lower-level capabilities for manipulating text files that do not have recognizable structures
Introduction • Will consider files containing: • Workspace variables • Spreadsheet data • Text files with delimited numbers • Text files with plain text • MATLAB also has the ability to access binary files – files whose data are not in text form; we will not consider binary files here.
Concept: Serial Input and Output (I/O) • We refer to the process of reading and writing data files as Input/Output, or I/O • All computer file systems save and retrieve data as a sequential stream of characters; remember that each character is a small set of ones and zeros (binary digits), which the hardware represents as distinct voltage levels • Input and output streams are depicted on the next slide
Concept: Serial Input and Output (I/O) • [Figure: input stream and output stream]
Concept: Serial Input and Output (I/O) • Input and Output • Control characters are mixed in with the regular data characters so that we can make sense of the stream; they specify the organization of the data
Concept: Serial Input and Output (I/O) • File Processing Scenario for INPUT • Program opens the file for reading • Continually requests values from the file data stream until the end of file (EOF) is reached • As the data is received, the program uses the delimiting characters included in the data stream to reformat the data to reconstruct the organization of the data as represented in the file
MATLAB Workspace (I/O) • MATLAB allows you to save your workspace to a file with the SAVE command; allows you to reload your workspace from a file with the LOAD command • File will be the name you give it with a .mat extension • The default filename is matlab.mat
MATLAB Workspace (I/O) • Can also identify specific variables that you want to save by • Listing them explicitly • Providing wildcard patterns that match the variable names • Example: >>save mydata.mat a b c* • The above example would save the variables a and b and any variable whose name begins with the letter c • Saving the workspace is rarely practical as a way to preserve work, because it saves only the results and not the code that produced them • Almost always better to save the scripts and raw data that created the workspace
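The selective save described above can be tried with a few throwaway variables. This is a minimal sketch, assuming the functional form of save and load and a temporary file name; the variable names and values are illustrative only.

```matlab
% Create a few workspace variables (names and values are illustrative only).
a  = 1;
b  = [2 3 4];
c1 = 'first';
c2 = 'second';
d  = 99;                         % matches none of the names or patterns below

fname = [tempname() '.mat'];     % temporary file, so nothing of yours is overwritten

% Functional form of the command:  save <file> a b c*
save(fname, 'a', 'b', 'c*');

clear a b c1 c2 d                % remove the variables from the workspace
load(fname);                     % restores a, b, c1 and c2, but not d

delete(fname);                   % tidy up the temporary file
```

After the load, d no longer exists because it was never written to the file.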
High-Level I/O Functions • Now examine the general case of file I/O • Will need to load data from external sources • Will need to process those data • Will need to save those data back to the file system
High-Level I/O Functions • Reading or writing data in an external file is extremely difficult without knowing something of • The types of data contained in the file • The organization of the data in the file • Good habit: • Explore the data in a file with whatever tools you have at your disposal • Commit to processing the data according to your observations • Table on the following slide shows the file readers and writers available in MATLAB
High-Level I/O Functions • [Table: File I/O Functions]
High-Level I/O Functions • Exploration • Most common files encountered are text files and spreadsheets • Delimited text files are presumed to contain numerical values • Spreadsheet data may be either numerical data stored as doubles (typically 64 bits or 8 bytes per number) or string data stored in cell arrays. • Text files are usually delimited by a special character: • Comma • Tab • Space • or another designated character • The delimiter designates the column divider • The new-line character designates the rows
High-Level I/O Functions • Exploration • Exception is the plain text reader that requires a format to define columns and rows • The file extension as in .txt gives you a significant clue to the nature of the data • For plain text files you can use a simple editor like Notepad in Windows to examine the organization of the data and obtain clues as to how to proceed
High-Level I/O Functions • Excel spreadsheets • Rectangular arrays containing labeled rows and columns of cells
High-Level I/O Functions • Excel spreadsheets • MATLAB xlsread(…) function separates the text and numerical portions of a spreadsheet • The input parameter of xlsread(…) is the name of the file • Can have up to three return variables • First return variable will hold all numerical values in an array of doubles • Second return variable will hold all the text data in cell arrays • Third return variable (optional) will hold both string and numerical data in cell arrays • Exercise 8.1: Reading Excel Data • Smith text, pages 189-190, bottom-top
High-Level I/O Functions • Excel spreadsheets • Observations from Exercise 8.1 • Excel reader function determines the smallest rectangle on the spreadsheet containing all of the numerical data; referred to as the number rectangle • First result is essentially this number rectangle; if there are any non-numeric values within the rectangle, they are replaced by NaN, the built-in MATLAB name for something that is not a number • Second result is all character data as strings in a cell array; numbers encountered are given as empty strings • Third result consists of cell arrays of both numbers and character strings; missing values are assumed to be numeric and are assigned the value, NaN
High-Level I/O Functions • Excel spreadsheets • Will likely want to write back to the file or to another new or existing file • Excel spreadsheets can be written using: • xlswrite(<filename>, <array>, <sheet>, <range>) • Where <filename> is the name of the file • <array> is the data source, a cell array • <sheet> is the sheet name • <range> is the range of cells in Excel cell notation (for example, A1:C3)
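A minimal sketch of the reader and writer used together. The file names 'grades.xls' and 'grades_copy.xls' and the sheet name 'Sheet1' are hypothetical placeholders, and the calls assume a MATLAB installation that can read and write Excel files.

```matlab
% 'grades.xls' is a hypothetical spreadsheet with labeled rows and columns.
[num, txt, raw] = xlsread('grades.xls');

% num : array of doubles covering the number rectangle (NaN where a cell is not numeric)
% txt : cell array holding the character data
% raw : cell array holding both numbers and strings, cell for cell
disp(size(num));
disp(size(txt));
disp(size(raw));

% Write the raw cell array to a new workbook; the sheet name is an assumption.
xlswrite('grades_copy.xls', raw, 'Sheet1');
```

Comparing the three sizes is a quick way to see how much of the sheet the number rectangle covers.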
High-Level I/O Functions • Delimited Text Files – Numerical Data Only • Data are frequently presented in text file form • If data in a text file are all numerical values, MATLAB can read the file directly into an array • The data must be separated, or delimited, by commas, spaces, or tab characters • Numerical data of this type can be read using • dlmread(file, delimiter) • delimiter is a single character that can be used to specify an unusual delimiting character • Function produces a numerical array containing the data values • Array elements where data are not supplied are filled with zeros
High-Level I/O Functions • Delimited Text Files – Numerical Data Only • Exercise 8.2: Reading delimited files • Smith text, page 191, bottom • Listing 8.1: Sample delimited text file • Delimited data files can be written using: • dlmwrite( <filename>, <array>, <dlm>) • <filename> is the name of the file • <array> is the data source – a numerical array • <dlm> is the delimiting character; if not specified, the default is a comma (CSV)
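The writer and reader can be checked against each other with a round trip. This is a sketch under simple assumptions: temporary file names, a small made-up array, and a semicolon standing in for an "unusual" delimiter. The last part writes a file with an empty field by hand to show the zero fill.

```matlab
A = [1 2 3; 4 5 6];              % small made-up array

% Write A as comma-separated values (the default delimiter), then read it back.
csvName = [tempname() '.csv'];
dlmwrite(csvName, A);
B = dlmread(csvName, ',');

% Same data with a semicolon as the delimiting character.
semiName = [tempname() '.txt'];
dlmwrite(semiName, A, ';');
C = dlmread(semiName, ';');

% A file with an empty field: the value that was not supplied is read as zero.
gapName = [tempname() '.csv'];
fh = fopen(gapName, 'w');
fprintf(fh, '1,,3\n4,5,6\n');
fclose(fh);
D = dlmread(gapName, ',');

% Remove the temporary files.
delete(csvName);
delete(semiName);
delete(gapName);
```

B and C should match A, and D should have a zero where the field was left empty.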
Lower-Level File I/O • Introduction • You may encounter text files that cannot be read or written by the higher-level functions defined above • MATLAB includes functions for general-purpose reading and writing of data files • Opening one of these files returns a file handle • The file handle is passed to every function that reads from or writes to the file • Once the read and write activities have been completed, the file must be closed
Lower-Level File I/O • Opening and Closing Files • To open a file for reading or writing: • fh = fopen( <filename>, <purpose> ) • fh is a file handle used in subsequent function calls to identify the particular I/O stream • <filename> is the name of the file • <purpose> is a string specifying the purpose for opening the file • 'r' – read; file must already exist • 'w' – write; file will be overwritten if it exists • 'a' – append; data will be appended to the file if it exists • To close the file, • fclose( fh )
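A short sketch of the three purposes in sequence, using a temporary file name as an assumption. It also checks the handle after opening: fopen returns -1 when a file cannot be opened, which is worth testing before any reading or writing.

```matlab
fname = [tempname() '.txt'];     % temporary file name for this sketch

fh = fopen(fname, 'w');          % 'w': overwrite the file if it exists
if fh == -1
    error('Could not open %s for writing', fname);
end
fprintf(fh, 'first line\n');
fclose(fh);

fh = fopen(fname, 'a');          % 'a': append to the existing contents
fprintf(fh, 'second line\n');
fclose(fh);

fh = fopen(fname, 'r');          % 'r': the file must already exist
line1 = fgetl(fh);
line2 = fgetl(fh);
fclose(fh);

% Opening a file that does not exist for reading yields -1 instead of a handle.
missing = fopen([tempname() '.txt'], 'r');

delete(fname);
```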
Lower-Level File I/O • Reading Text Files • Three levels of support are provided when reading text files: • Reading whole lines with or without the new-line character • Parsing into tokens with delimiters • Parsing into cell arrays using a format string • To read a whole line including the new-line character, use: • str = fgets( fh ); • Returns each line as a string; returns -1 once the end of the file (EOF) is reached • Use fgetl(…) to leave out each new-line character • To parse each line into tokens (elementary text strings) separated by white space delimiters, use a combination of fgetl(…) and the tokenizer function: • [tk, rest] = strtok( ln ); where tk is a string token, rest is the remainder of the line, and ln is a string to be parsed into tokens
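The combination of fgetl(…) and strtok(…) can be sketched as a pair of nested loops. This is an illustration and not one of the listings from the Smith text; the sample file contents and the variable tokens are made up for the example.

```matlab
% Build a small sample file so the sketch is self-contained.
fname = [tempname() '.txt'];
fh = fopen(fname, 'w');
fprintf(fh, 'the quick brown fox\njumps over\n');
fclose(fh);

fh = fopen(fname, 'r');
tokens = {};                          % collected tokens, in file order
ln = fgetl(fh);                       % a string, or -1 at end of file
while ischar(ln)
    rest = ln;
    while true
        [tk, rest] = strtok(rest);    % default delimiter is white space
        if isempty(tk)
            break;                    % nothing left on this line
        end
        tokens{end+1} = tk;           % append the token to the list
    end
    ln = fgetl(fh);
end
fclose(fh);
delete(fname);
```

Testing ischar(ln) is a compact way to detect the -1 that fgetl returns at the end of the file.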
Lower-Level File I/O • Reading Text Files • To parse text according to a specific format string into a cell array, use: • ca = textscan( fh, <format> ); where ca is the resulting cell array, fh is the file handle, and <format> is a format control string like those we used for sscanf(…) (Chapter 6)
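A minimal sketch of textscan(…) on a file with one text column and two numeric columns. The file contents and the variable names labels, first, and second are assumptions chosen for the illustration.

```matlab
% Sample file: a label followed by two numbers on each line.
fname = [tempname() '.txt'];
fh = fopen(fname, 'w');
fprintf(fh, 'A 1 2.5\nB 3 4.5\n');
fclose(fh);

fh = fopen(fname, 'r');
ca = textscan(fh, '%s %f %f');   % one cell per format specifier
fclose(fh);
delete(fname);

labels = ca{1};                  % cell array of strings, one per line
first  = ca{2};                  % column vector of doubles
second = ca{3};                  % column vector of doubles
```

Each cell of ca holds one whole column of the file, so the rows are recovered by indexing the columns with the same row number.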
Lower-Level File I/O • Examples of Reading Text Files • Listing 8.2 shows a script that will list any text file in the Command window • Refer to the notations on Listing 8.2 on pages 193-194 of the Smith text
Lower-Level File I/O • Examples of Reading Text Files • Listing 8.3 shows the difference in output results between the conventional listing script and the tokenizing lister • Refer to the notations on Listing 8.2 on page 194 of the Smith text
Lower-Level File I/O • Examples of Reading Text Files • Exercise 8.3: Using file listers – illustrates both traditional and tokenizer approaches to file listing • Smith text, pages 194-195, bottom-top
Lower-Level File I/O • Writing Text Files • Must have the file open • The fprintf(…) function is used to write to the file by including its file handle as the first parameter • Listing 8.4 alters Listing 8.2 so that it copies a text file instead of listing it • Refer to the notations on Listing 8.2 on page 195 of the Smith text
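Copying a text file line by line can be sketched as follows. This is an illustration under simple assumptions (temporary file names, made-up contents) and is not Listing 8.4 itself.

```matlab
src = [tempname() '.txt'];
dst = [tempname() '.txt'];

% Create a source file; '%%' in the format writes a single percent sign.
fh = fopen(src, 'w');
fprintf(fh, 'line one\n100%% done\n');
fclose(fh);

fin  = fopen(src, 'r');
fout = fopen(dst, 'w');
ln = fgets(fin);                 % fgets keeps the new-line character
while ischar(ln)
    fprintf(fout, '%s', ln);     % '%s' copies any % or \ in the text literally
    ln = fgets(fin);
end
fclose(fin);
fclose(fout);

% Read both files back for comparison, then tidy up.
srcText = fileread(src);
dstText = fileread(dst);
delete(src);
delete(dst);
```

Passing the line through '%s' rather than using it as the format string matters whenever the text may contain percent signs or backslashes.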
Engineering Example: Spreadsheet Data • Adaptation of the structure assembly problem from Chapter 7 • In this example the data are presented in a spreadsheet as given here:
Engineering Example: Spreadsheet Data • Start by considering the layout of the data • Also consider the process necessary to extract what we need • Which of the three forms of data returned from xlsread(…) is best for our use? • Numerical data are not really important in this application • Not exclusively a text processing problem either • Will process the raw data provided by xlsread(…), giving both the string and numerical data • Create a function that will read this file and produce the same model/structure as in Chapter 7
Engineering Example: Spreadsheet Data • Listing 8.5: Reading structure data
Engineering Example: Spreadsheet Data • Observations on Listing 8.5 • Note at line 2 the function reads the spreadsheet and only keeps the raw data • In traversing the array, note that we begin with an offset that ignores column 1 and row 1 • As the function cycles through the rows, it is important to empty the array CONN before each pass to avoid "inheriting" data from a previous row • You can test this function by replacing the structure array construction in lines 1-11 of Listing 7.7 in Chapter 7 with the following line: • data = readStruct('Structure_array.xls');