Learn about storing data to the hard disk using files and streams, including file hierarchy, naming conventions, absolute vs. relative paths, and file formats. Explore the Java library support and how to read and write text files using the Scanner and PrintWriter classes.
CSE 501N Fall ‘09 18: Files and Streams 06 November 2009 Nick Leidenfrost
Lecture Outline • Storing data to the hard disk • Files • Streams • Storage Decisions • Serialization
File System: File Hierarchy • In general, files in computers are organized in a directory tree • A directory is a virtual container that holds files and other directories • A.k.a. folder
Files: Naming Conventions • Filenames are unique within a directory; whether they are case-sensitive depends on the file system (typically yes on Linux, typically no on Windows) • Extensions serve as a “hint” or “shortcut” for the type of data contained in the file • For Us • For the Operating System (OS) • Windows “full path”: H:/workspace/Lab6/Ship.java (drive H:, path /workspace/Lab6/, filename Ship.java, extension .java) • Mac / Linux / Unix: /home/username/workspace/Lab6/Ship.java
Referring to Files in Programs: Absolute vs. Relative Paths • Absolute Path • The “full path” to the file • e.g. H:/workspace/Lab6/images/mothership.gif • Relative Path • A path that specifies a file’s location relative to another location • “another location” = the location of our program • e.g. images/Ship.java
Relative Paths: ./ and ../ • Some special notation specific to relative paths lets us refer to our own directory, as well as our parent directory • ./ and ../ • ./ = The current directory • ../ = The parent directory of the current directory (or the directory above the current directory) • This notation is fairly standard in computing [ cmd example ]
Relative Paths: Examples • Inside our directory: mothership.gif or ./mothership.gif • Inside a subdirectory: images/mothership.gif or images/gif/small/mothership.gif • In our “parent” directory: ../mothership.gif • In a “sibling directory”: ../images/mothership.gif • In the directory above our parent directory: ../../mothership.gif
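To see how relative paths like these resolve, here is a small sketch using java.nio.file.Path, which these slides do not cover. The program directory /home/username/workspace/Lab6 and the class name RelativePathDemo are assumptions chosen for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class RelativePathDemo {
    // Hypothetical location of our program; any directory would do.
    static final Path PROGRAM_DIR = Paths.get("/home/username/workspace/Lab6");

    // Resolve a relative path against the program directory and
    // collapse any "./" and "../" segments.
    static Path resolve(String relative) {
        return PROGRAM_DIR.resolve(relative).normalize();
    }

    public static void main(String[] args) {
        String[] examples = {
            "mothership.gif",            // inside our directory
            "./mothership.gif",          // same file, written with ./
            "images/mothership.gif",     // inside a subdirectory
            "../mothership.gif",         // in our parent directory
            "../images/mothership.gif",  // in a sibling directory
            "../../mothership.gif"       // two levels up
        };
        for (String example : examples) {
            System.out.println(example + " -> " + resolve(example));
        }
    }
}
```

Each ../ removes one directory from the end of the program directory before the rest of the path is appended.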
Absolute vs. Relative • Absolute • Path will rely on exactly the same directory structure being in place on every computer • Application can be moved independently of resources, and resources will still be found • Generally not portable (System-specific) • Usually set on installation • Relative • Will be correct as long as resources stay in the same place relative to the application • Application and resources can be moved and still function • From computer to computer • From directory to directory on the same computer
File Formats: Text and Binary • Two ways to store data: • Text format (a.k.a. plain-text) • Data stored as characters • Human readable • Less efficient with respect to storage • A.k.a. ASCII (ask · ee) • (American Standard Code For Information Interchange) • Binary format • Data stored as bytes • Looks like gibberish to humans • Relies on a defined structure • More compact / efficient than plain-text • // Let’s look at some examples!
ASCII Format • Let’s look at storing an integer in a plain-text file • An int in Java is 4 bytes • We want to store the number 12,345 • Our file actually holds the character ‘1’, followed by ‘2’, then ‘3’, ‘4’ and finally, ‘5’ • This takes at least 5 bytes
Binary Format • Data items are represented in bytes • Integer 12,345 stored as a sequence of four bytes: • 0 0 48 57 • 48·256¹ + 57·256⁰ = 12,345 • Why the zeros? Why not just use 2 bytes to store it? • More compact and more efficient
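The two encodings of 12,345 can be compared directly. This is a minimal sketch; the class name IntEncodingDemo is made up for illustration, and java.nio.ByteBuffer is used only as a convenient way to get the four big-endian bytes of an int.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class IntEncodingDemo {
    // Text format: one byte per decimal digit.
    static byte[] asText(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.US_ASCII);
    }

    // Binary format: always four bytes, most significant byte first.
    static byte[] asBinary(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(asText(12345)));   // [49, 50, 51, 52, 53]
        System.out.println(Arrays.toString(asBinary(12345))); // [0, 0, 48, 57]
    }
}
```

The leading zeros appear because an int always occupies four bytes, so a reader knows where each value ends without needing a separator.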
Files in Java: Library Support • Support for file interaction (and more) can be found in Java’s java.io library • “io”: Input / Output • A file is represented in the library by the File class (java.io.File) • We can create file objects with either absolute or relative paths • // Let’s have a look at java.io.File
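Here is a brief sketch of what a File object can tell us. The path images/mothership.gif is borrowed from the earlier slides, and the class FileObjectDemo and its describe helper are hypothetical.

```java
import java.io.File;

class FileObjectDemo {
    // Summarizes a File object. The File only describes a path,
    // so the file itself does not need to exist.
    static String describe(File file) {
        return file.getName()
                + " (absolute: " + file.isAbsolute()
                + ", exists: " + file.exists() + ")";
    }

    public static void main(String[] args) {
        File relative = new File("images/mothership.gif");
        System.out.println(describe(relative));
        // A relative path is resolved against the current working directory.
        System.out.println(relative.getAbsolutePath());
    }
}
```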
Reading Text Files • Simplest way to read text: use Scanner class • To read from a file on disk, construct a FileReader • Then, use the FileReader to construct a Scanner object • Use the Scanner methods to read data from file • next, nextLine, nextInt, and nextDouble • // Let’s look at the Scanner API • FileReader reader = new FileReader("input.txt"); Scanner in = new Scanner(reader);
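Putting the pieces together, one way to read a whole text file line by line might look like the sketch below. The class name ReadLinesDemo and the readLines helper are assumptions for illustration; the FileReader and Scanner calls are the ones named on the slide.

```java
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ReadLinesDemo {
    // Reads every line of the named text file into a list.
    static List<String> readLines(String fileName) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the Scanner and the reader under it.
        try (Scanner in = new Scanner(new FileReader(fileName))) {
            while (in.hasNextLine()) {
                lines.add(in.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java ReadLinesDemo <file>");
            return;
        }
        for (String line : readLines(args[0])) {
            System.out.println(line);
        }
    }
}
```

If the file does not exist, the FileReader constructor throws a FileNotFoundException, which is a kind of IOException.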
Writing Text Files • To write to a file, construct a PrintWriter object • // Let’s look at the PrintWriter API • If file already exists, it is emptied before the new data are written into it • If file doesn't exist, an empty file is created PrintWriter out = new PrintWriter("output.txt");
Writing Text Files • Use print and println to write into a PrintWriter: out.println(29.95); out.println(new Rectangle(5, 10, 15, 25)); out.println("Hello, World!"); • You must close a file when you are done processing it: out.close(); • Otherwise, not all of the output may be written to the disk file
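A minimal sketch of writing and closing a text file follows. The class name WriteDemo is made up, and try-with-resources is used here in place of an explicit out.close() call, which is a choice of this example rather than something shown on the slide.

```java
import java.io.FileNotFoundException;
import java.io.PrintWriter;

class WriteDemo {
    // Writes two lines to the named file, replacing any existing contents.
    static void writeSample(String fileName) throws FileNotFoundException {
        // try-with-resources calls out.close() for us, even if an
        // exception is thrown, so buffered output reaches the disk.
        try (PrintWriter out = new PrintWriter(fileName)) {
            out.println(29.95);
            out.println("Hello, World!");
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        writeSample("output.txt");
    }
}
```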
A Sample Program • Reads all lines of a file and sends them to the output file, preceded by line numbers • Sample input file: Mary had a little lamb Whose fleece was white as snow. And everywhere that Mary went, The lamb was sure to go!
A Sample Program • Program produces the output file: • (Program could be used for numbering Java source files, etc.) /* 1 */ Mary had a little lamb /* 2 */ Whose fleece was white as snow. /* 3 */ And everywhere that Mary went, /* 4 */ The lamb was sure to go!
Write the code for this program • // Code Example: FileNumberer.java
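The lecture's own FileNumberer.java is not reproduced in these slides. The following is one possible sketch of such a program, assuming the input and output file names are passed on the command line.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

class FileNumberer {
    // Copies inputName to outputName, prefixing each line with /* n */.
    static void number(String inputName, String outputName) throws IOException {
        try (Scanner in = new Scanner(new FileReader(inputName));
             PrintWriter out = new PrintWriter(outputName)) {
            int lineNumber = 1;
            while (in.hasNextLine()) {
                out.println("/* " + lineNumber + " */ " + in.nextLine());
                lineNumber++;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java FileNumberer <input> <output>");
            return;
        }
        number(args[0], args[1]);
    }
}
```

Both files are opened in a single try-with-resources statement, so both are closed even if reading fails partway through.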
File Dialog Boxes • More user friendly way of selecting files • // Let’s integrate this into our FileNumberer
Text Format • Human-readable form • Data stored as sequence of characters • Integer 12345 stored as characters '1' '2' '3' '4' '5' • Use Reader and Writer and their subclasses to process input and output • To read: FileReader reader = new FileReader("input.txt"); • To write: FileWriter writer = new FileWriter("output.txt");
Binary Format • Reading and writing binary files • Use subclasses of InputStream and OutputStream • To read: FileInputStream inputStream = new FileInputStream("input.bin"); • To write: FileOutputStream outputStream = new FileOutputStream("output.bin");
Streams: Input and Output • We read from an InputStream • We write to an OutputStream • As with System.out • Imagine the “Stream” as a hose connecting the data source and the data destination Output (writing) The stream Input (reading)
Reading a Single Byte from a File • Use the read methods in the InputStream class to read a single byte / array of bytes • read() returns the next byte as an int • Returned value 0 <= x <= 255 • or the integer -1 at end of file • InputStream in = . . .; int next = in.read(); byte b; if (next != -1) b = (byte) next;
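The read-until-minus-one pattern can be sketched as a loop. Here the stream is an in-memory ByteArrayInputStream so the example needs no file on disk; the class name ByteReadDemo and the readAll helper are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ByteReadDemo {
    // Reads a stream one byte at a time until read() reports end of stream.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        int next = in.read();          // 0..255, or -1 at end of stream
        while (next != -1) {
            byte b = (byte) next;      // safe: next is a real byte value here
            collected.write(b);
            next = in.read();
        }
        return collected.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] source = {72, 105, (byte) 200};
        byte[] copy = readAll(new ByteArrayInputStream(source));
        System.out.println(copy.length); // 3
    }
}
```

read() returns an int rather than a byte so that -1 can signal end of file without colliding with any of the 256 legal byte values.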
Text and Binary Format • Use variations of the write method to write a single byte / array of bytes • read and write are the only input and output methods provided by the file input and output classes • Java stream package principle: each class should have a very focused responsibility • Use Library of subclasses for more high-level behavior
Text and Binary Format • Job of InputStream / OutputStream: interact with data sources and get bytes • To read numbers, strings, or other objects, combine a Stream with other classes • E.g. java.util.Scanner
File Example: A Simple Encryption Program • File encryption • Scramble a file so that it is readable only by those who know the encryption method and secret keyword • Using the Caesar cipher • Choose an encryption key: a number between 1 and 25 • Example: If the key is 3, replace A with D, B with E, . . .
An Encryption Program • Example text: • To decrypt, use the negative of the encryption key
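A Caesar cipher over byte streams might be sketched as follows. This is not the lecture's program: the class and method names are made up, and this version assumes that only ASCII letters are shifted (wrapping around from Z to A) while every other byte passes through unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class CaesarCipher {
    // Shifts one byte by key if it is an ASCII letter; other bytes are unchanged.
    static int shift(int b, int key) {
        if (b >= 'A' && b <= 'Z') {
            return 'A' + Math.floorMod(b - 'A' + key, 26);
        }
        if (b >= 'a' && b <= 'z') {
            return 'a' + Math.floorMod(b - 'a' + key, 26);
        }
        return b;
    }

    // Copies in to out, shifting every byte along the way.
    static void encryptStream(InputStream in, OutputStream out, int key)
            throws IOException {
        int next = in.read();
        while (next != -1) {
            out.write(shift(next, key));
            next = in.read();
        }
    }

    // Convenience wrapper for trying the cipher on a String.
    static String encrypt(String text, int key) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        encryptStream(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII)),
                out, key);
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        String secret = encrypt("ABC", 3);
        System.out.println(secret);              // DEF
        System.out.println(encrypt(secret, -3)); // ABC
    }
}
```

Because encryptStream works on any InputStream and OutputStream, passing a FileInputStream and FileOutputStream would encrypt a file on disk with the same code.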
Storage Options: Random Access vs. Sequential Access • Sequential access • A file is processed a byte at a time • It can be inefficient • Random access • Allows access at arbitrary locations in the file • Only files on disk support random access • System.in and System.out (normal input and output streams) do not • Each disk file has a special file pointer position • You can read or write at the position where the file pointer is
RandomAccessFile • You can open a file either for • Reading only ("r") • Reading and writing ("rw") • RandomAccessFile f = new RandomAccessFile("bank.dat", "rw"); • To move the file pointer to a specific byte: f.seek(n); • (moves file pointer to nth byte)
RandomAccessFile • To get the current position of the file pointer: long n = f.getFilePointer(); • (of type "long" because files can be very large) • To find the number of bytes in a file: long fileLength = f.length();
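The sketch below exercises seek, getFilePointer and length together. It assumes a scratch file filled with ten int values; the class name SeekDemo and the squareAt helper are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class SeekDemo {
    static final int INT_SIZE = 4; // writeInt always writes four bytes

    // Fills the file with the squares 0, 1, 4, ..., 81, then jumps
    // directly to the value stored at the given index.
    static int squareAt(File file, int index) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(file, "rw")) {
            f.setLength(0);                    // discard old contents
            for (int i = 0; i < 10; i++) {
                f.writeInt(i * i);
            }
            // After sequential writes the pointer sits at the end of the file.
            long end = f.getFilePointer();
            if (end != f.length()) {
                throw new IllegalStateException("unexpected pointer " + end);
            }
            f.seek((long) index * INT_SIZE);   // move the file pointer
            return f.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("seek", ".dat"); // scratch file
        tmp.deleteOnExit();
        System.out.println(squareAt(tmp, 7)); // 49
    }
}
```

Reading the eighth value requires no pass over the first seven, which is the practical difference from sequential access.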
Random Access: A Sample Program • Use a random access file to store a set of bank accounts • Program lets you pick an account and deposit money into it • To manipulate a data set in a file, pay special attention to data formatting • Suppose we store the data as text Say account 1001 has a balance of $900, and account 1015 has a balance of 0
Random Access: A Sample Program What if we want to deposit $100 into account 1001? If we now simply write out the new value, the result is
Random Access: A Sample Program • What if money becomes too big? • This is caused by one of the downsides of human-readable file formats • Better way to manipulate a data set in a file: • Give each value a fixed size that is sufficiently large • Every record has the same size • Easy to skip quickly to a given record • To store numbers, use binary format for scalability
Random Access: A Sample Program • RandomAccessFile class stores binary data • readInt and writeInt read/write integers as four-byte quantities • readDouble and writeDouble use 8 bytes • double x = f.readDouble(); f.writeDouble(x);
Random Access: A Sample Program • To find out how many bank accounts are in the file: public int numAccounts() throws IOException { return (int) (file.length() / RECORD_SIZE); } • RECORD_SIZE is 12 bytes: 4 bytes for the account number and 8 bytes for the balance
Random Access: A Sample Program • To read the nth account in the file: public BankAccount read(int n) throws IOException { file.seek(n * RECORD_SIZE); int accountNumber = file.readInt(); double balance = file.readDouble(); return new BankAccount(accountNumber, balance); }
Random Access: A Sample Program • To write the nth account in the file: public void writeNth(int n, BankAccount account) throws IOException { file.seek(n * RECORD_SIZE); file.writeInt(account.getAccountNumber()); file.writeDouble(account.getBalance()); }
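The three methods above can be assembled into one class, roughly as sketched below. The class name BankData, its constructor, and the nested BankAccount stand-in are assumptions; the lecture's real BankAccount class is not shown in these slides.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class BankData implements AutoCloseable {
    // Minimal stand-in for the lecture's BankAccount class (assumed shape).
    static class BankAccount {
        private final int accountNumber;
        private double balance;

        BankAccount(int accountNumber, double balance) {
            this.accountNumber = accountNumber;
            this.balance = balance;
        }

        int getAccountNumber() { return accountNumber; }
        double getBalance() { return balance; }
        void deposit(double amount) { balance += amount; }
    }

    // 4 bytes for the account number plus 8 bytes for the balance.
    static final int RECORD_SIZE = 4 + 8;

    private final RandomAccessFile file;

    BankData(File f) throws IOException {
        file = new RandomAccessFile(f, "rw");
    }

    int numAccounts() throws IOException {
        return (int) (file.length() / RECORD_SIZE);
    }

    BankAccount read(int n) throws IOException {
        file.seek((long) n * RECORD_SIZE);   // jump to the nth record
        int accountNumber = file.readInt();
        double balance = file.readDouble();
        return new BankAccount(accountNumber, balance);
    }

    void write(int n, BankAccount account) throws IOException {
        file.seek((long) n * RECORD_SIZE);
        file.writeInt(account.getAccountNumber());
        file.writeDouble(account.getBalance());
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

A deposit then becomes read, modify, write on the same record index. Because every record is exactly 12 bytes, a larger balance overwrites the old one in place and never spills into the next record.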
Object Streams: Reading and Writing Objects? WTF? • Writing Objects directly to streams • ObjectOutputStream class can write entire objects to disk • ObjectInputStream class can read objects back in from disk • Objects are saved in binary format; hence, you use streams • // Let’s look at the APIs
Writing a BankAccount Object to a File • The object output stream saves all instance variables • BankAccount b = . . .; OutputStream os = new FileOutputStream("bank.dat"); ObjectOutputStream out = new ObjectOutputStream(os); out.writeObject(b);
Reading a BankAccount Object From a File • readObject returns an Object reference • Hence, we must remember the types of the objects that we saved and use a cast • InputStream is = new FileInputStream("bank.dat"); ObjectInputStream in = new ObjectInputStream(is); BankAccount b = (BankAccount) in.readObject();
Reading a BankAccount Object From a File • readObject method can throw a ClassNotFoundException • Why is this? • It is a checked exception • You must catch or declare it
Writing Complex Objects: Write and Read an ArrayList to a File • Write: ArrayList<BankAccount> bl = new ArrayList<BankAccount>(); /* now add many BankAccount objects into bl */ out.writeObject(bl); • Read: ArrayList<BankAccount> bl = (ArrayList<BankAccount>) in.readObject();
Serializable • Objects that are written to an object stream must belong to a class that implements the Serializable interface. • Serializable interface has no methods. • What is it good for then!? class BankAccount implements Serializable { . . . }
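A complete round trip, including what happens when a class is not Serializable, could look like the sketch below. It writes to an in-memory byte array instead of a file to stay self-contained; the class SerializationDemo, its save and load helpers, and the nested BankAccount stand-in are all assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

class SerializationDemo {
    // Stand-in for the lecture's BankAccount class (assumed shape).
    static class BankAccount implements Serializable {
        private static final long serialVersionUID = 1L;
        final int accountNumber;
        final double balance;

        BankAccount(int accountNumber, double balance) {
            this.accountNumber = accountNumber;
            this.balance = balance;
        }
    }

    // Serializes any object into a byte array.
    static byte[] save(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Restores an object; the caller must cast to the expected type.
    static Object load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<BankAccount> accounts = new ArrayList<>();
        accounts.add(new BankAccount(1001, 900.0));
        accounts.add(new BankAccount(1015, 0.0));

        @SuppressWarnings("unchecked")
        ArrayList<BankAccount> restored =
                (ArrayList<BankAccount>) load(save(accounts));
        System.out.println(restored.size()); // 2

        try {
            save(new Object()); // java.lang.Object is not Serializable
        } catch (NotSerializableException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The marker interface is checked at run time: writeObject refuses any object whose class has not opted in by implementing Serializable.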
Serializable • Implementing Serializable tells Java that a class can be serialized • Most issues of serialization cannot be: • Entirely identified with an interface • Detected by the compiler • Therefore Java is forced to trust us • The interface has only semantic meaning • Exceptions will be thrown if problems arise • If you want more control over serialization, implement java.io.Externalizable
Serializable • Serialization: process of saving objects to a stream • Each object is assigned a serial number on the stream • If the same object is saved twice, only serial number is written out the second time • When reading, duplicate serial numbers are restored as references to the same object
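The serial-number behavior can be observed with a small experiment: write the same object twice to one stream, read it back twice, and compare the references. The class SharedReferenceDemo and its method names are hypothetical, and an in-memory stream stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

class SharedReferenceDemo {
    // Writes the given objects to one stream and reads them back in order.
    static Object[] roundTrip(Object first, Object second)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(first);
            out.writeObject(second);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return new Object[] {in.readObject(), in.readObject()};
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> shared = new ArrayList<>();
        shared.add("account 1001");

        // Same object saved twice: restored as one object, two references.
        Object[] same = roundTrip(shared, shared);
        System.out.println(same[0] == same[1]); // true

        // Two equal but distinct objects stay distinct.
        Object[] distinct = roundTrip(shared, new ArrayList<>(shared));
        System.out.println(distinct[0] == distinct[1]); // false
    }
}
```

This is what keeps object graphs intact: if two accounts refer to the same owner object before saving, they still share a single owner after loading.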
Conclusion • Questions? • I will be in Lab now until 6:30