Learn the UNIX approach to file processing, basic file manipulation commands, and techniques for extracting information from files. Understand file types, file structures, and processing concepts, and use input/output redirection effectively. Explore creating, deleting, copying, moving, combining, and sorting files. Discover script files, the join command for linking files, and how to generate professional-looking reports with awk.
Chapter Four UNIX File Processing
Lesson A Extracting Information from Files
Objectives • Explain the UNIX approach to file processing • Use basic file manipulation commands • Extract characters and fields from a file using the cut command
Objectives • Rearrange fields inside a record using the paste command • Merge files using the sort command • Create a new file by combining cut, paste, and sort
UNIX Approach to File Processing • Based on the idea that files should be treated as nothing more than sequences of characters • Because you can directly access each character, you can perform a range of editing tasks – this offers flexibility in file manipulation
Understanding UNIX File Types • Regular files, also known as ordinary files • Contain information that you create, maintain, and manipulate, and include ASCII and binary files • Directories • System files for maintaining the file system structure • Special files • Character special files relate to serial I/O devices • Communicate one character at a time • Block special files relate to devices such as disks • Communicate using blocks of data
File Structures • Files can be structured in many ways, depending on the kind of data they store • UNIX stores data, such as letters and product records, as flat ASCII files • Three kinds of regular files are: • Unstructured ASCII characters • Unstructured ASCII records • Unstructured ASCII trees
Processing Files • When you run UNIX commands, UNIX receives input from the standard input device (e.g., the keyboard) and sends output to the standard output device (e.g., the monitor) • System administrators and programmers refer to standard input as stdin and standard output as stdout • A third standard device is called standard error, or stderr. When UNIX detects errors, it directs the error messages to stderr, which by default is the monitor
Using Input and Error Redirection • You can use redirection operators to take input from something other than the standard input device and to send output to something other than the standard output device • Examples of redirection: • Redirect the output of the ls command to a file instead of to the monitor (or screen), using > • Redirect a program that normally receives input from the keyboard so that it reads input from a file instead, using < • Redirect error messages to a file instead of to the screen, where they go by default, using 2>
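Redirection itself is a shell feature (>, <, 2>). As a hedged illustration, the Python sketch below attaches a command's three standard streams to files, much as cmd < in.txt > out.txt 2> err.txt would. The function name run_redirected is hypothetical, and running real commands such as ls assumes a UNIX-like system.

```python
import subprocess
from contextlib import ExitStack
from pathlib import Path
from typing import Optional, Sequence


def run_redirected(
    cmd: Sequence[str],
    stdin_path: Optional[Path] = None,
    stdout_path: Optional[Path] = None,
    stderr_path: Optional[Path] = None,
) -> int:
    """Run cmd with its standard streams optionally attached to files.

    Roughly the shell form:  cmd < stdin_path > stdout_path 2> stderr_path
    A stream left as None stays connected to the terminal (the default).
    Returns the command's exit status.
    """
    with ExitStack() as stack:
        # "<"  : read standard input from a file
        fin = stack.enter_context(open(stdin_path, "rb")) if stdin_path else None
        # ">"  : send standard output to a file, truncating it first
        fout = stack.enter_context(open(stdout_path, "wb")) if stdout_path else None
        # "2>" : send standard error to a file
        ferr = stack.enter_context(open(stderr_path, "wb")) if stderr_path else None
        result = subprocess.run(list(cmd), stdin=fin, stdout=fout, stderr=ferr)
        return result.returncode
```

For example, run_redirected(["ls", "-l"], stdout_path=Path("listing.txt")) corresponds to ls -l > listing.txt.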
Using Input and Error Redirection Create a file either by typing its contents in an editor or by redirecting the output of the cat command to a file
Manipulating Files • When you manipulate files, you work with the files themselves as well as their contents • Create files using output redirection • cat command - concatenates text; with output redirection (>), it writes what you type into a file • touch command - creates empty files
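The Python sketch below imitates the two ways of creating a file just described: writing text to a file as cat > file does (or cat >> file, which appends), and creating an empty file as touch does. The helper names cat_to_file and touch are hypothetical; with the real cat command, the text comes from the keyboard until you press Ctrl+D.

```python
from pathlib import Path


def cat_to_file(path: Path, text: str, append: bool = False) -> None:
    """Mimic `cat > path` (overwrite) or `cat >> path` (append).

    The text is passed in as a string instead of being typed at the keyboard.
    """
    with open(path, "a" if append else "w") as f:
        f.write(text)


def touch(path: Path) -> None:
    """Mimic `touch path`: create an empty file if it does not exist;
    otherwise update its modification time to the current time."""
    Path(path).touch(exist_ok=True)
```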
Manipulating Files • Delete files when you no longer need them • rm command - permanently removes a file (use the rmdir command to remove an empty directory) • The -r option of the rm command removes a directory and everything it contains • Copy files as a backup or to help create new files • cp command - copies the file(s) specified by the source path to the location specified by the destination path
Manipulating Files • Move a file to change the directory that contains it • mv command - removes a file from one directory and places it in another; it can also rename a file • Find a file to locate it in the directory structure • find command - searches a directory tree for files matching criteria you specify, such as a file name (-name)
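As a rough sketch, the Python standard library offers counterparts to the commands above: shutil.copy for cp, shutil.move for mv, os.remove and shutil.rmtree for rm and rm -r, os.rmdir for rmdir, and os.walk with fnmatch for find -name. The wrapper names are hypothetical, and the real commands have many more options than shown here.

```python
import fnmatch
import os
import shutil
from pathlib import Path
from typing import List


def cp(src: Path, dest: Path) -> None:
    """Roughly `cp src dest`: copy the file's contents and permission bits."""
    shutil.copy(src, dest)


def mv(src: Path, dest: Path) -> None:
    """Roughly `mv src dest`: move into dest if it is a directory, else rename."""
    shutil.move(str(src), str(dest))


def rm(path: Path, recursive: bool = False) -> None:
    """Roughly `rm path`, or `rm -r path` when recursive is True."""
    if recursive and Path(path).is_dir():
        shutil.rmtree(path)  # the directory and everything it contains
    else:
        os.remove(path)      # a single file; raises an error for a directory


def rmdir(path: Path) -> None:
    """Like `rmdir path`: raises OSError unless the directory is empty."""
    os.rmdir(path)


def find_by_name(top: Path, pattern: str) -> List[Path]:
    """Roughly `find top -name pattern` (the starting directory itself is not checked)."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            # Shell-style wildcards such as *.txt, matched case-sensitively
            if fnmatch.fnmatchcase(name, pattern):
                matches.append(Path(dirpath) / name)
    return sorted(matches)
```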
Manipulating Files • Combine files using output redirection • cat command - concatenates the text of two or more files; with output redirection, the result is saved in a new file • paste command - joins the lines of different files side by side • Extract fields from a file using output redirection • cut command - extracts (selects) specific columns or fields from a file
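To make the behavior of cat, paste, and cut concrete, the sketch below works on lists of lines rather than on files, an assumption made for brevity; the function names are hypothetical. One detail worth noticing: cut always outputs the selected fields in their original order, even when you request them as -f3,1, which is why paste is used to rearrange fields.

```python
from typing import List, Sequence


def cat_lines(*files: Sequence[str]) -> List[str]:
    """Like `cat f1 f2 ...`: all lines of each file, one file after another."""
    combined: List[str] = []
    for lines in files:
        combined.extend(lines)
    return combined


def paste_lines(*files: Sequence[str], delimiter: str = "\t") -> List[str]:
    """Like `paste f1 f2 ...`: join line i of every file with a delimiter (tab by default).

    A file that runs out of lines contributes empty strings, as with paste.
    """
    longest = max((len(lines) for lines in files), default=0)
    return [
        delimiter.join(lines[i] if i < len(lines) else "" for lines in files)
        for i in range(longest)
    ]


def cut_fields(lines: Sequence[str], fields: Sequence[int], delimiter: str = "\t") -> List[str]:
    """Like `cut -d<delimiter> -f<fields>`, with 1-based field numbers.

    Selected fields come out in their original order, whatever order they
    are requested in; lines without the delimiter pass through unchanged.
    """
    wanted = sorted(set(fields))
    result: List[str] = []
    for line in lines:
        if delimiter not in line:
            result.append(line)
            continue
        parts = line.split(delimiter)
        result.append(delimiter.join(parts[n - 1] for n in wanted if n <= len(parts)))
    return result
```

Rearranging fields therefore takes two cuts and a paste: paste_lines(cut_fields(lines, [3], ":"), cut_fields(lines, [1], ":"), delimiter=":") puts field 3 before field 1.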
Manipulating Files • Rearrange the contents of a file • sort command - sorts a file’s contents alphabetically or numerically • The sort command offers many options: • You can sort the contents of a file and redirect the output to another file • You can use a sort key to sort on a particular field position within each line
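The sketch below imitates sort, sort -t: -k2,2 (sort on field 2 only), and the -n and -r options, assuming colon-separated records; sort_lines is a hypothetical name. Like sort's default behavior in the C locale, it breaks ties by comparing whole lines. The leading-number parsing used for numeric sorting is a simplification of what sort -n actually accepts.

```python
import re
from typing import List, Sequence, Tuple, Union

Key = Union[str, float]


def _leading_number(text: str) -> float:
    """Simplified -n parsing: a leading integer or decimal, otherwise 0."""
    match = re.match(r"\s*(-?\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else 0.0


def sort_lines(
    lines: Sequence[str],
    key_field: int = 0,
    delimiter: str = ":",
    numeric: bool = False,
    reverse: bool = False,
) -> List[str]:
    """key_field=0 sorts on the whole line; N sorts on field N (1-based)."""

    def key(line: str) -> Tuple[Key, str]:
        if key_field == 0:
            text = line
        else:
            parts = line.split(delimiter)
            text = parts[key_field - 1] if key_field <= len(parts) else ""
        value: Key = _leading_number(text) if numeric else text
        # Second element: whole-line tie-breaker, like sort's last-resort comparison
        return (value, line)

    return sorted(lines, key=key, reverse=reverse)
```

Note that an alphabetic sort on a numeric field orders "100" before "42" before "7"; that is the case the -n option exists for.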
Lesson B Assembling Extracted Information
Objectives • Create a script file • Use the join command to link files using a common field • Use the awk command to create a professional-looking report
Using Script Files • UNIX users create shell script files containing commands that run sequentially as a set – this automates command processing and lets you reuse sequences of commands • UNIX users often use the vi editor to create script files, then make each script executable using the chmod command with the x (execute) permission, for example chmod u+x
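The two steps above (type the commands into a file, then make it executable) can be sketched in Python as follows. The #!/bin/sh first line is an assumption, a common convention naming the shell that runs the script, and write_script is a hypothetical helper; os.chmod with stat.S_IXUSR adds execute permission for the owner, as chmod u+x does.

```python
import os
import stat
from pathlib import Path
from typing import Sequence


def write_script(path: Path, commands: Sequence[str]) -> None:
    """Create a shell script containing commands, then make it executable.

    Step 1 stands in for typing the commands in vi; step 2 does what
    `chmod u+x path` does.
    """
    # Step 1: write the commands, one per line, after an (assumed) #!/bin/sh line
    Path(path).write_text("\n".join(["#!/bin/sh", *commands]) + "\n")
    # Step 2: add execute permission for the file's owner, keeping the other bits
    current_mode = os.stat(path).st_mode
    os.chmod(path, current_mode | stat.S_IXUSR)
```

After this, the script can be run from the shell by name, for example ./myscript.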
Using Script Files Type out the script and then make it executable using the chmod command.
Using the Join Command • The join command performs the join operation used in relational database processing • Relational databases treat files as tables and records as rows • Relational databases also treat fields as columns, and tables can be joined on a common column to create new records • The UNIX join command lets you extract information from two files sharing a common field; both files must first be sorted on that field
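A minimal sketch of what join -t: file1 file2 produces when both files are joined on their first field, assuming colon-separated records; join_lines is a hypothetical name. It uses a dictionary lookup instead of join's streaming merge, so unlike the real command it does not require sorted input. The output layout follows join's default: the join field, then the remaining fields of the first file, then those of the second.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def join_lines(left: Sequence[str], right: Sequence[str], delimiter: str = ":") -> List[str]:
    """Sketch of `join -t: file1 file2`, joining on the first field of each.

    Lines whose key has no match in the other file are dropped, as join
    does when the -a option is not given.
    """
    # Index the second file's records by their join field
    by_key: Dict[str, List[List[str]]] = defaultdict(list)
    for line in right:
        key, *rest = line.split(delimiter)
        by_key[key].append(rest)
    # For each first-file record, emit one output line per matching record
    joined: List[str] = []
    for line in left:
        key, *rest = line.split(delimiter)
        for other in by_key.get(key, []):
            joined.append(delimiter.join([key, *rest, *other]))
    return joined
```

For example, with hypothetical vendor records ["1:Acme Supply"] and product records ["1:Widget"], join_lines returns ["1:Acme Supply:Widget"].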
Using the Join Command to Create the Vendor Report Use the join command to create reports showing the relationship between two files
A Brief Introduction to the Awk Program • Awk, a pattern-scanning and processing language, helps you produce professional-looking reports • With awk, you can produce the same kind of report you would otherwise build by using the cat command together with the join command, but more quickly and easily
A Brief Introduction to the Awk Program Awk's printf statement, modeled on the printf function of the C programming language, formats output into aligned columns for a more professional-looking report
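Python's % operator uses the same printf-style conversions as C and awk, so the sketch below can mirror an awk command such as awk -F: '{ printf "%-20s %8.2f\n", $1, $2 }' directly. The two-field name:price record layout and the function name format_rows are assumptions for illustration.

```python
from typing import List, Sequence


def format_rows(lines: Sequence[str], delimiter: str = ":") -> List[str]:
    r"""Rough Python equivalent of:

        awk -F: '{ printf "%-20s %8.2f\n", $1, $2 }' file
    """
    rows: List[str] = []
    for line in lines:
        fields = line.split(delimiter)
        # %-20s: left-justify field 1 in 20 columns
        # %8.2f: right-justify field 2 in 8 columns with 2 decimal places
        rows.append("%-20s %8.2f" % (fields[0], float(fields[1])))
    return rows
```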
Using the awk Command to Refine the Vendor Report • To refine and automate the vendor report, create a shell script that includes only the awk command, not a series of separate commands. To have awk perform the automation properly, redirect its input to come from a disk file and not from the keyboard.
Using the awk Command to Refine the Vendor Report Awk has many features that let you format your report output to your specifications
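Awk's BEGIN and END blocks are among the features that shape a report: BEGIN runs once before the first record (a good place for headings), and END runs once after the last record (a good place for totals). The Python sketch below follows that three-part structure for the awk program shown in its docstring; the name:price layout, the column widths, and the function name vendor_report are assumptions.

```python
from typing import List, Sequence


def vendor_report(lines: Sequence[str], delimiter: str = ":") -> List[str]:
    r"""Python sketch of this awk program:

        awk -F: '
          BEGIN { printf "%-20s %8s\n", "Product", "Price" }
                { printf "%-20s %8.2f\n", $1, $2; total += $2 }
          END   { printf "%-20s %8.2f\n", "Total", total }
        ' products.txt
    """
    # BEGIN: runs once, before any input is read
    report = ["%-20s %8s" % ("Product", "Price")]
    total = 0.0
    # Main action: runs once for every input record
    for line in lines:
        fields = line.split(delimiter)
        price = float(fields[1])
        report.append("%-20s %8.2f" % (fields[0], price))
        total += price
    # END: runs once, after the last record
    report.append("%-20s %8.2f" % ("Total", total))
    return report
```

In the shell version, the awk program would live in the report script and read products.txt as its input file rather than the keyboard.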
Chapter Summary • UNIX supports regular files, directories, and character and block special files • A file's structure depends on the data being stored; the three kinds of regular files are unstructured ASCII characters, records, and trees • When running, UNIX receives input from the standard input device (keyboard), also known as stdin, and sends output to the standard output device (monitor), also known as stdout. Another standard device, stderr, refers to the error file that defaults to the monitor
Chapter Summary • The touch command updates a file’s time and date stamps and creates empty files • The rmdir command removes empty directories • The cut command extracts specific columns or fields from a file • To combine two or more files side by side, use the paste command • Use the sort command to sort a file’s contents alphabetically or numerically
Chapter Summary • To automate command processing, include commands in a script file that you can later execute as a program • Use the join command to extract data from two files sharing a common field and use this field to join the two files • Awk is a pattern-scanning and processing language useful for creating a formatted report with a professional look