CSC 140: Introduction to IT
Advanced File Processing
CIT 140: Introduction to IT

Topics
• Compressing files: compress, gzip, bzip2
• Archiving files: tar
• Sorting files: sort

Data Compression
Problem: How can we store X bytes using only Y < X bytes?
Solution: Find redundancies in the data.
• Run-length encoding
  Encode repetitions as the repeated value and a count.
  Ex: thethethe -> the3
• Dictionary encoding
  Build a dictionary of words. Encode each word with a number.
  Common words: the, an, is, this
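
The run-length idea can be sketched in a few lines of shell. This is only an illustration under stated assumptions: it counts runs of single characters (the slide's example repeats a three-letter group instead), and the function name rle_encode is hypothetical.

```shell
# Hypothetical helper: run-length encode each input line, one character at a time.
# Every run is written as the character followed by its repeat count.
rle_encode() {
    printf '%s\n' "$1" | awk '{
        out = ""
        n = length($0)
        i = 1
        while (i <= n) {
            c = substr($0, i, 1)
            count = 1
            # Extend the run while the next character is the same.
            while (i + count <= n && substr($0, i + count, 1) == c)
                count++
            out = out c count
            i += count
        }
        print out
    }'
}

rle_encode aaabcc    # prints a3b1c2
```

Input with no repeated characters gets longer under this scheme, so run-length encoding only pays off on data with long runs.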
"Ask not what your country can do for you -- ask what you can do for your country." Dictionary: 1 ask 2 not 3 what 4 your 5 country 6 can 7 do 8 for 9 you Data Compression Encoded version: “1 2 3 4 5 6 7 8 9 – 1 3 9 6 7 8 4 5.” CIT 140: Introduction to IT

Compressing Files: compress
compress [-c] [-d] [-v] file1 [file2 …]
-c  Send output to stdout.
-d  Decompress instead of compressing.
-v  Provide verbose output.

Compressing Files Old School
The compress command
compress [options] [file-list]

Uncompressing Files Old School
The uncompress command

Compressing Files: gzip
gzip [-#] [-c] [-d] [-l] [-v] file1 [file2 …]
-#  Specify compression level. Default=6.
-c  Send output to stdout.
-d  Decompress instead of compressing.
-l  List compression stats.
-v  Provide verbose output.

Compressing Files: gzip
> man bash > bash.man
> man tcsh > tcsh.man
> ls -l *man
-rw-r--r-- 1 waldenj 267350 Oct 4 19:48 bash.man
-rw-r--r-- 1 waldenj 239534 Oct 4 19:48 tcsh.man
> gzip *.man
> ls -l *gz
-rw-r--r-- 1 waldenj 71333 Oct 4 19:45 bash.man.gz
-rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
> gzip -l *gz
compressed  uncompressed  ratio  uncompressed_name
     71333        267350  73.3%  bash.man
     69759        239534  70.8%  tcsh.man
    141092        506884  72.1%  (totals)
>

Uncompressing Files: gunzip
> gunzip bash.man.gz
> ls -l *man *gz
-rw-r--r-- 1 waldenj 267350 Oct 4 19:45 bash.man
-rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
> gzip -v bash.man
bash.man: 73.3% -- replaced with bash.man.gz
> gzip -dc bash.man.gz | less
User Commands BASH(1)
NAME
bash - GNU Bourne-Again Shell
…
> ls -l *man *gz
-rw-r--r-- 1 waldenj 71333 Oct 4 19:45 bash.man.gz
-rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
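
A small round trip shows how -c and -dc work together without replacing the original file. This is a sketch under assumptions: sample.txt and its contents are made up for the demonstration, and everything happens in a scratch directory created with mktemp.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a highly redundant sample file (hypothetical): 2000 identical lines.
awk 'BEGIN { for (i = 0; i < 2000; i++) print "the quick brown fox" }' > sample.txt

# -c sends the compressed data to stdout, so sample.txt itself is kept.
gzip -c sample.txt > sample.txt.gz

# $(( ... )) strips the padding some versions of wc put around the number.
orig_size=$(( $(wc -c < sample.txt) ))
gz_size=$(( $(wc -c < sample.txt.gz) ))
echo "original: $orig_size bytes, gzip: $gz_size bytes"

# -dc decompresses to stdout; cmp -s succeeds only if every byte matches.
if gzip -dc sample.txt.gz | cmp -s - sample.txt; then
    round_trip=ok
else
    round_trip=failed
fi
echo "round trip: $round_trip"
```

Plain gzip sample.txt would instead replace the file with sample.txt.gz, as in the session above.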

Modern Compression: bzip2
bzip2 [-#] [-c] [-d] [-v] file1 [file2 …]
-#  Specify compression level. Default=9.
-c  Send output to stdout.
-d  Decompress instead of compressing.
-v  Provide verbose output.

Modern Compression: bzip2
> bzip2 -v bash.man tcsh.man
bash.man: 4.821:1, 1.659 bits/byte, 79.26% saved, 267350 in, 55456 out.
tcsh.man: 4.259:1, 1.878 bits/byte, 76.52% saved, 239534 in, 56236 out.
> ls -l *bz2
-rw-r--r-- 1 waldenj 55456 Oct 4 19:45 bash.man.bz2
-rw-r--r-- 1 waldenj 56236 Oct 4 19:48 tcsh.man.bz2
> bzip2 -d bash.man.bz2
> bunzip2 tcsh.man.bz2
> ls -l *.man
-rw-r--r-- 1 waldenj 267350 Oct 4 19:45 bash.man
-rw-r--r-- 1 waldenj 239534 Oct 4 19:48 tcsh.man
> bzip2 -dc bash.man.bz2 | less
User Commands BASH(1)
NAME
bash - GNU Bourne-Again Shell

Displaying Compressed Files
zcat
• Identical to compress -dc (on GNU/Linux systems, zcat is gzip -dc)
gzcat
• Identical to gzip -dc
bzcat
• Identical to bzip2 -dc

Compression Benchmarks
> ls -l patch*
-rw-r--r-- 1 waldenj 28944395 Oct 4 19:37 patch-2.6.13
-rw-r--r-- 1 waldenj 10238237 Oct 4 19:37 patch-2.6.13.Z
-rw-r--r-- 1 waldenj  5009926 Oct 4 19:37 patch-2.6.13.bz2
-rw-r--r-- 1 waldenj  6220228 Oct 4 19:37 patch-2.6.13.gz
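
A comparison like the one above can be repeated on any file. The following is a sketch under assumptions: sample.txt is generated test data rather than the kernel patch from the slide, file_size is a hypothetical helper, and any tool that is not installed is skipped. The sizes printed will differ from the slide's figures.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical helper: print a file's size in bytes as a bare number.
file_size() {
    echo $(( $(wc -c < "$1") ))
}

# Sample input (made up for this demo): 5000 numbered, repetitive lines.
awk 'BEGIN { for (i = 0; i < 5000; i++) print "line " i ": ask what you can do for your country" }' > sample.txt
echo "original: $(file_size sample.txt) bytes"

tested=0
for tool in compress gzip bzip2; do
    # Skip tools that are not installed on this system.
    if ! command -v "$tool" > /dev/null 2>&1; then
        echo "$tool: not installed, skipped"
        continue
    fi
    # All three tools accept -c to write the compressed data to stdout.
    "$tool" -c sample.txt > "sample.$tool"
    echo "$tool: $(file_size "sample.$tool") bytes"
    tested=$((tested + 1))
done
echo "tools tested: $tested"
```

Results depend heavily on the input, so it is worth testing with the kind of data you actually need to store.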

Archiving Files: tar
tar [-c] [-t] [-x] [-v] [-f file.tar] file1 [file2 …]
-c  Create a new tape archive.
-f  Write the archive to the specified file instead of writing to tape.
-t  List the archive's table of contents.
-v  Provide verbose output.
-x  eXtract archive contents.

Archiving Files: tar
> tar -cvf manpages.tar *.man
bash.man
tcsh.man
> ls -l manpages.tar
-rw-r--r-- 1 waldenj 512000 Oct 4 21:01 manpages.tar
> tar -tf manpages.tar
bash.man
tcsh.man
> tar -tvf manpages.tar
-rw-r--r-- waldenj/students 267350 2005-10-04 19:45 bash.man
-rw-r--r-- waldenj/students 239534 2005-10-04 19:48 tcsh.man
> mkdir tmp
> cd tmp
> tar -xvf ../manpages.tar
bash.man
tcsh.man
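
The same create, list, extract cycle can be checked end to end. This is a minimal sketch with made-up file names (a.txt, b.txt, demo.tar), run in a scratch directory; it confirms that the extracted copies match the originals.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small sample files (hypothetical names and contents).
printf 'alpha\nbeta\n' > a.txt
printf 'gamma\n' > b.txt

# -c creates an archive; -f names the archive file to write.
tar -cf demo.tar a.txt b.txt

# -t lists the members without extracting anything.
listing=$(tar -tf demo.tar)
echo "$listing"

# Extract into a separate directory so the originals are not overwritten.
mkdir tmp
(cd tmp && tar -xf ../demo.tar)

# cmp -s succeeds only if the two files are byte-for-byte identical.
if cmp -s a.txt tmp/a.txt && cmp -s b.txt tmp/b.txt; then
    extract_check=ok
else
    extract_check=failed
fi
echo "extract check: $extract_check"
```

Note that tar by itself only bundles files. It does not compress them, which is why the archive on the slide is slightly larger than the two man pages combined.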

Other File Compression Tools
PKzip/WinZip   zip, unzip
ARJ            arj, unarj
RAR            rar, unrar

Sorting
Ordering a set of items by some criterion.
Systems in which sorting is used include:
• Words in a dictionary.
• Names of people in a telephone directory.
• Numbers.

Sorting: sort
sort [-d] [-f] [-i] [-n] [-r] [-u] file1 [file2 …]
-d  Sort in dictionary order (compare only blanks and alphanumeric characters).
-f  Ignore case of letters.
-i  Ignore non-printable characters.
-n  Sort in numerical order.
-r  Reverse order of sort.
-u  Do not list duplicate lines in output.
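
The -f and -u options are easiest to understand side by side. This sketch uses a made-up file, fruit.txt, and sets LC_ALL=C for each command so that the ordering is by raw byte value on every system (an assumption made here for repeatability; your default locale may order mixed-case text differently).

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data (hypothetical): mixed case, with case-variant duplicates.
printf '%s\n' banana Apple cherry apple Banana > fruit.txt

# Byte order: all uppercase letters sort before all lowercase letters.
plain=$(LC_ALL=C sort fruit.txt)

# -f ignores case, so Apple/apple and Banana/banana end up next to each other.
folded=$(LC_ALL=C sort -f fruit.txt)

# -f with -u treats Apple and apple as duplicates and keeps one line of each.
unique_count=$(( $(LC_ALL=C sort -f -u fruit.txt | wc -l) ))

echo "plain:";  echo "$plain"
echo "folded:"; echo "$folded"
echo "unique lines when ignoring case: $unique_count"
```

Which spelling survives -f -u (Apple or apple) can depend on the sort implementation, so scripts should not rely on it.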

sort Example
> cat days.txt
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
> sort days.txt
Friday
Monday
Saturday
Sunday
Thursday
Tuesday
Wednesday

sort Example
> cat days.txt
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
> sort -r days.txt
Wednesday
Tuesday
Thursday
Sunday
Saturday
Monday
Friday

sort Example
> cat numbers.txt
101
5571
58
2001
9
> sort numbers.txt
101
2001
5571
58
9
> sort -n numbers.txt
9
58
101
2001
5571