Learn advanced file processing methods, compress and archive files using gzip, bzip2, and tar. Discover sorting and data compression strategies to reduce storage needs efficiently. Explore modern compression with bzip2, and benchmark compression ratios for optimal efficiency.
Advanced File Processing
CIT 140: Introduction to IT
Topics
• Compressing files: compress, gzip, bzip2
• Archiving files: tar
• Sorting files: sort
Data Compression
Problem: How can we store X bytes using only Y < X bytes?
Solution: Find redundancies in the data.
• Run-length encoding: Encode repetitions as the repeated value and a count. Ex: thethethe -> the3
• Dictionary encoding: Build a dictionary of words and encode each word with a number. Common words: the, an, is, this
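The run-length idea above can be sketched with standard shell tools. This is only an illustration of the concept, not how compress or gzip work internally: it encodes runs of single characters rather than repeated words like "the", and it assumes the input contains no whitespace or digits. The variable names are made up for the example.

```shell
# Run-length encode a string of letters: each run becomes <char><count>.
# Assumption: the input has no spaces or digits (illustration only).
input='aaabbbbcc'

encoded=$(printf '%s\n' "$input" |
  fold -w 1 |    # split into one character per line
  uniq -c |      # collapse adjacent repeats and prefix each with its count
  awk '{ printf "%s%s", $2, $1 } END { print "" }')

echo "$encoded"  # prints a3b4c2
```

Nine input characters shrink to six here; input without repeated runs would grow instead, which is why real compressors combine several techniques.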
"Ask not what your country can do for you -- ask what you can do for your country." Dictionary: 1 ask 2 not 3 what 4 your 5 country 6 can 7 do 8 for 9 you Data Compression Encoded version: “1 2 3 4 5 6 7 8 9 – 1 3 9 6 7 8 4 5.” CIT 140: Introduction to IT
Compressing Files: compress
compress [-c] [-d] [-v] file1 [file2, …]
  -c  Send output to stdout.
  -d  Decompress instead of compressing.
  -v  Provide verbose output.
Compressing Files Old School
The compress command: compress [options] [file-list]
Uncompressing Files Old School
The uncompress command
Compressing Files: gzip
gzip [-#] [-c] [-d] [-l] [-v] file1 [file2, …]
  -#  Specify compression level. Default=6.
  -c  Send output to stdout.
  -d  Decompress instead of compressing.
  -l  List compression stats.
  -v  Provide verbose output.
Compressing Files: gzip
> man bash > bash.man
> man tcsh > tcsh.man
> ls -l *man
-rw-r--r-- 1 waldenj 267350 Oct 4 19:48 bash.man
-rw-r--r-- 1 waldenj 239534 Oct 4 19:48 tcsh.man
> gzip *.man
> ls -l *gz
-rw-r--r-- 1 waldenj 71333 Oct 4 19:45 bash.man.gz
-rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
> gzip -l *gz
compressed  uncompressed  ratio  uncompressed_name
     71333        267350  73.3%  bash.man
     69759        239534  70.8%  tcsh.man
    141092        506884  72.1%  (totals)
Uncompressing Files: gunzip
> gunzip bash.man.gz
> ls -l *man *gz
-rw-r--r-- 1 waldenj 267350 Oct 4 19:45 bash.man
-rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
> gzip -v bash.man
bash.man: 73.3% -- replaced with bash.man.gz
> gzip -dc bash.man.gz | less
User Commands BASH(1)
NAME
bash - GNU Bourne-Again Shell
…
> ls -l *man *gz
-rw-r--r-- 1 waldenj 71333 Oct 4 19:45 bash.man.gz
-rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
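The gzip and gunzip sessions above replace the original file each time. A minimal sketch of a round trip that keeps the original uses -c to write to stdout and then checks that decompression gives back identical bytes. The file name sample.txt and the scratch directory are assumptions made for this example.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a compressible sample file (hypothetical name).
seq 1 20000 > sample.txt

# -c sends the compressed data to stdout, so sample.txt is left in place.
gzip -c sample.txt > sample.txt.gz

# -l lists compressed size, uncompressed size, and ratio.
gzip -l sample.txt.gz

# -dc decompresses to stdout; cmp stays silent when the bytes match.
gzip -dc sample.txt.gz | cmp - sample.txt && echo "round trip OK"
```

Redirecting the output of -c is a common way to keep both the original and the compressed copy, at the cost of choosing the output name yourself.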
Modern Compression: bzip2
bzip2 [-#] [-c] [-d] [-v] file1 [file2, …]
  -#  Specify compression level. Default=9.
  -c  Send output to stdout.
  -d  Decompress instead of compressing.
  -v  Provide verbose output.
Modern Compression: bzip2
> bzip2 -v bash.man tcsh.man
bash.man: 4.821:1, 1.659 bits/byte, 79.26% saved, 267350 in, 55456 out.
tcsh.man: 4.259:1, 1.878 bits/byte, 76.52% saved, 239534 in, 56236 out.
> ls -l *bz2
-rw-r--r-- 1 waldenj 55456 Oct 4 19:45 bash.man.bz2
-rw-r--r-- 1 waldenj 56236 Oct 4 19:48 tcsh.man.bz2
> bzip2 -d bash.man.bz2
> bunzip2 tcsh.man.bz2
> ls -l *.man
-rw-r--r-- 1 waldenj 267350 Oct 4 19:45 bash.man
-rw-r--r-- 1 waldenj 239534 Oct 4 19:48 tcsh.man
> bzip2 -dc bash.man.bz2 | less
User Commands BASH(1)
NAME
bash - GNU Bourne-Again Shell
Displaying Compressed Files
• zcat: Identical to compress -dc
• gzcat: Identical to gzip -dc
• bzcat: Identical to bzip2 -dc
Compression Benchmarks
> ls -l patch*
-rw-r--r-- 1 waldenj 28944395 Oct 4 19:37 patch-2.6.13
-rw-r--r-- 1 waldenj 10238237 Oct 4 19:37 patch-2.6.13.Z
-rw-r--r-- 1 waldenj 5009926 Oct 4 19:37 patch-2.6.13.bz2
-rw-r--r-- 1 waldenj 6220228 Oct 4 19:37 patch-2.6.13.gz
On this file bzip2 produces the smallest output (17.3% of the original size), followed by gzip (21.5%) and compress (35.4%).
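A similar comparison can be run on any file. The sketch below is an assumption-laden example rather than a rigorous benchmark: it uses a generated sample file with a made-up name, compares only gzip and bzip2 (compress is often not installed on current systems), and reports savings with integer arithmetic. Results vary a great deal with the input data.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input (hypothetical); substitute a real file to benchmark it.
seq 1 20000 > sample.txt

# Compress the same input with both tools, keeping the original via -c.
gzip  -9 -c sample.txt > sample.txt.gz
bzip2 -9 -c sample.txt > sample.txt.bz2

orig_size=$(wc -c < sample.txt)
for f in sample.txt.gz sample.txt.bz2; do
  size=$(wc -c < "$f")
  # Percentage saved relative to the original, rounded by integer division.
  echo "$f: $(( size )) bytes, $(( 100 - 100 * size / orig_size ))% saved"
done
```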
Archiving Files: tar
tar [-c] [-t] [-x] [-v] [-f file.tar] file1 [file2, …]
  -c  Create a new tape archive.
  -f  Write the archive to the specified file instead of writing to tape.
  -t  List the archive's table of contents.
  -v  Provide verbose output.
  -x  eXtract archive contents.
Archiving Files: tar
> tar -cvf manpages.tar *.man
bash.man
tcsh.man
> ls -l manpages.tar
-rw-r--r-- 1 waldenj 512000 Oct 4 21:01 manpages.tar
> tar -tf manpages.tar
bash.man
tcsh.man
> tar -tvf manpages.tar
-rw-r--r-- waldenj/students 267350 2005-10-04 19:45 bash.man
-rw-r--r-- waldenj/students 239534 2005-10-04 19:48 tcsh.man
> mkdir tmp
> cd tmp
> tar -xvf ../manpages.tar
bash.man
tcsh.man
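Note that manpages.tar (512000 bytes) is slightly larger than the two files it holds: tar bundles files but does not compress them. A minimal sketch of combining tar with gzip through a pipe follows; the directory and file names are made up for the example, and "-f -" is used to make tar write to stdout or read from stdin.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small directory tree to archive (hypothetical names).
mkdir docs
echo "alpha" > docs/a.txt
echo "beta"  > docs/b.txt

# -c creates an archive, -f names the archive file.
tar -cf docs.tar docs

# -t lists the archive's contents without extracting.
tar -tf docs.tar

# Extract into a separate directory, as the slide does with tmp.
mkdir restore
(cd restore && tar -xf ../docs.tar)

# "-f -" means stdout here, so the archive can be piped into gzip.
tar -cf - docs | gzip > docs.tar.gz

# Reverse the pipe to list the compressed archive.
gzip -dc docs.tar.gz | tar -tf -

# diff -r is silent when the restored tree matches the original.
diff -r docs restore/docs && echo "archive restored intact"
```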
Other File Compression Tools
• PKzip/WinZip: zip, unzip
• ARJ: arj, unarj
• RAR: rar, unrar
Sorting
Ordering a set of items by some criterion.
Systems in which sorting is used include:
• Words in a dictionary.
• Names of people in a telephone directory.
• Numbers.
Sorting: sort
sort [-d] [-f] [-i] [-n] [-r] [-u] file1 [file2, …]
  -d  Sort in dictionary order (consider only blanks and alphanumeric characters).
  -f  Ignore case of letters.
  -i  Ignore non-printable characters.
  -n  Sort in numerical order.
  -r  Reverse order of sort.
  -u  Do not list duplicate lines in output.
sort Example
> cat days.txt
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
> sort days.txt
Friday
Monday
Saturday
Sunday
Thursday
Tuesday
Wednesday
sort Example
> cat days.txt
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
> sort -r days.txt
Wednesday
Tuesday
Thursday
Sunday
Saturday
Monday
Friday
sort Example
> cat numbers.txt
101
5571
58
2001
9
> sort numbers.txt
101
2001
5571
58
9
> sort -n numbers.txt
9
58
101
2001
5571
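The options from the sort slide can also be combined. The following sketch uses made-up file contents (numbers.txt with a duplicate 58 added, and a hypothetical fruit.txt) to show -n with -r, -n with -u, and -f.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The slide's numbers plus a duplicate 58 (added for this example).
printf '%s\n' 101 5571 58 2001 9 58 > numbers.txt

# -n compares numerically, -r reverses: largest value first.
sort -nr numbers.txt

# -u drops duplicate lines, so 58 appears only once.
sort -nu numbers.txt

# Mixed-case words (hypothetical data).
printf '%s\n' banana Apple cherry apple > fruit.txt

# -f ignores case, so Apple and apple sort next to each other.
sort -f fruit.txt
```

Without -f, the position of uppercase words depends on the locale's collation rules, which is one reason sort output can differ between systems.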