
CSC 140: Introduction to IT


Presentation Transcript


  1. CSC 140: Introduction to IT
     Advanced File Processing
     CIT 140: Introduction to IT

  2. Topics
     • Compressing files: compress, gzip, bzip2
     • Archiving files: tar
     • Sorting files: sort

  3. Data Compression
     Problem: How can we store X bytes using only Y < X bytes?
     Solution: Find redundancies in the data.
     • Run-length encoding: encode repetitions as the repeated value and a count.
       Ex: thethethe -> the3
     • Dictionary encoding: build a dictionary of words and encode each word as a number.
       Common words make good dictionary entries: the, an, is, this.
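A minimal Python sketch of run-length encoding, assuming text input. The helper names `rle_encode` and `rle_decode` are hypothetical, and so is the `unit` parameter (how many characters make up one repeated value): `unit=3` reproduces the slide's thethethe -> the3 example, while classic run-length encoding uses `unit=1`.

```python
from itertools import groupby


def rle_encode(text, unit=1):
    """Collapse each run of a repeated value into a (value, count) pair.

    unit (hypothetical parameter) is how many characters form one value;
    unit=1 is classic character-level run-length encoding.
    """
    chunks = [text[i:i + unit] for i in range(0, len(text), unit)]
    # groupby groups consecutive equal chunks, so each group is one run.
    return [(chunk, len(list(run))) for chunk, run in groupby(chunks)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original text."""
    return "".join(value * count for value, count in pairs)


print(rle_encode("thethethe", unit=3))  # [('the', 3)]
print(rle_encode("aaaabbbcca"))         # [('a', 4), ('b', 3), ('c', 2), ('a', 1)]
```

Runs only pay off when values repeat back to back: for text like "abc" every count is 1, and the encoded form is larger than the input.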

  4. Data Compression
     "Ask not what your country can do for you -- ask what you can do for your country."
     Dictionary: 1 ask, 2 not, 3 what, 4 your, 5 country, 6 can, 7 do, 8 for, 9 you
     Encoded version: "1 2 3 4 5 6 7 8 9 -- 1 3 9 6 7 8 4 5."
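The dictionary encoding above can be sketched in Python. Assumptions made here, not stated on the slide: case is ignored ("Ask" and "ask" share one entry), words are numbered in order of first appearance (which happens to match the slide's dictionary), "--" is passed through unchanged, and other punctuation such as the final period is dropped. All function names are hypothetical.

```python
import re


def build_dictionary(words):
    """Number each distinct word 1, 2, 3, ... in order of first appearance."""
    dictionary = {}
    for word in words:
        if word not in dictionary:
            dictionary[word] = len(dictionary) + 1
    return dictionary


def dictionary_encode(text):
    """Return (dictionary, encoded text); case and most punctuation are ignored."""
    # Tokens are lowercase words or the "--" separator; anything else is dropped.
    tokens = re.findall(r"[a-z]+|--", text.lower())
    dictionary = build_dictionary(t for t in tokens if t != "--")
    encoded = " ".join(t if t == "--" else str(dictionary[t]) for t in tokens)
    return dictionary, encoded


def dictionary_decode(dictionary, encoded):
    """Map each number back to its word (the "--" separator is kept as-is)."""
    words_by_number = {number: word for word, number in dictionary.items()}
    return " ".join(
        tok if tok == "--" else words_by_number[int(tok)] for tok in encoded.split()
    )


quote = ("Ask not what your country can do for you -- "
         "ask what you can do for your country.")
dictionary, encoded = dictionary_encode(quote)
print(encoded)  # 1 2 3 4 5 6 7 8 9 -- 1 3 9 6 7 8 4 5
```

The dictionary itself has to be stored or transmitted along with the encoded text, so the scheme only saves space when words repeat often enough to pay for it.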

  5. Compressing Files: compress
     compress [-c] [-d] [-v] file1 [file2, …]
     -c  Send output to stdout.
     -d  Decompress instead of compressing.
     -v  Provide verbose output.
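compress is based on LZW (Lempel-Ziv-Welch), a dictionary method like the one on slide 3, except that the dictionary is built on the fly from the input instead of being listed up front. The sketch below shows only that core idea; the function names are hypothetical, and the real compress also packs the codes into variable-width fields (starting at 9 bits, up to 16 by default) and writes a .Z header, both of which are omitted here.

```python
def lzw_encode(data):
    """Encode bytes as a list of integer LZW codes (no bit packing, no header)."""
    # Start with one entry per possible byte value: codes 0..255.
    dictionary = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                      # keep extending a known string
        else:
            codes.append(dictionary[current])        # emit the longest known prefix
            dictionary[candidate] = len(dictionary)  # learn the new, longer string
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decode(codes):
    """Rebuild the same dictionary while decoding, one step behind the encoder."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            # Code the encoder created on the previous step (the "KwKwK" case).
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        output.append(entry)
        dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return b"".join(output)


print(lzw_encode(b"thethethe"))  # [116, 104, 101, 256, 258, 257]
```

Nine input bytes become six codes here; on longer, repetitive input such as man pages the learned strings grow much longer, and the savings grow with them.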

  6. Compressing Files Old School
     The compress command: compress [options] [file-list]

  7. Uncompressing Files Old School
     The uncompress command

  8. Compressing Files: gzip
     gzip [-#] [-c] [-d] [-l] [-v] file1 [file2, …]
     -#  Specify compression level, from 1 (fastest) to 9 (best). Default=6.
     -c  Send output to stdout.
     -d  Decompress instead of compressing.
     -l  List compression stats.
     -v  Provide verbose output.
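The -# option trades speed for size. The sketch below uses Python's standard gzip module, whose `compresslevel` argument takes the same 1 to 9 scale, to compare the levels on sample data. The helper name and the sample text are hypothetical. Note that `gzip.compress` defaults to level 9, not the command-line default of 6.

```python
import gzip


def compressed_sizes(data):
    """Return {level: compressed size} for gzip levels 1 (fastest) to 9 (best)."""
    sizes = {}
    for level in range(1, 10):
        compressed = gzip.compress(data, compresslevel=level)
        # Every level is lossless; only speed and output size change.
        if gzip.decompress(compressed) != data:
            raise RuntimeError(f"round trip failed at level {level}")
        sizes[level] = len(compressed)
    return sizes


sample = b"Ask not what your country can do for you -- " * 2000
for level, size in compressed_sizes(sample).items():
    print(f"-{level}: {len(sample)} -> {size} bytes")
```

On highly repetitive input like this sample the differences between levels are small; real files such as the man pages show a wider spread.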

  9. Compressing Files: gzip
     > man bash > bash.man
     > man tcsh > tcsh.man
     > ls -l *man
     -rw-r--r-- 1 waldenj 267350 Oct 4 19:48 bash.man
     -rw-r--r-- 1 waldenj 239534 Oct 4 19:48 tcsh.man
     > gzip *.man
     > ls -l *gz
     -rw-r--r-- 1 waldenj 71333 Oct 4 19:45 bash.man.gz
     -rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
     > gzip -l *gz
     compressed  uncompressed  ratio  uncompressed_name
          71333        267350  73.3%  bash.man
          69759        239534  70.8%  tcsh.man
         141092        506884  72.1%  (totals)
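A sketch of what `gzip file` and `gzip -l` do, using Python's gzip module. The uncompressed size can be listed without decompressing because the gzip format (RFC 1952) ends every file with a 4-byte little-endian ISIZE field holding the original size modulo 2^32. The helper names are hypothetical; unlike the real gzip, `gzip_file` does not preserve timestamps or permissions, and `gzip_list` only sees the last member of a multi-member file.

```python
import gzip
import os
import shutil
import struct


def gzip_file(path):
    """Compress path into path + ".gz" and remove the original, like `gzip path`."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return gz_path


def gzip_list(gz_path):
    """Return (compressed, uncompressed, ratio) in the spirit of `gzip -l`."""
    compressed = os.path.getsize(gz_path)
    with open(gz_path, "rb") as f:
        f.seek(-4, os.SEEK_END)
        # ISIZE: last 4 bytes, unsigned little-endian, uncompressed size mod 2**32.
        (uncompressed,) = struct.unpack("<I", f.read(4))
    ratio = 1 - compressed / uncompressed if uncompressed else 0.0
    return compressed, uncompressed, ratio
```

With the slide's numbers, 1 - 71333 / 267350 is about 0.733, the 73.3% listed for bash.man. The same formula gives 70.9% for tcsh.man where gzip printed 70.8%, so treat it as an approximation of gzip's own figure.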

  10. Uncompressing Files: gunzip
      > gunzip bash.man.gz
      > ls -l *man *gz
      -rw-r--r-- 1 waldenj 267350 Oct 4 19:45 bash.man
      -rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
      > gzip -v bash.man
      bash.man: 73.3% -- replaced with bash.man.gz
      > gzip -dc bash.man.gz | less
      User Commands                                    BASH(1)
      NAME
          bash - GNU Bourne-Again Shell
      …
      > ls -l *man *gz
      -rw-r--r-- 1 waldenj 71333 Oct 4 19:45 bash.man.gz
      -rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz

  11. Modern Compression: bzip2
      bzip2 [-#] [-c] [-d] [-v] file1 [file2, …]
      -#  Specify block size, from 1 (100k) to 9 (900k); larger blocks usually compress better. Default=9.
      -c  Send output to stdout.
      -d  Decompress instead of compressing.
      -v  Provide verbose output.

  12. Modern Compression: bzip2
      > bzip2 -v bash.man tcsh.man
        bash.man: 4.821:1, 1.659 bits/byte, 79.26% saved, 267350 in, 55456 out.
        tcsh.man: 4.259:1, 1.878 bits/byte, 76.52% saved, 239534 in, 56236 out.
      > ls -l *bz2
      -rw-r--r-- 1 waldenj 55456 Oct 4 19:45 bash.man.bz2
      -rw-r--r-- 1 waldenj 56236 Oct 4 19:48 tcsh.man.bz2
      > bzip2 -d bash.man.bz2
      > bunzip2 tcsh.man.bz2
      > ls -l *.man
      -rw-r--r-- 1 waldenj 267350 Oct 4 19:45 bash.man
      -rw-r--r-- 1 waldenj 239534 Oct 4 19:48 tcsh.man
      > bzip2 -dc bash.man.bz2 | less
      User Commands                                    BASH(1)
      NAME
          bash - GNU Bourne-Again Shell
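Every figure that bzip2 -v prints follows from the input and output sizes. A short Python sketch (the function name is hypothetical) that recomputes them from the numbers shown above; only the values are reproduced, and the spacing may differ from bzip2's own output.

```python
def bzip2_stats(bytes_in, bytes_out):
    """Recompute the ratio, bits/byte and % saved figures of `bzip2 -v`."""
    ratio = bytes_in / bytes_out              # input bytes per output byte
    bits_per_byte = 8 * bytes_out / bytes_in  # output bits spent per input byte
    saved = 100 * (1 - bytes_out / bytes_in)  # share of the input size removed
    return f"{ratio:.3f}:1, {bits_per_byte:.3f} bits/byte, {saved:.2f}% saved"


print(bzip2_stats(267350, 55456))  # 4.821:1, 1.659 bits/byte, 79.26% saved
print(bzip2_stats(239534, 56236))  # 4.259:1, 1.878 bits/byte, 76.52% saved
```

The three numbers are different views of one quantity: 8 bits divided by the ratio gives bits/byte, and 1 minus 1/ratio gives the fraction saved.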

  13. Displaying Compressed Files
      • zcat: identical to compress -dc
      • gzcat: identical to gzip -dc
      • bzcat: identical to bzip2 -dc
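Each of these commands writes the decompressed data to stdout and leaves the compressed file alone. A Python sketch of one combined viewer that picks the decompressor from the file's leading "magic" bytes: 1f 8b for gzip, "BZh" for bzip2, and 1f 9d for compress. Dispatching on magic bytes is a design choice of this sketch, not how the separate commands are organized, and the function names are hypothetical. Python's standard library has no decoder for compress's .Z format, so that case is only detected.

```python
import bz2
import gzip
import sys

GZIP_MAGIC = b"\x1f\x8b"      # gzip (.gz)
BZIP2_MAGIC = b"BZh"          # bzip2 (.bz2)
COMPRESS_MAGIC = b"\x1f\x9d"  # compress (.Z)


def decompress_any(data):
    """Choose a decompressor from the leading bytes of data."""
    if data.startswith(GZIP_MAGIC):
        return gzip.decompress(data)
    if data.startswith(BZIP2_MAGIC):
        return bz2.decompress(data)
    if data.startswith(COMPRESS_MAGIC):
        # No standard-library decoder for .Z; the real zcat handles this case.
        raise NotImplementedError(".Z (compress) data is not supported in this sketch")
    raise ValueError("unrecognized compression format")


def cat_compressed(path):
    """Write the decompressed contents of path to stdout; path itself is unchanged."""
    with open(path, "rb") as f:
        sys.stdout.buffer.write(decompress_any(f.read()))
```

This reads the whole file into memory; for large files, `gzip.open` and `bz2.open` can stream the data in chunks instead.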

  14. Compression Benchmarks
      > ls -l patch*
      -rw-r--r-- 1 waldenj 28944395 Oct 4 19:37 patch-2.6.13
      -rw-r--r-- 1 waldenj 10238237 Oct 4 19:37 patch-2.6.13.Z
      -rw-r--r-- 1 waldenj  5009926 Oct 4 19:37 patch-2.6.13.bz2
      -rw-r--r-- 1 waldenj  6220228 Oct 4 19:37 patch-2.6.13.gz
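The benchmark listing is easier to compare as percentages of the original size. A quick Python calculation using only the sizes shown above (the names are illustrative):

```python
# Sizes in bytes, copied from the ls -l listing of patch-2.6.13.
ORIGINAL = 28944395
COMPRESSED = {
    "compress (.Z)": 10238237,
    "gzip (.gz)": 6220228,
    "bzip2 (.bz2)": 5009926,
}


def percent_of_original(size, original=ORIGINAL):
    """Compressed size as a percentage of the uncompressed size."""
    return 100 * size / original


# Smallest output first.
for tool, size in sorted(COMPRESSED.items(), key=lambda item: item[1]):
    print(f"{tool:15} {percent_of_original(size):5.1f}% of original")
```

This works out to about 17.3% for bzip2, 21.5% for gzip and 35.4% for compress, the same ranking the bash and tcsh man pages showed.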

  15. Archiving Files: tar
      tar [-c] [-t] [-x] [-v] [-f file.tar] file1 [file2, …]
      -c  Create a new tape archive.
      -f  Use the specified archive file instead of a tape device.
      -t  List (view) archive contents.
      -v  Provide verbose output.
      -x  eXtract archive contents.
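The three main tar operations map onto Python's standard tarfile module. A sketch with hypothetical helper names; as a simplification, members are stored under their base names only, whereas tar -c keeps the paths you give it. Like plain tar -cf, mode "w" bundles files without compressing them.

```python
import os
import tarfile


def create_archive(archive_path, files):
    """Like `tar -cf archive_path files...` (members stored by base name)."""
    with tarfile.open(archive_path, "w") as tar:
        for name in files:
            tar.add(name, arcname=os.path.basename(name))


def list_archive(archive_path):
    """Like `tar -tf archive_path`: member names in archive order."""
    with tarfile.open(archive_path, "r") as tar:
        return tar.getnames()


def extract_archive(archive_path, dest="."):
    """Like `tar -xf archive_path` run from inside dest."""
    with tarfile.open(archive_path, "r") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons can reject unsafe members (absolute paths, "..").
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
```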

  16. Archiving Files: tar
      > tar -cvf manpages.tar *.man
      bash.man
      tcsh.man
      > ls -l manpages.tar
      -rw-r--r-- 1 waldenj 512000 Oct 4 21:01 manpages.tar
      > tar -tf manpages.tar
      bash.man
      tcsh.man
      > tar -tvf manpages.tar
      -rw-r--r-- waldenj/students 267350 2005-10-04 19:45 bash.man
      -rw-r--r-- waldenj/students 239534 2005-10-04 19:48 tcsh.man
      > mkdir tmp
      > cd tmp
      > tar -xvf ../manpages.tar
      bash.man
      tcsh.man
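Why is manpages.tar 512000 bytes when the two files total only 506884? tar stores everything in 512-byte blocks: one header block per file, file data rounded up to whole blocks, and two zero blocks at the end. The archive is then padded to a whole number of records. The sketch below assumes GNU tar's default blocking factor of 20 (10240-byte records) and plain headers with no extended-name blocks; the function name is hypothetical.

```python
BLOCK = 512           # tar's unit of storage, in bytes
BLOCKING_FACTOR = 20  # assumed GNU tar default: 20 blocks per record


def tar_size(file_sizes, blocking_factor=BLOCKING_FACTOR):
    """Estimate the size of an uncompressed tar archive from its members' sizes."""
    blocks = 0
    for size in file_sizes:
        blocks += 1                    # header block for this file
        blocks += -(-size // BLOCK)    # data, rounded up to whole blocks
    blocks += 2                        # two zero blocks mark the end of the archive
    records = -(-blocks // blocking_factor)  # pad out to whole records
    return records * blocking_factor * BLOCK


print(tar_size([267350, 239534]))  # 512000
```

For the slide: bash.man needs 523 data blocks and tcsh.man 468, plus 2 headers and 2 end blocks, for 995 blocks; rounding up to 50 records of 20 blocks gives 1000 blocks, or 512000 bytes.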

  17. Other File Compression Tools
      • PKzip/WinZip: zip, unzip
      • ARJ: arj, unarj
      • RAR: rar, unrar
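Unlike tar, which only bundles files, a zip file is an archive and a compressed file at once: each member is compressed individually inside one container. A small sketch with Python's standard zipfile module; the helper names are hypothetical, and ARJ and RAR have no standard-library support, so they are not shown.

```python
import zipfile


def zip_members(zip_path, members):
    """Write a .zip archive from a {name: bytes} mapping, deflating each member."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)


def unzip_members(zip_path):
    """Read every member of a .zip archive back into a {name: bytes} mapping."""
    with zipfile.ZipFile(zip_path) as zf:
        return {name: zf.read(name) for name in zf.namelist()}
```

Because each member is compressed separately, one file can be extracted without decompressing the rest, at the cost of not exploiting redundancy shared between files.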

  18. Sorting
      Ordering a set of items by some criterion. Examples of sorted collections include:
      • Words in a dictionary.
      • Names of people in a telephone directory.
      • Numbers.

  19. Sorting: sort
      sort [-d] [-f] [-i] [-n] [-r] [-u] file1 [file2, …]
      -d  Sort in dictionary order: consider only blanks and alphanumeric characters.
      -f  Ignore case of letters.
      -i  Ignore non-printable characters.
      -n  Sort in numerical order.
      -r  Reverse the order of the sort.
      -u  Do not list duplicate lines in output.
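A rough Python sketch of how the -f, -n, -r and -u flags change a sort. The helper `sort_lines` is hypothetical and simplified: Python compares Unicode code points rather than using the locale's collation order, and with numeric=True every line must be a number (sort -n would treat a non-numeric line as zero, while this sketch raises ValueError).

```python
def sort_lines(lines, fold_case=False, numeric=False, reverse=False, unique=False):
    """Approximate `sort` with -f (fold_case), -n, -r and -u."""
    if numeric:
        key = float      # -n: compare by numeric value
    elif fold_case:
        key = str.lower  # -f: "apple" and "Apple" compare as equal
    else:
        key = str        # default: compare text character by character
    ordered = sorted(lines, key=key, reverse=reverse)  # -r flips the order
    if not unique:
        return ordered
    # -u: keep only the first line of each run of lines that compare equal.
    result = []
    for line in ordered:
        if not result or key(line) != key(result[-1]):
            result.append(line)
    return result


print(sort_lines(["10", "9", "100"]))                # ['10', '100', '9']
print(sort_lines(["10", "9", "100"], numeric=True))  # ['9', '10', '100']
```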

  20. sort Example
      > cat days.txt
      Sunday
      Monday
      Tuesday
      Wednesday
      Thursday
      Friday
      Saturday
      > sort days.txt
      Friday
      Monday
      Saturday
      Sunday
      Thursday
      Tuesday
      Wednesday

  21. sort Example
      > cat days.txt
      Sunday
      Monday
      Tuesday
      Wednesday
      Thursday
      Friday
      Saturday
      > sort -r days.txt
      Wednesday
      Tuesday
      Thursday
      Sunday
      Saturday
      Monday
      Friday

  22. sort Example
      > cat numbers.txt
      101
      5571
      58
      2001
      9
      > sort numbers.txt
      101
      2001
      5571
      58
      9
      > sort -n numbers.txt
      9
      58
      101
      2001
      5571
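The three sort examples can be reproduced with Python's built-in `sorted`. Python compares Unicode code points while sort uses the locale's collation order, but for these inputs (capitalized day names and plain digit strings) the results come out the same.

```python
days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
numbers = ["101", "5571", "58", "2001", "9"]

print(sorted(days))                # like `sort days.txt`
print(sorted(days, reverse=True))  # like `sort -r days.txt`
# Text order compares character by character: "5571" < "58" because "5" < "8".
print(sorted(numbers))             # like `sort numbers.txt`
print(sorted(numbers, key=int))    # like `sort -n numbers.txt`: compares values
```

The plain `sorted(numbers)` call shows why -n matters: without it, 9 sorts last because the character "9" comes after "1", "2" and "5".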
