CIS52 – File Manipulation File Manipulation Utilities Regular Expressions sed, awk
Overview • comm – comparison of sorted files • cut – output sections of lines in a file • find – find files that match a pattern • paste – merges records in files • pr – paginate files into pages • tr – translate or delete characters
Overview • regular expressions • sed – Stream Editor (batch file editor) • awk – Aho, Weinberger, Kernighan (pattern matching)
The comm before the storm • Compares 2 sorted files • Results reported in 3 columns • 1st – records found only in file 1 • 2nd – records found only in file 2 • 3rd – records that match in both files • Options suppress the corresponding columns • -1, -2, -3 (can be combined, e.g. -12)
comm – cont. • Either file name can be replaced with standard input by giving - as the name • Example:
File1   File2
aa      bb
dd      cc
ee      dd
gg      ee
hh      ff
comm results
File1   File2   Both
aa
        bb
        cc
                dd
                ee
        ff
gg
hh
option -1: bb cc dd ee ff
option -2: aa dd ee gg hh
option -12: dd ee
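The column suppression above can be tried with a small sketch. It assumes a POSIX shell with mktemp available; the scratch directory and variable names are illustrative only.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
printf '%s\n' aa dd ee gg hh > "$workdir/File1"
printf '%s\n' bb cc dd ee ff > "$workdir/File2"

# -12 suppresses columns 1 and 2, leaving only records found in both files.
both=$(comm -12 "$workdir/File1" "$workdir/File2")

# -23 suppresses columns 2 and 3, leaving records found only in File1.
only1=$(comm -23 "$workdir/File1" "$workdir/File2")

printf 'both:\n%s\nonly in File1:\n%s\n' "$both" "$only1"
rm -r "$workdir"
```

Both inputs must already be sorted; comm does not sort them.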
cut to the chase • Allows you to extract portions of each record in a file. • Delimits data in the file into fields or columns. • Default delimiter is the tab character • Can be changed by the -d option
cut cont. • cut -b list | -c list | -f list [-d char] [-s] [--output-delimiter=string] • b – bytes • c – characters (same as bytes) • f – fields • d – delimiter character • s – display only records that contain the delimiter
cut ! print • char – single byte used to delimit fields in a record • list – list of ranges of bytes, characters, or fields to display • Ranges are comma separated. • 1-7 first 7 characters in record • 1,7 first and seventh characters
cut ! print again • string – list of characters to substitute for the delimiters.
cut - Example
[/@linux2 uid]$ cat file1
The quick brown fox eyed the jactitating dog
[/@linux2 uid]$ cut -f1,3,5,8 -d' ' file1
The brown eyed dog
[/@linux2 uid]$ cut -f1,4-6,8 -d' ' file1
The fox eyed the dog
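A minimal sketch of the -f, -c, and -s options, feeding records through a pipe instead of a file. The variable names are illustrative.

```shell
record='The quick brown fox eyed the jactitating dog'

# Fields 1, 3 and 5 of a space-delimited record.
picked=$(printf '%s\n' "$record" | cut -d' ' -f1,3,5)

# Characters 1 through 9 of the same record.
chars=$(printf '%s\n' "$record" | cut -c1-9)

# -s drops records that contain no delimiter at all.
filtered=$(printf 'a:b\nnodelimiter\nc:d\n' | cut -d: -f2 -s)

printf '%s\n' "$picked" "$chars" "$filtered"
```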
find that pot of gold • find – selects all files that meet the selection criteria in the expression • No action is taken unless it is specified • Sub-directories are scanned automatically • The expression can be simple or complex
find me something • The criteria expression: • ANDs operands separated by a space • ORs operands separated by -o • Processes left to right sequentially
find criteria continued • Actions • -print prints the path of all files that meet the selection criteria • -exec cmd \; executes the command that precedes the \; • -ok same as -exec but must have a y from stdin before each command runs.
find criteria continued again • Evaluations • -type specify a type of file (e.g. d for directory) • -atime ±n accessed ±n days ago • -mtime ±n modified ±n days ago • -user uid owner of the file • -nouser the file's owner uid is not known to the system
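The AND, OR, and -exec behavior can be sketched as follows. The directory layout is made up for the illustration, and -name (a standard find test not listed on the slides) is used to match file names.

```shell
# Build a small throwaway tree to search.
top=$(mktemp -d)
mkdir "$top/sub"
touch "$top/a.txt" "$top/sub/b.txt" "$top/sub/c.log"

# Two tests separated by a space are ANDed: regular files named *.txt.
txt_files=$(find "$top" -type f -name '*.txt' -print | sort)

# -o ORs two tests; the escaped parentheses group them before -print applies.
dirs_or_logs=$(find "$top" \( -type d -o -name '*.log' \) -print | sort)

# -exec runs a command per match; {} stands for the path and \; ends the command.
log_count=$(find "$top" -name '*.log' -exec echo found {} \; | wc -l | tr -d ' ')

printf '%s\n' "$txt_files" "$dirs_or_logs" "$log_count"
rm -r "$top"
```

Sub-directories are scanned without any extra option, which is why b.txt and c.log under sub are found.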
paste tastes good • paste [options] [filelist] corresponding records from each file are merged into 1 record • -s process filelist sequentially. All records of one file are merged before going to the next file • -d [delimiter list] each character in turn delimits the merged records.
paste continued
[/@linux2 uid]$ cat file1
A
B
C
[/@linux2 uid]$ cat file2
1
2
3
[/@linux2 uid]$ cat file3
x
y
z
paste continued
[/@linux2 uid]$ paste file1 file2 file3
A 1 x
B 2 y
C 3 z
[/@linux2 uid]$ paste -s file1 file2 file3
A B C
1 2 3
x y z
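A short sketch of the -d delimiter list, reusing the three sample files. The scratch directory is an assumption of the sketch.

```shell
work=$(mktemp -d)
printf '%s\n' A B C > "$work/file1"
printf '%s\n' 1 2 3 > "$work/file2"
printf '%s\n' x y z > "$work/file3"

# Each character of the -d list is used in turn:
# ',' joins file1 to file2 and ':' joins file2 to file3.
merged=$(paste -d ',:' "$work/file1" "$work/file2" "$work/file3")

# -s produces one output line per input file.
serial=$(paste -s -d ',' "$work/file1" "$work/file2")

printf '%s\n' "$merged" "$serial"
rm -r "$work"
```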
pr – public relations--NOT • pr paginate file(s) for printing • Can specify page attributes • Lines per page changed through the -l option • For multiple files each starts a new page
pr – continued • pr paginate a file for printing • Creates a header and trailer • Changed through the -h option • Suppress through the -t option • Can create columns of data • -nbr number of columns per line • -Sx character used to separate columns
pr – continued • Can create numbers for each line • -nck • c – character that separates the number from the data; default is the tab character • k – number of digits
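A hedged sketch of the -t, -n, -l, and -h options. The header text 'Demo listing' and the page length of 20 are arbitrary choices for the illustration, and exact spacing may vary between pr implementations.

```shell
# -t suppresses the header and trailer; -n numbers each line
# (the number is followed by a tab unless another separator is given).
numbered=$(printf '%s\n' alpha beta gamma | pr -t -n)
printf '%s\n' "$numbered"

# Without -t a header is printed; -h replaces the file name in it
# and -l sets the page length in lines.
page=$(printf '%s\n' alpha beta gamma | pr -l 20 -h 'Demo listing')
printf '%s\n' "$page"
```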
Regular Expressions • A set of characters that define the criteria used to identify a string within a record. • Used by vi, grep, sed, awk, and others.
tr – Translate this • tr [-c] [-d] [-s] [-t] set1 [set2] translate from set1 to set2 • c – complement of set1 • d – delete characters found in set1 • s – squeeze out duplicates • t – truncate set1 to length of set2
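The translate, delete, squeeze, and complement options can be sketched on short strings. The sample text is made up.

```shell
# Translate set1 to set2: lower case to upper case.
upper=$(printf 'hello world\n' | tr 'a-z' 'A-Z')

# -d deletes every character found in set1.
no_vowels=$(printf 'hello world\n' | tr -d 'aeiou')

# -s squeezes runs of a repeated character down to one.
squeezed=$(printf 'a   b    c\n' | tr -s ' ')

# -c takes the complement of set1: delete everything except digits and newline.
digits_only=$(printf 'tel: 555-0100\n' | tr -cd '0-9\n')

printf '%s\n' "$upper" "$no_vowels" "$squeezed" "$digits_only"
```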
Regular Expressions • Simple strings • Bound by / … / • Interpreted literally • e.g. /e D/ – matches exactly e D • Taste Dee – OK • Taste don’t – not OK
Regular Expressions • The . (period) special single-character substitute • Matches any single character • e.g. /.eny/ matches Aeny Beny Ceny • [char-range] defines a character class • [^char-range] defines the not-in-character class
Regular Expressions • The * (asterisk) • Matches 0 or more of the preceding character. • What’s this? • /.*/ • /[a-zA-Z]*/ • /([^)]*)/
Regular Expressions • The ^ (for the rabbit) character, as in /^…/ • In the beginning … matches at the start of the record • The $ (for the teacher) character, as in /…$/ • At the end … matches at the end of the record
Regular Expressions • Quote the raven – backslash • \. This yields . • \\ This yields \ • \* This yields * • \[ This yields [ • \] This yields ] • \/ This yields /
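The wildcard, anchor, and backslash rules can be checked with grep, which takes the expression without the surrounding slashes. The sample records are made up, and grep -c simply counts matching records.

```shell
sample='Aeny
Beny
eny
a.b
axb
(note) text'

# . matches any single character, so "eny" alone does not match.
dot=$(printf '%s\n' "$sample" | grep -c '.eny')

# An unquoted . also matches the x in axb; \. matches only a literal period.
any_char=$(printf '%s\n' "$sample" | grep -c 'a.b')
literal_dot=$(printf '%s\n' "$sample" | grep -c 'a\.b')

# ^ and $ anchor the match to the beginning and end of the record.
anchored=$(printf '%s\n' "$sample" | grep -c '^eny$')

# ( followed by zero or more non-) characters, then ).
parens=$(printf '%s\n' "$sample" | grep -c '([^)]*)')

printf '%s\n' "$dot" "$any_char" "$literal_dot" "$anchored" "$parens"
```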
sed – the old Stream EDitor • sed [-n] [-f script] [file-list] • Copies and edits to standard output • Edits file(s) in a non-interactive mode • Gets its instructions from a script file • -f filename – the file contains sed instructions • Without -f, the 1st command argument is used as the script • -n suppress stdout unless specified
sed – the old mill stream • Record processing • 1. Read a record from the file list • 2. Read an instruction from the script (or cmd line) • 3. Apply the selection criteria • 4. If selected, perform the instruction • 5. Repeat 2 to 4 until no more script • 6. Repeat 1 to 5 until no more file list.
He sed what!!?? • Instruction format [addr1 [,addr2]] inst [arg-list] • Address • A line number • Regular expression • Addr1 – start • Addr2 – stop
Address line numbers • $ Designates the last line of the last file • 1st address line number • Starts selecting records based on their position in the input file list relative to 1. • 2nd address line number • Stops selecting records when the position in the input file list is greater than the line number.
He sed some more • Instructions • ! – Not; negates the address selection and is written after the address • sed '/line/!p' file.list • {…} – Groups the instructions for the address selection
sed Instructions • p – Print now and continue • d – Delete and get the next record • q – Quit processing; Stop; Go Away
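A minimal sketch of addresses combined with the p, d, and q instructions, reading records from a pipe. The five sample records are made up.

```shell
input=$(printf '%s\n' one two three four five)

# -n suppresses the default output, so only lines 2 through 4 are printed by p.
range=$(printf '%s\n' "$input" | sed -n '2,4p')

# d deletes every record selected by the regular-expression address.
no_t=$(printf '%s\n' "$input" | sed '/^t/d')

# ! after the address negates it: print records that do NOT start with f.
negated=$(printf '%s\n' "$input" | sed -n '/^f/!p')

# q stops processing once the addressed record has been written.
head2=$(printf '%s\n' "$input" | sed '2q')

# $ addresses the last line of the input.
last=$(printf '%s\n' "$input" | sed -n '$p')

printf '%s\n' "$range" "$no_t" "$negated" "$head2" "$last"
```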
sed Instructions • c – Change • [addr1 [,addr2]] c\ yada yada yada – all selected records are replaced as a group by the change value • a – Append • [addr1] a\ … – adds the text after the selected records
sed Instructions • i – Insert • [addr1] i\ … – adds the text before the selected records • n – Next • [addr1] n – writes the current record, gets the next and continues the script
sed Instructions • w – Write • [addr1 [,addr2]] w filename – writes the selected records to a file • r – Read • [addr1] r filename – reads records from the filename and appends them to the selected record
sed Instructions • s – Substitute • [addr1 [,addr2]] s/ptrn/repl/[g][p][w f] – for each selected record, match the pattern and replace • g – Replace all non-overlapping occurrences • p – Print the record • w – write the record to the filename
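The substitute flags and the append instruction can be sketched as follows. The sample records are made up, and the a\ instruction is written in its two-line form with the text on the line after the backslash.

```shell
# Without g only the first occurrence in each record is replaced.
first=$(printf 'a-b-c\n' | sed 's/-/+/')

# g replaces all non-overlapping occurrences.
all=$(printf 'a-b-c\n' | sed 's/-/+/g')

# With -n, the p flag prints only the records where a substitution happened.
changed=$(printf 'cat\ndog\ncat\n' | sed -n 's/cat/CAT/p')

# a\ appends the following text after each selected record.
appended=$(printf 'one\ntwo\n' | sed '/one/a\
after one')

printf '%s\n' "$first" "$all" "$changed" "$appended"
```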
Hawk – Squawk – awk • The programmable utility that does everything. Aho – Weinberger – Kernighan • Provides: • Conditional execution • Looping • Handles: • Numeric & string variables • Regular expressions • C print facilities
awk • awk [-Fc] [-f program-file | 'program'] [file list] • F – field delimiter character • f – name of the awk program file • program – in-line instructions, used when -f is not given • file list – list of files to process
awk – program lines • pattern { action } (either part may be omitted) • Like sed, the pattern selects records • Record processing is the same as sed
awk – pattern • Patterns follow regular expression format. • ~ Tests for match to regular expression • !~ Tests for NO match to regular expression • , – Establishes a pattern range; all records are processed inclusively within the range • BEGIN executes before the first record is processed • END executes after the last record is processed
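A minimal sketch of the pattern forms above, using an in-line program. The name and age records are made up.

```shell
data='alice 30
bob 25
carol 41
dave 19'

# A pattern with no action prints each matching record.
matched=$(printf '%s\n' "$data" | awk '/^b/')

# ~ and !~ test one field against a regular expression.
a_or_c=$(printf '%s\n' "$data" | awk '$1 ~ /^[ac]/ { print $1 }')
others=$(printf '%s\n' "$data" | awk '$1 !~ /^[ac]/ { print $1 }')

# pattern1,pattern2 selects every record from the first match through the second.
ranged=$(printf '%s\n' "$data" | awk '/bob/,/carol/')

# BEGIN runs before the first record, END after the last.
counted=$(printf '%s\n' "$data" | awk 'BEGIN { print "start" } { n++ } END { print n " records" }')

printf '%s\n' "$matched" "$a_or_c" "$others" "$ranged" "$counted"
```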
awk – relational operators • < – less than • <= – less than or equal to • == – equal to • != – not equal to • >= – greater than or equal to • > – greater than
awk – operators • Arithmetic • + – addition • - – subtraction • * – multiplication • / – division • Assignment • = – assigns the value to the variable on the left • += – adds the value to the variable on the left
awk – boolean operators • && – and • || – or • ! – not
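The relational, arithmetic, assignment, and boolean operators can be combined in one pattern and action. This sketch assumes made-up records of the form item, quantity, price.

```shell
orders='pens 3 2
ink 1 10
pads 5 4'

# Select records with quantity >= 2 AND price < 5,
# then accumulate quantity * price with +=.
total=$(printf '%s\n' "$orders" |
  awk '$2 >= 2 && $3 < 5 { sum += $2 * $3 } END { print sum }')

# || is OR and ! is NOT: item is ink, or quantity is not greater than 1.
picked=$(printf '%s\n' "$orders" | awk '$1 == "ink" || !($2 > 1) { print $1 }')

# != selects every record whose first field is not ink.
not_ink=$(printf '%s\n' "$orders" | awk '$1 != "ink" { print $1 }')

printf '%s\n' "$total" "$picked" "$not_ink"
```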
awk – actions • # - Comment to the right on any line • Default action is print to stdout • Multiple actions can be taken • Use {…} to enclose multiple actions • Separate actions with ;
awk – actions • print variable … • Var, Var2, Var3 • Prints variables separated by the output delimiter (OFS) • Var Var2 Var3 • NO separators • "literal value" • Prints exactly everything between the " "
awk – actions • printf "cntl string", variable … • Control String • \n – new line • \t – tab • %[-][n][.d] conv char • - left justification • n number of characters • .d decimal positions
awk – actions • %[-][n][.d] conv char • conv char – conversion character • d – decimal • e – exponent • f – floating-point • o – octal • x – hexadecimal • s – string
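A short sketch of the control string pieces: left justification, width, decimal positions, and several conversion characters. The values are arbitrary, and the program runs entirely in a BEGIN block so no input is needed.

```shell
# %-6s left-justifies a string in 6 characters, %4d right-justifies an
# integer in 4, and %7.2f prints 2 decimal positions in a width of 7.
row=$(awk 'BEGIN { printf "%-6s|%4d|%7.2f\n", "ab", 42, 3.14159 }')

# x is hexadecimal, o is octal, e is exponent notation.
bases=$(awk 'BEGIN { printf "%x %o %e\n", 255, 8, 1234.5 }')

printf '%s\n' "$row" "$bases"
```

Note that awk separates the control string from the values with commas.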
awk – variables • awk provided variables • NF – total number of fields • $1…$n – each field in the current record • FS – input field separator (default space or tab ) • OFS – output field separator (default space )
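The provided variables can be sketched on colon-delimited records. The sample data is made up and only loosely shaped like a password file.

```shell
records='root:x:0
daemon:x:1'

# -F: sets FS to a colon; OFS is placed between the comma-separated print items.
# NF holds the number of fields in the current record.
joined=$(printf '%s\n' "$records" | awk -F: 'BEGIN { OFS = "-" } { print $1, $3, NF }')

# FS can also be assigned in BEGIN; items printed without commas get no separator.
packed=$(printf '%s\n' "$records" | awk 'BEGIN { FS = ":" } { print $1 $3 }')

printf '%s\n' "$joined" "$packed"
```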