Introduction to BASH, AWK, and PERL
Victor Anisimov, NCSA
FIU / SSERCA / XSEDE Workshop, Apr 4-5, 2013, Miami, FL

MOTIVATION
• Increase Productivity of Research & Development
    Scripting languages require less effort to implement small computational projects than regular programming languages
    Scripts are more portable than binary code
    Scripts are easy to maintain
Lab materials: /home/anisimov/labs.tgz on FIU cluster
Important: type "module add make" after logging in to the FIU cluster

BASH, AWK, and PERL
• BASH is a Linux shell
• AWK is a language for data post-processing
• PERL is a versatile programming language
Common feature: all three are interpreted programming languages
How to decide which one you need: project complexity dictates which language to use

Objective of the Course
As of now:
• No prerequisites are necessary
• No change in the way you think
• No need to memorize abstract concepts
At the end of the day:
• You will learn three programming languages
• You will improve your project organization skills
• You will increase your productivity

Every Project Works with Data
• Data generation by computation
• Extraction of data from text files
• Data format conversion
• Data computation
• Data analysis and reporting
• Data archival and retrieval
Scripting languages can handle this work without turning the data processing into a major programming project

Projects have Complex Processing Flows
• Input to a program depends on the result of another program
• The process includes many steps that need to be automated
• The process is not standard and has to be created
• The process needs to be optimized
Scripting languages are well suited to automating repetitive processes

Elements of a Programming Language
• Data types
• Conditional statements
• Loops
• Functions / procedures
• Input / Output
Our first guide to this virtual world is the BASH shell.

BASH Data Types
• BASH treats all variables as text strings
• Arithmetic support is limited to integers; floating-point math needs an external tool such as bc

    #!/bin/bash
    greetings="Hello ${USER}!"    # example of string
    today=`date`                  # run a program by enclosing it in grave accents
    echo "${greetings} Today is ${today}"
    N=1; let N=N+2; echo "Integer math: 1+2=${N}"
    R=0.1; R=`echo "$R+1.2" | bc -l`; echo "FP math: 0.1+1.2=${R}"

    $ chmod 755 01-hello.sh
    $ ./01-hello.sh
    Hello victor! Today is Thu Apr 4 13:37:02 EST 2013
    Integer math: 1+2=3
    FP math: 0.1+1.2=1.3
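
The same ideas can also be written with the $(...) and $((...)) forms, which many current scripts prefer over backticks and let. The following is a minimal sketch, not part of the lab materials; all variable names are illustrative.

```bash
#!/bin/bash
# Strings: variables hold text, and ${name} expands inside double quotes.
greeting="Hello"
target="world"
message="${greeting}, ${target}!"

# Command substitution: $(...) does the same job as backticks and can be nested.
upper=$(echo "${message}" | tr '[:lower:]' '[:upper:]')

# Integer arithmetic: $((...)) evaluates an expression without calling let.
count=1
count=$((count + 2))

# Integer division truncates, which is why floating point is delegated to bc.
half=$((7 / 2))

echo "${message} / ${upper} / count=${count} / half=${half}"
```

The truncated division in the last assignment illustrates why the slide pipes floating-point expressions through bc.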

BASH Conditional Statements
One more data type: built-in variables
$# - number of arguments; $0 - name of the script itself; $1, $2, … - command-line arguments

    #!/bin/bash
    # supported string comparison conditions: == !=
    # supported arithmetic conditions: -eq (==) -ne (!=) -lt (<) -le (<=) -gt (>) -ge (>=)
    if [ $# != 2 ] ; then
        echo "USAGE $0 argument1 argument2" ; exit
    fi
    if [ $1 -gt $2 ] ; then
        echo "True: $1 -gt $2"
    else
        echo "False: $1 -gt $2"
    fi

    $ ./02-conditions.sh
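
As a small illustration of the arithmetic conditions listed above, the sketch below wraps a three-way comparison in a function. The function name compare is hypothetical, and the arguments are assumed to be integers.

```bash
#!/bin/bash
# Hypothetical helper: report how the first integer relates to the second.
compare() {
    if [ "$#" -ne 2 ]; then
        echo "usage: compare integer1 integer2"
        return 1
    fi
    if [ "$1" -gt "$2" ]; then
        echo "greater"
    elif [ "$1" -eq "$2" ]; then
        echo "equal"
    else
        echo "less"
    fi
}

compare 5 3
compare 2 2
compare 1 9
```

Quoting "$1" and "$2" keeps the test well formed even when an argument is empty.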

BASH Loops (example: 03-loops.sh)
• Loop over list
    LIST="01 02 03 04 05"
    for job in ${LIST} ; do
        echo "job number ${job}"
    done
• Conditional loop
    N=1
    while [ ${N} -le 5 ] ; do
        echo ${N}
        let N=N+1
    done
• C-style loop
    LIMIT=5
    for ((a=1; a <= LIMIT ; a++)) ; do
        echo ${a}
    done
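
Loops become more useful once they accumulate a result. The sketch below, with illustrative variable names, sums a list in a for loop and builds a countdown string in a while loop.

```bash
#!/bin/bash
# Loop over a list and accumulate a running total.
total=0
for n in 1 2 3 4 5; do
    total=$((total + n))
done

# Conditional loop: count down from 3 and collect the values in a string.
countdown=""
n=3
while [ "${n}" -ge 1 ]; do
    countdown="${countdown}${n} "
    n=$((n - 1))
done

echo "total=${total} countdown=${countdown}"
```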

BASH Procedures / Functions
• Functions hold code that is used repeatedly

    #!/bin/bash
    # declaration of function
    filenameGenerator() {
        echo "$1.out"
    }
    # call the function and supply arguments
    filenameGenerator 1
    filenameGenerator 2

    $ ./04-functions.sh
    1.out
    2.out
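
A function's printed output can be captured into a variable with command substitution, which is the usual way to "return" a string in BASH. A minimal sketch follows; the function name output_name and the file-name pattern are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: build a zero-padded output file name from a job number.
output_name() {
    printf 'job-%02d.out\n' "$1"
}

# Capture what the function prints instead of letting it reach the terminal.
name=$(output_name 7)
echo "Result captured from function: ${name}"
```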

BASH Input / Output
• I/O is extremely simple in BASH

    cat file.out                send file content to standard output
    mycode.sh | mytool.sh       send output to another program
    mycode.sh > /dev/null       get rid of unwanted output
    mycode.sh &> log.out &      log all output to a file and run in the background, freeing the terminal
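
The redirections above can be tried safely in a scratch directory. In this sketch the function report is a hypothetical stand-in for mycode.sh that writes one line to standard output and one to standard error.

```bash
#!/bin/bash
# Scratch directory for the demonstration; removed when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "${tmpdir}"' EXIT

# Hypothetical stand-in for mycode.sh: one line on stdout, one on stderr.
report() {
    echo "result line"
    echo "warning line" >&2
}

# Separate files for standard output and standard error.
report > "${tmpdir}/out.txt" 2> "${tmpdir}/err.txt"

# Pipe standard output into another program while discarding standard error.
report 2> /dev/null | tr '[:lower:]' '[:upper:]' > "${tmpdir}/upper.txt"

# Both streams into one log; "> file 2>&1" is the portable spelling of "&> file".
report > "${tmpdir}/all.txt" 2>&1

cat "${tmpdir}/upper.txt"
```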

Sample BASH Project (05-project.sh)
• Perform text replacement in a text file

    #!/bin/bash
    if [ $# -ne 1 ] ; then
        echo "Usage: $0 file.coor"
    else
        # create name for output file
        outfile=`echo $1 | sed 's/\.coor$/.pdb/'`
        # replace "HETATM" by "ATOM  " in the text
        # (padded to six characters so the columns stay aligned)
        sed 's/HETATM/ATOM  /' $1 > $outfile
        # count number of processed lines
        wc -l $outfile
    fi
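
The project can be exercised end to end on a tiny made-up input. This sketch assumes a three-line sample file invented for illustration (it is not from the lab materials) and uses the ${name%suffix} parameter expansion instead of sed to derive the output name.

```bash
#!/bin/bash
# Scratch directory with a small, made-up coordinate file.
tmpdir=$(mktemp -d)
trap 'rm -rf "${tmpdir}"' EXIT
infile="${tmpdir}/sample.coor"
cat > "${infile}" <<'EOF'
HETATM    1  O   HOH     1       0.000   0.000   0.000
HETATM    2  H1  HOH     1       0.957   0.000   0.000
ATOM      3  H2  HOH     1      -0.240   0.927   0.000
EOF

# ${infile%.coor} strips the .coor suffix without starting another process.
outfile="${infile%.coor}.pdb"

# Replace the record name at the start of each line, keeping the six-character width.
sed 's/^HETATM/ATOM  /' "${infile}" > "${outfile}"

# Count processed lines and any HETATM records that remain.
nlines=$(wc -l < "${outfile}")
nhet=$(grep -c '^HETATM' "${outfile}")
echo "lines written: ${nlines}, HETATM records left: ${nhet}"
```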

AWK
Developed by Aho, Weinberger, and Kernighan
• BASH is simple and powerful, but its code can quickly become bulky because of its limited structural constructs
• AWK is designed to simplify data extraction and post-processing, so it nicely complements BASH when computational projects become a little more involved

The Power of AWK in Action
• Compute the sum of numbers in one line of code

    #!/bin/bash
    awk 'BEGIN{sum=0} {for (i = 1; i <= NF; i++) sum += $i} END{print sum}'

    $ echo "1.2 2.3 3.4" | ./01-sum.sh
    6.9

AWK logistics:
• Section BEGIN{…} is executed once at the beginning
• Standard input is processed by the main program body, i.e. by the second {…} block, which runs once per input line
• NF is a built-in variable equal to the number of fields in the current input line
• $1, $2, … are the individual input fields
• i is the loop index, so we can address each field as $i
• Input fields are processed in the C-style for-loop and their values are summed up
• Section END{…} is executed once at the end of execution
• Variable type is automatically recognized by awk based on the operation type
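
Because the main block runs once per input line, the same one-liner also sums fields spread over several lines. A minimal sketch, with the hypothetical function name sum_fields:

```bash
#!/bin/bash
# Hypothetical helper: add up every whitespace-separated field on standard input.
sum_fields() {
    # The main block runs for each line; END prints once after the last line.
    # "sum + 0" forces a numeric result, so empty input prints 0.
    awk '{ for (i = 1; i <= NF; i++) sum += $i } END { print sum + 0 }'
}

total=$(printf '1.2 2.3\n3.4\n' | sum_fields)
echo "total=${total}"
```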

AWK: Input Field Separator (option -F)
• AWK accepts custom field separators

    #!/bin/bash
    awk -F"$1" '{for (i = 1; i <= NF; i++) print $i}'

Use the comma character as field separator:

    $ echo "1,a,3,b:5" | ./02-inpfields.sh ,
    1
    a
    3
    b:5

Challenge: try using different field separators
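
One way to approach the challenge is to split the same input with two different separators and compare the results. A sketch, with the hypothetical function name split_fields:

```bash
#!/bin/bash
# Hypothetical helper: print each field on its own line, using $1 as the separator.
split_fields() {
    awk -F"$1" '{ for (i = 1; i <= NF; i++) print $i }'
}

by_comma=$(echo "1,a,3,b:5" | split_fields ",")
by_colon=$(echo "1,a,3,b:5" | split_fields ":")

echo "split on comma:"
echo "${by_comma}"
echo "split on colon:"
echo "${by_colon}"
```

Splitting on the comma yields four fields, while splitting on the colon yields only two, because the commas are then ordinary characters.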

AWK: PDB-to-XYZ Format Conversion (03-convert.sh)
Arrays in AWK are super easy!

    #!/bin/bash
    # Convert PDB file to XYZ format
    if [ $# -ne 1 ] ; then
        echo "Usage: $0 input.pdb"
    else
        cat $1 | awk 'BEGIN {n=0}
        { if($1 == "ATOM") {n=n+1; a[n]=$3; x[n]=$5; y[n]=$6; z[n]=$7} }
        END {
            printf "%d\n\n", n;
            for (i=1; i<=n; i++) printf "%-5s %7.3f %7.3f %7.3f\n", a[i], x[i], y[i], z[i];
        }'
    fi
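
AWK arrays can also be indexed by strings, which makes counting by category a short script. The sketch below counts ATOM records per atom name; the function name count_by_name and the simplified input lines are made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: count ATOM records per atom name (third field).
count_by_name() {
    # count[] is an associative array: the atom name itself is the index.
    # The traversal order of "for (name in count)" is unspecified, hence the sort.
    awk '$1 == "ATOM" { count[$3]++ }
         END { for (name in count) print name, count[name] }' | sort
}

result=$(printf 'ATOM 1 O HOH 0.0 0.0 0.0\nATOM 2 H HOH 0.9 0.0 0.0\nATOM 3 H HOH -0.2 0.9 0.0\nREMARK ignored\n' | count_by_name)
echo "${result}"
```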

AWK: Column Block-average (04-blockaverage.sh)

    #!/bin/bash
    # compute block-average for data from loan.out
    if [ $# -ne 2 ] ; then
        echo "USAGE: $0 blocksize column" ; exit
    fi
    cat loan.out | awk -v blocksize=$1 -v column=$2 '
    BEGIN{n=0; j=0}
    { if(NF==10) {x[n]=$column; n++} }    # read all data
    END{
        nblocks = n / blocksize;
        for(i=0; i<nblocks; i++){         # loop over blocks
            aver=0.0;                     # compute average for each block
            for(nRecs=0; nRecs<blocksize && j<n; nRecs++) {
                aver += x[j]; j++
            }
            printf "%4d %9.3f %d\n", i+1, aver/nRecs, nRecs;
        }
    }'
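
The block-averaging logic can be checked on a short list of numbers where the answer is obvious. This sketch is a simplified variant that reads a single column from standard input rather than loan.out; the function name block_average is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: average the first column in consecutive blocks of size $1.
block_average() {
    awk -v blocksize="$1" '
        { x[n++] = $1 }                      # store every value, indexed from 0
        END {
            for (start = 0; start < n; start += blocksize) {
                sum = 0; count = 0
                # The last block may hold fewer than blocksize values.
                for (j = start; j < start + blocksize && j < n; j++) {
                    sum += x[j]; count++
                }
                printf "%d %.3f %d\n", start / blocksize + 1, sum / count, count
            }
        }'
}

averages=$(printf '%s\n' 1 2 3 4 5 6 7 | block_average 3)
echo "${averages}"
```

Each output line holds the block number, the block average, and the number of values in that block, matching the column order of the slide's printf.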

AWK: Multiple Input Files (05-nfiles-demo.sh, 06-nfiles-full.sh)
• Alternative processing of input data from a file

    #!/bin/bash
    # alternative way of handling input files
    inpfile="loan.out"                               # input file to be processed
    nlines=`wc -l ${inpfile} | awk '{print $1}'`     # get number of lines
    awk -v inpfile=${inpfile} -v size=${nlines} '
    BEGIN{
        command = "cat " inpfile;        # string concatenation
        for(i=0; i<size; i++) {
            command | getline;           # getting a line from the file
            if(NF==10) print $0;         # print entire line
        }
    }'
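
A related form, getline line < file, reads directly from a named file and returns 0 at end of file, so the line count does not have to be computed in advance. The sketch below uses it to read two made-up files in one AWK program; the file names and contents are illustrative only.

```bash
#!/bin/bash
# Scratch directory with two small, made-up data files.
tmpdir=$(mktemp -d)
trap 'rm -rf "${tmpdir}"' EXIT
printf 'a 1\nb 2\n' > "${tmpdir}/first.txt"
printf 'c 3\nd 4\ne 5\n' > "${tmpdir}/second.txt"

# Sum the second column of each file separately inside a single BEGIN block.
totals=$(awk -v f1="${tmpdir}/first.txt" -v f2="${tmpdir}/second.txt" '
    BEGIN {
        # getline returns 1 while a line was read and 0 at end of file.
        while ((getline line < f1) > 0) { split(line, parts, " "); sum1 += parts[2] }
        close(f1)
        while ((getline line < f2) > 0) { split(line, parts, " "); sum2 += parts[2] }
        close(f2)
        print sum1, sum2
    }')
echo "column sums: ${totals}"
```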

AWK: Functions – Return Absolute Value
• Compute absolute value

    #!/bin/sh
    awk 'function abs(x){return ((x+0.0 < 0.0) ? -x : x)} {print abs($1)}'

    $ echo -23.11 | ./07-function.sh
    23.11
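
A user-defined function can be applied to any expression, not only to a single field. This sketch reuses abs to print the distance between two numbers on a line; the wrapper name abs_diff is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print |first field - second field| for each input line.
abs_diff() {
    awk 'function abs(x) { return (x < 0) ? -x : x }
         { print abs($1 - $2) }'
}

distance=$(echo "2.5 10" | abs_diff)
echo "distance=${distance}"
```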

AWK: Writing to File (08-file.sh)
• AWK writes to a file by using the mechanism of output redirection

    #!/bin/sh
    # redirecting output to a file
    if [ $# -ne 1 ] ; then
        echo "Usage $0 input.pdb" ; exit
    fi
    output=`echo $1 | sed 's/\.pdb/\.txt/'`
    cat $1 | awk -v fname=${output} '{print $0 > fname}'
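
Since the redirection target is an ordinary expression, one AWK program can route lines to several files. The sketch below splits made-up records into one file per record type; the input lines and file names are illustrative only.

```bash
#!/bin/bash
# Scratch directory for the output files; removed when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "${tmpdir}"' EXIT

# Route each line to a file named after its first field.
printf 'ATOM 1 C\nHETATM 2 O\nATOM 3 N\n' |
    awk -v dir="${tmpdir}" '{
        out = dir "/" $1 ".txt"    # file name built from the record type
        print $0 > out             # ">" truncates once, then appends while awk runs
    }'

atom_lines=$(wc -l < "${tmpdir}/ATOM.txt")
hetatm_lines=$(wc -l < "${tmpdir}/HETATM.txt")
echo "ATOM: ${atom_lines}, HETATM: ${hetatm_lines}"
```

Building the name in a variable first avoids the ambiguity of writing a concatenation directly after the ">" operator.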

Exercise
Write a script to optimize the loan duration
NCSA Loan Simulator (copy left) FIU Workshop 2013, will be our computational kernel

Input:
    Starting balance = $ 1000.00
    Annual interest  = % 7.20
    Minimum payment  = % 1.00
Output:
    month: 1 balance: 1006.00 charge: 6.00 payment: 259.00 interest: 6.00
    month: 2 balance: 751.48 charge: 4.48 payment: 259.00 interest: 10.48
    month: 3 balance: 495.43 charge: 2.95 payment: 259.00 interest: 13.43
    month: 4 balance: 237.85 charge: 1.42 payment: 237.85 interest: 14.85
Simulation results:
    Borrowed 1000.00
    Paid 1014.85 in 4 months
    Finance charge 14.85

The program is not flexible enough; so, how do we get the answer we need?

PERL
Practical Extraction and Report Language, by Larry Wall
• Full-fledged (interpreted) programming language
• Highly optimized and fast
• Ideal for data processing and data extraction
• Lots of reusable plug-ins (modules) available for download
• Quick to learn
• If you know the C language, much of Perl's syntax will already look familiar

PERL: Program Structure

    #!/usr/bin/perl -w
    # the -w switch enables warnings
    my $inpFileName = "";    # string; a semicolon is mandatory at the end of each statement
    my $sum = 0.0;           # floating point
    if (@ARGV != 1) {        # number of command-line arguments
        printf " USAGE %s loan.out\n", $0;      # $0 is the program's own name
        exit;
    } else {
        $inpFileName = $ARGV[0];                # read 1st command-line argument
        unless (open INP, "<$inpFileName") {    # open file descriptor for reading (<)
            die "Error: Cannot open input file $inpFileName";
        }
        readData();
        close INP;                              # close file descriptor after reading is done
        print "All Done\n";
    }
    sub readData {
        # do the work here (will be described later)
    }

PERL: Pattern Matching
Extracting specific parts from text files is often a non-trivial task

    # Patterns
    my $ap = "\\S+";                               # Any pattern (a run of non-space characters)
    my $lp = "\\w+\\d*";                           # Label (text) pattern
    my $ip = "-?\\d+";                             # Integer pattern
    my $rp = "-?\\d*\\.?\\d*";                     # Real pattern
    my $ep = "[+-]?\\d+\\.?\\d*[DE]?[+-]?\\d*";    # Exponential pattern (scientific format)

Character classes:
    \s      whitespace
    \S      non-whitespace
    \w      word character (a-zA-Z0-9_)
    \W      anything but a word character
    \d      numeric character (0-9)
    \D      anything except a numeric character
    .       any character
    \.      a literal dot (the backslash masks the special meaning of the dot)
Multipliers:
    [+-]?   either + or - or neither
    +       one or more instances of the preceding item
    ?       optional instance
    *       any number of instances of the preceding item

Note: inside [...] the | is a literal character, so the sign is written as [+-] rather than [+|-].
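
The same kind of pattern can be tried from the shell with grep -E. This is only a rough translation under stated assumptions: POSIX extended regular expressions have no \d, so [0-9] is used instead, the pattern is anchored to the whole string, and an exponent, when present, must carry digits, which is stricter than the Perl pattern on the slide. The function name is_number is hypothetical.

```bash
#!/bin/bash
# Assumed ERE counterpart of the exponential pattern:
# optional sign, digits, optional fraction, optional D/E exponent with optional sign.
number_pattern='^[+-]?[0-9]+\.?[0-9]*([DE][+-]?[0-9]+)?$'

# Hypothetical helper: succeed when the whole argument looks like a number.
is_number() {
    printf '%s\n' "$1" | grep -Eq "${number_pattern}"
}

for candidate in 42 -1.5E+03 2.0D-5 abc 1.2.3; do
    if is_number "${candidate}"; then
        echo "${candidate}: number"
    else
        echo "${candidate}: not a number"
    fi
done
```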

PERL: Arrays

    # @ARGV is the built-in array of command-line arguments
    my @array = ();    # array declaration

    # accessing array elements
    for(my $i=0; $i < $nRecords; $i++) {
        printf "%9.3f \n", $array[$i];
    }

    # returning and passing arrays
    ($nRecords, $total) = readData( $ARGV[1], \@array );
    sub readData {
        my ($column, $data) = @_;
        $$data[$i] = $substring;    # an array passed as \@array arrives as a reference and is accessed as $$data[...]
    }

Exercise: Data Extraction Project (01-parser.pl)
• Use the data from loan.out
• Read a specified column
• Sum up the values
• Extra credit: make sure that the values to be summed up are real numbers

Useful Internet Resources
• BASH: http://tldp.org/LDP/abs/html/
• AWK: http://www.gnu.org/software/gawk/manual/gawk.html
• PERL: http://www.perl.org
  Book: Learning Perl, by Randal L. Schwartz, O'Reilly

Let us know your opinion: http://www.bitly.com/fiuworkshop
Thank you!