Pig from Alan Gates’ book (in preparation for exam 2)
Introduction
• Download Pig from pig.apache.org (onto timberlake or your local computer/laptop).
• Unzip and untar it. You are set to go.
• You can run Pig in local mode for learning purposes. Later on you can test it on your Hadoop installation.
• Navigate to the directory where Pig is installed and run: ./bin/pig -x local
• This starts the Grunt shell in local mode.
Data and Pig script
• Create a data directory (called data) in the directory where bin is located.
• Download from GitHub all the data files related to the Pig book and store them in the data directory:
  • NYSE_dividends
  • NYSE_daily
  • etc.
• Now go through the examples in chapters 1-4, either by typing them in line by line or by creating script files.
• A script such as Mystockanalysis.pig can be executed with ./bin/pig -x local Mystockanalysis.pig, or entered line by line in Grunt.
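As a concrete starting point, a script file might look like the sketch below. The contents of Mystockanalysis.pig are hypothetical (the slides only name the file), and the sketch assumes that Pig is started from the directory containing bin, that NYSE_dividends is tab-delimited (the default that load expects), and that its columns match the schema shown in Chapter 4.

```pig
-- Mystockanalysis.pig: hypothetical contents, for illustration only.
-- Assumes data/NYSE_dividends is tab-delimited, the default for load.
dividends = load 'data/NYSE_dividends' as (exchange:chararray, symbol:chararray, date:chararray, dividend:float);

-- Collect all dividend records for each stock symbol.
by_symbol = group dividends by symbol;

-- Compute the average dividend per symbol.
avg_div = foreach by_symbol generate group as symbol, AVG(dividends.dividend) as avg_dividend;

-- Write the result to an output directory (assumed name); it must not already exist.
store avg_div into 'average_dividend';
```

Run it with ./bin/pig -x local Mystockanalysis.pig, or paste the statements one at a time at the grunt> prompt and replace the store with dump avg_div; to see the result on screen.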
Chapter 1
• The "hello world" of Pig.
• The "Mary had a little lamb" example.
• Go through the example on p. 3.
• Create a “mary” file in your data directory.
• Type in the commands line by line as on p. 3.
• Now create a ch1.pig file out of the commands.
• Run the script file using the pig command.
• Try some other commands not listed there.
• Understand the examples discussed on pp. 5-6.
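For orientation, here is a minimal word-count sketch over the mary file. It assumes the exercise is a word count and is not copied from the book, so the listing on p. 3 may differ in its details. The path data/mary and all alias names are illustrative assumptions.

```pig
-- ch1.pig: a word-count sketch (aliases and path are assumptions).
-- Load each line of the file as a single chararray field.
lines = load 'data/mary' as (line:chararray);

-- TOKENIZE splits a line into a bag of words; flatten emits one row per word.
words = foreach lines generate flatten(TOKENIZE(line)) as word;

-- Group identical words together.
grouped = group words by word;

-- Count the rows in each group.
counts = foreach grouped generate group as word, COUNT(words) as n;

-- Print the result to the screen.
dump counts;
```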
Chapter 2
• Discusses installing and running Pig.
• Go through the example on p. 14.
• That’s all.
Chapter 3
• Discusses the Grunt shell, Pig's interactive prompt.
• Start it in local mode with: pig -x local
• This results in the grunt> prompt.
• See the example on p. 20.
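A short Grunt session could look like the following sketch. Each statement is typed at the grunt> prompt; the path data/NYSE_dividends and the alias names are assumptions based on the setup described earlier, not the book's example on p. 20.

```pig
-- Typed one statement at a time at the grunt> prompt.
-- Load without declared types, as in the looser schema style.
divs = load 'data/NYSE_dividends' as (exchange, symbol, date, dividend);

-- Show the schema Pig has recorded for the alias.
describe divs;

-- Look at only a few rows instead of dumping the whole file.
first10 = limit divs 10;
dump first10;

-- Leave the shell.
quit
```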
Chapter 4
• Pig data model
• Scalars: int, long, float, double, etc.
• Complex types:
  • Map: a chararray-to-element mapping, somewhat like a key-value pair.
  • Tuple: an ordered collection of Pig elements, e.g. ('bob', 55).
  • Bag: an unordered collection of tuples.
• Nulls
• Schemas: Pig has a lax attitude towards schemas.
• Explicit, with types:
  dividends = load 'NYSE_dividends' as (exchange:chararray, symbol:chararray, date:chararray, dividend:float);
• Or you could give only the field names (types then default to bytearray):
  divs = load 'NYSE_dividends' as (exchange, symbol, date, dividend);
• See the table on p. 28.
• See the examples on pp. 28-30.
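To see the complex types in action, the sketch below builds a tuple, a bag, and a map from the dividends data and asks Pig to describe each result. The path and alias names are assumptions, and TOTUPLE and TOMAP are Pig built-ins used here for illustration rather than anything the slides prescribe.

```pig
dividends = load 'data/NYSE_dividends' as (exchange:chararray, symbol:chararray, date:chararray, dividend:float);

-- Null handling: drop rows whose dividend could not be parsed or is missing.
valid = filter dividends by dividend is not null;

-- Tuple: an ordered pair of fields.
pairs = foreach valid generate TOTUPLE(symbol, dividend) as pair;

-- Bag: group produces, for each symbol, a bag holding the matching tuples.
by_symbol = group valid by symbol;

-- Map: a chararray key mapped to a value.
maps = foreach valid generate TOMAP(symbol, dividend) as m;

-- Inspect the schemas Pig derived for each alias.
describe pairs;
describe by_symbol;
describe maps;
```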
Chapter 5
• Pig Latin
• Look at the examples on pp. 33-50.
• Commands discussed are:
  • load, store, dump
  • Relational operations: foreach, filter, group, order by, distinct, join
  • Data operations: limit, sample, parallel
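The commands listed above can be tried together in one small script. This is a sketch, not an example from the book: the alias names, the 0.5 threshold, the 10% sample rate, and the output directory name are arbitrary, and the column layout of NYSE_daily is an assumption that should be checked against the downloaded file.

```pig
divs = load 'data/NYSE_dividends' as (exchange:chararray, symbol:chararray, date:chararray, dividend:float);
-- Assumed column layout for NYSE_daily; verify against the data file.
daily = load 'data/NYSE_daily' as (exchange:chararray, symbol:chararray, date:chararray, open:float, high:float, low:float, close:float, volume:int, adj_close:float);

-- filter: keep only rows that satisfy a condition.
big = filter divs by dividend > 0.5;

-- foreach: project a column; distinct: remove duplicate rows.
syms = foreach divs generate symbol;
uniq = distinct syms;

-- group, then aggregate per group; parallel requests 2 reducers
-- (it has no effect in local mode).
grpd = group divs by symbol parallel 2;
totals = foreach grpd generate group as symbol, SUM(divs.dividend) as total;

-- order by: sort descending; limit: keep the first 5 rows.
srtd = order totals by total desc;
top5 = limit srtd 5;

-- join: match daily prices with dividends on symbol and date.
jnd = join daily by (symbol, date), divs by (symbol, date);

-- sample: keep roughly 10% of the rows.
smpl = sample divs 0.1;

-- dump prints to the screen; store writes to a directory that must not exist yet.
dump top5;
store top5 into 'top5_dividends';
```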