A short introduction to Apache Crunch. What is it, and how does it simplify the creation of Hadoop pipelines?
Apache Crunch • What is it? • How does it work? • Why use it? • Hadoop MapReduce pipelines • Scrunch • Joins www.semtech-solutions.co.nz info@semtech-solutions.co.nz
Apache Crunch – Pipeline • Crunch is based on Google's FlumeJava • Provides a Java-based API for MapReduce pipelines • It uses an MST (multiple serializable type) data model • Good for processing complex data types • Better for "non-tuple" data types, e.g. • Images • Audio • Seismic data
Apache Crunch – Pipeline • What is a MapReduce pipeline? • Map • Shuffle • Reduce • Combine • Stages arranged in sequence and/or in parallel • Potentially very long chains
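The stages listed above can be sketched in plain Java. This is a minimal in-memory illustration of one map, shuffle and reduce pass over a word count; it uses neither Hadoop nor Crunch, and the class and method names (`MapReduceSketch`, `wordCount`) are hypothetical names chosen for this example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of one map -> shuffle -> reduce stage.
// Assumption: a toy model only; real stages run distributed over a cluster.
class MapReduceSketch {

    // Map: split each line into words and emit one (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(new AbstractMap.SimpleImmutableEntry<>(word, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle: group all emitted values by key, with keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: collapse the list of values for each key into a single sum.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    // One complete stage; a pipeline chains many such stages together.
    static Map<String, Integer> wordCount(List<String> lines) {
        return reduce(shuffle(map(lines)));
    }
}
```

A combine step would apply the same summing logic to each mapper's local output before the shuffle, which reduces the amount of data moved between stages.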
Apache Crunch – Scala • Scrunch is a Scala wrapper for Apache Crunch • Reduced code • Functional and OO styles • Uses type inference for Map / Reduce • Incorporates Java Materialize functionality • Includes a REPL (read-eval-print loop)
Apache Crunch – Joins • Details of joins available in Crunch • Inner / Outer joins, as in SQL • Likewise Left / Right / Full joins • A map-side join is an in-memory join
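To make the join types concrete, here is a small plain-Java sketch, assuming the right-hand side is small enough to hold in memory as a lookup table (the idea behind a map-side join). It does not use the Crunch join API; `JoinSketch` and its methods are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy joins: each left row is {key, value}; the right side is an
// in-memory key -> value lookup, as in a map-side join.
class JoinSketch {

    // Inner join: keep only left rows whose key also exists on the right.
    static List<String> innerJoin(List<String[]> left, Map<String, String> right) {
        List<String> out = new ArrayList<>();
        for (String[] row : left) {
            String match = right.get(row[0]);
            if (match != null) {
                out.add(row[0] + "," + row[1] + "," + match);
            }
        }
        return out;
    }

    // Left outer join: keep every left row, filling NULL where the right has no match.
    static List<String> leftOuterJoin(List<String[]> left, Map<String, String> right) {
        List<String> out = new ArrayList<>();
        for (String[] row : left) {
            String match = right.get(row[0]);
            out.add(row[0] + "," + row[1] + "," + (match == null ? "NULL" : match));
        }
        return out;
    }
}
```

Because the lookup table is consulted row by row, the larger side never needs to be shuffled or sorted; the trade-off is that the smaller side must fit in memory.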
Apache Crunch – Performance • A lightweight API that runs efficiently • Crunch is a thin veneer on top of MapReduce • Two implementations available • Hadoop Writables • Avro • Avro implementation much faster
Apache Crunch – API • Operators • DoFn • CombineFn • FilterFn • Joins • Cartesian • Sort • Secondary Sort • PObject • BloomFilters • Data Model • Pipeline • MRPipeline • MemPipeline • PCollection • PTable • PGroupedTable • Source • Target • Emitter • PType
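As a rough illustration of how the operators relate to each other, the sketch below models a DoFn-style function, an emitter, and a filter in plain Java. The `Toy*` interfaces are simplified stand-ins written for this example, not the real `org.apache.crunch` classes, and the in-memory `List` stands in for a PCollection.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the operator model; NOT the actual Crunch API.
class OperatorSketch {

    // Stand-in for an emitter: receives the outputs produced by a function.
    interface ToyEmitter<T> {
        void emit(T value);
    }

    // Stand-in for a DoFn: may emit zero, one, or many outputs per input.
    interface ToyDoFn<S, T> {
        void process(S input, ToyEmitter<T> emitter);
    }

    // Stand-in for a FilterFn: an input is kept when accept returns true.
    interface ToyFilterFn<T> {
        boolean accept(T input);
    }

    // Apply the function to every element and collect whatever it emits.
    static <S, T> List<T> parallelDo(List<S> input, ToyDoFn<S, T> fn) {
        List<T> out = new ArrayList<>();
        for (S item : input) {
            fn.process(item, out::add);
        }
        return out;
    }

    // A filter is a special case: emit the input unchanged if it is accepted.
    static <T> List<T> filter(List<T> input, ToyFilterFn<T> fn) {
        return OperatorSketch.<T, T>parallelDo(input, (item, emitter) -> {
            if (fn.accept(item)) {
                emitter.emit(item);
            }
        });
    }

    // Example function: one line in, many words out.
    static List<String> splitWords(List<String> lines) {
        return OperatorSketch.<String, String>parallelDo(lines, (line, emitter) -> {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    emitter.emit(word);
                }
            }
        });
    }
}
```

The point of the emitter indirection is that a single input record can produce any number of output records, which is what lets one function type cover mapping, flattening and filtering.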
Contact Us • Feel free to contact us at • www.semtech-solutions.co.nz • info@semtech-solutions.co.nz • We offer IT project consultancy • We are happy to hear about your problems • You pay only for the hours you need to solve your problems