Northeastern University (东北大学)
VLDB Summer School Study Report
Wang Chunlei (王春磊)
September 6, 2013

Agenda
• Parallel Data Processing
• Big Data
• Map/Reduce and Hadoop
• Stratosphere – A Platform for Big Data Analytics
DIMA – TU Berlin, 29.07.2013

The Stratosphere System Stack
Layered approach – several entry points to the system
[Figure: system stack. Labels: Pact4Scala, Scala-Compiler Plugin, Meteor Script, SOPREMO Compiler, PACT Program, Stratosphere Optimizer, Runtime Operators, Nephele Dataflow Engine, Nephele Parallel Dataflow]

The Stratosphere System Stack: Meteor scripting language
- Inspired by Jaql
- Nested data model (JSON)
- Relational core operators
- Packages for information extraction and integration

The Stratosphere System Stack: PACT programming model
Generalizes MapReduce with additional second-order functions (PArallelization ConTracts)

The Stratosphere System Stack: Runtime engine
- Memory management
- Asynchronous IO
- Query execution (sorting, hashing, …)

The Stratosphere System Stack: Nephele dataflow engine
- Resource allocation
- Scheduling
- Task communication
- Fault tolerance
- Execution monitoring

The Stratosphere System Stack: Stratosphere optimizer
The optimizer picks:
- Physical execution strategies
- Partitioning strategies
- Operator order

Section overview: THE METEOR SCRIPTING LANGUAGE

Meteor Examples

Meteor Examples (2)

Section overview: THE NEPHELE EXECUTION ENGINE

Nephele Job Graphs
■ Tasks consume data streams and produce data streams
■ Channels are spanned according to a "distribution pattern"
[Figure: a Job Graph with tasks A to F next to the corresponding Execution Graph, in which each task runs as several parallel instances. Labels: Tasks, Channels, Parallel Execution, (network channel), (memory channel)]

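How channels might be spanned from a job graph into an execution graph can be sketched as follows. This is only an illustration: the pattern names `pointwise` and `bipartite`, the function `expand_job_graph`, and the subtask naming are assumptions made for this sketch, not names taken from the slides.

```python
from itertools import product


def expand_job_graph(parallelism, edges):
    """Expand a job graph into an execution graph.

    parallelism: {task name: number of parallel subtasks}
    edges: list of (source task, target task, distribution pattern)
    Returns the subtasks per task and the list of subtask-level channels.
    """
    subtasks = {task: [f"{task}{i}" for i in range(n)]
                for task, n in parallelism.items()}
    channels = []
    for src, dst, pattern in edges:
        if pattern == "bipartite":
            # every source subtask is connected to every target subtask
            channels.extend(product(subtasks[src], subtasks[dst]))
        elif pattern == "pointwise":
            # the i-th source subtask is connected to the i-th target subtask
            if len(subtasks[src]) != len(subtasks[dst]):
                raise ValueError("pointwise needs equal parallelism")
            channels.extend(zip(subtasks[src], subtasks[dst]))
        else:
            raise ValueError(f"unknown pattern: {pattern}")
    return subtasks, channels


# Hypothetical job graph: A -> B -> C with different degrees of parallelism.
subtasks, channels = expand_job_graph(
    {"A": 2, "B": 2, "C": 3},
    [("A", "B", "pointwise"), ("B", "C", "bipartite")],
)
```

With these inputs the pointwise edge contributes two channels and the bipartite edge six.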
Nephele Architecture
■ Standard master-worker pattern
■ Workers can be allocated on demand
[Figure labels: Client, Public Network (Internet), Compute Cloud, Master, Private/Virtualized Network, Cloud Controller, Persistent Storage, Worker (×3); inset plot: workload over time]

Structure of a Nephele Schedule
■ A Nephele schedule is represented as a DAG
□ Vertices represent tasks
□ Edges denote communication channels
■ Mandatory information for each vertex
□ Task program
□ Input/output data location (I/O vertices only)
■ Optional information for each vertex
□ Number of subtasks (degree of parallelism)
□ Number of subtasks per virtual machine
□ Type of virtual machine (#CPU cores, RAM, …)
□ Channel types
□ Sharing virtual machines among tasks
Example schedule (from the figure):
Input1: Task: LineReaderTask.program, Input: s3://user:key@storage/input
Task1: Task: MyTask.program
Output1: Task: LineWriterTask.program, Output: s3://user:key@storage/outp

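The vertex information listed above can be modelled with a small data structure. The following is a minimal sketch, not Nephele's actual API: the class `Vertex`, its field names, and the `validate` helper are hypothetical; only the task programs and storage locations are copied from the slide (the output location is truncated there and is kept as shown).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vertex:
    name: str
    task_program: str                     # mandatory for every vertex
    is_io: bool = False
    data_location: Optional[str] = None   # mandatory for I/O vertices only
    subtasks: int = 1                     # optional: degree of parallelism
    subtasks_per_vm: int = 1              # optional
    vm_type: Optional[str] = None         # optional: hardware profile


def validate(vertices, edges):
    """Check the mandatory fields and return a topological order of the DAG."""
    for v in vertices:
        if not v.task_program:
            raise ValueError(f"{v.name}: task program missing")
        if v.is_io and not v.data_location:
            raise ValueError(f"{v.name}: I/O vertex needs a data location")
    indegree = {v.name: 0 for v in vertices}
    for _, dst in edges:
        indegree[dst] += 1
    ready = [name for name, deg in indegree.items() if deg == 0]
    order = []
    while ready:
        name = ready.pop()
        order.append(name)
        for src, dst in edges:
            if src == name:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    if len(order) != len(vertices):
        raise ValueError("schedule contains a cycle, so it is not a DAG")
    return order


vertices = [
    Vertex("Input1", "LineReaderTask.program", is_io=True,
           data_location="s3://user:key@storage/input"),
    Vertex("Task1", "MyTask.program", subtasks=2),
    Vertex("Output1", "LineWriterTask.program", is_io=True,
           data_location="s3://user:key@storage/outp"),
]
edges = [("Input1", "Task1"), ("Task1", "Output1")]
order = validate(vertices, edges)
```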
Internal Schedule Representation
■ The Nephele schedule is converted into an internal representation
■ Explicit parallelization
□ Parallelization range (mpl) derived from PACT
□ Wiring of subtasks derived from PACT
■ Explicit assignment to virtual machines
□ Specified by ID and type
□ Type refers to hardware profile
[Figure labels: Output1 (1); ID: 2, Type: m1.large; Task1 (2); ID: 1, Type: m1.small; Input1 (1)]

Execution Stages
■ Issues with on-demand allocation:
□ When to allocate virtual machines?
□ When to deallocate virtual machines?
□ No guarantee of resource availability!
■ Stages ensure three properties:
□ VMs of upcoming stage are available
□ All workers are set up and ready
□ Data of previous stages is stored in persistent manner
[Figure labels: Stage 1: Output1 (1), ID: 2, Type: m1.large; Stage 0: Task1 (2), ID: 1, Type: m1.small, Input1 (1)]

Channel Types
■ Network channels (pipeline)
□ Vertices must be in same stage
■ In-memory channels (pipeline)
□ Vertices must run on same VM
□ Vertices must be in same stage
■ File channels
□ Vertices must run on same VM
□ Vertices must be in different stages
[Figure labels: Stage 1: Output1 (1), ID: 2, Type: m1.large; Stage 0: Task1 (2), ID: 1, Type: m1.small, Input1 (1)]

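The three sets of constraints above can be written down as a small rule table. This is a sketch under assumptions: `Placement`, `RULES`, and `allowed_channel_types` are hypothetical names, and the placements in the example are made up rather than read off the figure.

```python
from collections import namedtuple

# Where a vertex runs: the VM it is assigned to and its execution stage.
Placement = namedtuple("Placement", "vm_id stage")

# One predicate per channel type, transcribed from the constraints above.
RULES = {
    "network": lambda a, b: a.stage == b.stage,
    "in-memory": lambda a, b: a.vm_id == b.vm_id and a.stage == b.stage,
    "file": lambda a, b: a.vm_id == b.vm_id and a.stage != b.stage,
}


def allowed_channel_types(src, dst):
    """Return the channel types that may connect two placed vertices."""
    return sorted(kind for kind, ok in RULES.items() if ok(src, dst))


same_vm_same_stage = allowed_channel_types(Placement(1, 0), Placement(1, 0))
same_vm_next_stage = allowed_channel_types(Placement(1, 0), Placement(1, 1))
other_vm_same_stage = allowed_channel_types(Placement(1, 0), Placement(2, 0))
```

Read this way, two vertices on different VMs in different stages satisfy none of the three rules as stated on the slide.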
From PACTs to Nephele

User function:
    function match(Key k, Tuple val1, Tuple val2) -> (Key, Tuple)
    {
        Tuple res = val1.concat(val2);
        res.project(...);
        Key k = res.getColumn(1);
        return (k, res);
    }

PACT code (grouping):
    invoke():
        while (!input2.eof)
            KVPair p = input2.next();
            hash-table.put(p.key, p.value);
        while (!input1.eof)
            KVPair p = input1.next();
            KVPair t = hash-table.get(p.key);
            if (t != null)
                KVPair[] result = UF.match(p.key, p.value, t.value);
                output.write(result);
        end

Nephele code (communication): in-memory channels and network channels

[Figure: a PACT program with UF1 (map), UF2 (map), UF3 (match) and UF4 (reduce) is compiled into a Nephele job graph with vertices V1 to V4, which is then spanned into parallel instances connected by in-memory and network channels]

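The pseudocode above is a hash join wrapped around the user function: build a hash table on one input, probe it with the other, and call `match` for each hit. A runnable sketch of the same idea follows. It deviates from the slide in three labelled ways: the hash table stores a list of values per key so duplicate keys are not overwritten, the projection step elided on the slide is left out, and `getColumn(1)` is read as a 0-based index, which is an assumption.

```python
from collections import defaultdict


def match(key, val1, val2):
    """User function: concatenate both tuples and derive a new key."""
    res = val1 + val2
    # The slide's res.project(...) is elided there and omitted here.
    new_key = res[1]  # assumption: getColumn(1) is 0-based
    return (new_key, res)


def invoke(input1, input2, user_function):
    """Grouping code: hash join that calls the user function per match."""
    hash_table = defaultdict(list)
    for key, value in input2:            # build phase
        hash_table[key].append(value)
    output = []
    for key, value in input1:            # probe phase
        for other in hash_table.get(key, []):
            output.append(user_function(key, value, other))
    return output


input1 = [(1, ("a",)), (2, ("b",)), (3, ("c",))]
input2 = [(1, ("x",)), (3, ("y",)), (4, ("z",))]
result = invoke(input1, input2, match)
```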
Section overview: THE PACT PROGRAMMING MODEL
Second-order functions for data parallelism

Parallelization Contracts (PACTs)
[Figure: data flows into a second-order function that wraps a first-order function (user code), and data flows out]
■ Describe how input is partitioned in groups
□ "What is processed together"
■ First-order UDF called once per input group
■ Map PACT
□ Each input record forms a group
□ Each record is independently processed by UDF
■ Reduce PACT
□ One attribute is the designated key
□ All records with same key value form a group

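The split between second-order function (grouping) and first-order function (user code) can be illustrated with a sequential sketch. The function names `map_pact` and `reduce_pact` and the example UDFs are hypothetical; a real runtime would process the groups in parallel.

```python
from collections import defaultdict


def map_pact(udf, records):
    """Map contract: every record is its own group."""
    out = []
    for record in records:
        out.extend(udf(record))
    return out


def reduce_pact(udf, records, key):
    """Reduce contract: all records with the same key value form one group."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    out = []
    for k, group in groups.items():
        out.extend(udf(k, group))   # the UDF is called once per group
    return out


records = [("a", 1), ("b", 2), ("a", 3)]
doubled = map_pact(lambda r: [(r[0], r[1] * 2)], records)
totals = reduce_pact(lambda k, g: [(k, sum(v for _, v in g))],
                     doubled, key=lambda r: r[0])
```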
More Parallelization Contracts
■ Cross PACT: each pair of input records forms a group (distributed Cartesian product)
■ Match PACT: each pair with equal key values forms a group (distributed equi-join)
■ CoGroup PACT: all pairs with equal key values form a group (2D Reduce)
■ More PACTs currently under consideration
□ For similarity operators, stream processing, etc.

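The grouping behaviour of the three two-input contracts can be sketched sequentially as below. The function names and the example UDFs are hypothetical, and the nested loops stand in for whatever join strategy a real system would choose.

```python
from collections import defaultdict


def cross_pact(udf, left, right):
    """Cross: every pair of records forms a group."""
    return [out for l in left for r in right for out in udf(l, r)]


def match_pact(udf, left, right, key):
    """Match: every pair with equal key values forms a group."""
    return [out for l in left for r in right
            if key(l) == key(r) for out in udf(l, r)]


def cogroup_pact(udf, left, right, key):
    """CoGroup: all records of both inputs with the same key form one group."""
    lgroups, rgroups = defaultdict(list), defaultdict(list)
    for l in left:
        lgroups[key(l)].append(l)
    for r in right:
        rgroups[key(r)].append(r)
    out = []
    for k in sorted(set(lgroups) | set(rgroups)):
        out.extend(udf(k, lgroups.get(k, []), rgroups.get(k, [])))
    return out


left = [(1, "a"), (1, "b"), (2, "c")]
right = [(1, "x"), (3, "y")]
first = lambda rec: rec[0]

crossed = cross_pact(lambda l, r: [(l[1], r[1])], left, right)
matched = match_pact(lambda l, r: [(l[1], r[1])], left, right, first)
cogrouped = cogroup_pact(lambda k, ls, rs: [(k, len(ls), len(rs))],
                         left, right, first)
```

Note that CoGroup also calls the UDF for keys that occur in only one input, which Match never does.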
PACT Programming Model
■ A PACT program is an arbitrary dataflow DAG consisting of operators
■ An operator consists of
□ A second-order function (SOF) signature (PACT)
□ A user-defined first-order function (FOF) written in Java
■ PACT programs serve as intermediate representation, but are also exposed to the user
□ To implement UDFs for functionality not supported by Meteor
Example program (from the figure):
Source1: Extract (A, B) → Map: C := max(A, B)
Source2: Extract (D, E) → Map: if (D > 4) emit
Both Maps → Match (A = D): if (A > 3) emit → Reduce (on A): sum(B), avg(C) → Sink1

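The example program from the figure can be traced on a few records. The sketch below runs the operators one after another in plain Python; `run_plan` and the sample data are made up for illustration, and the Match is evaluated as a simple nested loop rather than with any particular join strategy.

```python
from collections import defaultdict


def run_plan(source1, source2):
    """Evaluate the example dataflow on records (A, B) and (D, E)."""
    # Map: C := max(A, B)
    left = [(a, b, max(a, b)) for a, b in source1]
    # Map: if (D > 4) emit
    right = [(d, e) for d, e in source2 if d > 4]
    # Match (A = D): if (A > 3) emit the concatenated record
    joined = [l + r for l in left for r in right
              if l[0] == r[0] and l[0] > 3]
    # Reduce (on A): sum(B), avg(C)
    groups = defaultdict(list)
    for record in joined:
        groups[record[0]].append(record)
    return [(a, sum(r[1] for r in g), sum(r[2] for r in g) / len(g))
            for a, g in sorted(groups.items())]


source1 = [(5, 1), (5, 7), (2, 9), (6, 3)]       # hypothetical (A, B) records
source2 = [(5, 10), (6, 20), (3, 30), (2, 40)]   # hypothetical (D, E) records
sink = run_plan(source1, source2)
```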
Section overview: THE STRATOSPHERE OPTIMIZER
Opening the Black Boxes

Optimizer Design
■ Cost-based optimizer produces a physical execution plan given a PACT program
□ Annotates edges with distribution patterns, e.g., broadcast, partition
□ Chooses physical execution strategies (e.g., hash/sort)
□ Reorders PACT functions
□ Constructs "Nephele job graph"
■ Challenge: semantics of user-defined functions unknown
□ How to derive correct transformations (this talk)
□ How to cost functions (ongoing work)
□ Mix and match UDFs and native operators (ongoing work)

Optimization Overview
■ Approach:
□ Statically analyze the user code of each PACT UDF and extract properties
□ Based on these properties, derive semantically correct transformations
□ Enumerate semantically equivalent plans
■ Contribution: how to deeply embed MapReduce functions into a query optimizer
□ Parallelization and reordering
□ Applies to data flows composed (in part) of functions written in arbitrary imperative code
□ Exportable to Scope, SQL/MapReduce (e.g., Aster, Greenplum)

… via Static Code Analysis
Feasible:
1. Record data model, fixed API for record access
2. No control flow between operators
Correct:
■ Difficulty comes from different code paths
■ Correctness guaranteed through conservatism
■ Add to R, W when in doubt
Example UDF:
    void match(Record left, Record right, Collector col) {
        Record out = copy(left);
        if (left.get(0) > 3) {
            double a = right.get(2);
            out.set(2, 1.0 / a);
        }
        out.set(1, 42);
        out.set(3, right.get(0));
        out.set(4, right.get(1));
        out.set(5, right.get(2));
        col.emit(out);
    }

Opening the Black Boxes…
Analyze user code of a PACT f to discover (Of, Rf, Wf, Ef):
■ Output schema Of: schema of output record given schema of input record(s)
■ Read set Rf: attributes of the input record(s) that might influence output
■ Write set Wf: attributes of the output record(s) that might have different values from respective input attributes
■ Emit cardinality Ef: bounds on records emitted per call (1, >1, …)

Code Analysis Algorithm
■ Rf from get statements
■ Wf by backwards traversal of data flow graph starting from emit statement
■ Ef by traversing control flow graph
Example UDF:
    void match(Record left, Record right, Collector col) {
        Record out = copy(left);
        if (left.get(0) > 3) {
            double a = right.get(2);
            out.set(2, 1.0 / a);
        }
        out.set(1, 42);
        out.set(3, right.get(0));
        out.set(4, right.get(1));
        out.set(5, right.get(2));
        col.emit(out);
    }
Analysis result:
Input1 = [A, B, C]
Input2 = [D, E, F]
Output = [A, B, C, D, E, F]
Rf = {A, B, C, D, E, F}
Wf = {B, C}
Ef = 1

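The result above can be reproduced with a strongly simplified sketch. It is not the algorithm from the slide: instead of traversing data flow and control flow graphs, it scans a hand-written statement list once and treats conditional statements conservatively. The statement encoding, the name `analyze`, and the choice to count `copy(left)` as a read of all left attributes are assumptions made for this illustration.

```python
INPUTS = {"left": ["A", "B", "C"], "right": ["D", "E", "F"]}
OUTPUT = INPUTS["left"] + INPUTS["right"]   # output schema: positions 0..5

# Hand-encoded statements of the match UDF: (kind, argument, inside a branch?)
UDF = [
    ("copy", "left", False),               # out = copy(left)
    ("get", ("left", 0), False),           # if (left.get(0) > 3)
    ("get", ("right", 2), True),           # a = right.get(2)
    ("set", (2, None), True),              # out.set(2, 1.0 / a)  computed value
    ("set", (1, None), False),             # out.set(1, 42)       constant
    ("set", (3, ("right", 0)), False),     # out.set(3, right.get(0))
    ("set", (4, ("right", 1)), False),     # out.set(4, right.get(1))
    ("set", (5, ("right", 2)), False),     # out.set(5, right.get(2))
    ("emit", None, False),                 # col.emit(out)
]


def analyze(statements):
    """Return (read set, write set, (min emits, max emits)) for one UDF."""
    reads, writes = set(), set()
    min_emits = max_emits = 0
    for kind, arg, conditional in statements:
        if kind == "copy":
            reads.update(INPUTS[arg])          # assumption: a copy reads all
        elif kind == "get":
            source, index = arg
            reads.add(INPUTS[source][index])
        elif kind == "set":
            position, origin = arg
            if origin is not None:
                source, index = origin
                field = INPUTS[source][index]
                reads.add(field)
                if field == OUTPUT[position]:
                    continue                   # unchanged pass-through
            writes.add(OUTPUT[position])       # conservative: branches count
        elif kind == "emit":
            max_emits += 1
            if not conditional:
                min_emits += 1
    return reads, writes, (min_emits, max_emits)


read_set, write_set, emit_bounds = analyze(UDF)
```

The write to position 2 sits inside a branch, yet it still lands in the write set, which is the "add to R, W when in doubt" rule in action.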
Automatic Parallelization
■ Optimizer can pick partitioning strategies
□ From PACT signature
■ E.g., for Match: broadcast, partition, SFR
■ Partitioning strategies propagated top-down as interesting properties
■ Can infer preserved partitioning via R/W sets
□ Identifies pass-through UDFs
■ A Reduce does not always imply a physical sort operator
[Figure: the example program annotated with strategies. Labels: Sink1; Reduce (on A) sum(B), avg(C); fifo; Match (A = D) if (A > 3) emit; probeHT(A); buildHT(D); parti./sort.(A); partition(D); Map C := max(A, B); Map if (D > 4) emit; Source1 Extract(A, B); Source2 Extract(D, E)]

Operator Reordering
■ Reordering PACTs
□ Reduce data volume
□ Introduce new partitioning opportunities
■ Reordering, partitioning, and physical operators in one stage
□ "Optimal" execution plan
■ Powerful transformations using read and write conflicts
■ Can "emulate" most relational optimizations without knowing operator semantics
[Figure: the example program with the Reduce moved below the Match. Labels: Sink1; Match (A = D) if (A > 3) emit; buildHT(A); probeHT(D); part./sort(D); Reduce (on A) sum(B), avg(C); buildHT(A); partition(A); Map C := max(A, B); Source1 Extract(A, B); Map if (D > 4) emit; Source2 Extract(D, E)]

Example Transformations
Theorem 1: Two Map operators can be reordered if their UDFs have only read-read conflicts
Theorem 2: For a Map and a Reduce, we need in addition the Reduce key groups to be preserved
Enabled optimizations:
■ Selection push-down
■ (Bushy) join reordering
■ Aggregation push-down
□ Equivalent to invariant grouping transformation [Chaudhuri & Shim 1994]
■ Reordering of non-relational Reduce functions

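The condition of Theorem 1 can be checked directly on read and write sets. In the sketch below, "only read-read conflicts" is interpreted as: neither UDF writes an attribute that the other reads or writes. That reading, the function name, and the example sets are assumptions; the additional key-group condition of Theorem 2 is not modelled.

```python
def only_read_read_conflicts(reads1, writes1, reads2, writes2):
    """True if two UDFs overlap at most on attributes that both only read."""
    write_read = (writes1 & reads2) or (writes2 & reads1)
    write_write = writes1 & writes2
    return not write_read and not write_write


# Map from the example program: C := max(A, B)
compute_c = ({"A", "B"}, {"C"})
# Hypothetical filters that read one attribute and write nothing.
filter_on_a = ({"A"}, set())
filter_on_c = ({"C"}, set())

can_swap_with_a_filter = only_read_read_conflicts(*compute_c, *filter_on_a)
can_swap_with_c_filter = only_read_read_conflicts(*compute_c, *filter_on_c)
```

The filter on A may be pushed below the Map because both only read A, while the filter on C depends on the attribute the Map writes and must stay above it.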
Section overview: SUPPORT FOR ITERATIVE QUERIES
Spinning fast iterative data flows

Motivation
■ Iterations are important for machine learning, graphs, etc.
■ Many setups require multiple systems that are dedicated to special steps in a processing pipeline
■ Example:
□ MapReduce (extract, filter, transform, aggregate)
□ Specialized systems for model training: MapReduceUpdate, Pregel/GraphLab, specialized "homebrewed" solutions
[Figure: pipeline from Source Data to Result with the steps Extract/Transform, Train Model, Postprocess & Test Model. System labels: DWH/Hadoop/Stratosphere; Pregel (Giraph), GraphLab; DWH/Hadoop/Stratosphere]

Approach
■ Don't build specialized systems: embed iterations in a dataflow system, surfacing proper abstractions
■ Gain query optimization, external memory algorithms, …
■ Layer APIs on top
[Figure: layers from top to bottom: DSL Script, Custom API, Data Flow API; Data Flow Programs (MapReduce / PACT / extended Rel. Alg.); Data Flow Optimizer; Parallel Data Flow Runtime]

"Bulk" Iterations
■ Recompute state at each iteration
■ Conceptual feedback edge in the data flow – lazy unrolling possible
■ Distinguish dynamic data path (different data each iteration) and constant data path (same)
□ Caching heuristics where constant and dynamic paths meet
□ Cached data may be indexed
■ Optimizer weighs costs for constant and dynamic data path differently
□ Automatically favors plans that push work to the constant path
[Figure: PageRank as a bulk iteration. The rank vector p (pid, r) is on the dynamic path, the matrix A (pid, tid, p) on the constant path. Match (on pid) joins p and A and emits (tid, k = r*p); Reduce (on tid) sums up the partial ranks and emits (pid = tid, r = ∑k)]

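The PageRank data flow in the figure can be sketched as a bulk iteration in plain Python. The index on A is built once outside the loop, which mirrors the idea of caching the constant data path; the rank vector is recomputed in full on every pass. The function names, the three-page graph, and the use of exact fractions are choices made for this sketch, and no damping factor is applied because the figure does not show one.

```python
from collections import defaultdict
from fractions import Fraction


def build_index(matrix):
    """Constant path: index A (pid, tid, p) by pid, built once and reused."""
    index = defaultdict(list)
    for pid, tid, prob in matrix:
        index[pid].append((tid, prob))
    return index


def iterate(ranks, index):
    """One pass: Match on pid, then Reduce on tid."""
    partial = [(tid, r * prob)                    # Match: (tid, k = r * p)
               for pid, r in ranks.items()
               for tid, prob in index.get(pid, [])]
    new_ranks = defaultdict(Fraction)             # Fraction() is zero
    for tid, k in partial:                        # Reduce: r = sum of k
        new_ranks[tid] += k
    return dict(new_ranks)


def bulk_iteration(ranks, matrix, steps):
    index = build_index(matrix)                   # cached across iterations
    for _ in range(steps):
        ranks = iterate(ranks, index)             # dynamic path: new state
    return ranks


# Hypothetical graph: 1 -> 2, 2 -> 1 and 3 (half each), 3 -> 1.
A = [(1, 2, Fraction(1)), (2, 1, Fraction(1, 2)),
     (2, 3, Fraction(1, 2)), (3, 1, Fraction(1))]
p0 = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
after_one = bulk_iteration(p0, A, 1)
after_ten = bulk_iteration(p0, A, 10)
```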
PageRank: Two Optimizer Plans
[Figure: two alternative execution plans for the same PageRank iteration. Both join p (pid, r) with A (pid, tid, p) in a Match (on pid) that emits (tid, k = r*p), and sum up the partial ranks in a Reduce (on tid) that emits (pid = tid, r = ∑k). The plans differ in their shipping, local, and caching strategies. Strategy labels in the figure: broadcast, partition(pid), part./sort(tid), fifo, buildHashTable(pid), probeHashTable(pid), CACHE. I and O mark the input and output of the iteration]