ETL: Extract, Transform, Load
Introduction of ETL
• ETL is used to migrate data from one database to another, to form data marts and data warehouses, and to convert databases from one format or type to another.
Process of ETL
• Extract
  • Process of reading data from a database
• Transform
  • Process of converting the extracted data from its previous form into the form it needs to be in
  • Done by using rules or lookup tables, or by combining the data with other data
• Load
  • Process of writing the data into the target database
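The three stages above can be sketched in a few lines of Python. This is only an illustration, not how PDI implements them: it assumes two in-memory SQLite databases, and the table names (`customers`, `dim_customer`), the sample rows, and the `COUNTRY_CODES` lookup table are all hypothetical.

```python
import sqlite3

# Hypothetical lookup table used by the transform stage.
COUNTRY_CODES = {"DE": "Germany", "FR": "France"}


def extract(conn):
    """Extract: read the raw rows from the source database."""
    return conn.execute("SELECT id, name, country_code FROM customers").fetchall()


def transform(rows):
    """Transform: apply a rule (tidy the name) and a lookup table (country code)."""
    return [
        (row_id, name.strip().title(), COUNTRY_CODES.get(code, "Unknown"))
        for row_id, name, code in rows
    ]


def load(conn, rows):
    """Load: write the converted rows into the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    # Source database with made-up sample data.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE customers (id INTEGER, name TEXT, country_code TEXT)")
    source.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "  alice smith ", "DE"), (2, "BOB JONES", "FR"), (3, "carol", "XX")],
    )

    # Separate target database, standing in for a data mart or warehouse.
    target = sqlite3.connect(":memory:")
    load(target, transform(extract(source)))
    return target.execute(
        "SELECT id, name, country FROM dim_customer ORDER BY id"
    ).fetchall()
```

Calling `run_etl()` returns the rows as they end up in the target table. Keeping each stage in its own function mirrors the Extract, Transform, Load split on the slide.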
Operations of Transform
• Selecting only certain columns to load
• Translating coded values
• Encoding free-form values
• Sorting
• Joining data from multiple sources
• Aggregation
• Splitting a column into multiple columns
• Deriving new calculated values
• …
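Several of the operations listed above can be shown on a small in-memory data set. The sketch below is plain Python rather than PDI steps, and the field names, the `STATUS_LABELS` lookup, and the sample orders are assumptions made up for the example.

```python
from collections import defaultdict

# Made-up source rows; "comment" is a column we choose not to load.
ORDERS = [
    {"order_id": 3, "customer": "Jones, Bob", "status": "P",
     "quantity": 1, "unit_price_cents": 500, "comment": "gift"},
    {"order_id": 1, "customer": "Smith, Alice", "status": "S",
     "quantity": 2, "unit_price_cents": 1000, "comment": ""},
    {"order_id": 2, "customer": "Smith, Alice", "status": "S",
     "quantity": 3, "unit_price_cents": 200, "comment": ""},
]

# Hypothetical code table for translating coded values.
STATUS_LABELS = {"S": "shipped", "P": "pending"}


def transform_orders(rows):
    """Select columns, translate codes, split a column, derive a value, then sort."""
    result = []
    for row in rows:
        # Splitting a column into multiple columns: "Last, First" -> two fields.
        last_name, first_name = [part.strip() for part in row["customer"].split(",", 1)]
        result.append({
            # Selecting only certain columns: "comment" is not copied.
            "order_id": row["order_id"],
            "first_name": first_name,
            "last_name": last_name,
            # Translating coded values through the lookup table.
            "status": STATUS_LABELS[row["status"]],
            # Deriving a new calculated value.
            "total_cents": row["quantity"] * row["unit_price_cents"],
        })
    # Sorting by order id.
    return sorted(result, key=lambda r: r["order_id"])


def total_by_status(rows):
    """Aggregation: sum the derived totals per status."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["status"]] += row["total_cents"]
    return dict(totals)
```

For example, `total_by_status(transform_orders(ORDERS))` sums the order totals per translated status. Amounts are kept as integer cents so the derived values stay exact.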
Pentaho Data Integration
• Pentaho Data Integration (PDI, also called Kettle) is a tool for ETL processes.
• Download: http://sourceforge.net/projects/pentaho/files/DataIntegration/
• Two parts of PDI
  • Transformation: a transformation is the ETL process itself
  • Job: a job is used to run transformations
Transformation • Definition (diagram labels: Hop, Note, Step)
Main Components (screenshot labels: Input, Output, Transformation, All components)
Job • Definition (diagram labels: Hop, Note, Job Entry)
Components of PDI
• Spoon
  • GUI tool to design the ETL process transformations
  • Creating jobs which automate the database update process
  • Performing the typical data flow functions, including reading, validating, refining, transforming, and writing data
• Pan
  • Application to run data transformations designed in Spoon
• Kitchen
  • Application that helps execute jobs in batch mode, usually on a schedule
• Carte
  • A web server which allows remote monitoring of running PDI ETL processes through a web browser
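As a rough illustration of how Pan and Kitchen are used from the command line, the sketch below only builds the command lines and does not run anything. It assumes the Linux/macOS launcher scripts `pan.sh` and `kitchen.sh` are on the PATH and uses the `-file` and `-level` options; check the option syntax against the documentation for your PDI version. The file paths and the cron schedule are hypothetical.

```python
import shlex


def pan_command(ktr_file, level="Basic"):
    """Command line to run one transformation (.ktr file) with Pan."""
    return ["pan.sh", f"-file={ktr_file}", f"-level={level}"]


def kitchen_command(kjb_file, level="Basic"):
    """Command line to run one job (.kjb file) with Kitchen."""
    return ["kitchen.sh", f"-file={kjb_file}", f"-level={level}"]


def crontab_line(schedule, command):
    """Format a crontab entry that runs the command on the given schedule."""
    return f"{schedule} {shlex.join(command)}"


# Hypothetical example: run a nightly job at 02:00.
NIGHTLY = crontab_line("0 2 * * *", kitchen_command("/etl/nightly_load.kjb"))
```

The lists returned by `pan_command` and `kitchen_command` could be passed to `subprocess.run` on a machine where PDI is installed. `NIGHTLY` shows the batch, scheduled use of Kitchen mentioned above.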
Features of PDI
• Simple Visual Designer
  • Graphical ETL tool
  • Dynamic transformations
  • Integrated debugger for testing and tuning job execution
Features of PDI
• Drag and Drop Integration
  • Rich library of pre-built components to access
• Integration with Zero Coding Required
• Powerful Administration and Management
• Data Profiling and Data Quality
  • Identify data that fails to comply with business rules and standards
  • Manage data quality with partners such as Human Inference
Features of PDI
• Support for Any Big Data Source