The 3 Ts of Hadoop Wuheng Luo Ankur Gupta 06.2013 MetaScale is a subsidiary of Sears Holdings Corporation
The 3 Ts of Hadoop 3-Stage Circular Process of Enterprise Big Data
What are the 3Ts? • 3Ts = Transfer, Transform, and Translate • A new enterprise big data pattern • To bring disruptive change to conventional ETL • To leverage Hadoop for streamlining data processes • To move toward real-time analytics
The 3Ts Goal To simplify enterprise data processing and reduce the latency of turning enterprise data from its raw form into products of discovery, so as to better support business decisions.
The 3Ts One Liners Transfer Once the Hadoop system is in place, a mandate is needed to immediately and continuously capture and deliver all enterprise data, from all data sources, through all data systems, to Hadoop, and store the data under HDFS. Transform Once source data is in, clean, standardize, and convert the data through dimensional modeling. Data transformation should be performed in place within Hadoop, without moving the data out again for integration. Translate Finish the data flow cycle by turning the analytical data aggregated in Hadoop into data products of business wisdom. Use batch and streaming tools built on top of Hadoop to interact with data scientists and end users.
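The circular, three-stage flow described above can be sketched as a trivial driver loop. This is a conceptual illustration only: the stage functions are placeholders for the real Transfer, Transform, and Translate jobs discussed on the later slides, and the daily cadence is an assumption, not something the deck prescribes.

```python
"""Conceptual sketch of the 3Ts cycle (illustration only, not production code)."""
import time


def transfer():
    """Placeholder: land all source data in HDFS (Transfer)."""


def transform():
    """Placeholder: clean, standardize, and dimensionalize in place (Transform)."""


def translate():
    """Placeholder: turn analytical data into data products (Translate)."""


if __name__ == "__main__":
    # The cycle is continuous: decisions driven by data products generate new data,
    # which flows back in through the next Transfer pass.
    while True:
        transfer()
        transform()
        translate()
        time.sleep(24 * 3600)  # assumed daily cadence, for illustration only
```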
Hadoop as Enterprise Data Hub “Data Hub” is not a new concept, but:
TRANSFER Sourcing Data into Hadoop Intent Continuously capture all enterprise data at the earliest possible touch points, deliver the data from all sources, through all source data systems, to Hadoop, and store the data under HDFS.
TRANSFER Motivation To gain a distinctive competitive capability, enterprises need to build an integrated data infrastructure as the foundation for big data analytics. Use Hadoop as THE centralized enterprise data repository, and make it the grand destination for all enterprise source data.
TRANSFER (3 Ts’) Transfer vs. (ETL’s) Extract
TRANSFER Consequences
TRANSFER • Implementation • Always do a data gap analysis first • Fork the ingestion into both batch and streaming paths if needed • Have a delivery plan for each data feed • Synchronize data changes between the source systems and Hadoop
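A minimal sketch of the batch half of such an ingestion fork, assuming the Hadoop CLI is on PATH, that source extracts are dropped under /data/exports/&lt;source&gt;/, and that the HDFS landing zone is /enterprise/raw/&lt;source&gt;/&lt;date&gt;/. All paths and source names are hypothetical, and a streaming fork (e.g. Flume or Kafka into HDFS) would run alongside this script rather than being shown here.

```python
#!/usr/bin/env python
"""Illustrative batch-ingestion sketch for the Transfer stage (assumed paths)."""
import datetime
import pathlib
import subprocess

LOCAL_EXPORT_DIR = pathlib.Path("/data/exports")   # hypothetical source drop zone
HDFS_LANDING_ROOT = "/enterprise/raw"               # hypothetical HDFS landing zone


def ingest_batch(source_name: str) -> None:
    """Copy today's extract for one source system into a dated HDFS partition."""
    today = datetime.date.today().isoformat()
    hdfs_dir = f"{HDFS_LANDING_ROOT}/{source_name}/{today}"

    # Create the dated landing directory, then push every file for this source.
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
    for local_file in sorted((LOCAL_EXPORT_DIR / source_name).glob("*")):
        subprocess.run(["hdfs", "dfs", "-put", "-f", str(local_file), hdfs_dir],
                       check=True)


if __name__ == "__main__":
    for source in ("pos_sales", "inventory", "clickstream"):  # hypothetical sources
        ingest_batch(source)
```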
TRANSFORM Integrating Data within Hadoop Intent Keep the data flowing beyond the ingest phase by transforming it from dirty to clean, from raw to standardized, and from transactional to analytical, all within Hadoop.
TRANSFORM Motivation As the latency from raw data to business insight becomes the focal point of enterprise data analytics, use Hadoop as the data integration platform and perform data transformation in place.
TRANSFORM • Implementation • Partition enterprise-wide standardized data and job-specific analytical data in HDFS, and retain history. • Use dimensional modeling to transform and standardize, making dimensional data the atomic unit of enterprise data. • Identify all enterprise data entities, and add the finest-grain attributes to each entity as dimensional data. • Take a bottom-up approach, but think about data usage across the enterprise rather than binding the data to a specific task.
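A minimal sketch of an in-place standardization step, written as a Hadoop Streaming mapper in Python. The pipe-delimited sales layout (store|sku|sold_at|amount), the field names, and the choice of sale date as the partition key are assumptions for illustration, not the deck's actual dimensional model.

```python
#!/usr/bin/env python
"""Illustrative Hadoop Streaming mapper for the Transform stage (assumed layout)."""
import sys


def standardize(line: str):
    """Return a cleaned (partition_key, record) pair, or None for bad rows."""
    parts = [p.strip() for p in line.rstrip("\n").split("|")]
    if len(parts) != 4:
        return None                      # drop malformed rows
    store, sku, sold_at, amount = parts
    try:
        amount = f"{float(amount):.2f}"  # normalize currency to two decimals
    except ValueError:
        return None
    sold_date = sold_at[:10]             # yyyy-mm-dd becomes the partition key
    record = "\t".join([store.upper(), sku.upper(), sold_at, amount])
    return sold_date, record


if __name__ == "__main__":
    for raw in sys.stdin:
        result = standardize(raw)
        if result:
            print(f"{result[0]}\t{result[1]}")
```

Such a mapper would typically be submitted with the Hadoop Streaming jar (hadoop jar .../hadoop-streaming*.jar -input &lt;raw path&gt; -output &lt;standardized path&gt; -mapper mapper.py), with a reducer or a partitioned output step writing the date-keyed records into the standardized area of HDFS.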
TRANSFORM (3 Ts’) Transform vs. (ETL’s) Transform
TRANSLATE Making Data Products out of Hadoop Intent Turn analytical data into data products of business wisdom using home-grown or commercial analytics tools built on top of Hadoop. Business decisions supported by these data products will in turn generate new data, starting a new round of the enterprise data flow cycle…
TRANSLATE Motivation Low-latency big data analytics requires the right platform and tools. Use Hadoop as the platform of choice for enterprise data analytics because of its openness and flexibility. Choose analytical tools that are flexible, agile, interactive and user friendly.
TRANSLATE • Implementation • Big data analytics takes a team effort • Include statisticians, data scientists and developers • Utilize both generic and Hadoop-specific technologies • Consider both batch and streaming approaches • Provide access to both pre-computed views and on-the-fly queries • Use both home-grown and Hadoop-based commercial tools • Use a web-based, mobile-friendly UI • Visualize
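As one hedged example of "access to pre-computed views", the sketch below exposes a small aggregate that a Transform job is assumed to have already written to HDFS. The HDFS path, the field layout, and the /api/daily-sales endpoint are hypothetical, and the script assumes Flask and the Hadoop CLI are available on the serving host.

```python
#!/usr/bin/env python
"""Illustrative Translate-stage sketch: serve a pre-computed view over HTTP."""
import subprocess

from flask import Flask, jsonify

app = Flask(__name__)
VIEW_PATH = "/enterprise/analytics/daily_sales/latest/part-00000"  # hypothetical


@app.route("/api/daily-sales")
def daily_sales():
    """Read the pre-computed aggregate from HDFS and return it as JSON."""
    out = subprocess.run(["hdfs", "dfs", "-cat", VIEW_PATH],
                         check=True, capture_output=True, text=True).stdout
    rows = []
    for line in out.splitlines():
        sold_date, store, total = line.split("\t")   # assumed 3-column layout
        rows.append({"date": sold_date, "store": store, "total": float(total)})
    return jsonify(rows)


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

An on-the-fly query path would sit alongside this, handing ad hoc requests to an interactive engine rather than a pre-computed file; the pre-computed route shown here trades freshness for predictable, low-latency reads.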
The 3 Ts of Hadoop Continuous Iteration of Enterprise Data Flow
Thank You! MetaScale is a subsidiary of Sears Holdings Corporation For further information email: contact@metascale.com visit: www.metascale.com