290 likes | 391 Views
Components and Architecture. CS 543 – Data Warehousing. Architecture. What are the key components of a data warehouse? Architecture is the structure that binds the components into an integrated whole DW architecture provides the overall framework for developing and deploying DW solutions.
E N D
Components and Architecture CS 543 – Data Warehousing
Architecture • What are the key components of a data warehouse? • Architecture is the structure that binds the components into an integrated whole • DW architecture provides the overall framework for developing and deploying DW solutions CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Architectural Components CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Distinguishing Characteristics • Different objectives and scope • Data content • Complex analysis and quick response • Flexible and dynamic • Metadata driven CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Architecture Supporting Flow of Data CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Technical Architecture • The technical architecture of a DW is the complete set of functions and services provided within its components • Functions • Services • Rules and procedures • Data stores • Tools are the means to implement an architecture • Architecture comes first, then the tools; select the appropriate tools based on the architecture CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Data Acquisition (1) • This component includes • Extraction • Transfer into staging area • Preparation for loading (transformation, cleansing, and integration) CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Data Acquisition (2) CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Data Acquisition – Functions and Services (1) • Data extraction • Select data sources and determine the types of filters to apply to individual sources • Generate automatic extract files from operational systems using replication and other techniques • Create intermediary files to store selected data to be merged later • Transport extracted files from multiple platforms • Provide automated job control services for creating extract files • Reformat input from outside sources, departmental files, databases, and spreadsheets • Resolve inconsistencies for common data elements from multiple sources • Generate common application code for data extraction CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Data Acquisition – Functions and Services (2) • Data transformation • Map input data to data for DW repository • Clean data, remove duplicates, merge/purge • De-normalize extracted data structures as required by the dimensional model of the DW • Convert data types • Calculate and derive attribute values • Check for referential integrity • Aggregate data as needed • Resolve missing values • Consolidate and integrate data CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Data Acquisition – Functions and Services (3) • Data staging • Provide backup and recovery for staging area repository • Sort and merge files • Create files as input to make changes to dimension tables • If staging area storage is a relational database, create and populate database CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Data Storage • This architectural component covers the process of loading the prepared data from the data staging area into the data warehouse repository CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Data Storage – Functions and Services • Load data for full refreshes of DW tables • Perform incremental loads at regular prescribed intervals • Support loading into multiple tables at the detailed and summarized levels • Optimize the loading process • Provide automated job control services for loading the data warehouse • Provide backup and recovery for the DW database • Provide security • Monitor and fine-tune the database • Periodically archive data from the database according to preset conditions CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Information Delivery (1) • This architectural component spans a broad spectrum of many different methods of making information available to the users of the DW • To the users, information delivery is the DW; it is the front-end through which the users retrieve information from the DW • Information • Online queries and interactive analyses • Regular and ad-hoc reports • Specialized applications (e.g. executive information system) • Data mining CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Information Delivery (2) CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Information Delivery – Functions and Services • Provide security to control information access • Monitor user access to improve service and for future enhancements • Allow users to browse data warehouse content • Simplify access by hiding internal complexities of data storage from users • Automatically reformat queries for optimal execution • Enable queries to be aware of aggregate tables for faster results • Govern queries and control runaway queries • Provide self-service report generation for users • Store result sets for queries and reports for future use • Provide multiple levels of data granularity • Provide event triggers to monitor data loading • Make provision for the users to perform complex analysis • Enable data feeds to downstream, specialized data support systems such as EIS and data mining CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Infrastructure Supporting Architecture • The architecture defines the functions and services; the infrastructure defines the elements to support the architecture • Infrastructure is the foundation supporting the architecture • Hardware servers • OSs • Data management systems • Networking elements • Supporting tools and applications • People • Procedures CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Operational Infrastructure • Operational infrastructure includes • People • Procedures • Training • Management software • Operational infrastructure are the people and procedures that keep the DW functioning, and not those who develop the DW CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Physical Infrastructure (1) CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Physical Infrastructure (2) • Physical infrastructure includes • Computing hardware (e.g. server) • OS and utilities • Networking hardware and software • Software tools • Decisions about the physical infrastructure are critical for a DW. Two principles • Leverage as much of the existing physical infrastructure • Keep the infrastructure as modular as possible CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Hardware and Operating System • Hardware • Scalability • Support • Vendor reference • Vendor stability • Operating system • Compatibility • Scalability • Security • Reliability • Availability • Preemptive multitasking • Multi-threaded approach • Memory protection CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Single Platform Option • Simplest option, where all functions and services are performed by a single computing platform • Typically used by small to medium sized companies who have mainframes or large Unix servers already in use with capacity to spare • Some shortcomings of using mainframes • Stretched to capacity • Non availability of tools • Multiple legacy platforms • Company’s migration policy CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Hybrid Option • Most companies opt for the hybrid option where multiple platforms are used for data warehousing (data acquisition, data storage, information delivery) CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Data Extraction • Data extraction • Best performed on each source system’s own computing platform • Initial reformatting and merging • Best performed on each source system’s own computing platform • Extract files are reformatted and merged into a smaller number of files performing verification against the source system • Initial data cleansing • Also performed on source system platform • Transformation and consolidation • Performed on the staging area platform • Validation and final quality check • Performed on the staging area platform • Creation of load images • Performed on the staging area platform CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Options for the Data Staging Area • In one of the legacy platforms • On the data storage platform • On a separate optional platform • You can optimize the platform for complex transformations and cleaning • Install specialized tools for transformations and cleaning • Keep track of entire data content in the staging area CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Data Movement CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Client/Server Architecture (1) CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Client/Server Architecture (2) • Application server (middle tier) • To run middleware and establish connectivity • To execute management and control software • To handle data access from the Web • To manage metadata • For authentication • As front end • For managing and running standard reports • For sophisticated query management • For OLAP applications CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS
Maturing of the Infrastructure CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS