180 likes | 521 Views
An overview of Data Warehousing and OLAP Technology. Presented By Manish Desai. Introduction What is data warehouse ? Explanation of definition Data warehouse Vs. Operational Database Data warehouse architecture Back end tools Conceptual model Database design Warehouse servers
E N D
An overview of Data Warehousing and OLAP Technology Presented By Manish Desai
Introduction • What is data warehouse ? • Explanation of definition • Data warehouse Vs. Operational Database • Data warehouse architecture • Back end tools • Conceptual model • Database design • Warehouse servers • Index structures • Meta data • Conclusion • References
Introduction • Essential elements of decision support • Enables The Knowledge Worker to make better and faster decisions • Used in many industries like: • Manufacturing (for order shipment) • Retail (for inventory management) • Financial Services (claims and risk analysis) • Every major database vendor offers product in this area
What is Data Warehouse ? • A data warehouse is a “subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making” • Typically maintained separately from operational databases
Explanation of definition • Subject-Oriented: • Designed around subject such as customer, vendor, product and activity • Does not includes data that are not needed for Decision support system (DSS) • Integrated: • Most important feature • Consistent naming convention, measurement of variables and so forth • The data should be stored in single globally acceptable fashion
Explanation (continues…) • Time Varying: • All data in the warehouse should be accurate as of some moment in time • Data stored over a long time horizon (5 –10 years) • Key structure contains element of time (implicitly or explicitly) • Data once correctly recorded cant be updated • Non Volatile: • No Update of data allowed • only loading and access of data operations
Architecture • Data sourcing,migration,cleanup tools • Meta data repository • Data marts • Data query, reporting, analysis and mining tools • Data warehouse administration and management
Architecture (continues…) • Distributed Data warehouse • Load balancing, scalability,higher availability • Meta data replicated and centrally administrated • Too expansive • Data marts • Departmental subset focused on selected subjects • example: marketing department includes customer, sales and product tabels • Has own repository and administration • May lead to complex integration problems if not designed properly
Back end tools and Utilities • Data cleaning, loading, refreshing tools • Cleaning • Multiple source, possibility of errors • Example: replace string sex by gender • Loading • Building indices, sorting and making access paths • Large amount of data • Incremental loading • Only updated tuples are inserted ,Process hard to manage • Refresh • Propagating updates • When to refresh ? • Set by administrator depending on user needs and traffic
Conceptual Model and front end tools • Multi dimensional view • Dimensions together uniquely determine the measure • Example: Sales can be represented as city,product, data • Each dimension is described by set of attribute • Example: product consist of • Category of product • Industry of product • Year of introduction • Front end tools • Multi dimensional spreadsheet • Supports Pivoting-reorientation • Roll_up - summarized data • Drill_down - go from high level to low level summary
Database design • Two ways to represent Multi dimensional model • Star schema • Database consist of single fact table and single table for each dimension • Each tuples in fact table consist of pointer to each of dimension • Snowflake schema • Refinement over star schema • Dimensional hierarchy is explicitly represented by normalizing dimension tables
Warehouse Servers • Specialized SQL servers • Provides advanced query language and query processing support for SQL queries over star and snowflake schemas • Example: Redbrick • ROLAP • Between relational back end and client front end tools • Extend traditional relational servers to support multidimensional queries • Example: Microstratergy • MOLAP • Multidimensional storage engine • Direct mapping • Example: Essbase from Arbor Inc.
Index structures • Bit map indices • Use single bit to indicate specific value of attribute • Example: instead of storing eight characters to record “engineer” as skill of employee use single bit id# Name Skill 1000 John 1 • Join indices • Maintains the relationship between foreign key with its matching primary keys
Meta data and warehouse management • Its data about data • Used for building, maintain, managing and using data warehouse • Administrative meta data • Information about setting up and using warehouse • Business meta data • Business terms and definition • Operational meta data • Information collected during operation of warehouse
Conclusion • Data warehouse is the technology for the future. • data warehouse enables knowledge worker to make faster and better decisions
References • Inmon W. H.,Building the data warehouse • www.olapcouncil.org • www.pwp.starnetinc.com • www.arborsoft.com • Kimball, R. The data warehouse toolkit.