480 likes | 602 Views
Data Warehousing:. Jeffrey T. Edgell Assistant Executive Director ManTech Advanced Systems International. Decision Support.
E N D
Data Warehousing: Jeffrey T. Edgell Assistant Executive Director ManTech Advanced Systems International
Decision Support • In order to make correct decisions, accurate, meaningful information about business environments, external issues, and internal workings must be available in a timely fashion.
A Need For New Technology • Government and industrial entities have been collecting data in electronic format since the 1960s. • Today, organizations collect millions of pieces of information about every aspect of their operation on a daily basis. • Data is obtained from multiple disparate sources.
A Need For New Technology • Often information is replicated, leading to confusion. • Related data is often retained in seemingly heterogeneous and incompatible platforms. • Common data attributes are represented in nonstandard formats and naming constructs across systems.
A Need for New Technology • Most systems are built for data collection (transaction based). • Designed to support On-Line Transaction Processing (OLTP). • Designed to support day-to-day business operations. • Very specific applications built to support interaction with the data.
A Need for New Technology • Perform best when handling small specific volumes of data. • Does not accept information from dissimilar sources readily. • Are not constructed to handle analysis of large amounts of data efficiently.
A Need for New Technology • Capable of answering questions of a specific nature and time frame. • How many items do I have in stock today? • How many tickets were sold on a specific date? • What is the current price of an item?
A Need for New Technology • Transaction based systems experience great difficulty in answering analytical and decision support questions. • Analysis takes a long time, interfering with: • transaction performance • daily operations • The nature of the data is dynamic and dispersed.
A Need for New Technology • Most organizations have created a “spider web” of systems and data sources.
A Need for New Technology • All of this has created “data overload” and “data confusion”. • What do I do with all of this data? • What does it mean? • Do I really need this data? • I am overwhelmed with the amount of data I am confronted with. • I cannot make a timely decision (too much data from too many sources).
Data Warehousing 101 Data warehousing is: • A large historical database designed to accept key analytical data from multiple anddisparate sources that manage the day-to-day management of enterprise data. Furthermore, the role of the warehouse is to transform transaction data into corporate information. The warehouse is provided in a read-only fashion to a user.
Data Warehousing 101 • A data warehouse will provide: • The ability to ask business analysis questionsin a real-time, iterative fashion,obtaining decision support information readily and quickly.
Data Warehousing 101 • A data warehouse is not: • A repository for all corporate data. • A data warehouse will not: • Single handedly solve all of the problems associated to an enterprise.
Data Warehousing 101 Key components include the following: • data model • data storage architecture (relational, proprietary) • data access/replication/transport • data transformation and scrubbing • staging and publication • metadata • warehouse hardware and software
Information Directory Repository Data Legacy Data Transformation Data Warehouse Data Warehouse Management Layer Legacy Data External Data Source Data Warehousing 101 Data Warehouse Architecture
1993 Qtrly Data 1996 Month Data 1994 Qtrly Data 1997 Month Data 1995 Qtrly Data Aggregation Data Warehousing 101 • How a warehouse deals with aging data: 1998 Detail Data
Data Warehousing 101 Model concepts: • Fact table(s) • A table containing multiple measurable descriptors relating to a specific area of business • Each fact can be viewed, calculated, and aggregated against various defining areas of the business (time, geography, customer)
Data Warehousing 101 Model concepts: • Dimension Table(s) • Retains information (product description, geography description, customer description) that is descriptive and remains moderately constant over time
Data Warehousing 101 Data Warehouse Modeling • Special modeling techniques must be applied to provide rapid response of queries on large volumes of data. • OLTP systems are built with update operations in mind, resulting in normalization and greatly reduced browse performance.
Data Warehousing 101 Common data model techniques are as follows: • star schema • snowflake • fact constellation • relational
Dimensions Dimensions SALES Data Warehousing 101 Sample Star Schema Model TIME GEOGRAPHY STORE CUSTOMER Sales Facts
Year North Qtr South TIME Month GEOGRAPHY East West Dimensions Dimensions SALES East Region STORE CUSTOMER Sales Facts West Region Data Warehousing 101 Sample Snowflake Model
TIME GEOGRAPHY Regional Sales District Sales Store Sales Dimensions Dimensions STORE CUSTOMER Data Warehousing 101 Sample Fact Constellation Model
Data Marting 101 Data marting is: • Afunctional segmentof an enterprise restricted for purposes of security, locality, performance, or business necessity using modeling and information deliverytechniques identical to data warehousing.
Data Marting 101 • Why build a data mart? • Allows an organization to visualize the large but focus on the small and attainable. • Provides a platform for rapid delivery of an operational system. • Minimizes risk. • A corporate warehouse can be constructed from the union of the enterprise data marts.
Data From Data Transaction Sources Warehouse Update From the Warehouse The data warehouse populates the data marts. Financial Logistics Contract Data Mart DataMart Data Mart Data Marting 101
Data Warehouse Update From the Data Marts The data marts populate Financial Logistics Contract the data warehouse. Data Mart Data Mart Data Mart Data From Transaction Sources Data Marting 101
Virtual Data Warehouse Abstract Data Warehouse Access Layer Data is moved through the abstract layer on demand. The data warehouse layer manages the data marts Financial Logistics Contract as a warehouse. Data Mart Data Mart Data Mart Data From Transaction Sources Data Marting 101
OLAP 101 • OLAP is a powerful graphics-oriented tool used to access the data warehouse • OLAP supports • Business analysis queries • Data visualization • Trend analysis • Scenario analysis • User defined queries
OLAP 101 • Drill Down • Move from summary to detail • Roll Up • Move from detail to summary • Slice and Dice • Look at a specific interest of the business
OLAP 101 • Pivot and Rotate • Looking at data from varying perspectives • Drill Through • Move to a near transaction level of detail
OLAP 101 • The flavors of OLAP • Multidimensional On-Line Analytical Processing (MOLAP) • Relational On-Line Analytical Processing (ROLAP) • Hybrid On-Line Analytical Processing (HOLAP)
OLAP 101 • MOLAP • Produces a hypercube • Pre-aggregated and pre-calculated • Rapid response times • Limited in the amount of data that can be managed
OLAP 101 • ROLAP • Data remains in a relational format • Some degree of aggregation • Slower response times • Scales to large amounts of data
OLAP 101 • HOLAP • Can manage data both as ROLAP and MOLAP • Currently evolving • MOLAP vendors are finding it easier to move into the HOLAP market space
Data Mining 101 • As defined by the Gartner Group in 1995, data mining is: • “…the process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in a repository, using pattern recognition technologies and statistical and mathematical techniques.”
Data Mining 101 • Data mining requires an analyst who is familiar with the domain to appropriately model scenarios. • Data mining assists analysts in uncovering nontrivial data relationships. • Analysis must be conducted to determine the meanings of these newly identified relationships.
Why Use a Data Warehouse ? • Data warehousing is a must for anyone who uses multiple data sources to make decisions and understand business (trends, forecasting). • Those who do not move to warehousing will not be capable of responding to problems and business conditions, thus falling behind the competition.
Why Use a Data Warehouse ? • For organizations wanting to minimize costs and maximize productivity, warehousing is a must. • Individuals who spend time gathering data instead of analyzing data require the assistance of a warehouse. • Organizations that collect data but have difficulty determining meanings and impacts need a data warehouse.
Making the Warehouse a Reality • Think big but work small. • Match technology to requirements. • Build for the future (scalability). • Work closely with the users. • Requirements • Rapid Application Development (RAD) • Periodic releases to the user community
Real World Success Stories • Radio Shack • Sales and stocking analysis • Marketing (regionalized mailings) • Wal-Mart • Sales and stock analysis • Trend analysis • Vendor analysis
Real World Success Stories • Naval Surface Warfare Center (NSWC) • Procurement • Supply • Workload • Harris Semiconductor • Yield • Product • Personnel productivity
Real World Success Stories • Defense Logistics Agency (DLA)/ManTech • Trend analysis • Problem identification • Procurement support • Enterprise data analysis
A Few Observations About Data Warehouses • Industry and our experience indicate that: • Warehouses that succeed average an ROI of 400% with the top end being as much as 600% in the first year. • The incremental approach is most successful (build the warehouse a functional area at a time). • The average time to gather requirements, perform a design, and deploy a warehouse increment is six months. • New tools may be required that differ from the transaction environment. • Software oriented toward intelligent analysis and query of the data warehouse • Hardware oriented to support the massive storage requirements and analytical queries
Keys to Success • Do you understand why you are building the warehouse? • Have you identified both technical and business professionals that you will need to build the warehouse? • Do you have a strong management sponsor? • Are you managing the expectations of the users?
System Administration DW Architect Data Architect DW Manager DW Administrator Decision Support Analysts DBA Application Developer Data Cleansing/ Transformation Analyst Business Analyst Management Careers in Data Warehousing