210 likes | 327 Views
Basic Concepts of Datawarehousing An Overview. Prasanth Gurram. How to answer these Business Queries?. How did my revenue improve in the past 5 years?. What are the slow movers in my product line?. Which channel costs me more and pays less?. Which of my Sales Agents
E N D
Basic Concepts of Datawarehousing An Overview Prasanth Gurram
How to answer these Business Queries? How did my revenue improve in the past 5 years? What are the slow movers in my product line? Which channel costs me more and pays less? Which of my Sales Agents are doing better? Strategic Planning / Budgeting Currency Risk, Interest Rate Risk, Liquidity Risk What is Defaulter’s Profile? Who are my profitable customers? What is the sales distribution region wise?
DSS Decision Support Systems (DSS) are interactive computer-based systems intended to help decision makers utilize data and models to identify and solve problems and make decisions. Data Warehouse is the foundation of DSS process. It is a Strategy and a Process for Staging Corporate Data. Enable users to get a “Business View” of the data Facilitate Data based Decision Making that would drive and improve the Business Discover “Hidden Trends”
Driving Forces for DSS Business Speed Reform Customers RESULT: COMPETITION Technology
Scenario without DSS • Unavailability of Tools and Techniques for acquisition of data from various sources for answering business questions and making decisions, in earlier days • Intensive efforts in data formatting than data analysis • Static and inflexible report generation • Time-lag in accessing the information at central place
OLTP v/s DSS Environment • OLTP Environment • get data IN • large volumes of simple transaction queries • continuous data changes • low processing time • mode of processing • transaction details • data inconsistency • mostly current data • DSS Environment • get information OUT • small number of diverse queries • periodic updates only • high processing time • mode of discovery • subject oriented - summaries • data consistency • historical data is relevant
OLTP v/s DSS Environment • OLTP Environment • high concurrent usage • highly normalized data structure • static applications • automates routines • DSS Environment • low concurrent usage • fewer tables, but more columns per table • dynamic applications • facilitates creativity
Benefits for Business User • Flexible Information Access • High Availability • Ease of Use • Quality & Completeness of Data • Focus on Information Processing • Information Base for Knowledge Discovery
Available line of technology • Advances in dbms technology • Data warehousing • On-line analytical processing • Data mining
Datawarehouse • Data warehouses store large volumes of data which are frequently used by DSS.It is maintained separately from the organization’s operational databases • Data warehouse is subject-oriented,integrated, time-variant, and nonvolatile collection of data • Subject-oriented : Contains information regarding objects of interest for decision support: Sales by region, by product, etc. • Itegrated: Data are typically extracted from multiple, heterogeneous data sources (e.g., from sales, inventory, billing DBs etc.). • Time-variant: Contain historical data, longer horizon than operational system. • Nonvolatile : Data is not (or rarely) directly updated.
Datawarehouse • Is the enabling technology that facilitates improved business decision-making • It’s a process, not a product • A technique for assembling and managing a wide variety of data from multiple operational systems for decision support and analytical processing It’s a journey not a destination...
DW Components S T A G IN G A R E A Extraction Cleansing FS1 DM1 FS2 DM2 N E T W O R K . . . DW ODS Transmission DMn OLAP ANALYSIS FSn Legacy System Data Mart Population Aggregation Summarization Transformation Knowledge Discovery Metadata Layer
Operational Process • Data extraction • Data Cleansing and Transformation • Data Load and refresh • Build derived data and views • Service queries • Administer the warehouse
Extraction Process ( Data Capturing ) Feed System Application Incremental Data Business Transactions Control Metadata • Extract the incremental data from feed system • Store the extracted data into a temporary area • Extract data from multiple, heterogeneous, and external sources Data Capturing Process
Extraction Process (Data Transmission ) Feed System Side Staging area Network Cloud Incremental Data Incremental Data FTP • Transmit the extracted data from Feed system to Staging area • Periodicity of transmission ( daily / weekly ) depends upon the feed system
Cleansing Process Process Metadata Cleansing Rules Good Clean data Raw data (Staging Area) Cleansing Process Control Metadata Bad Cleansing Reports • Detect errors in the data and rectify them when possible • Mark it Good/Bad • Generate the cleansing Reports and mail to the DWA and Feed System representatives
Transformation Process • Process Metadata • Mapping Detail • Transformation Rule Clean Operational Data Transformation Process Control Metadata • Transform the cleaned Operational Data into DSS Data • Load the DSS data into ODS • ODS contains the current DSS data at the lowest level of granularity Operational Data Store
Summarization Process ODS DW Summarization Process Weekly Monthly Yearly Control Metadata • Summarizeand aggregate ODS data and Populate to the Warehouse • Periodicity of Summarization Process depends upon the level of summarization at Warehouse ( weekly, monthly, daily )
Metadata • Data about Data • Used to maintain Datawarehouse • Control data • Static: • Roles, permissions, naming standards, source system names, • Locations, target names, transformation and mapping rules • Dynamic: • Scheduling, scripts, load statistics, space usage, • Backup statistics • Business data • Business rules,Who validates data,Who controls,How they validate
DW Components/Tools • Extraction/transformation/load tool (family of tools including data modeling tool, extraction tool, Meta data repository, and DW administration tools) • Meta data exchange architecture (API used to integrate all components of DW with central Meta data) • Target databases (relational, multidimensional, hybrid) • Data access and analysis tools for end users • Database servers, operating systems, networks