Learn about the benefits and capabilities of using the web for data warehousing, including managing clickstreams and bringing existing data warehouses online. Discover how web-based data warehousing can enhance customer insights and drive business growth.
Warehousing on the Web Webhouse
Why Utilize the Web? • What is the data Webhouse • Managing clickstreams • WWW today • ROI • DSS
Data Webhouse • Defined by Ralph Kimball • Two distinct focuses • Bringing the web to the warehouse • Clickstream data as a source of information • Bringing existing data warehouses to the web • Fully distributed environment
Required Capabilities • Capture clickstream logs and convert to tables for analysis • Merge customer demographic and account info with the clickstream tables • Interpret customer paths through the website • Identify abandoned sessions • Use DW to drive customer responses appearing on your website • DW querying and reporting available through web browsers • Attach multimedia to DW • DW security
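Two of the capabilities above, interpreting customer paths and identifying abandoned sessions, can be illustrated with a minimal sketch. It assumes events already carry a visitor identifier (for example a cookie ID), uses a 30-minute inactivity cutoff to split sessions, and treats a session as abandoned if it never reaches a goal page. The cutoff, the `GOAL_PATH` value, and the function names are illustrative assumptions, not part of the original slides.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity cutoff
GOAL_PATH = "/order/confirm"             # hypothetical "completed" page


def sessionize(events):
    """Group (visitor_id, time, path) events into sessions by inactivity gap."""
    sessions = []
    last_seen = {}  # visitor_id -> (time of last event, that visitor's open session)
    for visitor, ts, path in sorted(events, key=lambda e: (e[0], e[1])):
        prev = last_seen.get(visitor)
        if prev is None or ts - prev[0] > SESSION_TIMEOUT:
            current = []  # gap too long (or first visit): start a new session
            sessions.append((visitor, current))
        else:
            current = prev[1]
        current.append((ts, path))
        last_seen[visitor] = (ts, current)
    return sessions


def abandoned(sessions):
    """Return the sessions that never reached the goal page."""
    return [(visitor, pages) for visitor, pages in sessions
            if all(path != GOAL_PATH for _, path in pages)]


t0 = datetime(2000, 10, 10, 13, 0)
events = [
    ("a", t0, "/home"),
    ("a", t0 + timedelta(minutes=5), "/cart"),
    ("b", t0, "/home"),
    ("b", t0 + timedelta(minutes=2), "/cart"),
    ("b", t0 + timedelta(minutes=4), "/order/confirm"),
    ("a", t0 + timedelta(hours=2), "/home"),
]
sessions = sessionize(events)
lost = abandoned(sessions)
```

Here visitor "a" produces two sessions, neither of which reaches the goal page, while visitor "b" completes an order. The ordered list of pages inside each session is the customer path.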
Architecture – Web to Warehouse • Beyond a comprehensive real-time snapshot of the business, also want knowledge of customer behavior • Extended design factors • Timeliness – real-time • Data volume – no upper limit • Response time – less than 10 seconds
Hot Response Cache • A file server holding complex file objects • As a file server it is an I/O engine (bandwidth) • Must hold the objects which will be requested • Security is the responsibility of the requesting server • Extension of the original operational data store (ODS) • Does not physically speed up the database; creates the illusion of speed by storing predictable answers
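The idea of storing predictable answers can be sketched as a small lookup layer in front of the warehouse. This is a minimal illustration, not Kimball's design: the class name, the key format, and `slow_warehouse_query` are hypothetical stand-ins, and the cache deliberately performs no access checks because the slide assigns security to the requesting server.

```python
class HotResponseCache:
    """Holds precomputed response objects keyed by a request signature."""

    def __init__(self):
        self._objects = {}
        self.hits = 0
        self.misses = 0

    def preload(self, key, payload):
        """Store an answer ahead of time, e.g. from a job that predicts requests."""
        self._objects[key] = payload

    def fetch(self, key):
        """Return the stored object, or None so the caller can fall back."""
        if key in self._objects:
            self.hits += 1
            return self._objects[key]
        self.misses += 1
        return None


def slow_warehouse_query(key):
    # Stand-in for a real warehouse query (hypothetical).
    return f"computed:{key}"


def respond(cache, key):
    """Serve from the cache when possible; otherwise query the warehouse."""
    cached = cache.fetch(key)
    return cached if cached is not None else slow_warehouse_query(key)


cache = HotResponseCache()
cache.preload("top_sellers:2000-10", "precomputed report")
fast = respond(cache, "top_sellers:2000-10")  # served from the cache
slow = respond(cache, "ad_hoc:region=west")   # falls through to the warehouse
```

The warehouse itself is no faster; only requests that were predicted and preloaded avoid the slow path, which is why the cache must hold the objects that will actually be requested.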
Who are our users? • Traditional • Power users • need database connectivity • Analysts • want to manipulate existing data • Report viewers • view standardized reports • Web • Our customers • Our business partners • Our employees
Clickstreams • Clickstream is not just another data source • Distributed nature leads to multiple data sources which require synchronization • Multiple parties • More than a dozen log file formats for capturing clickstream data • Search specification • Basic form of clickstream data is stateless • Log shows isolated page retrieval events • Clickstream data is anonymous • Today's promotions • Clickthroughs and referrals as a revenue source
Clickstreams • Clickstream post-processor – receives raw log data from the web server and normalizes it into a format which can be combined with application-derived data for insertion into the DW
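A minimal sketch of the normalization step, assuming the web server writes the NCSA Common Log Format (only one of the many log formats in use). The function name and the output column names are illustrative choices, not taken from the slides.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_clf_line(line):
    """Turn one raw log line into a flat row (dict), or None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    return {
        "host": m.group("host"),
        "user": None if m.group("user") == "-" else m.group("user"),
        "event_time": datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": m.group("method"),
        "path": m.group("path"),
        "status": int(m.group("status")),
        "bytes": 0 if m.group("size") == "-" else int(m.group("size")),
    }


sample = ('192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /promo/today.html HTTP/1.0" 200 2326')
row = parse_clf_line(sample)
```

Each row is one isolated page retrieval event. The `user` column is empty for most requests, which reflects the anonymous, stateless nature of the raw data; joining rows to customers needs application-derived data such as a login or cookie.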
Why Bring DW to Web? • Primary function of DW is to publish information – the web is a good partner • Need a distributed DW – the web provides universal connectivity • Universal front-end – web browser
Web Pushes Data Warehouse • User interface effectiveness measurable • Queries and updates mixed • Speed expected – 10 second rule • Global • 24 x 7 expected • International characters, dates, addresses • Expanded multimedia • Animation, zoomable images, maps, video clips • Need material in digital form • Enterprise information portal will require items to be searchable
Web Pushes Data Warehouse • Mass customization • Dynamically created web pages – XML • Fully distributed • Linking together all the data marts • Security and Privacy • Publish only to those who need to know • User profiles and access profiles defined in one place • Full-time expert security person
Second Generation User Interface Guidelines • Near-instantaneous performance • Website Design • Design for lowest common denominator • Measure page performance on a continuous basis • Paint navigation buttons immediately • Disclose content progressively • Implement page caching • Cache data, reports • Improve web server bandwidth • Improve server throughput
Second Generation User Interface Guidelines • Data Webhouse design • Adapt all web design responses • Select appropriate DBMS software – dimensional models, OLAP • Use indexes, aggregations • Partition files • Increase RAM • Use parallel processing
Meet User Expectations • Website design • Site navigation choices • Help choices • Communication with various groups – response must be assured • Headlines serious and define content • Indicate off-screen material • Survey customer needs and wants
Meet User Expectations • Data Webhouse design • Report library • Folder of previous queries, reports … • Dimension browser – viewing dimensions can assist report creation • Business metadata interface – understand the organization's data assets
Streamline Process • Business processes designed from ground up to work seamlessly on web • Website design • Reengineer to streamline process and make navigation easier, uniform interfaces • Remove barriers to reaching page • Minimize clicks and new windows • Allow interruption and return
Streamline Process • Data Webhouse design • Build an explicit value chain for reporting and analysis around the application suite using conformed dimensions and facts • Drill across functions • Single user interface for reporting against all parts of business • Master report library and FAQs • Single login and single console access to webhouse
Reassure Users • Website Design • Map of processes • Data Webhouse design • Provide status and lineage of current data • Provide status of running reports • Active notification • Allow for entry of NA if data not available • Time stamped dimensions • Time stamped reports
Allow Problem Resolution • Website design • Allow backtracking, rollback, play forward • Keep old transactions • Easy error reporting • Acknowledge, track and follow-up all user inputs, show wait time • Assist searching • Data Webhouse design • Provide adequate end user support • Show aggregates in use and available • Show system load and percent completed
Build Trust • Clearly state and observe website’s policies for using customer’s identity • Website design • Do not abuse privacy • Link to privacy statement • Use friendly pictures of people • Distinguish between ad content and editorial content
Build Trust • Data Webhouse design • Two-factor security • What you know – password • What you possess – token • Track changes in employee and contractor status • Create and enforce roles for employees, contractors and customers • Manage webhouse security directly
Provide Communication Hooks • Website design • Provide useful links to others – internal and external • Remove links that invalidate the “back” button • Use copyable URLs • Use URL as medium of distribution
Advantages of Web Today: 1998 vs. 2000 • Immediate worldwide access • Centralized management - Decentralized • Thin client • Multi-platform (client and server) - Distributed • Little or no software distribution - Downloads A+
Disadvantages of Web Today: 1998 vs. 2000 • Immature technology - Teenager • Security - Solutions • Speed restricted by bandwidth - data and logic must both travel across internet • Design limited to least common denominator or access restricted to specific browser
Vulnerabilities • Physical assets • Information assets • theft • modification • Software assets • Ability to conduct business
Web Architecture (diagram) • Thin client applications: browser, applets/ActiveX, email, spreadsheet, word-processing • Communication layer (network/internet) • Internet server: analysis/statistics, graphics, report writer, SQL query • OLAP server: multidimensional summary/alternative database • Database servers: relational tables • Data warehouse: relational database
Business Management through Information • Analysis of historical records • order processing, inventory levels, shipments, receivables, customer history, etc. • Goals include: • Measures of efficiency • Anticipate changes (planning and forecasting) • Make adjustments • Integration of model and control function
Rule-Based Management • Create Strategic rules • IF market demand increases THEN implement marketing campaign A3 • IF profit margin drops below value X THEN adjust overhead by … • Must not forget alert rules • If unanticipated condition, then notify CFO • Must not be too reactive • would cause thrashing
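The three kinds of rule on this slide can be sketched together: strategic IF/THEN rules, an alert rule for unanticipated conditions, and a cooldown that stops a rule from firing every period (one simple way to avoid thrashing). All thresholds, the expected margin range, the cooldown length, and the metric names are assumptions made for illustration; the slide leaves "value X" and the overhead adjustment unspecified.

```python
MARGIN_FLOOR = 0.15           # stands in for the slide's "value X" (assumed)
DEMAND_GROWTH = 0.10          # assumed threshold for "market demand increases"
EXPECTED_MARGIN = (0.0, 0.6)  # a margin outside this range counts as unanticipated

# Each strategic rule is a (condition, action) pair.
STRATEGIC_RULES = [
    (lambda m: m["demand_growth"] > DEMAND_GROWTH, "implement marketing campaign A3"),
    (lambda m: m["profit_margin"] < MARGIN_FLOOR, "adjust overhead"),
]


def evaluate(metrics, last_fired, period, cooldown=3):
    """Return (actions, alerts) for one reporting period.

    last_fired maps an action to the period in which it last fired, so a rule
    cannot fire again until `cooldown` periods have passed.
    """
    actions, alerts = [], []
    low, high = EXPECTED_MARGIN
    if not low <= metrics["profit_margin"] <= high:
        alerts.append("notify CFO")  # alert rule: unanticipated condition
    for condition, action in STRATEGIC_RULES:
        recently = period - last_fired.get(action, -cooldown) < cooldown
        if condition(metrics) and not recently:
            actions.append(action)
            last_fired[action] = period
    return actions, alerts


history = {}
growing = {"demand_growth": 0.2, "profit_margin": 0.3}
first = evaluate(growing, history, period=0)
second = evaluate(growing, history, period=1)  # same condition, still in cooldown
shock = evaluate({"demand_growth": 0.0, "profit_margin": -0.1}, history, period=2)
```

In the first period the campaign rule fires. In the second it is suppressed even though demand is still growing. In the third, a negative margin triggers both the overhead rule and the CFO alert.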
OLDM Decision Process • Simultaneous capture of: • Decision support information • Surveyed customer on-line in exchange for an additional discount • with business function inputs • Immediate computation or estimation of secondary information • based on planning and forecasting rules • Decision support information is: • available on-line • ready to use “as is” • Management defined!
OLDM Decision Process • Derived data becomes control information • Automation of analysis and decision support • immediately available to management • Problems documented on-line • Classes of problem and corrective action codified • problem recognition • decision rules
OLDM Decision Process • Requires four types of information • Characteristics which identify a class of problem • Corrective action ( management responses by problem class) • Rules to implement actions • Record of result
Potential of OLDM • Better managed business • knowledge asset capture and retention • consistency across enterprise • flexible, highly responsive • Close loop with customer • event and market driven but controlled • Direct customer interaction • via web, telephone, remote connection • Improved systems capacity planning and system management • Re-alignment of business and IT