Introduction to Data Warehousing
Janet Delve
Data Warehouse Introduction
Overview
• Review of Relational Databases and Normalisation
• Introduction to Data Warehousing (Byte article)
Books
• Inmon, W. H., Building the Data Warehouse, Wiley, 2002
• Kimball, R., Ross, M., The Data Warehouse Toolkit, Wiley, 2002
• Barquin, R. C., Edelstein, H. A., Building, Using and Managing the Data Warehouse, Prentice Hall, 1997
• Connolly, T., Begg, C., Database Systems - A Practical Approach to Design, Implementation and Management, Addison Wesley
The Byte article
• The article is about data mining (DM), also called knowledge discovery (KD), and consists of three articles:
1. The Data Gold Rush, which looks at the uses of DM: ‘finding the nugget of gold in the mountain of data slag.’
  • DM has an enormous range of applications: customer purchasing, analysing legal decisions, astronomy, discovering patterns in health care.
2. A Data Miner’s Tools, which covers three types of software:
  • query-and-reporting tools;
  • multidimensional analysis tools;
  • intelligent agents.
3. Data Mining Dynamite, which looks at the processes needed to support DM.
  • Data needs cleansing of unnecessary fields and storing in a convenient form.
  • DM uses data warehouses (DWs) and parallel computers.
  • There are short-term gains for businesses whose ‘advertising will target customers with new precision.’
  • Long-term gains: new discoveries?
The Data Gold Rush
• Databases now store vast amounts of data: credit card purchases, point-of-sale (POS) transactions, detailed pictures of galaxies.
• Data must be turned into information to guide marketing strategy etc. Wal-Mart uploads 20 million POS transactions every day.
• DM describes past trends and predicts future trends.
• The DM process begins with the business problem.
• The DM analyst supports the business analyst and needs to identify their data sources and experience.
• DM process diagram: p. 84.
• Spotlight: POS example, p. 84. Many products, wide geographical area, 125 weeks.
• At AT&T Labs, knowledge representation tools describe database contents, thus producing meta-data.
• DM tools search for patterns; top-down or bottom-up methods are used for this.
• People use DM to increase profitability.
• Major corporations are involved in DM research and development: IBM, Microsoft, General Motors etc.
• Products used for DM range from:
  * OLAP (On-Line Analytical Processing) tools such as Essbase, and
  * DSS (Decision Support Systems) agents, to
  * DM tools including some AI techniques, and
  * advanced DM tools.
• OLAP tools: DM or just ‘fancy query tools’?
• Specific mining tools exist for e.g. finance and health. Sales and Marketing Solutions packs have 70% of the work done for the client, who tailors the remaining 30%.
• P. 86: health applications; p. 88: SKICAT, credit-card fraud, lending decisions, stocks.
• This article enquired whether one day we would be able to mine the internet. The answer is yes, with Google mega searches and webhouses full of clickstream data, e.g. Amazon.com personalised info.
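A bottom-up pattern search can be illustrated with a very small sketch: start from the raw POS baskets and count which product pairs turn up together, without stating a hypothesis first. The baskets, product names and the `min_count` threshold below are invented for illustration; this is not the method of any particular tool named in the article.

```python
from collections import Counter
from itertools import combinations

# Hypothetical POS baskets: each set is the products in one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "pens"},
    {"milk", "batteries"},
    {"bread", "milk", "batteries"},
]


def frequent_pairs(baskets, min_count):
    """Count co-occurring product pairs and keep those seen at least min_count times."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair always appears in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


pairs = frequent_pairs(transactions, min_count=2)
```

With these invented baskets, only the bread/milk and batteries/milk pairs pass the threshold. A real tool would work on millions of transactions and use far more efficient search.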
A Data Miner’s Tools
• DM reveals new relationships and patterns.
• Humans are good at detecting anomalies; DM tools are good at detecting patterns.
• Intelligent agents. These are set up by experts, then need little maintenance. They can work on text. They are good for discovering unsuspected relationships.
• Multidimensional analysis (MDA) tools. These represent data as n-dimensional matrices called hypercubes (OLAP). They are good for iterative, interactive, hands-on exploration of data.
• Query-and-reporting tools. These need close direction to frame queries and work on a database structure. They are best at asking specific questions to verify hypotheses. They slow the operational system down, hence the need for a DW.
• Delta Airlines: frequent-flier program.
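The hypercube idea can be sketched in a few lines: each cell is addressed by one coordinate per dimension, and exploration consists of summing over some dimensions (roll-up) or fixing one (slice). The dimensions, figures and the helper names `roll_up` and `slice_cube` are assumptions made for this example, not the API of Essbase or any other OLAP product.

```python
from collections import defaultdict

# Hypothetical sales cube: (product, region, week) -> units sold.
DIMENSIONS = ("product", "region", "week")
cube = {
    ("pens", "north", 1): 10,
    ("pens", "south", 1): 4,
    ("pens", "north", 2): 7,
    ("batteries", "north", 1): 3,
    ("batteries", "south", 2): 5,
}


def roll_up(cells, keep):
    """Sum the cube over every dimension that is not named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for coords, value in cells.items():
        totals[tuple(coords[i] for i in idx)] += value
    return dict(totals)


def slice_cube(cells, dimension, value):
    """Keep only the cells whose coordinate on `dimension` equals `value`."""
    i = DIMENSIONS.index(dimension)
    return {coords: v for coords, v in cells.items() if coords[i] == value}


by_product = roll_up(cube, keep=("product",))
by_region_week = roll_up(cube, keep=("region", "week"))
north_only = slice_cube(cube, "region", "north")
```

Repeating such roll-ups and slices with different choices of dimension is the iterative, hands-on exploration the slide refers to.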
Data Mining Dynamite
• Data must be ‘cleansed’: one bank had nominal information stored in 13 different formats in its various databases. Errors due to duplicate data, different formats etc. need to be eliminated. Fig. top of p. 98: telephone company.
• Once cleaned, data is transported to the DW, but thought needs to be given to how the data is represented.
• The DW is a server-based replication of a mainframe’s data. It regularly receives updated info from the mainframe.
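As a rough sketch of what cleansing involves, the example below reduces customer names held in several formats to one canonical form and then drops the duplicates this exposes. The sample names, the `TITLES` set and both function names are hypothetical; a real cleansing job would need many more rules than this.

```python
import re

# Assumed list of courtesy titles to discard; illustrative only.
TITLES = {"mr", "mrs", "ms", "dr"}


def normalise_name(raw):
    """Reduce a name to a canonical lowercase 'first last' form."""
    text = raw.strip().lower()
    if "," in text:
        # Turn 'last, first' into 'first last'.
        last, first = [part.strip() for part in text.split(",", 1)]
        text = f"{first} {last}"
    words = [w.strip(".") for w in re.split(r"\s+", text)]
    return " ".join(w for w in words if w and w not in TITLES)


def deduplicate(records):
    """Return the canonical names in first-seen order, without repeats."""
    seen = set()
    cleaned = []
    for raw in records:
        key = normalise_name(raw)
        if key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned


raw_names = ["Smith, John", "JOHN SMITH", "  john   smith ", "Mr. John Smith", "Jones, Ann"]
clean_names = deduplicate(raw_names)
```

Here four differently formatted entries collapse into one customer, which is the kind of duplicate the slide warns about.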
Data Mining Dynamite
• The database on the DW then handles queries from the client machine independently of the mainframe.
• The DW contains integrated and summary information.
• DWs are built by specialists: often expensive but worth it. 1,000 ROI.
• One-Query Theorem: there may be one query that will revolutionize your business. Hopefully, DM can find this for you and be self-financing.
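To make ‘summary information’ concrete, the sketch below loads a few detailed sales rows into an in-memory SQLite database and derives a weekly summary table that client queries can read without touching the detail. The table names, columns and figures are invented for illustration, and SQLite merely stands in for whatever database a real DW would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical detail table, standing in for data replicated from the mainframe.
conn.execute("CREATE TABLE sales (store TEXT, product TEXT, week INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("leeds", "pens", 1, 12.0),
        ("leeds", "pens", 1, 8.0),
        ("leeds", "batteries", 1, 5.0),
        ("york", "pens", 2, 3.0),
    ],
)

# Summary table: one row per store and week instead of one row per sale.
conn.execute(
    "CREATE TABLE weekly_summary AS "
    "SELECT store, week, SUM(amount) AS total, COUNT(*) AS n_sales "
    "FROM sales GROUP BY store, week"
)

summary = conn.execute(
    "SELECT store, week, total, n_sales FROM weekly_summary ORDER BY store, week"
).fetchall()
conn.close()
```

In practice the summary would be rebuilt or incrementally updated each time the DW receives fresh data from the mainframe.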
Data Mining Dynamite
• Meta-data describes the contents of the DW.
• A DW may be set up for a particular area, but once it is up and running, other users often appear wanting to mine different areas. E.g. a UK store chain used its DW to analyse customer purchasing, but the chain’s accounting dept. also used the DW and discovered that loss due to theft of pens and batteries was substantial (p. 99).
Data Mining Dynamite
• Storage is now extremely important: some companies lock away their storage devices.
• Storage technologies such as RAID (redundant arrays of independent disks) are becoming increasingly parallel.
• A quicker way to read data off the storage disk is needed, otherwise storage becomes a bottleneck for DM applications.
• Do all companies have a DW?