280 likes | 451 Views
Data Management & Data Warehouses. MIS 320 Kraig Pencil Summer 2014. Game Plan. Introduction Why use a relational database? Database management systems Data warehouses Data mining Data marts. A. Why use a relational database?.
E N D
Data Management & Data Warehouses MIS 320 Kraig Pencil Summer 2014
Game Plan • Introduction • Why use a relational database? • Database management systems • Data warehouses • Data mining • Data marts
A. Why use a relational database? 1. A database sounds great, but why don’t we just store all our data in one big table in an Excel spreadsheet? • Example: Can you foresee any hassles or potential difficulties associated with entering/storing order information in the following Excel table?
A. Why use a relational database? 1. Why don’t we just store our data in one spreadsheet table? (cont.) • Potential problems • May have “redundant” data entry • Potential for data entry errors (different/wrong phone numbers) • Updates can be a hassle/inefficient (e.g., change phone no) • Solution • “Normalize” the data … Break up the table into a set of linked tables in a data base (instead of having one spreadsheet) • See example
Example: Normalized Tables(and the advantages of a database) • Questions: • Any unneeded redundancy? • Is it now efficient to update customer info? • Where is the foreign key?
Example: Non-Normalized Data Table for an Auto Shop (Rainer & Turban, Fig 4.6) Examples of redundancy
B. Database Management Systems 1. What is a “database management system” (DBMS)? • SW that allows one to create, store, organize, manage, and use data • Example of a DBMS? 2. Key components • Data Definition subsystem • Data Manipulation subsystem • Application Generation subsystem • Data Administration subsystem • DBMS engine
DBMS Components Lab Tutorials 1,2 Lab Tutorials 3,5 Lab Tutorials 4,6
B. Database Management Systems 3. Examples of DBMS components in Access Data Definition subsystem • Data dictionary (“Design view” for a table) Data Manipulation subsystem: Move, change, and “ask questions” • View of a table (“Datasheet view” for a table) • Query-by-example (QBE) tool • Structured query language (SQL) Application Generation subsystem: the “front end” • Design of forms and reports Data Administration subsystem • Optimize query performance • Security settings with password
B. Database Management Systems 4. What aspects of data need to be specified? • Lots of aspects!!! • Recall table creation in MS Access (Tutorials 1 & 2) • Common data properties • Data “type” (number, text, date, etc.) • Description • Field size • Required/not required • Etc. • An important reference for a database system: Data dictionary • Stores information about the data in a database
Access Example: Data “type” (number, text, date, etc.) Description Field size Required/not required Information about the “Gender” field is specified in “Field Properties” section
Access Example: Data Manipulation Subsystem(Low Stock Products query) QBE or SQL may be used to prepare a query. Which approach would be easier for most people?
Access Example: Application Generation Subsystem(Employer Information Form)
Access Example: Data Administration(Performance Analysis for a Database)
B. Database Management Systems (cont.) 5. DBMS: Example products • You are very likely to work with – and possibly help develop a database– using one or more of the following: • Small-Midsize DBMS: MicrosoftAccess, dBase, Paradox • Mid-to-Large DBMS: MicrosoftSQL Server, Oracle, DB2, Informix, IMS
C. Data Warehouses • Business problem: • Difficult for larger organizations to analyze organizational data from multiple sources • Solution: Data warehouse • Gather/integrate information from existing operational databases into a “warehouse” • Create “Business Intelligence” system • See next figure
Create a Data Warehouse from Operational Databases From Haag, et al., MIS for the Information Age, 2004
C. Data Warehouses (cont.) 3. Data warehouse features • Designed to support business decision making • Not transactions! • Supports OLAP • On-line Analytical Processing • Crosses functional boundaries of an organization • Can be very large • Note: Warehouse is “read only” • Why? • Can be a significant strategic resource for a company • Can yield a high ROI • Examples • ???
C. Data Warehouses (cont.) • Implementation issues • People may be reluctant to share information • “ETL” process is not easy • Extraction, transformation, load • Expensive
D. Data Mining • Provides a means to extract patterns and relationships from large amount of data (e.g., a data warehouse) • Mining analogy • Sift through raw dirt/rock to find something of value • Large volumes of data are sifted in an attempt to find something worthwhile • Example: market basket analysis • Identify products that may be attractive to a customer • See next slide: Amazon.com buyer suggestions
D. Data mining (cont.) 4. Identify previously unknown patterns • e.g., What are characteristics of customers likely to default on a bank loan?“Target knows before it shows” • How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did • How Companies Learn Your Secrets: NYTimes • e.g., Suppose you discovered that beer and diapers*were often found in the same purchase? • “Market basket analysis” • What could you do with that information to improve sales of one, the other or both? *This is a common example, not an actual case.
E. Data Marts 5. Data marts • Warehouses can be overwhelming/difficult to implement … Some organizations create “data marts” • A subset of a data warehouse • Simpler, scaled-down version • Focuses on/Integrates a specific area (e.g., Sales department) • Provides useful decision making tools Haggen photo from: www.callhugh.com/ ferndale.php MiniMart photo from: http://www.ae.gatech.edu/research/controls/pictures/f020801_gtar/Mini%20Mart.JPG
Data Mining – Business Intelligence • A few videos to watch and think about … • http://www.youtube.com/user/SASsoftware?v=C14GVhNt7Do&feature=pyv&ad=4782573666&kw=CRM • http://www.youtube.com/user/ibm?#p/c/13/fFdITHMuy2w • http://www.youtube.com/user/SASsoftware?v=2677nWVNg9M&feature=pyv&ad=4782551166&kw=business%20analytics • http://www.youtube.com/watch?v=El_lSd6G5WU • http://www.youtube.com/watch?v=uP89kaDU40c • http://www.youtube.com/user/SASsoftware?v=C14GVhNt7Do&feature=pyv&ad=4782573666&kw=CRM#p/u/35/ecqk0JUKvAI
Big Data • Big data[1][2] is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. (Wikipedia) • (Image)
Global Big Data: + 2.5 exobytes/day • The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s[15] • As of 2012, every day 2.5 quintillion (2.5×1018) bytes of data were created.[16] • (Wikipedia) • (Image)
Big Data • The next frontier in data? • http://www.eweek.com/c/a/Data-Storage/Big-Data-Analytics-Is-Just-Starting-to-Reach-Its-Potential-10-Reasons-Why-457684/?kc=EWKNLEAU07102012STR1 • Some terms: • Hadoop (distributed file organization) • Distributed databases and server clusters • Cassandra (No only SQL DBMS) • MapReduce (breaking computation into smaller pieced, then combining the results of each computation)