1 / 28

Data Management & Data Warehouses

Data Management & Data Warehouses. MIS 320 Kraig Pencil Summer 2013. Game Plan. Introduction Why use a relational database? Database management systems Data warehouses Data mining Data marts. A. Why use a relational database?.

Download Presentation

Data Management & Data Warehouses

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Management & Data Warehouses MIS 320 Kraig Pencil Summer 2013

  2. Game Plan • Introduction • Why use a relational database? • Database management systems • Data warehouses • Data mining • Data marts

  3. A. Why use a relational database? 1. A database sounds great, but why don’t we just store all our data in one big table in an Excel spreadsheet? • Example: Can you foresee any hassles or potential difficulties associated with entering/storing order information in the following Excel table?

  4. A. Why use a relational database? 1. Why don’t we just store our data in one spreadsheet table? (cont.) • Potential problems • May have “redundant” data entry • Potential for data entry errors (different/wrong phone numbers) • Updates can be a hassle/inefficient (e.g., change phone no) • Solution • “Normalize” the data …  Break up the table into a set of linked tables in a data base (instead of having one spreadsheet) • See example

  5. Example: Normalized Tables(and the advantages of a database) • Questions: • Any unneeded redundancy? • Is it now efficient to update customer info? • Where is the foreign key?

  6. Example: Non-Normalized Data Table for an Auto Shop (Rainer & Turban, Fig 4.6) Examples of redundancy

  7. B. Database Management Systems 1. What is a “database management system” (DBMS)? • SW that allows one to create, store, organize, manage, and use data • Example of a DBMS? 2. Key components • Data Definition subsystem • Data Manipulation subsystem • Application Generation subsystem • Data Administration subsystem • DBMS engine

  8. DBMS Components Lab Tutorials 1,2 Lab Tutorials 3,5 Lab Tutorials 4,6

  9. B. Database Management Systems 3. Examples of DBMS components in Access Data Definition subsystem • Data dictionary (“Design view” for a table) Data Manipulation subsystem: Move, change, and “ask questions” • View of a table (“Datasheet view” for a table) • Query-by-example (QBE) tool • Structured query language (SQL) Application Generation subsystem: the “front end” • Design of forms and reports Data Administration subsystem • Optimize query performance • Security settings with password

  10. B. Database Management Systems 4. What aspects of data need to be specified? • Lots of aspects!!! • Recall table creation in MS Access (Tutorials 1 & 2) • Common data properties • Data “type” (number, text, date, etc.) • Description • Field size • Required/not required • Etc. • An important reference for a database system:  Data dictionary • Stores information about the data in a database

  11. Access Example: Data “type” (number, text, date, etc.) Description Field size Required/not required Information about the “Gender” field is specified in “Field Properties” section

  12. Access Example: Data Manipulation Subsystem(Low Stock Products query) QBE or SQL may be used to prepare a query. Which approach would be easier for most people?

  13. Access Example: Application Generation Subsystem(Employer Information Form)

  14. Access Example: Data Administration(Performance Analysis for a Database)

  15. B. Database Management Systems (cont.) 5. DBMS: Example products • You are very likely to work with – and possibly help develop a database– using one or more of the following: • Small-Midsize DBMS: MicrosoftAccess, dBase, Paradox • Mid-to-Large DBMS: MicrosoftSQL Server, Oracle, DB2, Informix, IMS

  16. C. Data Warehouses • Business problem: • Difficult for larger organizations to analyze organizational data from multiple sources • Solution: Data warehouse • Gather/integrate information from existing operational databases into a “warehouse” •  Create “Business Intelligence” system • See next figure

  17. Create a Data Warehouse from Operational Databases From Haag, et al., MIS for the Information Age, 2004

  18. C. Data Warehouses (cont.) 3. Data warehouse features • Designed to support business decision making • Not transactions! • Supports OLAP • On-line Analytical Processing • Crosses functional boundaries of an organization • Can be very large • Note: Warehouse is “read only” • Why? • Can be a significant strategic resource for a company • Can yield a high ROI • Examples • ???

  19. C. Data Warehouses (cont.) • Implementation issues • People may be reluctant to share information • “ETL” process is not easy • Extraction, transformation, load • Expensive

  20. D. Data Mining • Provides a means to extract patterns and relationships from large amount of data (e.g., a data warehouse) • Mining analogy • Sift through raw dirt/rock to find something of value • Large volumes of data are sifted in an attempt to find something worthwhile • Example: market basket analysis • Identify products that may be attractive to a customer • See next slide: Amazon.com buyer suggestions

  21. Data Mining: Example of pattern discovered via mining

  22. D. Data mining (cont.) 4. Identify previously unknown patterns • e.g., What are characteristics of customers likely to default on a bank loan?“Target knows before it shows” • How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did • How Companies Learn Your Secrets: NYTimes • e.g., Suppose you discovered that beer and diapers*were often found in the same purchase? • “Market basket analysis” • What could you do with that information to improve sales of one, the other or both? *This is a common example, not an actual case.

  23. E. Data Marts 5. Data marts • Warehouses can be overwhelming/difficult to implement …  Some organizations create “data marts” • A subset of a data warehouse • Simpler, scaled-down version • Focuses on/Integrates a specific area (e.g., Sales department) • Provides useful decision making tools Haggen photo from: www.callhugh.com/ ferndale.php MiniMart photo from: http://www.ae.gatech.edu/research/controls/pictures/f020801_gtar/Mini%20Mart.JPG

  24. Data Marts: Subsets of Data Warehouse

  25. Data Mining – Business Intelligence • A few videos to watch and think about … • http://www.youtube.com/user/SASsoftware?v=C14GVhNt7Do&feature=pyv&ad=4782573666&kw=CRM • http://www.youtube.com/user/ibm?#p/c/13/fFdITHMuy2w • http://www.youtube.com/user/SASsoftware?v=2677nWVNg9M&feature=pyv&ad=4782551166&kw=business%20analytics • http://www.youtube.com/watch?v=El_lSd6G5WU • http://www.youtube.com/watch?v=uP89kaDU40c • http://www.youtube.com/user/SASsoftware?v=C14GVhNt7Do&feature=pyv&ad=4782573666&kw=CRM#p/u/35/ecqk0JUKvAI

  26. Big Data • Big data[1][2] is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. (Wikipedia) • (Image)

  27. Global Big Data: + 2.5 exobytes/day • The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s[15] • As of 2012, every day 2.5 quintillion (2.5×1018) bytes of data were created.[16] • (Wikipedia) • (Image)

  28. Big Data • The next frontier in data? • http://www.eweek.com/c/a/Data-Storage/Big-Data-Analytics-Is-Just-Starting-to-Reach-Its-Potential-10-Reasons-Why-457684/?kc=EWKNLEAU07102012STR1 • Some terms: • Hadoop (distributed file organization) • Distributed databases and server clusters • Cassandra (No only SQL DBMS) • MapReduce (breaking computation into smaller pieced, then combining the results of each computation)

More Related