470 likes | 552 Views
CHAPTER SIX. Databases and Data Warehouses. Information Granularity. Refers to the level of detail of information Detailed (POS transaction) Course (Global sales totals). Transactional vs. Analytical Information. Transactional information comes from a business process A bank deposit
E N D
CHAPTER SIX Databases and Data Warehouses
Information Granularity • Refers to the level of detail of information • Detailed (POS transaction) • Course (Global sales totals)
Transactional vs. Analytical Information • Transactional information comes from a business process • A bank deposit • A credit card charge • Analytical information uses transactional data for the purposes of decision making • Account balance trends • Using credit card history to detect fraud
Information Dimensions • Information timeliness • Obsolete information is useless • Today’s information needs to be provided in real time or near real time • Information quality • Wrong information is useless • Redundant information can be the cause of errors • Information must be complete • Data inconsistency and data integrity
Database Management • Characteristics • Complex • Databases often spread across multiple servers • Databases often spread across multiple physical disks • Fault tolerance is critical • Databases may be distributed
Database Vendors • The industry has consolidated • IBM • DB2 Universal • Oracle • Microsoft • SQL Server • Access • Sun (MySQL) • Is now Oracle
Database Performance • Transaction Processing Performance Council provides standard benchmarks • TPC-C – Online transaction processing • TPC-E – Online brokerage transactions • TPC-H – Ad-hoc decision support • TPC-W – Web / E-commerce
Database Performance (TPC-C) • Multiple transaction types • Independent of software and hardware • Scalable • Basis is online transaction processing (OLTP)
1960s Data Management • These are legacy systems • Batch processing • Characterized by traditional file processing • Data processing was sequential • Not possible to directly locate a particular file record • Data dependent on the programs that used the data • Program data dependence
1970s Data Management • Batch processing gives way to on line transaction processing • Files stored on disk rather than tape • Any record can be located in the same amount of time • Technologies • Indexed Sequential Access Method (ISAM) • Virtual Sequential Access Method (VSAM) • Direct Access files • Use a hashing function to derive record keys
1980s Data Management • Databases are becoming commonplace • Personal computer databases are evolving • DBase • R-Base
1990s Data Management • Huge data stores and transaction processing capabilities • Distributed databases • Object-oriented databases • 6 Million+ transactions per second
Realities of a DBMS • Data centric rather than application centric • Can be a repository for all an organization’s data • Databases tend to be centralized • Queries get data from a DBMS • SQL is the standard query language • Report generators create printed and Web-based reports • Applications interface with DBMS
Types of Databases • Database models include: • Hierarchical database model – A tree-based structure • Network database model – Mathematically, a directed graph • Relational database model – stores information in the form of logically related two-dimensional tables • Object-oriented databases
Elements of a Database • Logical view and physical view • Users see and work with the logical view • Physical view is controlled by the database management system itself
Entities and Attributes • Relational databases store information in tables (entities) • Customer / order / product • Tables contain fields (attributes) • Customer name, address
Keys • Each table has a primary key that uniquely identifies each record • Natural keys have some meaning (stock symbol) • Artificial keys have no intrinsic meaning (your R number) • Foreign keys are used to link tables in one-to-many relationships
Advantages of an RDMS (Scalability) • Database can scale to the terabyte or petabyte range • NSA maintains 1.9 trillion telephone call records • Large databases can span several servers and storage devices
Advantages of an RDBMS (Redundancy) • Databases can be configured to write duplicate (redundant) information • Citibank • Journaling and checkpointing are supported
Advantages of an RDBMS (Integrity) • Relational integrity constraints are rules that apply to the relationships between tables • Business integrity constraints enforce business rules • Not really a part of the DBMS itself
Advantages of an RDBMS (Information Security) • A DBMS supports advanced access rights • By table and fields • By time of day • By location • By row information
Data-driven Web Sites • Nearly all transactional Web sites rely on a database • Amazon • Your bank • Any shopping cart application • Ebay or Craig’s List • Facebook and You Tube
Database Integration • Databases often need to be integrated • Because of mergers and acquisitions • Because of organizational changes • We are referring to connections to multiple databases
Data Warehouses (Introduction) • Central source for clean data • May contain internal or external data • Use to spot hidden patterns in data • May be integrated with operational database • Parts of a data warehouse are called datamarts • Data warehouses contain an analytical component
Cleansing Data • Data is often obtained from a myriad of sources • External lists • Internal databases • Other databases • This data must be cleansed and sanitized to remove • Redundancy / errors / etc…
Multidimensional Analysis • Data are often analyzed as 3-dimensional cubes • Cubes are then ‘sliced and diced’ to look at various layers
Database Design (Introduction) • In the systems process, we design before we implement • Requirements specification • Conceptual design • Logical design • Physical Design
Database Design Tools • Unified Modeling Language (UML) • Visio • Rational Rose • Entity relationship diagrams describes relationships between data • Normalization eliminates redundant data
Database Management HR • Database administrators • Data managers • Programmers and systems analysts • Data security
Business Intelligence (Introduction) • Simply put, it’s internal and external data used to support better decision making • It’s challenging to sift through the mountains of data • It requires cross-functional collaboration between systems • More in the next chapter but we use ERP systems to improve business intelligence
Business Intelligence (Industries) • BI applies to all industries • Retail and sales • Understanding procurement and distribution (SCM) / customers (CRM) • Banking • Understand credit worthiness / fraud behavior • Insurance • Forecast claim risk and understand at – risk customers
Business Intelligence (Industries) • Airlines • Routing planes / minimize turnaround time (Southwest) • Marketing • Demographics • Sell based on known customer behavior (Harrah’s) • Amazon
Business Intelligence (Levels) • Operational • Day-to-day operations (building a Dell) • Tactical • Short term (Dell ordering supplies) • Strategic • Long term organizational goals • The systems that provide BI typically do so at all levels
BI and Latency • From the time of acquisition, how long does it take to analyze (analysis latency) • Time to make a decision based on the analysis • E-transactions significantly reduce latency
Data Mining (Introduction) • Data gets mined (analyzed) from data contained in a data warehouse or data mart • Specialized tools are used to analyze data for ‘interesting nuggets’ • Ways to mine • Drill down (general to specific) • Drill up (specific to general)
Data Mining (Clustering) • Cluster analysis groups data by trait or traits • Examples • Don’t drink the water in Fallon • Segment customers by zip codes
Data Mining (Association) • Answers the question “What traits are associated with other traits” • When I stay at Harrah’s, • I gamble • I eat at the Sage room • When I stay in Vegas, • I gamble more
Data Mining (Statistical Analysis) • It’s basic statistics • Analysis of variance • Correlation coefficients • Etc…
BI Benefits • We can understand what’s happening inside and outside a department • Sales knows about product inventory levels and production schedules • Production knows about sales and sales forecasts • Finance knows about the sales forecasts too • This information is provided in near real time
Quantifying BI • Some benefits can be clearly quantified • Costs went down • Productivity increased • Inventory levels were optimized 10% • Some are indirectly quantified • Some benefits are intangible • Sometimes, we get unexpected results