Learn the importance of information quality for making sound business decisions, including accuracy, completeness, consistency, uniqueness, and timeliness. Understand the impact of poor information quality on costs and business operations.
CHAPTER 6 DATABASES AND DATA WAREHOUSES
UNDERSTANDING INFORMATION • Information is everywhere in an organization • Data are raw facts that describe the characteristics of an event • Sales event – date, item number, item description, quantity ordered, customer name, shipping details • Information is data converted into a meaningful and useful context • Sales event – best/worst selling item, best/worst customer • Employees must be able to obtain and analyze the many different levels, formats, and granularities of organizational information to make decisions • Successfully collecting, compiling, sorting, and analyzing information can provide tremendous insight into how an organization is performing
UNDERSTANDING INFORMATION • Information granularity – refers to the extent of detail within the information (fine and detailed or coarse and abstract) • Levels • Formats • Granularities
Information Quality • Business decisions are only as good as the quality of the information used to make the decisions • Characteristics of high quality information include: • Accuracy Are all the values correct? Is the name spelled correctly? Is the dollar amount recorded properly? • Completeness Are any of the values missing? Is the address complete including street, city, state, and zip code? • Consistency Is aggregate or summary information in agreement with detailed information? • Do all total fields equal the true total of the individual fields? • Uniqueness Is each transaction, entity, and event represented only once in the information? • Are there any duplicate customers? • Timeliness Is the information current with respect to the business requirements? Is information updated weekly, daily, or hourly?
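Three of the characteristics above (completeness, uniqueness, and consistency) lend themselves to simple automated checks. The following is a minimal sketch, not a complete data-quality tool: the records, field names, and function names are hypothetical and chosen only for illustration, and amounts are kept in whole cents to avoid rounding problems.

```python
from collections import Counter

# Hypothetical customer records; names and addresses are invented for illustration.
customers = [
    {"id": 1, "name": "Customer A", "street": "1 Example Road", "city": "Denver", "state": "CO", "zip": "80210"},
    {"id": 2, "name": "Customer B", "street": "", "city": "Denver", "state": "CO", "zip": "80210"},
    {"id": 3, "name": "customer a ", "street": "1 Example Road", "city": "Denver", "state": "CO", "zip": "80210"},
]

REQUIRED = ("name", "street", "city", "state", "zip")


def incomplete_ids(records):
    """Completeness: ids of records with any required field missing or blank."""
    return [r["id"] for r in records if any(not r.get(f) for f in REQUIRED)]


def duplicate_names(records):
    """Uniqueness: normalized names that appear more than once."""
    counts = Counter(r["name"].strip().lower() for r in records)
    return sorted(name for name, n in counts.items() if n > 1)


def totals_consistent(line_amounts_cents, recorded_total_cents):
    """Consistency: does the summary total equal the sum of the detail lines?"""
    return sum(line_amounts_cents) == recorded_total_cents
```

Here `incomplete_ids(customers)` flags record 2 (blank street) and `duplicate_names(customers)` flags the two spellings of Customer A. Accuracy and timeliness usually need an outside reference (a verified address list, a business rule on how fresh the data must be), so they are not covered by this sketch.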
Information Quality • Low quality information example
Information Quality • Issue 1: Without a first name it would be impossible to correlate this customer with customers in other databases (Sales, Marketing, Billing, Customer Service) to gain a complete customer view (CRM) • Issue 2: Without a complete street address there is no possible way to communicate with this customer via mail or deliveries. An order might be sitting in a warehouse waiting for the complete address before shipping. The company has spent time and money processing an order that might never be completed • Issue 3: If this is the same customer, the company will waste money sending out two sets of promotions and advertisements to the same customer. It might also send two identical orders and have to incur the expense of one order being returned • Issue 4: This is a good example of where cleaning data is difficult because this may or may not be an error. There are many times when a phone and a fax have the same number. Since the phone number is also in the e-mail address field, chances are that the number is inaccurate • Issue 5: The business would have no way of communicating with this customer via e-mail • Issue 6: The company could determine the area code based on the customer’s address. This takes time, which costs the company money. This is a good reason to ensure that information is entered correctly the first time. All incorrect information needs to be fixed, which costs time and money
Understanding the Costs of Poor Information • The four primary sources of low quality information include: • Online customers intentionally enter inaccurate information to protect their privacy • Information from different systems has different entry standards and formats • Call center operators enter abbreviated or erroneous information by accident or to save time • Third party and external information contains inconsistencies, inaccuracies, and errors
Understanding the Costs of Poor Information • Potential business effects resulting from low quality information include: • Inability to accurately track customers • Difficulty identifying valuable customers • Inability to identify selling opportunities • Marketing to nonexistent customers • Difficulty tracking revenue due to inaccurate invoices • Inability to build strong customer relationships
Understanding the Costs of Poor Information • Poor information could cause the SCM system to order too much inventory from a supplier based on inaccurate orders • Poor information could cause a CRM system to send an expensive promotional item (such as a fruit basket) to the wrong address of one of its best customers • What occurs when you have the inability to build strong customer relationships? • Decreased seller power
Understanding the Benefits of Good Information • High quality information can significantly improve the chances of making a good decision • Good decisions can directly impact an organization's bottom line
DATABASE FUNDAMENTALS • Information is everywhere in an organization • Almost every business decision is based on information • Information is stored in databases • Database – maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses)
DATABASE FUNDAMENTALS • Database models include: • Hierarchical database model – information is organized into a tree-like structure (using parent/child relationships) in such a way that it cannot have too many relationships • Network database model – a flexible way of representing objects and their relationships • Relational database model – stores information in the form of logically related two-dimensional tables
DATABASE ADVANTAGES • Database advantages from a business perspective include • Increased flexibility • Increased scalability and performance • Reduced information redundancy • Increased information integrity (quality) • Increased information security • Spreadsheet limitations • Limited number of rows and columns (Excel 2003 and earlier - 65,536 rows by 256 columns) Once you use more than 65,536 rows you have outgrown your spreadsheet • Only one user can access the spreadsheet at a time • Users can view all information in the spreadsheet • Users can change all information in the spreadsheet
Increased Flexibility • A well-designed database should: • Handle changes quickly and easily • Provide users with different views • Have only one physical view • Physical view – deals with the physical storage of information on a storage device • Have multiple logical views • Logical view – focuses on how users logically access information
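The idea of one physical view with multiple logical views can be sketched with SQL views. This is an illustrative example only, using Python's built-in `sqlite3` module; the `employee` table, its rows, and the view names `directory` and `payroll` are assumptions made for the sketch.

```python
import sqlite3

# One physical table, stored once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    name TEXT,
    department TEXT,
    salary INTEGER
);
INSERT INTO employee VALUES (1, 'Ana', 'Sales', 52000), (2, 'Ben', 'Billing', 48000);

-- Two logical views over the same stored rows.
CREATE VIEW directory AS SELECT name, department FROM employee;
CREATE VIEW payroll AS SELECT id, salary FROM employee;
""")

directory_rows = conn.execute("SELECT * FROM directory ORDER BY name").fetchall()
payroll_rows = conn.execute("SELECT * FROM payroll ORDER BY id").fetchall()
```

Each group of users queries the view that suits its task, while the information itself is recorded in only one place.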
Increased Scalability and Performance • A database must scale to meet increased demand, while maintaining acceptable performance levels • Scalability – refers to how well a system can adapt to increased demands • Performance – measures how quickly a system performs a certain process or transaction
Reduced Redundancy • Databases reduce information redundancy by recording each piece of information in only one place • Redundancy – the duplication of information or storing the same information in multiple places; can lead to low quality information • Inconsistency is one of the primary problems with redundant information
Increased Integrity (Quality) • Information integrity – measures the quality of information • Integrity constraints – rules that help ensure the quality of information • Relational integrity constraint – rule that enforces basic and fundamental information-based constraints • Users cannot create an order for a nonexistent customer • An order cannot be shipped without an address • Business-critical integrity constraint – rule that enforces business rules vital to an organization’s success and often requires more insight and knowledge than relational integrity constraints • Product returns are not accepted for fresh product 15 days after purchase • A discount maximum of 20 percent
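The constraint examples above can be expressed directly in a relational schema. The following is a rough sketch using Python's built-in `sqlite3` module: the table and column names are hypothetical, and requiring an address on every order is a simplification of "an order cannot be shipped without an address". A foreign key covers the nonexistent-customer rule, `NOT NULL` covers the address rule, and a `CHECK` clause covers the 20 percent discount maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- relational
    ship_address TEXT NOT NULL,                                     -- relational
    discount_pct INTEGER NOT NULL DEFAULT 0
        CHECK (discount_pct BETWEEN 0 AND 20)                       -- business-critical
);
INSERT INTO customer VALUES (1, 'Example Customer');
""")

INSERT_ORDER = (
    "INSERT INTO orders (order_id, customer_id, ship_address, discount_pct) "
    "VALUES (?, ?, ?, ?)"
)


def try_insert(params):
    """Return True if the order is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(INSERT_ORDER, params)
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert((100, 1, "1 Example Road", 10))
unknown_customer = try_insert((101, 999, "1 Example Road", 0))
missing_address = try_insert((102, 1, None, 0))
excess_discount = try_insert((103, 1, "1 Example Road", 25))
```

Only the first order is accepted; the other three are rejected by the database itself, regardless of which application attempted the insert.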
Increased Security • Information is an organizational asset and must be protected • Databases offer several security features including: • Password – provides authentication of the user • Access level – determines who has access to the different types of information • Access control – determines types of user access, such as read-only access
Increased Security • Why would you want to define access level security? • Access levels will typically mimic the hierarchical structure of the organization and protect organizational information from being viewed and manipulated by individuals who should not have access to the sensitive or confidential information • Low level employees typically have the lowest levels of access • High level employees typically have access to all types of database information
Increased Security • For example: You would not want analysts viewing all salary information for the entire company - in general: • Analysts can usually only view their own salary • Managers have higher access and can view the salaries of all their team members, but cannot view other managers’ salaries • Directors can view all of their managers’ and analysts’ salaries, but not other directors’ salaries • The CFO and CEO can view every employee’s salary
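The salary example above amounts to a rule over the reporting hierarchy: a user can see their own salary and the salaries of anyone below them in their own reporting line, while the CFO and CEO can see everything. A minimal sketch of that rule follows; the org chart, the role names, and the function names are all hypothetical.

```python
# Hypothetical org chart: employee -> direct supervisor (None at the top).
reports_to = {
    "ceo": None,
    "cfo": "ceo",
    "director_a": "cfo",
    "director_b": "cfo",
    "manager_a": "director_a",
    "manager_b": "director_a",
    "analyst_1": "manager_a",
    "analyst_2": "manager_b",
}

# Roles that may view every employee's salary.
FULL_ACCESS = {"ceo", "cfo"}


def is_above(viewer, employee):
    """True if viewer appears anywhere in employee's chain of supervisors."""
    boss = reports_to[employee]
    while boss is not None:
        if boss == viewer:
            return True
        boss = reports_to[boss]
    return False


def can_view_salary(viewer, employee):
    """Apply the access-level rule: full access, self, or own reporting line."""
    return viewer in FULL_ACCESS or viewer == employee or is_above(viewer, employee)
```

For instance, `can_view_salary("manager_a", "analyst_1")` is true, while `can_view_salary("manager_a", "manager_b")` and `can_view_salary("director_a", "director_b")` are false. A production DBMS would normally enforce this with its own access-level features rather than application code.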
RELATIONAL DATABASE FUNDAMENTALS • Entity – a person, place, thing, transaction, or event about which information is stored • The rows in each table contain the entities • In Figure 6.5 CUSTOMER includes Dave’s Sub Shop and Pizza Palace entities • Entity class (table) – a collection of similar entities • In Figure 6.5 CUSTOMER, ORDER, ORDER LINE, DISTRIBUTOR, and PRODUCT entity classes
RELATIONAL DATABASE FUNDAMENTALS • Attributes (fields, columns) – characteristics or properties of an entity class • The columns in each table contain the attributes • In Figure 6.5 attributes for CUSTOMER include: • Customer ID • Customer Name • Contact Name • Phone • Possible other attributes: • Address • Fax • E-mail • Cell phone
RELATIONAL DATABASE FUNDAMENTALS • Primary keys and foreign keys identify the various entity classes (tables) in the database • Primary key – a field (or group of fields) that uniquely identifies a given entity in a table • Foreign key – a primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables • Example • Hawkins Shipping in the DISTRIBUTOR table has a primary key called Distributor ID – DEN8001 • Hawkins Shipping (Distributor ID DEN8001) is responsible for delivering orders 34561 and 34562 • Therefore, Distributor ID in the ORDER table creates a logical relationship (who shipped what order) between ORDER and DISTRIBUTOR
RELATIONAL DATABASE FUNDAMENTALS • How many orders have been placed for T’s Fun Zone? • Ans: One, Order ID 34563 • How many orders have been placed for Pizza Palace? • Ans: None • How many items are included in Dave’s Sub Shop’s two orders? • Ans: Order 34561 has 3 items and order 34562 has one item for a total of 4 items in both orders. • Who is responsible for distributing Dave’s Sub Shop’s orders? • Ans: Hawkins Shipping • Which products are included in Order 34562? • Ans: 300 Vanilla Coke
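Two of the questions above can be answered by following foreign keys with a join. The sketch below uses Python's built-in `sqlite3` module and only the values stated on these slides (the customer names, order numbers 34561 to 34563, and distributor DEN8001). The numeric customer IDs are invented, the table is named `orders` because ORDER is a reserved word in SQL, and the distributor for order 34563 is left empty because it is not given here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE distributor (distributor_id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),   -- foreign key
    distributor_id TEXT REFERENCES distributor(distributor_id)       -- foreign key
);
INSERT INTO distributor VALUES ('DEN8001', 'Hawkins Shipping');
-- Customer IDs 1-3 are placeholders, not values from Figure 6.5.
INSERT INTO customer VALUES (1, 'Dave''s Sub Shop'), (2, 'Pizza Palace'), (3, 'T''s Fun Zone');
INSERT INTO orders VALUES (34561, 1, 'DEN8001'), (34562, 1, 'DEN8001'), (34563, 3, NULL);
""")

# How many orders has each customer placed? LEFT JOIN keeps customers with none.
orders_per_customer = conn.execute("""
    SELECT c.name, COUNT(o.order_id)
    FROM customer AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.name
""").fetchall()

# Who distributes Dave's Sub Shop's orders?
daves_distributors = conn.execute("""
    SELECT DISTINCT d.name
    FROM orders AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    JOIN distributor AS d ON d.distributor_id = o.distributor_id
    WHERE c.name = ?
""", ("Dave's Sub Shop",)).fetchall()
```

The results match the answers on the slide: two orders for Dave's Sub Shop, none for Pizza Palace, one for T's Fun Zone, and Hawkins Shipping as the distributor for Dave's Sub Shop.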
DATABASE MANAGEMENT SYSTEMS • Database management systems (DBMS) – software through which users and application programs interact with a database
DATABASE MANAGEMENT SYSTEMS • Direct interaction – • The user interacts directly with the DBMS • The DBMS obtains the information from the database • Indirect interaction • User interacts with an application (i.e., payroll application, manufacturing application, sales application) • The application interacts with the DBMS • The DBMS obtains the information from the database
DATABASE MANAGEMENT SYSTEMS • Four components of a DBMS
Data Definition Component • Data definition component – creates and maintains the data dictionary and the structure of the database • The data definition component includes the data dictionary • Data dictionary – a file that stores definitions of information types, identifies the primary and foreign keys, and maintains the relationships among the tables • The data dictionary is an important part of the DBMS because users can consult the dictionary to determine the different types of database information
Data Definition Component • Data dictionary essentially defines the logical properties of the information that the database contains Business integrity constraint Relational integrity constraint
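Most relational DBMSs let users query this kind of metadata. As an illustration only, SQLite exposes column definitions and key relationships through `PRAGMA table_info` and `PRAGMA foreign_key_list`; the two-table schema and the `describe` helper below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE distributor (distributor_id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    distributor_id TEXT REFERENCES distributor(distributor_id)
);
""")


def describe(table):
    """Summarize column types, primary key columns, and foreign keys for one table.

    The table name is interpolated into the statement, so pass trusted names only.
    """
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    fks = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return {
        "columns": {row[1]: row[2] for row in columns},
        "primary_key": [row[1] for row in columns if row[5] > 0],
        "foreign_keys": [(row[3], row[2], row[4]) for row in fks],
    }


orders_info = describe("orders")
```

`orders_info` reports the declared type of each column, that `order_id` is the primary key, and that `distributor_id` refers to `distributor.distributor_id`.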
Data Manipulation Component • Data manipulation component – allows users to create, read, update, and delete information in a database • A DBMS contains several data manipulation tools: • View – allows users to see, change, sort, and query the database content • Report generator – users can define report formats • Query-by-example (QBE) – users can graphically design the answers to specific questions • Structured query language (SQL) – query language
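The four operations (create, read, update, delete) map onto the SQL statements INSERT, SELECT, UPDATE, and DELETE. The sketch below runs them through Python's built-in `sqlite3` module; the `product` table, its rows, and the prices are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "product_id INTEGER PRIMARY KEY, name TEXT NOT NULL, price_cents INTEGER NOT NULL)"
)

# Create: add two rows.
with conn:
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?)",
        [(1, "Sample A", 150), (2, "Sample B", 200)],
    )

# Read: query the rows back.
after_create = conn.execute("SELECT name FROM product ORDER BY product_id").fetchall()

# Update: change one price.
with conn:
    conn.execute("UPDATE product SET price_cents = ? WHERE product_id = ?", (175, 1))

# Delete: remove one row.
with conn:
    conn.execute("DELETE FROM product WHERE product_id = ?", (2,))

final_rows = conn.execute("SELECT * FROM product ORDER BY product_id").fetchall()
```

A QBE tool or report generator issues statements of this kind on the user's behalf.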
Data Manipulation Component • Sample report using Microsoft Access Report Generator
Data Manipulation Component • Sample report using Access Query-By-Example (QBE) tool
Data Manipulation Component • Results from the query in previous QBE
Data Manipulation Component • SQL version of the QBE Query in Figure 6.10
Application Generation and Data Administration Components • Application generation component – includes tools for creating visually appealing and easy-to-use applications • Data administration component – provides tools for managing the overall database environment by providing facilities for backup, recovery, security, and performance • IT specialists primarily use these components
INTEGRATING DATA AMONG MULTIPLE DATABASES • Integration – allows separate systems to communicate directly with each other • Forward integration – takes information entered into a given system and sends it automatically to all downstream systems and processes • Backward integration – takes information entered into a given system and sends it automatically to all upstream systems and processes
INTEGRATING DATA AMONG MULTIPLE DATABASES • One of the biggest benefits of integration is that organizations only have to enter information into the systems once and it is automatically sent to all of the other systems throughout the organization • This feature alone creates huge advantages for organizations because it reduces information redundancy and ensures accuracy and completeness • Without integrations an organization would have to enter information into every single system that requires the information, from marketing and sales to billing and customer service • Entering the same customer information into multiple systems is redundant, and the chances of making a mistake in one of the systems are high • For example, customer information would have to be manually entered into the marketing, sales, ordering, inventory, billing, and shipping databases. (Each of these systems is separate and would have its own database if the company doesn’t have a complete ERP installed.)
INTEGRATING DATA AMONG MULTIPLE DATABASES • Forward and backward integration
INTEGRATING DATA AMONG MULTIPLE DATABASES • Sales enters the information when it is negotiating the sale (looking for opportunities) • The information is then passed to the order entry system when the order is actually placed • The order fulfillment system picks the products from the warehouse, packs the products, labels boxes, etc. • Once the order is filled and shipped, the customer is billed • What would happen if users could enter order information directly into the billing system? • The systems would quickly become out-of-sync. There might be bills for nonexistent orders, or orders that do not have any bills (if someone deleted a bill) • For this reason organizations typically place a business-critical integrity constraint on integrated systems: With a forward integration the information must be entered in the sales system; users cannot enter information directly into the billing system
INTEGRATING DATA AMONG MULTIPLE DATABASES • Integrations are expensive to build and maintain and difficult to implement • For these reasons many organizations only build forward integrations and use business-critical integrity constraints to ensure all information is always entered only at the start of the integration (one source of record) • Why would an organization want to build both forward and backward integrations? • This allows users to enter information at any point in the business process and the information is automatically sent upstream and downstream to all other systems • For example, if order fulfillment determined that they could not fulfill an order (the product had been discontinued), they could simply enter this information into the database and it would be sent automatically upstream to the sales representative who could contact the customer and downstream to billing to remove the item from the bill
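The difference between a forward-only integration and one that runs in both directions can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular integration product works: each system's database is a plain dictionary, the chain of systems follows the sales-to-billing sequence described above, and the class and method names are hypothetical.

```python
# Systems in business-process order, from upstream to downstream.
CHAIN = ["sales", "order_entry", "order_fulfillment", "billing"]


class IntegratedSystems:
    def __init__(self, backward=False):
        self.backward = backward  # True adds backward integration to forward
        self.databases = {name: {} for name in CHAIN}

    def enter(self, system, record_id, data):
        """Enter information into one system and propagate it automatically."""
        index = CHAIN.index(system)
        if index > 0 and not self.backward:
            # Business-critical integrity constraint: with forward integration
            # only, information may be entered only at the start of the chain.
            raise PermissionError(f"enter information in {CHAIN[0]}, not {system}")
        downstream = CHAIN[index:]  # forward integration (includes the entry system)
        upstream = CHAIN[:index] if self.backward else []
        for name in upstream + downstream:
            self.databases[name][record_id] = dict(data)
```

With `IntegratedSystems()`, an order entered in `sales` reaches all four databases, and an attempt to enter it in `billing` raises `PermissionError`. With `IntegratedSystems(backward=True)`, order fulfillment can record that a product was discontinued and the change reaches both sales (upstream) and billing (downstream).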
INTEGRATING DATA AMONG MULTIPLE DATABASES • Building a central repository specifically for integrated information
INTEGRATING DATA AMONG MULTIPLE DATABASES • Users can create, read, update, and delete in the main customer repository, and each change is automatically sent to all of the other databases • Business-critical integrity constraints still need to be built to ensure information is only ever entered into the customer repository; otherwise the information will become out-of-sync
HISTORY OF DATA WAREHOUSING • Bill Inmon is recognized as the "father of the data warehouse" and co-creator of the "Corporate Information Factory." • Data warehouses extend the transformation of data into information • In the 1990s executives became less concerned with the day-to-day business operations and more concerned with overall business functions • The data warehouse provided the ability to support decision making without disrupting the day-to-day operations
DATA WAREHOUSE FUNDAMENTALS • Data warehouse – a logical collection of information – gathered from many different operational databases – that supports business analysis activities and decision-making tasks • The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes • A database stores information for a single application whereas a data warehouse stores information from multiple databases, or multiple applications, and external information such as industry information • This enables cross-functional analysis, industry analysis, market analysis, etc., all from a single repository • Data warehouses support online analytical processing (OLAP)
DATA WAREHOUSE FUNDAMENTALS • Extraction, transformation, and loading (ETL) – a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse • ETL process also gathers data from the data warehouse and passes it to the data marts • Data mart – contains a subset of data warehouse information • A data warehouse has an enterprise-wide organizational focus, while a data mart focuses on a subset of information for a given business unit such as finance
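The ETL steps above can be sketched as follows. This is a simplified illustration under several assumptions: the two source "databases" are in-memory lists with deliberately different field names and formats, the "common set of enterprise definitions" is reduced to an upper-case customer name, amounts in cents, and ISO dates, and all names are hypothetical.

```python
from datetime import datetime
from decimal import Decimal

# Extract: hypothetical rows from two operational systems with different formats.
sales_db = [{"cust": " Acme ", "amount_usd": "120.50", "date": "03/14/2024"}]
billing_db = [{"customer_name": "acme", "total_cents": 9900, "billed_on": "2024-03-15"}]


def transform_sales(row):
    """Convert a sales row to the common definitions."""
    return {
        "customer": row["cust"].strip().upper(),
        "amount_cents": int(Decimal(row["amount_usd"]) * 100),
        "date": datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat(),
        "source": "sales",
    }


def transform_billing(row):
    """Convert a billing row to the common definitions."""
    return {
        "customer": row["customer_name"].strip().upper(),
        "amount_cents": row["total_cents"],
        "date": row["billed_on"],  # already in ISO format
        "source": "billing",
    }


def run_etl():
    """Load: combine the transformed rows into one warehouse collection."""
    warehouse = [transform_sales(r) for r in sales_db]
    warehouse += [transform_billing(r) for r in billing_db]
    return warehouse


def data_mart(warehouse, source):
    """A data mart holds the subset of warehouse rows one business unit needs."""
    return [row for row in warehouse if row["source"] == source]


warehouse = run_etl()
finance_mart = data_mart(warehouse, "billing")
```

After the transform step both rows describe the same customer, `ACME`, in the same units, which is what makes cross-functional analysis in the warehouse possible.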
Multidimensional Analysis • Databases contain information in a series of two-dimensional tables • In a data warehouse and data mart, information is multidimensional; it contains layers of columns and rows • Dimension – a particular attribute of information – such as Products, Promotions, Stores, Category, Region, Stock price, Date, Time, Weather • The ability to look at information from different dimensions can add tremendous business insight • By slicing-and-dicing the information a business can uncover great unexpected insights
Multidimensional Analysis • Cube – common term for the representation of multidimensional information
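Slicing-and-dicing a cube can be illustrated with a small set of facts that each carry a value for every dimension. The sketch below is an assumption-laden toy: the three dimensions are taken from the list of examples on the previous slide, the sales figures are invented, and `roll_up` and `slice_cube` are hypothetical helper names rather than functions of any OLAP product.

```python
from collections import defaultdict

DIMENSIONS = ("product", "store", "promotion")

# Each fact: one value per dimension, followed by units sold (invented numbers).
facts = [
    ("Product A", "Store 1", "Promo I", 10),
    ("Product A", "Store 2", "Promo I", 5),
    ("Product B", "Store 1", "Promo II", 7),
    ("Product B", "Store 2", "Promo I", 3),
]


def roll_up(rows, *dims):
    """Total units sold, grouped by the chosen dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]
    return dict(totals)


def slice_cube(rows, dim, value):
    """Keep only the facts where one dimension has a fixed value."""
    position = DIMENSIONS.index(dim)
    return [row for row in rows if row[position] == value]


by_product = roll_up(facts, "product")
by_store = roll_up(facts, "store")
promo_one_by_product = roll_up(slice_cube(facts, "promotion", "Promo I"), "product")
```

The same facts yield different summaries depending on which dimensions are kept: totals by product, totals by store, or totals by product within a single promotion.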