280 likes | 465 Views
Data Resource Management Agenda. What types of data are stored by organizations? How are different types of data stored? What are the potential problems with stored data?. Data Stored in the Purchase Order Database. What is contained in the purchase order database on the previous page?
E N D
Data Resource Management Agenda • What types of data are stored by organizations? • How are different types of data stored? • What are the potential problems with stored data?
Data Stored in the Purchase Order Database • What is contained in the purchase order database on the previous page? • Operational (transaction) data generated “internally” or within the organization. • Historical data (PurchaseHistory entity). • What generates the data (where does it come from)? • How is it input into a computer? • Who is responsible for the accuracy of that data?
Problems with internal operational data • Not integrated. • Redundant of other systems in the organization. • Potentially of poor quality (“dirty data”): • Incomplete. • Not accurate. • Inconsistent. • The meaning of the data is not fully defined and/or understood by all stakeholders.
More about the purchase order database • What happens when a purchase order is completely received and paid for? • Is the data deleted from the database? • Should the data be stored? Why or why not? • Where would the data be stored? Same database? Different database? • Should the data be stored in the same format or in a different format?
What about decision support data? • How does data used to help make decisions differ from transaction data? • Does decision making data need to be stored separately from transaction data? • Does decision makingdata require a different database structure?
We use data to answer management questions Operational Questions Decision Support Questions Which vendor gives us the best price for ProductID 8992? Which vendor delivers ProductID 8992 most reliably (on time and in best condition)? Which Product Type is increasing in price most steeply? Which Product Type is decreasing in price most quickly? Do employees place purchase orders with the same vendors, or do they differentiate based on price or reliability? • What products are due to be received today? • What products were received today? • What is the price for ProductID 1224 on PO#0667? • When is the next due date for ProductID 8992? • Which employee received the products for PO#0667? • Which employee placed the most purchase orders this month?
Organizations also use external data. • What data NOT generated by the organization might be relevant to making decisions? • Who generates external data, and how does it get into my database??
One Database for both transaction processing and decision support Transaction Database separate from decision making database So, can one database support both transaction processing and decision support applications? Should we just add a few tables to the transaction processing database, as shown in the prior slide?? Or should we create a separate database for decision making? VS.
A Business Intelligence “System” • Transaction processing systems vs. operational systems. • A business intelligence system encompasses all processes, hardware and software necessary to extract data, transform it, integrate it, store it, and provide information. The information is then made effective and accessible to users to support decision making. • Sounds like just another information system... So what makes it different?
Data Input! Data Use! Big Data!
The “V’s” of Big Data • Volume: scale of data • Velocity: frequency of change • Variety: Different forms and sources of data • Veracity: Uncertainty of the accuracy of data
Components of a business intelligence system • Data store (called a “data warehouse”). • Extraction/transformation/loading processes. • End user query tools. • End user visualization tools.
What is a data warehouse? • A data warehouse is a database designed to support a business intelligence system. • A data warehouse is: • Integrated: It is a centralized, consolidated database integrating data from an entire organization. • Subject-oriented: Data warehouse data are organized around key subjects. The data are usually arranged by topic, such as customers, products, suppliers, etc. • Time-variant: Data in the warehouse contain a time dimension so that they may be used as a historical aggregation. • Non-volatile: Once data enter, they seldom leave. Data are appended rather than overwritten. Data are updated in batches.
Issues in creating a data warehouse • How to get accurate and complete data? • How to consolidate data? • Differing data meanings. • Differing storage mechanisms. • Differing data formats.
Issues in designing a data warehouse • Must have a predefined subject focus. • Has the potential to be very large – must define the “grain” or granularity level of storage. • Will always have a dimension of time. • May contain derived data. • May be a summary of data, rather than each detailed transaction. • Does not always adhere to standard normalization rules.
Potential Data Mart for Purchase Orders Can be a “star” or a “snowflake” design – this one is a snowflake
Accessing a data warehouse • Visualization tools. • Graphical. • Spreadsheet format - usually Excel look-and-feel. • Beyond the spreadsheet using discovery tools. Example: http://www.gapminder.org/ • Dashboard. Examples: http://www.dundas.com/dashboard/online-examples/ • Query tools. • OLAP: Online analytical processing. • Data mining: Artificial intelligence based query methods.
Online analytical processing • Provides multi-dimensional data analysis techniques. • Works primarily with data aggregation. • Provides advanced statistical analysis. • Supports access to very large databases. • Provides enhanced query optimization algorithms. • Lots of acronyms: OLAP, ROLAP, MOLAP, HOLAP. • Can be add-ons to existing products, example is Excel. Can have their own user interfaces.
Data mining • Data mining tools: • analyze the data; • uncover patterns hidden in the data; • form computer models based on the findings; and • use the models to predict business behavior. • Proactive tools. • Based on artificial intelligence software such as decision trees, neural networks, fuzzy logic systems, inductive nets and classification networking.