
Chapter 11



  1. Chapter 11 Data Management: Warehousing, Analyzing, Mining & Visualization

  2. Learning Objectives
     • Recognize the importance of data, their managerial issues, and their life cycle.
     • Describe the sources of data, their collection, and quality issues.
     • Relate data management to multimedia and document management.
     • Explain the operation of data warehousing and its role in decision support.

  3. Learning Objectives (cont.)
     • Understand the data access and analysis problem and the data mining and online analytical processing solutions.
     • Describe data presentation methods and explain geographical information systems, visual simulations, and virtual reality as decision support tools.
     • Discuss the role and provide examples of marketing databases.
     • Recognize the role of the Web in data management.

  4. Case: Sears & Data Warehouses
     Problem:
     • Sears was caught by surprise in the 1980s as shoppers defected to specialty stores and discount mass merchandisers.
     Solution:
     • Sears built a single sales information data warehouse, replacing 18 old databases that were packed with redundant, conflicting, and obsolete data.
     • By 2001, Sears had launched the following Web initiatives, among others:
       ◦ an e-commerce home improvement center
       ◦ a B2B supply exchange for the retail industry
       ◦ an online toy catalog

  5. Case: Sears & Data Warehouses
     Results:
     • The ability to monitor sales by item per store enables Sears to create a sharp local market focus.
     • Monitoring data on Web-based sales supports Sears's marketing and Web advertising plans.
     • Response time to queries has dropped from days to minutes.
     • The data warehouse gives Sears employees a tool for making better decisions.
     • Sears's retailing profits have climbed more than 20% annually since the data warehouse was implemented.

  6. Difficulties of Managing Data
     • The amount of data increases exponentially.
     • Data are scattered throughout organizations and are collected by many individuals using several methods and devices.
     • Only small portions of an organization’s data are relevant for any specific decision.
     • An ever-increasing amount of external data needs to be considered in making organizational decisions.
     • Data are frequently stored in several servers and locations in an organization.

  7. Difficulties of Managing Data (cont.)
     • Raw data may be stored in different computing systems, databases, formats, and human and computer languages.
     • Legal requirements relating to data differ among countries and change frequently.
     • Selecting data management tools can be a major problem because of the huge number of products available.
     • Data security, quality, and integrity are critical, yet they are easily jeopardized.

  8. Data Life Cycle

  9. Data Sources & Collection
     • Internal data. An organization’s internal data are about people, products, services, and processes.
     • Personal data. IS users or other corporate employees may document their own expertise by creating personal data.
     • External data. There are many sources of external data, ranging from commercial databases to sensors and satellites.
     • The Internet & commercial database services. Some external data flow to an organization through electronic data interchange (EDI), through other company-to-company channels, or through the Internet.

  10. Data Quality (DQ)
     Data quality is an extremely important issue because quality determines the data’s usefulness as well as the quality of the decisions based on the data.

  11. Data Quality Problems (Strong et al., 1997)
     • Intrinsic DQ: accuracy, objectivity, believability, and reputation.
     • Accessibility DQ: accessibility and access security.
     • Contextual DQ: relevancy, value added, timeliness, completeness, and amount of data.
     • Representation DQ: interpretability, ease of understanding, concise representation, and consistent representation.
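As a rough illustration of how two of the contextual DQ dimensions above (completeness and timeliness) might be measured, the sketch below scores the share of required values that are present and flags records that have not been refreshed recently. The records, field names, and one-year staleness threshold are hypothetical assumptions, not metrics defined by Strong et al.

```python
from datetime import date

# Hypothetical customer records (illustration only); None marks a missing value.
RECORDS = [
    {"id": 1, "name": "Ann", "city": "Dallas", "updated": date(2024, 1, 10)},
    {"id": 2, "name": "Bob", "city": None, "updated": date(2019, 6, 1)},
    {"id": 3, "name": None, "city": "Austin", "updated": date(2023, 11, 5)},
]


def completeness(records, fields):
    """Fraction of required field values that are present (contextual DQ)."""
    total = len(records) * len(fields)
    present = sum(1 for r in records for f in fields if r.get(f) is not None)
    return present / total if total else 1.0


def stale_ids(records, as_of, max_age_days):
    """IDs of records not updated within max_age_days (timeliness, contextual DQ)."""
    return [r["id"] for r in records if (as_of - r["updated"]).days > max_age_days]


print(completeness(RECORDS, ["name", "city"]))  # 4 of 6 values present
print(stale_ids(RECORDS, date(2024, 6, 30), 365))  # [2]
```

Scores like these only cover dimensions that can be computed mechanically; intrinsic dimensions such as believability and reputation still need human judgment.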

  12. Object-Oriented Databases
     • The object-oriented database is the most widely used of the newest methods of data organization, especially for Web applications.
     • An object-oriented database is part of the object-oriented paradigm, which also includes object-oriented programming, operating systems, and modeling.
     • Object-oriented databases are sometimes referred to as multimedia databases and are managed by special multimedia database management systems.

  13. Document Management
     Document management is the automated control of electronic documents, page images, spreadsheets, word processing documents, and complex, compound documents through their entire life cycle within an organization, from initial creation to final archiving.
     Benefits of document management:
     • Greater control over production, storage, and distribution of documents
     • Greater efficiency in the reuse of information
     • Control of a document through a workflow process
     • Reduction of product cycle times

  14. Case: U.S. Automobile Association (USAA)
     Problem:
     • USAA is a large insurance company in Texas that serves over 2 million officers. In the 1980s, the company experienced extreme delays in data retrieval and searches.
     Solution:
     • Using an environment called the Automated Insurance Environment, USAA has transformed itself into a completely paperless company.
     Results:
     • The system reduces the cost of storing documents, improves customer service, and improves employee productivity.
     • USAA now saves $70,500,000 on the 10,000,000 documents it handles annually.

  15. Data Processing
     Data processing in organizations can be viewed as either transactional or analytical.
     • Transactional: The data in transaction processing systems (TPS) are organized mainly in a hierarchical structure and are centrally processed. These databases and processing systems are known as operational systems.
     • Analytical: Analytical processing involves the analysis of accumulated data, mainly by end users. It includes DSS, EIS, Web applications, and other end-user activities.

  16. Delivery Systems
     A good data delivery system should be able to support:
     • Easy data access by the end users themselves.
     • A quick decision-making process.
     • Accurate and effective decision making.
     • Flexible decision making.

  17. Data Warehouses
     • The purpose of a data warehouse is to establish a data repository that makes operational data accessible in a form readily acceptable for analytical processing activities (e.g., decision support, EIS).
     • Data warehouses include a companion called metadata, meaning data about data.
     Major benefits of data warehouses:
     (1) The ability to reach data quickly, as they are located in one place.
     (2) The ability to reach data easily, frequently by end users themselves, using Web browsers.

  18. Data Warehouses

  19. Characteristics of Data Warehouses
     • Organization. Data are organized by detailed subject.
     • Consistency. Data in different operational databases may be encoded differently; in the warehouse they are coded in a consistent manner.
     • Time variant. The data are kept for 5 to 10 years so they can be used for trends, forecasting, and comparisons over time.
     • Non-volatile. Once entered into the warehouse, data are not updated.
     • Relational. The data warehouse uses a relational structure.
     • Client/server. The data warehouse uses a client/server architecture to give end users easy access to its data.
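The consistency and non-volatile characteristics above can be sketched in a few lines: rows from two operational systems that encode the same attribute differently are recoded into one warehouse encoding, and the store only ever appends. The source names, the gender codes, and the `Warehouse` class are hypothetical illustrations, not part of any particular warehouse product.

```python
# Hypothetical recoding table: two operational systems encode the same attribute differently.
GENDER_CODES = {
    "sales": {"M": "male", "F": "female"},
    "billing": {"1": "male", "2": "female"},
}


def to_warehouse_row(source, row, load_date):
    """Recode one source row into the warehouse's single, consistent encoding."""
    return (row["customer_id"], GENDER_CODES[source][row["gender"]], load_date)


class Warehouse:
    """Append-only store: rows are loaded with a load date; no update method exists."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        self._rows.extend(rows)  # new loads are appended; earlier rows are kept

    def rows(self):
        return tuple(self._rows)  # read-only view for analysis


wh = Warehouse()
wh.load([
    to_warehouse_row("sales", {"customer_id": 7, "gender": "F"}, "2024-01-31"),
    to_warehouse_row("billing", {"customer_id": 8, "gender": "1"}, "2024-01-31"),
])
print(wh.rows())
```

Keeping the load date on every row is what makes the data time variant: later loads add new rows rather than overwriting old ones, so trends can be compared over time.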

  20. Data Warehouse Suitability
     Data warehousing is most appropriate for organizations in which some of the following apply:
     • Large amounts of data need to be accessed by end users.
     • The operational data are stored in different systems.
     • An information-based approach to management is in use.
     • There is a large, diverse customer base.
     • The same data are represented differently in different systems.
     • Data are stored in highly technical formats that are difficult to decipher.
     • Extensive end-user computing is performed.

  21. Data Marts
     An alternative used by many firms is to create a lower-cost, scaled-down version of a data warehouse, called a data mart: a small warehouse designed for a strategic business unit (SBU) or a department. There are two major types of data marts:
     1) Replicated (dependent) data marts. Functional subsets of the data warehouse are replicated in smaller databases.
     2) Stand-alone data marts. A company can have one or more independent data marts without having a data warehouse.
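A replicated (dependent) data mart, as described above, can be sketched with Python's built-in `sqlite3` module: a functional subset of a warehouse table is copied into a smaller table for one department. The table names, columns, and the "marketing" department are hypothetical assumptions chosen for illustration.

```python
import sqlite3


def build_marketing_mart(conn):
    """Replicate the marketing subset of the warehouse into a smaller mart table."""
    conn.execute("DROP TABLE IF EXISTS mart_marketing")  # rebuild from scratch on each refresh
    conn.execute(
        "CREATE TABLE mart_marketing AS "
        "SELECT region, product, amount FROM warehouse_sales "
        "WHERE department = 'marketing'"
    )
    return conn.execute("SELECT COUNT(*) FROM mart_marketing").fetchone()[0]


# Hypothetical in-memory warehouse table for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("marketing", "East", "chair", 120.0),
        ("finance", "East", "desk", 300.0),
        ("marketing", "West", "desk", 250.0),
    ],
)
print(build_marketing_mart(conn))  # 2
```

A stand-alone mart would instead be loaded directly from operational sources, with no warehouse table to copy from.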

  22. Knowledge Discovery in Databases (KDD)
     • KDD is the process of extracting useful knowledge from volumes of data. It is the subject of extensive research.
     • KDD’s objective is to identify valid, novel, potentially useful, and ultimately understandable patterns in data.
     • KDD is practical today because it is supported by three technologies that are now sufficiently mature:
       ◦ massive data collection
       ◦ powerful multiprocessor computers
       ◦ data mining algorithms

  23. Evolution of KDD

  24. Tools & Techniques of KDD
     • Ad hoc queries let users request, in real time, information that is not available in the periodic reports. Such answers are needed to expedite decision making.
     • Online analytical processing (OLAP) refers to such end-user activities as DSS modeling using spreadsheets and graphics, which are done online.
     • Ready-made Web-based analysis. Many vendors provide ready-made analytical tools, mostly in finance, marketing, and operations.
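To make the idea of an ad hoc query concrete, the sketch below asks a one-off question that a periodic report might not answer: which stores sold more than a given amount of one product. It uses Python's built-in `sqlite3` module; the `sales` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3


def ad_hoc_sales(conn, product, min_amount):
    """Stores whose total sales of `product` exceed `min_amount`, largest first."""
    sql = (
        "SELECT store, SUM(amount) AS total FROM sales "
        "WHERE product = ? GROUP BY store "
        "HAVING SUM(amount) > ? ORDER BY total DESC"
    )
    return conn.execute(sql, (product, min_amount)).fetchall()


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Store 1", "toy", 50.0),
        ("Store 1", "toy", 70.0),
        ("Store 2", "toy", 30.0),
        ("Store 2", "tool", 200.0),
    ],
)
print(ad_hoc_sales(conn, "toy", 100))  # [('Store 1', 120.0)]
```

The question is parameterized (`?` placeholders) so a user or tool can vary the product and threshold on the fly without editing SQL text.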

  25. Data Mining
     Data mining derives its name from the similarities between searching for valuable business information in a large database and mining a mountain for valuable ore. Data mining technology can generate new business opportunities by providing these capabilities:
     • Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases.
     • Automated discovery of previously unknown patterns. Data mining tools identify previously hidden patterns in one step.
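One common technique for the "automated discovery of previously unknown patterns" described above is association rule mining (market-basket analysis). The sketch below is a deliberately brute-force version limited to item pairs, not the Apriori algorithm; the baskets and the support/confidence thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support, min_confidence):
    """Return sorted rules (a, b, support, confidence) meaning 'a -> b' over item pairs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in set(b))
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(set(b)), 2)
    )
    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for a, b in ((x, y), (y, x)):
            confidence = count / item_counts[a]  # P(b in basket | a in basket)
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return sorted(rules)


# Hypothetical point-of-sale baskets.
BASKETS = [
    ["bagel", "cream cheese"],
    ["bagel", "cream cheese", "coffee"],
    ["bagel", "coffee"],
    ["muffin", "coffee"],
]
for a, b, support, confidence in pair_rules(BASKETS, min_support=0.5, min_confidence=0.6):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Counting every pair is fine for a toy example; real tools prune candidate itemsets so the search scales to large databases.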

  26. Applications of Data Mining
     Data mining is currently being used in the following areas: retailing & sales; banking; manufacturing & production; brokerage & securities trading; computer hardware & software; insurance; police work; government & defense; airlines; health care; broadcasting; marketing.

  27. Text & Web Mining
     • Text mining is the application of data mining to nonstructured or less structured text files. Text mining helps organizations to:
       ◦ find the “hidden” content of documents, including additional useful relationships;
       ◦ group documents by common themes.
     • Web mining refers to the use of mining tools to analyze large amounts of data on the Web, such as what customers are doing on the Web; that is, to analyze clickstream data.
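A very small piece of clickstream analysis, as mentioned under Web mining above, is counting which page-to-page moves visitors make most often. The sketch assumes clicks arrive as `(session_id, page)` pairs already in time order; the sessions and page names are hypothetical.

```python
from collections import Counter


def top_transitions(clicks, k):
    """Count page-to-page moves within each session and return the k most common."""
    by_session = {}
    for session_id, page in clicks:  # assumes clicks are already in time order
        by_session.setdefault(session_id, []).append(page)
    transitions = Counter()
    for pages in by_session.values():
        transitions.update(zip(pages, pages[1:]))  # consecutive page pairs
    return transitions.most_common(k)


# Hypothetical clickstream for two interleaved sessions.
CLICKS = [
    ("s1", "home"), ("s2", "home"),
    ("s1", "catalog"), ("s2", "catalog"),
    ("s1", "cart"), ("s2", "search"),
]
print(top_transitions(CLICKS, 1))  # [(('home', 'catalog'), 2)]
```

Grouping by session first matters: interleaved clicks from different visitors would otherwise produce transitions no single visitor actually made.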

  28. Data Visualization Data visualization refers to the presentation of data by technologies such as digital images, geographical information systems, graphical user interfaces, multidimensional tables and graphs, virtual reality, three-dimensional presentations, and animation.

  29. CASE: Data Visualization Helps Haworth
     Problem:
     • Haworth Corporation, a major office furniture manufacturer, has maintained a competitive edge by offering customization.
     • But many customers are unable to visualize the 21 million potential product combinations.
     Solution:
     • Computer visualization software enables sales representatives with laptops to show customers exactly what they are ordering.
     Results:
     • Less time spent on exchanges between sales reps and CAD operators, and increased customer satisfaction thanks to quicker delivery.

  30. Multidimensionality
     • Modern data and information may have several dimensions.
     • For example, management may be interested in examining sales figures in a certain city by product, by time period, by salesperson, and by store.
     • It is important to provide the user with a technology that allows him or her to add, replace, or change dimensions quickly and easily in a table and/or graphical presentation.
     • The technology of slicing, dicing, and similar manipulations is called multidimensionality.

  31. Multidimensionality
     Three factors are considered in multidimensionality: dimensions, measures, and time.
     • Examples of dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, countries, industries.
     • Examples of measures: money, sales volume, head count, inventory, profit, actual versus forecasted results.
     • Examples of time: daily, weekly, monthly, quarterly, yearly.
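The slicing and dicing described on the last two slides can be sketched in plain Python: each fact row carries dimension values (city, product, salesperson), a time value (quarter), and a measure (sales). Filtering on a dimension value is a slice; choosing which dimensions to group by is a dice or pivot. The fact rows and the `rollup` helper are hypothetical; real OLAP tools precompute and index such aggregates.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, product, salesperson, quarter, sales)
FACTS = [
    ("Dallas", "desk", "Lee", "Q1", 100),
    ("Dallas", "chair", "Lee", "Q1", 40),
    ("Dallas", "desk", "Kim", "Q2", 70),
    ("Austin", "desk", "Kim", "Q1", 90),
]
DIMS = ("city", "product", "salesperson", "quarter")


def rollup(facts, group_by, where=None):
    """Sum the sales measure (last column) over the chosen dimensions.

    group_by: dimension names to keep in the result (dicing / pivoting).
    where: optional {dimension: value} filters (slicing).
    """
    where = where or {}
    idx = {d: i for i, d in enumerate(DIMS)}
    totals = defaultdict(int)
    for row in facts:
        if all(row[idx[d]] == v for d, v in where.items()):
            key = tuple(row[idx[d]] for d in group_by)
            totals[key] += row[-1]
    return dict(totals)


print(rollup(FACTS, ["product"], {"city": "Dallas"}))  # Dallas sales by product
print(rollup(FACTS, ["quarter"]))                      # all sales by quarter
```

Changing `group_by` from `["product"]` to `["salesperson", "quarter"]` is the "add, replace, or change dimensions quickly" operation the slide describes.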

  32. Advantages of Multidimensionality
     • Data can be presented and navigated with relative ease.
     • Multidimensional databases are easier to maintain.
     • Multidimensional databases can be significantly faster than relational databases for analysis, because the data are organized along the additional dimensions in anticipation of how users will access them.

  33. Geographic Information Systems (GIS)
     • A geographical information system (GIS) is a computer-based system for capturing, storing, checking, integrating, manipulating, and displaying data using digitized maps.
     • Every record or digital object has an identified geographical location.
     • Banks are using GIS for plotting the following:
       ◦ Branch and ATM locations
       ◦ Customer demographics
       ◦ Volume and traffic patterns of business activities
       ◦ Geographical area served by each branch
       ◦ Market potential for banking activities
       ◦ Strengths and weaknesses against the competition
       ◦ Branch performance
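Because every GIS record has a geographical location, questions such as "which area does each branch serve?" reduce to distance calculations. The sketch assigns customers to their nearest branch using the haversine great-circle formula; the branch names and coordinates are hypothetical, and a real GIS would typically use road networks or service polygons rather than straight-line distance.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_branch(customer, branches):
    """Name of the branch closest to a customer's (lat, lon)."""
    return min(branches, key=lambda name: haversine_km(*customer, *branches[name]))


# Hypothetical branch and customer locations.
BRANCHES = {"Downtown": (32.78, -96.80), "Airport": (32.90, -97.04)}
CUSTOMERS = {"C1": (32.79, -96.81), "C2": (32.88, -97.02)}
print({c: nearest_branch(loc, BRANCHES) for c, loc in CUSTOMERS.items()})
```

Counting customers per assigned branch then gives a first approximation of the "geographical area served by each branch" listed on the slide.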

  34. Geographic Information Systems (GIS)
     • GIS software varies in its capabilities, from simple computerized mapping systems to enterprise-wide tools for decision support data analysis.
     • GIS data are available from a wide variety of sources. Government sources (via the Internet and CD-ROM) provide some data, while vendors provide diversified commercial data as well.
     • GIS & decision making. The graphical format of GIS makes it easy for managers to visualize the data and make decisions.
     • GIS and the Internet or intranets. Most major GIS software vendors provide Web access, such as embedded browsers or a Web/Internet/intranet server that hooks directly into their software.
     • Emerging GIS applications.

  35. Visual Interactive Modeling (VIM)
     • Visual interactive modeling (VIM) uses computer graphic displays to represent the impact of different management decisions on goals such as profit or market share. A VIM can be used both for supporting decisions and for training, and it can represent a static or a dynamic system.
     • Visual interactive simulation (VIS) is one of the most developed areas in VIM. It is a decision simulation in which the end user watches the progress of the simulation model in animated form on graphics terminals.

  36. Virtual Reality
     • Virtual reality (VR) is interactive, computer-generated, three-dimensional graphics delivered to the user through a head-mounted display.
     • VR applications to date have been used to support decision making indirectly.
     • Boeing has developed a virtual aircraft mock-up to test designs.
     • At Volvo, VR is used to test virtual cars in virtual accidents.
     • Data visualization helps financial decision makers by using visual, spatial, and aural immersion virtual systems.
     • Some stock brokerages have a VR application in which users surf over a landscape of stock futures, with color, hue, and intensity.

  37. Marketing Transaction Database
     • The marketing transaction database (MTD) combines many of the characteristics of static databases and marketing data sources into a new database that allows marketers to engage in real-time personalization and to target every interaction with customers.
     • The MTD provides dynamic, or interactive, functions not available with traditional types of marketing databases.
     • Exchanging information allows marketers to refine their understanding of each customer continuously.
     • Data mining, data warehousing, and MTDs are delivered on the Internet and intranets.

  38. Implementation Examples
     The following examples illustrate how companies use data mining and warehousing to support the new marketing approaches:
     • Alamo Rent-a-Car discovered that German tourists liked bigger cars, so when Alamo advertises its rental business in Germany, the ads now include information about its larger models.
     • Au Bon Pain Company discovered that it was not selling as much cream cheese as planned. When it analyzed point-of-sale data, it found that customers preferred small, one-serving packaging.
     • AT&T and MCI sift through terabytes of customer phone data to fine-tune marketing campaigns and determine new discount calling plans.

  39. CASE: Data Mining Powers Walmart
     • Wal-Mart’s formula for success owes much to the company’s multimillion-dollar investment in data warehousing.
     • The systems house data on point of sale, inventory, products in transit, market statistics, customer demographics, finance, product returns, and supplier performance.
     • The data are used for three broad areas of decision support:
       ◦ analyzing trends
       ◦ managing inventory
       ◦ understanding customers
     • The data warehouse is available over an extranet to store managers and suppliers.
     • In 2001, 5,000 users made over 35,000 database queries each day.

  40. Web-based Data Management Systems
     • Business intelligence activities, from data acquisition through warehousing to mining, can be performed with Web tools or are interrelated with Web technologies and e-commerce.
     • E-commerce software vendors are providing Web tools that connect the data warehouse with EC ordering and cataloging systems.
       ◦ e.g., Tradelink, a product of Hitachi
     • Data warehousing and decision support vendors are connecting their products with Web technologies and EC.
       ◦ e.g., Comshare’s DecisionWeb, Brio’s Brio One, Web Intelligence from Business Objects, and Cognos’s DataMerchant

  41. Corporate Portals

  42. Web-based Data Acquisition & Agents
     • Web-based data acquisition. Traditional data acquisition has become a pervasive element in today’s business environment. It includes both the recording of information from online surveys and questionnaires and direct measurements taken in the manufacturing environment.
     • Intelligent data warehouse. The amount of data in a data warehouse can be very large. Although the data are organized in a way that permits easy search, it may still be useful to have a search engine for specific applications.

  43. Managerial Issues
     • The legacy data problem. What should be done with the masses of information already stored in a variety of formats, often known as the legacy data acquisition problem?
     • Cost–benefit issues & justification. A cost–benefit analysis must be undertaken before any commitment to new technologies.
     • Where to store data physically. Should data be distributed close to their sources, or should they be centralized for easier control?
     • Legal issues. Data mining gives rise to a variety of legal issues.

  44. Managerial Issues (cont.)
     • Disaster recovery. How well can an organization’s business processes recover after an information system disaster?
     • Internal or external? Should a firm store and maintain its databases internally or externally?
     • Data security and ethics. Are the company’s competitive data safe from external snooping or sabotage?
     • Ethics. Should people have to pay for the use of online data?
     • Privacy. Collecting data in a warehouse and conducting data mining may result in the invasion of privacy.
     • Data purging. When is it beneficial to “clean house” and purge information systems of obsolete or non–cost-effective data?
     • Data delivery. How can data be moved efficiently around an enterprise?
