480 likes | 739 Views
Organising Data in a Traditional File Environment . File organization conceptsDatabase: Group of related filesFile: Group of records of same type Record: Group of related fieldsField: Group of characters as word(s) or numberDescribes an entity (person, place, thing on which we store information
E N D
1. DATABASES ANDINFORMATION MANAGEMENT
2. Organising Data in a Traditional File Environment File organization concepts
Database: Group of related files
File: Group of records of same type
Record: Group of related fields
Field: Group of characters as word(s) or number
Describes an entity (person, place, thing on which we store information)
Attribute: Each characteristic, or quality, describing entity
E.g., Attributes Date or Grade belong to entity COURSE
3. The data hierarchy A computer system organizes data in a hierarchy that starts with the bit, which represents either a 0 or a 1. Bits can be grouped to form a byte to represent one character, number, or symbol. Bytes can be grouped to form a field, and related fields can be grouped to form a record. Related records can be collected to form a file, and related files can be organized into a database.
4. The data hierarchy
5. Problems with the traditional file environment Data redundancy:
Presence of duplicate data in multiple files
Data inconsistency:
Same attribute has different values
Program-data dependence:
When changes in program requires changes to data accessed by program
Lack of flexibility
Poor security
Lack of data sharing and availability
6. © 2008 Prentice-Hall, Inc.
6 Pros and Cons of Conventional Files Pros
Easy to design because of their single-application focus
Excellent performance due to optimized organization for a single application
Pros
Files generally out perform databases.
Now cheaper & more powerful computers & more efficient database technologies
Cons
Redesign
Leads to redundancy.Pros
Files generally out perform databases.
Now cheaper & more powerful computers & more efficient database technologies
Cons
Redesign
Leads to redundancy.
7. Files and Tables File – the set of all occurrences of a given record structure.
Table – the relational database equivalent of a file.
Types of conventional files and tables
Master files – Records relatively permanent though values may change
Transaction files – Records describe business events
Document files – Historical data for review without overhead of regenerating document
Archival files – Master and transaction records that have been deleted
Table lookup files – Relatively static data that can be shared to maintain consistency
Audit files – Special records of updates to other files
8. Data organisation in a traditional file environment
9. The Database Approach to Data Management Database
Serves many applications by centralizing data and controlling redundant data
Database management system (DBMS)
Interfaces between applications and physical data files
Separates logical and physical views of data
Solves problems of traditional file environment
Controls redundancy
Eliminates inconsistency
Uncouples programs and data
Enables organization to centrally manage data and data security
10. HUMAN RESOURCES DATABASE WITH MULTIPLE VIEWS
11. The Database Approach to Data Management Relational DBMS
Represent data as two-dimensional tables called relations or files
Each table contains data on entity and attributes
Table: grid of columns and rows
Rows (tuples): Records for different entities
Fields (columns): Represents attribute for entity
Key field: Field used to uniquely identify each record
Primary key: Field in table used for key fields
Foreign key: Primary key used in second table as look-up field to identify records from original table
12. RELATIONAL DATABASE TABLES
13. RELATIONAL DATABASE TABLES (cont.)
14. Operations of a Relational DBMS Three basic operations used to develop useful sets of data
SELECT: Creates subset of data of all records that meet stated criteria
JOIN: Combines relational tables to provide user with more information than available in individual tables
PROJECT: Creates subset of columns in table, creating tables with only the information specified
15. THE THREE BASIC OPERATIONS OF A RELATIONAL DBMS
16. Database Types Relational databases
Organize data in a table
Link tables to each other through their primary keys
Object-Oriented DBMS (OODBMS)
Stores data and procedures as objects
Objects can be graphics, multimedia, Java applets
Relatively slow compared with relational DBMS for processing large numbers of transactions
Hybrid object-relational DBMS: Provide capabilities of both OODBMS and relational DBMS
Multidimensional databases
Stores data in multiple dimensions
Can easily be customized
Process data much faster
17. Database Management Systems (DBMS) Application software designed to capture and analyze data
Four main operations of a DBMS are:
Creating databases and entering data
Viewing and sorting data
Extracting data
Outputting data
18. Capabilities of Database Management Systems Data definition capability: Specifies structure of database content, used to create tables and define characteristics of fields
Data dictionary: Automated or manual file storing definitions of data elements and their characteristics
Data manipulation language: Used to add, change, delete, retrieve data from database
Structured Query Language (SQL)
Microsoft Access user tools for generation SQL
Many DBMS have report generation capabilities for creating polished reports (Crystal Reports)
19. Data architecture Data architecture – a definition of how:
Files and databases are to be developed and used to store data
The file and/or database technology to be used
The administrative structure set up to manage the data resource
Data is stored in some combination of:
Conventional files
Operational databases – databases that support day-to-day operations and transactions for an information system. Also called transactional databases.
Data warehouses – databases that store data extracted from operational databases.
To support data mining
Personal databases
Work group databases
20. Typical DBMS architecture
21. Data warehouse A data warehouse is a large-scale electronic repository of data that contains and organizes in one place all the data related to an organization. Individual databases contain a wealth of information, but each database’s information usually pertains to one topic.
Data warehouses consolidate information from disparate sources to present an enterprise-wide view of business operations.
Data in the data warehouse is organized by subject. Most databases focus on one specific operational aspect of business operations.
23. Populating data warehouses Source data for data warehouses can come from three places:
Internal sources (such as company databases)
External sources (suppliers, vendors, and so on)
Customers or visitors to the company Web site
Companies can use software on their Web sites to capture information about each click that users make as they navigate through the site. This information is referred to as clickstream data.
24. Advantages of data warehouses Maintain data history, even if the source transaction systems do not.
Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
Present the organization's information consistently.
Provide a single common data model for all data of interest regardless of the data's source.
Restructure the data so that it makes sense to the business users.
Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
Add value to operational business applications, notably customer relationship management (CRM) systems.
25. Data staging No two source databases are the same. Therefore, although two databases might contain similar information (such as customer names and addresses), the format of the data is most likely different in each database. Therefore, source data must be “staged” before entering the data warehouse.
26. Data staging Data staging consists of three steps:
1. Extraction of the data from source databases
2. Transformation (reformatting) of the data
3. Storage of the data in the warehouse
Many different software programs and procedures may have to be created to extract the data from varied sources and to reformat it for storage in the data warehouse.
27. Data Marts Looking for the data you need in a data warehouse can be daunting when there are terabytes of data. Therefore, small slices of the data warehouse, called data marts, are often created.
Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department.
28. Data Warehouse
29. Business Intelligence Tools for consolidating, analyzing, and providing access to vast amounts of data to help users make better business decisions
E.g., Harrah’s Entertainment analyzes customers to develop gambling profiles and identify most profitable customers
Principle tools include:
Software for database query and reporting
Online analytical processing (OLAP)
Data mining
30. Online analytical processing (OLAP) Supports multidimensional data analysis
Viewing data using multiple dimensions
Each aspect of information (product, pricing, cost, region, time period) is different dimension
E.g., how many washers sold in the East in June compared with other regions?
OLAP enables rapid, online answers to ad hoc queries
31. MULTIDIMENSIONAL DATA MODEL
32. Data mining The process by which great amounts of data are analyzed and investigated.
The objective is to spot significant patterns or trends within the data that would otherwise not be obvious.
33. Data Mining More discovery driven than OLAP
Finds hidden patterns, relationships in large databases and infers rules to predict future behavior
E.g., Finding patterns in customer data for one-to-one marketing campaigns or to identify profitable customers.
Types of information obtainable from data mining
Associations
Sequences
Classification
Clustering
Forecasting
34. Predictive analysis
Uses data mining techniques, historical data, and assumptions about future conditions to predict outcomes of events
E.g., Probability a customer will respond to an offer
Text mining
Extracts key elements from large unstructured data sets (e.g., stored e-mails)
35. Data mining methods Data mining enables managers to sift through data in a number of ways. Each method produces different information that managers can then base their decisions on.
The following are five things managers do to make their data meaningful:
Classification
Estimation
Affinity grouping or association rules
Clustering
Description and visualisation
36. Classification Before mining, managers define data classes that they think will be helpful in spotting trends. They then apply these class definitions to all unclassified data to prepare it for analysis.
37. Estimation When managers classify data, the record either fits the classification criteria or it doesn’t.
Estimation enables managers to assign a value, based on some criterion, to data.
For example, assume a bank wants to send out credit card offers to people who are likely to be granted a credit card. The bank may run the customers’ data through a program that assigns them a score based on where they live, their household income, and their average bank balance. This provides managers with an estimate of the most likely credit card prospects so that they can include them in the mailing.
38. Affinity grouping or association rules When mining data, managers can also determine which data goes together.
In other words, they can apply affinity grouping or association rules to the data.
For example, suppose analysis of a sales database indicates that two items are bought together 70 percent of the time. Based on this data, managers might decide that these items should be pictured on the same page in the next mail-order catalog they send out.
39. Clustering Clustering involves organizing data into similar subgroups, or clusters. It is different from classification in that there are no predefined classes.
The data-mining software makes the decision about what to group together, and it is up to managers to determine whether the clusters are meaningful.
For example, the data-mining software may identify clusters of customers with similar buying patterns. Further analysis of the clusters may reveal that certain socioeconomic groups have similar buying patterns
40. Description and visualisation Often, the purpose of data mining is merely to describe data so managers can visualize it. Sometimes having a clear picture of what is going on with the data helps people to interpret it in new and different ways.
41. Modern data architecture
42. Web mining Discovery and analysis of useful patterns and information from WWW
E.g., to understand customer behavior, evaluate effectiveness of Web site, etc.
Web content mining
Knowledge extracted from content of Web pages
Web structure mining
E.g., links to and from Web page
Web usage mining
User interaction data recorded by Web server
43. Databases and the Web Many companies use Web to make some internal databases available to customers or partners
Typical configuration includes:
Web server
Application server/middleware/CGI scripts
Database server (hosting DBM)
Advantages of using Web for database access:
Ease of use of browser software
Web interface requires few or no changes to database
Inexpensive to add Web interface to system
44. Linking internal databases to the web
45. Managing Data resources Establishing an information policy
Firm’s rules, procedures, roles for sharing, managing, standardizing data
Data administration:
Firm function responsible for specific policies and procedures to manage data
Data governance:
Policies and processes for managing availability, usability, integrity, and security of enterprise data, especially as it relates to government regulations
Database administration:
Defining, organizing, implementing, maintaining database; performed by database design and management group
46. Administrators Data administrator – a database specialist responsible for data planning, definition, architecture, and management.
Database administrator – a specialist responsible for database technology, database design, construction, security, backup and recovery, and performance tuning.
A database administrator will administer one or more databases
47. Managing data resources Ensuring data quality
More than 25% of critical data in Fortune 1000 company databases are inaccurate or incomplete
Most data quality problems stem from faulty input
Before new database in place, need to:
Identify and correct faulty data
Establish better routines for editing data once database in operation
48. Managing data resources Data quality audit:
Structured survey of the accuracy and level of completeness of the data in an information system
Survey samples from data files, or
Survey end users for perceptions of quality
Data cleansing
Software to detect and correct data that are incorrect, incomplete, improperly formatted, or redundant
Enforces consistency among different sets of data from separate information systems