420 likes | 782 Views
DATA RESOURCE MANAGEMENT. Data Hierarchy in a Computer System. Entitities and Attributes. Problems with the Traditional File Environment. Data redundancy Program-Data dependence Lack of flexibility Poor security Lack of data-sharing and availability. Figure 7-3.
E N D
DATA RESOURCE MANAGEMENT
Problems with the Traditional File Environment • Data redundancy • Program-Data dependence • Lack of flexibility • Poor security • Lack of data-sharing and availability
Figure 7-3 Traditional File Processing
Database Management System (DBMS) • Creates and maintains databases • Eliminates requirement for data definition statements • Acts as interface between application programs and physical data files • Separates logical and physical views of data
Components of DBMS • Data definition language: Specifies content and structure of database and defines each data element • Data manipulation language: • Manipulates data in a database • Data dictionary:Stores definitions of data elements, and data characteristics
Figure 7-6 Relational Data Model
Three Basic Operations in a Relational Database • Select:Creates subset of rows that meet specific criteria • Join:Combines relational tables to provide users with information • Project:Enables users to create new tables containing only relevant information
Figure 7-7 Three Basic Operations in a Relational Database
Ensuring Database Integrity • Database integrity involves the maintenance of the logical and business rules of the database. • There are two kinds of “DB Integrity” that must be addressed: • Entity Integrity • Referential Integrity
Entity Integrity • Entity integrity deals with within-entity rules. • These rules deal with ranges and the permission of null values in attributes or possibly between records
Examples of Entity Integrity • Data Type Integrity: very common and most basic. Checks only for “data type” compatibility with DB Schema, such as: numeric, character, logical, date format, etc. • Commonly referred to in GIS manuals as: • Range and List domains • Ranges - acceptable Numeric ranges for input • List - acceptable text entries or drop-down lists.
Enforcing Integrity • Not a trivial task! • Not all database management systems or GIS software enable users to “enforce data integrity” during attribute entry or edit sessions. • Therefore, the programmer or the Database Administrator must enforce and/or check for “Integrity.”
Referential Integrity • Referential integrity concerns two or more tables that are related. • Example: IF table A contains a foreign key that matches the primary key of table B THEN values of this foreign key either match the value of the primary key for a row in table B or must be null. • Necessary to avoid: Update anomaly, Delete anomaly.
Querying Databases: Elements of SQL Basic SQL Commands • SELECT: Specifies columns • FROM: Identifies tables or views • WHERE: Specifies conditions
Using SQL- Structured Query Language • SQL is a standard database protocol, adopted by most ‘relational’ databases • Provides syntax for data: • Definition • Retrieval • Functions (COUNT, SUM, MIN, MAX, etc) • Updates and Deletes
SQL Examples • CREATE TABLE SALESREP • Item definition expression(s) • {item, type, (width)} • DELETE table • WHERE expression
Data Retrieval • SELECT list FROM table WHERE condition • list - a list of items or * for all items • WHERE - a logical expression limiting the number of records selected • can be combined with Boolean logic: AND, OR, NOT • ORDER may be used to format results
UPDATE tables • SET item = expression • WHERE expression • INSERT INTO table • VALUES …..
Database Normalization • Normalization: The process of structuring data to minimize duplication and inconsistencies. • The process usually involves breaking down a single Table into two or more tables and defining relationships between those tables. • Normalization is usually done in stages, with each stage applying more rigorous rules to the types of information which can be stored in a table.
Normalization • Normalization: a process for analyzing the design of a relational database • Database Design - Arrangement of attributes into entities • It permits the identification of potential problems in your database design • Concepts related to Normalization: • KEYS and FUNCTIONAL DEPENDENCE
Ex: Database Normalization (1) • Sample Student Activities DB Table • Poorly Designed • Non-unique records • John Smith • Test the Design by developing sample reports and queries
Ex: Database Normalization (2) • Created a unique “ID” for each Record in the Activities Table • Required the creation of an “ID” look-up table for reporting (Students Table) • Converted the “Flat-File into a Relational Database
Ex: Database Normalization (3) • Wasted Space • Redundant data entry • What about taking a 3rd Activity? • Query Difficulties - trying to find all swimmers • Data Inconsistencies - conflicting prices
Ex: Database Normalization (4) • Students table is fine • Elimination of two columns and an Activities Table restructuring, Simplifies the Table • BUT, we still have Redundant data (activity fees) and data insertion anomalies. Problem: If student #219 transfers we lose all references to Golf and its price.
Ex: Database Normalization (5) • Modify the Design to ensure that “every non-key field is dependent on the whole key” • Creation of the Participants Table, corrects our problems and forms a union between 2 tables. This is a Better Design!
The Normal Forms • A series of logical steps to take to normalize data tables • First Normal Form • Second • Third • Boyce Codd • There’s more, but beyond scope of this
First Normal Form (1NF) • All columns (fields) must be atomic • Means : no repeating items in columns Solution: make a separate table for each set of attributes with a primary key (parser, append query) Customers CustomerID Name Orders OrderID Item CustomerID OrderDate
Second Normal Form (2NF) • In 1NF and every non-key column is fully dependent on the (entire) primary key • Means : Do(es) the key field(s) imply the rest of the fields? Do we need to know both OrderID and Item to know the Customer and Date? Clue: repeating fields Solution: Remove to a separate table (Make Table) Orders OrderID CustomerID OrderDate OrderDetails OrderID Item
Third Normal Form (3NF) • In 2NF and every non-key column is mutually independent • means : Calculations • Solution: Put calculations in queries and forms OrderDetails OrderID Item Quantity Price Put expression in text control or in query: =Quantity * Price
Data Warehousing and Datamining Data warehouse • Supports reporting and query tools • Stores current and historical data • Consolidates data for management analysis and decision making
What is a Data Warehouse? "A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process". Bill Inmon (1990) "A Data Warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated.…” Anonymous
Data Mining • ON-LINE ANALYTICAL PROCESSING (OLAP):ability to manipulate, analyze large volumes of data from multiple perspectives • MINING: Seeking relationships that are not known in advance. A function of the software and data organization.
DW Characteristics • Subject Oriented:Data that gives information about a particular subject instead of about a company's ongoing operations. • Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole. • Time Variant: All data in the data warehouse is identified with a particular time period.
Data Acquisition • The process of moving company data from the source systems into the warehouse. • Often the most time-consuming and costly effort. • Performed with software products known as ETL (Extract/Transform/Load) tools. • Over 50 ETL tools on market.
Data Cleansing • Typically performed in conjunction with data acquisition. • A complicated process that validates and, if necessary, corrects the data before it is inserted. • AKA "data scrubbing" or "data quality assurance".