310 likes | 387 Views
Chapter 1. For IS420/IS520. Define “database”. A database is a self-describing collection of integrated tables that is computerized Self describing Collection of related tables/records Computerized
E N D
Chapter 1 For IS420/IS520
Define “database” • A database is a self-describing collection of integrated tables that is computerized • Self describing • Collection of related tables/records • Computerized • In the real world, any database can only be accessed by the Database Management System (DBMS) that created the database
Why databases • Share data • Data/information needs to be shared to run a business • We can run a company without its CEO, but not without data • Reduce redundancy • Reduce inconsistent data • Support transaction • Support data integrity • Enforce security • Support and enforce for standards • Meet complicated requirements
DBMS • It is a piece of system software • At enterprise level, it is always a piece of server level software • It is OS dependent • The major vendors • Oracle • MySQL • Microsoft • IBM • Sybase • RDS
Services of an enterprise DBMS • Provides basic services: • Moving data to/from physical files using OS • Managing concurrent access, NOT NECESSARILY by multiple users • Managing transactions so that each is an “all or nothing” unit of work • Support SQL • Provisions for backup and recovery • Security to prevent unauthorized access and modification
Layers of Data Abstraction • The DBMS presents users with distinct views of the data (views), while storingthe data only once, to support users and application with different access rights.
External Model • Represents user’s point of view • External level consists of many different external views • Each external view is created for different users or application and show different portion of the actual database • User may be unaware of other entities, attributes and relationships • External views are described in External schemas which are written in the Data Definition Language
Conceptual Model • Data architect point of view • Middle level that contains entirety of the DB • Complete logical model • Entities, attributes, relationships, constraints on data, semantic information, security and integrity • Conceptual model supports external views • Closest to E-R diagram • Some people think this is DBA’s view, I disagree • For example, a split table is not a view for this level • Conceptual model is relatively constant • It is written in DDL, stored in system database and compiled by DBMS
Physical Model • Describes how the data is stored in the DB, that is the physical implementation of the DB • Includes data structures and file organization • Works with Os • To lay out data and other DB objects on the storage devices • Build indexes, etc. • Simply, it is managed by OS but under the instruction of DBMS, which is why DBMS is always OS dependent
Why Three Level Architecture • To achieve data independence • Logical data independence • Physical data independence • This is the main reason we move from file systems to databases
Logical Data Independence • Refers to immunity of external schemas to changes in conceptual schema such as • adding and removing records, • adding tables, indexes, or views • adding columns • However, removing columns and changing access right may affect external schemas • Conceptual schema should not affect external schema or rewrites of application programs
Physical Data Independence • Refers to immunity of conceptual schema to changes in the physical file structure such as different organizations and storage devices • For example, if DBA moved database files from one disk to another should not trigger changes to conceptual or external schemas.
DBMS models • Flat Files • Hierarchical Model • Network Model • ****Relational Model • Object-Oriented Model • Object-Relational Model
Relational Data Model • Primary unit of storage is the table • Each table may be used independently or joins may be used to combine tables • Relationships defined using primary and foreign keys (This is referred as INTEGRATED in our db definition) • Easily defined integrity constraints
Relational Model -- Benefits • Benefits: • Very fast retrieval in most cases • Users need no awareness of physical storage • Complex queries relatively easily developed • Data usually more accurate • Easier application programming, relatively speaking • Standard query language (SQL) • Supported by relational algebra and relational calculus
Relational Model -- Drawbacks • Drawbacks: • Tables must be joined to retrieve related data, a very expensive operation both in terms of CPU and memory • Users must understand relationships between Tables to pull information out of the databases • Users must learn SQL or rely on some query tool
Why Relational • Most stable -- mature technology with well established ANSI and ISO standards • A few reputable vendors for DBMS • Easy to define, maintain and manipulate data with SQL • Data is well protected using defined integrity constraints
History of DBMS • GUAM (Generalized Update Access Method from North American Aviation) provided the first hierarchical structure • In the mid-60’s, IBM joined NAA to develop GUAM into IMS (Information Management System), the first commercial hierarchical DBMS • IDS (Integrated Data Store) from General Electric as the first network DBMS
History of DBMS (2) • IBM extended IMS to include some network capabilities, overcoming the “single parent” restriction • CODASYL (Conference on Data Systems Languages) and the DBTG (Database Task Group) published standards in 1969-1971
History of DBMS (3) • Dr. E.F. (Ted) Codd delivers pioneering white paper "A Relational Model of Data for Large Shared Data Banks" in June, 1970. • CODASYL published specifications for a standard Network DBMS in 1971, which began 5 years of heated controversy between a group of Network DBMS advocates (including Charles Bachman) and a group of Relational advocates.
CODASYL “Camp" Position • Relational Model too mathematical that programmers would not be able to understand and use it. • An efficient implementation of Relational could not be built. • Online transaction processing applications want to do a-record-at-a-time processing, not a set processing
Relational “Camp” Position • Nothing as complicated as the DBTG proposal could possibly be the right way to do data management. • Set-oriented queries are too difficult to program using the DBTG language. • CODASYL has no formal underpinnings (read math).
A true event • The debate came to head at the 1975 ACMSIGMOD conference in a debate where Codd and 2 others squared off against Bachman and 2 others. The audience was more confused at the end of the two talks and ensuing discussion than at the outset.
CODASYL vs. Relational • By the late 1970's, interest in CODASYL began to decline. Michael Stonebraker believes this was because of 2 factors: • Easier to use relational languages such as QUEL and SQL were developed. • Prototype Relational RDBMS's were developed that proved that implementations could be done with reasonable efficiency.
Prototype Relational RDBMSs • System R, developed by 15 IBM researchers in San Jose under the direction of Frank King from 1974 to 1978. • INGRES developed by a team of UC Berkeley students under the direction of Michael Stonebraker and Eugene Wong from 1973 to 1977. • Sybase was started with some INGRES people
Early Commercial RDBMSs • System R was built commercially and became the basis for HP ALLBASE and IDMS/SQL. • A bit later, Larry Ellison started Oracle and independently implemented the external specifications for System R. • IBM, with some rewriting, developed System R into SQL/DS and DB2.
History of DBMSs • In 1976, Peter Chen presented the Entity-Relationship model, bolstering modeling weaknesses in the relational model. Many other modeling techniques followed. • Nowadays, when we say database, we mean relational database • Most databases are there to support transactional applications (one record at a time to record, multi records to retrieve for displaying), such as Amazon • Other database application types are batch and decision support systems.
Trends • XML for document processing and exchange of information • Data mining and decision support systems • Unstructured queries • NoSQL (MongoDB) • DBMS for cloud computing such as RDS from AMAZON