DATABASE & INFORMATION MODELS
Data vs Information • Data: Collection of letters, numbers or facts • Information: Processed data that provides value
What is a database? • A database is a repository of data, designed to support efficient data storage, retrieval and maintenance • A database may be specialized to store binary files, documents, images, videos, relational data, multidimensional data, transactional data, analytic data, or geographic data, to name a few • Data can be stored in various forms, namely tabular, hierarchical and graphical forms
What is a database management system? • A Database Management System (DBMS) is a software system that manages databases • The terms “database”, “DBMS”, “data server” and “database server” are often used interchangeably to refer to a DBMS • Why do we need database software or a DBMS? • Security • Can handle many users with good performance • Allows for concurrency while keeping data consistent • Protects data from disaster
The evolution of database management systems • In the 1960s, network and hierarchical systems such as CODASYL and IMS were the main database systems • In 1970, E.F. Codd, an employee of the IBM San Jose Research Laboratory, published a revolutionary paper titled “A relational model of data for large shared data banks” • The Structured Query Language (SQL), invented by IBM in the early 1970s, has been constantly improved through the years • Today many businesses need to exchange information, and the eXtensible Markup Language (XML) is the underlying technology that is used for this purpose
Introduction to information models and data models • An information model is an abstract, formal representation of entities that includes their properties, relationships and the operations that can be performed on them • A data model is a concrete mapping of the information model
The difference between information model and data model • The main purpose of an Information Model is to model managed objects at a conceptual level, independent of any specific implementations or protocols used to transport the data • Data Models, on the other hand, are defined at a more concrete level and include many details
Types of information models • Network (CODASYL): 1970’s • Hierarchical (IMS): late 1960’s and 1970’s • Relational: 1970’s and early 1980’s • Entity-Relationship: 1970’s • Extended Relational: 1980’s • Semantic: late 1970’s and 1980’s • Object-oriented: late 1980’s and early 1990’s • Object-relational: late 1980’s and early 1990’s • Semi-structured (XML): late 1990’s to the present
Network model • In 1969, CODASYL (Committee on Data Systems Languages) released its first specification about the network data model • This was followed in 1971 and 1973 by specifications for a record-at-a-time data manipulation language
Hierarchical model • The first hierarchical database management system was IMS (Information Management System) released by IBM in 1968. • It was originally built as the database for the Apollo space program to land the first humans on the moon. • IMS is a very robust database that is still in use today at many companies worldwide
Relational model • 1970’s and early 1980’s • The relational data model is simple and elegant • It has a solid mathematical foundation based on set theory and predicate calculus, and is the most used data model for databases today
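To make the set-theory foundation concrete, here is a minimal sketch that treats a relation as a plain Python set of tuples and implements two relational operations, selection and projection. The sample rows and the names `EMPLOYEE_SCHEMA`, `select` and `project` are invented for illustration and do not come from the slides.

```python
# A relation is modelled as a set of tuples; attribute order is fixed by the schema.
# The schema and rows below are hypothetical sample data.
EMPLOYEE_SCHEMA = ("emp_id", "name", "dept")
employees = {
    (1, "Ana", "Sales"),
    (2, "Bo", "Sales"),
    (3, "Cy", "R&D"),
}


def select(relation, predicate):
    """Selection: keep only the tuples that satisfy the predicate."""
    return {row for row in relation if predicate(row)}


def project(relation, schema, attributes):
    """Projection: keep only the named attributes.

    Duplicate rows disappear automatically because the result is a set.
    """
    positions = [schema.index(attribute) for attribute in attributes]
    return {tuple(row[i] for i in positions) for row in relation}


# All employees in the Sales department (dept is the third attribute).
sales = select(employees, lambda row: row[2] == "Sales")

# The distinct departments: three input rows collapse to two result rows.
departments = project(employees, EMPLOYEE_SCHEMA, ["dept"])
```

Because the result of each operation is itself a set of tuples, operations can be chained, which is the property that lets a query language compose them freely.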
Entity-Relationship model • In the mid 1970’s, Peter Chen proposed the entity-relationship (E-R) data model • He proposed thinking of a database as a collection of instances of entities
Object-relational model • The Object-Relational (OR) model is very similar to the relational model • However, it treats every entity as an object (an instance of a class), and a relationship as an inheritance
Other data models • The last decade has seen a substantial amount of work on semi-structured, semantic and object oriented data models • XML is ideal to store semi-structured data • Object oriented data models are popular in universities, but have not been widely accepted in the industry
FILE SYSTEMS VERSUS A DBMS • Scenario: A company has a large collection (say, 500 GB) of data on employees, departments, products, sales, and so on. This data is accessed concurrently by several employees. Questions about the data must be answered quickly, changes made to the data by different users must be applied consistently, and access to certain parts of the data (e.g., salaries) must be restricted. • We can try to manage the data by storing it in operating system files.
FILE SYSTEMS VERSUS A DBMS • This approach has many drawbacks, including the following: • We probably do not have 500 GB of main memory to hold all the data. • Even if we have 500 GB of main memory, on computer systems with 32-bit addressing, we cannot refer directly to more than about 4 GB of data. • We have to write special programs to answer each question a user may want to ask about the data. • We must protect the data from inconsistent changes made by different users accessing the data concurrently. • We must ensure that data is restored to a consistent state if the system crashes while changes are being made. • Operating systems provide only a password mechanism for security.
FILE SYSTEMS VERSUS A DBMS • A DBMS is a piece of software designed to make the preceding tasks easier. • By storing data in a DBMS rather than as a collection of operating system files, we can use the DBMS's features to manage the data in a robust and efficient manner. • As the volume of data and the number of users grow (hundreds of gigabytes of data and thousands of users are common in current corporate databases), DBMS support becomes indispensable.
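The contrast between writing a special program for each question and asking a DBMS can be sketched as follows. This is an illustration only: it uses Python's built-in `sqlite3` module as a stand-in DBMS, and the sample records, the `employee` table and the function names are assumptions, not part of the original scenario.

```python
import csv
import io
import sqlite3

# Hypothetical sample records; in the file-based approach these would sit in an OS file.
CSV_TEXT = "name,dept,salary\nAna,Sales,5000\nBo,Sales,4000\nCy,R&D,6000\n"


def average_salary_from_file(text, dept):
    """File approach: a hand-written scan that answers exactly one question."""
    salaries = [
        int(row["salary"])
        for row in csv.DictReader(io.StringIO(text))
        if row["dept"] == dept
    ]
    return sum(salaries) / len(salaries)


def load_into_dbms(text):
    """Load the same records into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
    rows = [
        (row["name"], row["dept"], int(row["salary"]))
        for row in csv.DictReader(io.StringIO(text))
    ]
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    return conn


conn = load_into_dbms(CSV_TEXT)

file_answer = average_salary_from_file(CSV_TEXT, "Sales")

# DBMS approach: the same question is a one-line declarative query ...
db_answer = conn.execute(
    "SELECT AVG(salary) FROM employee WHERE dept = ?", ("Sales",)
).fetchone()[0]

# ... and a different question needs a different query, not a new program.
headcount = dict(conn.execute("SELECT dept, COUNT(*) FROM employee GROUP BY dept"))
```

Both approaches give the same average here, but only the file version requires new scanning code whenever the question changes.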
ADVANTAGES OF A DBMS • Data Independence: Application programs should not, ideally, be exposed to details of data representation and storage. The DBMS provides an abstract view of the data that hides such details. • Efficient Data Access: A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently. This feature is especially important if the data is stored on external storage devices. • Data Integrity and Security: If data is always accessed through the DBMS, the DBMS can enforce integrity constraints. For example, before inserting salary information for an employee, the DBMS can check that the department budget is not exceeded. Also, it can enforce access controls that govern what data is visible to different classes of users.
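The department-budget example can be sketched with SQLite through Python's `sqlite3` module. This is one possible way to express such a rule, assuming a trigger plus a `CHECK` constraint; the schema, the trigger name `enforce_budget` and the helper `try_insert` are hypothetical, and other DBMSs offer different mechanisms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (name TEXT PRIMARY KEY, budget INTEGER NOT NULL);
CREATE TABLE employee (
    name   TEXT NOT NULL,
    dept   TEXT NOT NULL REFERENCES department(name),
    salary INTEGER NOT NULL CHECK (salary > 0)
);
-- Reject any insert that would push total salaries past the department budget.
CREATE TRIGGER enforce_budget BEFORE INSERT ON employee
BEGIN
    SELECT RAISE(ABORT, 'department budget exceeded')
    WHERE (SELECT COALESCE(SUM(salary), 0) FROM employee WHERE dept = NEW.dept)
          + NEW.salary
          > (SELECT budget FROM department WHERE name = NEW.dept);
END;
INSERT INTO department VALUES ('Sales', 10000);
""")


def try_insert(name, dept, salary):
    """Return True if the DBMS accepted the row, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (name, dept, salary))
        return True
    except sqlite3.DatabaseError:
        return False


accepted_first = try_insert("Ana", "Sales", 6000)   # within the 10000 budget
accepted_second = try_insert("Bo", "Sales", 5000)   # would total 11000, over budget
accepted_third = try_insert("Cy", "Sales", -1)      # violates CHECK (salary > 0)
stored_rows = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

The point of the design is that the rule lives in the database, so every application that inserts salaries is subject to it without repeating the check in its own code. Access controls are not shown, since SQLite has no user-level permission system.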
ADVANTAGES OF A DBMS • Data Administration: When several users share the data, centralizing the administration of data can offer significant improvements. Experienced professionals who understand the nature of the data being managed, and how different groups of users use it, can be responsible for organizing the data representation to minimize redundancy and for fine-tuning the storage of the data to make retrieval efficient. • Concurrent Access and Crash Recovery: A DBMS schedules concurrent accesses to the data in such a manner that users can think of the data as being accessed by only one user at a time. Further, the DBMS protects users from the effects of system failures.
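Protection from failures rests on transactions: a group of changes is applied entirely or not at all. The sketch below illustrates that all-or-nothing behaviour with Python's `sqlite3` module. It makes two simplifying assumptions: an exception raised inside the process stands in for a real system crash, and only a single connection is used, so concurrent scheduling is not demonstrated. The `account` table and the `transfer` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("Ana", 100), ("Bo", 50)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move money between two accounts as one all-or-nothing transaction."""
    try:
        # 'with conn' commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE owner = ?",
                (amount, src),
            )
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE owner = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    """Return the current balances as an owner-to-balance dictionary."""
    return dict(conn.execute("SELECT owner, balance FROM account"))


failed = transfer(conn, "Ana", "Bo", 30, fail_midway=True)
after_failure = balances(conn)   # the half-done debit has been rolled back

succeeded = transfer(conn, "Ana", "Bo", 30)
after_success = balances(conn)   # both updates are now visible together
```

Without the transaction, the failed transfer would have left Ana debited and Bo not credited, which is exactly the inconsistent state a file-based approach has to guard against by hand.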
ADVANTAGES OF A DBMS • Reduced Application Development Time: Clearly, the DBMS supports important functions that are common to many applications accessing data in the DBMS. This, in conjunction with the high-level interface to the data, facilitates quick application development. DBMS applications are also likely to be more robust than similar stand-alone applications because many important tasks are handled by the DBMS (and do not have to be debugged and tested in the application).
Is there ever a reason not to use a DBMS? • Sometimes, yes. A DBMS is a complex piece of software, optimized for certain kinds of workloads (e.g., answering complex queries or handling many concurrent requests), and its performance may not be adequate for certain specialized applications. • Examples include applications with tight real-time constraints or just a few well-defined critical operations for which efficient custom code must be written. • Another reason for not using a DBMS is that an application may need to manipulate the data in ways not supported by the query language. • In such a situation, the abstract view of the data presented by the DBMS does not match the application's needs and actually gets in the way. As an example, relational databases do not support flexible analysis of text data (although vendors are now extending their products in this direction).