Database Management Systems, Lecture 2
Prepared By Paul Tate
History (Tate’s source for History: wikipedia.com) • The earliest known use of the term data base was in November 1963, when the System Development Corporation sponsored a symposium under the title Development and Management of a Computer-centered Data Base. Database as a single word became common in Europe in the early 1970s and by the end of the decade it was being used in major American newspapers. (The abbreviation DB, however, survives.)
The first database management systems were developed in the 1960s. A pioneer in the field was Charles Bachman. Bachman's early papers show that his aim was to make more effective use of the new direct access storage devices becoming available: until then, data processing had been based on punched cards and magnetic tape, so that serial processing was the dominant activity.
Two key data models arose at this time: CODASYL developed the network model based on Bachman's ideas, and (apparently independently) the hierarchical model was used in a system developed by North American Rockwell, later adopted by IBM as the cornerstone of their IMS product.
The relational model was proposed by E. F. Codd in 1970. He criticized existing models for confusing the abstract description of information structure with descriptions of physical access mechanisms. For a long while, however, the relational model remained of academic interest only. While CODASYL products (IDMS) and hierarchical model products (IMS) were conceived as practical engineering solutions taking account of the technology as it existed at the time, the relational model took a much more theoretical perspective, arguing (correctly) that hardware and software technology would catch up in time.
Among the first implementations were Michael Stonebraker's Ingres at Berkeley, and the System R project at IBM. Both of these were research prototypes, announced during 1976. The first commercial products, Oracle and DB2, did not appear until around 1980. The first successful database product for microcomputers was dBASE for the CP/M and PC-DOS/MS-DOS operating systems.
During the 1980s, research activity focused on distributed database systems and database machines. Another important theoretical idea was the Functional Data Model, but apart from some specialized applications in genetics, molecular biology, and fraud investigation, the world took little notice.
In the 1990s, attention shifted to object-oriented databases. These had some success in fields where it was necessary to handle more complex data than relational systems could easily cope with, such as spatial databases, engineering data (including software repositories), and multimedia data. Some of these ideas were adopted by the relational vendors, who integrated new features into their products as a result. The 1990s also saw the spread of Open Source databases, such as PostgreSQL and MySQL.
In the 2000s, the fashionable area for innovation is the XML database. As with object databases, this has spawned a new collection of start-up companies, but at the same time the key ideas are being integrated into the established relational products. XML databases aim to remove the traditional divide between documents and data, allowing all of an organization's information resources to be held in one place, whether they are highly structured or not.
Introduction to Database Development
Simple definition of a database: A collection of interrelated, shared, and controlled data. The following pages deal with management issues relating to: • organizational data sharing, • strategic data planning, • management control of data, • and risks and costs of databases.
Data Sharing and Databases • Perhaps the most significant difference between a file-based system and a database system is that data are shared. • This requires a major change in the thinking of users, who are accustomed to feeling they “own” the data resulting from their daily work activities. • Data sharing also requires a major change in the way data are handled and managed within the organization. Part of this comes from the sheer amount of data that needs to be organized and integrated. • Let's look at three types of data sharing: • between functional units; • between management levels; and • between geographically dispersed locations.
Sharing between functional areas • The term data sharing suggests that people in different functional areas use a common pool of data, each from their own applications. Without data sharing, the marketing group of a company may have their data files, the purchasing group theirs, the accounting group theirs, and so on. Each group benefits only from its own data. • In contrast, the effect of combining data into a database is synergistic; that is, the combined data are more valuable than the sum of the data in separate files. Not only does each group continue to have access to its own data but, within reasonable limits of control, they have access to other data as well. • Data Integration is the concept of combining data for common use.
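To make the idea of an integrated, shared pool of data concrete, here is a minimal sketch (not part of the original lecture) using Python's built-in sqlite3 module. The customers and orders tables and the two queries are invented for illustration; the point is simply that marketing and accounting both read from one shared database rather than maintaining separate departmental files.

```python
# Minimal sketch of data integration: two functional units query
# one shared database instead of keeping their own files.
# Table names and queries are hypothetical.
import sqlite3

conn = sqlite3.connect("company.db")   # a single shared pool of data
cur = conn.cursor()

# One integrated schema serves several functional units.
cur.executescript("""
CREATE TABLE IF NOT EXISTS customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    region      TEXT
);
CREATE TABLE IF NOT EXISTS orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    amount      REAL
);
""")

# Marketing uses the shared data for its own purposes...
marketing_view = cur.execute(
    "SELECT region, COUNT(*) FROM customers GROUP BY region"
).fetchall()

# ...while accounting uses the same data for different purposes,
# without keeping a second copy of the customer file.
accounting_view = cur.execute(
    "SELECT c.name, SUM(o.amount)"
    " FROM customers c JOIN orders o ON o.customer_id = c.customer_id"
    " GROUP BY c.name"
).fetchall()

conn.close()
```

Because both views are derived from the same tables, a change made by one group (say, a corrected customer name) is immediately visible to the other, which is the synergy described above.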
Sharing between different levels of users • Different levels of users also need to share data. Three different levels of users are normally distinguished: operations, middle management, and executive. These levels correspond to the three different types of automated business systems that have evolved during the past three decades: electronic data processing (EDP), management information systems (MIS), and decision support systems (DSS).
Electronic Data Processing (EDP) • Was first applied to the lower operational levels of the organization to automate the paperwork. Its basic characteristics include: • A focus on data, storage, processing, and flows at the operational level; • Efficient transaction processing; • Summary reports for management.
Management Information Systems (MIS) • The MIS approach elevated the focus on information systems activities, with additional emphasis on integration and planning of the information systems function. In practice, the characteristics of MIS include: • An information focus, aimed at middle managers; • An integration of EDP jobs by business function, such as production MIS, marketing MIS, personnel MIS, etc.; and • Inquiry and report generation, usually with a database.
Decision Support System (DSS) • Is focused still higher in the organization with an emphasis on the following characteristics: • Decision focused, aimed at top managers and executive decision makers; • Emphasis on flexibility, adaptability, and quick response; • Support for the personal decision-making styles of individual managers.
Sharing Data Between Different Locations • A company with several locations has important data distributed over a wide geographical area. Sharing these data is a significant problem. • A centralized database is physically confined to a single location, controlled by a single computer. Most functions for which databases are created are accomplished more easily if the database is centralized. That is, it is easier to update, back up, query, and control access to a database if we know exactly where it is and what software controls it.
… • The size of the database and the computer on which it resides need not have any bearing on whether the database is centrally located. A small company with its database on a personal computer has a centralized database just as does a large company with many computers, but whose database is entirely controlled by a mainframe.
A distributed database system • is made up of several database systems running at local sites connected by communication lines. A query or update is then no longer a single process controlled by one software module, but a set of cooperating processes running at several sites, controlled by independent software modules. Clearly, for a distributed database system to function effectively, adequate communications technology must be available, and the DBMSs in the system must be able to communicate while interfacing with the communications facilities.
… • Distributed database systems are attractive because they make possible the localization of data: Data reside at those sites where they are referenced most frequently. With the availability of powerful database software on client/server local area networks (LANs) and even on PCs, it is reasonable to create distributed systems that allow local users to manipulate local data, while at the same time providing means for off-site users and centrally located management to access these same data as their needs require. • This approach improves cost-effectiveness and local autonomy. The cost, in addition to that caused by the need for data communication, is more complexity in the total database system – a problem which must be solved by system designers.
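As a rough illustration of these cooperating processes, the following Python sketch (again hypothetical, using the built-in sqlite3 module) gives each site its own local database file and has a coordinating function run the same query at every site before merging the partial results. The site names, file names, and sales table are invented, and a real distributed DBMS would also handle communication, failures, and query planning that this toy example ignores.

```python
# Sketch of a distributed query: each site keeps its own local data,
# and a coordinating step combines partial results from all sites.
import sqlite3

SITES = {                       # hypothetical sites and their local files
    "london": "london_sales.db",
    "tokyo": "tokyo_sales.db",
}

def setup_site(path, rows):
    """Create a small local sales table at one site."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS sales (item TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    conn.close()

def total_sales():
    """Run the same local query at each site and merge the answers."""
    grand_total = 0.0
    for site, path in SITES.items():
        conn = sqlite3.connect(path)       # data stays at the local site
        (site_total,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales"
        ).fetchone()
        conn.close()
        grand_total += site_total          # coordination happens here
    return grand_total

if __name__ == "__main__":
    setup_site("london_sales.db", [("widget", 120.0), ("gadget", 80.0)])
    setup_site("tokyo_sales.db", [("widget", 200.0)])
    print(total_sales())                   # 400.0
```

Each site's data remain local and are queried locally, yet an off-site coordinator can still obtain a company-wide answer, which is the cost/autonomy trade-off described above.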
The Role of the Database • As this discussion indicates, achieving the goal of sharing data is complex. With this in mind, let's take a second look at the definition of an effective database. As we said, a database is a collection of interrelated, shared, and controlled data. Both sharing and controlling are facilitated through data integration. Thus, this definition contains three criteria for an effective database. • First, data must be shared. As we have seen, data can be shared between functional units, between levels of management, and between different geographical units.
Second, data use must be controlled. Control is provided by a database management system (DBMS) whose facilities are managed by personnel known as database administrators. • Third, data are integrated in a logically sound fashion so that redundancies are eliminated, ambiguities of definition are resolved, and internal consistency between data elements is maintained. The logical structure of data integration makes data sharing and control practical on a large scale. Without integration, it would be extremely difficult to manage and maintain consistency between large numbers of different files.
End of Lecture 2