1 / 32

Lec1 - Introduction

Lec1 - Introduction. CSCI-4380 Database Systems. Content of CSCI4380. Course Wiki Page. Why Database Systems?. Massive amounts of data being collected in different business and scientific disciplines Banking, Airlines, Human Resources, Finance, Health

Download Presentation

Lec1 - Introduction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lec1 - Introduction CSCI-4380 Database Systems

  2. Content of CSCI4380 • Course Wiki Page

  3. Why Database Systems? Massive amounts of data being collected in different business and scientific disciplines Banking, Airlines, Human Resources, Finance, Health WWW, Web Search, Online Stores (Amazon, eBay, etc) Biology, Chemistry, Materials science, Astronomy, Ecology, Geology, Economics, and many more Provide for a systematic way to address the challenges across/at the intersection of the diverse fields Enable Data-science & informatics: web science and bio-, chem-, eco-, geo-, astro-, materials- informatics

  4. Economics & Finance

  5. World Wide Web

  6. Bioinformatics Datasets: Genomes Protein structure DNA/Protein arrays Interaction Networks Pathways Metagenomics Integrative Science Systems Biology Network Biology

  7. Astro-Informatics: US National Virtual Observatory (NVO) New Astronomy Local vs. Distant Universe Rare/exotic objects Census of active galactic nuclei Search extra-solar planets Turn anyone into an astronomer

  8. Ecological Informatics Analyze complex ecological data from a highly-distributed set of field stations, laboratories, research sites, and individual researchers

  9. Geo-Informatics

  10. Cheminformatics Structural Descriptors Physiochemical Descriptors Topological Descriptors Geometrical Descriptors AAACCTCATAGGAAGCATACCAGGAATTACATCA…

  11. Materials Informatics

  12. Massive and distributed datasets: tera-/peta-scale Various modalities: Tables Images Video Audio Text, hyper-text, “semantic” text Networks Spreadsheets Multi-lingual Database Challenges

  13. Bigger Picture: New Science of Data New data models: dynamic, streaming, etc. New mining, learning, and statistical algorithms that offer timely and reliable inference and information extraction: online, approximate Self-aware, intelligent continuous data monitoring and management Data and model compression Data provenance Data security and privacy Data sensation: visual, aural, tactile Knowledge validation: domain experts

  14. DATABASE MANAGEMENT SYSTEMS (DBMS)

  15. File Processing • Data are stored in files with interface between programs and files. • Various access methods exist (e.g., sequential, indexed, random) • One file corresponds to one or several programs. PROGRAM 1 Data Management FILE 1 File System Services PROGRAM 2 Redundant Data Data Management PROGRAM 3 FILE 2 Data Management

  16. Problems With File Systems • Data are highly redundant • sharing limited and at the file level • Data is unstructured • “flat” files • High maintenance costs • data dependence; accessing data is difficult • ensuring data consistency and controlling access to data • Sharing granularity is very coarse • Difficulties in developing new applications

  17. Database Approach USER 1 DBMS Query Processor Integrated Database PROGRAM 1 Transaction Mgr … PROGRAM 2

  18. What is a Database? • A database (DB) is a structured collection of data about the entities that exist in the environment that is being modeled. • The structure of the database is determined by the abstractdata model that is used. • A database management system (DBMS) is the generalized tool that facilitates the management of and access to the database.

  19. Typical DBMS Architecture Naïve users Casual users Application programmers Database administrator Forms Application Front ends DML Interface CLI DDL SQL Commands DBMS Query Evaluation Engine DDL Compiler Transaction & Lock Manager File & Access Methods Buffer Manager Recovery Manager Disk Space Manager Indexes System Catalog Data files

  20. Data Model • Formalism that defines what the structure of the data are • within a file • between files • File systems can at best specify data organization within one file • Alternatives for business/scientific data • Hierarchical; Network • Relational; Object-oriented • Semi-structured • This structure is usually called the schema • A schema can have many instances

  21. Conceptual Data Model • Conceptual models describe what data is to be stored and how. • What are the rules that define integrity? • What are the common processes? • What actions should be taken if integrity is compromised? • Who is allowed to access what information?

  22. Physical Data Model • Physical models describe how data is stored on disk and what access methods are available to it • Which data type is assigned to which attribute ? • What indices are available • Data independenceis the separation of the physical implementation from the conceptual view of the database. • Data independence is enhanced by: • Query optimization • Transaction processing

  23. Data Independence • Invisibility (transparency) of the details of conceptual organization, storage structure and access strategy to the users • Logical • transparency of the conceptual organization • transparency of logical access strategy • Physical • transparency of the physical storage organization • transparency of physical access paths

  24. DBMS Languages • Data Definition Language (DDL) • Defines conceptual schema, external schema, and internal schema, as well as mappings between them • Language used for each level may be different • The definitions and the generated information is stored in system catalog • Data Manipulation Language (DML) • Can be • embedded query language in a host language • “stand-alone” query language • Can be • Procedural: specify where and how (navigational) • Declarative: specify what

  25. Database Functionality • Integrated schema • Users have uniform view of data • They see things only as relations (tables) in the relational model • Declarative integrity and consistency enforcement • 24000  Salary  250000 • No employee can have a salary greater than his/her manager. • User specifies and system enforces. • Individualized views • Restrictions to certain relations • Reorganization of relations for certain users

  26. Database Functionality (cont’d) • Declarative access (relational model) • Query language - SQL • Find the names of all electrical engineers. SELECT ENAME FROM EMP WHERE TITLE = “Elect. Eng.” • Find the names of all employees who have worked on a project as a manager for more than 12 months. SELECT EMP.ENAME FROM EMP,WORKS WHERE RESP = “Manager” AND DUR > 12 AND EMP.ENO = WORKS.ENO • System determined execution • Query processor & optimizer

  27. Database Functionality (cont’d) • Transactions • Execute user requests as atomic units • May contain one query or multiple queries • Provide • Concurrency transparency • Multiple users may access the database, but they each see the database as their own personal data • Concurrency control • Failure transparency • Even when system failure occurs, database consistency is not violated • Logging and recovery

  28. Database Functionality (cont’d) • ACID Transaction Properties • Atomicity • All-or-nothing property • Consistency • Each transaction is correct and does not violate database consistency • Isolation • Concurrent transactions do not interfere with each other • Durability • Once the transaction completes its work (commits), its effects are guaranteed to be reflected in the database regardless of what may occur

  29. Types of DBMS • Relational – data model based on tables • Network – data model based on graphs with records as nodes and relationships between records as edges • Hierarchical – data model based on trees • Semi-structured – XML based on trees/graphs • Object-Oriented – data model based on the object-oriented programming paradigm • Object-Relational – model objects via tables • Distributed – composed of several independent DBMSs running at the nodes of a communications network

  30. Brief History of Databases • 1960s: • Early 1960s: Charles Bachmann developed first DBMS at Honeywell (IDS) • Network model where data relationships are represented as a graph. • Late 1960s: First commercially successful DBMS developed at IBM (IMS) • Hierarchical model where data relationships are represented as a tree • Still in use today (SABRE reservations; Travelocity) • Late 1960s: Conference On DAta Systems Languages (CODASYL) model defined. This is the network model, but more standardized.

  31. Brief History of Databases • 1970s: • 1970: Ted Codd defined the relational data model at IBM San Jose Laboratory (now IBM Almaden) • Two major projects start (both were operational in late 1970s) • INGRES at University of California, Berkeley • Became commercial INGRES, followed-up by POSTGRES which was incorporated into Informix • System R at IBM San Jose Laboratory • Became DB2 • 1976: Peter Chen defined the Entity-Relationship (ER) model

  32. Brief History of Databases • 1980s • Maturation of relational database technology • SQL standardization (mid-to-late 1980s) through ISO • The real growth period • 1990s • Continued expansion of relational technology and improvement of performance • Distribution becomes a reality • New data models: object-oriented, deductive • Late 1990s: incorporation of object-orientation in relational DBMSs  Object-Relational DBMS • New application areas: Data warehousing and OLAP, Web and Internet, interest in text and multimedia

More Related