1 / 51

IT-390: Survey of Database Systems Lecture 1

IT-390: Survey of Database Systems Lecture 1. Yosef Mendelsohn College of CDM, DePaul University. My thanks for Prof. Alexander Rasin for the original version of these notes. Topics. Administrative details Review of DBMS fundamentals Review of basic SQL Review of Normalization.

quirion
Download Presentation

IT-390: Survey of Database Systems Lecture 1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. IT-390: Survey of Database SystemsLecture 1 Yosef Mendelsohn College of CDM, DePaul University My thanks for Prof. Alexander Rasin for the original version of these notes.

  2. Topics • Administrative details • Review of DBMS fundamentals • Review of basic SQL • Review of Normalization

  3. Course Administration • See Syllabus • See course web page at: • http://condor.depaul.edu/ymendels/390

  4. Important Details • Many common questions are answered in the syllabus. • For this reason, please read it and check it out when you have a question. • No electronic devices (unless taking notes) • Be sure to read DePaul’s Academic Integrity Policy

  5. What is a database? • Data is information that is/needs to be recorded • A database is an organized collection of logically related data • Persistent: is stored on a stable medium • Shared: has multiple uses and interested users • Interrelated: forms a bigger picture

  6. Database Management System • A database management system (DBMS) is a collection of software components that lets you accomplish the following database functionality: • create (define, construct) • maintain (update, modify, enforce constraints in) • control access to (secure, allow concurrency in) • other When you see a term in bold, that means it's a term with which I would like you to be familiar and comfortable using.

  7. DBMS • DBMS Examples: Oracle, IBM DB2, MS Access/SQL Server, MySQL, postgreSQL • We can work with a DBMS directly or through an application that supplies a particular interface (which could be very general or quite restrictive) • Direct Access: Oracle, SQLite • Via an application: CampusConnect • The database and DBMS together make up a database system

  8. Why would you use a database? • Early data processing systems used files of data in text form • Problem: program-data dependence led to • limited data sharing • duplication of data • excessive time for development and maintenance

  9. Benefits of Database Systems • A database system uses a single repository of data accessed by multiple users • Contains information on the structure of the data • Supports different views of the data • Allows sharing of and concurrent access to data • The costs are higher overhead for the design, implementation, and maintenance of the data • Why not store data in a TXT file?

  10. Retrieval Problems • Data retrieval: • Find the students (names) who took CSC455 • Find the students with GPA > 3.5 For every query we would need to write a program! • Want database retrieval to be: • Easy to write • Executed efficiently

  11. Data retrieval • Query = declarative data retrieval • describes what data to retrieve, not how to retrieve it • Compare the query vs coding examples: • Query: Give me the students with GPA > 3.5 • Coding: Scan the student file and retrieve the records with GPA > 3.5 • The query version • Easier to write • More efficient to execute

  12. Data Integrity • Data Integrity in flat file (file system) model • Poor support for sharing: • Simultaneous modifications (transaction control) • Poor coping mechanisms for system crashes • No means of Preventing Data Entry Errors (checks must be hard-coded in the programs) • Securityproblems • Database systems offer solutions to all the above problems

  13. The Relational Model • First introduced by Edgar F. Codd in 1970 • A database is made up of two-dimensional tables called relations, each representing some type of entity recorded in the database • Each relation has a name, describing what entity it represents

  14. Relational Model • First model to separate the logical structure of the database from its physical structure • Data are divided into two-dimensional tables called relations • Relationships between tables are given by shared keys • Rules exist for dividing data among tables • A standardized query language (SQL) for interacting with the relations

  15. Relational Model Example Attributes Customer- street Customer- city Account- number Customer- name Customer-id Johnson Smith Johnson Jones Smith 192-83-332 019-28-553 192-83-290 321-12-312 019-28-321 Alma North Alma Main North A-101 A-215 A-201 A-217 A-201 Palo Alto Rye Palo Alto Harrison Rye Records Schema = Customer (Customer-id, customer-name, customer-city, account-number) Key This whole table is an instance of the Customer schema.

  16. The Relational Model • Each row of a relation is a tuple(or record), representing one instance of that entity • Each column of a relation is an attribute for which each tuple has a value assigned • Each attribute has a domain from which its values are taken • A domain consists of atomic values – no multi-valued or divisible attributes are allowed • Missing/unknown values are indicated by NULL • The 'null' value is going to be important

  17. Typical Domains • Numerical values • integers, fixed-point values, floating-point values • Character strings • fixed-length or variable-length • Dates and times • We can also enforce constraints on values • ranges, upper/lower bounds, allowed/forbidden values, no NULLs, et cetera

  18. Properties of a Relation • Each relation has a unique name (in database) • Each attribute has a unique name (in relation) • Each entry of a relation contains a single value from its attribute’s domain • The order of the records does not matter • The order of the attributes does not matter • No two records in a relation are completely identical

  19. Relation Schema • A relation schema of consists of the name of a relation followed by a list of its attributes • Typically, we write out a schema by indicating the title, followed by the attributes in parentheses. The primary key attribute is/are underlined. • eg: STUDENT(StudentID, FirstName, LastName) • Schemas can also be represented graphically as a row of rectangles, one for each attribute • When creating the relation, the domain must be specified for each attribute • e.g., ID is a seven-digit number, FirstName and LastName are both strings of length at most 20

  20. Schema vs. Snapshot • The schemas that make up a database describe the structure of the DB, but do not tell us what is in it • Creating the schemas is defining the database • The information (data) stored in the database at some point in time make up a snapshot (or databasestate) • Inputting data into the DB is known as populatingthe database

  21. Relation • A relation for a given schema is a set of tuples, where each tuple contains a value for every attribute. That value may be either: • an element from the attribute's domain • the value NULL • eg: Suppose we have the following schema for a relation called 'FACULTY': • FACULTY(FacID, FirstName, LastName) • We might populate with: • { (1234567, ‘Eric’, ‘Schwabe’), (9999999, ‘Amber’, ‘Settle’), (33334444,'Alex','Rasin') } Relations are frequently referred to as 'tables'. This is fine, but strictly speaking, some tables do not qualify as relations.

  22. NULL • For some attributes, NULL can be used in place of a value from the domain • Most attributes allow NULL values unless they are a primary key, or have a 'NOT NULL' constraint • NULL can have several meanings: • The value is unknown • A value exists, but is not currently available • The attribute does not apply to that tuple • Preferable to avoid NULLs when possible

  23. Primary Keys • One or more attributes in the relation may be defined to be the primary key. • The primary key must be a minimal set of attributes that uniquely identifies each tuple in the relation • NULL values are not allowed in the primary key of a relation (“entity integrity”) • The primary key is underlined in the schema Examples: COURSE(CourseID, CourseName, CreditHours, Department) ALL_GRADES(StudentID, CourseID, Grade)

  24. Foreign Keys • It is possible to establish a connection between two relations by using a shared key. This key must be the primary key in one of the two relations. • In the referencing relation, this key is called a foreign key • In drawing the schemas, we draw an arrow from the foreign key in one relation, to the primary key in the other • The foreign key associates each tuple in the referencing relation with exactly one tuple in the referenced relation

  25. Foreign key: Referential Integrity • Every foreign key in a tuple must have a value that matches a primary key in the referenced relation • This restricts the changes that can be made: • Can only insert or update a tuple if the value of any foreign key present in the tuplealso appears as a primary key in the relation that it references • Can only delete or update a tuple if the value of its primary key does not appear among the values of any of the foreign keys that reference it Tip: The key word here is 'referential'. I.E. Referential integrity refers to the fact that we are discussing two tables that reference one another.

  26. Constraints • Sometimes we wish to limit the possible values that can be entered into an attribute. • Example: In a 'Cost' attribute, we might want to indicate that the value must be >0. • Some constraints are required by the DBMS • Example: Primary Keys may not ever have a value of NULL • Constraints are maintained by the DBMS: • Domain constraints: All values of each attribute must come from the specified domain • Entity integrity: No attribute in a primary key can contain a NULL value • Referential integrity: Every foreign key value must match the value of the primary key in some tuple of the table that it references • Custom constraints • E.g. account balance must be positive

  27. Handling Violations • When an operations violates a constraint, the default behavior for the DBMS is to disallow the operation. • However, there are other options: • Prompt user to change offending value(s) • Use default value in place of offending value(s) • Use NULL in place of offending value(s) • Allow operation, but cascade changes from primary keys to foreign keys as needed

  28. Summary: Relational Database Schema • Collection of relation schemas, each including: • Name of relation • Attributes listed in parentheses (or in rectangles) • Primary key is underlined (solid) • Arrows from foreign keys to linked primary keys

  29. SQL Structured Query Language (SQL) is the industry standard for relational databases SQL enables us to create, maintain, update, and query databases Used to be known as SEQUEL (Structured English Query Language), developed at IBM … but trademarked by Hawker Siddeley, a UK aircraft company So it is typically pronounced by the letters, as opposed to 'sequel'.

  30. Classes of SQL Commands Data Definition Language (DDL) Create schemas, tables, constraints, views Data Manipulation Language (DML) Modify and update tables, retrieve information Data Control Language (DCL) Grant and revoke access to parts of database Most people who interact with a DB only have access to the DML This is because most users are more interested in querying and perhaps updating the DB, than in creating DBs and relations.

  31. SQL All major DBMSs support some version of SQL (SQL-99 is the one you are likely to see) SQL statements can be issued interactively batched in script files embedded in a program in general-purpose programming language (e.g., Java, C++) We will experiment with all of these in the course

  32. Running SQL Scripts A script is sequence of SQL commands stored in a file Create in a text editor, or an editor window provided by the DBMS Typically saved with a '.sql' extension Each DBMS has its own way to allow you to run these scripts

  33. Working with SQL Scripts Create file scriptname.sqlin Notepad Be sure to use .sql extension To add comments: -- for a single-line comment /* ... */ for a multi-line comment Can incorporate a large number of SQL commands in one batch operation

  34. Creating a Table CREATE TABLE TABLE_NAME(Attribute1 DOMAIN_1, Attribute2 DOMAIN_2, ... AttributeN DOMAIN_N); You can list as many attributes and domains as you want, separated by commas Recall that the domain refers to the allowable data type of the attribute. Eg: Text, Numbers, Dates, etc

  35. SQL Domains Numerical domains: NUMBER: A general floating-point number NUMBER(x, y): A fixed-precision number with x total digits, and y digits to the right of the decimal point NUMBER(*, 0): A general integer [* refers to the max digits which is approx. 38] E.g. for up to 100, you would say NUMBER(4,2) E.g. if you said (2,2) you’d be limited to 0.99 NUMBER(n): An n-digit integer NOTE: Domains often have subtle variations between different DBMSs. The examples in these slides are for Oracle, and may not work for certain others. We will explore some of these differences in this course.

  36. SQL Domains String domains: CHAR(n): A fixed-length string of exactly n characters VARCHAR2(m) or VARCHAR(m): A variable-length string of up to m characters Note: varchar2 is specific to Oracle Dates: DATE: A date in dd-mon-yy format (dd = day, mon = month, yy = year) EG: [ 30,sep,14 ]  note 3 letters for month

  37. Table Definition Example CREATE TABLE Student ( sidNUMBER(7), name VARCHAR2(16), login VARCHAR2(32), age NUMBER(3), gpaNUMBER(3,2) ); CREATE TABLE Enrolled ( sidNUMBER(7), cidVARCHAR2(32), grade CHAR(1) ); Integer Range Variable String Length Fixed String Length

  38. Defaults and Constraints Placed after the attribute and domain, but before the comma: Can add default value for the attribute with DEFAULT value Can disallow NULL values with NOT NULL Can impose general constraints with CHECK (condition) e.g., to require that attribute is within a range, add CHECK (value1 <= Attribute AND Attribute <= value2) The DBMS will check / apply constraints whenever a tuple is added / modified

  39. Creating Primary/Foreign Keys Set up primary and foreign keys using CONSTRAINT clauses within CREATE, after all attributes have been specified CONSTRAINT PKNamePRIMARY KEY (Att1, Att2, ...) CONSTRAINT FKNameFOREIGN KEY (Att1, Att2, …) REFERENCES TABLE_NAME(Att1, Att2, ...) Separate all clauses with commas

  40. Table Definition Example CREATE TABLE Student ( sidNUMBER(7), name VARCHAR2(16), login VARCHAR2(32), age NUMBER(3), gpaNUMBER(3,2), CONSTRAINT StudentPK PRIMARY KEY (sid) ); Primary Key

  41. Table Definition Example CREATE TABLE Enrolled ( sidNUMBER(7), cidVARCHAR2(32), grade CHAR(1), CONSTRAINT EnrolledFK1 FOREIGN KEY (sid) REFERENCES STUDENT(sid); ); Foreign Key

  42. ALTER TABLE • Use 'ALTER TABLE' if you wish to make modifications to an existing table's schema ALTER TABLE TABLE_NAME{ADD (Attribute DOMAIN)} {DROP COLUMN Attribute CASCADE}

  43. UPDATE UPDATE TABLE_NAMESET Attribute = valueWHERE condition; • Sets Attribute to value in exactly those tuples that satisfy condition

  44. DELETE DELETE FROM TABLE_NAMEWHERE condition; • Removes from the table exactly those tuples that satisfy condition

  45. Handling Constraint Violations • Operations that violate domain constraints, entity integrity, or referential integrity are rejected by default • SQL can specify alternatives for foreign and primary keys ON UPDATE and ON DELETE • CASCADE, SET NULL, SET DEFAULT • Example of a DBMS-specific requirement: Oracle only supports ON DELETE, if you include either CASCADE or SET NULL…

  46. Populating a Table To insert a record into a table: INSERT INTO TABLE_NAMEVALUES (value1, value2, value3, ... ); Values of attributes must be given in the same order as in the schema Will generate an error if any constraints are violated (domain constraints, entity integrity, referential integrity, user-defined constraints)

  47. Populating a Table (continued) To insert a record that specifies only some of the attributes: INSERT INTO TABLE_NAME(Attr1, Attr2,...) VALUES (value1, value2, ... ); Missing attributes will be filled in with NULL Unless default values were specified when the table was created

  48. Removing Tables To remove a table: DROP TABLE TABLE_NAME ; There are also ways to remove or modify individual records within tables… TIP: When you are practicing / experimenting with creating tables, it's a good idea to include this command at the top. Otherwise, when you try to create a table that already exists, you'll get an error.

  49. Displaying Table Contents SELECT * FROM TABLE_NAME ; • This will display the entire contents of the table TABLE_NAME (all rows and columns) • Where the results are displayed varies depending on the DBMS • This is an example of a very simple query • Addition of a WHERE clause lets us display only a subset of the records

More Related