610 likes | 624 Views
Databases. Anirban Banerjee Anirban@cs.ucr.edu University of California, Riverside. What is this talk about?. Idea about Databases What are Databases How to use it? Implementation issues!. Roadmap. Databases Introduction (9 AM -10:15 AM) Break : 10-15 - 10:30
E N D
Databases Anirban Banerjee Anirban@cs.ucr.edu University of California, Riverside
What is this talk about? • Idea about Databases • What are Databases • How to use it? • Implementation issues!
Roadmap • Databases • Introduction (9 AM -10:15 AM) • Break: 10-15 - 10:30 • Advanced Concepts (10:30 AM - 12 PM) • Break: 12 - 1 • Practical Aspects (1 PM - 2 PM) • Break: 2 – 2:15 • Group Activity (2:15 PM - 3 PM)
Databases • Introduction: • Definitions • Elements • Basic database concepts
Databases • Definitions • DBMS • A very large, integrated collection of data. • Models real-world enterprise. • Entities (e.g., students, courses) • Relationships (e.g., Madonna is taking CS564) • A Database Management System (DBMS)is a software package designed to store and manage databases.
Databases • Examples • Student Management System • Airline Ticket Management System • Inventory Management System • Public Transport Management System
Databases • Why use a dbms? • Data independence and efficient access. • Reduced application development time. • Data integrity and security. • Uniform data administration. • Concurrent access, recovery from crashes.
Databases • Data independence • Change in one piece of data should not effect the whole system • Insulate the rest of the system • Efficient access • Fast • Use less machine resources
Databases • Reduced application development time • Programmers can concentrate on making applications which use the database • Standard methods used to access the database • Minimum time spent on database per se
Databases • Data integrity and security • Integrity: user cannot enter invalid value • Cannot enter negative number for age • Easy to check values as they are being entered • If not present, one wrong value can crash the whole query (e.g. division by zero) • Security: everyone cannot have access to all types of data • Information can be very sensitive/personal
Databases • Uniform data administration • All data will be entered using some common rules • No data will have special characters like #,@,$ etc.. • Makes it easy to implement system wide policies
Databases • Concurrent access • Large databases have many people making entries at the same time • Airline reservation system • Must handle thousands of requests in a short amount of time • Must “lock” and “serialize” requests
Databases • Error recovery • What happens when a Hard disk fails • Have you lost the data • Back-ups are necessary • Automated process • Partial images • Log files can help
Databases • Definitions • Data model: is a collection of concepts for describing data • Schema : description of a particular collection of data, using the a given data model *** • Table: A collection of Rows and columns storing data • Field: A single column in a table describing a feature of the data
Databases • Data Model • The structural part: a collection of data structures used to create databases representing the entities or objects. • The integrity part: a collection of rules governing the constraints. • The manipulation part: a collection of operators which can be applied to the data structures.
Databases • Schema • map of concepts and their relationships. • Each PERSON may be the vendor in one or more ORDERS. • Each ORDER must be from one and only one PERSON.
Databases • Table • A format for storing data. • Like a diary*** • You have the choice of entering some specific information • Some predefined features have been chosen for you • Name, Address, Date, Phone #
Databases • Field • A feature in a table • Name • Age • Address • Important to select the correct fields for efficiency
A typical DBMS has a layered architecture. Remember the cake! Network layers Why are they important? Databases
Easy to make changes to layers. The figure does not show the concurrency control and recovery components. This is one of several possible architectures; each system has its own variations. Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB Databases
Databases • Definitions • Key • A Field using which a search will be conducted in the DB • Primary key (unique: SSID) • Secondary key (Hair Color) • Foreign key (unique: SSID from another table) • Uniquely identify an entry in a table
Databases • Definitions • Query • A question to which the DB must provide an answer • Names of all people who bought a red car in the last month • SELECT customer_name FROM CAR-TABLE WHERE car_color = ‘red’;
Databases • Definitions • example CREATE TABLE Dept_Mgr( did INTEGER, dname CHAR(20), budget REAL, ssn CHAR(11), since DATE, PRIMARY KEY (did), FOREIGN KEY (ssn) REFERENCES Employees)
Databases • Definitions • example CREATE TABLE Manages( ssn CHAR(11), did INTEGER, since DATE, PRIMARY KEY (did), FOREIGN KEY (ssn) REFERENCES Employees, FOREIGN KEY (did) REFERENCES Departments)
Roadmap • Databases • Introduction (9 AM -10:15 AM) • Break: 10-15 - 10:30 • Advanced Concepts (10:30 AM - 12 PM) • Break: 12 - 1 • Practical Aspects (1 PM - 2 PM) • Break: 2 – 2:15 • Group Activity (2:15 PM - 3 PM)
Advanced Concepts • Definitions • Constraint • A rule to make data conform to a specification • No duplicates, no NULL values • E.g. Age can’t be –ve. • Bonus = # of years worked/age * salary • If age < 0 , system can crash
Entity Relationship:Designing a DB*** • Definitions • Entity-Relationship model • Entity:Real-world object distinguishable from other objects. • Car • Building • Human • Computer • Cake • Shirt
Advanced Concepts • Definitions • Entity-Relationship model • An entity is described (in DB) using a set of attributes • Car: color, model, engine-type, number-of-doors • Building: Number-of-floors, address • Human: Name, age, hair-color, height
Advanced Concepts • Definitions • Entity Set: A collection of similar entities. E.g., all employees. • All entities in an entity set have the same set of attributes (usually). • Each entity set has a key. • The set of machines in a company • Attributes: Model, color, serial #, type, cost, repair-date • Key: serial #
Advanced Concepts • Definitions • Entity-Relationship model • Relationship:Association among two or more entities. • Anirban works in Pharmacy department. • Sharon studies at UCR • Bill eats at McDonalds
Advanced Concepts • Definitions • Entity-Relationship model • Relationship Set:Collection of similar relationships. • Same entity set could participate in different relationship sets, or in different “roles” in same set. • Entity 1 <works at> Entity 2
since name dname ssn budget lot did Works_In Employees Departments Advanced Concepts
Advanced Concepts • Elements • Access Modules • Why is this module important? • Check who has access • Grant access to correct data
Advanced Concepts • Elements • Concurrency • Manage multiple users at same time • Keep log files to track changes • Possibility for Rollback
Accessing the DB*** • Elements • Queries • Can be specified in SQL language • Can be used for complex analysis • Can be pre-computed
Advanced Concepts • Elements • Forms • Interface for a query • Easily integrated with web-pages SELECT [DISTINCT] target-list FROMrelation-list WHERE qualification
Advanced Concepts • Elements • Network Modules • Manages connections over the network to other databases • Handles connections from remote users
Advanced Concepts • Elements • Reports • Efficient way to represent output of queries • Various formats (Crystal-reports) • Easy to read and understand
Advanced Concepts • How things Interact • Consistency • Locking mechanism to prevent data corruption • 2 phase locking (strict and non-strict)
Advanced Concepts • How things Interact • 2 Phase locking • Phase 1 : acquire locks • Phase 2 : release locks A Resource 1 Resource 2 Resource 3
Advanced Concepts • How things Interact • Concurrency • Multi-threaded engine • Must keep track of who is modifying which part • Rollback support for each user necessary
Advanced Concepts • Distributed databases • Data is not stored at one location • Data distribution must be transparent • Transactions must be transparent
Advanced Concepts • Distributed databases • Distributed db is different from replicated db • Schema, access issues can be very complicated • Can cost more, $$$
Advanced Concepts • Pop Quiz • What is a distributed db? • What are its benefits? • What are the downsides?
Roadmap • Databases • Introduction (9 AM -10:15 AM) • Break: 10-15 - 10:30 • Advanced Concepts (10:30 AM - 12 PM) • Break: 12 - 1 • Practical Aspects (1 PM - 2 PM) • Break: 2 – 2:15 • Group Activity (2:15 PM - 3 PM)
Practical Aspects • Performance Issues • IMPORTANT: think about a good DB design • Remember Schema • Think about appropriate tables • Think about appropriate fields • Think about appropriate keys
Practical Aspects • Performance Issues • Design the application correctly (fewer queries) • Decide how the application is going to access the data • Which part of the data is going to be used • How frequently is each part going to be used • Use standard efficient interfaces
Practical Aspects • Performance Issues • Create indexes*** for frequently used columns • Optimize queries (select * is bad) • Predict what kind of queries will be asked • Use indexes, keys and constraints • Design db such that 90% of queries answered really fast
Practical Aspects • Performance Issues • Network latency • Simply having a good design is not enough • You need network bandwidth • Remember the types of connections, Aug-4th • Test your system properly • You may want to set up a part of the db near where most queries originate
Practical Aspects • Performance Issues • Hardware • Simply having a good design and bandwidth is not enough • You need fast hardware • Which Hard Drives • 7200 RPM, 10k+ RPM • SCSi, SATA, SATA II, PATA • How big 500 GB, 1 TB, 1TB+ • Solid State Drives, SSD