1 / 61

Databases

Databases. Anirban Banerjee Anirban@cs.ucr.edu University of California, Riverside. What is this talk about?. Idea about Databases What are Databases How to use it? Implementation issues!. Roadmap. Databases Introduction (9 AM -10:15 AM) Break : 10-15 - 10:30

sallison
Download Presentation

Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Databases Anirban Banerjee Anirban@cs.ucr.edu University of California, Riverside

  2. What is this talk about? • Idea about Databases • What are Databases • How to use it? • Implementation issues!

  3. Roadmap • Databases • Introduction (9 AM -10:15 AM) • Break: 10-15 - 10:30 • Advanced Concepts (10:30 AM - 12 PM) • Break: 12 - 1 • Practical Aspects (1 PM - 2 PM) • Break: 2 – 2:15 • Group Activity (2:15 PM - 3 PM)

  4. Databases • Introduction: • Definitions • Elements • Basic database concepts

  5. Databases • Definitions • DBMS • A very large, integrated collection of data. • Models real-world enterprise. • Entities (e.g., students, courses) • Relationships (e.g., Madonna is taking CS564) • A Database Management System (DBMS)is a software package designed to store and manage databases.

  6. Databases • Examples • Student Management System • Airline Ticket Management System • Inventory Management System • Public Transport Management System

  7. Databases • Why use a dbms? • Data independence and efficient access. • Reduced application development time. • Data integrity and security. • Uniform data administration. • Concurrent access, recovery from crashes.

  8. Databases • Data independence • Change in one piece of data should not effect the whole system • Insulate the rest of the system • Efficient access • Fast • Use less machine resources

  9. Databases • Reduced application development time • Programmers can concentrate on making applications which use the database • Standard methods used to access the database • Minimum time spent on database per se

  10. Databases • Data integrity and security • Integrity: user cannot enter invalid value • Cannot enter negative number for age • Easy to check values as they are being entered • If not present, one wrong value can crash the whole query (e.g. division by zero) • Security: everyone cannot have access to all types of data • Information can be very sensitive/personal

  11. Databases • Uniform data administration • All data will be entered using some common rules • No data will have special characters like #,@,$ etc.. • Makes it easy to implement system wide policies

  12. Databases • Concurrent access • Large databases have many people making entries at the same time • Airline reservation system • Must handle thousands of requests in a short amount of time • Must “lock” and “serialize” requests

  13. Databases • Error recovery • What happens when a Hard disk fails • Have you lost the data • Back-ups are necessary • Automated process • Partial images • Log files can help

  14. Databases • Definitions • Data model: is a collection of concepts for describing data • Schema : description of a particular collection of data, using the a given data model *** • Table: A collection of Rows and columns storing data • Field: A single column in a table describing a feature of the data

  15. Databases • Data Model • The structural part: a collection of data structures used to create databases representing the entities or objects. • The integrity part: a collection of rules governing the constraints. • The manipulation part: a collection of operators which can be applied to the data structures.

  16. Databases • Schema • map of concepts and their relationships. • Each PERSON may be the vendor in one or more ORDERS. • Each ORDER must be from one and only one PERSON.

  17. Databases • Table • A format for storing data. • Like a diary*** • You have the choice of entering some specific information • Some predefined features have been chosen for you • Name, Address, Date, Phone #

  18. Databases • Field • A feature in a table • Name • Age • Address • Important to select the correct fields for efficiency

  19. A typical DBMS has a layered architecture. Remember the cake! Network layers Why are they important? Databases

  20. Easy to make changes to layers. The figure does not show the concurrency control and recovery components. This is one of several possible architectures; each system has its own variations. Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB Databases

  21. Databases • Definitions • Key • A Field using which a search will be conducted in the DB • Primary key (unique: SSID) • Secondary key (Hair Color) • Foreign key (unique: SSID from another table) • Uniquely identify an entry in a table

  22. Databases • Definitions • Query • A question to which the DB must provide an answer • Names of all people who bought a red car in the last month • SELECT customer_name FROM CAR-TABLE WHERE car_color = ‘red’;

  23. Databases • Definitions • example CREATE TABLE Dept_Mgr( did INTEGER, dname CHAR(20), budget REAL, ssn CHAR(11), since DATE, PRIMARY KEY (did), FOREIGN KEY (ssn) REFERENCES Employees)

  24. Databases • Definitions • example CREATE TABLE Manages( ssn CHAR(11), did INTEGER, since DATE, PRIMARY KEY (did), FOREIGN KEY (ssn) REFERENCES Employees, FOREIGN KEY (did) REFERENCES Departments)

  25. Roadmap • Databases • Introduction (9 AM -10:15 AM) • Break: 10-15 - 10:30 • Advanced Concepts (10:30 AM - 12 PM) • Break: 12 - 1 • Practical Aspects (1 PM - 2 PM) • Break: 2 – 2:15 • Group Activity (2:15 PM - 3 PM)

  26. Advanced Concepts • Definitions • Constraint • A rule to make data conform to a specification • No duplicates, no NULL values • E.g. Age can’t be –ve. • Bonus = # of years worked/age * salary • If age < 0 , system can crash

  27. Entity Relationship:Designing a DB*** • Definitions • Entity-Relationship model • Entity:Real-world object distinguishable from other objects. • Car • Building • Human • Computer • Cake • Shirt

  28. Advanced Concepts • Definitions • Entity-Relationship model • An entity is described (in DB) using a set of attributes • Car: color, model, engine-type, number-of-doors • Building: Number-of-floors, address • Human: Name, age, hair-color, height

  29. Advanced Concepts • Definitions • Entity Set: A collection of similar entities. E.g., all employees. • All entities in an entity set have the same set of attributes (usually). • Each entity set has a key. • The set of machines in a company • Attributes: Model, color, serial #, type, cost, repair-date • Key: serial #

  30. Advanced Concepts • Definitions • Entity-Relationship model • Relationship:Association among two or more entities. • Anirban works in Pharmacy department. • Sharon studies at UCR • Bill eats at McDonalds

  31. Advanced Concepts • Definitions • Entity-Relationship model • Relationship Set:Collection of similar relationships. • Same entity set could participate in different relationship sets, or in different “roles” in same set. • Entity 1 <works at> Entity 2

  32. since name dname ssn budget lot did Works_In Employees Departments Advanced Concepts

  33. Advanced Concepts • Elements • Access Modules • Why is this module important? • Check who has access • Grant access to correct data

  34. Advanced Concepts • Elements • Concurrency • Manage multiple users at same time • Keep log files to track changes • Possibility for Rollback

  35. Accessing the DB*** • Elements • Queries • Can be specified in SQL language • Can be used for complex analysis • Can be pre-computed

  36. Advanced Concepts • Elements • Forms • Interface for a query • Easily integrated with web-pages SELECT [DISTINCT] target-list FROMrelation-list WHERE qualification

  37. Advanced Concepts • Elements • Network Modules • Manages connections over the network to other databases • Handles connections from remote users

  38. Advanced Concepts • Elements • Reports • Efficient way to represent output of queries • Various formats (Crystal-reports) • Easy to read and understand

  39. Advanced Concepts • How things Interact • Consistency • Locking mechanism to prevent data corruption • 2 phase locking (strict and non-strict)

  40. Advanced Concepts • How things Interact • 2 Phase locking • Phase 1 : acquire locks • Phase 2 : release locks A Resource 1 Resource 2 Resource 3

  41. Advanced Concepts • How things Interact • Concurrency • Multi-threaded engine • Must keep track of who is modifying which part • Rollback support for each user necessary

  42. Advanced Concepts • Distributed databases • Data is not stored at one location • Data distribution must be transparent • Transactions must be transparent

  43. Advanced Concepts • Distributed databases • Distributed db is different from replicated db • Schema, access issues can be very complicated • Can cost more, $$$

  44. Advanced Concepts • Pop Quiz • What is a distributed db? • What are its benefits? • What are the downsides?

  45. Roadmap • Databases • Introduction (9 AM -10:15 AM) • Break: 10-15 - 10:30 • Advanced Concepts (10:30 AM - 12 PM) • Break: 12 - 1 • Practical Aspects (1 PM - 2 PM) • Break: 2 – 2:15 • Group Activity (2:15 PM - 3 PM)

  46. Practical Aspects • Performance Issues • IMPORTANT: think about a good DB design • Remember Schema • Think about appropriate tables • Think about appropriate fields • Think about appropriate keys

  47. Practical Aspects • Performance Issues • Design the application correctly (fewer queries) • Decide how the application is going to access the data • Which part of the data is going to be used • How frequently is each part going to be used • Use standard efficient interfaces

  48. Practical Aspects • Performance Issues • Create indexes*** for frequently used columns • Optimize queries (select * is bad) • Predict what kind of queries will be asked • Use indexes, keys and constraints • Design db such that 90% of queries answered really fast

  49. Practical Aspects • Performance Issues • Network latency • Simply having a good design is not enough • You need network bandwidth • Remember the types of connections, Aug-4th • Test your system properly • You may want to set up a part of the db near where most queries originate

  50. Practical Aspects • Performance Issues • Hardware • Simply having a good design and bandwidth is not enough • You need fast hardware • Which Hard Drives • 7200 RPM, 10k+ RPM • SCSi, SATA, SATA II, PATA • How big 500 GB, 1 TB, 1TB+ • Solid State Drives, SSD

More Related