CMPUT 291 File and Database Management Systems
Objectives • File processing • What are files? • How are file systems organized? • What is the functionality of file systems? • Database Management • What is database management? • How does it differ from file systems? • What is the basic functionality that they offer? • Data Modeling • How does one use a database management system? • What is the data modeling process? • What are the basic techniques?
Course Documents • Available as a single package at the bookstore: • J. D. Ullman and J. Widom. A First Course in Database Systems. Prentice-Hall, 1997. • H. Garcia-Molina, J. D. Ullman and J. Widom. Database System Implementation. Prentice-Hall, 1999. • Lab manual • Available on-line • Lecture notes are available on-line, accessible from the CMPUT 291 home pages: • http://ugweb.cs.ualberta.ca/~c291/B1 • http://ugweb.cs.ualberta.ca/~c291/B2
Administrivia • Office Hours • B1 (Prof. Nascimento): WF 15:00 - 16:00 in GSB 773 • B2 (Prof. Özsu): TR 14:00 - 15:00 in GSB 779 • Also by appointment • Grading • Assignments 20% • Project 25% • Midterm 25% • Final 30% • Announcements • In class; material will also be available electronically • Re-examination • None • Collaboration • Collaborate on assignments, but do not merely copy. • Newsgroup • ualberta.cs.c291 (make sure you check this regularly)
Laboratories • Oracle DBA • Shauna Grabinsky, shauna@cs.ualberta.ca • Do not contact her as your first source • There are four TAs • Shu Lin, shulin@cs.ualberta.ca • Vishal Chitkara, chitkara@cs.ualberta.ca • Bin Yao, yao@cs.ualberta.ca • Peng Wang, peng@cs.ualberta.ca
What is “Data”? • ANSI definition: • Data • A representation of facts, concepts, or instructions in a formalized manner suitable for communication, interpretation, or processing by humans or by automatic means. • Any representation such as characters or analog quantities to which meaning is or might be assigned. Generally, we perform operations on data or data items to supply some information about an entity. • Volatile vs. non-volatile data • Our concern is primarily with non-volatile data
Manual Data Management • Data are not stored • Programmer defines both logical data structure and physical structure (storage structure, access methods, I/O modes, etc.) • One data set per program. High data redundancy. • [Figure: Programs 1, 2, and 3, each with its own data management code and its own data set (Data Set 1, 2, and 3).]
Problems • There is no persistence. • All data are transient and disappear when the program terminates. • Random access memory (RAM) is expensive and limited • Not all data may fit in available memory • Programmer productivity is low • The programmer has to do a lot of tedious work.
File Processing • Data are stored in files, with an interface between programs and files. • Various access methods exist (e.g., sequential, indexed, random) • One file corresponds to one or several programs. • [Figure: Programs 1, 2, and 3, each with its own data management code, access File 1 and File 2 through file system services; the files hold redundant data.]
File System Functions • Mapping between logical files and physical files • Logical files: a file as viewed by users and programs. • Data may be viewed as a collection of bytes or as a collection of records (collections of bytes with a particular structure) • Programs manipulate logical files • Physical files: a file as it actually exists on a storage device. • Data usually viewed as a collection of bytes located at a physical address on the device • Operating systems manipulate physical files. • A set of services and an interface (usually called an application programming interface, or API)
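The two logical views (bytes vs. records) can be illustrated with a minimal Python sketch. The record layout (a 4-byte integer id followed by a 12-byte name field) and the helper names `pack_records` and `unpack_records` are assumptions made for illustration; they do not come from the slides.

```python
import struct

# Hypothetical record layout: 4-byte little-endian int id + 12-byte name field.
RECORD = struct.Struct("<i12s")

def pack_records(rows):
    """Turn the logical record view into the flat byte view."""
    # struct pads short names with NUL bytes up to 12 bytes.
    return b"".join(RECORD.pack(rid, name.encode("ascii")) for rid, name in rows)

def unpack_records(data):
    """Turn the flat byte view back into the logical record view."""
    rows = []
    for rid, raw in RECORD.iter_unpack(data):
        rows.append((rid, raw.rstrip(b"\x00").decode("ascii")))
    return rows

raw = pack_records([(1, "Ada"), (2, "Grace")])
print(len(raw))             # 32 (two 16-byte records seen as plain bytes)
print(unpack_records(raw))  # the same data seen as records
```

The same 32 bytes support both views; only the program's interpretation changes, which is why the slide places the record structure on the logical side.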
File System Services • Mapping a logical file to a physical file: assign(logical_file, ‘physical_file’) • Opening a file: file_desc=open(logical_file, flags, [protect]) • flags indicate the mode in which the file is to be opened • e.g.: create, read only, write only, read/write, append • protect is the file protection code in case of create • Closing a file: close(file_desc)
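As a rough illustration of open and close, here is a minimal sketch using Python's `os` module, whose `os.open(path, flags, mode)` call resembles the slide's `open(logical_file, flags, [protect])`. The helper name `create_and_close`, the file name, and the protection code `0o644` are assumptions; Python has no separate `assign` step, so the path argument plays that role here.

```python
import os
import tempfile

def create_and_close(path):
    """Create a file (if missing), open it write-only, then close it."""
    # flags select the mode: create if missing, open for writing only.
    flags = os.O_CREAT | os.O_WRONLY
    # The third argument is the protection code; it only matters on creation.
    fd = os.open(path, flags, 0o644)
    os.close(fd)
    return fd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "students.dat")  # hypothetical file name
    fd = create_and_close(path)
    print(isinstance(fd, int), os.path.exists(path))
```

The returned descriptor is a small integer handle; after `os.close` it must not be used again.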
File System Services • Reading from a file: read(source_file, destination_addr, size) • source_file is the file descriptor obtained by opening the file • destination_addr is the memory address where data will be read into • Writing to a file: write(destination_file, source_addr, size)
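The read and write services can be sketched with `os.write` and `os.read`. This is only an approximation of the slide's interface: Python passes and returns `bytes` objects instead of raw memory addresses, and the helper name `write_then_read` is hypothetical.

```python
import os
import tempfile

# O_BINARY exists only on Windows; fall back to 0 elsewhere.
O_BINARY = getattr(os, "O_BINARY", 0)

def write_then_read(path, payload):
    """Write payload to path, then read the same number of bytes back."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC | O_BINARY, 0o644)
    try:
        written = os.write(fd, payload)  # returns the number of bytes written
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY | O_BINARY)
    try:
        return os.read(fd, written)      # reads up to `written` bytes
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as tmp:
    print(write_then_read(os.path.join(tmp, "demo.dat"), b"hello"))
```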
File System Services • Seeking a location in a file: seek(source_file, offset) • moves the file's current read/write position to the given offset • avoids reading sequentially through the preceding data
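A minimal sketch of seeking, assuming fixed-length records so that record n starts at byte n * RECORD_SIZE. The record size, file name, and helper names are hypothetical; `os.lseek` stands in for the slide's `seek`.

```python
import os
import tempfile

RECORD_SIZE = 8  # hypothetical fixed record length, in bytes
O_BINARY = getattr(os, "O_BINARY", 0)  # only defined on Windows

def read_record(fd, n):
    """Jump straight to record n (0-based) and read it."""
    os.lseek(fd, n * RECORD_SIZE, os.SEEK_SET)
    return os.read(fd, RECORD_SIZE)

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.dat")
        with open(path, "wb") as f:
            for i in range(3):
                f.write(b"REC%05d" % i)  # REC00000, REC00001, REC00002
        fd = os.open(path, os.O_RDONLY | O_BINARY)
        try:
            return read_record(fd, 2)    # skips records 0 and 1 entirely
        finally:
            os.close(fd)

print(demo())  # b'REC00002'
```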
Performance Considerations • Disk access is very slow • RAM access: 120 nanoseconds • Disk access: 30 milliseconds • Disk access is 250,000 times slower • This has direct performance implications for applications
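The 250,000 figure follows directly from the two access times on the slide; the short calculation below only restates that arithmetic (the variable names are illustrative).

```python
RAM_ACCESS_NS = 120               # 120 nanoseconds
DISK_ACCESS_NS = 30 * 1_000_000   # 30 milliseconds, expressed in nanoseconds

slowdown = DISK_ACCESS_NS // RAM_ACCESS_NS
print(slowdown)  # 250000
```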
Principles of Disk Access • Go to disk as few times as possible • index structures • Every time you go, bring as much relevant data as possible • clustering • Make each access to disk as efficient as possible • direct (random) access to the needed data rather than a sequential scan
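To make the first principle concrete, here is a minimal sketch of an index that maps each key to the byte offset of its record, so a lookup needs one seek and one read instead of a scan. Everything in it is an assumption for illustration: the record layout, the helper names `build_index` and `lookup`, and the use of `io.BytesIO` as a stand-in for a file on disk.

```python
import io
import struct

# Hypothetical record layout: 4-byte int key + 12-byte name field.
RECORD = struct.Struct("<i12s")

def build_index(f):
    """One sequential pass: map each key to the byte offset of its record."""
    index = {}
    f.seek(0)
    offset = 0
    while True:
        chunk = f.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        key, _ = RECORD.unpack(chunk)
        index[key] = offset
        offset += RECORD.size
    return index

def lookup(f, index, key):
    """One seek and one read fetch the record, with no scanning."""
    f.seek(index[key])
    _, raw = RECORD.unpack(f.read(RECORD.size))
    return raw.rstrip(b"\x00").decode("ascii")

# io.BytesIO stands in for a disk file in this sketch.
f = io.BytesIO()
for key, name in [(30, "Carol"), (10, "Alice"), (20, "Bob")]:
    f.write(RECORD.pack(key, name.encode("ascii")))

index = build_index(f)
print(lookup(f, index, 20))  # Bob
```

A real index would itself live on disk (for example as a tree structure); the in-memory dictionary here only shows why a key-to-offset mapping reduces the number of accesses.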