1.65k likes | 1.85k Views
CS157B - Fall 2004. Entity-Relationship Data Model. Entity Sets a collection of similar entities Attributes properties of the entities in an entity set Relationships connection among two or more entity sets Constraints Keys : uniquely identify an entity within its entity set
E N D
Entity-Relationship Data Model • Entity Sets • a collection of similar entities • Attributes • properties of the entities in an entity set • Relationships • connection among two or more entity sets • Constraints • Keys : uniquely identify an entity within its entity set • Single-value constraints : requirement that the value in a certain context be unique • Referential integrity constraints : existence test • Domain constraints : range test • General constraints : data set constraints
E/R - Cont. • Constraints Are of the Schema • Should be part of the DB design • Must be carefully constructed • Can be troublesome if major redesign is required • Subclasses • Special "isa" relationship • Similar to OO's definition • Weak Entity Sets • Some of its key may be composed of attributes which belong to another entity set • Need additional qualifier
The Relational Data Model • Relation • Represented by a two-demensional table • Attribute • Represented as columns in a table • Schema • The name of a relation and the set of attributes of this relation • A database schema consists of one of more relation schemas • Tuple • A row in the table • Domain • Each attribute of a relation is an elementary type, thus each tuple must be atomic
Relational - Cont • Converting Subclass Structures to Relations • E/R-Style Conversion • Object-Oriented Approach • Null Values • Functional Dependencies (FD) • An unique-value constraint in a relational schema design • Keys of Relations • {A(1), ..., A(n)} is a key of relation R if • no distinct tuples agree on all of the {...} • no other subset of {} determines all other attributes of R, must be minimal • If more than one key, then designate one of the keys as the primary key • Superkey : Composed by a set of attributes
Relational - Cont • Design of Relational Database Schemas • Anomalies • Redundancy : Many repeated information in each tuple • Update Anomalies : One tuple update does not cause other tuples to be updated • Deletion Anomalies : Lose other information as a side effect when a set of values becomes empty • Decomposing Relations • Splitting the attributes of R to make the schemas of two new relations • Boyce-Codd Normal Form (BCNF) • The left side of every nontrivial FD must be a superkey • Recursively decompose into smaller sets of tables until they all satisfy the BCNF rule • Must be able to join all set of tables into the original tuple
Relational - Cont • Third Nomal Form (3NF) • A relation R is in 3NF if: whenever A(1)...A(n) ->B is a nontrivial FD, either {A(1)...A(n)} is a superkey, or B is a member of some key • May allow minimal redundancy in the end • Multivalued Dependencies (MVD) • A statement that two sets of attributes in a relation have sets of values that appear in all possible combinations • Example of MVD in p.118 figure 3.29 • No BCNF violation • Fourth Normal Form (4NF) • A relation R is in 4NF if whenever A(a)..A(n)->->B(1)..B(m) is a nontrivial MVD, {A(1)...A(n)} is a superkey • Can be decomposed similar to the BCNF decomposition algorithm
Other Data Models • Review of Object-Oriented Concepts • Type System • Record structures • Collection types • Reference types • Classes and Objects • Contain attributes and methods • Object Identity • OID • Methods • Functions within a class • Hierarchies • Superclass and subclass relationship • Introduction to ODL • Object-Oriented Design • Three main properties: attributes, relationships, and methods
Other Data Models - Cont • Class Declarations • e.g. class <name> { <list of properties> } • Attributes in ODL • can be simple or complex type • Relationships in ODL • e.g. in Movie: relationship Set<Star> stars; • Inverse Relationships • e.g. in Star: relationship Set<Movie> starredIn; • Methods in ODL • signatures • Types in ODL • Atomic types: • integer • float • character • character string • boolean • enumerations
Other Data Models - Cont • Structure types • e.g. class name • Structures - similar to a tuple of values • Collection types • Set - distinct unordered elements • Bag - unordered elements • List - ordered elements • Array - e.g. Array<char,10> • Dictionary - e.g. Dictionary(Keytype, Rangetype) • Additional ODL Concepts • Multiway Relationships in ODL • ODL supports only binary relationships • Use several binary, many-one relationship instead • Subclasses in ODL • Inherits all the properties of its superclass
Other Data Models - Cont • Multiple Inheritance in ODL • Needs to resolve name conflicts from multiple superclass • Keys in ODL • Optional because of the existence of an OID • Can be declared with one or more attributes • ODL Designs to Relational Designs • Many issues • Possibly no key in ODL • Relational can't handle structure and collection types directly • Convert any ODL type constructor can lead to a BCNF violation • Representing ODL Relationships • one relation for each inverse pairs
Other Data Models - Cont • Object-Relational Model • O/R Features • Structured types for attributes • Methods • Identifiers for tuples (like OID) • References • Compromise Between Pure OO and Relational • Semistructured Data • "Schemaless" - the schema is attached to the data itself • Represented in a collection of nodes • Interior node describes the data with a label • Leaf node contains the data of any atomic type • Legacy-database problem
Relational Algebra • Algebra of Relational Operations • Bags rather than sets can be more efficient depending on the operation such as an union of two relations which contain duplicates • Components • Variables that stand for relations and Constants which are finite relations • Expressions of relational algebra are referred to as queries • Set Operations on Relations • R u S, the union of R and S (distinct) • R n S, the intersection of R and S (distinct) • R - S, the difference of R and S, is the set of elements that are in R but not in S; different from S - R • Tuples must be in the same order and attribute types must be the same
Relational Algebra - Cont • Projection • The projection operator is used to produce from a relation R a new relation that has only some of R's columns • e.g. Views • Selection • The selection operator produces a new relation R with a subset of R's tuples; the tuples in the resulting relation are those that satisfy some condition C that involves the attributes of R • e.g. SQL select statement • Cartesian Product • A cross-product of two sets R and S • e.g. R x S ; if R has 2 tuples and S has 3 tuples, the result has 6 tuples • Natural Joins • Use matching attributes
Relational Algebra - Cont • Theta-Joins • Join two relations with a condition denoted by 'C' • Combining Operations to Form Queries • Multiple queries can be combined into a complex query • e.g. AND, OR, (), ... • Renaming • Control the names of the attributes used for relations that are constructed by applying relational-algebra operations • Dependent and Independent Operations • Some relational expression can be "rewritten" in a different expression • e.g. R n S = R - (R - S) • e.g. Glass half-empty or half-full?
Relational Algebra - Cont • Extended Operators of Relational Algebra • Duplicate Elimination • This operator converts a bag to a set • e.g. SQL keyword DISTINCT • Aggregation Operators • SUM(), AVG(), MIN(), MAX(), COUNT() • Grouping • This operator allows us to group a relation and/or aggregate some columns • Extended Projection • This allows expressions as part of an attribute list • Sorting Operator • This operator is anomalous, in that it is the only operator in our relational algrebra whose result is a list of tuples, rather than a set • Outerjoin • Takes care of the dangling tuples; denote with special "null" symbol
SQL • Simple Queries in SQL • select A(...) from R where C • Projection & Selection in SQL • select title, length from Movie where studioName = 'Disney' and year = 1990 • String Comparison • Bit Strings (bit data) • e.g. B'011' or X'7ff' • Lexicographic order • e.g. 'A' < 'B'; 'a' < 'b'; 'a' > 'B'? • Depends on encoding scheme or standard • (e.g. ASCII, UTF8) • LIKE keyword • "s like p" denotes where s is a string and p is a pattern • Special character % • Dates and Times • DATE '2002-02-04' • TIME '19:00:30.5'
SQL - Cont • Null Values • 3 cases: unknown, inapplicable, withheld • NOT a constant • Test expression for IS NULL • Truth-Value "unknown" • Pitfalls regarding nulls • e.g. select * from Movie • where length <= 120 or length > 120; • Ordering the Output • ORDER BY <list of attributes> • Can be ASC or DESC, default is ASC • Queries with > 1 Relation • e.g. select name from Movie, MovieExec where title = 'Star Wars' and producerC# = cert# • Disambiguating Attributes • select MovieStar.name, MovieExec.name from MovieStar, MovieExec where MovieStar.address = MovieExec.address;
SQL - Cont • UNION, INTERSECT, EXCEPT Keywords • Same logic as the set operators of u, n, and - • Subqueries • Can return a single constant or relations in the WHERE clause • Can have relations appear in the FROM clauses • Scalar Value • An atomic value that can appear as one component of a tuple (e.g. constant, attribute) • e.g. select name from MovieExec where cert#=(select producerC# from Movie where title='Star Wars'); • Conditions Involving Relations • If R is a relation, then EXISTS R is a condition that is true if R is not empty
SQL - Cont • s IN R is true if s is equal to one of the values in R; s NOT IN R is true if s is equal to no value in R • s > ALL R is true if s is greater than every value in unary relation R • s > ANY R is true if s is greater than at least one value in unary relation R • EXISTS, ALL, ANY operators can be negated by putting NOT in front of the entire expression • Subqueries in FROM Clauses • Can substitute a R in the FROM clause with a subquery • e.g. select name from MovieExec, (select producerC# from Movie, StarsIn where title = movieTitle and year = movieYear and starname = 'Harrison Ford'
SQL - Cont • Cross Joins • Known as Cartesian product or just product • e.g. Movie CROSS JOIN StarsIn; • Natural Joins • The join condition is that all pairs of attributes from the two relations having a common name are equated, and no other conditions • One of each pair of equated attributes is projected out • Outerjoins • e.g. MovieStar NATURAL FULL OUTER JOIN MovieExec; • Full-Relation Operations • Eliminating Duplicates • Use the DISTINCT keyword in SELECT • Performance consideration
SQL - Cont • Duplicates in U, I, and D • By default, UID operations convert bags to sets • Use keyword ALL after UNION, INTERSECT EXCEPT keywords to prevent the elimination of duplicates • Grouping and Aggregation in SQL • Use the special GROUP BY clause • Aggregation Operators • SUM, AVG, MIN, MAX are used by applying them to a scalar-valued expression, typically a column name, in a SELECT clause • COUNT(*) is used to counts all the tuples in R that is constructed from the FROM clause and WHERE clause of the query • HAVING Clause • Use in conjunction with GROUP BY to narrow the aggregated list
SQL - Cont • Database Modifications (DML) • Insertion • Insert into R(A1...An) values (V1,...Vn); • e.g. insert into Studio(name) values('S1'); • Can insert multiple tuples with subquery • e.g. insert into Studio(name) select distinct studioName from Movie where studioName not in (select name from studio); • Deletion • Delete from R where <condition>; • Can delete multiple tuples with 1 delete statement depending on <condition> • Updates • Update R set <new-value assignment> where <condition>; • Can update multiple tuples with 1 update statement depending on <condition>
SQL - Cont • DDL in SQL • Data Types • Character strings, fixed or variable length • Bit strings, fixed or variable length • Boolean: true, false, unknown • Integer or int; shortint • Floating-point numbers • e.g. decimal(n,d) where n is total number of digits with d is the decimal point from the right; • 1234.56 can be described as decimal(6,2) • Dates and times can be represented by the data types DATE and TIME respectively • Table Declarations • Use the keywords CREATE TABLE followed by the R name and list of As and their types • e.g. create table MovieStar(name char(30), address varchar(256), gender char(1), birthday DATE);
SQL - Cont • Modifying Relation Schemas • Drop table MovieStar; • Alter table MovieStar add phone char(16); • Alter table MovieStar drop birthdate; • Default Values • Use the DEFAULT keyword to set default values for a column • e.g. alter table MovieStar add phone char(16) default 'unlisted'; • Indexes • Allow faster access to data • e.g. create index YearIndex on Movie(year); • Can be one or more attributes • e.g. create index KeyIndex on Movie(title,year); • Delete the index using drop index statement • Selection of Indexes • Selection vs IUD performance
SQL - Cont • View Definitions • View does not contains any physical data • "virtual relation" • Declaring Views • Create view <view-name> as <view-definition> • Querying Views • Use the normal select syntax with a view name in place of the table name • Renaming Attributes • Can map table attribute name from the base table to a new name in a view definition • Modifying Views • Updatable views are useful in special cases; selective IUDs
Constraints & Triggers • Keys & Foreign Keys • Primary Keys • Each relation can have only one primary key • Primary key attribute(s) can not be NULL • Two ways to specify the primary key • 1) create table MovieStar( • name char(30) primary key, • address varchar(255), • gender char(1), • birthdate date); • 2) create table MovieStar( • name char(30), • address varchar(255), • gender char(1), • birthdate date, • primary key(name, birthday)); • Unique Keys • Each relation can have >1 unique keys • Declared the same way as primary key • NULL is allowed
Constraints & Triggers - Cont • Enforcing Key Constraints • During insertion or update to the relation • Foreign-Key • The "referenced" attribute(s) must be declared unique or the primary key for their relation; it must not have a NULL value • create table Studio( • name char(30) primary key, • address varchar(255), • presC# int references MovieExec(cert#) • ); • create table Studio( • name char(30) primary key, • address varchar(255), • presC# int, • foreign key (presC#) references MovieExec(cert#) • );
Constraints & Triggers - Cont • Maintaining Referential Integrity • Reject Violating Modifications (Default) • Insert or update Studio tuple whose presC# value is not NULL and is not the cert# component of any MovieExec tuple • Delete a MovieExec tuple and its cert# component appears as the presC# component of one or more Studio tuples • Update a MovieExec tuple cert# value; but the old cert# is the value of presC# of some movie studio in Studio • Cascade Policy • When deleting the MovieExec tuple for the president of a studio, then it will delete the referencing tuple from Studio • By changing the cert# value for a MovieExec tuple from c1 to c2 and there was some Studio tuple with c1 as the value of its presC# component, then it will update this presC# component to have the value c2
Constraints & Triggers - Cont • Set-Null Policy • Can handle the delete and update problem by setting the presC# to NULL • e.g. create table Studio ( • name char(30) primary key, • address varchar(255), • presC# int references MovieExec(cert#) • on delete set null • on update cascade • ); • Deferring Checking of Constraints • Do selective insert to default the presC# to null • Insert tuple into MovieExec with new cert# • Update the Studio tuple with matching presC# • Use keyword DEFERRABLE and DEFERRED to delay the checking until the whole tranaction is "committed" • Reverse the DEFERRED case with keyword IMMEDIATE
Constraints & Triggers - Cont • Constraints on Attributes and Tuples • Not-Null Constraints • Use the NOT NULL keywords in create table statement for any attribute • Attribute-Based Constraints • Use the CHECK keyword in create table statement • Limit the value for an attribute • e.g. gender char(1) check (gender in ('F','M')) • Tuple-Based Constraints • Use the CHECK keyword in create table statement • Can compose of complex expression of multiple attributes • Constraints Modification • Naming Constraints • In order to change, it must have a name • Use the CONSTRAINT keyword
Constraints & Triggers - Cont • Altering Constraints on Tables • Can use ALTER TABLE to add or drop a constraint • Can use SET CONSTRAINTS to set it for deferred or immediate • Schema-Level Constraints and Triggers • Assertions (General Constraint) • A boolean-valued SQL expression that must be true at all times • create assertion <name> check (<condition>) • e.g. • create assertion RichPres check (not exists (select * from Studio, MovieExec where presC# = cert# AND netWorth < 10000000)); • Event-Condition-Action Rules (ECA Rules) • Triggers are awakened by certain events • The "action" will be preform only if C = true
Constraints & Triggers - Cont • Triggers in SQL • create trigger NetWorthTrigger • after update of netWorth ON MovieExec • referencing • old row as OTuple, • new row as NTuple • for each row • when (OTuple.netWorth > NTuple.netWorth) • update MovieExec • set netWorth = OTuple.netWorth • where cert# = NTuple.cert#; • Default is "for each statement" • Besides update, can use insert and delete • Action can be "before" or "after" the event • Use BEGIN...END for multiple statements • Instead-Of Triggers • Not part of SQL-99 • Replace event with new defined operations • Very powerful when used on a view
System Aspects of SQL • SQL Programming Environment • Host language + Embedded SQL • v • Preprocessor • v • Host language + Function calls • v • Host-language compiler <= SQL Library • v • Object-code program • Impedance Mismatch Problem • Different data model between SQL statements and programming langauges • SQL/Host Language Interface • Use EXEC SQL keywords in front of an SQL statement • Use shared (host) variables for SQL stmt • Check SQLSTATE for SQL errors
System Aspects of SQL - Cont • The DECLARE Section and Its Usage • Shared variables are declared between two embedded SQL statements. • e.g. • EXEC SQL BEGIN DECLARE SECTION; • char studioName[50], studioAddr[256]; • char SQLSTATE[6]; • EXEC SQL END DECLARE SECTION; • A shared variable can be used within the SQL statement by placing a colon in front it. • e.g. • EXEC SQL INSERT INTO • Studio(name, address) • VALUES (:studioName, :studioAddr); • Single-Row Select Statement • e.g. • EXEC SQL SELECT netWorth • INTO :presNetWorth • FROM Studio, MovieExec • WHERE presC# = cert# AND • Studio.name = :studioName;
System Aspects of SQL - Cont • Cursors • Allow programs to "fetch" multiple rows from a relation • Here are the steps for using a cursor • EXEC SQL DECLARE <cursor> CURSOR FOR <query> • EXEC SQL OPEN <cursor> • EXEC SQL FETCH FROM <cursor> INTO <list-of-variables> • If SQLSTATE is "02000", then goto close <cursor>; otherwise fetch next row • EXEC SQL CLOSE <cursor> • Row Modification with Cursor • Use the WHERE CURRENT OF keywords • e.g. • EXEC SQL DELETE FROM MovieExec • WHERE CURRENT OF execCursor; • EXEC SQL UPDATE MovieExec • SET netWorth = 2 * netWorth • WHERE CURRENT OF execCursor;
System Aspects of SQL - Cont • Concurrent Update of Tuple • Use keywords INSENSITIVE CURSOR to ignore new changes which may affect the current cursor • Use Keywords FOR READ ONLY to signal that this cursor does not allow any modification • Scrollable Cursors • Allow a set of movements within a cursor • Dynamic SQL • Flexibility to enter SQL statement at run time • Use EXEC SQL EXECUTE IMMEDIATE or • ( EXEC SQL PREPARE ... and • EXEC SQL EXECUTE ... ) • e.g. • EXEC SQL BEGIN DECLARE SECTION; • char *query; • EXEC SQL END DECLARE SECTION; • /* Allocate memory pointed to by query • and fill in the SQL statement */ • EXEC SQL EXECUTE IMMEDIATE :query;
System Aspects of SQL - Cont • Procedures Stored in the Schema • Persistent Stored Modules (PSM) • Can build module to handle complex computations which cannot be expressed using SQL • PSM Functions & Procedures • CREATE PROCEDURE <name> (<param>) • local declarations • procedure body; • Procedure parameter can be input-only, output-only, or both • CREATE FUNCTION <name> (<param>) RETURNS <type> • local declarations • function body; • Function parameter can only be input as PSM forbids side-effects in functions
System Aspects of SQL - Cont • Statements in PSM • Call statement • CALL <proc name> (<arg list>); • e.g. EXEC SQL CALL Foo(:x, 3); • RETURN <expression>; • DECLARE <name> <type>; • SET <variable> = <expression>; • BEGIN ... END • IF <condition> THEN • <statement list> • ELSEIF <condition> THEN • <statement list> • ELSEIF • ... • ELSE <statement list> • END IF; • SELECT <attr> INTO <var> FROM <table> • WHERE <condition> • LOOP <statement list> END LOOP;
System Aspects of SQL - Cont • FOR <loop name> AS <cursor name> CURSOR FOR • <query> • DO • <statement list> • END FOR; • Support WHILE and REPEAT loops • Exception Handler in PSM • DECLARE <where to go> HANDLER FOR <condition list> <statement> • <where to go> can be: • CONTINUE - executing the handler statement and then execute the next statement after the one which cause the exception • EXIT - execute the handler statement and then control leaves the BEGIN...END block in which the handler is declared • UNDO - same as EXIT except that any changes to the DB or local variables that were made by the statements of the block are "undone"
System Aspects of SQL - Cont • SQL Environment • Schema • A collection of tables, views, assertions, triggers, PSM modules, etc • CREATE SCHEMA <name> <declarations> • Use SET SCHEMA to change schema name • Catalog • A collection of schemas • CREATE CATALOG <catalog name> • Use SET CATALOG to change the current catalog • Cluster • A collection of catalogs • Can be view as a set of all catalogs accessible to a user • Client/Server • Both client and server can be on the different or the same machine
System Aspects of SQL - Cont • Connection • CONNECT TO <server name> AS <connection name> AUTHORIZATION <name and password> • SET CONNECTION <name>; • DISCONNECT <name>; • Call-Level Interface (CLI) • In C, each CLI program must include sqlcli.h where it contains all the function, structure, constant, and type definitions • 4 kinds of records: SQLHENV, SQLHDBC, SQLHSTMT, and SQLHDESC. • Use SQLAllocHandle(hType, hIn, hOut) • Processing Statements • Use SQLPrepare(sh, st, sl); & SQLExecute(sh); • or use SQLExecDirect(sh, st, sl); • Use SQLFetch(sh) from a query result
System Aspects of SQL - Cont • Use SQLBindCol(sh, colNo, colType, pVar, varSize, varInfo) for column binding • Can use SQLGetData(...) in place of SQLBindCol(...) to extract data from a query • Passing Parameters to Query • e.g. • SQLPrepare(myStmt, "INSERT INTO Studio(name, address) VALUES (?, ?)", SQL_NTS); • SQLBindParameter(myStmt, 1, ..., studioName, ...); • SQLBindParameter(myStmt, 2, ..., studioAddr, ...); • SQLExecute(myStmt); • Transactions in SQL • Serializability • Multiple selects followed by multiple updates to the same tuple; e.g. chooseSeat() • Use locks to handle this problem
System Aspects of SQL - Cont • Atomicity • Single user transaction may have multiple updates to different tables; e.g. transfer from account A to account B • Only "commit" after all the changes are made • Transaction • A collection of one or more operations on the database that must be executed atomically • Use START TRANSACTION to begin • Use SQL COMMIT to commit • Use SQL ROLLBACK to abort and undo the changes prior to the start of the transaction • Can set the transaction to READ ONLY • Dirty Reads (Uncommitted Reads) • Data read that were "dirty" or uncommitted • Isolation Levels • Serializable, uncommitted read, committed read, and repeatable-read
System Aspects of SQL - Cont • Security & User Authorization in SQL • Privileges • SQL defines nine types of privileges: SELECT, INSERT, DELETE, UPDATE, REFERENCES, USAGE, TRIGGER, EXECUTE, and UNDER • Authorization Checking • First at connect time • Second at statement time • Additional checks with modules • Grant & Revoke • GRANT <privilege list> ON <database element> TO <user list> • Allow other user to perform certain actions • REVOKE <privilege list> ON <database element> FROM <user list> • Disallow a previously granted privilege
Data Storage • Megatron 2002 Database System • Store relation in ASCII text file • Store the schema also in ASCII file • Obvious problems: • Tuple layout on disk is not flexible; any small change may shuffle the whole file • Searching is expensive; must read the whole file • Query-processing is by brute force; nested loop to examine all possibilities • No memory buffering, every query requires direct access to disk • No concurrency control • No reliability; e.g. no crash recovery • The Memory Hierarchy • Cache Memory • Fast access to and from processor or I/O controller
Data Storage - Cont • Main Memory • Random access (RAM) • Both OS and applications reside in RAM • Virtual Memory • Allows each application to have their own private memory space which mapped to physical memory (RAM) or disk memory • A page is a memory block used by main memory to/from disk • Secondary Storage • Much slower than main memory • Two type of disk I/O • Disk read means moving a block from disk to main memory • Disk write means moving a block from main memory to disk • Most DBMS will manage disk blocks itself, rather than relying on the OS file manager
Data Storage - Cont • Volatile and Nonvolatile Storage • Main memory is typically volatile; thus when the power is off, the content is gone • Flash memory are nonvolatile but it is very expensive and currently not used in main memory • An alternative is to use "RAM disk" combine with a battery backup to the power supply • Disks • Disk Components • Head, platter (2 surfaces each), cylinder, tracks, sectors, gap • Disk Controller • Controls the movement of the disk head(s) to a specific track and preforms reads and writes • Tranfers data to and from main memory
Data Storage - Cont • Effective Use of Secondary Storage • CS studies of algorithm often assumes that the data are always in main memory; this is not a valid assumption for DBMS • I/O Model of Computation • Dominance of I/O cost • If a block needs to be moved between disk and main memory, then the time taken to perform the read/write is much larger than the time for manipulating that data in main memory; thus the I/O time is a good approximation of the total time • Similar to Big O notation for algorithm study • Sorting Data in Secondary Storage • If we need to sort 1.64 billion bytes and a disk block is configured to handle 16384 bytes, then 100000 blocks are required to read each tuple once from disk • Quicksort is one of the fastest algorithm but its assumption is all entries are in memory
Data Storage - Cont • Two-Phase, Multiway Merge-Sort (TPMMS) • Consists of 2 phases • Phase 1: Sort main-memory-sized pieces of the data, so every record is part of a sorted list that just fits in the availabe main memory; the results are a set of sorted sublists on disk which we merge in the next phase • Phase 2: Merge all the sorted sublists into a single sorted list • Example: • If we have 100 MB of main memory using 16384 size block sorting 1.64 billion bytes, we can fit 6400 blocks at a time in main memory; thus the results from phase 1 will have 16 sorted sublists • If merge two sublists at a time, we need 8 disk I/O's performed on it • The better approach is to read the first block from each of the sorted list into main-memory buffer. Find smallest element into a output buffer and flush/reload when necessary.