350 likes | 667 Views
Physical Design. 3. Introduction. Database Design Methodology requirements specification ER/EER modelling validation of ER/EER models; aggregation of different views transformation of ER/EER model into relational model normalisation physical design monitor and tune operational system.
E N D
Introduction • Database Design Methodology • requirements specification • ER/EER modelling • validation of ER/EER models; aggregation of different views • transformation of ER/EER model into relational model • normalisation • physical design • monitor and tune operational system
Outline • overview of physical design • base relations and enterprise constraints • known from before • transactions analysis • file organisation and indexes
Physical Design – Rationale • all the documentation produced until physical design represents a detailed specification of what we intend to build, of what is required • informally, physical design is the process that transforms these specifications into a good working system, using the functionality provided by the chosen DMBS
Overview of Physical Design • express logical model using the DDL language of the chosen/target DBMS • design optimal storage • design user views • design security mechanisms in next lecture
Express Logical Model in Target DBMSd • essentially covered in the first term • design and implement base relations • analyse and document derived data • design enterprise constraints
Design Base Relations • are domains supported? • are attribute constraints supported? • NOT NULL, UNIQUE • are keys supported? • candidate • primary • alternate (UNIQUE + NOT NULL) • foreign • referential integrity • FK rules
Analyse and Document Derived Data • derived attributes • not present in the relational model; • but present in the EER model • it is possible that not all derived attributes are documented thus far • then, this issue can be given further consideration at this point • trade-off • calculate a derived attribute each time it is being used • may be too time-expensive • store it in the database • redundancy, therefore more space required (space-expensive) and possibility of inconsistencies • to maintain consistency – integrity constraints of active rules; now this may become time-expensive
Design Enterprise Constraints • support for integrity constraints • e.g., SQL’s • CONSTRAINT … CHECK … NOT EXISTS … • support for active rules or triggers • e.g., Postgres’: • CREATE RULE … ON UPDATE TO … WHERE … DO … • example?
Design Optimal Storage • criteria • maximise transaction throughput • maximise response time • minimise storage space • (they determine and are determined by the system resources) • issues • transactions analysis • file organisation and indexes
Analyse Transactions • identify critical transactions to the functionality of the information system • e.g., transactions that should never fail • identify transactions that put a significant load on the DBMS • run frequently • processing time is high • identify the periods when the database system is heavily used (down to individual transactions) • identify type of users by whom or locations from where the database is going to be heavily used
I I I I U U U U R R R R D D D D Map Transactions [Paths] to Relations • identifies mostly used relations • matrix • transactions • relations • elements of the matrix: • Y/N (i.e. used, not used) • number of accesses (per hour, day …) T1 T2 T3 T4 rel 1 rel 2 rel 3 rel 4 rel 5
Determine Frequency Information • determine average, maximum and minimum number of times a transaction runs per hour, per day, … • transaction usage map • determine peak periods • determine demanding transactions that have in common some of the resources they access • problems due to locking
Tutor Student name office email name programme email Booking day time topic Bookings for Personal Tutor • (A) check availability • see if tutor is available at a specific time • (B) see my appointments • list all my appointments for given period • (C) see details of my appointments • list all my appointments, including the tutor’s nameand office (C) With For (A) (B)
Tutor Booking Student 300 20 1500 Transaction Usage Maps frequency: per hour 1 1 For With avg: 20 max: 100 (C) (A) (B) avg: 40 max: 150 avg: 20 max: 200 0..* 0..*
Analyse Data Usage • more detailed analysis • (relations) attributes and the type of access • updated attributes – not candidates for access structures • attributes used in predicates • are candidates for access structures • attributes involved in joins • candidates for access structures • attributes affecting performance of critical transactions • higher priority for access structures • transaction analysis form • refer to Connolly, p.490
Analyse Transactions - Conclusion • essentially, the transaction analysis identifies the critical aspects related to the usage of the database (e.g., relations used frequently, attributes involved in “expensive predicates”, …) • on its basis, file organisations and indexes can be chosen
File Organisation • data has to be stored in an efficient way • all 4 operations (insert, retrieve, update and delete) require efficiency • efficient has different meanings in different contexts • different storage structures or file organisations represent efficient ways for different contexts; e.g. • “heap” structure is suitable for bulk-loading and bulk-retrieval (all student names, all programmes, …) • “hash” structure is suitable for “exact match” queries (student name = ‘Joe Bloggs’) • note that a structure that is efficient in one context may not be efficient in a different context
Index • a structure that allows the DBMS to locate records in a table/file more quickly • the decision as of • which attributes to be chosen as indexes, and • which type of indexes they should be (which type of file organisation) • … is determined by the results of the transaction analysis
File Organisation vs Index • file organisation – method of storing data on disk, with our without the use of indexes • index – data structure used to access records more quickly • primary and clustering index – part of the storage of the actual records; the records themselves are physically ordered according to the index • secondary index – auxiliary data structure; the records may be (usually are) unordered according to the index; • index is sometimes used to mean secondary index • the issue of indexes is subsumed by the issue of file organisation
File Structures/Types • Heap / unordered • Index Sequential Access Method • Hash • B+ tree
Heap / Unordered • records are written in the file in the same way as they are inserted, at the end of the file • insertion is efficient, in particular for bulk-loading • there is no ordering • retrieval is very inefficient if it involves predicates/conditions • similarly update and deletion • the space freed by deleted records is not automatically reused • administrator has to run routines
Indexed Sequential Access (ISAM) / Ordered • records are ordered on the basis of some attributes – the index field • primary index – ordering attributes are a key • one index value to a tuple • clustering index – ordering attributes are not a key • one index value to a group of tuples • sequentially ordered secondary indexes can also be created • what is the difference be between a clustering index and a secondary sequential index?
Use of ISAM Files • recommended • exact matching (based on the index field) • range of values (based on the index field) • drawback • ISAM indexes are static (created when the file is created) • not recommended • updates to index field • the access key sequence deteriorates
Hash (“random” or “direct” access) • a hash function calculates the address where each record is to be stored • the calculation is performed on the basis of some fields • records appear to be randomly distributed across the file space • the function should be chosen such that it leads to an as good as possible distribution of the records in the available space • problem : most hashing functions do not guarantee a unique address, because the file space much smaller than possible values of hash field • address generated by hash function BUCKET (with SLOTS) • COLLISION (SYNONYMS) same bucket, different slots • hash attributes – secondary index
Use of Hash Files • recommended • for retrievals based on exact matches, in particular when the access order (the order in which queries arise) is random • not recommended • retrieval based on pattern match • retrieval based on ranges of values • retrieval based on other fields than the exact hash filed • when the hash field is frequently updated
B+ Tree • B Balanced tree • more versatile than the hash structure • details – optional issue • recommended • exact match (index filed) • pattern matching (index filed) • range of values (index filed) • part index filed specification • advantage • B+ tree is dynamic – it grows as the relation grows • performance does not deteriorate with updates • B+ tree – secondary index
Choosing Indexes • consider results from transactions analysis • primary/clustering index • attribute(s) used in joins • attribute(s) most often used to access the relation • secondary indexes • trade-off: maintenance of an index vs. efficiency of queries • work in class on maintenance operations • choosing secondary indexes • index primary key (if not already a primary index) • index attributes often involved in joins and selection criteria • do not index attributes which are frequently updated • …
File Organisation - Conclusion • pre-defined file-structures exists that provide better efficiency of certain database operations in certain contexts
Conclusions • physical design • what it consists of • transaction analysis • identifies “hot-spots” of the database • file organisation and indexes • make work with the “hot-spots” more efficient