500 likes | 651 Views
Creating and Maintaining a Database. The DBA’S Job. Design Logical Design Physical Design Documentation Implementation Test Performance Security Concurrent Updates. Maintenance Backup Recovery Data Integrity New Releases SIR Application. Tasks. Designing a Relational Database.
E N D
Creating and Maintaining a Database The DBA’S Job New York Conference 2005
Design Logical Design Physical Design Documentation Implementation Test Performance Security Concurrent Updates Maintenance Backup Recovery Data Integrity New Releases SIR Application Tasks
Designing a Relational Database • Normalization • Eliminate redundant data • Identify data dependencies – keys • 1st Normal Form • One value per column • Unique primary key • 2nd Normal Form • No subsets of data in multiple rows of a table • 3rd Normal Form • All columns fully dependant on primary key
Possible Tables • Order - Order # • Customer # • Order Item - Order # Line # • Product Code • Qty • Unit Price ? • Customer – Customer # • Address • Product – Product # • Description • Unit Price
Keys • Must be unique • Good if real world • Employee Id/Product Code etc. • May not be the only access required • Should be short • Avoid unformatted alphabetic • If subordinate repeating group, consider sequence number
Normalized Implementation • Know the rules • Know the application • Alternatives • How many repeats of a column/group? • Dependent data volatility/convenience • Document • Variables – labels, descriptions • Records – keys, variables, foreign keys
Case Definition Case Id Max Counts Record Definition Key Fields Max Counts Default Security Variables within records Documentation command for case and record Variable Definition Format & Position Missing Values Valid Values Value Labels Categorical Vars Variable Ranges Variable Label Extended label for variable documentation Variable Security SIR Schema
60+ database functions 30 tabfile functions Examples: NRECS RECNAME NKEY KEYNAME NVARS VARNAME VARLABSC VFORMAT VTYPE Sec Index Functions DBINDS DBINDR DBINDV DBINDT Schema functions in PQL
Quick Data Dictionary • Four Record Types • Variables • Records • Record keys • Record data • Populate from any database • Check consistency
Example Data Dictionary • Create • Populate from MNYR • 55 record types • 2216 variables in records • Check consistent use of variables • Labels • Formats • Types • Identify foreign keys • Look at secondary indexes
SIR Structures • Multiple Database • Until SIR2000 exactly one database in SIR session • Design suggests separate databases for separate hierarchies • Had to use ‘dummy’ cases in single database • Inverted Lists • Until SIR2002 no secondary index • Had to use ‘dummy’ cases for inverted list • Auto Increment Keys
Physical Structure • Single Data File • Two types of blocks • Index • Contain keys plus pointers to other blocks • Single top level block • From one to six further levels • Bottom level points to data block • Data • Contain keys and data
SIR Data File Top LevelIndex Index Level 1 Index Level 1 Index Level 1 Bottom Level Index Data Blocks
Data Record • Header • Size • Update level • Lock status • Separate Key in front of record • All keys same size in single database • Case id, record number,record key fields • Special so can be searched • Record organized by data format • Real8, real4, I4, I2, I1, Character
CIR • One per case • Count for each record type • e.g. max rec types – 100100 integers • I1 – up to 123 • I2 – up to 32,000 • I4 – over this • Common variables
Size of Blocks • Index • Calculated from key size and number of records • Minimum 2K (253 dwords) Maximum 32K • Data • Calculated from maximum record size and number of records • Minimum 2K (254 dwords) Maximum 32K • Minimum 4 records per block
Index Calculation • Example: Key Length 16Number of records 1 million • At 4 per block need 250,000 blocks • Each index entry takes 3 dwrds • Key in dwords + 1 dwrd for pointer & count) • Minimum block holds 253/3 – 84 entries • Top level Single block 84 entries • Second level 84 blocks each with 84 entries • Two level index points to 7,056 data blocks • Three level points to 592,704 data blocks
Add first record Start with 3 blocks Top Level Index 1 entry key of record Points to second level Second level index 1 entry Key of record Points to data block Data Block - 1 entry Add records Find data block Put record in block in correct key sequence If new record first in block, update higher level that pointed Block Too Big Split into two blocks Add new key to higher level Data Block Growth
Loading Factor • How to split block • Records being added randomly • Split block in middle • Any block added to likely to have space • .5 loading factor • Records added in ascending key • Leave original block as full as possible • All adding to new block • .99 loading factor
Suggested Loading Factors • Standard Updates - .5 • If set very high and activity all on original block, lots of empty new blocks • IMPORT - .99 • Let standard updates split blocks when needed • RELOAD - .99 • no block splitting • Block filled to loading factor • Space for largest record
List Stats Info • Number of Index Levels 2 • Max Entries Per Index Block 42 • Index/Data Block Size 253/3314 • Active/Inactive Data Blocks 92/0 • Active/Inactive Index Blocks 4/0 • Keysize In Bytes 34 • Min/Max Record Size 0/808
Inactive Blocks • New Blocks added at end • List maintained of empty blocks • Re-used when new block needed • All records deleted in stand alone • Block update strategy in Master
Master • Allow multiple users to update a database concurrently • Intended primarily for multiple interactive users • Communicates via TCP/IP • Machine dependent database access • Provides a consistent database view for independent retrievals from database
Master Operation • Start Master • Starts with an address • Waits for client message • Does nothing else, NO database access • Master is NOT permanently connected to any specific database • Client tells master which database to connect to
Database Access • Database is opened and closed during SIR session as needed • During PQL retrievals • During utilities (No master) • Batch Data Input • Export, Unload, Spreadsheet, … • During schema updates (No master) • ‘Old’ Forms while form is running
Database Access • Database open for write • Single User - Exclusive Use • Master - Shared Read • SIR database files • sr1/sr2 - meta data – needed by both client (read only) and master • sr3 - data – controlled by master • sr4 - procedures – controlled by client • sr5 - journal – controlled by master • sr6 - sec. index – controlled by master
Single user SIR allows Multiple Readers OR a Single Writer to a database Master: How it WorksPart 1 SIR (Copy 1) SIR (Copy 2) SIR (Copy 3) OR User AReads & Writes Exclusively Users B & C ShareRead Only
Master allows multiple writers, readers plus independent readers Master: How it WorksPart 2 SIR (Copy 1) SIR (Copy 2) SIR (Copy 3) Users A & BReads & Writes User C IndependentRead Only Master SIR Data File SIR Data File
How does Master work? • Client changes access to use Master • Lock Manager for clients accessing through Master • Delayed view of updates ‘Difference File Copy’ for independent readers
Client • SIR session switches from single user to use a specific master • Master must be available at this point • All subsequent retrievals then automatically use Master • Various utilities not available • Sends Master a request for single data record at a time • Master selects on key values • Client does any selection on data values • Data requests preceded by lock requests
Master • Gets initial logon from client • Allocates identifier • Gets database open from client • Checks if already known (open by another client) • Allocates identifier • Database Identification • Full pathname is passed by client • Path is ‘as seen’ by client • Client needs to find database • Master needs to find database • Master needs to know that database referenced by multiple clients is same database
Master Resource Control • Gets request for lock on resource (e.g. case/record key) • Checks lock table • Creates entry if resource available • Gets request for record retrieval • Gets request for record update
Lock Types • Transmitted by client from PQL. Checks existing lock on resource 1 = Null - becomes exclusive in Update, concurrent read in retrieval 2 = Concurrent read - Fails if exclusive 3 = Concurrent write - Fails if protected or exclusive 4 = Protected read - Fails if concurrent write, protected write or exclusive 5 = Protected write - Fails if concurrent write, protected or exclusive 6 = Exclusive - Fails if any prior lock
Locks in PQL • Ignored in single user mode • RETRIEVAL • LOCK = CR, CW,PR,PW,EX (2,3,4,5,6) • CIRLOCK, RECLOCK • Default: Update –Ex Retrieval – CR • CASE/RECORD commands • LOCK = numeric_expression • Nested case/records inherit outer lock • Lock held until NEXT or EXIT at this level
Locked Case/Record • Block is entered • Variables set to undefined • Test status with functions: • SYSTEM(36) = 1 Record available • SYSTEM(37) = 1 Case available • SYSTEM(38) = 1 Master mode • Wait and retry, tell user with option,… • RETRY CASE|RECORD
‘Delayed’ Updates • Enables independent retrieval to have consistent view of data i.e. no updates seen while retrieval running • Master creates local copy of master index block • Whenever index or data block rewritten for first time, Master allocates new block • Keeps list of redundant blocks (index & data) • Identical process on secondary indexes
Difference File Copy • Makes updates available to independent retrieval • Increments update level • Creates journal header • Writes master index • Makes redundant blocks available if no other users (can get exclusive access)
Managing Master • Start Parameters • MST = • PW = • DFC=
MST = parameter • Master finds machine name, port 3000 • MST = change port number to even_number • CLIENT MST = machine_name[:port] • The machine name consists of a host and a domain. It makes the start up for clients faster to quote both the host and domain name (DNS) • Start Master - Master started SirNT:3000 • Start Forms • MST=SirNT • MST=SirNT.sir.com.au
Other parameters • PW = password • Any remote user who wants to administer master must specify a matching password • DFC = minutes since a difference file copy which would force an automatic copy
Administering Master • Interrupt • No users being served • Commands • No password • Usage Statistics • Remote • Other users still active • Menu driven • Password Protected
Administering Master • List logged on users • List attached databases • Stop • Immediately • After users logoff • Difference File copy • Set interval
Backup and Recovery • Unload • Header • Internal copy of database • Machine specific/SIR version specific • Brings all unloaded records up to current schema definition • Can have multiple unloads on same file • Accessed sequentially • Journal • Header • Copy of database record after update
Update Level & Restructure • Update level incremented when database open/closed for update • When record written, update level held in record header • If schema changed old and new version kept with update level • When record read, if record update level earlier than schema change record is transformed • When record written, in new format
Immediate Unload • Length of key changes • Record type in key so increase in length forces (Number of record types > 123) • Key definition changes for existing record • Record is locked until reloaded
Reload • Reload takes specific unload • Defined by update level • Database is recreated • If complete set of journals, can be applied to bring reloaded database up to date • ITEMIZE lists unloads or journals
EXPORT • Text version of database • Machine Independent • IMPORT rebuilds completely • Machine Independent • SIR version independent • Choose for long term archive
VERIFY • Walks index • Retrieves each data block • Checks counts and pointers • Reports structural problems • Patch puts calculated value in counts • Clear corruption flag • Walks secondary indexes
Creating and Maintaining Databases New York Conference 2005