400 likes | 526 Views
Lookups or “ the Good, the Bad and the Ugly ” John S. Lemon. What do these have in common ?. One person knows – rest of you will have to wait !. What I hope to cover. Based on the Aberdeen Maternity and Neo-natal Data Bank ( AMNDB ) History of lookups – Dummy case SQL tabfiles
E N D
What do these have in common ? One person knows – rest of you will have to wait !
What I hope to cover • Based on the Aberdeen Maternity and Neo-natal Data Bank ( AMNDB ) • History of lookups – • Dummy case • SQL tabfiles • Second data base • Second data base + secondary indexes + lookup command
What is a “lookup” • Converting text -> numeric code • Or numeric code -> text • Data entry staff enter ‘uterine adhesions’ • Converted to ( and stored as ) 621.5 • Why ? • Space
Why use a ‘lookup’ • Alternatives are storing full text • ‘uterine adhesions’ - stored as 19 bytes / characters • This is one of the shorter ones !!! • Problem of different ‘spelling’ • Uterine Adhesions • UTERINE ADHESIONS etc. • Which to use when searching
Why use a ‘lookup’ • Store ‘uterine adhesions’ as 621.5 - stored as 8 bytes • less if using integers • or fixed length string codes ‘A6215’ - stored as 7 bytes
Why use a ‘lookup’ • Value labels - not practical • Size limit on label length • Record specific • Problems managing an increasing list • Ordering / Sorting • Large numbers of codes:text pairs Occupations 13872 Drugs 5357 Operations 2754 Diseases / Illnesses / Complications 4404
Dummy case • Only option in version 2 • Use a CASEID which was unique • In AMNDB used –999999 • This case only contained data for the ‘lookup’ record types • It worked but ….
Dummy case - problems • Specific to that data base • Not shared – unless • SIR FILE DUMP - from ‘active’ • READ INPUT DATA - to ‘research’ • Have to repeat for all data bases that need lookups
Dummy case - problems • Can only go ‘one way’ unless duplicate data with different Key Fields RECORD SCHEMA 99, ICD2NAMESORT IDS ICDCODE RECORD SCHEMA 98, NAME2ICDSORT IDS ICDNAME • Huge problems of maintenance
Dummy case - problems • MAX KEY SIZE • May have seen in LIST STATS • What is it ? • Need to have quick look at the ‘structure’ of a SiR record • Simplified not definitive view • If you want a complete definition – ask Tony !!
SiR record structure 1 CASEID Max Key Size Data 2 CASEID Max Key Size Data Actual Size CASEID Max Key Size Data 3 Actual Size • Essentially two components • Key - fixed length - same for ALL records • May not be ‘filled’ for all records • Data - variable length
Dummy case - problems • Without dummy case • Max Key Size - approx 20 bytes long • Text field – 50 characters long • Max Key Size - approx 60 bytes long • Extra 40 bytes • For EVERY record • 1.5 million * 40 bytes = approx 60 Mbytes • Trivial today but in 1988 ……………
Dummy Case – solutions (sic) • How to resolve size problems • Reduced length of text field – 30 chars • One way only • Only in one data base – data entry • Used ‘marvellous feature’ of sequential data base access • SIR3 file held on tape(s) • Only PROCESS CASES ALL • No CASE IS
Lookups use in AMNDB • Two main programmes / suites of programmes • Batch run to convert text -> code • Interactive programme for each record that requires lookups • Need both for different reasons
Lookups use in AMNDB • Batch run to convert text -> code • Does all record types in data base that has data for looking up • Only converts records where text code is already available • Marks successful lookups • Leave unsuccessful ones for interactive
Lookups use in AMNDB • One interactive programme per record with data for looking up • Functions in same way as batch except • If no match prompts for one of following - • Retype • Delete • Edit • Add new lookup code
SQL tabfiles • Stuck with Dummy case until • SQL tabfiles • SAVE TABLE • SQL indexes • One external file could hold lookups
SQL tabfiles • One external file – many advantages • One multi-user file for all lookups • Simplified maintenance - one file to backup • Multiple indexes per table ( cf. record ) • Max Key Size problem removed • BUT ……………
SQL tabfiles - problems • Can only see the ‘whole’ file picture • No equivalent of File dump / Add recs • Need SQL+ - clumsy • PQL programs - not intuitive • Could use Forms but not easy • Journalling was suspect • Ended up using EXPORT for backup • Exporting SQL tabfiles was idiosyncratic
SQL tabfiles - problems • If tabfile is ‘volatile’ – small frequent updates • Tabfile can get ‘corrupt’ • Verify ‘drops’ table instead of correcting • Updates appeared to work but later find records missing / corrupt • Just had to ‘live with it’ ……… and hope for better
Second data base • No worries about Max Key Size • Multi user • Reliable DBMS utilities and functionality • UNLOAD • Journalling • File Dump / Add Recs • Easily look at the data • But ……………………
Second data base • Considered but not used • Why ? • No alternative ‘views’ / indexes • Back to two copies of data RECORD SCHEMA 99, ICD2NAMESORT IDS ICDCODERECORD SCHEMA 98, NAME2ICDSORT IDS ICDNAME • No real advantages – ‘devil you know’
What do these have in common ? Still puzzled ??
Two data bases + Secondary Index • Then in SiR2002 • Two, or more, data bases • Secondary Indexes • LOOKUP command • Decided to ‘go for broke’ • Use all three ‘features’ to remove previous problems
Two data bases + Secondary Index • Two things spring to mind • Can of worms • Snail in a well
Two data bases + Secondary Index • The can of worms was trying to understand code written many years ago • How many of you have ‘revisited’ PQL you wrote 5 years ago ? • Can you remember what you were trying to do ?
Two data bases + Secondary Index • Do you add Comments to explain for future benefit C Only get records for Males over 60 • Use | for ‘inline’ comments . END PROCESS REC | PREGNANCY • It was ‘grey hair’ time – yet again !! • Not sure which is worse • teenage daughters • old SiR code
Two data bases + Secondary Index • By hard work and perseverance managed to ‘decode’ the old code • This time I added Comments as I worked it out !! • So that was the can of worms sorted
Two data bases + Secondary Index • Just left the snail in the well • How does this relate to SiR code • Climb 3 feet during day – slip back two feet at night • At almost every stage I encountered ASFs • ‘Another SiR Feature’
Two data bases + Secondary Index • Enormous thanks to Tony for help, aid, assistance and patience • Six new versions of SiR within one week • Gradually felt that I was ‘climbing’ higher and higher
Two data bases + Secondary Index • One day after yet another new version - programme ran OK • Message to Tony “I’m out of the well !! “ • Even so did some more testing
Two data bases + Secondary Index • Two hours later I sent a message using a phrase from this show “I’ve fallen in the water” • Yet another problem
Two data bases + Secondary Index • Yet again Tony came to the rescue • Apart from one bit of my code I need to sort the programmes are working OK • So how do we use • A second data base • Secondary Indexes • The LOOKUP command
Two data bases + Secondary Index • All lookups are held in a second, caseless data base • The key fields are the numeric codes to keep MAX KEY SIZE to minimum • Journalling is turned on
Two data bases + Secondary Index • The same code might refer to multiple text strings • Threatened Abortion • Thr Abr • TAbr • All mean same • Use AUTOKEY to cope • Secondary indexes on the text strings
Two data bases + Secondary Index • Experiences so far • Still testing but looks good • Faster and more reliable • Can look at the data easily • Correcting invalid data is easy • All the power and features of vPQL and DBMS utilities for maintenance • Only one system to learn / remember
Lookups – a summary • Dummy case • Rigid • Inflexible • Cumbersome • Can use DBMS etc. for maintenance • Now luckily replaced
Lookups – a summary • SQL tabfiles • Very flexible • Obtuse • No easy maintenance • SQL+ is cumbersome • Reliability / integrity of tabfiles • Better but for long term – flawed • Ad-hoc work – re-building every time
Lookups – a summary • Final solution with Second data base, Indexes & LOOKUP • Reliable • PQL is familiar • Fast • Combines good features of dummy case and tabfiles • Perhaps time will tell – but looks good