Lookups or “ the Good, the Bad and the Ugly ” John S. Lemon

Lookupsor “the Good, the Bad and the Ugly ”John S. Lemon

What do these have in common ? One person knows – rest of you will have to wait !

What I hope to cover • Based on the Aberdeen Maternity and Neo-natal Data Bank ( AMNDB ) • History of lookups – • Dummy case • SQL tabfiles • Second data base • Second data base + secondary indexes + lookup command

What is a “lookup” • Converting text -> numeric code • Or numeric code -> text • Data entry staff enter ‘uterine adhesions’ • Converted to ( and stored as ) 621.5 • Why ? • Space

Why use a ‘lookup’ • Alternatives are storing full text • ‘uterine adhesions’ - stored as 19 bytes / characters • This is one of the shorter ones !!! • Problem of different ‘spelling’ • Uterine Adhesions • UTERINE ADHESIONS etc. • Which to use when searching

Why use a ‘lookup’ • Store ‘uterine adhesions’ as 621.5 - stored as 8 bytes • less if using integers • or fixed length string codes ‘A6215’ - stored as 7 bytes

Why use a ‘lookup’ • Value labels - not practical • Size limit on label length • Record specific • Problems managing an increasing list • Ordering / Sorting • Large numbers of codes:text pairs Occupations 13872 Drugs 5357 Operations 2754 Diseases / Illnesses / Complications 4404

Dummy case • Only option in version 2 • Use a CASEID which was unique • In AMNDB used –999999 • This case only contained data for the ‘lookup’ record types • It worked but ….

Dummy case - problems • Specific to that data base • Not shared – unless • SIR FILE DUMP - from ‘active’ • READ INPUT DATA - to ‘research’ • Have to repeat for all data bases that need lookups

Dummy case - problems • Can only go ‘one way’ unless duplicate data with different Key Fields RECORD SCHEMA 99, ICD2NAMESORT IDS ICDCODE RECORD SCHEMA 98, NAME2ICDSORT IDS ICDNAME • Huge problems of maintenance

Dummy case - problems • MAX KEY SIZE • May have seen in LIST STATS • What is it ? • Need to have quick look at the ‘structure’ of a SiR record • Simplified not definitive view • If you want a complete definition – ask Tony !!

SiR record structure 1 CASEID Max Key Size Data 2 CASEID Max Key Size Data Actual Size CASEID Max Key Size Data 3 Actual Size • Essentially two components • Key - fixed length - same for ALL records • May not be ‘filled’ for all records • Data - variable length

Dummy case - problems • Without dummy case • Max Key Size - approx 20 bytes long • Text field – 50 characters long • Max Key Size - approx 60 bytes long • Extra 40 bytes • For EVERY record • 1.5 million * 40 bytes = approx 60 Mbytes • Trivial today but in 1988 ……………

Dummy Case – solutions (sic) • How to resolve size problems • Reduced length of text field – 30 chars • One way only • Only in one data base – data entry • Used ‘marvellous feature’ of sequential data base access • SIR3 file held on tape(s) • Only PROCESS CASES ALL • No CASE IS

Lookups use in AMNDB • Two main programmes / suites of programmes • Batch run to convert text -> code • Interactive programme for each record that requires lookups • Need both for different reasons

Lookups use in AMNDB • Batch run to convert text -> code • Does all record types in data base that has data for looking up • Only converts records where text code is already available • Marks successful lookups • Leave unsuccessful ones for interactive

Lookups use in AMNDB • One interactive programme per record with data for looking up • Functions in same way as batch except • If no match prompts for one of following - • Retype • Delete • Edit • Add new lookup code

SQL tabfiles • Stuck with Dummy case until • SQL tabfiles • SAVE TABLE • SQL indexes • One external file could hold lookups

SQL tabfiles • One external file – many advantages • One multi-user file for all lookups • Simplified maintenance - one file to backup • Multiple indexes per table ( cf. record ) • Max Key Size problem removed • BUT ……………

SQL tabfiles - problems • Can only see the ‘whole’ file picture • No equivalent of File dump / Add recs • Need SQL+ - clumsy • PQL programs - not intuitive • Could use Forms but not easy • Journalling was suspect • Ended up using EXPORT for backup • Exporting SQL tabfiles was idiosyncratic

SQL tabfiles - problems • If tabfile is ‘volatile’ – small frequent updates • Tabfile can get ‘corrupt’ • Verify ‘drops’ table instead of correcting • Updates appeared to work but later find records missing / corrupt • Just had to ‘live with it’ ……… and hope for better

Second data base • No worries about Max Key Size • Multi user • Reliable DBMS utilities and functionality • UNLOAD • Journalling • File Dump / Add Recs • Easily look at the data • But ……………………

Second data base • Considered but not used • Why ? • No alternative ‘views’ / indexes • Back to two copies of data RECORD SCHEMA 99, ICD2NAMESORT IDS ICDCODERECORD SCHEMA 98, NAME2ICDSORT IDS ICDNAME • No real advantages – ‘devil you know’

What do these have in common ? Still puzzled ??

Two data bases + Secondary Index • Then in SiR2002 • Two, or more, data bases • Secondary Indexes • LOOKUP command • Decided to ‘go for broke’ • Use all three ‘features’ to remove previous problems

Two data bases + Secondary Index • Two things spring to mind • Can of worms • Snail in a well

Two data bases + Secondary Index • The can of worms was trying to understand code written many years ago • How many of you have ‘revisited’ PQL you wrote 5 years ago ? • Can you remember what you were trying to do ?

Two data bases + Secondary Index • Do you add Comments to explain for future benefit C Only get records for Males over 60 • Use | for ‘inline’ comments . END PROCESS REC | PREGNANCY • It was ‘grey hair’ time – yet again !! • Not sure which is worse • teenage daughters • old SiR code

Two data bases + Secondary Index • By hard work and perseverance managed to ‘decode’ the old code • This time I added Comments as I worked it out !! • So that was the can of worms sorted

Two data bases + Secondary Index • Just left the snail in the well • How does this relate to SiR code • Climb 3 feet during day – slip back two feet at night • At almost every stage I encountered ASFs • ‘Another SiR Feature’

Two data bases + Secondary Index • Enormous thanks to Tony for help, aid, assistance and patience • Six new versions of SiR within one week • Gradually felt that I was ‘climbing’ higher and higher

Two data bases + Secondary Index • One day after yet another new version - programme ran OK • Message to Tony “I’m out of the well !! “ • Even so did some more testing

Two data bases + Secondary Index • Two hours later I sent a message using a phrase from this show “I’ve fallen in the water” • Yet another problem

Two data bases + Secondary Index • Yet again Tony came to the rescue • Apart from one bit of my code I need to sort the programmes are working OK • So how do we use • A second data base • Secondary Indexes • The LOOKUP command

Two data bases + Secondary Index • All lookups are held in a second, caseless data base • The key fields are the numeric codes to keep MAX KEY SIZE to minimum • Journalling is turned on

Two data bases + Secondary Index • The same code might refer to multiple text strings • Threatened Abortion • Thr Abr • TAbr • All mean same • Use AUTOKEY to cope • Secondary indexes on the text strings

Two data bases + Secondary Index • Experiences so far • Still testing but looks good • Faster and more reliable • Can look at the data easily • Correcting invalid data is easy • All the power and features of vPQL and DBMS utilities for maintenance • Only one system to learn / remember

Lookups – a summary • Dummy case • Rigid • Inflexible • Cumbersome • Can use DBMS etc. for maintenance • Now luckily replaced

Lookups – a summary • SQL tabfiles • Very flexible • Obtuse • No easy maintenance • SQL+ is cumbersome • Reliability / integrity of tabfiles • Better but for long term – flawed • Ad-hoc work – re-building every time

Lookups – a summary • Final solution with Second data base, Indexes & LOOKUP • Reliable • PQL is familiar • Fast • Combines good features of dummy case and tabfiles • Perhaps time will tell – but looks good

Lookups or “ the Good, the Bad and the Ugly ” John S. Lemon