400 likes | 519 Views
Design data retrieval and manipulation for subset of ‘Gombe’ database using QBE Durga Gumaste Advisor: Dr. Shashi Shekhar June 10, 2003. Agenda. Objective Background and Motivation Related work and my contribution Porting of tables Query description Query optimization Summary Demo.
E N D
Design data retrieval and manipulation for subset of ‘Gombe’ database using QBE Durga Gumaste Advisor: Dr. Shashi Shekhar June 10, 2003
Agenda • Objective • Background and Motivation • Related work and my contribution • Porting of tables • Query description • Query optimization • Summary • Demo
Objective Design and implement queries to access and manipulated ‘Gombe’ chimpanzees data subset, such that queries can be modified by the user having no background of any Data Manipulation Language(DML)
Agenda • Objective • Background and Motivation • Related work and my contribution • Porting of tables • Query description • Query optimization • Summary • Demo
Background and motivation • Data about ‘Gombe’ chimpanzees • Collected since 1953 • Behavioral and location data • 15 tables • Average size: 10-12 MB • Dr. Jane Goodall has done active research of ‘Gombe’ chimpanzee for last 35 years • Jane Goodall Institute's Center for Primate Studies at the University of Minnesota • Data retrieval for analysis on the data • Frequent query modification • Ease of modification
Agenda • Objective • Background and Motivation • Related work and my contribution • Porting of tables • Query description • Query optimization • Summary • Demo
Related work • Earlier implementations • Oracle • Paradox • Limitations of earlier implementations • Ecologists not comfortable in modifying PL/SQL queries in Oracle • Paradox is not licensed at University of Minnesota
Present Implementation • Database: Microsoft Access 2000 • Ecologists familiar with MS Access environment • Desktop database • Microsoft office suite • University of Minnesota has a license for Microsoft Office • Provides QBE
My contribution • Port ‘Gombe’ database to MS Access • Implement new queries using MS Access • Helped behavioral ecologists to modify queries using MS Access QBE • Optimize queries
Agenda • Objective • Background and Motivation • Related work and my contribution • Porting of tables • Query description • Query optimization • Summary • Demo
Port tables to MS Access • Create tables in design view • Apply primary key constraints • Apply referential integrity constraints • Import tables using import utility
Verification of porting • Number of records present in .txt files • Follow: 8459 records • Count of records by count query
Agenda • Objective • Background and Motivation • Related work and my contribution • Porting of tables • Query description • Query optimization • Summary • Demo
Queries Nested, join, range Q1: Find all chimps arriving alone Q2: Include mothers arriving with off springs in Q1 Q3: Include siblings in Q1 Q4: include mothers and siblings in Q1 Q5: Find chimps arriving together with other chimp Single table, aggregate, point Q6: Find food count of food items in a particular month of a year (Find % food counts) Q7: Find duration for which food items are eaten in a particular month of a year(Find % food duration)
follow_arrival (A) follow_arrival (B) 1. Inner join (self join) on follow_arrival 2. Select chimps having fa_time_start difference more than 5 minutes for a particular follow on a particular date Inner join on A and B (self join) A.date=B.date A.follow=B.follow A.chimp<>B.chimp 3. Take location coordinates for such chimps from follow_map_position table by joining follow_arrival table with follow_map_position table Inner join with F A.date=F.date A.follow=F.follow A.chimp<>F.focal A.seq = F.seq Result Set Follow_map_position (F) Implementation (Q1) Chimps are said to be alone when arrival time between 2 chimps is more than 5 minutes Arrival time difference between A.chimp and B.chimp > 5 minutes
MS Access Implementation for Q1 • Few Inner joins conditions cannot be displayed in QBE • MS Accesses uses Dyna sets • Sub-query over base query • Base query in SQL (views) • Sub-query in QBE • Sub-query easy to change by ecologists
Q1 extension Chimps arriving alone • Mothers arriving with off springs are counted as arriving alone (Q2) • Chimps which arrive with their siblings are counted as arriving alone(Q3) • Both Q2 and Q3 (Q4)
chimp_id AL AO AP AR AT Sum of certainties Follow_arrival Certainty Value 1 1 0 0 not observed blank Follow_map_time Time interval adjustment 10:03 10:00 10:11 10:15 Follow_Arrival Derived table • Written in VB Group_composition_table AL AO AP AR 1 1 0 1 3 AL740101 1/1/1974 10:00 AM 2
Agenda • Objective • Background and Motivation • Related work and my contribution • Porting of tables • Query description • Query optimization • Summary • Demo
Query optimization in MS Access • Cost bases query optimization • MS Jet 4.0 • Table statistics • Rushmore optimization • Efficient use of indexes • Index intersection,union,minus
Compacting database Compact database using Compact utility provided by MS Access • De-fragmentation • Reordering database pages • Reclaim unused space • Original size: 1.1 GB • After compaction: 284 MB • Flags queries needing recompilation
Summary • Porting of data to MS Access • Query modification using QBE • Ease of writing and modifying queries • GUI • Queries over views • Base queries in SQL and sub-queries in QBE • Access uses dyna-sets • Derived tables created in VB • Multiple queries • Onetime queries • Indexes on join attributes improve the performance by 90-95%
Future work • Updating group composition table if new chimp gets added to chimp table
Demo • Query Description • Find out chimpanzees arriving alone (include relations) • Base query (view) in SQL • Sub-query in QBE • Modifications in QBE • Results of modifications
Acknowledgements • I would like to thank Prof. Shekhar for giving me this wonderful opportunity to work on this project and his precious guidance from time to time • I would also like to thank Prof. Pusey, Carson Murray, and Ian Gilby of Ecology Department for their help in understanding Gombe database. • I would like to thank Prof. Pusey and Prof. Srivastava for their time today