Requirements of a Taxonomy Database: Tcl-DB, a Prototype
Outline
• Requirements
• Hierarchy
• Alternative Search Terms: Synonyms and Vernaculars
• Alternative Spellings
• Alternative Classifications
• Tcl-DB Prototype System
• Tcl-DB Structure
• 2NF
• Extensible: Adding a new data source e.g. NCBI
• Tcl-DB: UID Tracking
• Tcl-DB: Stats
• Utility and Further Work
3. Alternative Spellings: Caenorabditis elegans, C elegans and Caenorhabditis elegans
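A minimal sketch of how the three spellings on this slide could be reconciled. It is not the Tcl-DB method: `edit_distance` is a standard Levenshtein distance, and `is_abbreviation` is a hypothetical helper that treats a one-letter genus as an abbreviation of a binomial.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_abbreviation(short: str, full: str) -> bool:
    """True when `short` is a binomial with a one-letter genus, e.g. 'C elegans'."""
    s = short.replace(".", "").split()
    f = full.split()
    return (len(s) == 2 and len(f) == 2 and len(s[0]) == 1
            and f[0].lower().startswith(s[0].lower())
            and s[1].lower() == f[1].lower())


canonical = "Caenorhabditis elegans"
misspelt_distance = edit_distance("Caenorabditis elegans", canonical)
abbreviated = is_abbreviation("C elegans", canonical)
```

A small distance threshold would catch the dropped letter, while the abbreviation needs its own rule because its edit distance to the full name is large.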
Assertion: Resolving the M:M with an association entity
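The slide does not say which two entities the many-to-many relationship connects. The sketch below assumes it is between names and data sources; the `source` table, all column names other than `name_id` and `name_text`, and the rows are illustrative toy data, and SQLite stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE name (
    name_id   INTEGER PRIMARY KEY,
    name_text TEXT NOT NULL UNIQUE
);
CREATE TABLE source (                 -- hypothetical table
    source_id   INTEGER PRIMARY KEY,
    source_name TEXT NOT NULL UNIQUE
);
-- Association entity: one row per (name, source) pair, so a name can be
-- asserted by many sources and a source can assert many names.
CREATE TABLE assertion (
    name_id   INTEGER NOT NULL REFERENCES name(name_id),
    source_id INTEGER NOT NULL REFERENCES source(source_id),
    PRIMARY KEY (name_id, source_id)
);
""")
# Toy rows only; they do not describe real ITIS or NCBI content.
conn.executemany("INSERT INTO name VALUES (?, ?)",
                 [(1, "Name one"), (2, "Name two")])
conn.executemany("INSERT INTO source VALUES (?, ?)",
                 [(1, "ITIS"), (2, "NCBI")])
conn.executemany("INSERT INTO assertion VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])


def sources_for(name_text):
    """Return the sources asserting a name, in alphabetical order."""
    rows = conn.execute("""
        SELECT s.source_name
        FROM assertion a
        JOIN name n   ON n.name_id = a.name_id
        JOIN source s ON s.source_id = a.source_id
        WHERE n.name_text = ?
        ORDER BY s.source_name
    """, (name_text,)).fetchall()
    return [r[0] for r in rows]
```

The composite primary key prevents the same name/source pair from being asserted twice.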
Node: Hierarchical Queries Nested Set, Path and Connect by
Connect by:
> select count(name_id) from node start with name_id = '100891' connect by prior name_id = parent_name_id;
Path:
> select count(name_id) from node where path like '/%';
Nested set:
> select count(name_id) from node where left_id between 1 and 9290;
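The three query styles can be compared on a toy tree. This is a sketch under assumptions: SQLite has no `connect by`, so a recursive common table expression is swapped in for it, and the `right_id` column, the path format and the four-node tree are illustrative rather than taken from Tcl-DB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE node (
    name_id        INTEGER PRIMARY KEY,
    parent_name_id INTEGER,
    path           TEXT,      -- materialised path from the root
    left_id        INTEGER,   -- nested-set bounds
    right_id       INTEGER
)""")
# Toy tree: 1 is the root, 2 and 4 are its children, 3 is a child of 2.
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", [
    (1, None, "/1/",     1, 8),
    (2, 1,    "/1/2/",   2, 5),
    (3, 2,    "/1/2/3/", 3, 4),
    (4, 1,    "/1/4/",   6, 7),
])


def count_by_recursion(root_id):
    """Walk parent links with a recursive CTE (stand-in for connect by)."""
    return conn.execute("""
        WITH RECURSIVE subtree(name_id) AS (
            SELECT name_id FROM node WHERE name_id = ?
            UNION ALL
            SELECT n.name_id FROM node n
            JOIN subtree s ON n.parent_name_id = s.name_id
        )
        SELECT count(name_id) FROM subtree
    """, (root_id,)).fetchone()[0]


def count_by_path(root_id):
    """Count nodes whose path starts with the root's path."""
    (root_path,) = conn.execute(
        "SELECT path FROM node WHERE name_id = ?", (root_id,)).fetchone()
    return conn.execute(
        "SELECT count(name_id) FROM node WHERE path LIKE ? || '%'",
        (root_path,)).fetchone()[0]


def count_by_nested_set(root_id):
    """Count nodes whose left_id falls inside the root's interval."""
    left, right = conn.execute(
        "SELECT left_id, right_id FROM node WHERE name_id = ?",
        (root_id,)).fetchone()
    return conn.execute(
        "SELECT count(name_id) FROM node WHERE left_id BETWEEN ? AND ?",
        (left, right)).fetchone()[0]
```

All three functions count the root itself plus its descendants, matching what the slide's queries count.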
synonym_name and vernacular: subtypes, multi-valued attributes or weak entities
Tcl-DB: Procedures, Packages and Functions: Adding a new data source e.g. NCBI
Step 4: fill synonym_name table in tcl schema
Step 5: fill vernacular table in tcl schema
Tcl-DB: UID Tracking
• After the name data load:
• Run two joins on name and nids_mv
• NIDs – name_id where the name_text already exists
• Null – name_id where the name_text does not exist
• Update name and give all new names a NID
• Update name and give all existing names their original NID
• Refresh the NID_view
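One possible reading of these steps, sketched with SQLite. The assumptions are that `nids_mv` maps each `name_text` to its NID from before the load, that `name` carries a `nid` column, and that new NIDs are issued as the next integers; none of these details is confirmed by the slides, and a plain table stands in for the materialised view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE name    (name_id INTEGER PRIMARY KEY, name_text TEXT, nid INTEGER);
CREATE TABLE nids_mv (name_text TEXT PRIMARY KEY, nid INTEGER);
""")
# Toy state: two names were known before the load, one is new.
conn.executemany("INSERT INTO nids_mv VALUES (?, ?)",
                 [("Homo sapiens", 10), ("Mus musculus", 11)])
conn.executemany("INSERT INTO name VALUES (?, ?, NULL)",
                 [(1, "Mus musculus"), (2, "Homo sapiens"), (3, "Danio rerio")])


def assign_nids(conn):
    # Names whose name_text already exists get their original NID back;
    # names with no match are left NULL by the correlated subquery.
    conn.execute("""
        UPDATE name
        SET nid = (SELECT m.nid FROM nids_mv m
                   WHERE m.name_text = name.name_text)
    """)
    # New names (still NULL) get the next unused NIDs.
    (max_nid,) = conn.execute(
        "SELECT COALESCE(MAX(nid), 0) FROM nids_mv").fetchone()
    new_rows = conn.execute(
        "SELECT name_id FROM name WHERE nid IS NULL ORDER BY name_id").fetchall()
    for offset, (name_id,) in enumerate(new_rows, start=1):
        conn.execute("UPDATE name SET nid = ? WHERE name_id = ?",
                     (max_nid + offset, name_id))
    # Emulate the view refresh by rebuilding the mapping from name.
    conn.execute("DELETE FROM nids_mv")
    conn.execute("INSERT INTO nids_mv SELECT name_text, nid FROM name")
    conn.commit()


assign_nids(conn)
nids = dict(conn.execute("SELECT name_text, nid FROM name"))
```

Matching on `name_text` is what lets a NID survive a reload even when `name_id` values are regenerated.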
Tcl-DB: Utility and Further Work
• Computing Interesting Stats:
• How much overlap between ITIS and NCBI?
• How many names unique to NCBI?
• How many of these are binomials vs. ‘environmental sample 256’?
• How many of these names can be matched allowing for 1 – 3 letter mismatches?
• NCBI taxonomy – data quality, integrity and usability?
• Transitively closing the Synonyms Table and Vernacular Table
• Building an interface
• Spell checkers
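Transitively closing the synonyms table could look roughly like the sketch below. It assumes synonymy is treated as a symmetric relation, which is a simplification (a synonym usually points to an accepted name), and it uses a union-find structure; `synonym_groups` and `closed_pairs` are hypothetical names, not part of Tcl-DB.

```python
from collections import defaultdict


def synonym_groups(pairs):
    """Group names connected by any chain of synonym pairs (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return [sorted(g) for g in groups.values()]


def closed_pairs(pairs):
    """Every unordered pair of names that ends up in the same group."""
    out = set()
    for group in synonym_groups(pairs):
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                out.add((a, b))
    return out


# Placeholder names: A~B and B~C imply A~C after closure.
closure = closed_pairs([("A", "B"), ("B", "C"), ("D", "E")])
```

Storing group membership instead of every closed pair would keep the table linear in the number of names.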
Lots of Questions?
How do we use this to build taxonomically aware databases?
How about updates to the data?
Database links, Web services, Simple DB Cross References?
Use GenBank Model?
Open to Suggestions/Ideas!
Do we need to think about:
PhyloCode?
Type Specimens?