
PepcDB Reporting at CESG: More Trials and Fewer Tribulations

PepcDB Reporting at CESG: More Trials and Fewer Tribulations. PPCW Bottlenecks Meeting 20 March 2007 Craig A. Bingman cbingman@biochem.wisc.edu (U54 GM074901-01 P50 GM064598 JLM, P.I.). CESG Bioinformatics. George N. Phillips Jr.: Faculty Executive Craig Bingman: Section leader


Presentation Transcript


  1. PepcDB Reporting at CESG: More Trials and Fewer Tribulations
     PPCW Bottlenecks Meeting, 20 March 2007
     Craig A. Bingman, cbingman@biochem.wisc.edu
     (U54 GM074901-01 P50 GM064598 JLM, P.I.)

  2. CESG Bioinformatics • George N. Phillips Jr.: Faculty Executive • Craig Bingman: Section leader • Xiaokang Pan: PepcDB, domains • Gary Wesenberg: Scoring, RT, PDB • Bryan Ramirez: System administrator • Tony Kamenick: Assistant sysadmin Sesame • John L. Markley: CESG P.I. • Zsolt Zolnai: Sesame Project Management • John Primm: Project Manager • David Aceti: QA, Sesame “Lab Master” All CESG Team Members

  3. TargetDB vs. PepcDB
     TargetDB
     • TargetDB was conceived early (pre-PSI-1) as a mechanism for avoiding duplication of effort between structural genomics centers.
     • Asynchronous communication between centers and NIH.
     • TargetDB communicates the project status of a target only.
     • TargetDB is single-threaded.
     • TargetDB was not meant to communicate information to the outside scientific community.
     PepcDB
     • PepcDB was conceived as a mechanism for communicating scientific details between centers and the outside world.
     • Asynchronous communication with the outside world.
     • PepcDB communicates target status, protocols and the timeline of efforts.
     • PepcDB is multi-threaded.
     • PepcDB is a contractual obligation for all PSI-2 centers.
     • Along with structures deposited in PDB, and the materials repository, PepcDB will be one of the enduring legacies of PSI.

  4. CESG PepcDB, Past and Present
     Year 2-3 data
     • Successful implementation of Sesame (hierarchical relationships between db items).
     • TargetDB-centric, single-threaded view.
     • Targets were constrained to exist in one workgroup from selection to structure solution.
     • Protocols were primitive.
     Year 4-5 data
     • Protocols became more descriptive.
     • Protocols described multiple pipeline stages.
     • Targets moved through multiple workgroups.
     • Pipeline was assumed to move unidirectionally from Selection -> Deposition.
     PSI-2 data
     • Atomic protocols describing a single pipeline stage.
     • Pipeline is multipass, multithreaded, characterized by extensive salvage.
     • Targets move back to vector selection, from initial selection, PCR and entry vector.
     • Pipeline is non-deterministic, adaptive, dynamic to maximize success.

  5. Failure of CESG PepcDB, Mark 1
     Codebase had grown by accretion, not design.
     Code assumed linear, forward progression through pipeline stages.
     More than half of the code was devoted to data entry error trapping/handling.
     Global reset was required to handle new pipeline practices, dominated by multipath cloning strategy, multipath expression strategy, salvage-intensive operation.
     New conceptualization of our PepcDB reporting was required.
     Core concept: Well-formed PepcDB = finite, directed, acyclic graph.
       Database items = nodes
       Directed links = edges
     Data in Sesame needed to be corrected.
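
The "well-formed = finite, directed, acyclic graph" idea on slide 5 can be sketched as a check over parent-child links. This is a minimal illustration in Python using Kahn's algorithm, not CESG's actual code (the talk does not show it). The item names and the `is_well_formed` helper are hypothetical.

```python
from collections import defaultdict, deque

def is_well_formed(edges):
    """Return True if the (parent, child) links form a directed acyclic graph."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    # Kahn's algorithm: repeatedly remove nodes that have no incoming links left.
    ready = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Nodes that sit on a cycle never reach indegree 0, so they are never removed.
    return removed == len(nodes)

# Hypothetical item names: one target cloned into two vectors (multipath),
# both feeding the same downstream expression item.
multipath = [
    ("target", "vector_1"),
    ("target", "vector_2"),
    ("vector_1", "expression"),
    ("vector_2", "expression"),
]
# A data entry error that links a downstream item back to its ancestor.
with_cycle = multipath + [("expression", "target")]

print(is_well_formed(multipath))   # True
print(is_well_formed(with_cycle))  # False
```

Multipath links are accepted because a node may have several parents; only a link that loops back to an ancestor makes the graph ill-formed.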

  6. Visualization Tool for Graphs
     dot, a language for describing graphs
     dot has a very simple syntax:
       digraph G { A -> B -> D; A -> C; }
     dot has powerful layout minimizers to display hierarchical graphs.
     Implementations are available for perl, python, java, others.
     CESG has used the perl variant of dot/Graphviz to produce plots of linkages between database items.
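
As a rough illustration of how linkage plots like those on slide 6 could be produced, the sketch below writes dot text from a list of parent-child links. It uses Python rather than the perl variant CESG reports using, and the `to_dot` helper is hypothetical. Node names are quoted so that identifiers containing spaces or punctuation remain valid dot.

```python
def to_dot(edges, name="G"):
    """Build dot source for a directed graph from (parent, child) pairs."""
    def quote(node):
        # Escape embedded double quotes so the quoted ID stays valid.
        return '"' + str(node).replace('"', '\\"') + '"'

    lines = [f"digraph {name} {{"]
    for parent, child in edges:
        lines.append(f"    {quote(parent)} -> {quote(child)};")
    lines.append("}")
    return "\n".join(lines)

# The same graph as the slide's example: A -> B -> D; A -> C;
example = [("A", "B"), ("B", "D"), ("A", "C")]
print(to_dot(example))
```

The resulting text can be saved to a file and rendered with the Graphviz command-line tool, for example `dot -Tpng links.dot -o links.png`.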

  7. digraph G { A -> B; A -> C -> D; }
     digraph G { A -> B -> D; A -> C -> D; }
     digraph G { A -> B -> D; D -> A; }

  8. CESG PepcDB Stats
     • Protocols: 68
     • Targets: 7553
     • Trials: 14044
     • Protocol Instances: 57195
     • Each target has on average two trials.
     • Each trial has on average about four protocols.
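
The two averages on slide 8 follow from the counts listed there. The short check below assumes that "protocols per trial" means protocol instances divided by trials, which is a reading of the slide rather than something it states.

```python
# Counts as reported on the slide.
targets = 7553
trials = 14044
protocol_instances = 57195

trials_per_target = trials / targets
instances_per_trial = protocol_instances / trials

print(f"trials per target: {trials_per_target:.2f}")               # 1.86
print(f"protocol instances per trial: {instances_per_trial:.2f}")  # 4.07
```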

  9. PSI PepcDB Toolkit
     • Project database capable of establishing hierarchical relationships between units of work.
     • Establish master database that manages unique keys for work units.
     • Implement barcodes (e.g. ZPL) that extend database to physical items.
     • Implement atomic protocols and associated actions.
     • Develop tool set for visualizing data.
     • Develop code capable of assembling lists of parent-child units of work, protocols, actions.
     • Rehearse data entry prior to pipeline implementation of new techniques.
     • Reach project-wide agreement on definition of actions and how to link units of work.
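
The toolkit item on slide 9 about assembling lists of parent-child units of work can be sketched as a traversal over link records. This is a minimal example under assumed data: the unit names, the `(parent, child)` tuple layout, and the `descendants` helper are hypothetical and do not reflect the Sesame schema.

```python
from collections import defaultdict

def descendants(links, root):
    """List every unit of work reachable from `root` via parent -> child links."""
    children = defaultdict(list)
    for parent, child in links:
        children[parent].append(child)

    found = []
    visited = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            # Skip units already seen, so shared children are listed once.
            if child not in visited:
                visited.add(child)
                found.append(child)
                stack.append(child)
    return found

# Hypothetical units of work: one target with two cloning trials.
links = [
    ("target-1", "clone-1"),
    ("target-1", "clone-2"),
    ("clone-1", "expr-1"),
    ("clone-2", "expr-2"),
    ("target-2", "clone-3"),
]
print(sorted(descendants(links, "target-1")))
# ['clone-1', 'clone-2', 'expr-1', 'expr-2']
```

The `visited` set also keeps the traversal finite if the input links accidentally contain a cycle.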

  10. Future
     • Push towards zero errors in PSI-2 PepcDB.
     • Continue correcting PSI-1 data.
     • Implement data visualization tools in Sesame.
     • Expand the scope of data reported to PepcDB.
     • Report all crystallization trials (year 5 to now).
     • Consolidate and report data for new tags (elemental analysis, mass, etc.).
     • Switch over to Sesame for PepcDB report generation.
