wwPDB Common D&A Project January 28, 2010

Presentation Transcript


  1. wwPDB Common D&A Project, January 28, 2010: Steering Committee Project Update

  2. Status of D&A initial production deliverable:
     • Sequence Editor tool development
     • Integration within existing pipelines
     Status of WF infrastructure initial implementation:
     • Sequence Processing components (external search, internal analysis, etc.) integrated by WF engine and manager into the “new” Sequence Processing Module.
     • Integration of Sequence Processing Module into existing pipeline.
     RECONSIDER Timeline Estimate and Strategy
     Next Phase Ligand Processing: Planning Update report

  3. Overview of deliverable status for: Sequence Editor tool
     Deliverable timelines have been extended to enable a full response to user testing input (expanded requirements) and to ensure development to the agreed-upon design.
     • Completion of Interface with additional prioritized requirements: projected Feb 15
     • Integration within current production pipelines
     • Initial implementation of Master Format and format conversion support
     • In use by annotators by Feb 25

  4. Sequence Editor Tool Technologies and Standards
     • Model View Controller (MVC) design
       • Separates data/application from presentation as much as possible
     • Client/Server protocol
       • AJAX using JSON protocol
       • REST style service definitions
     • Server
       • Apache with embedded WSGI (mod_wsgi)
     • Application
       • Python with C++ extensions (Boost.Python)
     All the good acronyms!
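The stack on this slide (a Python application behind mod_wsgi, answering AJAX requests with JSON through REST style URLs) can be illustrated with a minimal sketch. This is not the project's code: the `/sequences/<id>` route, the `SEQUENCES` store, the sample identifier and the `call` helper are all assumptions made for illustration. The only convention taken from mod_wsgi itself is that it looks for a callable named `application` by default.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory stand-in for the sequence data store.
SEQUENCES = {"1ABC_A": "MKTAYIAKQR"}


def application(environ, start_response):
    """WSGI entry point; serves GET /sequences/<id> as JSON."""
    method = environ.get("REQUEST_METHOD", "GET")
    parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]

    if method == "GET" and len(parts) == 2 and parts[0] == "sequences":
        sequence = SEQUENCES.get(parts[1])
        if sequence is None:
            status, payload = "404 Not Found", {"error": "unknown sequence id"}
        else:
            status, payload = "200 OK", {"id": parts[1], "sequence": sequence}
    else:
        status, payload = "400 Bad Request", {"error": "unsupported request"}

    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(path, method="GET"):
    """Invoke the app directly, without a web server, for a quick check."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    environ["REQUEST_METHOD"] = method
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], json.loads(body)
```

Keeping the request handling in one callable and the data in a separate store mirrors the MVC separation named on the slide: the browser-side editor would only ever see the JSON payload.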

  5. Sequence Editor Tool Architecture for Current and Future Deployment
     [Diagram labels: Current DP Pipeline; PDB/FASTA; PDBx/PreBlast; PDB/PDBx; Sequence Editor Tool; Sequence Editor; Sequence Data Store; Annotated Sequence Data; WFE/WFM; Future Workflow DP Pipeline; WFE/WFM]

  6. Accomplishments: Annotator graphical interface for Sequence Editing
     • Prototype evaluation and prioritization of additional requirements by Annotators at all sites completed Jan 12
     • Expanded functionality development expected to be completed and available for user testing Feb. 15, including:
       • Capability to incrementally undo a process step (UNDO)
       • Summarization of sequence conflicts
       • Global editing features
     • Integration of this Sequence Editor tool (interface) into the existing data processing pipelines (Feb 26)
       • Input accepts existing sequence data files at PDBe and RCSB (e.g. PDBx + Blast report or PDB + FASTA)
       • Output to be integrated via an intermediate file through MAXIT
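The incremental UNDO capability listed on this slide is commonly built as a history stack that records how to reverse each edit. The sketch below shows that general technique only; the class name `SequenceEditSession`, its methods and the single-residue edit are hypothetical and not taken from the Sequence Editor tool.

```python
class SequenceEditSession:
    """Hypothetical edit session holding a sequence and an undo history."""

    def __init__(self, sequence):
        self.sequence = sequence
        self._history = []  # stack of (index, previous residue)

    def replace_residue(self, index, new_residue):
        """Replace one residue and remember what was there before."""
        previous = self.sequence[index]  # raises IndexError if out of range
        self._history.append((index, previous))
        self.sequence = (self.sequence[:index] + new_residue
                         + self.sequence[index + 1:])

    def undo(self):
        """Reverse the most recent edit; return False if nothing is left."""
        if not self._history:
            return False
        index, previous = self._history.pop()
        self.sequence = (self.sequence[:index] + previous
                         + self.sequence[index + 1:])
        return True


session = SequenceEditSession("MKTA")
session.replace_residue(1, "R")
session.replace_residue(3, "G")
edited = session.sequence
session.undo()
after_one_undo = session.sequence
session.undo()
after_two_undos = session.sequence
```

Each call to `undo` steps back exactly one edit, which is what "incrementally" implies; a global edit would need to push one history entry per changed position, or one grouped entry, to stay reversible.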

  7. Accomplishments: Master Format implementation (for current data model)
     • PDB to Master Format translation working with MAXIT
     • Final test at PDBe
     • Validation and testing at all sites
     • PDBj creation of new tool for Master Format validation with extended diagnostics
     • Issues with Master Format will be ongoing, with evolution of the PDB format, hybrid methods, etc.

  8. Sequence Editor Tool Development: Lessons Learned
     • Iterative development and active Annotator involvement are essential, and take time.
     • Addressing integration issues with existing systems in terms of modularity, process ordering and data availability poses significant challenges.
     • An agile process of development and planning supports adaptation to evolving requirements.
     • In future planning we will need to consider further the most efficient level of granularity for deploying new functionality in existing systems.

  9. Design Convergence Accomplishments: Master Format, API, WFM, WFE, UI
     Distributed development on a complex project is challenging
     Tag team development of WFE and APIs
     • Straw men articulation: flesh out WFE/API requirements for representative Use Cases
     • WFE pseudo code developed against straw men.
     • API integration layer will be developed against this pseudo code.
     • WFE will then be implemented against the API

  10. Accomplishments: WF infrastructure - Integration of Sequence Processing
      • Tracking and Status DB developed and installed at RCSB and PDBe for development purposes.
      • Work Flow Manager (WFM)
        • Prototype user testing ongoing
        • Requirements refined and prototype updated
        • Infrastructure complete; to be deployed for testing this week
      • Work Flow Manager User Interface (WFM UI)
        • User prototype created, input received and prototype enhanced
        • Initial Level 1 annotator interface signed off by annotators
        • Level 2/3/4 interfaces prototyped and under review
        • Level 3/4 under further development

  11. PDBe resource
      • Workflow XML
        • Luana/Tom: 1 day total to complete annotator requirements
      • WFE component supporting Sequence Processing
        • Tom: 1-2 days per week ongoing, estimating 5-6 days (3 actual weeks) to complete after all APIs are in place
      • WFM
        • Luana: currently full time; work is being prioritised to define the subset of requirements to be delivered in March.
      • Web resources: interfaces and WFM
      • External services: technology requirements have been defined. Timeline TBD. Critical path.
      • Other resources
        • Wim: Python expertise
        • Swanand: Python expertise (after 13th Feb), fall-back

  12. RCSB Resources
      • Web Tools
        • Currently supporting development and alpha-testing sites
        • Will add production site for Feb deployment
      • Database Support
        • MySQL database server for status and tracking database
      • Application Support
        • Project SVN code repository
        • JIRA issue tracking system
        • Project documentation and information site (Drupal)
        • Automated build system for API and application tools
      • People
        • Vladimir: API and build system (Python/C++)
        • Li: DB system and status and tracking API (Python/SQL)
        • Rahip: Sequence Editor Tool (JavaScript/CSS)
        • Zukang/Raul/John: DP applications (C++/Python)

  13. Updated Timeline Summary: Sequence Processing
      1. Sequence Editor Tool
         • Completion of Interface with prioritized additional requirements and beginning of final user testing: projected Feb 15
         • Integration with current pipelines using Master Format. In test by annotators by Feb 25
         • In production: best estimate early March
      2. Integration of Sequence Processing components with new architecture (WFE/API and WFM)
         • User testing: April
      3. Integration of module into Pipeline
         • Plan by end of March

  14. Competing/Complementary Priorities
      • Address ongoing data quality issues and remediation
      • Three Validation task forces
      • Implementation of recommendations
      • New PDB Format: within the next 6 months?
      • De-programming Kim
      • For Ligand Processing: timeline end of March to early April
      Other strategic considerations
      • Stakeholders
      • Stress testing of new solutions against expectations and existing solutions must be managed and will take some time.

  15. Next Phase - Timeline: Ligand Processing
      • Requirements
      • Plans in place for Annotator exchange
      • March: requirements consolidation, initial design plan
      • March: create overview plan and initial timeline
      • Kick off development
      • Deployment
      • Strategy to be defined based on current and ongoing lessons learned.

  16. Things that have kept us up at night
      • These are cornerstone deliverables requiring intense study and design consideration beyond the proof of concept: organization of data, communication protocols, etc.
      • Clear consensus on design features has required an evolution of understanding, which in turn required getting our hands wet
      • Ramp up of skill sets: Python, mmCIF (PDBe)
      • EBI External services: web-service set up
      • Site specific integration challenges
      • Resource issues

  17. BACK UP SLIDES

  18. Data and Application API Design
      • Unified Python language implementation
      • Provides all access to data and applications for the workflow manager and workflow engine
      • Subcomponents of the API provide access to:
        • Data objects and data values
        • Applications and tools
        • Tracking and status information
        • Site level configuration information
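One way to picture the four subcomponents on this slide is a single Python facade that the workflow manager and workflow engine would both go through. The sketch below is an assumption about shape only: every class, method and identifier in it (`WorkflowAPI`, `DataAccess`, `D_0001`, `EXAMPLE_SITE` and so on) is hypothetical and does not come from the project's API.

```python
from dataclasses import dataclass, field


@dataclass
class DataAccess:
    """Data objects and data values, keyed by (entry id, item name)."""
    _values: dict = field(default_factory=dict)

    def set_value(self, entry_id, name, value):
        self._values[(entry_id, name)] = value

    def get_value(self, entry_id, name):
        return self._values[(entry_id, name)]


@dataclass
class ToolRegistry:
    """Applications and tools, registered here as plain callables."""
    _tools: dict = field(default_factory=dict)

    def register(self, name, func):
        self._tools[name] = func

    def run(self, name, *args):
        return self._tools[name](*args)


@dataclass
class StatusTracker:
    """Tracking and status information per entry."""
    _history: dict = field(default_factory=dict)

    def record(self, entry_id, status):
        self._history.setdefault(entry_id, []).append(status)

    def current(self, entry_id):
        return self._history[entry_id][-1]


@dataclass
class SiteConfig:
    """Site level configuration information."""
    site: str
    settings: dict = field(default_factory=dict)


class WorkflowAPI:
    """Single entry point shared by the workflow manager and engine."""

    def __init__(self, site):
        self.data = DataAccess()
        self.tools = ToolRegistry()
        self.status = StatusTracker()
        self.config = SiteConfig(site)


api = WorkflowAPI(site="EXAMPLE_SITE")
api.tools.register("sequence_length", len)
api.data.set_value("D_0001", "sequence", "MKTAYIAKQR")
api.status.record("D_0001", "sequence_loaded")
length = api.tools.run("sequence_length",
                       api.data.get_value("D_0001", "sequence"))
```

Routing all access through one object is what lets site specific details stay in the configuration component rather than leaking into workflow code.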

  19. Deliverable update: WFM Design
      Functional Architectural design
      • Will present progress and tracking information
      • Will start, stop and restart the workflow engine in executing data processing tasks
      • Will work in a fully distributed web-based mode
      • Will provide a launch point for tasks requiring interactive or graphical interactions
      • Two modes defined:
        • Immediate mode: all processing occurs in a single session (simple case).
        • Deferred mode: requests for input are registered with the workflow manager for later processing by annotator
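The difference between the two modes can be sketched in a few lines. This is an illustration under stated assumptions, not the WFM design itself: the `WorkflowManager` class, its method names and the use of a callable `responder` to stand in for an annotator are all hypothetical.

```python
from collections import deque


class WorkflowManager:
    """Hypothetical manager that routes requests for annotator input."""

    def __init__(self):
        self.deferred = deque()  # requests waiting for an annotator

    def request_input(self, task_id, prompt, responder=None):
        """Answer now if a responder is present, else register for later."""
        if responder is not None:
            # Immediate mode: the annotator answers within this session.
            return responder(prompt)
        # Deferred mode: register the request with the manager.
        self.deferred.append((task_id, prompt))
        return None

    def process_deferred(self, responder):
        """An annotator later works through registered requests in order."""
        answers = {}
        while self.deferred:
            task_id, prompt = self.deferred.popleft()
            answers[task_id] = responder(prompt)
        return answers


wfm = WorkflowManager()
immediate_answer = wfm.request_input("t1", "Accept alignment?",
                                     responder=lambda prompt: "yes")
deferred_answer = wfm.request_input("t2", "Resolve conflict?")
queued = len(wfm.deferred)
later_answers = wfm.process_deferred(lambda prompt: "done")
```

In deferred mode the workflow engine would have to pause the affected task until the answer arrives, which is why the slide pairs this with the ability to stop and restart the engine.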

  20. Process Overview With GO BACK functionality