Digital Archives for Molecular Microscopy

A community database for biological research. Christoph Best, European Bioinformatics Institute, Cambridge, UK; Matthew T. Dougherty, NCMI, Baylor College of Medicine, Houston, Texas.

Presentation Transcript


  1. Digital Archives for Molecular Microscopy: a community database for biological research. Christoph Best, European Bioinformatics Institute, Cambridge, UK; Matthew T. Dougherty, NCMI, Baylor College of Medicine, Houston, Texas

  2. Bioimage Informatics: informatics in support of biological imaging. Why? Image data are increasing rapidly: (confocal) fluorescence microscopy (cell biology); EMDB: electron microscopy (structural biology); high-throughput methods (genome biology). Goal: enabling science by making data accessible, reliable, and understandable. Examples: S. Haertel, U. Chile; EMDB, EBI; J. Swedlow, U. Dundee, Open Microscopy Environment. Areas: quality assessment; standards and conventions; public databases.

  3. Structural Databases at EBI. Protein Data Bank (PDB): atomic structures (positions of atoms); PDB file format, mmCIF; derived mainly from X-ray crystallography; long tradition, curated database; huge: 65,000+ entries, 3 wwPDB sites. Electron Microscopy Data Bank (EMDB): part of PDB at EBI and Rutgers; 600 density maps of macromolecular structures and subcellular complexes; started 2002; curated, but with limited metadata and experimental information; XML-based.

  4. SCIENTIFIC BACKGROUND

  5. Electron microscope (from Schweikert, 2004, Biocenter, U. Helsinki)

  6. Single-particle method • molecular structure • many images combined computationally • 3D from 2D • resolution increase by averaging. Example: tripeptidyl-peptidase II (TPP II), courtesy of B. Rockel, Martinsried
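The benefit of averaging can be illustrated numerically. If N images of identical, perfectly aligned particles each carry independent noise of standard deviation σ, the noise in their average drops to σ/√N; this gain in signal-to-noise ratio is what makes higher resolution attainable. The sketch below is a simplification under stated assumptions: 1D "images", additive Gaussian noise, and perfect alignment (real single-particle processing must also estimate each particle's orientation). The function names and parameter values are hypothetical.

```python
import random
import statistics

def noisy_copies(signal, n_copies, noise_sigma, seed=0):
    """Return n_copies of `signal`, each with independent Gaussian noise added."""
    rng = random.Random(seed)
    return [[s + rng.gauss(0.0, noise_sigma) for s in signal] for _ in range(n_copies)]

def average(images):
    """Pixel-wise mean of equally sized, already aligned 1D 'images'."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]

def residual_noise(estimate, signal):
    """Standard deviation of the difference between an estimate and the true signal."""
    return statistics.pstdev([e - s for e, s in zip(estimate, signal)])

# Hypothetical 'particle' profile; assumption: alignment is already perfect.
signal = [float(i % 10) for i in range(2000)]
sigma = 5.0
for n in (1, 16, 256):
    avg = average(noisy_copies(signal, n, sigma, seed=n))
    # measured residual noise vs. the sigma / sqrt(N) prediction
    print(n, round(residual_noise(avg, signal), 3), round(sigma / n ** 0.5, 3))
```

Each printed line shows the measured residual noise next to the σ/√N prediction; the two should agree closely.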

  7. Single-particle analysis: GroEL to 4 Å resolution (Ludtke et al., Structure, 2008)

  8. Data Management Issues. Initial EM images: O(1000) images of 4k × 4k → O(10 GPixel). Particle stacks: O(100,000) particles of 256 × 256 → O(10 GPixel). Final data set: about 1 MVoxel, i.e., small. Processing power: O(100) cores for some weeks, on lab-owned clusters. Software: 1970s FORTRAN codes and 1990s C codes; fragmented communities, lack of standards.
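The orders of magnitude on this slide can be checked with a few lines of arithmetic. The sketch assumes that "4k" means 4096 pixels and that each pixel is stored as a 4-byte float; neither assumption comes from the slide, and the 100³ final map is only an illustrative size of about 1 MVoxel.

```python
# Rough size estimates for the single-particle data volumes on the slide.
# Assumption (not from the slide): 4 bytes per pixel (32-bit float).
BYTES_PER_PIXEL = 4

def pixels(count, width, height, depth=1):
    """Total number of pixels/voxels in `count` arrays of width x height x depth."""
    return count * width * height * depth

datasets = {
    "initial EM images": pixels(1_000, 4096, 4096),
    "particle stack": pixels(100_000, 256, 256),
    "final map": pixels(1, 100, 100, 100),  # ~1 MVoxel; 100^3 is illustrative
}

for name, n in datasets.items():
    print(f"{name:18s} {n / 1e9:8.3f} GPixel  {n * BYTES_PER_PIXEL / 1e9:8.3f} GB")
```

Both raw data sets land within an order of magnitude of 10 GPixel (about 17 and 7 GPixel), while the final map is four orders of magnitude smaller.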

  9. Electron tomography: 3D reconstruction from a series of images taken at different tilt angles. Difficulty: nanometer accuracy is required. Problems: limited tilt range ↔ missing wedge ⇒ distortion; imperfections of the tilt ↔ alignment ⇒ limited resolution; computational reconstruction algorithms.
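The size of the missing wedge follows from simple geometry. By the projection-slice theorem, each projection at tilt angle α contributes one central plane of the object's 3D Fourier transform; a single-axis tilt series over ±θ_max therefore covers 2θ_max of the 180° of plane orientations around the tilt axis, leaving a double wedge that occupies a fraction (90° − θ_max)/90° of a spherical region of Fourier space. The sketch assumes ideal, continuous sampling of the tilt range; the function name is hypothetical.

```python
def missing_wedge_fraction(max_tilt_deg):
    """Fraction of Fourier space left unsampled by a single-axis tilt series
    covering -max_tilt_deg..+max_tilt_deg (ideal geometry, no other gaps)."""
    if not 0 <= max_tilt_deg <= 90:
        raise ValueError("max tilt must be between 0 and 90 degrees")
    # Sampled orientations span 2*max_tilt out of 180 degrees.
    return (90.0 - max_tilt_deg) / 90.0

for tilt in (60, 70, 90):
    print(f"+/-{tilt} deg: {missing_wedge_fraction(tilt):.1%} missing")
```

A tilt range of ±60° leaves one third of Fourier space unmeasured, which is why tomograms appear elongated along the beam direction.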

  10. Tomography of eukaryotic cells: Dictyostelium discoideum, shown as projection and slice (O. Medalia et al., Science, 2002)

  11. Image enhancement, before: cytoskeleton of Spiroplasma melliferum (J. Kürner et al., Science, 2005)

  12. Image enhancement, after (J. Kürner et al., Science, 2005). Yellow: geodesic line

  13. Automated image analysis: automatic segmentation to identify points, lines, and surfaces; automatic vs. manual result (A. Linaroudis, Ph.D. thesis, 2006)
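As a generic illustration of what automatic segmentation involves, the sketch below applies a global intensity threshold and then labels 4-connected regions by breadth-first flood fill. This is a textbook technique chosen for brevity, not the method of the cited thesis; real tomograms need denoising and more sophisticated detectors for lines and surfaces. The image, threshold, and function name are hypothetical.

```python
from collections import deque

def segment(image, threshold):
    """Label 4-connected regions of pixels whose value exceeds `threshold`.

    Returns (labels, n_regions); labels[y][x] is 0 for background and
    1..n_regions for foreground components.
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    n_regions = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] > threshold and labels[y][x] == 0:
                n_regions += 1
                labels[y][x] = n_regions
                queue = deque([(y, x)])
                while queue:  # breadth-first flood fill of one component
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] > threshold and labels[ny][nx] == 0):
                            labels[ny][nx] = n_regions
                            queue.append((ny, nx))
    return labels, n_regions

# Hypothetical 5x6 image: one bright 'line' and one isolated bright 'point'.
image = [
    [0, 0, 0, 0, 0, 0],
    [9, 9, 9, 9, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 8],
    [0, 0, 0, 0, 0, 0],
]
labels, n = segment(image, threshold=5)
print(n)  # 2
```

The bright row forms one region (a "line") and the isolated pixel a second one (a "point"); region size or shape could then be used to classify them.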

  14. Data Management Issues. Original data: 60 images of 8k × 8k → O(4 GPixel). Reconstruction: 8k × 8k × 256 → O(16 GVoxel)? Software: 1970s algorithm in 1990s software. Visualization: “let's buy more memory”. Future: web-based applications (Google Maps)?
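The "Google Maps" idea is to cut a large image into fixed-size tiles at several zoom levels, so a web browser only fetches the tiles currently in view. The sketch below counts the tiles such a pyramid needs for one 8k × 8k image; the 256-pixel tile size and the halving of resolution between levels are assumptions, and the function is hypothetical rather than part of any EMDB system.

```python
import math

def tile_pyramid(width, height, tile=256):
    """Tiles per zoom level for a Google-Maps-style image pyramid.

    Level 0 is full resolution; each further level halves both dimensions
    (rounding up) until the whole image fits into a single tile.
    """
    levels = []
    while True:
        tiles_x, tiles_y = math.ceil(width / tile), math.ceil(height / tile)
        levels.append(tiles_x * tiles_y)
        if tiles_x == 1 and tiles_y == 1:
            break
        width, height = math.ceil(width / 2), math.ceil(height / 2)
    return levels

levels = tile_pyramid(8192, 8192)
print(levels)       # [1024, 256, 64, 16, 4, 1]
print(sum(levels))  # 1365
```

The full pyramid for one 8192 × 8192 image needs 1365 tiles, only about a third more than the 1024 full-resolution tiles, while a viewer displays just a handful at a time.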

  15. The Electron Microscopy Data Bank: contains EM-derived density maps, complementary to the coordinate sets in PDB; established 2002 at EBI (Kim Henrick); web-based submission and retrieval; hand-curated (R. Newman). A bit like eBay, and you won't make any money, either.

  16. THE ELECTRON MICROSCOPY DATA BANK

  17. A Unified Data Resource for EM: NIH-funded joint project of Baylor College of Medicine, Houston, TX (W. Chiu, M. Baker); Rutgers University, Piscataway, NJ (H. Berman, C. Lawson); PDBe, European Bioinformatics Institute, Cambridge, UK (K. Henrick, C. Best, R. Newman)

  18. Characteristics: curated community archive (PDB and EMDB); NIH, EU (in the past), and BBSRC funding (+ EMBL); worldwide cooperation; advisory boards and task forces from the community; open deposition and retrieval → alternative access systems by other institutions; 760 entries, 26 GB of data; ca. 100 entries/year; curation in both Europe and the US.

  19. Growth of EMDB

  20. EMDep deposition system: 750 entries, current rate approx. 15-20/month. Contents of an entry: metadata (XML header) → experimental metadata; map (any format, converted to CCP4/MRC); additional files. Implementation: Java/Tomcat/XML.
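Because deposited maps are converted to CCP4/MRC, a reader only needs to understand one header layout. In that format the header is 1024 bytes long and starts with four 32-bit integers NX, NY, NZ and MODE (the data type); CCP4 and MRC2014 files also carry the tag "MAP " at byte 208. The byte order, however, depends on the software that wrote the file, a small instance of the format ambiguities discussed later in the talk. The sketch below parses only these fields and guesses the byte order from the MODE value; the function names are hypothetical, only a few modes are handled, and production readers also consult the machine stamp stored in the header.

```python
import struct

# MRC/CCP4 data-type codes (MODE) handled here, with bytes per value.
MODE_SIZES = {0: 1, 1: 2, 2: 4, 6: 2}  # int8, int16, float32, uint16

def read_mrc_header(header):
    """Parse a few fields of a 1024-byte MRC/CCP4 map header.

    The byte order is guessed: little-endian is tried first, big-endian if
    MODE is not a known value. (MODE 0 reads the same either way, so this
    heuristic alone cannot disambiguate it.)
    """
    if len(header) < 1024:
        raise ValueError("MRC header must be 1024 bytes")
    for endian in ("<", ">"):
        nx, ny, nz, mode = struct.unpack(endian + "4i", header[:16])
        if mode in MODE_SIZES:
            break
    else:
        raise ValueError("unrecognized MODE; not an MRC/CCP4 map?")
    return {
        "shape": (nx, ny, nz),
        "mode": mode,
        "byte_order": "little" if endian == "<" else "big",
        "has_map_tag": header[208:212] == b"MAP ",
        "data_bytes": nx * ny * nz * MODE_SIZES[mode],
    }

def make_header(nx, ny, nz, mode, endian="<"):
    """Build a minimal synthetic header (for demonstration only)."""
    header = bytearray(1024)
    header[:16] = struct.pack(endian + "4i", nx, ny, nz, mode)
    header[208:212] = b"MAP "
    return bytes(header)

print(read_mrc_header(make_header(100, 100, 100, 2, endian=">")))
```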

  21. Unified data resource plan

  22. Joint deposition system

  23. EMDB search system (Java/Tomcat)

  24. EMDB search system (Java/Tomcat)

  25. EMDB Atlas pages (XSLT)

  26. ISSUES

  27. Metadata management: difficult; many rounds of consulting the community, yet most fields still remain empty. Data harvesting: LIMS, PIMS → rarely used; processing pipelines and image-processing software → lack of standards, idiosyncrasies. Image formats: appalling lack of standards.

  28. Data issues. Current practice: deposit the final result of experiment and computation. Open question: how much of the original and intermediate data should be deposited? Issues: cost and practicability; reproducibility of the experiment; intellectual property (unexploited results?); usefulness.

  29. Non-data issues. Embargo: image data can be withheld for up to two years, which allows the original researcher to exploit them further. Journals and funders must define what data must be deposited and when they are to be released. Quality standards: require community acceptance and are technically difficult. The data bank enriches and annotates, but does not do science → quality standards must be set by scientists.

  30. Image data formats. Current: a variety of historical ad hoc formats, with unclear definitions and variations between different software. Need: interoperability standards. At what technical level? Will they be accepted? → A question for the community. HDF5: a common container format for numerical data; heavyweight library, but widely available (but what about Java?); would at least solve low-level format problems, but the metadata format still needs to be specified.

  31. Ontologies: a systematic way to define classes of objects, attributes of these objects, and relationships between objects. They provide a framework for metadata models. Advantage: powerful formal method. Disadvantage: not yet widely used.

  32. TECHNICAL DEVELOPMENTS

  33. Rich data sets. Submissions consist of maps (increasingly more than one); relations between data sets are currently unexpressed. XML-based standards for representing relationships between data: subject-predicate-object relationships (RDF framework). Harvesting interface to EM processing software. Web-based visualization for submission and retrieval; complex submissions assembled interactively (AJAX).
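Relationships between the maps of one submission can be written as subject-predicate-object triples, the data model underlying RDF. The sketch stores a few triples as Python tuples, queries them, and serializes them with the standard-library ElementTree module. The identifiers and predicate names are invented for illustration, and the XML is a simplified ad hoc encoding, not the W3C RDF/XML syntax or any EMDB schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical relationships between the parts of one rich submission.
triples = [
    ("emd:mapB", "isSubregionOf", "emd:mapA"),
    ("emd:mapC", "isSubregionOf", "emd:mapA"),
    ("emd:mapA", "derivedFrom", "images:stack1"),
    ("emd:mapA", "comparedWith", "emd:mapD"),
]

def to_xml(triples):
    """Serialize (subject, predicate, object) triples as simple XML elements."""
    root = ET.Element("relationships")
    for subject, predicate, obj in triples:
        ET.SubElement(root, "triple", subject=subject, predicate=predicate, object=obj)
    return ET.tostring(root, encoding="unicode")

def objects(triples, subject, predicate):
    """All objects linked from `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def subjects(triples, predicate, obj):
    """All subjects linked to `obj` via `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]

print(to_xml(triples))
print(subjects(triples, "isSubregionOf", "emd:mapA"))  # ['emd:mapB', 'emd:mapC']
```

Because every relationship has the same three-part shape, new kinds of links between data sets can be added without changing the schema, only the vocabulary of predicates.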

  34. Rich data submissions

  35. Possible XML representation

  36. Bioimage informatics tools • Current EMDB interface: • simple and efficient • but must be extended to accommodate more complex experiments • OMERO interface: • geared toward labs, not public databases • all the beauty of AJAX • high-performance visualization

  37. Bioimage informatics tools: BISQUE/BISUICK (UCSB): multichannel images, lab notebook, tagging, image markup

  38. Current Imaging Workflow Paradigm: no standards. Experiment? Image? Analytics? Annotations? (Jason Swedlow, U. Dundee)

  39. Towards Image Informatics

  40. OMERO in 2007/8/9 (Jason Swedlow, Univ. Dundee)

  41. CONCLUSIONS

  42. A Virtual Research Community (diagram). Participants: grid/cloud computing and storage; imaging centers; databases; software; users. Functions: in-house storage; storage distribution; quality assessment; acquisition, storage, and management of images; storage and computing engines; data submission; data harvesting.

  43. CONCLUSIONS: community databases are a central part of the scientific data infrastructure. Image databases are growing rapidly. Technical challenges: data formats and size; standards and interoperability. Improve metadata collection. Keep the community engaged.
