
Data Curation and Management activities within the UCT Computational Biology Group

Explore UCT group's activities in high-throughput biology, sequence annotation, and DAS development. Learn about data curation challenges, standards, and ontologies. See the importance of controlled vocabularies and compliance with international standards.


Presentation Transcript


  1. Data Curation and Management activities within the UCT Computational Biology Group
     Dr Nicky Mulder

  2. Outline
     • Activities at UCT:
       - High-throughput biology data
       - Sequence annotation
       - DAS annotation development
     • Issues we face
     • A note on standards and ontologies

  3. High-throughput biology data
     • Close ties with CPGR
     • Microarray data storage: BASE
     • Proteomics data:
       - Annotation: pipeline required
       - Storage: LIMS required

  4. BASE
     • BioArray Software Environment
     • Open source database for storage of array-type data
     • Manages raw data (images) and annotations
     • Has limited LIMS options
     • Can include specifications for MIAME compliance
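To make the MIAME point concrete, here is a minimal sketch of a completeness check against the six MIAME elements. It is not BASE's API: the field names, the `missing_miame_elements` helper, and the example record are all assumptions made up for illustration.

```python
# The six MIAME elements, keyed by hypothetical field names chosen for this sketch.
MIAME_ELEMENTS = {
    "raw_data": "Raw data for each hybridisation",
    "processed_data": "Final processed (normalised) data",
    "sample_annotation": "Essential sample annotation",
    "experimental_design": "Experimental design, including sample-data relationships",
    "array_annotation": "Annotation of the array (e.g. probe identifiers)",
    "protocols": "Laboratory and data-processing protocols",
}


def missing_miame_elements(record):
    """Return descriptions of MIAME elements that are absent or empty in `record`."""
    return [
        description
        for field, description in MIAME_ELEMENTS.items()
        if not record.get(field)
    ]


# Invented example record: two elements have not been filled in yet.
experiment = {
    "raw_data": ["scan_001.tif", "scan_002.tif"],
    "processed_data": "normalised_intensities.tsv",
    "sample_annotation": {"organism": "Homo sapiens", "tissue": "liver"},
    "experimental_design": "",
    "array_annotation": "array_design.adf",
    "protocols": None,
}

for description in missing_miame_elements(experiment):
    print("Missing:", description)
```

A check like this could run before data are submitted to a public repository, so that gaps are caught while the researcher still has the details to hand.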

  5. BASE Sample Information

  6. BASE Sample Information

  7. BASE experimental info

  8. Proteomics Data
     • Still in progress
     • Peptide identification programs
     • Additional cross-linking from results to public database annotations
     • Storage of experimental data and resulting identifications
     • Include MIAPE compliance
     • Linking to genomics data: standards required

  9. Sequence Annotation 1
     • Paeano pipeline for annotation of cDNAs from non-model organisms
     • Uses a collection of publicly available and custom software
     • Results are stored under projects
     • Links provided to array data in BASE

  10. Sequence Annotation 2
     • Glossina (Tsetse) EST annotation project
     • Held annotation jamboree at UWC
     • Worked with Twiki tool developed by JBIRC
     • Data to be submitted to public databases

  11. Twiki system

  12. Twiki system

  13. DAS Annotation Tool
     • Distributed Annotation System: allows viewing of annotation from different sources
     • Can overlay your own data/annotation
     • Facilitates information sharing without the problem of keeping local copies up to date
     • Repositories distributed in different geographical locations
     • Extension of DASTy2, developed at NBN
     • Development of DAS annotation tool underway
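The overlay idea on this slide can be sketched as follows: features for the same sequence segment are read from two independent sources and merged into one ordered view. The XML below is hand-written sample data, loosely modelled on a DAS 1.x `features` response (DASGFF); the segment ID, feature IDs, coordinates, source names, and the helpers `parse_features` and `overlay` are assumptions for illustration and are not taken from the UCT tool or DASTy2.

```python
import xml.etree.ElementTree as ET

# Sample "public" annotation, loosely modelled on a DAS features reply (invented data).
PUBLIC_XML = """
<DASGFF>
  <GFF>
    <SEGMENT id="P00001" start="1" stop="300">
      <FEATURE id="dom1" label="Kinase domain">
        <TYPE id="domain">domain</TYPE>
        <START>20</START>
        <END>150</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>
"""

# Sample in-house annotation for the same segment (invented data).
LOCAL_XML = """
<DASGFF>
  <GFF>
    <SEGMENT id="P00001" start="1" stop="300">
      <FEATURE id="lab1" label="Phosphosite (in-house)">
        <TYPE id="modified_residue">modified residue</TYPE>
        <START>75</START>
        <END>75</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>
"""


def parse_features(xml_text, source):
    """Extract features from one response and tag each with the source it came from."""
    root = ET.fromstring(xml_text.strip())
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append({
                "source": source,
                "segment": segment.get("id"),
                "label": feature.get("label"),
                "type": feature.findtext("TYPE"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            })
    return features


def overlay(*feature_lists):
    """Merge features from several sources, ordered by segment and position."""
    merged = [f for features in feature_lists for f in features]
    return sorted(merged, key=lambda f: (f["segment"], f["start"], f["end"]))


tracks = overlay(
    parse_features(PUBLIC_XML, "public"),
    parse_features(LOCAL_XML, "local"),
)
for f in tracks:
    print(f'{f["segment"]}\t{f["start"]}-{f["end"]}\t{f["label"]}\t[{f["source"]}]')
```

In a real deployment each XML document would come from a separate DAS server, which is what lets every repository be updated by its owner without the viewer holding stale copies.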

  14. DASTy

  15. Links to other DAS viewers

  16. DAS annotation tool
     Collaborative visual annotation tool
       - Annotation
       - Comments
       - Sequences
       - Features
       - Non-positional features
       - Methodology of trust in a collaborative annotation process

  17. Data curation and management issues
     • HTB software licenses are expensive
     • Open source software is not always maintained
     • Ensuring regular backups (data size)
     • Keeping data up to date
     • Researchers leave data behind after a project; it is not updated to new versions
     • Privacy: researchers share data only with collaborators; patient data is private
     • Sharing and linking data

  18. Standards and ontologies
     • Use a controlled vocabulary (controlled list of terms) or ontology (set of terms with relations)
     • Enables easy data retrieval and sharing
     • Easy comparison of results from different labs
     • Compatibility with other labs/databases world-wide
     • Ease of uploading data into public databases
     • Unambiguous report of research
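The distinction drawn on this slide between a controlled vocabulary and an ontology can be illustrated with a small sketch. The terms, the single `is_a` relation, and the helper names `normalise` and `ancestors` are assumptions chosen for the example, not part of any particular standard.

```python
# Ontology: terms plus relations. Here one relation only, "is_a", child -> parent.
SAMPLE_TYPE_IS_A = {
    "mRNA": "RNA",
    "rRNA": "RNA",
    "RNA": "nucleic acid",
    "DNA": "nucleic acid",
}

# Controlled vocabulary: just the flat, closed list of allowed terms.
SAMPLE_TYPE_VOCABULARY = set(SAMPLE_TYPE_IS_A) | set(SAMPLE_TYPE_IS_A.values())

# Case-insensitive lookup table so free-text entries map onto the official spelling.
_CANONICAL = {term.casefold(): term for term in SAMPLE_TYPE_VOCABULARY}


def normalise(free_text):
    """Map a free-text entry onto a vocabulary term, or raise ValueError."""
    key = free_text.strip().casefold()
    if key not in _CANONICAL:
        raise ValueError(f"{free_text!r} is not in the controlled vocabulary")
    return _CANONICAL[key]


def ancestors(term):
    """Follow is_a links upwards; only possible because the ontology stores relations."""
    chain = []
    while term in SAMPLE_TYPE_IS_A:
        term = SAMPLE_TYPE_IS_A[term]
        chain.append(term)
    return chain


# Two labs spell the same sample type differently; both normalise to one term.
print(normalise("  MRNA "), normalise("mrna"))
# The relations allow a broader query: which broader categories cover mRNA samples?
print(ancestors("mRNA"))
```

The vocabulary alone is enough for unambiguous reporting and comparison between labs; the relations are what allow a query for all RNA samples to also return those annotated as mRNA or rRNA.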

  19. Open Biomedical Ontologies
     • Central location for accessing well-structured controlled vocabularies and ontologies for use in the biological and medical sciences
     • Provides simple format for ontologies
     • Scope includes anatomy, phenotype, development, disease, “omics”, experiment, etc.
     • http://obo.sourceforge.net
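To give a feel for the simple ontology format mentioned above, the sketch below reads a few `[Term]` stanzas in the style of the OBO flat-file format. The embedded sample is a hand-typed excerpt in the style of the Gene Ontology, and `parse_obo` is a hypothetical, deliberately minimal reader that handles only the `id`, `name` and `is_a` tags; real files carry many more tags and stanza types.

```python
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: GO:0008150
name: biological_process

[Term]
id: GO:0008152
name: metabolic process
is_a: GO:0008150 ! biological_process

[Term]
id: GO:0009058
name: biosynthetic process
is_a: GO:0008152 ! metabolic process
"""


def parse_obo(text):
    """Return {term_id: {"id", "name", "is_a"}} for the [Term] stanzas in `text`."""
    terms = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("["):
            # A new stanza starts; this sketch keeps only [Term] stanzas.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None:
            continue  # header line or a stanza type we ignore
        tag, _, value = line.partition(":")
        value = value.split("!")[0].strip()  # drop the trailing "! comment"
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
    return terms


terms = parse_obo(SAMPLE_OBO)
for term_id, term in terms.items():
    parents = ", ".join(term["is_a"]) or "(root)"
    print(f'{term_id}  {term["name"]}  is_a: {parents}')
```

Because the format is line-oriented text, it can be inspected and version-controlled with ordinary tools, which is part of what makes it practical for sharing vocabularies between groups.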

  20. Data exchange standards
     • Microarray standards: MIAME and MAGE
     • Proteomics Standards Initiative (PSI)
     • Systems Biology Markup Language (SBML): computer-readable format for representing models of networks
     • Biological Pathways Exchange (BioPAX): format for representing pathways

  21. Conclusions
     • Some tools are in place for curation and management of different data types
     • Need better education of researchers to encourage this
     • Ontologies and standards are important in digital data curation and management; compliance with international standards needs to be encouraged

  22. Acknowledgements
     • Funding:
     • Collaborations:
     • CPGR
     • Researchers at UCT
