The power of bioinformatics tools in cancer research
Early Detection Research Network, JPL
Andrew Clark
Mentors: Dr. Chris Mattmann, Andrew Hart
Southern California Bioinformatics Summer Institute, 2009
EDRN @ JPL, SoCalBSI '09
Agenda
• Introduction
  • Biomarkers and cancer research
  • Early Detection Research Network (EDRN)
  • The NCI & JPL
  • EDRN Infrastructure
• Project objective
  • eCAS Curator additions
• The eCAS
  • Catalog and Archive Service
  • Data curation
• Architectural & design considerations
  • Software engineering
  • Meta-data processing
• Results & conclusions
• Acknowledgements
Introduction
• Biomarkers and cancer research
  • Constant research is underway to discover and identify reliable biomarkers of cancer in the human body.
• What is a biomarker?
  • “A biological molecule found in blood, other body fluids, or tissues that is a sign of a normal or abnormal process, or of a condition or disease.”
  • source: http://www.cancer.gov/dictionary/?searchTxt=biomarker
Biomarker research
• The more information that is collected and shared among research sites and medical laboratories:
  • The more effective diagnosis will become.
  • The more specialized the treatments that can be devised to minimize the devastating effects of cancer on its host.
The Early Detection Research Network
• The National Cancer Institute (NCI) is concerned with managing biomarker research data and disseminating information to the public.
• The NCI formed the EDRN in 1999 “to provide up-to-date information on biomarker research” to the scientific and medical communities and to the general public.
  • source: http://edrn.nci.nih.gov/about-edrn
The Jet Propulsion Laboratory
• A Federally Funded Research and Development Center (FFRDC), operated by Caltech for NASA.
• JPL’s technology for cataloging and managing extremely large sets of data provided the underlying infrastructure needed by the EDRN to accomplish its own mission.
The EDRN Infrastructure
• My mentors, Dr. Chris Mattmann and Andrew Hart, and their team continue to develop the underlying software grid.
• JPL software engineers work with bioinformatics experts to develop the public interface to the EDRN, a web-based portal available to the general public: http://edrn.nci.nih.gov
Project objective
• Overall:
  • To participate as a bioinformatics software engineer at JPL.
  • To contribute to the EDRN software infrastructure.
• Specifically:
  • Improve the functionality of the eCAS Curator.
EDRN Catalog and Archive Service
• JPL software customized for cataloging and archiving biomarker data, including specimen details, specimen images and related information.
[Figure: eCAS workflow diagram. Stages: 1. Data Ingestion; 2. Curation (meta-data edits; pub. survey & cross reference; expert review); 3. Product Release. Labels: A, B, C; EDRN Staging Server; EDRN Public Portal (WWW); Research data; Curator; Pre-release data; xml; Released data; Dataset meta-data.]
eCAS data curation
• Data ingested from research sites undergoes a curation phase before it is published to the public portal.
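The three numbered stages in the workflow diagram (Data Ingestion, Curation, Product Release) can be read as a simple one-way progression. The sketch below is only an illustration of that idea in Python; the `Stage`, `Dataset`, and `publicly_visible` names are hypothetical and are not part of the eCAS software.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """Hypothetical labels for the three numbered stages on the slide."""
    INGESTED = 1   # 1. Data Ingestion
    CURATED = 2    # 2. Curation
    RELEASED = 3   # 3. Product Release


@dataclass
class Dataset:
    name: str
    stage: Stage = Stage.INGESTED

    def advance(self) -> None:
        """Move to the next stage; a released dataset cannot advance further."""
        if self.stage is Stage.RELEASED:
            raise ValueError(f"{self.name} is already released")
        self.stage = Stage(self.stage.value + 1)


def publicly_visible(datasets):
    """Return the names of datasets that have reached the release stage."""
    return [d.name for d in datasets if d.stage is Stage.RELEASED]
```

In this sketch a dataset cannot skip curation: it only becomes publicly visible after `advance()` has been called twice.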
eCAS Curator
• The curation activities would benefit from additional software tools as part of the overall eCAS workflow.
Architectural & design considerations
• Software engineering:
  • EDRN tools are primarily web applications.
  • Design and integrate modular components.
• Meta-data management:
  • Meta-data: information that describes the content of other information.
  • Meta-data management is crucial to data curation and to the operation of the EDRN system.
Data curation with eCAS
1. Internal EDRN policy files contain meta-data definitions and configuration details that describe the dataset expected from each research site.
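To make the role of a policy file concrete, the sketch below checks a dataset's meta-data against a set of field definitions. It is an assumption-laden illustration: the policy is modeled as a plain Python dictionary rather than the actual EDRN policy file format, and the field names (`ProtocolName`, `SpecimenType`, `LeadInvestigator`) and the `check_metadata` helper are hypothetical.

```python
# Hypothetical policy: field name -> definition. Not the real EDRN format.
POLICY = {
    "ProtocolName": {"required": True, "description": "Name of the study protocol"},
    "SpecimenType": {"required": True, "description": "Kind of specimen collected"},
    "LeadInvestigator": {"required": False, "description": "Principal investigator"},
}


def check_metadata(policy, metadata):
    """Compare a dataset's meta-data with the policy definitions.

    Returns (missing, unknown): required fields that are absent or empty,
    and fields that the policy does not define. Both lists are sorted.
    """
    missing = sorted(
        name for name, spec in policy.items()
        if spec["required"] and not metadata.get(name)
    )
    unknown = sorted(name for name in metadata if name not in policy)
    return missing, unknown


# Example: one required field is missing and one field is not in the policy.
submitted = {"ProtocolName": "Example protocol", "Notes": "free text"}
missing, unknown = check_metadata(POLICY, submitted)
```

A check of this kind would let a curator see at a glance which fields still need attention before release.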
Data curation with eCAS
2. Curators edit and revise dataset meta-data to make the final product records complete and accurate.
Data curation with eCAS
3. Accepted data is made available through the web portal. Meta-data definitions provide searchable fields and descriptions of dataset contents to portal users.
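As a rough illustration of how meta-data fields can act as searchable fields, the sketch below filters a list of released records by one field. The record layout, the field names (`DatasetName`, `Organ`), and the `search` function are hypothetical; the real portal's search is not described in these slides.

```python
def search(records, field, term):
    """Return records whose value for `field` contains `term`, ignoring case.

    Records that lack the field are treated as non-matching.
    """
    needle = term.lower()
    return [r for r in records if needle in str(r.get(field, "")).lower()]


# Hypothetical released records, each a mapping of meta-data field to value.
RECORDS = [
    {"DatasetName": "Example lung study", "Organ": "Lung"},
    {"DatasetName": "Example colon study", "Organ": "Colon"},
]

lung_hits = search(RECORDS, "Organ", "lung")
```

The more complete and consistent the curated meta-data, the more useful a field-based search like this becomes.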
A dataset policy file . . .
Dataset meta-data configuration
Curator tool
Browser-based meta-data editor.
Curator tool
Selecting datasets for meta-data editing.
Meta-data items are retrieved from the backend.
Results and conclusions
• Final result
  • A meta-data management tool was integrated with the eCAS, and curation functionality was incorporated into the workflow.
Conclusion
• The goal of software engineering in bioinformatics should be to:
  • support scientists’ activities
  • facilitate better research and collaboration
  • simplify and bring clarity to complex tasks
Conclusion
The combined effectiveness of software tools and expert curation makes the EDRN a more powerful scientific resource that helps drive progress in biomarker research.
Acknowledgements
• Thanks to my mentors and supporters at JPL:
  • Chris Mattmann, Andrew Hart
• Thanks to the SoCalBSI faculty and staff:
  • Dr. Momand, Drs. Johnston, Dr. Sharp, Dr. Warter-Perez, Ronnie Cheng
• Thanks to the SoCalBSI funding sources:
  • The National Science Foundation
  • The National Institutes of Health
  • Economic and Workforce Development