Bioinformatics: Definitions, Challenges and Impact on Health Care Systems. Joyce Mitchell, Ph.D. University of Utah Sept 29, 2005 NLM’s Wood’s Hole Informatics Course. Outline for Talk. What is Bioinformatics? Health Informatics compared to Bioinformatics
Bioinformatics: Definitions, Challenges and Impact on Health Care Systems Joyce Mitchell, Ph.D. University of Utah Sept 29, 2005 NLM’s Wood’s Hole Informatics Course
Outline for Talk • What is Bioinformatics? • Health Informatics compared to Bioinformatics • Problems considered in Bioinformatics • Genomics, proteomics, transcriptomics, etc • Genomics data and patient care • Impact of Bioinformatics on Health Information Systems
Central Dogma of Molecular Biology • Replication: DNA → DNA • Transcription: DNA → RNA • Translation: RNA → Protein • RNA and Protein → Phenotype • This happens in cells.
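The information flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original talk: the function names are hypothetical, the codon table is the standard genetic code, and the input is assumed to be the coding strand of a gene read in frame.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Base-pairing rules used when DNA is copied.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Replication: the strand that pairs with `dna`, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def transcribe(coding_dna):
    """Transcription: the mRNA has the coding strand's sequence with U for T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translation: read codons in frame until a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGGCCTAA"  # toy sequence, not a real gene
    mrna = transcribe(gene)
    print(mrna, translate(mrna), reverse_complement(gene))
```

Real genes add complications that this sketch ignores, such as introns, alternative start sites, and organism-specific codon tables.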
1. What is Bioinformatics? Definitions first
NIH Working Definition Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. http://www.bisti.nih.gov/CompuBioDef.pdf
Another Definition • An interdisciplinary area at the intersection of biological, computer, and information sciences necessary to manage, process, and understand large amounts of data, for instance from the sequencing of the human genome, or from large databases containing information about plants and animals for use in discovering and developing new drugs. www.isye.gatech.edu/~tg/publications/ecology/eolss/node2.html
Another Definition: NCBI (National Center for Biotechnology Information) • Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned. There are sub-disciplines in bioinformatics. http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html
2. Health Informatics Compared to Bioinformatics Same methods, different application domains
Different Areas of Strengths • Bioinformatics has much more data available on the Internet than Health Informatics • Much more progress on database integration across multiple data sources • Health Informatics has much more need for aggregation of national statistics • Much more progress on terminologies for integration of data
Bioinformatics & Health Informatics • Bioinformatics is the study of the flow of information in biological sciences. • Health Informatics is the study of the flow of information in patient care. • These two fields are on a collision course as genomics data comes into use in patient care. • Russ Altman, MD, Ph.D., Stanford Univ.
3. Problems Considered in Bioinformatics OMES and OMICS
Omes and Omics • Genomics • Primarily sequences (DNA and RNA) • Databanks and search algorithms • Proteomics • Sequences (Protein) • Mass spectrometry, X-ray crystallography • Databanks, knowledge bases, terminologies • Functional Genomics (transcriptomics) • Microarray data • Databanks, analysis tools, traversal techniques • Systems Biology (metabolomics) • Metabolites and interacting systems (interactomics) • Graphs, visualization, modeling, networks of entities
Central Dogma of Molecular Biology (figure) • DNA → RNA → Protein → Phenotype • Fields labeled on the figure: Genomics, Proteomics, Transcriptomics, Functional Genetics
Genome and Genomics • Genome – entire complement of DNA in a species • Both nuclear and mitochondrial/chloroplast • Variants among individuals • Genomics – study of the sequence, structure and function of the genome. Study of whole sets of genes rather than single genes. • Comparative genomics – study of the differences among species. Usually covers evolutionary studies of differences & conservation over time.
A Genome Database (e.g., GenBank) • Consists of long strings of DNA bases – ATCG….. • Consists of “annotations” of this database to attach meaning to the sequence data. • Example entry from GenBank: • http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NM_000410&dopt=gb Hemochromatosis gene HFE
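The two ingredients named on this slide, raw sequence plus annotations that attach meaning to parts of it, can be pictured as a small data structure. The sketch below is only an illustration: the class names, the accession "TOY_0001", and the sequence are made up, and this is not the GenBank flat-file format.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One annotation: a labeled interval on the sequence."""
    kind: str        # e.g. "gene" or "CDS"
    start: int       # 0-based, inclusive
    end: int         # exclusive
    note: str = ""


@dataclass
class SequenceRecord:
    """A sequence plus the annotations that give it meaning."""
    accession: str
    sequence: str
    features: list = field(default_factory=list)

    def extract(self, feature):
        """Return the bases covered by one annotation."""
        return self.sequence[feature.start:feature.end]

    def features_of_kind(self, kind):
        """Return every annotation of the requested kind."""
        return [f for f in self.features if f.kind == kind]


# Hypothetical record; the accession and sequence are placeholders.
record = SequenceRecord(
    accession="TOY_0001",
    sequence="GGATGGCCTAAGG",
    features=[Feature("CDS", 2, 11, "toy coding region")],
)

if __name__ == "__main__":
    for cds in record.features_of_kind("CDS"):
        print(record.accession, cds.note, record.extract(cds))
```

Keeping the annotations separate from the sequence string is what lets the same bases carry several overlapping interpretations (gene, coding region, variant) at once.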
Human Genome Project • Human Genome Project - International research effort • Determine sequence of human genome and other model organisms • Began 1990, completed 2003 • Next steps for ~20,000 genes • Function and regulation of all genes • Significance of variations between people • Cures, therapies, “genomic healthcare”
“The Human Genome Project has catalyzed striking paradigm changes in biology - biology is an information science.” Leroy Hood, MD, PhD Institute for Systems Biology Seattle, Washington
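Genomes In Public Databases (http://www.genomesonline.org/) • 12/4/01: 72 published complete genomes, 255 ongoing prokaryotic genomes, 158 ongoing eukaryotic genomes • 10/3/02: 104 published complete genomes, 316 ongoing prokaryotic genomes, 218 ongoing eukaryotic genomes • 8/28/03: 156 published complete genomes, 386 ongoing prokaryotic genomes, 246 ongoing eukaryotic genomes • 9/16/05: 297 published complete genomes, 737 ongoing prokaryotic genomes, 526 ongoing eukaryotic genomes (1560 in total)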
Genomics activities • Sequence the genes and chromosomes – done by breaking the DNA into parts • Map the location of various gene entities to establish their order • Compare the sequences with other known sequences to determine similarity • Across species, conserved sequence “motifs” • Predict secondary structure of proteins • Create large databases – GenBank, EMBL, DDBJ • Develop algorithms and similarity measures • BLAST and its many forms
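To make "compare the sequences to determine similarity" concrete, here is a minimal sketch of a local alignment score computed by dynamic programming (the Smith-Waterman recurrence). It is not BLAST, which uses faster heuristics to search whole databases; the function name and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best score of any local alignment between sequences a and b."""
    previous = [0] * (len(b) + 1)   # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                              # start a new local alignment
                previous[j - 1] + substitution, # align a[i-1] with b[j-1]
                previous[j] + gap,              # gap in b
                current[j - 1] + gap,           # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    # The shared "ACGT" motif scores 4 matches x 2 = 8 despite different flanks.
    print(local_alignment_score("TTACGTTT", "GGACGTGG"))
```

A high score between sequences from different species is the kind of signal used to flag a conserved motif; production tools add statistical significance estimates on top of the raw score.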
Central Dogma of Molecular Biology (figure) • DNA → RNA → Protein → Phenotype • Phenotype levels: Tissues, Organs, Organisms • Fields labeled on the figure: Genomics, Proteomics, Transcriptomics, Functional Genetics
Proteome and Proteomics • Proteome – the entire set of proteins (and other gene products) made by the genome. • Proteomics – study of the interactions among proteins in the proteome, including networks of interacting proteins and metabolic considerations. Also includes differences in developmental stages, tissues and organs.
Protein Functions • Catalysis • Transport • Nutrition and storage • Contraction and mobility • Structural elements • Cytoskeleton • Basement membranes • Defense mechanisms • Regulation • Genetic • Hormonal • Buffering capacity
Protein Databases • SwissProt • PIR http://www.pir.uniprot.org/ • GENE http://www.ncbi.nlm.nih.gov/gene • InterPro http://www.ebi.ac.uk/interpro/ • Correspond to (and derived from) Genome data bases • All connected by Reference Sequences (NCBI) UniProt
Gene/Protein Database entries • HFE record in Entrez GENE (NCBI) • http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?&db=gene&cmd=retrieve&dopt=Graphics&list_uids=3077
Structure & Function Determination • X-ray crystallography • Nuclear magnetic resonance spectroscopy and tandem MS/MS • Computational modeling • Sequence alignment from others • Homology modeling
Structure Databases • Contain experimentally determined and predicted structures of biological molecules • Most structures determined by X-ray crystallography, NMR • Example – MMDB molecular modeling db http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml • HFE Entry • http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbsrv.cgi?form=6&db=t&Dopt=s&uid=9816
Protein Interaction Databases • Record observations of protein-protein interactions in cells • Attempts to detail interactions observed in thousands of small-scale experiments described in published articles • Examples: • BIND: Biomolecular Interaction Network Database • DIP: Database of Interacting Proteins • MIPS: Munich Information Center for Protein Sequences • PRONET: Protein interaction on the Web • Many others, both academic and commercial
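Interaction records like these are naturally stored as a graph in which proteins are nodes and each observed interaction is an edge. The sketch below assumes a plain list of protein pairs; the protein names are placeholders, and none of the databases listed above is being queried or imitated.

```python
from collections import defaultdict


def build_network(interactions):
    """Turn (protein, protein) observations into an undirected adjacency map."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network


def partners(network, protein):
    """Proteins observed to interact with `protein`."""
    return set(network.get(protein, set()))


def shared_partners(network, p, q):
    """Proteins that interact with both p and q."""
    return partners(network, p) & partners(network, q)


# Hypothetical observations, as might be curated from separate experiments.
observations = [("P1", "P2"), ("P2", "P3"), ("P1", "P4"), ("P3", "P4")]
network = build_network(observations)

if __name__ == "__main__":
    print(sorted(partners(network, "P1")))
    print(sorted(shared_partners(network, "P1", "P3")))
```

Queries such as shared partners are one simple way networks of interacting proteins are explored once thousands of small-scale observations have been pooled.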
Central Dogma of Molecular Biology (figure) • DNA → RNA → Protein → Phenotype • Fields labeled on the figure: Genomics, Proteomics, Transcriptomics, Functional Genetics
Proteome vs Transcriptome • Functional genomics (transcriptomics) looks at the timing and regulation of the gene products (both RNA and proteins) • This is different from looking at what gene products can be produced – it looks at the circumstances under which production occurs. • Involves experimental conditions.
Functional Genomics –Microarrays • Transcriptome and transcriptomics • High throughput technique designed to measure the increase in RNA (or sometimes proteins, tissues, etc) in a cell in response to an experiment. • Also called “gene expression” analysis • Microarrays called “gene chips” (although now there are protein and tissue chips)
How Do Microarrays Work? • Conceptual description: • Set of targets (cDNA, proteins, tissues, etc) are immobilized in predetermined positions on a substrate • Solution containing tagged molecules capable of binding to the targets is placed over the targets • Binding occurs between targets and tagged molecules. • Fluorescent tags allow you to visualize which targets have been bound (and tell you something about the molecules that were present in your solution).
Animation of Microarrays • http://www.bio.davidson.edu/courses/genomics/chip/chip.html
How Spotted Arrays Work • Result: • Spots where cDNA from the reference sample hybridized look green • Spots where cDNA from the experimental sample hybridized look red • Spots where cDNA from both samples hybridized look yellow (green+red=yellow) • Spots with little/no cDNA hybridized look black
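The color rules above can be written as a small classifier over the two channel intensities of a spot. This is a sketch under stated assumptions: the function name, the minimum-signal threshold of 100, and the two-fold cutoff are illustrative choices, not values from the talk or from any scanner software.

```python
import math


def classify_spot(reference, experimental, min_signal=100.0, fold_cutoff=2.0):
    """Map a spot's green (reference) and red (experimental) intensities to a color.

    min_signal and fold_cutoff are assumed thresholds for illustration only.
    """
    # Little or no cDNA hybridized from either sample.
    if reference < min_signal and experimental < min_signal:
        return "black"
    # The log2 ratio treats over- and under-expression symmetrically;
    # adding 1 avoids division by zero for an empty channel.
    log_ratio = math.log2((experimental + 1.0) / (reference + 1.0))
    cutoff = math.log2(fold_cutoff)
    if log_ratio >= cutoff:
        return "red"       # mostly experimental sample
    if log_ratio <= -cutoff:
        return "green"     # mostly reference sample
    return "yellow"        # comparable amounts from both samples


if __name__ == "__main__":
    for ref, exp in [(50, 40), (200, 1600), (1600, 200), (1000, 1100)]:
        print(ref, exp, classify_spot(ref, exp))
```

Analyses usually work with the log ratio itself rather than a color label, after background correction and normalization that this sketch leaves out.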
Uses of Expression Profiling • Pharmaceutical research: • ID drug targets by comparing expression profiles of drug-treated cells with those of cells containing mutations in genes encoding known drug targets • Disease Dx and Tx: • Distinguish morphologically similar cancers • DLBCL (Poulsen et al. (2005) Microarray-based classification of diffuse large B-cell lymphomas. European Journal of Haematology 74(6):453-65.) • Therapy potential • Rabson AB, Weissmann D. From microarray to bedside: targeting NF-kappaB for therapy of lymphomas. Clin Cancer Res. 2005 Jan 1;11(1):2-6.
Future Applications • Diagnostic tool to screen for infective agents • Chip imprinted with set of pathogenic genomes used to identify bacterial, viral, or parasite genomic material in patient’s body fluids • Diagnostic chip to check for mutations involved in drug-gene interactions.
Experimental Design (2) • A fundamental challenge of microarray experiments: underdetermined systems Kohane IS, Kho AT, Butte AJ. Microarrays for an Integrative Genomics. (The MIT Press; Cambridge, MA; 2003), p. 11.
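A toy example shows why "underdetermined" matters: when there are more genes (unknowns) than samples (equations), many different models fit the data equally well. The numbers below are made up, and the linear model is an assumption used only to illustrate the point.

```python
def predict(expression, weights):
    """Linear model: outcome = sum of gene expression x gene weight."""
    return sum(x * w for x, w in zip(expression, weights))


# Made-up data: 2 samples (rows) measured on 3 genes (columns).
samples = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0]]
outcomes = [14.0, 32.0]

# One set of gene weights that reproduces both outcomes exactly.
weights_a = [1.0, 2.0, 3.0]

# This direction changes no prediction on the two samples above...
null_direction = [1.0, -2.0, 1.0]

# ...so adding any multiple of it gives another exact fit.
weights_b = [w + 5.0 * d for w, d in zip(weights_a, null_direction)]


def fits(weights):
    """True if the weights reproduce every observed outcome."""
    return all(abs(predict(s, weights) - y) < 1e-9
               for s, y in zip(samples, outcomes))


if __name__ == "__main__":
    new_sample = [1.0, 0.0, 0.0]
    print(fits(weights_a), fits(weights_b))
    print(predict(new_sample, weights_a), predict(new_sample, weights_b))
```

Both weight vectors explain the observed samples perfectly yet disagree on a new sample, which is why microarray studies with thousands of genes and few samples need careful design and validation.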
MGED (Microarray gene expression data) Standards • MIAME: “Standards for minimum data to be exchanged” • Minimum information that should be reported about a microarray experiment to enable its unambiguous interpretation and reproduction • http://www.mged.org/Workgroups/MIAME/miame.html • MAGE: “Standards for format of messages to exchange the data” • A standard transmission format for microarray experiment data • http://www.mged.org/Workgroups/MAGE/mage.html
Public Microarray Data Repositories Major public repositories: • GEO (NCBI) • http://www.ncbi.nlm.nih.gov/geo/ • ArrayExpress (EBI) • http://www.ebi.ac.uk/arrayexpress/
Standards and Repositories • Brazma, A, et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nature Genetics. 2001 Dec;29(4):365-71. http://www.nature.com/cgi-taf/DynaPage.taf?file=/ng/journal/v29/n4/full/ng1201-365.html • Ball, CA, et al. Submission of Microarray Data to Public Repositories. PLoS Biology. 2004 Sep;2(9):e317. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=15340489
Controlled Vocabularies • Genomics, proteomics, and especially microarray techniques have created a large need for controlled vocabularies to assist the analyses across multiple entities & species. • Taxonomy – systematic classification of objects according to relationships. • Ontologies – • An organizational framework for concepts
Controlled Vocabularies in Bioinformatics • The Gene Ontology http://www.geneontology.org/ • Knowledge capture (the ontology itself) • Annotation of gene products (for comparisons) • The MGED Ontology (arising from MIAME) • http://mged.sourceforge.net/ • Annotation of microarray experiments for public repositories • Clinical Bioinformatics Ontology: • Annotation of gene tests in electronic medical records • http://www.cerner.com/cbo • MIAPE from Proteomics Standards Initiative (PSI) • http://psidev.sourceforge.net/
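The value of an ontology for annotation is that a query on a general term can also find gene products annotated with more specific terms. The sketch below uses a toy "is a" hierarchy; the term names and gene identifiers are illustrative placeholders and are not taken from the Gene Ontology or any other vocabulary listed above.

```python
# Toy "is a" hierarchy: each term maps to its direct parent terms.
IS_A = {
    "iron ion transport": ["metal ion transport"],
    "metal ion transport": ["ion transport"],
    "ion transport": ["transport"],
    "transport": [],
}

# Hypothetical annotations: gene product -> most specific term assigned to it.
ANNOTATIONS = {
    "geneA": "iron ion transport",
    "geneB": "ion transport",
    "geneC": "transport",
}


def ancestors(term):
    """All terms reachable from `term` by following is-a links upward."""
    found = set()
    stack = list(IS_A.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(IS_A.get(parent, []))
    return found


def genes_for(term):
    """Genes annotated with `term` or with any more specific term."""
    return sorted(
        gene for gene, annotated in ANNOTATIONS.items()
        if annotated == term or term in ancestors(annotated)
    )


if __name__ == "__main__":
    print(genes_for("ion transport"))
    print(genes_for("transport"))
```

Because every annotation carries its ancestors implicitly, results from different species or experiments can be compared at whatever level of detail the analysis needs.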
4. Genomics Data and Patient Care From genotype to phenotype
Bioinformatics and Patient Care • Understanding a person’s genome ushers in the era of “Personalized Medicine” • Health-related genetic data should be tracked in the EMR. • The 9-11 disaster showed that genomic variant information needs to be known as well. • Cash et al. Forensic bioinformatics in the wake of the World Trade Center Disaster. PSB 2003:638-653.
Human Disease Gene Specifics • Genes linked to human diseases (9-2004) • + 425 in 2 yrs • 1700/20,000 = 8.5% (about 9%) of loci
Genetic Medicine is not new • Karl Landsteiner started genetic medicine over 100 years ago (1903) • Blood transfusions worked off the ABO blood group system. • Landsteiner got the Nobel Prize in 1930 for his work. • http://nobelprize.org/medicine/laureates/1930/landsteiner-bio.html
Genomic Medicine is New • What do we do with all of this genetic information when every person is unique? • And the information about genetic conditions is available on the Internet.
Genomics Data and Patient Care • Where do you find the data for genes causing human diseases? • What do you do with genetic data in electronic medical records?