Genes, Diseases, Physiology, Anatomy: Novel relationships & Deeper insights. Medical Informatics Bioinformatics
Biomedical informatics: The broad discipline concerned with the study and application of computer science, information science, informatics, cognitive science and human-computer interaction in the practice of biological research, biomedical science, medicine and healthcare. Bioinformatics, clinical informatics and public health informatics or medical informatics can be considered as sub-domains within biomedical informatics. Bioinformatics: The merger of biotechnology and information technology with the goal of revealing new insights and principles in biology OR The science of managing and analyzing biological data using advanced computing techniques. Especially important in analyzing genomic research data. Health Informatics or Medical Informatics: The intersection of information science, computer science, and health care. It deals with the resources, devices, and methods required to optimize the acquisition, storage, retrieval, and use of information in health and biomedicine. Wikipedia
& YOU Mining Bio-Medical Mountains How Computer Science can help Biomedical Research and Health Sciences Anil Jegga Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center (CCHMC) Department of Pediatrics, University of Cincinnati http://anil.cchmc.org Anil.Jegga@cchmc.org
Algorithm: A fixed procedure embodied in a computer program. Base: One of the molecules that form DNA and RNA molecules. Base pair: Two nitrogenous bases (adenine and thymine or guanine and cytosine) held together by weak bonds. Two strands of DNA are held together in the shape of a double helix by the bonds between base pairs. Wikipedia
Nucleotide: A subunit of DNA or RNA consisting of a nitrogenous base (adenine, guanine, thymine, or cytosine in DNA; adenine, guanine, uracil, or cytosine in RNA), a phosphate molecule, and a sugar molecule (deoxyribose in DNA and ribose in RNA). Thousands of nucleotides are linked to form a DNA or RNA molecule. Genome: All the genetic material in the chromosomes of a particular organism; its size is generally given as its total number of base pairs. Genomics: The study of genes and their function. Functional Genomics: The study of genes, their resulting proteins, and the role played by the proteins in the body's biochemical processes. Wikipedia
Two Separate Worlds….. With Some Data Exchange…
Medical Informatics: Disease World • Name • Synonyms • Related/Similar Diseases • Subtypes • Etiology • Predisposing Causes • Pathogenesis • Molecular Basis • Population Genetics • Clinical findings • System(s) involved • Lesions • Diagnosis • Prognosis • Treatment • Clinical Trials……
Bioinformatics & the “omes”: Genome • Variome • Transcriptome • Regulome • miRNAome • Proteome • Interactome • Pharmacogenome • Metabolome • Physiome • Pathome
Data sources: PubMed • Disease Database • Patient Records • OMIM • Clinical Synopsis • Clinical Trials
382 “omes” so far……… and there is the “UNKNOME” too: genes with no known function. http://omics.org/index.php/Alphabetically_ordered_list_of_omics
The genome is our Genetic Blueprint • Nearly every human cell contains 23 pairs of chromosomes • 1 to 22 and • XY or XX • XY = Male • XX = Female • Length of chromosomes 1 to 22, X, Y together is ~3.2 billion bases.
The Genome is Who We Are on the inside! Information coded in DNA • Chromosomes consist of DNA • molecular strings of A, C, G, & T • base pairs, A-T, C-G • Genes • DNA sequences that encode proteins • less than 3% of human genome
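The base-pairing rule (A-T, C-G) can be sketched in a few lines of Python. This is only an illustration, not part of the original slides; the function name `reverse_complement` and the sample string are assumptions made for the example.

```python
# Watson-Crick pairing as stated on the slide: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction."""
    # Walk the strand backwards and swap each base for its pairing partner.
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


if __name__ == "__main__":
    sample = "CACACTTG"  # short sample string chosen for illustration
    print(sample, "pairs with", reverse_complement(sample))
```

Because each base has exactly one partner, knowing one strand of the double helix is enough to reconstruct the other.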
5000 bases per page….. CACACTTGCATGTGAGAGCTTCTAATATCTAAATTAATGTTGAATCATTATTCAGAAACAGAGAGCTAACTGTTATCCCATCCTGACTTTATTCTTTATGAGAAAAATACAGTGATTCCAAGTTACCAAGTTAGTGCTGCTTGCTTTATAAATGAAGTAATATTTTAAAAGTTGTGCATAAGTTAAAATTCAGAAATAAAACTTCATCCTAAAACTCTGTGTGTTGCTTTAAATAATCAGAGCATCTGCTACTTAATTTTTTGTGTGTGGGTGCACAATAGATGTTTAATGAGATCCTGTCATCTGTCTGCTTTTTTATTGTAAAACAGGAGGGGTTTTAATACTGGAGGAACAACTGATGTACCTCTGAAAAGAGAAGAGATTAGTTATTAATTGAATTGAGGGTTGTCTTGTCTTAGTAGCTTTTATTCTCTAGGTACTATTTGATTATGATTGTGAAAATAGAATTTATCCCTCATTAAATGTAAAATCAACAGGAGAATAGCAAAAACTTATGAGATAGATGAACGTTGTGTGAGTGGCATGGTTTAATTTGTTTGGAAGAAGCACTTGCCCCAGAAGATACACAATGAAATTCATGTTATTGAGTAGAGTAGTAATACAGTGTGTTCCCTTGTGAAGTTCATAACCAAGAATTTTAGTAGTGGATAGGTAGGCTGAATAACTGACTTCCTATCATTTTCAGGTTCTGCGTTTGATTTTTTTTACATATTAATTTCTTTGATCCACATTAAGCTCAGTTATGTATTTCCATTTTATAAATGAAAAAAAATAGGCACTTGCAAATGTCAGATCACTTGCCTGTGGTCATTCGGGTAGAGATTTGTGGAGCTAAGTTGGTCTTAATCAAATGTCAAGCTTTTTTTTTTCTTATAAAATATAGGTTTTAATATGAGTTTTAAAATAAAATTAATTAGAAAAAGGCAAATTACTCAATATATATAAGGTATTGCATTTGTAATAGGTAGGTATTTCATTTTCTAGTTATGGTGGGATATTATTCAGACTATAATTCCCAATGAAAAAACTTTAAAAAATGCTAGTGATTGCACACTTAAAACACCTTTTAAAAAGCATTGAGAGCTTATAAAATTTTAATGAGTGATAAAACCAAATTTGAAGAGAAAAGAAGAACCCAGAGAGGTAAGGATATAACCTTACCAGTTGCAATTTGCCGATCTCTACAAATATTAATATTTATTTTGACAGTTTCAGGGTGAATGAGAAAGAAACCAAAACCCAAGACTAGCATATGTTGTCTTCTTAAGGAGCCCTCCCCTAAAAGATTGAGATGACCAAATCTTATACTCTCAGCATAAGGTGAACCAGACAGACCTAAAGCAGTGGTAGCTTGGATCCACTACTTGGGTTTGTGTGTGGCGTGACTCAGGTAATCTCAAGAATTGAACATTTTTTTAAGGTGGTCCTACTCATACACTGCCCAGGTATTAGGGAGAAGCAAATCTGAATGCTTTATAAAAATACCCTAAAGCTAAATCTTACAATATTCTCAAGAACACAGTGAAACAAGGCAAAATAAGTTAAAATCAACAAAAACAACATGAAACATAATTAGACACACAAAGACTTCAAACATTGGAAAATACCAGAGAAAGATAATAAATATTTTACTCTTTAAAAATTTAGTTAAAAGCTTAAACTAATTGTAGAGAAAAAACTATGTTAGTATTATATTGTAGATGAAATAAGCAAAACATTTAAAATACAAATGTGATTACTTAAATTAAATATAATAGATAATTTACCACCAGATTAGATACCATTGAAGGAATAATTAATATACTGAAATACAGGTCAGTAGAATTTTTTTCAATTCAGCATGGAGATGTAAAAAATGAAAATTAATGCAAAAAATAAGGGCACAAAAAGAAATGAGTAATTTTGATCAGAAATGTATTAAAATTAATAAACTGGAAATTTGACATTTAAAAAAAG
CATTGTCATCCAAGTAGATGTGTCTATTAAATAGTTGTTCTCATATCCAGTAATGTAATTATTATTCCCTCTCATGCAGTTCAGATTCTGGGGTAATCTTTAGACATCAGTTTTGTCTTTTATATTATTTATTCTGTTTACTACATTTTATTTTGCTAATGATATTTTTAATTTCTGACATTCTGGAGTATTGCTTGTAAAAGGTATTTTTAAAAATACTTTATGGTTATTTTTGTGATTCCTATTCCTCTATGGACACCAAGGCTATTGACATTTTCTTTGGTTTCTTCTGTTACTTCTATTTTCTTAGTGTTTATATCATTTCATAGATAGGATATTCTTTATTTTTTATTTTTATTTAAATATTTGGTGATTCTTGGTTTTCTCAGCCATCTATTGTCAAGTGTTCTTATTAAGCATTATTATTAAATAAAGATTATTTCCTCTAATCACATGAGAATCTTTATTTCCCCCAAGTAATTGAAAATTGCAATGCCATGCTGCCATGTGGTACAGCATGGGTTTGGGCTTGCTTTCTTCTTTTTTTTTTAACTTTTATTTTAGGTTTGGGAGTACCTGTGAAAGTTTGTTATATAGGTAAACTCGTGTCACCAGGGTTTGTTGTACAGATCATTTTGTCACCTAGGTACCAAGTACTCAACAATTATTTTTCCTGCTCCTCTGTCTCCTGTCACCCTCCACTCTCAAGTAGACTCCGGTGTCTGCTGTTCCATTCTTTGTGTCCATGTGTTCTCATAATTTAGTTCCCCACTTGTAAGTGAGAACATGCAGTATTTTCTAGTATTTGGTTTTTTGTTCCTGTGTTAATTTGCCCAGTATAATAGCCTCCAGCTCCATCCATGTTACTGCAAAGAACATGATCTCATTCTTTTTTATAGCTCCATGGTGTCTATATACCACATTTTCTTTATCTAAACTCTTATTGATGAGCATTGAGGTGGATTCTATGTCTTTGCTATTGTGCATATTGCTGCAAGAACATTTGTGTGCATGTGTCTTTATGGTAGAATGATATATTTTCTTCTGGGTATATATGCAGTAATGCGATTGCTGGTTGGAATGGTAGTTCTGCTTTTATCTCTTTGAGGAATTGCCATGCTGCTTTCCACAATAGTTGAACTAACTTACACTCCCACTAACAGTGTGTAAGTGTTTCCTTTTCTCCACAACCTGCCAGCATCTGTTATTTTTTGACATTTTAATAGTAGCCATTTTAACTGGTATGAAATTATATTTCATTGTGGTTTTAATTTGCATTTCTCTAATGATCAGTGATATTGAGTTTGTTTTTTTTCACATGCTTGTTGGCTGCATGTATGTCTTCTTTTAAAAAGTGTCTGTTCATGTACTTTGCCCACATTTTAATGGGGTTGTTTTTCTCTTGTAAATTTGTTTAAATTCCTTATAGGTGCTGGATTTTAGACATTTGTCAGACGCATAGTTTGCAAATAGTTTCTCCCATTCTGTAGGTTGTCTGTTTATTTTGTTAATAGTTTCTTTTGCTATGCAGAAGCTCTTAATAAGTTTAATGAGATCCTGATATGTTAGGCTTTGTGTCCCCACCCAAATCTCATCTTGAATTATATCTCCATAATCACCACATGGAGAGACCAGGTGGAGGTAATTGAATCTGGGGGTGGTTTCACCCATGCTGTTCTTGTGATAGTGAATGAGTTCTCACGAGATCTAATGGTTTTATGAGGGGCTCTTCCCAGCTTTGCCTGGTACTTCTCCTTCCTGCCGCTTTGTGAAAAAGGTGCATTGCGTCCCTTTCACCTTCTTCTATAATTGTAAGTTTCCTGAGGCCTTCCCAGCCATGCTGAACTTCAAGTCAATTAAACCTTTTTCTTTATAAATTACTCAGTCTCTGGTGGTTCTTTATAGCAGTGTGAAAATGGACTAATGAAGTTCCCATTTATGAATTTTTGCTTTTGTTGCAATTGCTTTTGACATCTTAGTCATGAAATCCTTGCCTGTTC
TAAGTACAGGACGGTATTGCCTAGGTTGTCTTCCAGGGTTTTTCTAATTTTGTGTTTTGCATTTAAGTGTTTAATCCATCTTGAGTTGATTTTTGTATATTGTGTAAGGAAGGGGTCCAGTTTCAATCTTTTGCATATGGCTAGTTAGTTATCCCAGTACCATTTATTGAAAAGACAGTCTTTTCCCCATCGCTCGTTTTTGTCAGTTTTATTGATGATCAGATAATCATAGCTGTGTGGCTTTATTTCTGGGTTCTTTATTCTGTTCTATTGGTTTATGTCCCTGTTTTTGTGCCAGTACCATGCTGTTTTGGTTAACATAGCCCTGTAGTATAGTTTGAGGTCAGATAGCCTGATGCTTCCAGCTTTGTTCTTTTTCTTAAGATTGCCTTGGCTATTTGGCCTCTTTTTTGGTTCCACATGAATTTTAAAACAGTTGTTTCTAGTTTTTGAAGAATGTCATTGGTAGTTTGATAGAAATAGCATTTAATCTGTAAATTGATTTGTGCAGTATGGCCTTTTAATGATATTGATTCTTCCTATCCATGAGCATGATATGTTTTCCATTTTGTTTGTATCCTCTCTGATTTCTTTGTGCAGTGTTTTGTAATTCTCATTGTAGAGATTTTTCACCTCCCTGGTTAGTTGTATTTTACCCTAGATATTTTATTCTTTTTGTGAAAATTGTGAATGGGATTGCCTTCCTGATTTGACTGCCAGCTTGGTTACTGTTGGTTTATAGAAATGCTAGTGATTTTTGTACATTGATTTTCTTTCTAAAACTTTGCTGAAGTTTTTTTTATTAGCAGAAGGAGCTTTGGGGCTGAGACTATGGGGTTTTCTAGATATAGAATCATGTCAGCTTCAAATAGGGATAATTTTACTTCCTCTCTTCCTATTTGGATGCCCTTTATTTCTTTCTCTTGCCTGATTACTCTGGCTGGGATTTCCTATGTTGAATAGGAGTCATGAGAGAGGGCATCAAATCTACACATATCAAATACTAACCTTGAATGTCTAGAT
How much data make up the human genome? • 3 pallets with 40 boxes per pallet x 5000 pages per box x 5000 bases per page = 3,000,000,000 bases! • To get an accurate sequence requires • 6-fold coverage! • Now imagine shredding 18 pallets and reassembling!
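The pallet arithmetic can be checked directly. The sketch below only restates the numbers from the slide; the constant names are chosen here for readability.

```python
# Figures taken from the slide.
PALLETS = 3
BOXES_PER_PALLET = 40
PAGES_PER_BOX = 5000
BASES_PER_PAGE = 5000
COVERAGE = 6  # each base is read about six times for an accurate sequence

genome_bases = PALLETS * BOXES_PER_PALLET * PAGES_PER_BOX * BASES_PER_PAGE
pallets_to_shred = PALLETS * COVERAGE
bases_to_read = genome_bases * COVERAGE

print(f"Genome size:       {genome_bases:,} bases")
print(f"Pallets at {COVERAGE}-fold:  {pallets_to_shred}")
print(f"Bases to sequence: {bases_to_read:,}")
```

At 6-fold coverage the sequencing effort is therefore about 18 billion bases, which is where the 18 pallets come from.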
Human Genome Project–Initial Stages • Most of the initial phases were primarily focused on improving & speeding the technology to sequence and analyze DNA. • Scientists all around the world worked to make detailed maps of our chromosomes and sequence model organisms, like worm, fruit fly, and mouse. Image Courtesy: Google Images
Overwhelming Challenges • First there was the Assembly The DNA sequence is so long that no technology can read it all at once, so it was broken into pieces. There were millions of clones (small sequence fragments). The assembly process included finding where the pieces overlapped in order to put the draft together. 3,200,000 piece puzzle anyone?
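Finding where the pieces overlap can be illustrated with a toy greedy assembler. This is a minimal sketch under strong assumptions (error-free fragments, no repeats, a single strand); real assemblers are far more sophisticated, and the function names and test fragments here are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    pieces = ["ATGCGTA", "CGTACCT", "ACCTGA"]  # made-up fragments
    print(greedy_assemble(pieces))
```

The pairwise comparison is quadratic in the number of fragments, which hints at why a puzzle with millions of pieces needed serious computing.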
The Completion of the Human Genome Sequence • On June 26, 2000, President Clinton, with J. Craig Venter and Francis Collins, announced completion of "the first survey of the entire human genome”: an 80% working draft. • Publication of 90 percent of the sequence in the February 2001 issue of the journal Nature. • Completion of 99.99% of the genome as finished sequence in April 2003. Image Courtesy: Google Images
Human Genome is finally Sequenced!!! But…the Project is not Done… • Next there is the Annotation: The sequence is like a topographical map; the annotation would add cities, towns, schools, libraries and coffee shops! So, where are the genes? • How do genes function? • How do we use this information for scientific understanding? • How does it benefit or improve health care?
What do genes do anyway? • As per current estimates, we only have ~27,000 genes! That means each gene has to do a lot! • Genes make proteins that make up nearly all we are (bones, muscles, hair, eyes, etc.). • Almost everything that happens in our bodies happens because of proteins (walking, digestion, fighting disease). Eye Color and Hair Color are determined by genes Image Courtesy: Google Images
Of Mice and Men: It’s all in the genes Humans and Mice have about the same number of genes. But then why are we so different from each other? How is this possible? Did you say cheese? Mmm, Cheese! • Our genome is almost identical to the chimpanzee's! • It’s not just the differences but the similarities that hold keys to many biological locks! • While one human gene can make many different proteins, a mouse gene can only make a few……… probably! Image Courtesy: Google Images
Genes are important • By selecting different pieces of a gene, your body can make many kinds of proteins. (This process is called alternative splicing.) • If a gene is “expressed” that means it is turned on and it will make proteins.
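The idea that selecting different pieces of one gene yields many proteins can be illustrated by enumerating selections of exons. This is a deliberately simplified sketch: the exon sequences are hypothetical, and in a real cell splicing is tightly regulated, so only some of these combinations would ever occur.

```python
from itertools import combinations


def splice_variants(exons, min_exons=1):
    """Yield every selection of exons, keeping their order in the gene."""
    for count in range(min_exons, len(exons) + 1):
        for chosen in combinations(exons, count):
            yield "".join(chosen)


if __name__ == "__main__":
    exons = ["ATGGCC", "GATTAC", "TTGTAA"]  # hypothetical exon sequences
    for transcript in splice_variants(exons):
        print(transcript)
```

With n exons there are at most 2^n - 1 non-empty selections, so even a modest number of exons allows a large number of possible products.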
What we’ve learned from our genome so far… • There are a relatively small number of human genes, less than 30,000, but they have a complex architecture that we are only beginning to understand and appreciate. • We know where 85% of genes are in the sequence. • We don’t know where the other 15% are because we haven’t seen them “on” (they may only be expressed during fetal development). • We only know what about 50% of our genes do so far. • So it is relatively easy to locate genes in the genome, but it is hard to figure out what they do.
How do scientists find genes? • The genome is so large that useful information is hard to find. • Researchers use a computational microscope to help scientists search the genome. • Just as you would use “google” to find something on the internet, researchers can use the “Genome Browser” to find information in the human genome. Image Courtesy: Google Images
The Continuing Project • Finding the complete set of genes and annotating the entire sequence. Annotation is like detailing; scientists annotate sequence by listing what has been learnt experimentally and computationally about its function. • Proteomics is studying the structure and function of groups of proteins. Proteins are really important, but we don’t really understand how they work. • Comparative Genomics is the process of comparing different genomes in order to better understand what they do and how they work. Like comparing humans, chimpanzees, and mice that are all mammals but all quite different. Image Courtesy: Google Images
Who works on this stuff anyway? • Biologists and Chemists understand the physical sciences - they take biology and chemistry classes. • Computer Scientists program the computers (the same people who make video games!) - they take math and computer classes. • Computer Engineers try to build better, faster, smarter computers - they take math, physics and computer classes. • Social Scientists try to understand how this new information and technology will impact our lives - they take sociology and philosophy classes.
How can I work on this project, or something like it? • Read about it online at http://www.genome.gov, or in Nature, Science, or other scientific magazines. • Take biology, chemistry, mathematics and physics classes in high school. • Go to college and get a degree in science, engineering, mathematics, or social sciences.
Bioinformatics Opportunities • Ph.D.: Director/Professor - University, Company (Pharmaceutical), National Laboratory, Research Foundation • M.S. (M.A.): Research Staff - Company/University, National Laboratory, Research Foundation; Teaching - Community College, Public Schools • B.S. (B.A.): Entry-Level - Company, National Laboratory; Teaching - Private Schools • Majors: Bioinformatics, Biochemistry, Biology, Computer Science, Computer Engineering, Mathematics, Physics, Linguistics, Education, Sociology, Philosophy, Psychology, Community Studies • A research degree in any of these majors will take you far!
now…. The number 1 FAQ How much biology should I know?? No simple or straight-forward answer… unfortunately! But the mantra is: Take the classes and Interact routinely with biologists OR Work with the biologists or the biological data High School Senior Summer Internship http://www.cincinnatichildrens.org/ed/research/undergrad/hs/default.htm Summer Undergraduate Research Fellowship http://www.cincinnatichildrens.org/ed/research/undergrad/surf/default.htm
But I want to start with some basics.. • http://www.ncbi.nlm.nih.gov/Education • http://www.ebi.ac.uk/2can/ • http://www.genome.gov/Education/ • http://genomics.energy.gov/ • Books • Introduction to Bioinformatics by Teresa Attwood, David Parry-Smith • A Primer of Genome Science by Gibson G and Muse SV • Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition by Andreas D. Baxevanis, B. F. Francis Ouellette • Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology by Dan Gusfield • Bioinformatics: Sequence and Genome Analysis by David W. Mount • Discovering Genomics, Proteomics, and Bioinformatics by A. Malcolm Campbell and Laurie J. Heyer
Biological Challenges - Computer Engineers • Post-genomic Era and the goal of bio-medicine • to develop a quantitative understanding of how living things are built from the genome that encodes them. • Deciphering the genome code • Identifying unknown genes and assigning function by computational analysis of genomic sequence • Identifying the regulatory mechanisms • Identifying their role in normal development/states vs disease states
Biological Challenges - Computer Engineers • Data Deluge: exponential growth of data silos and different data types • Human-computer interaction specialists need to work closely with academic and clinical biomedical researchers to not only manage the data deluge but to convert information into knowledge. • Biological data is very complex and interlinked! • Creating information systems that allow biologists to seamlessly follow these links without getting lost in a sea of information - a huge opportunity for computer scientists.
Biological Challenges - Computer Engineers A major goal in molecular biology is Functional Genomics – Study of the relationships among genes in DNA & their function – in normal and disease states • Networks, networks, and networks! • Each gene in the genome is not an independent entity. Multiple genes interact to perform a specific function. • Environmental influences – Genotype-environment interaction • Integrating genomic and biochemical data together into quantitative and predictive models of biochemistry and physiology • Computer scientists, mathematicians, and statisticians will ALL be an integral and critical part of this effort.
Informatics – Biologists’ Expectations • Representation, Organization, Manipulation, Distribution, Maintenance, and Use of information, particularly in digital form. • Functional aspect of bioinformatics: Representation, Storage, and Distribution of data. • Intelligent design of data formats and databases • Creation of tools to query those databases • Development of user interfaces or visualizations that bring together different tools to allow the user to ask complex questions or put forth testable hypotheses.
Informatics – Biologists’ Expectations • Developing analytical tools to discover knowledge in data • Levels at which biological information is used: • comparing sequences – predict function of a newly discovered gene • breaking down known 3D protein structures into bits to find patterns that can help predict how the protein folds • modeling how proteins and metabolites in a cell work together to make the cell function…….
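Comparing sequences usually starts with an alignment score. The sketch below uses the classic Needleman-Wunsch dynamic program for global alignment; the scoring values (+1 match, -1 mismatch, -1 gap) are assumptions picked for simplicity, not values from the slides.

```python
def alignment_score(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of `a` and `b` (Needleman-Wunsch)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(alignment_score("GATTACA", "GATTTACA"))
```

A new gene that scores highly against a gene of known function is a candidate for sharing that function, which is the prediction step the slide refers to.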
Finally….What does informatics mean to biologists? The ultimate goal of analytical bioinformaticians is to develop predictive methods that allow biomedical researchers and scientists to model the function and phenotype of an organism based only on its genomic sequence. This is a grand goal, and one that will be approached only in small steps, by many scientists from different but allied disciplines working cohesively.
Biology – Data Structures Four broad categories: Strings: To represent DNA, RNA, amino acid sequences of proteins Trees: To represent the evolution of various organisms (Taxonomy) or structured knowledge (Ontologies) Sets of 3D points and their linkages: To represent protein structures Graphs: To represent metabolic, regulatory, and signaling networks or pathways
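The four categories map onto ordinary programming constructs. The sketch below is one possible representation, not a standard; the class name `Structure`, the toy coordinates, and the pathway node names are hypothetical.

```python
from dataclasses import dataclass, field

# Strings: DNA, RNA, or protein sequences.
dna = "ATGGCC"

# Trees: a taxonomy as nested dicts (parent -> children).
taxonomy = {
    "Mammalia": {
        "Primates": {"Homo sapiens": {}, "Pan troglodytes": {}},
        "Rodentia": {"Mus musculus": {}},
    }
}


# Sets of 3D points and their linkages: a protein structure.
@dataclass
class Structure:
    coords: dict = field(default_factory=dict)  # atom name -> (x, y, z)
    bonds: set = field(default_factory=set)     # frozenset({atom1, atom2})


# Graphs: a pathway as an adjacency list (node -> downstream nodes).
pathway = {"A": ["B"], "B": ["C", "D"], "C": [], "D": []}


def leaf_names(tree: dict) -> list:
    """Collect the names at the tips of a nested-dict tree."""
    names = []
    for name, children in tree.items():
        names.extend(leaf_names(children) if children else [name])
    return names


def reachable(graph: dict, start: str) -> set:
    """All nodes that can be reached from `start`, including itself."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


if __name__ == "__main__":
    toy = Structure(coords={"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0)},
                    bonds={frozenset({"N", "CA"})})
    print(leaf_names(taxonomy), reachable(pathway, "B"), len(toy.bonds))
```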
Biology – Data Structures • Biologists are also interested in • Substrings • Subtrees • Subsets of points and linkages, and • Subgraphs. Beware: Biological data is often characterized by huge size, the presence of laboratory errors(noise), duplication, and sometimes unreliability.
Support Complex Queries – A typical demand • Get me all genes involved in or associated with brain development that are differentially expressed in the Central Nervous System. • Get me all genes involved in brain development in human and mouse that also show iron ion binding activity. • For this set of genes, what aspects of function and/or cellular localization do they share? • For this set of genes, what mutations are reported to cause pathological conditions?
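Queries like these amount to filtering annotation records and intersecting the results. The sketch below assumes a flat, in-memory list of annotations; the record layout and the gene names (`GENE_A`, `GENE_B`, `GENE_C`) are invented placeholders, not real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str
    species: str
    processes: frozenset
    functions: frozenset


# Placeholder records for illustration only.
RECORDS = [
    Annotation("GENE_A", "human", frozenset({"brain development"}),
               frozenset({"iron ion binding"})),
    Annotation("GENE_A", "mouse", frozenset({"brain development"}),
               frozenset({"iron ion binding"})),
    Annotation("GENE_B", "human", frozenset({"brain development"}),
               frozenset({"DNA binding"})),
    Annotation("GENE_C", "mouse", frozenset({"heart development"}),
               frozenset({"iron ion binding"})),
]


def genes_matching(records, species, process, function):
    """Genes of one species annotated with both the process and the function."""
    return {r.gene for r in records
            if r.species == species
            and process in r.processes
            and function in r.functions}


def in_all_species(records, species_list, process, function):
    """Genes that satisfy the query in every listed species."""
    per_species = [genes_matching(records, s, process, function)
                   for s in species_list]
    return set.intersection(*per_species)


if __name__ == "__main__":
    print(in_all_species(RECORDS, ["human", "mouse"],
                         "brain development", "iron ion binding"))
```

In practice the records live in several databases with different identifiers, which is what makes such a simple-looking question hard to answer.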
Model Organism Databases: Common Issues • Heterogeneous Data Sets - Data Integration • From Genotype to Phenotype • Experimental and Consensus Views • Incorporation of Large Datasets • Whole genome annotation pipelines • Large scale mutagenesis/variation projects (dbSNP) • Computational vs. Literature-based Data Collection and Evaluation (MedLine) • Data Mining • extraction of new knowledge • testable hypotheses (Hypothesis Generation)
Human Genome Project – Data Deluge No. of Human Gene Records currently in NCBI: 29413 (excluding pseudogenes, mitochondrial genes and obsolete records). Includes ~460 microRNAs. NCBI Human Genome Statistics – as of February 12, 2008
The Gene Expression Data Deluge Till 2000: 413 papers on microarray! Problems Deluge! Allison DB, Cui X, Page GP, Sabripour M. 2006. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 7(1): 55-65.
Information Deluge….. A researcher would have to scan 130 different journals and read 27 papers per day to follow a single disease, such as breast cancer (Baasiri et al., 1999 Oncogene 18: 7958-7965). • 3 scientific journals in 1750 • Now - >120,000 scientific journals! • >500,000 medical articles/year • >4,000,000 scientific articles/year • >16 million abstracts in PubMed derived from >32,500 journals
Data-driven Problems….. • How to name or describe proteins, genes, drugs, diseases and conditions consistently and coherently? • How to ascribe and name a function, process or location consistently? • How to describe interactions, partners, reactions and complexes?
Some Solutions…… • Develop/Use controlled or restricted vocabularies (IUPAC-like naming conventions, HGNC, MGI, UMLS, etc.) • Create/Use thesauruses, central repositories or synonym lists (MeSH, UMLS, etc.) • Work towards synoptic reporting and structured abstracting
Gene Nomenclature: What’s in a name! Rose is a rose is a rose is a rose! • Generally, the names refer to some feature of the mutant phenotype • Dickie’s small eye (Theiler et al., 1978, Anat Embryol (Berl), 155: 81-86) is now Pax6 • Gleeful: "This gene encodes a C2H2 zinc finger transcription factor with high sequence similarity to vertebrate Gli proteins, so we have named the gene gleeful (Gfl)." (Furlong et al., 2001, Science 293: 1632)
Gene names • Accelerin • Antiquitin • Bang Senseless • Bride of Sevenless • Christmas Factor • Cockeye • Crack • Dickie’s small eye • Draculin • Fidgetin • Gleeful • Knobhead • Lunatic Fringe • Mortalin • Orphanin • Profilactin • Sonic Hedgehog
…and there are some funny, weird, and ambiguous ones too.. • LOL: lysyl oxidase-like 1 • AR*E: aryl sulfatase E in all species • f**K: fuculokinase gene in bacteria • ADA: American Dental Association OR American Diabetes Association OR Adenosine Deaminase OR Ada programming language (based on PASCAL)
Disease names • Mobius Syndrome with Poland’s Anomaly • Werner’s syndrome • Down’s syndrome • Angelman’s syndrome • Creutzfeldt-Jakob disease
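A synonym list is, at its simplest, a lookup table from surface names to preferred terms. The sketch below uses only names from the slide; the domain labels ("gene", "organization", "programming language") and the function names are assumptions added for the example, and real vocabularies such as UMLS or HGNC are far richer.

```python
from collections import defaultdict

# (name as written, assumed domain label, preferred term)
ENTRIES = [
    ("ADA", "organization", "American Dental Association"),
    ("ADA", "organization", "American Diabetes Association"),
    ("ADA", "gene", "Adenosine Deaminase"),
    ("ADA", "programming language", "Ada"),
    ("Dickie's small eye", "gene", "Pax6"),
    ("Gfl", "gene", "gleeful"),
]


def build_index(entries):
    """Map each lower-cased name to every (domain, preferred term) it can mean."""
    index = defaultdict(list)
    for name, domain, preferred in entries:
        index[name.lower()].append((domain, preferred))
    return index


def resolve(index, name, domain=None):
    """Return the candidate meanings of `name`, optionally limited to a domain."""
    hits = index.get(name.lower(), [])
    return [term for d, term in hits if domain is None or d == domain]


INDEX = build_index(ENTRIES)

if __name__ == "__main__":
    print(resolve(INDEX, "ADA"))          # ambiguous without context
    print(resolve(INDEX, "ADA", "gene"))  # context narrows it down
```

The lookup itself is trivial; the hard part is curating the entries and knowing which domain a given mention belongs to.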
Rose is a rose is a rose is a rose….. Not Really! What is a cell? • any small compartment • (biology) the basic structural and functional unit of all organisms; they may exist as independent units of life (as in monads) or may form colonies or tissues as in higher plants and animals • a device that delivers an electric current as a result of chemical reaction • a small unit serving as part of or as the nucleus of a larger political movement • cellular telephone: a hand-held mobile radiotelephone for use in an area divided into small sections, each with its own short-range transmitter/receiver • small room in which a monk or nun lives • a room where a prisoner is kept Image Sources: Somewhere from the internet and Google Images
Semantic Groups, Types and Concepts: • Semantic Group Biology – Semantic Type Cell • Semantic Groups Object OR Devices – Semantic Types Manufactured Device or Electrical Device or Communication Device • Semantic Group Organization – Semantic Type Political Group Foundation Model Explorer
The REAL Problems Many disease states are complex, because of many genes (alleles & ethnicity, gene families, etc.), environmental effects (life style, exposure, etc.) and the interactions. Hepatocellular Carcinoma: CTNNB1, TP53*, MET. Environmental Effects: aflatoxin B1, a mycotoxin, induces a very specific G-to-T mutation at codon 249 in the tumor suppressor gene p53 (TP53: HEPATOCELLULAR CARCINOMA, SOMATIC [ARG249SER]). Allelic variants • COLORECTAL CANCER [3-BP DEL, SER45DEL] • COLORECTAL CANCER [SER33TYR] • PILOMATRICOMA, SOMATIC [SER33TYR] • HEPATOBLASTOMA, SOMATIC [THR41ALA] • DESMOID TUMOR, SOMATIC [THR41ALA] • PILOMATRICOMA, SOMATIC [ASP32GLY] • OVARIAN CARCINOMA, ENDOMETRIOID TYPE, SOMATIC [SER37CYS] • HEPATOCELLULAR CARCINOMA, SOMATIC [SER45PHE] • HEPATOCELLULAR CARCINOMA, SOMATIC [SER45PRO] • MEDULLOBLASTOMA, SOMATIC [SER33PHE]
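The ARG249SER change can be traced through the genetic code. The slide gives only the G-to-T substitution and the codon number; the sketch below assumes the wild-type codon is AGG and that the third base is the one altered, and it lists only the few codon table entries needed.

```python
# A small excerpt of the standard genetic code (DNA codons).
CODON_TABLE = {"AGG": "Arg", "AGA": "Arg", "AGT": "Ser", "AGC": "Ser"}


def mutate(codon: str, position: int, new_base: str) -> str:
    """Return `codon` with the base at `position` (0-based) replaced."""
    return codon[:position] + new_base + codon[position + 1:]


wild_type = "AGG"                   # assumed codon 249 of TP53
mutant = mutate(wild_type, 2, "T")  # the G-to-T change named on the slide

print(f"{wild_type} ({CODON_TABLE[wild_type]}) -> {mutant} ({CODON_TABLE[mutant]})")
```

A single base change is enough to swap arginine for serine, which is why one environmental exposure can leave such a specific molecular signature.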
ALK in cardiac myocytes • Cell to Cell Adhesion Signaling • Inactivation of Gsk3 by AKT causes accumulation of b-catenin in Alveolar Macrophages • Multi-step Regulation of Transcription by Pitx2 • Presenilin action in Notch and Wnt signaling • Trefoil Factors Initiate Mucosal Healing • WNT Signaling Pathway • HEPATOCELLULAR CARCINOMA • LIVER: • Hepatocellular carcinoma; • Micronodular cirrhosis; • Subacute progressive viral hepatitis • NEOPLASIA: • Primary liver cancer • CBL mediated ligand-induced downregulation of EGF receptors • Signaling of Hepatocyte Growth Factor Receptor CTNNB1 MET • Estrogen-responsive protein Efp controls cell cycle and breast tumors growth • ATM Signaling Pathway • BTG family proteins and cell cycle regulation • Cell Cycle • RB Tumor Suppressor/Checkpoint Signaling in response to DNA damage • Regulation of transcriptional activity by PML • Regulation of cell cycle progression by Plk3 • Hypoxia and p53 in the Cardiovascular system • p53 Signaling Pathway • Apoptotic Signaling in Response to DNA Damage • Role of BRCA1, BRCA2 and ATR in Cancer Susceptibility….Many More….. TP53 The REAL Problems
Methods for Integration • Link driven federations • Explicit links between databanks. • Warehousing • Data is downloaded, filtered, integrated and stored in a warehouse. Answers to queries are taken from the warehouse. • Others….. Semantic Web, etc………
Link-driven Federations • Creates explicit links between databanks • query: get interesting results and use web links to reach related data in other databanks • Examples: NCBI-Entrez, SRS
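The difference between a link-driven federation and a warehouse can be shown with two toy databanks. Everything in this sketch is hypothetical (the identifiers, the record layout, the function names); only the TP53 and hepatocellular carcinoma association is taken from the slides, and real systems such as Entrez are far larger.

```python
# Two independent mini-databanks, each with its own identifiers.
GENE_DB = {"gene:1": {"symbol": "TP53", "disease_ids": ["dis:9"]}}
DISEASE_DB = {"dis:9": {"name": "Hepatocellular carcinoma"}}


def federated_lookup(gene_id: str) -> list:
    """Link-driven: query one databank, then follow its explicit links."""
    gene = GENE_DB[gene_id]
    return [DISEASE_DB[d]["name"] for d in gene["disease_ids"]]


def build_warehouse() -> dict:
    """Warehousing: integrate the sources once and store the joined result."""
    return {
        gene["symbol"]: [DISEASE_DB[d]["name"] for d in gene["disease_ids"]]
        for gene in GENE_DB.values()
    }


if __name__ == "__main__":
    print(federated_lookup("gene:1"))  # resolved at query time
    warehouse = build_warehouse()      # resolved ahead of time
    print(warehouse["TP53"])
```

The federated version always sees current data but depends on every source being reachable; the warehouse answers from its own copy and must be rebuilt when the sources change.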