
Bioinformatics

Bioinformatics. Stuart M. Brown, Ph.D. Director, Research Computing, NYU School of Medicine. Genomic Biology as a Quantitative Science. A Genome Revolution is underway in Biology and Medicine. We are in the midst of a "Golden Era" of biology


Presentation Transcript


  1. Bioinformatics Stuart M. Brown, Ph.D. Director, Research Computing, NYU School of Medicine Genomic Biology as a Quantitative Science

  2. A Genome Revolution is underway in Biology and Medicine • We are in the midst of a "Golden Era" of biology • The Human Genome Project has produced a huge storehouse of data that will be used to change every aspect of biological research and medicine • The revolution is about treating biology as an information science, not about specific technologies.

  3. The Human Genome Project

  4. The job of the biologist is changing As more biological information becomes available and laboratory equipment becomes more automated ... • The biologist will spend more time using computers & on experimental design and data analysis (and less time doing tedious lab biochemistry) • Biology will become a more quantitative science (think how the periodic table affected chemistry)

  5. Biological Information Protein 2-D gel mRNA Expression Protein 3-D Structure Mass Spec. Genome sequence The Cell

  6. A review of some basic genetics

  7. DNA • 4 bases (G, C, T, A) • base pairs G--C T--A • genes • non-coding regions
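The pairing rule on this slide (G with C, T with A) is enough to compute the opposite strand of any sequence. A minimal sketch, assuming plain uppercase or lowercase A/C/G/T input with no ambiguity codes; the function names are illustrative and not from the presentation:

```python
# Watson-Crick pairing from the slide: G pairs with C, T pairs with A.
PAIRS = {"G": "C", "C": "G", "T": "A", "A": "T"}


def complement(seq):
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in seq.upper())


def reverse_complement(seq):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    print(reverse_complement("GATTACA"))
```

The reversal matters because the two strands run in opposite directions, so the partner strand is read backwards relative to the input.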

  8. Decoding Genes

  9. Classic Molecular Biology • A gene is a DNA sequence at a particular locus on a chromosome that encodes a protein. • The Central Dogma of Molecular Biology: DNA → RNA → Protein • A mutation changes the DNA sequence - leads to a change in protein sequence - or no protein. • Alleles are slightly different DNA sequences of the same gene.
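The flow from DNA to RNA to protein, and the effect of a mutation on the protein, can be sketched in a few lines. This is an illustration only: it assumes the input is the coding strand starting at the first codon, uses the standard genetic code, and the toy sequences below are made up rather than real genes.

```python
# Standard genetic code, built from the usual U, C, A, G codon ordering.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna):
    """DNA -> RNA: the mRNA matches the coding strand with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """RNA -> protein: read codons in frame until a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    normal = "ATGGAGAAGTAA"    # hypothetical toy gene
    missense = "ATGGTGAAGTAA"  # one base changed: different amino acid
    nonsense = "ATGTAGAAGTAA"  # one base changed: early stop, truncated protein
    for dna in (normal, missense, nonsense):
        print(dna, translate(transcribe(dna)))
```

The two mutant sequences correspond to the two outcomes named on the slide: a changed protein sequence, or effectively no protein.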

  10. The human genome is the complete DNA content of the 23 pairs of human chromosomes - 44 autosomes plus two sex chromosomes - approximately 3.2 billion base pairs.

  11. Bold Words from Francis Collins: “The history of biology was forever altered a decade ago by the bold decision to launch a research program that would characterize in ultimate detail the complete set of genetic instructions of the human being.” Francis S. Collins, Director of the National Human Genome Research Institute. N Engl J Med 1999 882:42-65

  12. Genome Projects • Complete genomic sequences: • Dozens of microorganisms • Yeast, C. elegans, Drosophila • Mouse • Human • Comparative genomics • All this data is enabling new kinds of research - for those with the computational skills to take advantage of it.

  13. How does genome sequencing technology work? • Molecular biology of the Sanger method • Sub-cloning of fragments - BAC, PAC, cosmid, plasmid, phage • Automated sequencers • The need for computers to assemble the "reads" and manage the workflow
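To give a feel for why computers are needed to assemble the "reads", here is a toy greedy overlap assembler. It is a sketch under strong assumptions (error-free reads, a single strand, no repeats) and is not the method used by any genome center; the function names and the example reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Three overlapping toy reads, given out of order.
    print(greedy_assemble(["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]))
```

Even this toy version compares every read against every other read, which hints at why real projects with millions of reads need serious software and workflow management.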

  14. Automated sequencing machines, particularly those made by PE Applied Biosystems, use 4 colors, so they can read all 4 bases at once.

  15. Raw Genome Data:

  16. Lots of Sequence Data • How to extract useful knowledge from all of this data? • Need sophisticated computer tools • Find the genes • Figure out what they do (function) • Diagnostic tests • Medical treatments

  17. Finding genes in genome sequence is not easy • About 1% of human DNA encodes functional genes. • Genes are interspersed among long stretches of non-coding DNA. • Repeats, pseudo-genes, and introns confound matters

  18. Gene prediction tools - look for Start and Stop codons, intron splice sites, similarity to known genes and cDNAs, etc.
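The simplest of these signals, a Start codon followed in the same reading frame by a Stop codon, can be searched for directly. The sketch below assumes a forward-strand-only scan with no introns, so it is far cruder than a real gene predictor; `find_orfs` and `min_codons` are illustrative names, and the test sequence is made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Find ATG...stop open reading frames in the three forward frames.

    Returns (start, end, frame) tuples; end is exclusive and includes the
    stop codon. min_codons counts codons before the stop, including ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    dna = "CCATGGCTGAATTTTAACC"  # hypothetical toy sequence
    for start, end, frame in find_orfs(dna):
        print(frame, dna[start:end])
```

Note that the toy sequence contains a TGA that is skipped because it is out of frame relative to the ATG, which is one small reason frame bookkeeping matters.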

  19. Data Mining Tools • Scientists need to work with a lot of layers of information about the genome • coding sequence of known genes and cDNAs • genetic maps (known mutations and markers) • gene expression • Protein sequence (from mass spectrometry) • cross species homology • Most of the best tools are free on the Web

  20. UCSC

  21. Ensembl at EBI/EMBL

  22. What comes after Genome Sequencing? • We are now in the "Post-Genomic" era. • It is possible to use the genome sequence plus a variety of automated laboratory equipment to do entirely new kinds of biology. • Not just scaled-up, but comprehensive

  23. Relate genes to Organisms • Diseases • OMIM: Human Genetic Disease • Metabolic and regulatory pathways • KEGG • Cancer Genome Project

  24. Human Alleles • The OMIM (Online Mendelian Inheritance in Man) database at the NCBI tracks all human mutations with known phenotypes. • It contains a total of about 2,000 genetic diseases [and another ~11,000 genetic loci with known phenotypes - but not necessarily known gene sequences] • It is designed for use by physicians: • can search by disease name • contains summaries from clinical studies

  25. KEGG: Kyoto Encyclopedia of Genes and Genomes • Enzymatic and regulatory pathways • Mapped out by EC number and cross-referenced to genes in all known organisms (wherever sequence information exists) • Parallel maps of regulatory pathways

  26. Genomics • What is Genomics? • An operational definition: • The application of high throughput automated technologies to molecular biology. • A philosophical definition: • A holistic or systems approach to the study of information flow within a cell.

  27. Genomics Technologies • Automated DNA sequencing • Automated annotation of sequences • DNA microarrays • gene expression (measure RNA levels) • SNP Genotyping • Genome diagnostics (genetic testing) • Proteomics • Protein identification • Protein-protein interactions

  28. DNA chip microarrays • Put a large number (~100K) of cDNA sequences or synthetic DNA oligomers onto a glass slide (or other substrate) in known locations on a grid. • Label an RNA sample and hybridize • Measure amounts of RNA bound to each square in the grid • Make comparisons • Cancerous vs. normal tissue • Treated vs. untreated • Time course • Many applications in both basic and clinical research
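The "make comparisons" step is commonly expressed as a log2 ratio of the two intensities for each spot. A minimal sketch, assuming already background-corrected and normalized intensities; the gene names, the pseudocount, and the two-fold cutoff are illustrative assumptions, not values from the presentation.

```python
import math


def log2_ratios(sample, reference, pseudocount=1.0):
    """log2(sample / reference) per gene; the pseudocount avoids division by zero."""
    return {
        gene: math.log2((sample[gene] + pseudocount) / (reference[gene] + pseudocount))
        for gene in sample
        if gene in reference
    }


def changed_genes(ratios, threshold=1.0):
    """Genes whose expression changed at least 2**threshold-fold in either direction."""
    return sorted(gene for gene, r in ratios.items() if abs(r) >= threshold)


if __name__ == "__main__":
    # Hypothetical spot intensities for three genes.
    tumor = {"geneA": 799, "geneB": 99, "geneC": 49}
    normal = {"geneA": 99, "geneB": 99, "geneC": 199}
    ratios = log2_ratios(tumor, normal)
    print(ratios)
    print(changed_genes(ratios))
```

Working on the log2 scale makes an eight-fold increase (+3) and an eight-fold decrease (-3) symmetric, which a raw ratio does not.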

  29. Spot your own Chip (plans available for free from Pat Brown’s website) Robot spotter Ordinary glass microscope slide

  30. cDNA spotted microarrays

  31. Goal of Microarray experiments • Microarrays are a very good way of identifying a set of genes involved in a disease process • Differences between cancer and normal tissue • Tuberculosis infected vs resistant lung cells • Mapping out a pathway • Co-regulated genes • Finding function for unknown genes • Involved in these processes
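One common way to look for co-regulated genes is to correlate their expression profiles across a time course or a set of conditions. The sketch below uses Pearson correlation as an assumed similarity measure (the presentation does not name one); the gene names, profiles, and the 0.9 cutoff are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coregulated_pairs(profiles, min_r=0.9):
    """Return gene pairs whose profiles correlate at or above min_r."""
    genes = sorted(profiles)
    return [
        (g, h)
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
        if pearson(profiles[g], profiles[h]) >= min_r
    ]


if __name__ == "__main__":
    # Hypothetical expression levels at four time points.
    profiles = {
        "geneA": [1, 2, 4, 8],
        "geneB": [2, 4, 8, 16],
        "geneC": [8, 4, 2, 1],
    }
    print(coregulated_pairs(profiles))
```

An unknown gene that tracks closely with genes of known function becomes a candidate for involvement in the same process, which is the "finding function for unknown genes" idea on the slide.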

  32. Direct Medical Applications • Diagnosis • Type of cancer • Aggressive or benign? • Monitor treatment outcome • Is a treatment having the desired effect on the target tissue?

  33. When you go looking…

  34. …you will certainly find something!

  35. Human Genetic Variation • Every human has essentially the same set of genes • But there are different forms of each gene -- known as alleles • blue vs. brown eyes • genetic diseases such as cystic fibrosis or Huntington’s disease are caused by dysfunctional alleles

  36. Alleles are created by mutations in the DNA sequence of one person - which are passed on to their descendants

  37. Clinical Manifestations of Genetic Variation (All disease has a genetic component) • Susceptibility vs. resistance • Variations in disease severity or symptoms • Reaction to drugs (pharmacogenetics) All of these traits can be traced back to particular genes (or sets of genes)

  38. Pharmacogenomics • People react differently to drugs • Side effects • Variable effectiveness • There are genes that control these reactions • SNP markers can be used to identify these genes (profiles)

  39. Use the Profiles • Genetic profiles of new patients can then be used to prescribe drugs more effectively & avoid adverse reactions. • Sell a drug with a gene test • Can also speed clinical trials by testing on those who are likely to respond well.

  40. Toxicogenomics • There are a number of common pathways for drug toxicity (or environmental tox.) • It is possible to compile genomic signatures (gene expression data) for these pathways. • Candidate drug molecules can be screened in cell culture or in animals for induction of these toxicity pathways.

  41. Planning for a Genomics Revolution • Bioinformatics support must be integral in the planning process for the development of new genomics research facilities. • Genome Project sequencing centers have more staff and more $$$ spent on data analysis than on the sequencing itself. • Microarray facilities will be even more skewed toward data analysis • It is an information-intensive business!
