Topics
• Basis of Bioinformatics
• Goals of Bioinformatics
• Bioinformatics Jargon 101
Lecture 1 CS566
Basis of Bioinformatics
• What makes Bioinformatics possible?
  • Advances in Biotechnology
    • PCR, Sequencing, Shotgun sequencing, Large scale data generation
  • Advances in Computer hardware
  • Advent of the WWW
  • Representation of problems amenable to Statistics and Computer Science
  • Evolutionary underpinnings of life
Biotechnology: Polymerase Chain Reaction
• The antithesis, happily, of NP-completeness
• Used to form exact copies of a section of DNA
• Doubling of template per cycle, i.e., after n cycles, 2^n copies of DNA
• Advantages:
  • Precise subsequence can be selected using appropriate primers
  • Can create large amounts from a small sample
• Sine qua non for DNA sequencing projects, and a lot of experimental biology
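The doubling rule on this slide can be checked with a few lines of Python. This is only an idealized sketch: it assumes every template is copied perfectly in every cycle, which real reactions do not achieve, and the function name `pcr_copies` is made up for illustration.

```python
def pcr_copies(initial_templates: int, cycles: int) -> int:
    """Idealized PCR yield: every cycle doubles every template."""
    if initial_templates < 0 or cycles < 0:
        raise ValueError("initial_templates and cycles must be non-negative")
    # One doubling per cycle gives initial * 2^cycles copies.
    return initial_templates * 2 ** cycles


if __name__ == "__main__":
    # Show how quickly a single template molecule is amplified.
    for n in (1, 10, 20, 30):
        print(f"after {n:2d} cycles: {pcr_copies(1, n):,} copies")
```

Under this assumption a single molecule exceeds a billion copies after 30 cycles, which is why a small sample is enough to start from.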
Biotechnology: Sequencing
• Analogy: Reading a phrase
• Assumption: Can read only one letter at a time
• Start with copies of the phrase to be read
• Allow several cycles of PCR to proceed
• At any moment in time, the entire set of partial phrases is present (all having the same start point)
• Freeze
• Arrange phrases by size and just read the terminal letter of each
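The phrase-reading analogy can be sketched in Python. This is a toy model of the analogy only, not of the chemistry: it assumes the frozen mixture contains exactly one partial phrase of every length, and the function names are hypothetical.

```python
import random


def frozen_mixture(phrase: str, rng: random.Random) -> list[str]:
    """All partial copies of the phrase (same start point), in random order."""
    partials = [phrase[:k] for k in range(1, len(phrase) + 1)]
    rng.shuffle(partials)  # the mixture has no inherent order
    return partials


def read_by_size(partials: list[str]) -> str:
    """Arrange the partial phrases by size and read only each terminal letter."""
    ordered = sorted(partials, key=len)
    return "".join(p[-1] for p in ordered)


if __name__ == "__main__":
    mixture = frozen_mixture("This is the best course", random.Random(0))
    print(read_by_size(mixture))
```

Only one letter per fragment is ever read, yet sorting by size recovers the full phrase.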
Biotechnology: Sequencing
“This is the best course I’ve ever taken”
Partial phrases, arranged by size:
T
Th
Thi
This
This (followed by a space)
This i
This is

Shotgun sequencing
Overlapping fragments:
This is the best cou
the best course I’ve
I’ve ever taken
Assembled sentence:
This is the best course I’ve ever taken
Shotgun Sequencing
• Analogy: Reading a long sentence, indirectly
• Fragment a few copies of a sentence into phrases, randomly
• Find the order of characters in each phrase
• Find overlaps between phrases
• Assemble phrases into the original sentence
• ‘Shotgun’ refers to parallel sequencing of multiple ‘phrases’
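The overlap and assembly steps can be illustrated with a small greedy merge in Python. This is a sketch under simplifying assumptions (error-free fragments, exact suffix-to-prefix overlaps, no repeated text); real assemblers are far more involved, and the function names here are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str]) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; leave separate pieces
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    phrases = ["I've ever taken", "This is the best cou", "the best course I've"]
    print(greedy_assemble(phrases))
```

The input order of the phrases does not matter here, because the merge order is driven by overlap length alone. Greedy merging can go wrong when the sentence contains repeated phrases, which is one reason several copies are fragmented.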
Large Scale Data Generation
• Sequencing robots permit complete sequences to be obtained in a short time
• Expression arrays allow for simultaneous measurement of the activity of thousands of genes
• Mass spectrometric pipelines allow for the simultaneous identification of several proteins
• Autoanalyzers allow the automation of measurement of numerous chemicals
Advances in Computer Hardware
• Exponential increase in biological data has been matched by Moore’s law: periodic doubling of transistor counts, and with it computing power
• Memory and disk sizes have kept pace with the increase in data volumes (from 1.44 MB floppy disks to petabytes)
• Clustering allows for handling of many of the parallel problems in biology (IBM’s many shades of blue...)

Role of the WWW
• Wide range of data and analysis tools just a few clicks away (an oversimplification)
• Results and ideas within and between disciplines are disseminated very fast
• Web offers potential for mining across several databases
Meat for Statistics and Computer Science
• A lot can be learnt from the string representation of biological molecules
• Now have data volumes for reliable statistical inference
• Now have computer hardware to support implementation of algorithms
• Challenges:
  • Stimulus for creating and refining statistical and computational approaches
  • Emulating Biology, as well as learning strategies from it
• “Computer Science was invented for Bioinformatics” (Ewan Birney, GRC 2003)
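As a small illustration of the string representation, DNA can be handled as ordinary text over the alphabet A, C, G, T. The sketch below computes a few common summaries; the example sequence is made up and the function names are hypothetical.

```python
from collections import Counter

# Translation table that maps each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def base_counts(dna: str) -> Counter:
    """Count how often each base occurs in the sequence."""
    return Counter(dna.upper())


def gc_content(dna: str) -> float:
    """Fraction of the sequence that is G or C."""
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    return (dna.count("G") + dna.count("C")) / len(dna)


def reverse_complement(dna: str) -> str:
    """Sequence of the opposite strand, read in its own 5'-to-3' direction."""
    return dna.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    seq = "ATGCGCTA"  # made-up example sequence
    print(base_counts(seq))
    print(gc_content(seq))
    print(reverse_complement(seq))
```

Each function is plain string processing, which is what makes biological sequences such convenient material for statistics and algorithms.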
Evolutionary Stochasticity
• “The chimpanzee is our cousin, but so is yeast, albeit billions of years removed”
• Building evolutionary trees has a lot of academic interest
• But the simple fact of evolutionary relationships is useful in many ways
• Comparison across species is useful in understanding the biology of individual species
Goals in Bioinformatics
• Understand Biology
  • Cataloguing biomolecules
  • Understand what they do, in isolation
  • Understand how things work together, at different levels of abstraction
• Cure disease
  • Drug target approach – Classical
  • Integrated approach – Futuristic
    • Multiple drugs for non-linear effects
    • Address source of problem, not effect
Bioinformatics Jargon 101
• Nucleotide/Base/Phosphate
• DNA/cDNA
• RNA/mRNA/tRNA/rRNA
• Protein/Amino Acid
• Sequence/Sequencing
• Homology/Orthology/Paralogy/Analogy
• Exon/Intron/Intergenic region
• Genetic code/Codon
• Splicing/Alternative splicing
• Species/System/Tissue/Organ/Cell/Organelle
• Genome/Chromosome/Chromatin/Histones/Gene/Allele/Diploid/Haploid
• Recombination/Mutation
• Replication/Transcription/Expression/Translation
• Eubacteria/Archaea/Eukaryotes/Viruses
• Maternal Inheritance