Bioinformatics

Molecular Biology with Genetics: Bioinformatics, autumn term 2007 (HT-07)
Bioinformatics
David Brodin, David.Brodin@biosci.ki.se
BEA core facility, www.bea.ki.se
Course web page: www.bea.ki.se/biomedicin_v42/

Lecture Content
Monday
• Introduction to Bioinformatics
  - History of Bioinformatics
  - Need for computers
  - Computational Biology
  - Fields of Bioinformatics
  - Bioinformatic tools
• Homology, sequence analysis and phylogenetics
• Introduction to Microarrays & Lab
Tuesday
• Mass spectrometry
• Web databases, bioinformatic tools etc.
• Genotyping arrays
• Tiling arrays
Wednesday
• Computer lab

Need for Computers
Major advances in
• the field of molecular biology
• genomic technologies
Explosive growth in the biological information generated by the scientific community.
Need for computerized databases to store, organize, and index the data, and for specialized tools to view and analyze the data.
"What computer science is to molecular biology is like what mathematics has been to physics..."
-- Larry Hunter, ISMB'94

History of Bioinformatics
Figure: History of DNA Sequencing. Adapted from Messing & Llaca, PNAS (1998).

History of Bioinformatics

History of Bioinformatics
• Early database: The Atlas of Protein Sequences was available on digital tape in 1978, and by modem in 1980.
• Early programs: restriction enzyme sites, pattern finding, promoters, etc., circa 1978.
• 1982: DDBJ/EMBL/GenBank are created as a public repository of genetic sequence information.
• 1983: NIH funds the PIR (Protein Information Resource) database.
• 1988: Pearson and Lipman create FASTA.

Number of published base pairs
Year  Sequence                      Base pairs
1971  First published DNA sequence  12
1977  PhiX174                       5,375
1982  Lambda                        48,502
1992  Yeast chromosome III          316,613
1995  Haemophilus influenzae        1,830,138
1996  Saccharomyces                 12,068,000
1998  C. elegans                    97,000,000
2000  D. melanogaster               120,000,000
2001  H. sapiens (draft)            2,600,000,000
2003  H. sapiens                    2,850,000,000
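To get a feel for the growth in the milestone figures above, the following minimal Python sketch computes the fold increase and an average doubling time between two milestones. It is only arithmetic on the numbers listed on this slide. The function names `fold_increase` and `doubling_time_years` are made up for this illustration, and the figures describe individual published sequences, not the total size of any database.

```python
import math

# Milestone sequence lengths in base pairs, taken from the list above.
milestones = [
    (1971, "First published DNA sequence", 12),
    (1977, "PhiX174", 5_375),
    (1982, "Lambda", 48_502),
    (1992, "Yeast chromosome III", 316_613),
    (1995, "Haemophilus influenzae", 1_830_138),
    (1996, "Saccharomyces", 12_068_000),
    (1998, "C. elegans", 97_000_000),
    (2000, "D. melanogaster", 120_000_000),
    (2001, "H. sapiens (draft)", 2_600_000_000),
    (2003, "H. sapiens", 2_850_000_000),
]


def fold_increase(earlier_bp, later_bp):
    """How many times larger the later sequence is than the earlier one."""
    return later_bp / earlier_bp


def doubling_time_years(year0, bp0, year1, bp1):
    """Average time per doubling, assuming steady exponential growth."""
    doublings = math.log2(bp1 / bp0)
    return (year1 - year0) / doublings


# Compare the 1977 and 2003 milestones.
y0, _, bp0 = milestones[1]
y1, _, bp1 = milestones[-1]
print(f"Fold increase {y0}-{y1}: {fold_increase(bp0, bp1):,.0f}")
print(f"Average doubling time: {doubling_time_years(y0, bp0, y1, bp1):.2f} years")
```

The steady exponential growth assumed by `doubling_time_years` is a simplification; the milestones are unevenly spaced and reflect different organisms and technologies.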

Computational Biology
In the early days of bioinformatics a major concern was the creation and maintenance of databases to store biological information, involving design issues and the development of complex user interfaces.
Today the most pressing task involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. The actual process of analyzing and interpreting data is referred to as computational biology.
Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.
-- National Institutes of Health (NIH)
Biology in the 21st century is being transformed from a lab-based science to an information science as well.

Biological Databases
A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system.
For researchers to benefit from the data stored in a database, two additional requirements must be met:
• easy access to the information
• a method for extracting only that information needed to answer a specific biological question
Nucleotide sequence record:
• the input sequence with a description of the type of molecule
• ID of sequence
• the scientific name of the source organism
• contact name
• literature citations associated with the sequence
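The record fields and the "extract only what is needed" requirement above can be sketched as a small data structure with a query function. This is a minimal illustration, not the schema of any real database: the class `SequenceRecord`, the function `find_by_organism`, and the example records (IDs, sequences, contact names) are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SequenceRecord:
    """One nucleotide sequence record with the fields listed on the slide."""
    seq_id: str                 # ID of the sequence
    sequence: str               # the input sequence
    molecule_type: str          # description of the type of molecule
    organism: str               # scientific name of the source organism
    contact_name: str           # contact name
    citations: List[str] = field(default_factory=list)  # literature citations


def find_by_organism(records, organism):
    """Return only the records whose source organism matches (case-insensitive)."""
    wanted = organism.strip().lower()
    return [r for r in records if r.organism.lower() == wanted]


# Hypothetical placeholder records, not real database entries.
records = [
    SequenceRecord("EX0001", "ATGGCCATTG", "mRNA", "Homo sapiens", "A. Example"),
    SequenceRecord("EX0002", "ATGAAACGCA", "genomic DNA", "Escherichia coli", "B. Example"),
]

for record in find_by_organism(records, "homo sapiens"):
    print(record.seq_id, record.molecule_type, len(record.sequence))
```

A real database adds persistence, indexing, and update tools on top of this idea; the list-based search here only illustrates extracting the records relevant to one question.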

Sub-disciplines, challenges & goals
Important sub-disciplines:
• Analysis and interpretation of various types of biological data
• Development of new algorithms and statistics with which to assess biological information
• Development and implementation of tools that enable efficient access to and management of different types of information
Challenges of working with bioinformatics:
• Need to feel comfortable in an interdisciplinary area
• Depend on others for primary data
• Need to address important biological and computer science problems
Important goal of bioinformatics: understanding basic biological processes and, in turn, advances in the diagnosis, treatment, and prevention of many genetic diseases.

Fields of Bioinformatics
The "omics" Series:
• Genomics: Gene identification & characterization
• Transcriptomics: Expression profiles of mRNA
• Proteomics: Functions & interactions of proteins
• Structural Genomics: Large scale structure determination
• Cellinomics: Metabolic pathways, cell-cell interactions
• Pharmacogenomics: Genome-based drug design

Typical Questions
Biological problems that computers can help with:
• I cloned a gene. Is it a known gene?
• Does the sequence match? Is the sequence any good?
• Is the sequence similar to other known sequences?
• Which gene family does it belong to?
• The gene I'm interested in was found in another organism, but not in mine. How can I look for it?
• How is the gene expressed in different types of tissues?
• What is the biological function of the protein encoded by the gene?
• Is the gene associated with any disease?
Increasingly, biological studies begin with a scientist conducting vast numbers of database and web site searches to formulate specific hypotheses or to design large-scale experiments.
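The question "Is the sequence similar to other known sequences?" can be illustrated with a toy comparison. The sketch below computes percent identity between equal-length sequences and picks the closest entry from a small dictionary. It is not how search tools such as BLAST or FASTA work (they use alignment heuristics that handle gaps and unequal lengths); the names `percent_identity` and `best_match` and the example sequences are made up for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions with the same letter in two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this toy comparison needs sequences of equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


def best_match(query, known_sequences):
    """Return (name, percent identity) of the most similar known sequence."""
    return max(
        ((name, percent_identity(query, seq)) for name, seq in known_sequences.items()),
        key=lambda item: item[1],
    )


# Hypothetical example sequences, not real genes.
known = {
    "gene_a": "ACGTACGT",
    "gene_b": "ACGTTCGT",
}

name, identity = best_match("ACGTTCGA", known)
print(f"Closest known sequence: {name} ({identity:.1f}% identical)")
```

Position-by-position comparison only works when the sequences are already aligned and of equal length, which is why real similarity searches rely on alignment algorithms.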

Bioinformatic tools
• Many different bioinformatic tools are available over the internet, free of charge to whoever wishes to use them
• Many commercial software packages are also available
• Some bioinformaticians write their own tools for specialized tasks
Many platforms are available for software development...

Open Source & Open Access
Open Source in the life sciences:
• Present in all areas of bioinformatics
• Some very well known examples of tools used in industry and academic circles include:
  - BLAST
  - EMBOSS
  - EnsEMBL
  - GenScan
  - Bioconductor
Open Access:
• Unrestricted access to data
• Allows all to work and make discoveries
• Discoveries are not necessarily open access
• Open access is applicable to any kind of data you want to apply it to:
  - Sequence data (DNA, RNA or protein)
  - Gene expression data
  - Protein-protein interaction data
  - Publications

Top 10 Future Challenges
• Precise, predictive model of transcription initiation and termination: ability to predict where and when transcription will occur in a genome
• Precise, predictive model of RNA splicing/alternative splicing: ability to predict the splicing pattern of any primary transcript in any tissue
• Precise, quantitative models of signal transduction pathways: ability to predict cellular responses to external stimuli
• Determining effective protein:DNA, protein:RNA and protein:protein recognition codes
• Accurate ab initio protein structure prediction
• Rational design of small molecule inhibitors of proteins
• Mechanistic understanding of protein evolution: understanding exactly how new protein functions evolve
• Mechanistic understanding of speciation: molecular details of how speciation occurs
• Continued development of effective gene ontologies: systematic ways to describe the functions of any gene or protein
• Education: development of appropriate bioinformatics curricula for secondary, undergraduate and graduate education
Chris Burge, Ewan Birney, Jim Fickett. Genome Technology, issue No. 17, January 2002
