NCBI Bioinformatics Workshop Rabat, Morocco 2012
What is Bioinformatics?
Bioinformatics is the application of information technology to the field of molecular biology. The term bioinformatics was coined by Paulien Hogeweg in 1979 for the study of informatic processes in biotic systems. Its primary use since at least the late 1980s has been in genomics and genetics, particularly in those areas of genomics involving large-scale DNA sequencing. Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data. (Source: Wikipedia)
What is NCBI?
On November 4, 1988, President Ronald Reagan signed the Health Omnibus Programs Extension Act, creating the National Center for Biotechnology Information as part of the National Library of Medicine at NIH.
• Create automated systems for knowledge about molecular biology, biochemistry, and genetics.
• Perform research into advanced methods of analyzing and interpreting molecular biology data.
• Enable biotechnology researchers and medical care personnel to use the systems and methods developed.
History of molecular biology
1865 Genetics: Gregor Mendel discovered that heritable factors (later called genes) determine the characteristics of an organism and are passed to offspring from both parents
1944 Molecular biology: Oswald Avery, Colin MacLeod, and Maclyn McCarty showed that DNA carries genetic information
1962 Nobel Prize: James Watson, Francis Crick, and Maurice Wilkins (building on the work of Rosalind Franklin)
1970 Central Dogma: first stated by Francis Crick in 1958 and restated by him in Nature
Central Dogma of molecular biology
The central dogma of molecular biology was first enunciated by Francis Crick in 1958 and restated in a Nature paper published in 1970. The general transfers describe the normal flow of biological information: DNA can be copied to DNA (DNA replication), DNA information can be copied into mRNA (transcription), and proteins can be synthesized using the information in mRNA as a template (translation).
Does the central dogma still stand? Koonin EV. Biol Direct. 2012 Aug 23;7(1):27. [Epub ahead of print]
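The three general transfers named on this slide can be illustrated with a few lines of Python. This is only a minimal sketch, not part of the original workshop material: the function names are hypothetical, the codon table is the standard genetic code, and `transcribe` assumes its input is the coding (sense) strand, so transcription reduces to replacing T with U.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna):
    """DNA -> DNA: the strand a polymerase would synthesize (replication)."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe(coding_dna):
    """DNA -> mRNA, assuming the input is the coding (sense) strand."""
    return coding_dna.replace("T", "U")


def translate(mrna):
    """mRNA -> protein: read codons from the first AUG up to a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3].replace("U", "T")  # table is keyed by DNA codons
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGGCCATTGTAATGGGCCGCTGA"
    print(reverse_complement(gene))
    print(transcribe(gene))
    print(translate(transcribe(gene)))
```

The sketch ignores introns, alternative start codons, and non-standard genetic codes; libraries such as Biopython handle those cases.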
History of biotechnology
1590 The microscope is invented by Janssen
1675 Leeuwenhoek discovers protozoa and bacteria
1879 Flemming discovers chromatin, the rod-like structures in the cell nucleus later called 'chromosomes'
1885 Escherichia coli bacterium is discovered (a major research and production tool for biotechnology)
1942 The electron microscope is used to identify and characterize a bacteriophage, a virus that infects bacteria
1953 Watson and Crick reveal the three-dimensional structure of DNA
1973 Cohen and Boyer perform the first successful recombinant DNA experiment, using bacterial genes
1983 The Polymerase Chain Reaction (PCR) technique is invented
1995 First bacterial genome is sequenced by whole genome shotgun technology
2001 The sequence of the human genome is published in Science and Nature, making it possible for researchers all over the world to begin developing treatments
2005 Next Generation Sequencing: Illumina, MiSeq, Ion Torrent, PacBio
History of Bioinformatics
Sequence databases
1960s – Margaret Dayhoff collected sequences in a database that later became PIR
1980 – EMBL (ENA); 1982 – GenBank; 1984 – DDBJ; 1986 – SwissProt
Sequence comparison
1970 – Needleman-Wunsch global pairwise alignment
1973 – multiple alignment
1981 – Smith-Waterman local alignment
Database searches by sequence similarity
1988 – FASTA by Pearson and Lipman
1990 – BLAST by Altschul, Gish, Lipman
Text search and retrieval system
1990 – Entrez designed by Lipman and Benson
Algorithms
Gene prediction; Protein structure; Hidden Markov Model; Clustering; Trees
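To make the global pairwise alignment entry concrete, here is a minimal sketch of the Needleman-Wunsch dynamic programming recurrence. It computes only the optimal alignment score, not the alignment itself. The function name and the scoring values (match +1, mismatch -1, linear gap penalty -1) are illustrative assumptions, not parameters taken from the slides.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))
    print(needleman_wunsch_score("AC", "A"))
```

Smith-Waterman local alignment uses the same table but clamps every cell at zero and takes the maximum over the whole matrix instead of the bottom-right corner.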
Problem Solving
[Diagram labels: DATA; MODEL; Data management; Validation; Hypothesis; Experiment; Interpretation; Visualization; Analysis]
"For every complex problem, there is an answer that is clear, simple, and wrong…" - H. L. Mencken
ROC curve analysis Receiver Operating Characteristic (ROC) curve analysis (Metz, 1978; Zweig & Campbell, 1993)
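As a rough illustration of what ROC curve analysis computes, the sketch below sweeps a decision threshold over classifier scores, records the false positive rate and true positive rate at each step, and integrates the curve with the trapezoidal rule. The function names and the toy data are hypothetical; labels are assumed to be 1 for positives and 0 for negatives, with higher scores meaning "more likely positive".

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points of the ROC curve."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as sorted (x, y) points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.6]
    print(area_under_curve(roc_points(labels, scores)))
```

An area of 1.0 corresponds to a perfect ranking and 0.5 to a ranking no better than chance.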
Challenges in Computational Biology
• Multiple alignments and phylogenetic trees
• Protein structure prediction
• Genome assembly and annotation
• Homology searches
Challenging issues in Bioinformatics
• Data management
  processing, storage
  accuracy (high-throughput, low quality)
  search and retrieval
  presentation
• Data analysis
  algorithms
  statistical techniques
• Simulation
  modeling and prediction
  parameter estimation
  prediction accuracy
NCBI mission: discovery initiative
[Diagram labels: NCBI; Validation; Analysis; Search; Visualization]
What is GenBank? NCBI's Primary Sequence Database
• Nucleotide only sequence database
• Archival in nature
  • Historical
  • Reflective of submitter point of view (subjective)
  • Redundant
• GenBank Data
  • Direct submissions (traditional records)
  • Batch submissions (EST, GSS, STS)
  • ftp accounts (genome data)
• Three collaborating databases
  • GenBank
  • DNA Database of Japan (DDBJ)
  • European Molecular Biology Laboratory (EMBL) Database
Sequence Databases
[Figure: labs and sequencing centers submit sequences to GenBank; RefSeq (curators, genome assembly) and UniGene (algorithms) are built from those records. Captions: "Updated ONLY by submitters"; "Updated continually by NCBI".]
Information retrieval NCBI Discovery initiative
Entrez: search and retrieval system
Vice President Gore, 1997: "From a computer in the comfort of your own home or from one in your neighborhood library, you will be able to access timely and accurate information. Already 30,000 people a day are using MEDLINE. By making it more accessible -- free and private -- we can increase that number many times over."
Improve information retrieval
• Add links
• Filters
• Related information
Rescuing Zero-Result PubMed Searches
[Figure: comparison of 2008 and 2011. Labels: Unassisted; Zero-result rescued by spelling; Auto-complete; Gene sensor; Citation sensor/Hydra. Annotations: 19% Improvement; 37% Improvement; 16% of all PubMed searches.]