Algorithms for Biological Sequence Analysis
Kun-Mao Chao (趙坤茂)
Department of Computer Science and Information Engineering
National Taiwan University, Taiwan
Date: Feb. 22, 2011
WWW: http://www.csie.ntu.edu.tw/~kmchao
About this course
• Course: Algorithms for biological sequence analysis
• Some basic knowledge of algorithm development and program design is required.
• We will focus on sequence-related algorithmic problems. Genomic sequences are our main target.
• The oldest language
• The largest program
• Spring semester, 2011
• 9:10 - 12:10 Tuesday, 107 CSIE Building
• 3 credits
• Web site: http://www.csie.ntu.edu.tw/~kmchao/seq11spr
Coursework:
• Homework assignments and class participation (10%)
• Two midterm exams (70%; 35% each):
  • April 12, 2011 (tentative)
  • May 24, 2011 (tentative)
• Oral presentation of selected papers (20%)
Outline
Part I: Sequence Homology
• Introduction to basic algorithmic strategies
• Pairwise sequence alignment
• Multiple sequence alignment
• Chaining algorithms for genomic sequence analysis
• Suboptimal alignment
• Comparative genomics
• Compressed / constrained sequence comparison
• Hidden Markov models (the Viterbi algorithm and others)
Part II: Sequence Composition
• Maximum-sum and maximum-density segments
• SNP and haplotype data analysis
• Approximate gapped palindromes
• Genome annotation
• Other advanced topics
A Brief History of Genetics
• 1859 Charles Darwin published “On the Origin of Species.”
• 1865 Genes are particulate factors. [Gregor Mendel]
• 1869 Discovery of nucleic acid [Friedrich Miescher]
• 1903 Chromosomes are hereditary units. [Walter Sutton]
• 1910 Genes lie on chromosomes. [Thomas Hunt Morgan]
• 1913 Chromosomes are linear arrays of genes. [Alfred Sturtevant]
A Brief History of Genetics (cont’d)
• 1931 Recombination occurs by crossing over. [Harriet Creighton and Barbara McClintock]
• 1944 DNA is the genetic material. [Oswald Avery, Colin MacLeod and Maclyn McCarty]
• 1953 DNA is a double helix. [James Watson and Francis Crick]
• 1961-1967 The genetic code is a triplet code. [Marshall Nirenberg, Har Gobind Khorana, Sydney Brenner and Francis Crick]
• 1977 DNA was sequenced for the first time. [Fred Sanger, Walter Gilbert, and Allan Maxam]
• 21st century: Many genomes completely sequenced
Milestones of Bioinformatics
• 1962 Pauling's theory of molecular evolution
• 1965 Margaret Dayhoff's Atlas of Protein Sequences
• 1970 Needleman-Wunsch algorithm
• 1977 DNA sequencing and software to analyze it (Staden)
• 1981 Smith-Waterman algorithm developed
• 1981 The concept of a sequence motif (Doolittle)
• 1982 GenBank Release 3 made public
• 1982 Phage lambda genome sequenced
Milestones of Bioinformatics (cont’d)
• 1983 Sequence database searching algorithm (Wilbur-Lipman)
• 1985 FASTP/FASTN: fast sequence similarity searching
• 1988 National Center for Biotechnology Information (NCBI) created at NIH/NLM
• 1988 EMBnet network for database distribution
• 1990 BLAST: fast sequence similarity searching
• 1991 EST: expressed sequence tag sequencing
• 1993 Sanger Centre, Hinxton, UK
• 1994 EMBL European Bioinformatics Institute, Hinxton, UK
Milestones of Bioinformatics (cont’d)
• 1995 First bacterial genomes completely sequenced
• 1996 Yeast genome completely sequenced
• 1997 PSI-BLAST
• 1998 Worm (multicellular) genome completely sequenced
• 1999 Fly genome completely sequenced
Milestones of Bioinformatics (cont’d)
• Human Genome Project (1990-2003)
• Mouse 2002
• Rat 2004
• Chimpanzee 2005
• Completed Genomes
The Primate Family Tree
[Figure: primate family tree. Source: Nature]
A Sequence Analysis Book
Published by Springer in 2009 (ISBN 978-1848003194)
Sequence Comparison: Theory and Methods, by Kun-Mao Chao and Louxin Zhang