Explore algorithms for biological sequence analysis, covering homology, composition, and genetic history milestones. Coursework includes assignments, exams, and paper presentations. Delve into the history of genetics and bioinformatics milestones from 1859 to present.
Algorithms for Biological Sequence Analysis Kun-Mao Chao (趙坤茂) Department of Computer Science and Information Engineering National Taiwan University, Taiwan Date: September 14, 2009 WWW: http://www.csie.ntu.edu.tw/~kmchao
About this course • Course: Algorithms for biological sequence analysis • Some basic knowledge of algorithm development and program design is required. • We will focus on sequence-related algorithmic problems. Genomic sequences are our main target. • The oldest language • The largest program • Fall semester, 2009 • 13:20 – 16:20 Monday, 107 CSIE Building. • 3 credits • Web site: http://www.csie.ntu.edu.tw/~kmchao/seq09fall
Coursework: • Homework assignments and class participation (15%) • Two midterm exams (60%; 30% each): • October 26, 2009 (tentative) • December 7, 2009 (tentative) • Oral presentation of selected papers (25%)
Outline Part I: Sequence Homology • Introduction to basic algorithmic strategies • Pairwise sequence alignment • Multiple sequence alignment • Chaining algorithms for genomic sequence analysis • Suboptimal alignment • Comparative genomics • Compressed / constrained sequence comparison • Hidden Markov models (the Viterbi algorithm, etc.) Part II: Sequence Composition • Maximum-sum and maximum-density segments • SNP and haplotype data analysis • Approximate gapped palindromes • Genome annotation • Other advanced topics
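As a small taste of the pairwise sequence alignment topic in Part I, here is a minimal sketch of the dynamic-programming recurrence for global alignment, in the style of Needleman-Wunsch. The function name and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration and are not taken from the slides.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Return the score of an optimal global alignment of a and b.

    Uses a linear gap penalty; the scoring values are illustrative
    assumptions, not values from the course.
    """
    # prev[j] = best score for aligning the processed prefix of a with b[:j].
    # Initially the prefix of a is empty, so b[:j] is aligned against j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap.
            up = prev[j] + gap
            # Align b[j-1] with a gap.
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

Only two rows of the table are kept at a time, so the sketch uses space linear in the length of `b`, but it returns the score only. Recovering the alignment itself would need the full table or a divide-and-conquer refinement.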
A Brief History of Genetics • 1859 Charles Darwin published “The Origin of Species.” • 1865 Genes are particulate factors. [Gregor Mendel] • 1869 Discovery of nucleic acid [Friedrich Miescher] • 1903 Chromosomes are hereditary units. [Walter Sutton] • 1910 Genes lie on chromosomes. [Thomas Hunt Morgan] • 1913 Chromosomes are linear arrays of genes. [Alfred Sturtevant]
A Brief History of Genetics (cont’d) • 1931 Recombination occurs by crossing over. [Harriet Creighton and Barbara McClintock] • 1944 DNA is the genetic material. [Oswald Avery, Colin MacLeod and Maclyn McCarty] • 1953 DNA is a double helix. [James Watson and Francis Crick] • 1961-1967 Genetic code is triplet. [Marshall Nirenberg, Har Gobind Khorana, Sydney Brenner & Francis Crick] • 1977 DNA was sequenced for the first time. [Fred Sanger, Walter Gilbert, and Allan Maxam] • 21st century: Many genomes completely sequenced
Milestones of Bioinformatics • 1962 Pauling's theory of molecular evolution • 1965 Margaret Dayhoff's Atlas of Protein Sequences • 1970 Needleman-Wunsch algorithm • 1977 DNA sequencing and software to analyze it (Staden) • 1981 Smith-Waterman algorithm developed • 1981 The concept of a sequence motif (Doolittle) • 1982 GenBank Release 3 made public • 1982 Phage lambda genome sequenced
Milestones of Bioinformatics (cont’d) • 1983 Sequence database searching algorithm (Wilbur-Lipman) • 1985 FASTP/FASTN: fast sequence similarity searching • 1988 National Center for Biotechnology Information (NCBI) created at NIH/NLM • 1988 EMBnet network for database distribution • 1990 BLAST: fast sequence similarity searching • 1991 EST: expressed sequence tag sequencing • 1993 Sanger Centre, Hinxton, UK • 1994 EMBL European Bioinformatics Institute, Hinxton, UK
Milestones of Bioinformatics (cont’d) • 1995 First bacterial genomes completely sequenced • 1996 Yeast genome completely sequenced • 1997 PSI-BLAST • 1998 Worm (multicellular) genome completely sequenced • 1999 Fly genome completely sequenced
Milestones of Bioinformatics (cont’d) • Human Genome Project (1990-2003) • Mouse 2002 • Rat 2004 • Chimpanzee 2005 • Completed Genomes
The Primate Family Tree Source: Nature
A New Book Published by Springer in 2009 (ISBN 978-1848003194) Sequence Comparison: Theory and Methods by Kun-Mao Chao and Louxin Zhang