Algorithms for Biological Sequence Analysis
Kun-Mao Chao (趙坤茂)
Department of Computer Science and Information Engineering
National Taiwan University, Taiwan
Date: September 19, 2006
E-mail: kmchao@csie.ntu.edu.tw
WWW: http://www.csie.ntu.edu.tw/~kmchao
About this course
• Course: Algorithms for biological sequence analysis
• We will focus on sequence-related algorithmic problems. Genomic sequences are our main target.
• The oldest language
• The largest program
• Fall semester, 2006
• Tuesday 10:20 – 13:10, 111 CSIE Building
• 3 credits
• Web site: http://www.csie.ntu.edu.tw/~kmchao/seq06fall
Coursework:
• Homework assignments and class participation (15%)
• Two midterm exams (30% each):
  • November 7, 2006 (tentative)
  • December 19, 2006 (tentative)
• Oral presentation of selected papers (25%)
Outline
Part I: Sequence Homology
• Introduction to genomes
• Dynamic programming strategy revisited
• Pairwise sequence alignment
• Multiple sequence alignment
• Chaining algorithms for genomic sequence analysis
• Suboptimal alignment
• Comparative genomics
• Hidden Markov models (the Viterbi algorithm and related algorithms)
Part II: Sequence Composition
• Maximum-sum and maximum-density segments
• SNP and haplotype data analysis
• Genome annotation
• Other advanced topics
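As a small taste of the dynamic programming and pairwise sequence alignment topics in the outline, here is a minimal Python sketch of a global alignment score in the style of the Needleman-Wunsch algorithm. The function name `global_alignment_score` and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions, not the course's own scheme, and the sketch returns only the score, not the alignment itself.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of strings a and b.

    Assumed scoring (illustrative only): linear gap penalty,
    one score for a match and one for a mismatch.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap, or b[j-1] with a gap.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Keeping only two rows of the table brings the space down to O(min-free linear in len(b)) while the running time stays proportional to len(a) × len(b); recovering the alignment itself would need the full table or a divide-and-conquer refinement.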
A Brief History of Genetics
• 1859 Darwin publishes The Origin of Species
• 1865 Genes are particulate factors
• 1871 Discovery of nucleic acid
• 1903 Chromosomes are hereditary units
• 1910 Genes lie on chromosomes
• 1913 Chromosomes are linear arrays of genes
• 1931 Recombination occurs by crossing over
A Brief History of Genetics (cont’d)
• 1944 DNA is the genetic material
• 1945 A gene codes for protein
• 1951 First protein sequence
• 1953 DNA is a double helix
• 1961 Genetic code is triplet
• 1977 Eukaryotic genes are interrupted
• 1977 DNA can be sequenced
• 21st century: Many genomes completely sequenced
Milestones of Bioinformatics
• 1962 Pauling's theory of molecular evolution
• 1965 Margaret Dayhoff's Atlas of Protein Sequences
• 1970 Needleman-Wunsch algorithm
• 1977 DNA sequencing and software to analyze it (Staden)
• 1981 Smith-Waterman algorithm developed
• 1981 The concept of a sequence motif (Doolittle)
• 1982 GenBank Release 3 made public
• 1982 Phage lambda genome sequenced
Milestones of Bioinformatics (cont’d)
• 1983 Sequence database searching algorithm (Wilbur-Lipman)
• 1985 FASTP/FASTN: fast sequence similarity searching
• 1988 National Center for Biotechnology Information (NCBI) created at NIH/NLM
• 1988 EMBnet network for database distribution
• 1990 BLAST: fast sequence similarity searching
• 1991 EST: expressed sequence tag sequencing
• 1993 Sanger Centre, Hinxton, UK
• 1994 EMBL European Bioinformatics Institute, Hinxton, UK
Milestones of Bioinformatics (cont’d)
• 1995 First bacterial genomes completely sequenced
• 1996 Yeast genome completely sequenced
• 1997 PSI-BLAST
• 1998 Worm (multicellular) genome completely sequenced
• 1999 Fly genome completely sequenced
Milestones of Bioinformatics (cont’d)
• Human Genome Project (1990-2003)
• Mouse 2002
• Rat 2004
• Chimpanzee 2005
• Completed Genomes
The Primate Family Tree Source: Nature
Count every "F" in the following text:
FINISHED FILES ARE THE RESULT OF YEARS OF SCIENTIFIC STUDY COMBINED WITH THE EXPERIENCE OF YEARS...
Source: My niece’s email
Olny srmat poelpe can raed tihs. cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg. The phaonmneal pweor of the hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it wouthit a porbelm. Source: My niece’s email
“Discovery is to see what everyone else has seen, but think what no one else has thought.”
Albert Szent-Györgyi (The Nobel Prize in Physiology or Medicine, 1937)
“By inventing elegant software tools, we can help biologists see and think.”
“Invention Discovery”
Kun-Mao Chao