
Algorithms for Biological Sequence Analysis

Kun-Mao Chao (趙坤茂), Department of Computer Science and Information Engineering, National Taiwan University, Taiwan. Date: September 19, 2006. E-mail: kmchao@csie.ntu.edu.tw WWW: http://www.csie.ntu.edu.tw/~kmchao


Presentation Transcript


  1. Algorithms for Biological Sequence Analysis Kun-Mao Chao (趙坤茂) Department of Computer Science and Information Engineering National Taiwan University, Taiwan Date: September 19, 2006 E-mail: kmchao@csie.ntu.edu.tw WWW: http://www.csie.ntu.edu.tw/~kmchao

  2. About this course • Course: Algorithms for biological sequence analysis • We will focus on sequence-related algorithmic problems; genomic sequences are our main target. • The oldest language • The largest program • Fall semester, 2006 • Tuesday 10:20 – 13:10, 111 CSIE Building • 3 credits • Web site: http://www.csie.ntu.edu.tw/~kmchao/seq06fall

  3. Coursework: • Homework assignments and class participation (15%) • Two midterm exams (30% each): • November 7, 2006 (tentative) • December 19, 2006 (tentative) • Oral presentation of selected papers (25%)

  4. Outline Part I: Sequence Homology • Introduction to genomes • Dynamic programming strategy revisited • Pairwise sequence alignment • Multiple sequence alignment • Chaining algorithms for genomic sequence analysis • Suboptimal alignment • Comparative genomics • Hidden Markov models (the Viterbi algorithm, etc.) Part II: Sequence Composition • Maximum-sum and maximum-density segments • SNP and haplotype data analysis • Genome annotation • Other advanced topics

  5. A Brief History of Genetics • 1859 Darwin publishes The Origin of Species • 1865 Genes are particulate factors • 1871 Discovery of nucleic acid • 1903 Chromosomes are hereditary units • 1910 Genes lie on chromosomes • 1913 Chromosomes are linear arrays of genes • 1931 Recombination occurs by crossing over

  6. A Brief History of Genetics (cont’d) • 1944 DNA is the genetic material • 1945 A gene codes for protein • 1951 First protein sequence • 1953 DNA is a double helix • 1961 Genetic code is triplet • 1977 Eukaryotic genes are interrupted • 1977 DNA can be sequenced • 21st Century: Many genomes completely sequenced

  7. Milestones of Bioinformatics • 1962 Pauling's theory of molecular evolution • 1965 Margaret Dayhoff's Atlas of Protein Sequences • 1970 Needleman-Wunsch algorithm • 1977 DNA sequencing and software to analyze it (Staden) • 1981 Smith-Waterman algorithm developed • 1981 The concept of a sequence motif (Doolittle) • 1982 GenBank Release 3 made public • 1982 Phage lambda genome sequenced

  8. Milestones of Bioinformatics (cont’d) • 1983 Sequence database searching algorithm (Wilbur-Lipman) • 1985 FASTP/FASTN: fast sequence similarity searching • 1988 National Center for Biotechnology Information (NCBI) created at NIH/NLM • 1988 EMBnet network for database distribution • 1990 BLAST: fast sequence similarity searching • 1991 EST: expressed sequence tag sequencing • 1993 Sanger Centre, Hinxton, UK • 1994 EMBL European Bioinformatics Institute, Hinxton, UK

  9. Milestones of Bioinformatics (cont’d) • 1995 First bacterial genomes completely sequenced • 1996 Yeast genome completely sequenced • 1997 PSI-BLAST • 1998 Worm (multicellular) genome completely sequenced • 1999 Fly genome completely sequenced

  10. Milestones of Bioinformatics (cont’d) • Human Genome Project (1990-2003) • Mouse 2002 • Rat 2004 • Chimpanzee 2005 • Completed Genomes

  11. Chimpanzee Genome

  12. The Primate Family Tree Source: Nature

  13. Source: My niece’s email

  14. Source: My niece’s email

  15. Source: My niece’s email

  16. Count every "F" in the following text: FINISHED FILES ARE THE RESULT OF YEARS OF SCIENTIFIC STUDY COMBINED WITH THE EXPERIENCE OF YEARS... Source: My niece’s email
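The counting exercise on this slide can also be done mechanically, which makes a handy check on the count you get by eye. The following is a minimal sketch, not part of the original slides: it uses only the visible portion of the slide's text (the slide trails off with "..."), and the names `TEXT` and `count_symbol` are hypothetical, chosen for illustration.

```python
# Visible portion of the slide's text; the slide itself ends with "...".
TEXT = (
    "FINISHED FILES ARE THE RESULT OF YEARS OF SCIENTIFIC STUDY "
    "COMBINED WITH THE EXPERIENCE OF YEARS"
)


def count_symbol(text: str, symbol: str) -> int:
    """Count occurrences of a single character by scanning the text once."""
    return sum(1 for ch in text if ch == symbol)


# Total number of F's, and how many of them sit inside the short word "OF",
# which is where readers counting by eye tend to miss them.
f_total = count_symbol(TEXT, "F")
f_in_of = sum(1 for word in TEXT.split() if word == "OF")

print("F total:", f_total)
print("F inside 'OF':", f_in_of)
```

Comparing the two printed numbers with a count done by eye shows where the F's are usually overlooked.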

  17. Olny srmat poelpe can raed tihs. cdnuolt blveiee taht I cluod aulaclty  uesdnatnrd waht I was rdanieg. The phaonmneal pweor of the hmuan mnid, aoccdrnig  to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the  ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat  ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll  raed it wouthit a porbelm. Source: My niece’s email
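The rule described on this slide (keep the first and last letter of each word, shuffle the rest) is easy to reproduce. The sketch below is an illustration only, not taken from the slides: `scramble_word` and `scramble_text` are hypothetical names, words are split on whitespace, and any punctuation attached to a word is naively treated as one of its letters.

```python
import random


def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle the interior letters; the first and last characters stay put."""
    if len(word) <= 3:
        # With at most one interior letter there is nothing to shuffle.
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text: str, seed: int = 0) -> str:
    """Scramble every whitespace-separated word; the seed makes runs repeatable."""
    rng = random.Random(seed)
    return " ".join(scramble_word(w, rng) for w in text.split())


sample = "Only smart people can read this"
scrambled = scramble_text(sample)
print(scrambled)
```

Each output word has the same letters as the corresponding input word, with only the interior order changed.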

  18. “Discovery is to see what everyone else has seen, but think what no one else has thought.” Albert Szent-Györgyi (The Nobel Prize in Physiology or Medicine, 1937) “By inventing elegant software tools, we can help biologists see and think.” “Invention  Discovery” Kun-Mao Chao

  19. Source: My niece’s email
