
Introduction to Bioinformatics | Explore the Intersection of Biology and Computing

This course introduces the field of bioinformatics and teaches students about the major biological questions that can be addressed using bioinformatics tools. The course covers sequence and structure analysis tools and their limitations. The grading includes homework assignments and a final project.


Presentation Transcript


  1. Introduction to Bioinformatics (236523) Lecturer: Dr. Yael Mandel-Gutfreund Teaching Assistants: Shula Shazman, Sivan Bercovici Course web site: http://webcourse.cs.technion.ac.il/236523

  2. What is Bioinformatics?

  3. Course Objectives • To introduce the bioinformatics discipline • To familiarize students with the major biological questions that can be addressed by bioinformatics tools • To introduce the major tools used for sequence and structure analysis and explain in general how they work (limitations etc.)

  4. Course Structure and Requirements 1. Class structure • 2-hour lecture • 1-hour tutorial 2. Homework • Homework projects will be given every third week • Homework will be done in pairs • 4/4 homework projects must be submitted 3. A final project will be conducted and submitted in pairs

  5. Grading • 30% homework assignments • 70% final project

  6. Literature list • Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly, 2001. • Lesk, A. M. Introduction to Bioinformatics. Oxford University Press, 2002. • Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press, 2004. Advanced reading • Jones, N. C., Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.

  7. What is Bioinformatics?

  8. What is Bioinformatics? “The field of science in which biology, computer science, and information technology merge to form a single discipline” Ultimate goal: to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.

  9. 21st century: Genome, Transcriptome, Proteome. Central Paradigm in Molecular Biology: Gene (DNA) → mRNA → Protein
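The flow from gene (DNA) to mRNA to protein can be sketched in a few lines of Python. This is an illustration only: the function names `transcribe` and `translate` are invented for this example, the codon table is the standard genetic code, and the input is the first 30 coding bases of the HBB mRNA listed on slide 34, starting at the ATG start codon.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA that matches the coding strand (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 30 coding bases of the HBB mRNA from slide 34.
hbb_start = "ATGGTGCATCTGACTCCTGAGGAGAAGTCT"
mrna = transcribe(hbb_start)
print(mrna)
print(translate(mrna))  # MVHLTPEEKS
```

The translated peptide matches the first ten residues of the beta globin protein shown on slide 34.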

  10. from purely lab-based science to an information science Bioinformatics Bio = Informatics

  11. From DNA to Genome (timeline, 1955 to 1985): first protein sequence, Watson and Crick DNA model, first protein structure

  12. Timeline, 1990 to 2000: first bacterial genome (Haemophilus influenzae), yeast genome, first human genome draft

  13. Complete Genomes (2008 / 2007) • Total: 706 / 456 • Eukaryotes: 78 / 43 • Bacteria: 578 / 383 • Archaea: 50 / 29

  14. What’s Next? The “post-genomics” era • Annotation • Comparative genomics • Structural genomics • Functional genomics Goal: to understand the living cell

  15. Annotation CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT ...... .............. TGAAAAACGTA

  16. Annotation • Identify the genes within a given sequence of DNA • Identify the sites that regulate the gene • Predict the function

  17. promoter TF binding site Transcription Start Site Ribosome binding Site ORF=Open Reading Frame CDS=Coding Sequence CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATGCAA TATGGACAATTGGTTTCTTCTCTGAAT ................................. ..............TGAAAAACGTA
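One simple way to look for candidate ORFs is to scan each reading frame for an ATG followed, in the same frame, by a stop codon. The sketch below assumes the forward strand only and uses a made-up toy sequence, because the slide's sequence is partly elided. The name `find_orfs` and the `min_codons` parameter are hypothetical, and real gene finders use much richer models (promoters, ribosome binding sites, codon statistics).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Yield (start, end, frame) for forward-strand ORFs.

    start is the index of the ATG, end is exclusive and includes the stop
    codon, and min_codons counts the coding codons before the stop.
    """
    dna = dna.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3, frame
                start = None  # look for the next ATG in this frame


# Toy sequence (not from the slides): ATG AAA TTT GGG TAA in frame 2.
toy = "CCATGAAATTTGGGTAACC"
for start, end, frame in find_orfs(toy):
    print(frame, start, end, toy[start:end])  # 2 2 17 ATGAAATTTGGGTAA
```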

  18. Comparative genomics Human ATAGCGGGGGGATGCGGGCCCTATACCC Chimp ATAGGGG--GGATGCGGGCCCTATACCC Mouse ATAGCG---GGATGCGGCGC-TATACCA
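A common first measure when comparing aligned sequences is percent identity. The sketch below applies it to the three aligned fragments on this slide, with the gap characters written as plain hyphens. The function name `percent_identity` is invented for this example, and counting a base opposite a gap as a mismatch is one convention among several, chosen here as an assumption.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of alignment columns where both sequences have the same base.

    Columns where both sequences have a gap are ignored; a base opposite a
    gap counts as a mismatch (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Aligned fragments from the slide.
human = "ATAGCGGGGGGATGCGGGCCCTATACCC"
chimp = "ATAGGGG--GGATGCGGGCCCTATACCC"
mouse = "ATAGCG---GGATGCGGCGC-TATACCA"

for name, other in (("chimp", chimp), ("mouse", mouse)):
    print(f"human vs {name}: {percent_identity(human, other):.1f}% identity")
```

These fragments are only a few dozen bases long, so the numbers illustrate the calculation and are not genome-wide estimates.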

  19. How human are chimps? A comparison between the full drafts of the human and chimp genomes revealed that they differ by only 1.23%. Perhaps not surprising!

  20. Researchers have learned a great deal about the function of human genes by examining their counterparts in simpler model organisms such as the mouse. Example: conservation of IGFALS (insulin-like growth factor binding protein, acid labile subunit) between human and mouse.

  21. Functional genomics

  22. Understanding the function of genes and other parts of the genome

  23. A network of interactions can be built for all proteins in an organism. Example: a large network of 8184 interactions among 4140 S. cerevisiae proteins
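A protein interaction network is naturally stored as a graph. The sketch below uses an adjacency map built from a list of pairwise interactions. The protein names `P1` to `P4` are placeholders, not real yeast proteins, and the average-degree calculation assumes the slide's 8184 interactions are undirected, counted once each, and contain no self-interactions.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network


def average_degree(n_proteins: int, n_interactions: int) -> float:
    """Each undirected interaction contributes to the degree of two proteins."""
    return 2 * n_interactions / n_proteins


# Toy interaction list with placeholder protein names.
toy = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
net = build_network(toy)
hub = max(net, key=lambda protein: len(net[protein]))
print(hub, sorted(net[hub]))  # P3 ['P1', 'P2', 'P4']

# Figures from the slide: 8184 interactions among 4140 proteins.
print(round(average_degree(4140, 8184), 2))  # 3.95
```

Under those assumptions, each protein in the network on the slide has about four interaction partners on average.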

  24. Structural Genomics

  25. Assigning the structures of all proteins • Protein complexes • Evolutionary relationships • Fold • Biological processes • Protein-ligand complexes • Shape and electrostatics • Active sites • Functional sites

  26. Resources and Databases The different types of data are collected in databases • Sequence databases • Structural databases • Databases of experimental results All databases are connected

  27. Sequence databases • Gene database • Genome database • SNP database • Disease-related mutation database

  28. Gene database • Gives information on gene structure and function • Alternative splicing of genes • Alternative patterns of exons included to create the gene product

  29. Genome Databases • Data organized by species • Clones assembled into contiguous pieces ‘contigs’ or whole chromosomes • Information on non-coding regions • Relativity

  30. Genome Browsers • Annotation adds value to sequence • Easy “walk” through the genome • Comparative genomics

  31. Genome Browsers • UCSC Genome Browser: http://genome.ucsc.edu/ • Ensembl Genome Browser: http://www.ensembl.org • WormBase: http://www.wormbase.org/ • AceDB: http://www.acedb.org/ • Comprehensive Microbial Resource: http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl • FlyBase: http://flybase.bio.indiana.edu/

  32. SNP database Single Nucleotide Polymorphisms (SNPs) • A single-base difference at a given position between individuals of the same species • Play an important role in differentiation and disease

  33. Sickle Cell Anemia • Caused by a single-base substitution of T for A (codon GAG → GTG), which places valine instead of glutamic acid in the beta chain of hemoglobin Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/
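The effect of the single-base change can be shown at the level of one codon. In this small sketch the helper `substitute` is a hypothetical name, and the lookup table holds only the two codons involved: in the standard genetic code GAG encodes glutamic acid (Glu) and GTG encodes valine (Val).

```python
# Only the two codons needed for this example (standard genetic code).
CODON_TO_AA = {"GAG": "Glu", "GTG": "Val"}


def substitute(codon: str, position: int, new_base: str) -> str:
    """Return the codon with the base at a 0-based position replaced."""
    return codon[:position] + new_base + codon[position + 1:]


normal = "GAG"
mutant = substitute(normal, 1, "T")  # A -> T at the middle base
print(normal, CODON_TO_AA[normal], "->", mutant, CODON_TO_AA[mutant])
# GAG Glu -> GTG Val
```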

  34. Healthy Individual >gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC >gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens] MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN ALAHKYH
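The sequence records on this slide begin with a header line that packs several identifiers into pipe-separated fields. Below is a minimal sketch for splitting the header layout shown here (`>gi|number|db|accession| description`). The `FastaHeader` type and `parse_header` function are invented for this example and are not part of any library, and other databases use other header layouts that this sketch does not handle.

```python
from typing import NamedTuple


class FastaHeader(NamedTuple):
    gi: str
    database: str
    accession: str
    description: str


def parse_header(line: str) -> FastaHeader:
    """Split a header of the form >gi|number|db|accession| description."""
    if not line.startswith(">gi|"):
        raise ValueError("expected a header starting with '>gi|'")
    _, gi, database, accession, description = line[1:].split("|", 4)
    return FastaHeader(gi, database, accession, description.strip())


# The two headers shown on the slide.
mrna = parse_header(
    ">gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA"
)
protein = parse_header(">gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]")
print(mrna.accession, "|", mrna.description)
print(protein.accession, "|", protein.description)
```

Accessions such as NM_000518.4 and NP_000509.1 serve as the keys that link the mRNA record to its protein record across databases.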

  35. Diseased Individual >gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGT GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC >gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens] MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN ALAHKYH
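Comparing the healthy and diseased beta globin protein sequences from slides 34 and 35 position by position locates the substitution. The sketch below uses only the first 30 residues of each protein. The function name `differences` is invented for this example, and it assumes the two sequences have equal length, so it detects substitutions only and not insertions or deletions.

```python
def differences(reference: str, sample: str):
    """Return (1-based position, reference residue, sample residue) tuples."""
    if len(reference) != len(sample):
        raise ValueError("this sketch handles substitutions only; lengths must match")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


# First 30 residues of beta globin from slides 34 (healthy) and 35 (diseased).
healthy = "MVHLTPEEKSAVTALWGKVNVDEVGGEALG"
diseased = "MVHLTPVEKSAVTALWGKVNVDEVGGEALG"
print(differences(healthy, diseased))  # [(7, 'E', 'V')]
```

The single hit is the glutamic acid (E) to valine (V) change at residue 7, counting the initial methionine as residue 1.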

  36. Disease Databases • Genes are involved in disease • Many diseases are well studied • Descriptions of diseases and what is known about them are stored

  37. Structure Databases • 3-dimensional structures of proteins, nucleic acids, molecular complexes etc. • 3D data is available thanks to techniques such as NMR and X-ray crystallography

  38. Databases of Experimental Results • Expression data, such as experimental microarray images • Proteomic data • Metabolic pathways, protein-protein interaction data, regulatory networks • etc.

  39. PubMed Literature Databases http://www.ncbi.nlm.nih.gov/PubMed Service of the National Library of Medicine

  40. Putting it all Together • Each database contains specific information • Like biological systems, these databases are interrelated

  41. • PROTEIN: PIR, SWISS-PROT • DISEASE: LocusLink, OMIM, OMIA • ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR • MOTIFS: BLOCKS, Pfam, Prosite • GENOMIC DATA: GenBank, DDBJ, EMBL • ESTs: dbEST, unigene • GENES: RefSeq, AllGenes, GDB • SNPs: dbSNP • GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress • PATHWAY: KEGG, COG • STRUCTURE: PDB, MMDB, SCOP • LITERATURE: PubMed
