Introduction to Bioinformatics 236523/234525 • Lecturer: Prof. Yael Mandel-Gutfreund • Teaching Assistants: Shula Shazman, Idit Kosti • Course web site: http://webcourse.cs.technion.ac.il/236523
Course Objectives • To introduce the bioinformatics discipline • To familiarize the students with the major biological questions that can be addressed with bioinformatics tools • To introduce the major tools used for sequence and structure analysis and to explain, in general terms, how they work (including their limitations)
Course Structure and Requirements 1. Class structure • 2-hour lecture • 1-hour tutorial 2. Homework • Homework assignments will be given every second week • The homework will be done in pairs • All 5 homework assignments (5/5) must be submitted 3. Final project • A final project will be conducted and submitted in pairs
Grading • 20% homework assignments • 80% final project
Literature list • Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly, 2001. • Lesk, A. M. Introduction to Bioinformatics. Oxford University Press, 2002. • Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press, 2004. Advanced reading • Jones, N. C., Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.
What is Bioinformatics? “The field of science in which biology, computer science, and information technology merge to form a single discipline” Ultimate goal: to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.
21st century: Genome, Transcriptome, Proteome. Central paradigm in molecular biology: Gene (DNA) → mRNA → Protein
From DNA to Genome (timeline figure): first protein sequence; Watson and Crick DNA model; first protein structure; first genome (Haemophilus influenzae); yeast genome; first human genome draft
Complete Genomes (2010 vs. 2005) • Total: 1379 vs. 294 • Eukaryotes: 133 vs. 39 • Bacteria: 1152 vs. 235 • Archaea: 94 vs. 23
1,000 Genomes Project: Expanding the Map of Human Genetics. Researchers hope the effort will speed up the discovery of the genetic roots of many diseases.
25000 genomes… What's next? The “post-genomics” era • Annotation • Comparative genomics • Structural genomics • Functional genomics. Main goal: to understand the living cell
From 25000 genomes… to understanding living cells
Annotation CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT ...... .............. TGAAAAACGTA
Annotation • Identify the genes within a given DNA sequence • Identify the sites which regulate the gene • Predict the function
How do we identify a gene in a genome? A gene is characterized by several features (promoter, ORF…), some of which are easier to detect than others.
Gene features (figure labels): promoter, TF binding site, transcription start site, ribosome binding site, ORF = Open Reading Frame, CDS = Coding Sequence. Sequence: CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATGCAA TATGGACAATTGGTTTCTTCTCTGAAT ................................. ..............TGAAAAACGTA
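Regulatory elements such as TF binding sites are often summarized as a degenerate consensus written in IUPAC nucleotide codes (for example W = A or T, R = A or G). The minimal Python sketch below reports every position where a sequence matches such a pattern. The function name `find_motif`, the toy sequence and the TATA-box-like pattern `TATAWAW` are illustrative assumptions, not taken from the slides; practical TF binding site prediction usually relies on position weight matrices rather than exact consensus matching.

```python
# IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_motif(seq, motif):
    """Hypothetical helper: 0-based start positions where seq matches motif."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    for start in range(len(seq) - len(motif) + 1):
        window = seq[start:start + len(motif)]
        # Every base in the window must be allowed by the corresponding code.
        if all(base in IUPAC[code] for base, code in zip(window, motif)):
            hits.append(start)
    return hits


# Toy sequence (assumed data) with two TATA-box-like matches.
print(find_motif("GGTATAAATGCTATATAT", "TATAWAW"))  # [2, 11]
```

Only the forward strand is scanned here; a full search would also scan the reverse complement.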
Using bioinformatics approaches for gene hunting • Relatively easy in simple organisms (e.g. bacteria) • VERY HARD in higher organisms (e.g. humans)
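In simple organisms such as bacteria, a common first step in gene hunting is to look for open reading frames: a start codon (ATG) followed, in the same frame, by a stop codon (TAA, TAG or TGA). The Python sketch below illustrates this under simplifying assumptions: it scans only the three forward frames, ignores the reverse strand, and ignores signals such as promoters and ribosome binding sites. The function name `find_orfs` and the `min_codons` cutoff are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return (start, end, frame) for ORFs on the forward strand.

    start is the 0-based index of the ATG, end is the index just past the
    stop codon, and frame is 0, 1 or 2. Only the first ATG before each stop
    is used, so nested ORFs sharing a stop codon are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Codons from the ATG up to (not including) the stop codon.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 2)]
```

Real prokaryotic gene finders add statistical models on top of this kind of scan. In higher organisms, introns interrupt the coding sequence, which is one reason gene hunting there is much harder.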
Comparative genomics
Perhaps not surprising! How human are chimps? Comparison of the full draft genomes of human and chimp revealed that they differ by only 1.23%.
So where are we different?
Human  ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp  ATAGGGG--GGATGCGGGCCCTATACCC
Mouse  ATAGCG---GGATGCGGCGC-TATACCA
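Differences between aligned sequences can be counted column by column: each column is identical, a mismatch (substitution), or a gap ('-') in one of the sequences. The Python sketch below applies this to the toy human/chimp/mouse alignment, with its gap characters written contiguously. The function names are hypothetical, and counting gap columns as differences is an assumption: genome-wide figures depend on how substitutions and insertions/deletions are counted.

```python
def compare_aligned(seq_a, seq_b):
    """Count (identical, mismatch, gap) columns in two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    identical = mismatches = gaps = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            identical += 1
        else:
            mismatches += 1
    return identical, mismatches, gaps


def percent_difference(seq_a, seq_b):
    """Percentage of alignment columns that are mismatches or gaps."""
    _, mismatches, gaps = compare_aligned(seq_a, seq_b)
    return 100.0 * (mismatches + gaps) / len(seq_a)


human = "ATAGCGGGGGGATGCGGGCCCTATACCC"
chimp = "ATAGGGG--GGATGCGGGCCCTATACCC"
mouse = "ATAGCG---GGATGCGGCGC-TATACCA"

print(compare_aligned(human, chimp))  # (25, 1, 2)
print(compare_aligned(human, mouse))  # (21, 3, 4)
print(round(percent_difference(human, mouse), 1))  # 25.0
```

A short window like this one differs far more than the genome-wide 1.23%; such percentages only become meaningful over long alignments.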
And where are we similar? (Figure labels: VERY SIMILAR, conserved between many organisms; VERY DIFFERENT)
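Positions that are identical in every organism and contain no gap are the conserved ones. The sketch below marks them with '*', similar in spirit to the conservation line that multiple alignment tools such as Clustal print under an alignment. The helper name is hypothetical, and the three sequences are the toy human/chimp/mouse alignment with its gaps written contiguously.

```python
def conservation_line(aligned):
    """Return '*' for fully conserved, gap-free columns and ' ' otherwise."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have the same length")
    marks = []
    for column in zip(*aligned):
        conserved = "-" not in column and len(set(column)) == 1
        marks.append("*" if conserved else " ")
    return "".join(marks)


alignment = [
    "ATAGCGGGGGGATGCGGGCCCTATACCC",  # Human
    "ATAGGGG--GGATGCGGGCCCTATACCC",  # Chimp
    "ATAGCG---GGATGCGGCGC-TATACCA",  # Mouse
]
for row in alignment:
    print(row)
line = conservation_line(alignment)
print(line)
print(line.count("*"), "of", len(line), "columns conserved")  # 20 of 28 columns conserved
```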
Functional genomics
TO BE IS NOT ENOUGH: at any given time point, a gene can be functional or not.
From the gene expression pattern we can learn: • What does the gene do? • When is it needed? • What other genes or proteins interact with it? • … • What's wrong?
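One common way to approach the question "what other genes interact with it?" from expression data is to look for genes whose expression profiles rise and fall together, for example by computing the Pearson correlation between profiles measured under the same conditions. The Python sketch below assumes a small, made-up expression table; the gene names, values, helper names and the 0.9 threshold are all hypothetical. A high correlation is only a hint of shared regulation or function, not proof of an interaction.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def coexpressed_with(target, profiles, threshold=0.9):
    """Names of genes whose profile correlates with target's at >= threshold."""
    reference = profiles[target]
    return sorted(
        name
        for name, profile in profiles.items()
        if name != target and pearson(reference, profile) >= threshold
    )


# Hypothetical expression levels of four genes at four time points.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # rises together with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # mirror image of geneA
    "geneD": [1.0, 3.0, 2.0, 4.0],  # only partly similar
}
print(coexpressed_with("geneA", profiles))  # ['geneB']
```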
Structural Genomics
The three-dimensional structure of a protein can tell much more than its sequence alone (figure labels: fold, evolutionary relationships, active sites, functional sites, shape and electrostatics, protein complexes, protein-ligand complexes, biological processes)
Resources and Databases The different types of data are collected in databases: • Sequence databases • Structural databases • Databases of experimental results. All databases are interconnected.
Sequence databases • Gene databases • Genome databases • Disease-related mutation databases • ………….
Genome Browsers: an easy “walk” through the genome
Genome Browsers • UCSC Genome Browser: http://genome.ucsc.edu/ • Ensembl Genome Browser: http://www.ensembl.org • WormBase: http://www.wormbase.org/ • AceDB: http://www.acedb.org/ • Comprehensive Microbial Resource: http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl • FlyBase: http://flybase.bio.indiana.edu/
Mutation database • A single-base difference at a single position between two individuals of the same species • Such differences play an important role in differentiation and disease
Sickle Cell Anemia • Caused by a single base substitution (an A replaced by a T) in the β-globin gene, which puts valine instead of glutamic acid into the β chain of hemoglobin. Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/
Healthy Individual >gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC >gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens] MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN ALAHKYH
Diseased Individual >gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGT GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC >gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens] MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN ALAHKYH
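The single-base change behind sickle cell anemia can be located by comparing the two coding sequences position by position and translating the affected codon. The Python sketch below uses only the first 27 bases of the HBB coding sequence from the healthy record (starting at the ATG start codon), plus a copy carrying one A→T substitution in the first GAG codon. This matches the protein change shown in the two records (MVHLTPEEK → MVHLTPVEK: glutamic acid to valine at residue 7 counting the initiator methionine, classically called Glu6Val). The function names are hypothetical; the codon table is the standard genetic code.

```python
# Standard genetic code, built from the usual TCAG ordering of codons.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(cds):
    """Translate whole codons of a coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def point_substitutions(ref_cds, alt_cds):
    """List (cds_pos, ref_base, alt_base, ref_aa, codon_number, alt_aa), 1-based."""
    if len(ref_cds) != len(alt_cds) or len(ref_cds) % 3 != 0:
        raise ValueError("expected two equally long coding sequences of whole codons")
    changes = []
    for i, (ref_base, alt_base) in enumerate(zip(ref_cds, alt_cds)):
        if ref_base != alt_base:
            start = i - i % 3  # first base of the codon containing position i
            ref_aa = CODON_TABLE[ref_cds[start:start + 3]]
            alt_aa = CODON_TABLE[alt_cds[start:start + 3]]
            changes.append((i + 1, ref_base, alt_base, ref_aa, i // 3 + 1, alt_aa))
    return changes


HEALTHY = "ATGGTGCATCTGACTCCTGAGGAGAAG"  # start of the HBB CDS, healthy record
SICKLE = "ATGGTGCATCTGACTCCTGTGGAGAAG"   # same, with one A -> T substitution

print(translate(HEALTHY), translate(SICKLE))  # MVHLTPEEK MVHLTPVEK
print(point_substitutions(HEALTHY, SICKLE))   # [(20, 'A', 'T', 'E', 7, 'V')]
```

This assumes the two sequences are already aligned base for base; comparing sequences that contain insertions or deletions would first require an alignment.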
Structure Databases • Three-dimensional structures of proteins, nucleic acids, molecular complexes, etc. • 3-D data are obtained with techniques such as NMR and X-ray crystallography
Databases of Experimental Results • Experimental microarray images (gene expression data) • Proteomic data (protein expression data) • Metabolic pathways, protein-protein interaction data, regulatory networks • etc.
Literature Databases • PubMed: http://www.ncbi.nlm.nih.gov/pubmed/, a service of the National Library of Medicine
Putting it all Together • Each database contains specific information • Like other biological systems, these databases are interrelated
• PROTEIN: PIR, SWISS-PROT • DISEASE: LocusLink, OMIM, OMIA • ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR • MOTIFS: BLOCKS, Pfam, Prosite • GENOMIC DATA: GenBank, DDBJ, EMBL • ESTs: dbEST, UniGene • GENES: RefSeq, AllGenes, GDB • SNPs: dbSNP • GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress • PATHWAY: KEGG, COG • STRUCTURE: PDB, MMDB, SCOP • LITERATURE: PubMed