Introduction • Day 1: Introduction • Day 2: Sequence Analysis • Day 3: Databases • Day 3: Dynamic Programming mario@sanbi.ac.za
Goals of Bioinformatics • Understand living cells and how they function on a molecular level • Done by analysing molecular sequence and structural data • Rationale is the “central dogma” of biology
Genomic Data (2009) http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome
Genomic Data (2010) http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome
Bioinformatics Limitations • Relying completely on bioinformatics information is dangerous if that information is inaccurate • Quality of bioinformatics predictions depends on • quality of the data and • sophistication of the algorithms • Bioinformatics and experimental biology are complementary: • Bioinformatics results need to be consistent with experimental biology
Bioinformatics Limitations • Data (e.g. sequence, expression) may contain errors • Downstream interpretation of sequence data will be wrong if the sequences or their annotation are wrong • Many algorithms lack the capability and sophistication to truly reflect reality • Outcome of computation also depends on available computing power
Definitions • Sequence alignment • Dynamic Programming • Global/ Local Alignment • Sequence Identity • Phylogenetics • Paralog/ homolog • Proteomics • Genomics • Transcriptomics • Annotation • BLAST • Sequence assembly • Contig
‘Omics’ • Genomics • Proteomics • Transcriptomics • Phylogenomics etc. • Genomics • Structural • Functional
Structural Genomics • Deals with genome structures • Focus on study of • Genome mapping • Genome sequencing and assembly • Genome annotation • Genome comparison
Structural Genomics: Genome mapping • Identify relative locations of • Genes • Mutations or • Traits
Structural Genomics: Genome mapping • Map types in order of increasing resolution: Cytological Map, Genetic Map, Physical Map, DNA Sequence (e.g. agctggatttgcgcgcaa) • Image adapted from “Essential Bioinformatics” by Jin Xiong • For more info: look at Chap 5 in “Genomes”, T.A. Brown (572.86 MAL) in the UWC Library or the online version at http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.chapter.6196
Structural Genomics: Genome sequencing • Shotgun sequencing • Genome is fragmented and cloned • Both ends of the cloned DNA are sequenced at random • A high number of random sequences statistically ensures that the whole genome is covered • Software is used to assemble the random fragments into a single, contiguous genome
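The statistical argument for coverage can be made concrete with a small calculation. The sketch below assumes the standard Lander-Waterman approximation (reads placed uniformly at random, so the chance that a base is missed is about e^(-c) at mean coverage c); the slide does not name this model, and the read count, read length and genome size are hypothetical numbers.

```python
import math


def shotgun_coverage(num_reads, read_length, genome_size):
    """Return (mean coverage c, expected fraction of bases covered).

    Assumes reads land uniformly at random (Lander-Waterman approximation),
    so a given base is missed by every read with probability ~ exp(-c).
    """
    c = num_reads * read_length / genome_size
    return c, 1.0 - math.exp(-c)


# Hypothetical project: 80,000 reads of 500 bp from a 5 Mb genome.
c, covered = shotgun_coverage(80_000, 500, 5_000_000)
print(f"mean coverage = {c:.1f}x, expected fraction covered = {covered:.4f}")
```

Under this approximation the uncovered fraction shrinks exponentially with coverage, which is why shotgun projects sequence each base many times over.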
Structural Genomics: Shotgun sequencing http://www.scq.ubc.ca/wp-content/uploads/2006/08/shotgun1.gif
Structural Genomics: Genome sequencing • Hierarchical sequencing • Genomic fragments of 100-300 kb are cloned into BACs • Using a physical map, the order and locations of the BAC clones on the chromosome can be determined • Successive sequencing of adjacent BAC clones results in coverage of the complete genome
Structural Genomics: Hierarchical sequencing http://www.scq.ubc.ca/wp-content/uploads/2006/08/topdownseq.gif
Structural Genomics: Shotgun vs Hierarchical • Figure with two panels: Shotgun, Hierarchical
Structural Genomics: Genome assembly • Sequence fragments are stitched together through the overlapping sequences between fragments • Example fragments from the figure: CCAATAA, CACCATT, TATAAT, AATTGGCA, TTGAATA
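Stitching fragments together by their overlaps can be illustrated with a toy greedy assembler. This is a minimal sketch, not the algorithm of any particular assembly package: it assumes error-free reads from a single strand, and the three reads and the `min_len` threshold are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: keep separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical reads, given out of order.
reads = ["ttgcgcgcaa", "agctggatt", "ggatttgcg"]
print(greedy_assemble(reads))  # -> ['agctggatttgcgcgcaa']
```

Real assemblers also have to handle sequencing errors, reverse-complement reads and repeats, all of which this sketch ignores.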
Structural Genomics: Genome annotation • Happens before submission to database • Gene prediction: GenScan, FgenesH • Verify predictions • BLAST search against sequence database • Compare to experimentally determined cDNA and EST sequences: GeneWise, Spidey, SIM4, EST2Genome • Manual checking by human curators • Functional assignment • BLAST homology searching against protein database • Search protein motif and domain databases: Pfam and InterPro
Structural Genomics: Genome annotation http://hinvlite.sanbi.ac.za
Structural Genomics: Genome comparison • Comparison of • Gene number • Gene location • Gene content • Reveals extent of conservation between genomes • Reveals core set of genes crucial for survival; the “Minimal Genome”
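Comparing gene content across genomes reduces, in its simplest form, to set operations. The sketch below assumes orthologous genes have already been identified and given the same name in every genome; the organism labels and the assignment of genes to them are hypothetical.

```python
# Hypothetical gene-content sets; in practice these would come from
# annotated genomes after ortholog identification.
genomes = {
    "organism_A": {"dnaA", "rpoB", "gyrA", "lacZ"},
    "organism_B": {"dnaA", "rpoB", "gyrA", "nifH"},
    "organism_C": {"dnaA", "rpoB", "gyrA"},
}

# Gene number per genome.
gene_counts = {name: len(genes) for name, genes in genomes.items()}

# Core set: genes present in every genome (a candidate "minimal genome").
core = set.intersection(*genomes.values())

# Genes found in one genome only.
unique = {
    name: genes - set().union(*(g for n, g in genomes.items() if n != name))
    for name, genes in genomes.items()
}

print(gene_counts)
print(sorted(core))
print({name: sorted(genes) for name, genes in unique.items()})
```

A shared core found this way is only a starting hypothesis; whether those genes are truly essential still needs experimental confirmation.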
Structural Genomics: Genome comparison http://www.sanger.ac.uk/Software/ACT/
Functional Genomics • Focus on gene function • At the genome level, using • High-throughput methods • Conducted using • Sequence-based methods • Microarray-based methods
Functional Genomics: Sequence-based • Expressed Sequence Tag (EST) • Provides a rough estimate of the genes actively expressed under specific physiological conditions • Serial Analysis of Gene Expression (SAGE) • Provides quantitative analysis of mRNA expression • Occurrence and quantity of a specific fragment indicate the level of gene expression
Functional Genomics: ESTs • Selected mRNA sequences are reverse transcribed into cDNA clones • cDNA clones are then sequenced • Sequences are obtained from the 5’ or 3’ end • Typically 500 bp long http://www.ncbi.nlm.nih.gov/About/primer/est.html
Functional Genomics: ESTs • EST Limitations • Often low quality • Contamination (vector) • Chimera • Represent partial genes • Despite this, ESTs are still widely used (www.ncbi.nlm.nih.gov/dbEST)
Functional Genomics • EST Gene index construction • Organise and consolidate ESTs so that the data can be used to extract full-length cDNAs • Remove contaminants • Mask repeats • Cluster sequences • Within a cluster, assemble overlapping ESTs into contigs/consensus sequences • Annotation: similar to process for genome • Examples: Unigene, StackPack, TGI
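The clustering step can be sketched with a deliberately simple rule: two ESTs go into the same cluster if they share at least one exact k-mer. This is an illustrative assumption, not the method used by Unigene, StackPack or TGI; the sequences and the value of `k` are hypothetical, and reverse-complement matches and sequencing errors are ignored.

```python
def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests, k=8):
    """Single-linkage clustering of ESTs that share an exact k-mer.

    Returns a sorted list of clusters, each a list of indices into `ests`.
    """
    parent = list(range(len(ests)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # k-mer -> index of the first EST that contained it
    for idx, seq in enumerate(ests):
        for kmer in kmers(seq, k):
            if kmer in first_seen:
                parent[find(idx)] = find(first_seen[kmer])
            else:
                first_seen[kmer] = idx

    clusters = {}
    for idx in range(len(ests)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())


# Hypothetical ESTs: the first two overlap, the third is unrelated.
ests = ["agctggatttgcg", "gatttgcgcgcaa", "ccccttttaaaagggg"]
print(cluster_ests(ests))  # -> [[0, 1], [2]]
```

Each resulting cluster would then be passed to an assembly step to build contigs or consensus sequences.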
Functional Genomics: SAGE • A short DNA fragment (15-20 bp) is cut from a cDNA and used as a unique marker for that transcript • Fragments are concatenated, cloned and sequenced http://www.sagenet.org/protocol/MANUAL1e.pdf
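Once the concatenated fragments have been sequenced and split back into individual tags, expression levels follow from counting. The sketch below assumes the tags have already been extracted; the tag sequences are hypothetical and the relative-frequency normalisation is one simple choice among several.

```python
from collections import Counter


def tag_frequencies(tags):
    """Map each tag to (count, fraction of all tags observed)."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: (n, n / total) for tag, n in counts.items()}


# Hypothetical 15 bp tags recovered from sequenced concatemers.
tags = [
    "AAGTCCGTACGTTAG",
    "CCGTTAGGCATTCAA",
    "AAGTCCGTACGTTAG",
    "AAGTCCGTACGTTAG",
]
for tag, (count, fraction) in tag_frequencies(tags).items():
    print(tag, count, f"{fraction:.2f}")
```

A tag seen three times as often as another suggests its transcript is roughly three times as abundant, provided each tag maps to a single transcript.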
Functional Genomics: Microarrays • Immobilised probes (oligonucleotides or cDNA) are ‘spotted’ on a chip • Probes are representative of a complete genome • Fluorescently labelled cDNA from the organism is allowed to hybridise with the probes • Intensity of fluorescence per spot reflects the amount of mRNA present
Proteomics: Technology • 2D-PAGE gel: separates proteins based on charge and mass • Melanie, CAROL, Comp2Dgel, SWISS-2DPAGE • Mass Spectrometry (MS): peptide is fragmented, aspirated and the mass-to-charge ratio is determined • Database searching: using the peptide fingerprint obtained from MS, a database can be searched • ExPASy: AAcompIdent, TagIdent, PeptIdent, CombSearch • ProFound, Mascot
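Database searching with a peptide fingerprint relies on computing theoretical peptide masses for every database protein and comparing them with the measured ones. The sketch below assumes digestion with trypsin (cleavage after K or R, but not before P), which the slide does not specify; the protein sequence is hypothetical, and the residue masses are standard monoisotopic values in daltons.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


protein = "GAKLLRPSKMR"  # hypothetical sequence
for pep in tryptic_digest(protein):
    print(pep, round(peptide_mass(pep), 4))
```

A search tool would compare such a list of theoretical masses with the observed peak list, within a mass tolerance, and rank database proteins by the number of matches.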
Proteomics: Technology • Differential In-gel Electrophoresis (DIGE) • Proteins from experimental and control samples are labeled with different colored dyes • Differentially expressed proteins can be co-separated and visualised on the same gel
Proteomics: Technology • Protein Microarrays • Chip contains immobilised proteome • Used to study protein function • Assay • Protein-protein interaction • Protein-DNA/RNA interactions • Protein-ligand interactions • Enzyme activity
Proteomics: Post-translational Modifications • For activity, many proteins have to be covalently modified before or after the folding process • Proteolytic cleavage, formation of disulfide bonds, addition of phosphoryl, methyl, acetyl groups, etc. • Modifications impact protein function • Bioinformatics can predict sites for modification • AutoMotif, Cysteine, FindMod and GlyMod (available from ExPASy), RESID
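The simplest form of modification-site prediction is a sequence motif scan. The sketch below looks for the N-glycosylation sequon N-{P}-[ST]-{P} (PROSITE pattern PS00001); it is an illustration of motif-based prediction in general, not a description of how the tools listed above work, and the protein sequence is hypothetical.

```python
import re

# N-{P}-[ST]-{P}: Asn, any residue except Pro, Ser or Thr, any except Pro.
# The lookahead lets overlapping candidate sites be reported as well.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glyc_sites(protein):
    """Return (1-based position of the Asn, matched 4-residue motif) pairs."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein)]


protein = "MNGTLNPSAANASQ"  # hypothetical sequence
print(find_n_glyc_sites(protein))  # -> [(2, 'NGTL'), (11, 'NASQ')]
```

A motif match only marks a candidate site; whether the residue is actually modified depends on context such as localisation and accessibility, so matches need further evidence.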
Proteomics: Protein Sorting • Sub-cellular localisation is integral to protein function • Many proteins are only active after being transported to specific compartments • Identifying protein localisation is important in functional annotation • SignalP, TargetP, PSORT
Proteomics: Protein-protein Interactions • Experimental determination • Prediction based on • Domain fusion • Gene neighbours • Sequence homology • Phylogenetic information • Hybrid methods