Lec 09 We have already discussed
• What is MSA (Multiple Sequence Alignment)?
• What is it good for?
• How do I use it?
• Software and algorithms
  • The programs
  • How do they work?
  • Which to use?
• In practice
  • Get the sequences
  • Reformat them
  • Evaluate the alignment
  • Realign or modify the alignment
  • Add or remove sequences
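One simple way to "evaluate the alignment" is to look at how conserved each column is. The following is a minimal sketch, not the method of any particular MSA program: the function name `column_conservation`, the gap character `-`, and the toy alignment are all assumptions made for illustration.

```python
from collections import Counter

def column_conservation(alignment):
    """For each column, return the fraction of sequences that carry
    the most common non-gap residue (gaps count against the column)."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]  # "-" is assumed to mark a gap
        if not residues:
            scores.append(0.0)  # column made only of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores

# Toy alignment invented for this example (not real data).
toy_alignment = ["MKT-LV", "MKTALV", "MRT-LI"]
scores = column_conservation(toy_alignment)
for position, score in enumerate(scores, start=1):
    print(position, round(score, 2))
```

Columns with low scores are natural candidates to inspect before deciding whether to realign, edit the alignment, or add or remove sequences.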
Lec 09 Platform, software and algorithm selection
• Platforms
  • Central server
  • Web-based
  • Local computer
• Software
  • The best
  • What’s available
  • The easiest to use
  • The best output
• Algorithm
  • The most accurate
  • The best for your problem
  • What’s available
  • What you are familiar with
Lec 09 Main applications of MSA
Lec 09 Get the sequences: databases
• GenBank: An annotated collection of all publicly available nucleotide and protein sequences.
• RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products.
• UniProt Consortium Database: Universal protein knowledgebase, a central resource of protein sequence and function from Swiss-Prot, TrEMBL and PIR.
• Entrez Gene: Gene-centered information at NCBI.
• UniGene: Unified clusters of ESTs and full-length mRNA sequences.
• OMIM (Online Mendelian Inheritance in Man): A catalog of human genetic and genomic disorders.
• Model Organism Genome Databases: MGD, RGD, SGD, FlyBase…
• GeneCards: Integrated database of human genes, maps, proteins and diseases.
• SNP Consortium Database.
Lec 09 Get the sequences: Entrez Text Searches http://www.ncbi.nlm.nih.gov/sites/gquery
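The same kind of text search can also be scripted. The sketch below only builds a query URL for NCBI's E-utilities `esearch` endpoint, which is a programmatic interface separate from the web page linked on this slide; it does not contact the server. The helper name `build_esearch_url` and the example query are assumptions for illustration.

```python
from urllib.parse import urlencode

# E-utilities esearch endpoint (programmatic counterpart of an Entrez text search).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(db, term, retmax=20):
    """Return an esearch URL for a text query against one Entrez database.
    This helper is hypothetical; it only formats the request."""
    params = {
        "db": db,            # Entrez database name, e.g. "protein" or "gene"
        "term": term,        # the text query, using Entrez field tags if wanted
        "retmax": retmax,    # maximum number of IDs to return
        "retmode": "json",   # ask for a JSON reply
    }
    return EUTILS_ESEARCH + "?" + urlencode(params)

# Example query invented for this sketch.
url = build_esearch_url("protein", "hemoglobin AND human[Organism]")
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) would return matching record IDs, which can then be used to download the sequences to align.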
Lec 09 Entrez Gene http://www.ncbi.nlm.nih.gov/gene
Lec 09 UniProt Consortium Databases (http://www.uniprot.org) • Number of explicitly cross-referenced databases: 126
Lec 09 UniProt Text Search http://www.uniprot.org/
Lec 09 UniProt Sequence Report
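When sequences are downloaded from UniProt in FASTA format, the header line carries part of what the sequence report shows (database, accession, entry name, protein name, organism). A minimal sketch of pulling those fields out follows; the function name `parse_uniprot_header` is an assumption, and the regular expression covers only the common `sp`/`tr` header layout.

```python
import re

# Common UniProtKB FASTA header layout:
# >db|Accession|EntryName ProteinName OS=Organism OX=TaxonId [GN=Gene] PE=n SV=n
HEADER_RE = re.compile(
    r"^>(sp|tr)\|([^|]+)\|(\S+)\s+(.*?)\s+OS=(.+?)\s+OX=(\d+)"
)

def parse_uniprot_header(line):
    """Split a UniProtKB FASTA header into named fields (hypothetical helper)."""
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError("not a recognised UniProtKB FASTA header")
    db, accession, entry_name, protein_name, organism, taxon_id = match.groups()
    return {
        "database": "Swiss-Prot" if db == "sp" else "TrEMBL",
        "accession": accession,
        "entry_name": entry_name,
        "protein_name": protein_name,
        "organism": organism,
        "taxon_id": int(taxon_id),
    }

# Illustrative header for human hemoglobin alpha.
header = ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
record = parse_uniprot_header(header)
print(record["accession"], record["organism"])
```

Keeping the accession and organism as separate fields makes it easier to give sequences short, unique names when reformatting them for an alignment program.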
Lec 09 PIR Text Search http://pir.georgetown.edu/pirwww/search/textsearch.shtml
Lec 09 OMIM: Online Mendelian Inheritance in Man http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim&TabCmd=Limits
Lec 09 Protein Family Databases
• Whole Proteins
  • PIRSF: A Network Classification System of Protein Families
  • COG (Clusters of Orthologous Groups) of Complete Genomes
  • ProtoNet: Automated Hierarchical Classification of Proteins
• Protein Domains
  • Pfam: Alignments and HMM Models of Protein Domains
  • SMART: Protein Domain Families
  • CDD: Conserved Domain Database
• Protein Motifs
  • PROSITE: Protein Patterns and Profiles
  • BLOCKS: Protein Sequence Motifs and Alignments
  • PRINTS: Protein Sequence Motifs and Signatures
• Integrated Family Databases
  • iProClass: Superfamilies/Families, Domains, Motifs, Rich Links
  • InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SuperFamily
Lec 09 PIRSF: A Network Classification System of Protein Families http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml
Lec 09 COG: Clusters of Orthologous Groups of proteins http://www.ncbi.nlm.nih.gov/COG/
Lec 09 Domain Classification http://pir.georgetown.edu/pirwww/dbinfo/iproclass.shtml
Lec 09 Domain Classification InterPro Gene3D
Lec 09 Protein Motifs
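Motif databases such as PROSITE describe short motifs as patterns, for example `N-{P}-[ST]-{P}` for the N-glycosylation site. A minimal sketch of translating the basic pattern syntax into a Python regular expression follows. The function name `prosite_to_regex` is an assumption, the test sequence is invented, and only the core syntax is handled (`x`, `[...]`, `{...}`, repeats, and `<`/`>` anchors outside brackets).

```python
import re

def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regex string.
    Hypothetical helper covering only the core syntax."""
    pattern = pattern.rstrip(".")          # patterns are terminated by a period
    parts = []
    for element in pattern.split("-"):     # elements are separated by hyphens
        prefix = suffix = ""
        if element.startswith("<"):        # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:                              # repetition such as x(2,4) or A(3)
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":                 # any residue
            core = "."
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"   # any residue except those listed
        else:
            core = element                 # single residue or [ABC] alternatives
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

regex = prosite_to_regex("N-{P}-[ST]-{P}.")
sequence = "MNGSANVTP"  # made-up sequence for illustration
hits = [m.start() for m in re.finditer(regex, sequence)]
print(regex, hits)
```

Note that `re.finditer` reports non-overlapping matches only, so overlapping motif occurrences would need a lookahead-based search.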
Lec 09 Databases of Protein Functions
• Metabolic Pathways, Enzymes, and Compounds
  • Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
  • KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
  • LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
  • EcoCyc: Encyclopedia of E. coli Genes and Metabolism
  • MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
  • BRENDA: Enzyme Database
  • UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
• Cellular Regulation and Gene Networks
  • EpoDB: Genes Expressed during Human Erythropoiesis
  • BIND: Descriptions of interactions, molecular complexes and pathways
  • DIP: Catalogs experimentally determined interactions between proteins
  • BioCarta: Biological pathways of human and mouse
  • GO: Gene Ontology Consortium Database
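The enzyme classification listed above assigns each enzyme a four-level EC number whose first digit gives the top-level class. A minimal sketch of reading such a number follows; the function name `parse_ec_number` is an assumption, and `-` is treated as an unspecified level, as in partial EC numbers.

```python
# Top-level EC classes (first digit of the EC number).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}

def parse_ec_number(ec):
    """Split an EC number such as 'EC 1.1.1.1' into its class name and levels.
    Hypothetical helper; levels written as '-' are returned as None."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()          # drop an optional "EC" prefix
    fields = text.split(".")
    if len(fields) != 4:
        raise ValueError("an EC number has exactly four levels")
    top = int(fields[0])
    if top not in EC_CLASSES:
        raise ValueError("unknown top-level EC class: %d" % top)
    levels = [None if f == "-" else f for f in fields]
    return {"class": EC_CLASSES[top], "levels": levels}

full = parse_ec_number("EC 1.1.1.1")   # alcohol dehydrogenase
partial = parse_ec_number("3.4.-.-")   # a partially specified hydrolase
print(full["class"], partial["levels"])
```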
Lec 09 KEGG Metabolic & Regulatory Pathways
• KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions.
http://www.genome.jp/kegg/pathway.html
http://www.genome.jp/kegg/pathway.html#metabolism
Lec 09 Multiple Genome Alignment
• MGA: Michael Höhl, Stefan Kurtz, Enno Ohlebusch. Efficient Multiple Genome Alignment. Bioinformatics, Vol. 18 (S1): S312-S320, 2002. http://bibiserv.techfak.uni-bielefeld.de/mga/ref.html
• PipMaker and MultiPipMaker: Schwartz S, Elnitski L, Li M, et al. MultiPipMaker and supporting tools: alignments and analysis of multiple genomic DNA sequences. Nucleic Acids Research 31 (13): 3518-3524, July 1, 2003. http://bio.cse.psu.edu/pipmaker/
• MAVID: Bray N and Pachter L. MAVID multiple alignment server. Nucleic Acids Research 31: 3525-3526, 2003. http://baboon.math.berkeley.edu/mavid/ http://www-gsd.lbl.gov/vista/
MultiPipMaker - output
Lec 09 Multiple Genome Alignment Genomic Targets for Comparative Sequencing http://genome.ucsc.edu/