Comparative genomics. Visualization of a multiple genome alignment using VISTA. This plot shows the conservation between human and chimpanzee, cow, mouse, and fugu around the first intron of the cMet gene.
Comparative genomics

Visualization of a multiple genome alignment using VISTA. This plot shows the conservation between human and chimpanzee, cow, mouse, and fugu around the first intron of the cMet gene.

Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, Green ED, Sidow A, Batzoglou S (2003) NISC Comparative Sequencing Program. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13:721-31.

Phylogenetic footprinting

[Figure labels: Transcription factor binding site; Module; Human gene; Primates; Mouse / rat; Other mammals; Brachydanio rerio; Drosophila … Yeast]

Identification of well conserved regions in orthologous sequences
Potential transcription factor binding sites
Sites or modules present in a group of co-regulated genes

[Figure labels, interleukin-1 family: exon 2 to exon 7; IL1A; IL1B; IL1Fx; IL1RN; Interleukin-1 propeptide; Interleukin-1; myristoylation; phosphorylation; RNA interaction; cleavage; NLS; mutation R>Q (SeattleSNPs), «damaging» (PPHpredict); mutation A>S (SeattleSNPs), «benign» (PPHpredict)]

[Figure labels: VDR; PXR; AR; THR]

MAO: Multiple Alignment Ontology
http://www-igbmc.u-strasbg.fr/BioInfo/MAO/mao.html

Ontologies have become important in bioinformatics as they provide a structured representation of the knowledge available in a particular domain [1]. MAO is a task-oriented ontology for data retrieval and exchange in the fields of DNA/RNA alignment, protein sequence alignment and protein structure alignment. The purpose of MAO is to standardise descriptions of alignments and of the associated structural and functional information, so that the different alignment construction and analysis applications can communicate with each other. Most of the features associated with multiple alignments are defined as MAO concepts, ranging from a single residue to sub-families of sequences and/or 3D structures.
Attributes are assigned to the concepts where appropriate, in order to permit the integration of more complex information such as residue function or activity, sequence feature conservation or 3D structural location. An important criterion in the design of MAO was the ability to link with other biological ontologies, in particular those registered at the OBO (Open Biomedical Ontologies) web site. By providing an integrated environment for all the data available for a protein family, MAO facilitates knowledge extraction, which can then be presented in a user-friendly format [2]. MAO can thus serve as a basis for a repository of annotated protein families that will help the community share information.

JD. Thompson, D. Moras, O. Poch, IGBMC, Illkirch, France. (julie,moras,poch)@igbmc.u-strasbg.fr
SR. Holbrook, LBNL, Berkeley CA, USA. SRHolbrook@lbl.gov
K. Katoh, Bioinformatics Center, ICR, Kyoto, Japan. kkatoh@kuicr.kyoto-u.ac.jp
P. Koehl, UC Davis, Davis, CA, USA. koehl@cs.ucdavis.edu
E. Westhof, IBMC, Strasbourg, France. E.Westhof@ibmc.u-strasbg.fr

Phylogenetic studies

Protein family evolution
Organism classification, the ‘tree of life’

Iwabe N, Hara Y, Kumazawa Y, Shibamoto K, Saito Y, Miyata T, Katoh K (2004) Sister group relationship of turtles to the bird-crocodilian clade revealed by nuclear DNA-coded proteins. Mol Biol Evol. Dec 29.

[1] Bard, J.B. and Rhee, S.Y. 2004. Ontologies in biology: design, applications and future challenges. Nat Rev Genet. 3:213-222.
[2] Lecompte, O., Thompson, J.D., Plewniak, F., Thierry, J. and Poch, O. 2001. Multiple alignment of complete sequences (MACS) in the post-genomic era. Gene 270:17-30.

Protein: hierarchical function annotation

Sequence homolog detection

[Figure labels, Vitamin D3 receptor (VDR): a. AB, DBD, Hinge, LBD; b. GM, C..C, C..C, E, E, H, R; c.]

Suga H, Katoh K, Miyata T (2001) Sponge homologs of vertebrate protein tyrosine kinases and frequent domain shufflings in the early evolution of animals before the parazoan-eumetazoan split.
Gene. 280:195-201.

Lecompte O, Ripp R, Thierry JC, Moras D, Poch O (2002) Comparative analysis of ribosomal proteins in complete genomes: an example of reductive evolution at the domain scale. Nucleic Acids Res. 24:5382-90.

• Full length protein annotation
• Domain organisation
• Conserved motifs/residues

Multiple alignments improve protein homolog searches, e.g. PSI-Blast, ProfileSearch, HMMER, SAM…

Domain organisation

Multiple alignments are used to identify protein domains, either ab initio or by searching databases of known domains, e.g. Pfam, Interpro, CDD...

Conserved motifs, residues

Sequence analysis revealed two sets of differentially conserved residues, partitioning the Nuclear Receptor superfamily into two classes.
Class I: homodimers
Class II: heterodimers

Brelivet Y, Kammerer S, Rochel N, Poch O, Moras D. (2004) Signature of the oligomeric behaviour of nuclear receptors at the sequence and structural level. EMBO Rep. 5:423-9.

Multiple Alignment Ontology

[Figure: MAO concept diagram. Concepts: multiple_sequence_alignment, sub_alignment, alignment_sequence, alignment_column, column_conservation, residue, amino_acid, nucleotide, atom, sequence_feature, sequence_feature_type, domain, residue_function, structural_location. Relations: part_of, is_a, is_attribute. Linked attributes and resources: accession, Taxid (NCBI), biological_process (GO), cellular_component (GO), molecular_function (GO), pdb_name, ndb_name, Interpro, SCOR. Legend: MAO concept, OBO Ontology, External database.]

Gene: identification, validation

Sequence error detection
Prediction of functional sites

Protein: structure comparison, modelling

The 3D structure of a family of fungal lectins shows unexpected, significant structural similarity with actinoporins, a family of pore-forming toxins. Structure and sequence signatures suggest a potential sugar binding site in the lectin XCL and a possible evolutionary relationship. One member of this family, from X.
chrysenteron, induces drastic changes in the actin cytoskeleton after sugar binding at the cell surface and has potent insecticidal activity.

[Figure: MAO concept diagram, continued. Attributes (is_attribute): 3d_atomic_point, protein_interactions (PSI), RESID database, catalytic_site (CSA).]

MAO Knowledge Base for System Biology

An example of integrated analysis: the interleukin-1 protein family.

Interleukin-1 (IL-1) is a proinflammatory cytokine produced by activated macrophages and monocytes. It functions in the generation of systemic and local responses to infection, injury, and immunological challenges and is the primary cause of chronic and acute inflammation (Dinarello, 1998). The IL-1 gene cluster on human chromosome 2q contains several related genes, including the genes encoding IL-1A, IL-1B and their endogenous receptor antagonist IL-1RN. IL1A and IL1B are synthesised as larger precursors. The N-terminal approx. 115 amino acids form a propeptide that is cleaved off to release the active interleukin-1. Both IL1A and IL1B bind to the same IL1-specific receptor on the target cell.

Potential sequence errors are detected by multiple alignment analysis and verified where possible by comparison to EST sequences.

Bianchetti L, Thompson JD, Lecompte O, Plewniak F & Poch O. (2005) vALId: Validation of protein sequence quality based on multiple ALIgnment data. JBCB.

Phylogenetic footprinting is a technique that identifies regulatory elements by finding well conserved regions in a set of orthologous non-coding DNA sequences from multiple species.

[Figure labels: Ensembl:il1a_human; PDB:1itb_a; propeptide; cleavage]

Sequence alignment of XCL with the 2 actinoporins of known structure. Secondary structure is indicated by coils (α-helix) and arrows (β-strand).

Structural comparison of XCL (cyan) and sticholysin II (StnII) from Stichodactyla helianthus (purple). StnII shares high structural similarity with EqtII.
Birck C, Damian L, Marty-Detraves C, Lougarre A, Schulze-Briese C, Koehl P, Fournier D, Paquereau L, Samama JP. (2004) A new lectin family with structure similarity to actinoporins revealed by the crystal structure of Xerocomus chrysenteron lectin XCL. J Mol Biol. 5:1409-20.

Protein interaction networks

Two-hybrid analysis showed that hTAFII20 heterodimerizes with hTAFII135. The interaction requires a domain of hTAFII135 that shows sequence homology to H2A and to the yeast SAGA component ADA1. These results indicate a histone-fold type of interaction between hTAFII20-hTAFII135 and yTAFII68-yADA1, which therefore constitute novel histone-like pairs in the TFIID and SAGA complexes.

[Figure labels: Interpro; TFIID; ySAGA]

Sequence comparison of the putative histone fold regions of hTAFII135, hTAFII105, dTAFII110, yADA1, and a putative S. pombe ADA1.

IL1A propeptide processing and comparison with IL1B

IL1A reduces tumorigenicity*
IL1B promotes invasiveness*

RNA: sequence, structure, function

The secondary structure of bacterial RNase P RNA, a ribozyme responsible for the maturation of the 5′ end of tRNAs, based on sequence comparison analysis. RNase P RNA secondary structures fall into two types, A and B, which share a common core but differ in their peripheral elements.

Simplified 2D representations of (A) a typical type A sequence and of (B) a type B sequence with their characteristic tertiary interactions.

Part of a multiple alignment of 18 type B bacterial sequences.

SCOR: 1nbs:b:139-142,b:166-170 AAAA,GAGUA Loop with dinucleotide platform
SCOR: 1nbs:b:147-150,b:159-161 UACG,UAU Loop with dinucleotide platform

SCOR: hierarchical structural classification of RNA.

Klosterman PS, Hendrix DK, Tamura M, Holbrook SR, Brenner SE. (2004) Three-dimensional motifs from the SCOR, structural classification of RNA database: extruded strands, base triples, tetraloops and U-turns. Nucleic Acids Res. 8:2342-52.
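Several of the analyses above (phylogenetic footprinting, conserved motifs and residues, the MAO column_conservation concept) depend on scoring how well each column of a multiple alignment is conserved. The following is only a minimal sketch of one simple way to do this, not the method used by any of the cited tools: it scores each column by the fraction of sequences sharing the most common residue, then reports windows whose mean score exceeds a threshold. The function names, the window size, the threshold and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Per-column fraction of sequences sharing the most common non-gap residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        top = counts.most_common(1)[0][1] if counts else 0
        # Gaps lower the score because the denominator is the full column depth.
        scores.append(top / len(column))
    return scores


def conserved_windows(scores, window=5, threshold=0.95):
    """Half-open (start, end) column ranges whose windowed mean score >= threshold.

    Overlapping or adjacent qualifying windows are merged into one region.
    """
    regions = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Hypothetical toy alignment (invented sequences, not data from the poster).
toy_alignment = [
    "ACGTTGACGTAA",
    "ACGTTGACCTTA",
    "TCGTTGACGAGA",
    "GCGTTGACTCCA",
]

scores = column_conservation(toy_alignment)
print(conserved_windows(scores, window=5, threshold=0.95))
```

On this toy input the only region reported is columns 1 to 7 (half-open range (1, 8)), the block that is identical in all four sequences. Real footprinting tools typically also weight sequences by phylogenetic distance, which this sketch ignores.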
Summary of the structures and some of the molecular interactions within the TFIID and SAGA complexes.

Gangloff YG, Werten S, Romier C, Carre L, Poch O, Moras D, Davidson I. (2000) The human TFIID components TAF(II)135 and TAF(II)20 and the yeast SAGA components ADA1 and TAF(II)68 heterodimerize to form histone-like pairs. Mol Cell Biol. 20:340-51.

CPK model of the B. subtilis RNase P RNA-tRNA complex.

Massire C, Jaeger L, Westhof E. (1998) Derivation of the three-dimensional architecture of bacterial ribonuclease P RNAs from comparative sequence analysis. J Mol Biol. 4:773-93.

*Differential effects of IL-1 on tumor development (Song et al, 2003)

The IL1A propeptide induces apoptosis in malignant but not normal cell lines. It is subject to post-translational modifications. It also includes a nuclear localisation signal (NLS) that is functional after cleavage. Within the nucleus, the IL1A propeptide may interact with elements of RNA processing, affecting alternative splicing of genes involved in the regulation of apoptosis (Pollock et al, 2003).

Human genetics, SNPs

Nicotine is the major addictive substance in cigarettes, and genes involved in sensing nicotine are logical candidates for vulnerability to nicotine addiction. Feng et al. studied 6 single-nucleotide polymorphisms (SNPs) in the CHRNA4 gene and 4 SNPs in the CHRNB2 gene in relation to nicotine dependence in a collection of 901 subjects (815 sibs and 86 parents) from 222 nuclear families with multiple nicotine-addicted sibs. They found 2 SNPs in exon 5 of the CHRNA4 gene to be significantly associated with a protective effect against nicotine addiction. (OMIM:118504)

Therapeutics, drug discovery

Vitamin D is critically important for the development, growth, and maintenance of a healthy skeleton from birth until death. Mutations in the vitamin D receptor (VDR) gene lead to the early onset of severe rickets.
[Figure labels: DBD; Insertion Domain; LBD]

A multiple alignment of nuclear receptors highlights the conserved N-terminal DBD domain and a dispensable insertion domain in the N-terminal part of the LBD.

Mutations (black arrows) in the VDR DBD prevent the receptor from activating gene transcription although 1,25-(OH)2D3 binding is normal. However, missense mutations in the LBD cause reduced or complete hormone insensitivity. As a result, vitamin D analogs have been developed that have the potential to interact with the receptor at amino acid contact points that differ from those utilized by the natural ligand, and this alternative interaction can restore the function of mutant VDRs.

Rochel N, Wurtz JM, Mitschler A, Klaholz B, Moras D. (2000) The crystal structure of the nuclear receptor for vitamin D bound to its natural ligand. Mol Cell. 5:173-9.

Malloy PJ, Pike JW, Feldman D. (1999) The vitamin D receptor and the syndrome of hereditary 1,25-dihydroxyvitamin D-resistant rickets. Endocrine Reviews 20(2):156-188.

Gardezi SA, Nguyen C, Malloy PJ, Posner GH, Feldman D, Peleg S. (2001) A rationale for treatment of hereditary vitamin D-resistant rickets with analogs of 1 alpha,25-dihydroxyvitamin D(3). J Biol Chem. 31:29148-56.

The six SNPs located on the CHRNA4 gene, showing their locations on the chromosome 20 genomic contig. SNP1 and SNP2 in exon 5 are synonymous SNPs revealed by direct DNA sequencing. The pairwise |D'| values for the six CHRNA4 SNPs were found to be quite high, which indicates that these six SNPs can be considered to be one haplotype block.

Cross-species comparison of the DNA and protein sequences of a CHRNA4 coding segment (66 bp; 22 amino acids) in the cytoplasmic loop between transmembrane domains 3 and 4.

Feng Y, Niu T, Xing H, Xu X, Chen C, Peng S, Wang L, Laird N, Xu X. (2004) A common haplotype of the nicotine acetylcholine receptor alpha 4 subunit gene is associated with vulnerability to nicotine addiction in men. Am J Hum Genet. 1:112-21.
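The haplotype-block statement for the six CHRNA4 SNPs rests on pairwise linkage disequilibrium values. As a rough sketch, assuming those values are the usual normalised coefficient |D'| (Lewontin's D'), the code below computes |D'| for two biallelic SNPs from phased haplotype counts. It is not necessarily the procedure used by Feng et al.; the function name, argument names and example counts are hypothetical.

```python
def abs_d_prime(n_11, n_10, n_01, n_00):
    """|D'| for two biallelic SNPs from phased haplotype counts.

    n_11: haplotypes carrying allele 1 at both SNPs
    n_10: allele 1 at the first SNP, allele 0 at the second
    n_01: allele 0 at the first SNP, allele 1 at the second
    n_00: allele 0 at both SNPs
    """
    total = n_11 + n_10 + n_01 + n_00
    if total == 0:
        raise ValueError("at least one haplotype is required")
    p_11 = n_11 / total            # frequency of the 1-1 haplotype
    p_a = (n_11 + n_10) / total    # allele 1 frequency at the first SNP
    p_b = (n_11 + n_01) / total    # allele 1 frequency at the second SNP
    d = p_11 - p_a * p_b           # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0                 # one SNP is monomorphic; D' is undefined, report 0
    return abs(d) / d_max


# Hypothetical counts, not data from the study.
print(abs_d_prime(50, 0, 0, 50))    # complete disequilibrium
print(abs_d_prime(25, 25, 25, 25))  # independent SNPs
```

Values close to 1 for every SNP pair are what would justify treating a set of SNPs as a single haplotype block; values near 0 indicate the alleles segregate independently.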