Searching for selenoproteins in the genomes of eukaryotic organisms. Marco Mariotti, PhD student, Bioinformatics, UPF. Bioinformatics and Genomics programme, Roderic Guigó's group, Centre for Genomic Regulation, Barcelona. marco.mariotti@crg.es
Genes in human chromosome 6. [Figure: gene map of chromosome 6; dark blue = coding.] GPx6 is the only selenoprotein in chromosome 6; GPx5 is a cysteine homologue.
Selenoprotein families include selenoproteins and their cysteine homologues (orthologues or paralogues).
Selenoproteins are generally misannotated. Percentages are computed by comparing Selenoprofiles predictions with Ensembl annotations; see Mariotti and Guigó, 2010, Bioinformatics.
Bioinformatics methods for selenoproteins
• De novo: Selenogeneid (Castellano et al. 2001)
• Homology-based approaches: UGA / Sec or UGA / Cys alignments (e.g. Kryukov et al. 2003)
  - Selenoprofiles (Mariotti and Guigó 2010)
  - Seblastian (Mariotti et al. 2013)
• SECIS prediction:
  - SECISearch (Kryukov et al. 2003)
  - SECISearch3 (Mariotti et al. 2013), explained later
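The idea behind UGA / Sec and UGA / Cys alignments can be illustrated with a minimal Python sketch. It assumes you already have a pairwise protein alignment (two gapped strings of equal length) in which the known selenoprotein carries Sec as "U", and the candidate translated from the genome shows an in-frame UGA as "*" (as tblastn does) or as "U". The function name and return labels are hypothetical, not part of Selenoprofiles or Seblastian.

```python
def classify_by_sec_column(query_aln: str, target_aln: str) -> str:
    """Look at the residue aligned to the first Sec (U) of the query.

    query_aln  : gapped sequence of a known selenoprotein (Sec written as 'U')
    target_aln : gapped sequence of the candidate translated from the genome
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")

    for q, t in zip(query_aln, target_aln):
        if q != "U":
            continue
        if t in ("U", "*"):
            # In-frame UGA aligned to Sec: possible selenoprotein
            return "selenoprotein candidate"
        if t == "C":
            # Cysteine aligned to Sec: cysteine homologue
            return "cysteine homologue"
        if t == "-":
            return "Sec position not aligned"
        return "other residue at Sec position"
    return "no Sec in query"


if __name__ == "__main__":
    print(classify_by_sec_column("GKCUGAT", "GKC*GAT"))
    print(classify_by_sec_column("GKCUGAT", "GKCCGAT"))
```

A candidate labelled "selenoprotein candidate" still needs a SECIS element downstream before it can be called a selenoprotein.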
Why selenocysteine? • What is cysteine used for? Mainly for disulfide bonds: inter- or intramolecular, important for the folding and stability of proteins, and in many cases necessary for catalysis. Thioredoxins are a large class of proteins that perform redox reactions using thiol/disulfide switches (catalytic cysteines).
Why selenocysteine? The term "thioredoxin" generally designates small oxidoreductase proteins that constitute a pool of enzymes for antioxidant defense. The thioredoxin system includes these proteins and those operating on them, either using them as electron donors or keeping them reduced. Plenty of other proteins possess a thioredoxin-like fold, relying on the same local structure that includes a thiol/disulfide switch, and perform redox functions. Selenocysteine is almost always found replacing a catalytic cysteine in redox proteins. Many selenoproteins possess a thioredoxin-like fold, and some are involved in the thioredoxin system. Consistently, selenocysteine is more reactive than cysteine, and it is a better reductant. However, some selenoproteins have totally unrelated functions (e.g. SelJ = eye crystallin, SelP = selenium storage and transport).
Examples: Thioredoxin Reductases (TR)
Trx-S2 + NADPH + H+ → Trx-(SH)2 + NADP+ (catalysed by TR)
Three paralogous genes of this family are present in human, all of which are selenoproteins. Sec is the penultimate residue, and it is catalytic. TRs are the only enzymes responsible for the reduction of thioredoxins in the cell.
• TR3: mitochondrial.
  - Found in all vertebrates
• TGR (TR2): can reduce glutathione disulfide. Contains a glutaredoxin (Grx) domain.
  - Found only in tetrapods
• TR1: cytosolic.
  - Found in all vertebrates
Glutathione Peroxidases (GPx)
R-OOH + 2 GSH → R-OH + H2O + GSSG (catalysed by GPx)
3 groups: GPx1/GPx2, GPx3/GPx5/GPx6, GPx4/GPx7/GPx8
They reduce hydroperoxides at the expense of glutathione (GSH). 8 paralogues in human, 5 with Sec.
• GPx1: cytosolic, abundant in liver and red blood cells
• GPx2: cytosolic, found in liver and gastrointestinal system
• GPx3: secreted, found in plasma and intestine
• GPx4: cytosolic / mitochondrial, abundant in testis.
  - Can reduce phospholipid hydroperoxides
• GPx5: secreted / membrane bound.
  - Found only in epididymis
• GPx6: secreted (?).
  - Found in olfactory epithelium and embryonic tissues
• GPx7: secreted (?)
• GPx8: membrane bound (?)
GPx6 converted to Cys independently in 3 species.
Selenoproteins in mammals / vertebrates. [Figure: selenoprotein evolution in vertebrates; labels: 20 duplications, 9 gene losses, 13 Sec → Cys, 28.]
Among vertebrates, selenoproteins are quite conserved: most of them are found in all species, and few changes have occurred. This is very different from the situation in insects (Chapple and Guigó, 2008).
Selenoproteins as test case • Selenoproteins have the peculiar characteristic of possessing an in-frame UGA codon, which is recoded to selenocysteine because of the presence of a SECIS element. • If you learn how to predict selenoproteins, you will be able to do the same with any "standard" protein family. • Bioinformatics project: find all selenoproteins in a given genome
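To make the UGA recoding concrete, here is a small Python sketch that translates a coding sequence with the standard genetic code but writes any in-frame TGA that is not the final codon as "U" (Sec). This is an illustrative assumption only: in a real genome an internal UGA is read as Sec only when a SECIS element is present, and the function name is hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_with_sec(cds: str) -> str:
    """Translate a CDS, writing internal in-frame TGA codons as 'U' (Sec).

    Assumption: every TGA before the last codon is a Sec codon. Codons with
    ambiguous bases are translated as 'X'; trailing bases that do not form
    a full codon are ignored.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    protein = []
    for index, codon in enumerate(codons):
        is_last = index == len(codons) - 1
        if codon == "TGA" and not is_last:
            protein.append("U")
        else:
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


if __name__ == "__main__":
    print(translate_with_sec("ATGTGCTGAGGTTAA"))
```

A conventional translation of the same sequence would stop at the internal TGA, which is one reason why selenoproteins tend to be truncated or missed by standard annotation pipelines.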
UPF Biology. Courses 2007-14
• 2007/08 to 2008/09: find all selenoproteins in a given protist genome
• 2009/10 to 2011/12: find a given selenoprotein family in all protist genomes
• 2012/13 to 2013/14: find all selenoproteins in a given vertebrate genome
UPF Biology. Course 2013-14: selenoproteins in vertebrates. Objective: find all selenoproteins in a given vertebrate genome. http://bioinformatics.upf.edu/
Useful resources. Sequences of your assigned species: • Ensembl: collection of genomes (and annotations). Your assigned genomes will be available on the UPF computers when you start the project.
Useful resources. Sequences of your assigned species: • NCBI nucleotide: collection of all sequences (genomes, ESTs, etc.). If you do not find something that you expect to be there, look for other sources of sequences. Most genomes today are of low or medium quality.
Useful resources. Selenoprotein sequences: • SelenoDB 2.0: database containing manual annotations for human, and automated annotations (Selenoprofiles) for other species. • NCBI protein: NCBI hosts the set of all known proteins. Noisy, but comprehensive. Here you can find more selenoprotein sequences by searching by homology (blast) or by keywords.
Useful resources. Tools: • Blast, typically tblastn • Exonerate, protein2genome mode • Genewise • Web server with SECISearch3 and Seblastian: http://seblastian.crg.es/ Related course sessions: S14. Genome annotation (I); S15. Genome annotation (II)
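As a starting point for scripting, the following Python sketch builds the command lines for a tblastn search and an Exonerate protein2genome alignment. The file names are placeholders (assumptions), the tblastn call assumes that a nucleotide BLAST database has already been built from the genome with makeblastdb, and the E-value cutoff is an arbitrary example rather than a recommended setting.

```python
import shlex


def tblastn_command(proteins_fasta: str, genome_db: str, evalue: float = 1e-5) -> list:
    """Command line for tblastn: protein queries against a nucleotide database.

    Assumes the database was created beforehand, e.g.
    makeblastdb -in genome.fa -dbtype nucl -out genome_db
    """
    return [
        "tblastn",
        "-query", proteins_fasta,
        "-db", genome_db,
        "-evalue", str(evalue),
        "-outfmt", "6",  # tabular output, one hit per line
    ]


def exonerate_command(proteins_fasta: str, genome_fasta: str) -> list:
    """Command line for Exonerate in protein2genome mode (spliced alignment)."""
    return [
        "exonerate",
        "--model", "protein2genome",
        "--query", proteins_fasta,
        "--target", genome_fasta,
    ]


if __name__ == "__main__":
    # Placeholder file names; print the commands instead of running them.
    print(shlex.join(tblastn_command("selenoproteins.fa", "genome_db")))
    print(shlex.join(exonerate_command("selenoproteins.fa", "genome_region.fa")))
```

The lists can be passed to subprocess.run once the tools are installed. A common pattern is to use tblastn to locate candidate regions and then run Exonerate or Genewise only on those regions to obtain the exon structure.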
SECISearch3 • Based on a manually curated secondary structure alignment • Combines up to 3 methods to ensure maximum sensitivity • Filtering and grading procedure based on manual inspection of hundreds of SECIS elements (Mariotti et al. 2013)
Seblastian. Assumptions: • the presence of a detectable SECIS within acceptable genomic distance from the Sec-UGA • annotated homologue(s) (Sec/Cys) in the reference protein database (Mariotti et al. 2013)
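The first assumption can be expressed as a simple coordinate filter. The Python sketch below is not Seblastian's implementation: it only keeps SECIS predictions that lie on the same strand as the candidate gene, downstream of the Sec-UGA, and within a maximum distance. The function name, the tuple layout and the 3000-nt default are illustrative assumptions.

```python
def secis_downstream_of_uga(uga_pos, strand, secis_hits, max_distance=3000):
    """Select SECIS hits downstream of a Sec-UGA on the same strand.

    uga_pos      : genomic coordinate of the UGA codon (1-based)
    strand       : '+' or '-'
    secis_hits   : iterable of (start, end, strand), with start <= end
    max_distance : hypothetical cutoff in nucleotides, not Seblastian's value
    """
    selected = []
    for start, end, hit_strand in secis_hits:
        if hit_strand != strand:
            continue
        if strand == "+":
            distance = start - uga_pos  # downstream means higher coordinates
        else:
            distance = uga_pos - end    # downstream means lower coordinates
        if 0 < distance <= max_distance:
            selected.append((start, end, hit_strand))
    return selected


if __name__ == "__main__":
    hits = [(1500, 1570, "+"), (900, 960, "+"), (1500, 1570, "-"), (9000, 9070, "+")]
    print(secis_downstream_of_uga(1000, "+", hits))
```

In the example only the first hit is kept: the second is upstream of the UGA, the third is on the opposite strand, and the fourth is too far away.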
Notes for the project • Results must be presented in a web page with the structure of a scientific paper • A file with the amino acid sequences of all selenoproteins identified must be provided, plus another file with all SECIS elements identified • All genes should be as complete as possible: starting with an AUG, ending with a stop codon, and with an identified SECIS element downstream • Ignore alternative isoforms (if any); just choose one
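The completeness criteria above are easy to check automatically. The following Python sketch takes the coding sequence of a predicted gene and a flag saying whether a SECIS was found downstream, and lists whatever is missing. The function name and the messages are hypothetical, and the check deliberately ignores internal in-frame UGA codons, which are expected in selenoproteins.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def completeness_problems(cds: str, has_downstream_secis: bool) -> list:
    """Return a list of reasons why a predicted selenoprotein gene is incomplete."""
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append("does not start with AUG")
    if seq[-3:] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if not has_downstream_secis:
        problems.append("no SECIS element identified downstream")
    return problems


if __name__ == "__main__":
    print(completeness_problems("ATGTGCTGAGGTTAA", True))  # complete gene
    print(completeness_problems("TGCTGAGGT", False))       # fragment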
Notes for the project. EXTRA: • Also report the genes of the selenoprotein machinery: SecS, eEFsec, pstk, secp43, SBP2, SPS1, (SPS2). Ignore tRNAsec, for technical reasons.
Common pitfalls • Zero, one or many genes? Careful with superfamilies, and gene duplications • Know what to expect • Genome assemblies are not perfect!
Evaluation. The projects will be evaluated based on:
- results: you are expected to find all selenoproteins in your assembly
- discussion: interpret your results logically
- methods: scripting is encouraged (but not compulsory)
- presentation: the web page should present the work as clearly as possible