
ProRepeat: a comprehensive directory of exact tandem repeats in proteins

joyce

Presentation Transcript


  1. ProRepeat: a comprehensive directory of exact tandem repeats in proteins

  2. PolyQ and neurodegenerative diseases • 9 diseases caused by polyQ repeats • HD (Huntington’s disease) • DRPLA (dentatorubral-pallidoluysian atrophy) • SCA 1, 2, 3, 6, 7, 17 (spinocerebellar ataxias) • Kennedy’s disease (SBMA)

  3. [Figure: schematic of the androgen receptor (AR), a transcription factor, from the NH3- to the -COOH terminus, labeled T1, T2, T3 and Region 1, 2, 3, with domains for transcriptional regulation, DNA binding, and hormone binding] PolyQ tract length has important consequences: ■ shorter tracts: prostate cancer susceptibility ■ longer tracts: feminization syndromes ■ over 40 residues: SBMA (spinal and bulbar muscular atrophy), or Kennedy’s disease ■ normal range: 9-35 residues, with an average of 20-25 depending on ethnic origin
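As a toy illustration of reading a polyQ tract length off a protein sequence and placing it against the ranges on this slide, the sketch below measures the longest uninterrupted run of Q. The function names and the example sequence are hypothetical; only the 9-35 and over-40 ranges come from the slide, and no cut-offs are assumed for the "shorter" and "longer" categories, since the slide gives none.

```python
import re


def longest_polyq(protein: str) -> int:
    """Length of the longest uninterrupted run of glutamine (Q) residues."""
    runs = re.findall(r"Q+", protein.upper())
    return max((len(run) for run in runs), default=0)


def describe_ar_tract(length: int) -> str:
    """Place a tract length relative to the two ranges quoted on the slide.

    Lengths outside both ranges are deliberately left unclassified.
    """
    if 9 <= length <= 35:
        return "within the 9-35 residue range"
    if length > 40:
        return "over 40 residues (SBMA range)"
    return "outside the ranges quoted on the slide"


# Hypothetical sequence: a 22-residue polyQ run and a separate 6-residue run.
example = "MEV" + "Q" * 22 + "ETS" + "Q" * 6 + "A"
tract = longest_polyq(example)
print(tract, describe_ar_tract(tract))
```

Only the longest pure run is counted here, so a tract interrupted by another residue is treated as two shorter runs.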

  4. PolyQ in AR: collection of polyQ repeats • 792 human individuals available from an earlier study (Edwards, 1992) • 26 armadillo individuals sequenced by CP • 77 mammals and marsupials from the protein database (Céline Poux, RU)

  5. What about repeats in other proteins? • ProRepeat database • Data sources: UniProt and RefSeq • Limited to exact tandem repeats • Standard linear-time suffix tree algorithm • Stored in Oracle 10g • Interface in PHP5 (Maarten van den Bosch, WUR)
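The slide states that ProRepeat finds exact tandem repeats with a standard linear-time suffix tree algorithm. The sketch below does not implement that algorithm; it swaps in a plain brute-force scan, which is much slower on long sequences but makes the definition easy to see: a unit copied at least twice back to back, reported once per run, and only for units that are not themselves repetitive (Q rather than QQ). The names, the 0-based coordinates, and the defaults `max_unit=6` and `min_copies=2` are assumptions for illustration only.

```python
from typing import List, NamedTuple


class TandemRepeat(NamedTuple):
    start: int   # 0-based position of the first copy
    unit: str    # the repeated unit
    copies: int  # number of complete, consecutive copies


def is_primitive(unit: str) -> bool:
    """True if `unit` is not a repetition of a shorter string (rejects 'QQ', 'DEDE')."""
    return (unit + unit).find(unit, 1) == len(unit)


def exact_tandem_repeats(seq: str, max_unit: int = 6,
                         min_copies: int = 2) -> List[TandemRepeat]:
    """Brute-force search for exact tandem repeats (NOT a suffix tree method)."""
    hits = []
    for size in range(1, max_unit + 1):
        i = 0
        while i + size * min_copies <= len(seq):
            unit = seq[i:i + size]
            copies = 1
            # Count identical, directly adjacent copies of the unit.
            while seq[i + copies * size:i + (copies + 1) * size] == unit:
                copies += 1
            if copies >= min_copies and is_primitive(unit):
                hits.append(TandemRepeat(i, unit, copies))
                # Jump into the last copy so rotations of this run (e.g. ED
                # inside DEDEDE) are not reported again.
                i += (copies - 1) * size + 1
            else:
                i += 1
    return sorted(hits)


print(exact_tandem_repeats("MAQQQQQPDEDEDEG"))
```

On the made-up sequence `MAQQQQQPDEDEDEG` this reports a Q×5 run at position 2 and a DE×3 run at position 8.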

  6. Simple query syntax, e.g. “Q” or “DE”. Repeat units match regardless of the residue they start with (cyclic rotation): DE is equivalent to ED; DEF is equivalent to EFD and FDE
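The equivalences on this slide behave like cyclic rotation: the run DEFDEFDEF read from its second residue is EFDEFD..., so DEF, EFD, and FDE describe the same repeat, whereas DFE does not. A minimal sketch of one way to normalise units, assuming (not confirmed by the slide) that a lexicographically smallest rotation is used as the key; the function names are hypothetical.

```python
def canonical_unit(unit: str) -> str:
    """Smallest rotation of a repeat unit, used as a rotation-independent key."""
    unit = unit.upper()
    if not unit:
        return unit
    return min(unit[i:] + unit[:i] for i in range(len(unit)))


def same_repeat_unit(a: str, b: str) -> bool:
    """True if `a` and `b` are rotations of each other."""
    return len(a) == len(b) and canonical_unit(a) == canonical_unit(b)


print(canonical_unit("ED"), canonical_unit("FDE"), canonical_unit("DFE"))
```

Storing the canonical key alongside each repeat would let a query for "ED" hit entries recorded as "DE" with a single equality lookup.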

  7. Or use ProSite syntax: e.g. “[DE]-{P}-X(0,1).”
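Read element by element, “[DE]-{P}-X(0,1).” means: D or E, then any residue except P, then zero or one residue of any kind. One way to evaluate such patterns is to translate them into regular expressions. The converter below is a hypothetical sketch covering only the common PROSITE elements (single residues, x, [..], {..}, (n) and (n,m) counts, the < and > terminal anchors, and the closing period); it is not ProRepeat's own parser.

```python
import re

# One PROSITE element: a residue, x/X, [allowed] or {forbidden}, plus optional (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of PROSITE pattern syntax into a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        anchor_start = element.startswith("<")  # N-terminal anchor
        anchor_end = element.endswith(">")      # C-terminal anchor
        match = ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core in ("x", "X"):
            rx = "."                            # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"        # any residue except those listed
        else:
            rx = core                           # residue or [..] class is valid regex as-is
        if low is not None:
            rx += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(("^" if anchor_start else "") + rx + ("$" if anchor_end else ""))
    return "".join(parts)


rx = prosite_to_regex("[DE]-{P}-X(0,1).")
print(rx, bool(re.fullmatch(rx, "DGK")), bool(re.fullmatch(rx, "DP")))
```

For the slide's example this produces `[DE][^P].{0,1}`, which accepts DGK and rejects DP.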

  8. Taxonomic distributions of hits

  9. Result table fields: Identifier, Repeat unit, Repetitions, Unit length, Length, Start location, End location, Protein, Taxonomy, Ontology; with sorting/grouping options
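Several of these fields are linked for exact repeats: Length = Unit length × Repetitions, and, assuming 1-based inclusive coordinates, End location = Start location + Length - 1. A minimal sketch of a result record that derives those fields, plus sorting and grouping; the class name, the field subset (Protein, Taxonomy, and Ontology are omitted), and the identifiers PROT_A to PROT_C are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class RepeatHit:
    """One result row (hypothetical subset of the fields listed on the slide)."""
    identifier: str
    repeat_unit: str
    repetitions: int
    start: int  # assumed 1-based, inclusive

    @property
    def unit_length(self) -> int:
        return len(self.repeat_unit)

    @property
    def length(self) -> int:
        # Exact repeats: all copies identical, so length = unit length x repetitions.
        return self.unit_length * self.repetitions

    @property
    def end(self) -> int:
        # Inclusive end coordinate under the 1-based assumption.
        return self.start + self.length - 1


hits = [
    RepeatHit("PROT_A", "Q", 21, 58),
    RepeatHit("PROT_B", "DE", 6, 120),
    RepeatHit("PROT_C", "Q", 12, 7),
]

# Grouping by repeat unit (groupby needs input sorted on the same key).
unit_key = attrgetter("repeat_unit")
by_unit = {unit: [h.identifier for h in group]
           for unit, group in groupby(sorted(hits, key=unit_key), key=unit_key)}

# Sorting by total repeat length, longest first (ties keep input order).
longest_first = sorted(hits, key=attrgetter("length"), reverse=True)
print(by_unit, [h.identifier for h in longest_first])
```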

  10. Link to DNA data • DNA coding sequences of the repeats are also stored in the database • Extracted from EMBL and/or RefSeq (Hong Luo, WUR)
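With the coding sequence stored next to each repeat, a protein-level repeat can be traced back to its codons, for example to see whether a polyQ tract is encoded by CAG or CAA. A minimal sketch, assuming the CDS starts at the first base of the start codon and is in frame with the protein, so residue k occupies nucleotides 3k-2 to 3k (1-based). The function name and the example CDS are hypothetical.

```python
from collections import Counter
from typing import List


def repeat_codons(cds: str, aa_start: int, aa_end: int) -> List[str]:
    """Codons encoding residues aa_start..aa_end (1-based, inclusive).

    Assumes `cds` begins at the start codon and is in frame with the protein.
    """
    nt_start = 3 * (aa_start - 1)  # 0-based slice start
    nt_end = 3 * aa_end            # exclusive slice end
    region = cds[nt_start:nt_end].upper()
    if len(region) != 3 * (aa_end - aa_start + 1):
        raise ValueError("CDS too short for the requested residues")
    return [region[i:i + 3] for i in range(0, len(region), 3)]


# Hypothetical CDS for M-A-Q6 followed by a stop codon; the polyQ run is residues 3-8.
cds = "ATG" + "GCT" + "CAG" * 3 + "CAA" + "CAG" * 2 + "TAA"
codons = repeat_codons(cds, 3, 8)
codon_usage = Counter(codons)
print(codons, codon_usage)
```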

  11. Link to DNA data / errors • Approximately 3% of the corresponding nucleotide sequences cannot be retrieved • Causes: • No links to the nucleotide database (35%): NO_ANNOTATED_CDS, or no EMBL links • Annotation errors in the nucleotide database (65%)
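The slide does not say how annotation errors were detected. One simple consistency check, offered only as an illustrative sketch: translate the retrieved CDS with the standard genetic code and compare the result with the protein sequence; a mismatch suggests the nucleotide annotation does not correspond to the protein. Assumptions: standard code, CDS in frame from its first base, and hypothetical function names. Proteins starting from an alternative start codon (e.g. GTG read as M) would be flagged falsely by this naive version.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame CDS; unknown codons become X, terminal stops are dropped."""
    cds = cds.upper()
    protein = "".join(CODE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))
    return protein.rstrip("*")


def cds_matches_protein(cds: str, protein: str) -> bool:
    """True if the CDS translates exactly to the given protein sequence."""
    return translate(cds) == protein.upper()


cds = "ATGGCTCAGCAGCAGCAACAGCAGTAA"
print(translate(cds), cds_matches_protein(cds, "MAQQQQQQ"))
```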

  12. Guido Kappé, RU

  13. [Figure: labels T, S, Q, G, P, A, E]

  14. Current work • Annotation of repeats versus function • Adding imperfect tandem repeats, also known as approximate tandem repeats (ATRs), to the database • Offering remote access via web services (WSDL and BioMoby) • Expanding the analysis capabilities of the interface
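To make the difference from exact repeats concrete, here is a minimal sketch of one simple notion of an approximate tandem repeat: consecutive copies may differ from the first copy by up to `max_mismatches` substitutions. The threshold, the comparison against the first copy (rather than a consensus), and the function names are assumptions; this sketch handles substitutions only, not insertions or deletions, and is not the method planned for ProRepeat.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approximate_copies(seq: str, start: int, size: int, max_mismatches: int = 1) -> int:
    """Count consecutive copies of seq[start:start+size], allowing each copy
    to differ from the first by at most `max_mismatches` substitutions."""
    unit = seq[start:start + size]
    copies = 1
    while True:
        nxt = seq[start + copies * size:start + (copies + 1) * size]
        if len(nxt) < size or hamming(unit, nxt) > max_mismatches:
            return copies
        copies += 1


seq = "PEQPEQPDQPEQ"  # hypothetical: the third copy carries one substitution (E -> D)
print(approximate_copies(seq, 0, 3, max_mismatches=0),
      approximate_copies(seq, 0, 3, max_mismatches=1))
```

With no mismatches allowed the run stops at two copies, as an exact-repeat search would; allowing one substitution bridges the PDQ copy and yields four.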

  15. PolyQ in AR (reprise) • Impure tracts are longer and more variable than pure CAG tracts (interrupting codons are mainly CAA, CCG, and CGG) • The presence of other codons is better explained by codon duplication than by multiple point mutations • Interrupting codons are part of the elongation process, rather than hampering its dynamics as previously proposed • Negative correlation between the lengths of the different CAG tracts • Suggests a maximal expansion length that the protein can handle without deleterious effects (Céline Poux, RU)
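The notions of pure CAG tracts and interrupting codons can be made concrete by splitting an in-frame coding sequence into runs of identical codons. A minimal sketch; the function names and the example sequence are hypothetical, and the sequence is in frame from its first base by assumption.

```python
from itertools import groupby
from typing import List, Tuple


def codon_runs(dna: str) -> List[Tuple[str, int]]:
    """Split an in-frame coding sequence into (codon, run length) pairs."""
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    return [(codon, len(list(group))) for codon, group in groupby(codons)]


def cag_tract_lengths(dna: str) -> List[int]:
    """Lengths, in codons, of the pure CAG tracts in order of occurrence."""
    return [n for codon, n in codon_runs(dna) if codon == "CAG"]


# Hypothetical impure tract: CAG tracts interrupted by a CAA and a CCG codon.
tract = "CAG" * 8 + "CAA" + "CAG" * 5 + "CCG" + "CAG" * 3
print(codon_runs(tract))
print(cag_tract_lengths(tract))
```

Note that CAA still encodes glutamine, so it interrupts the CAG tract without breaking the polyQ stretch in the protein, whereas CCG (proline) and CGG (arginine) also interrupt it at the protein level.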

  16. Acknowledgements • Wageningen University and Research Centre • Maarten van den Bosch • Hong Luo • Mark Kramer • Harm Nijveen • Radboud University, Nijmegen • Guido Kappé • Céline Poux • Wilfried W. de Jong This work was supported in part by project grants from NWO/BMI (GK, CP) and the NBIC/BioAssist program (HN)

  17. Thank you for your attention! See also our posters on phylogenetic domain visualisation (TreeDomViewer) and microarray (re)annotation at ISMB. Post-doc positions available: contact Jack.Leunissen@wur.nl or jack@bioinformatics.nl
