ProRepeat a comprehensive directory of exact tandem repeats in proteins
PolyQ and neurodegenerative diseases
• 9 diseases caused by polyQ repeats
  • HD
  • DRPLA
  • SCA 1, 2, 3, 6, 7, 17
  • Kennedy’s disease (SBMA)
Androgen receptor (AR)
[Figure: domain diagram of the AR transcription factor, from NH3- to -COOH, showing tracts T1, T2, T3, Regions 1-3, and the transcriptional regulation, DNA binding, and hormone binding domains]
polyQ tract length has important consequences
■ normal range: 9-35 residues, average of 20-25 depending on ethnic origin
■ shorter tracts: prostate cancer susceptibility
■ longer tracts: feminization syndromes
■ over 40 residues: SBMA (spinal and bulbar muscular atrophy), or Kennedy’s disease
PolyQ in AR: collection of polyQ repeats
• 792 human individuals, available from an earlier study (Edwards, 1992)
• 26 armadillo individuals, sequenced by CP
• 77 mammals and marsupials, from protein database
Céline Poux, RU
What about repeats in other proteins?
• ProRepeat database
• Data sources: UniProt and RefSeq
• Limited to exact tandem repeats
• Standard, linear-time suffix tree algorithm
• Stored in Oracle 10g
• Interface in PHP5
Maarten van den Bosch, WUR
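To make "exact tandem repeat" concrete, here is a minimal sketch in Python. It is not the suffix tree algorithm named on the slide: it is a simpler scan, one pass per unit length, that finds maximal runs where each residue equals the residue one unit further on. The function name, the `min_copies` and `max_unit` parameters, and the example sequence are all assumptions made for illustration.

```python
def find_exact_tandem_repeats(seq, min_copies=2, max_unit=10):
    """Return (start, unit, copies) tuples; start is 0-based.

    Illustrative scan only, NOT the linear-time suffix tree method
    that the slide mentions.
    """
    hits = []
    n = len(seq)
    for u in range(1, min(max_unit, n // 2) + 1):
        j = 0
        while j + u < n:
            if seq[j] != seq[j + u]:
                j += 1
                continue
            a = j  # start of a maximal run with period u
            while j + u < n and seq[j] == seq[j + u]:
                j += 1
            span = j - a + u      # length of the periodic region
            copies = span // u    # whole copies of the unit
            unit = seq[a:a + u]
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "QQ"); those are reported at the shorter period.
            is_primitive = unit not in (unit + unit)[1:-1]
            if copies >= min_copies and is_primitive:
                hits.append((a, unit, copies))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence containing a polyQ run and a DE repeat.
    print(find_exact_tandem_repeats("MAQQQQQLDEDEDEK"))
```

The sketch reports each repeat once, at its leftmost position and with its shortest unit, which mirrors the repeat unit / repetitions / start location fields the database exposes.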
Simple query syntax: e.g. “Q” or “DE”
Repeat units are matched regardless of rotation: DE is equivalent to ED; DEF is equivalent to EFD and FDE
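One way to implement this kind of equivalence is to reduce every unit to a canonical rotation before comparing. The sketch below picks the alphabetically smallest rotation; the function names are hypothetical, and whether ProRepeat does it this way is not stated on the slide.

```python
def canonical_unit(unit):
    """Return the alphabetically smallest rotation of a repeat unit."""
    if not unit:
        return unit
    rotations = [unit[i:] + unit[:i] for i in range(len(unit))]
    return min(rotations)


def same_repeat(a, b):
    """True if two units are rotations of each other."""
    return len(a) == len(b) and canonical_unit(a) == canonical_unit(b)


if __name__ == "__main__":
    print(canonical_unit("ED"))       # DE
    print(same_repeat("DEF", "FDE"))  # True
    print(same_repeat("DEF", "DFE"))  # False: a permutation, not a rotation
```

Storing the canonical form alongside each repeat would let a query for “ED” be answered with a plain equality lookup on “DE”.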
Or use ProSite syntax: e.g. “[DE]-{P}-X(0,1).”
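The example pattern reads as: D or E, then any residue except P, then zero or one residue of any kind. A rough sketch of how such a pattern could be translated into a Python regular expression follows. It handles only a subset of the ProSite syntax (single residues, x, [..], {..}, repetition counts, and the < and > anchors), and the function name is an assumption; the slide does not show how the ProRepeat interface parses patterns.

```python
import re

# One pattern element: optional "<", a core, an optional (n) or (n,m), optional ">".
_ELEMENT = re.compile(
    r"(<?)(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a simple ProSite pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")  # drop the terminating period
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, lo, hi, end = m.groups()
        if core.startswith("["):      # allowed residues
            rx = core
        elif core.startswith("{"):    # forbidden residues
            rx = "[^" + core[1:-1] + "]"
        elif core in "xX":            # any residue
            rx = "."
        else:                         # one specific residue
            rx = core
        if lo is not None:            # repetition count or range
            rx += "{" + lo + ("," + hi if hi is not None else "") + "}"
        parts.append(("^" if start else "") + rx + ("$" if end else ""))
    return "".join(parts)


if __name__ == "__main__":
    rx = prosite_to_regex("[DE]-{P}-X(0,1).")
    print(rx)
    print(re.search(rx, "APDAK"))
```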
Sorting/grouping options: Identifier, Repeat unit, Repetitions, Unit length, Length, Start location, End location, Protein, Taxonomy, Ontology
Link to DNA data
• DNA coding sequences of available repeats also stored in the database
• Extracted from EMBL and/or RefSeq
Hong Luo, WUR
Link to DNA data / errors
• Approximately 3% of corresponding nucleotide sequences cannot be retrieved
• Errors caused by
  • No links to nucleotide database (35%)
    • NO_ANNOTATED_CDS
    • No EMBL links
  • Annotation errors in the nucleotide database (65%)
Current work
• Annotation of repeats versus function
• Adding imperfect tandem repeats, a.k.a. approximate tandem repeats (ATR), to the database
• Offering remote access via web services (WSDL and BioMoby)
• Expansion of the analysis capabilities of the interface
PolyQ in AR (reprise)
• Impure tracts (interrupted mainly by CAA, CCG, and CGG) are longer and more variable than pure CAG tracts
• Presence of other codons is better explained by codon duplication than by multiple point mutations
  • Interrupting codons are part of the elongation process, rather than hampering tract dynamics as proposed previously
• Negative correlation between the lengths of the different CAG tracts
  • Suggests a maximal expansion length that the protein can handle without deleterious effects
Céline Poux, RU
Acknowledgements
• Wageningen University and Research Centre
  • Maarten van den Bosch
  • Hong Luo
  • Mark Kramer
  • Harm Nijveen
• Radboud University, Nijmegen
  • Guido Kappé
  • Céline Poux
  • Wilfried W. de Jong
This work was supported in part by project grants from NWO/BMI (GK, CP) and the NBIC/BioAssist program (HN)
Thank you for your attention!
See also our posters on phylogenetic domain visualisation (TreeDomViewer) and microarray (re)annotation at the ISMB.
Post-doc positions available: contact Jack.Leunissen@wur.nl or jack@bioinformatics.nl