
Using a Beagle to sniff for Bacterial Promoters

Stefan R. Maetschke, Michael Towsey and James M. Hogan, Queensland University of Technology.


Presentation Transcript


  1. Using a Beagle to sniff for Bacterial Promoters Stefan R. Maetschke, Michael Towsey and James M. Hogan Queensland University of Technology

  2. An Agenda • Bacterial Promoters • The domain and the motifs • Earlier approaches, including ours • Why dumber is better • Not quite, but flexibility before sophistication • Exploiting new features as they are identified • Results

  3. Upstream from a Bacterial Gene [Diagram labels: RNA polymerase, σ, promoter, TSS, GSS, gene, transcription] • Search for ‘conserved’ -10 and -35 hexamers • Except they’re not really conserved • Plagued by massive false positive rates • But this is the Reader’s Digest version

  4. Previous Work (σ70) • Mainly in the E. coli system • PWMs – simple, but poor discrimination • Good performance if compound structure used • (Collado-Vides et al.: state of the art pre-2006) • HMMs – less successful than in eukaryotes • TDNNs – boosted by GSS offset distribution • SVMs – spectrum kernel ensemble • (Gordon et al. (us): state of the art, but at a price)

  5. Beagle • Principled and rapid inclusion of motifs as they are discovered or hypothesised • Prior to the Gordon et al. paper, a TP:FP ratio of 1:300 was considered good • But this was based solely on the -10 and -35 motifs • A model description language and parser • Less sophisticated than it sounds, but sufficient • Iterative refinement of the model

  6. Upstream from a Bacterial Gene • Core enzyme: ααββ′ω • Specific sigma controls binding at the -10 and -35 elements • But binding probability varies enormously • Compensate when hexamers are weak [Diagram labels: ω, α, β, β′, α, σ with domains σ1 to σ4; sequences TTGACA, TATAAT, ATG; -35 element, -10 element, TSS, GSS] • “It has long been known that domains 2 and 4 … bind to the strongly conserved -10 and -35 boxes”. Except when they don’t because they aren’t…

  7. Upstream from a Bacterial Gene • Simple extended -10: TG • Discovered in B. subtilis, found in 20% of promoters in E. coli • -16 hypothesised to be important in E. coli, TRTG or T(AG)TG consensus [Diagram labels: σ70, ω, α, β, β′, α, σ with domains σ1 to σ4; sequences TRTG, TTGACA, TATAAT, ATG; -35 element, extended -10 element, TSS, GSS] • But even the alpha units aren’t what they seem…

  8. Upstream from a Bacterial Gene • αCTDs are carboxy-terminal domains, binding to UP elements • AT-rich region, proximal element more important [Diagram labels: ω, α, αNTD1, αNTD2, αCTD1, αCTD2, β, β′, σ with domains σ1 to σ4; sequences AAAAAARNR, TRTG, AWWWWWTTTTT, TTGACA, TGTATAAT, ATG; distal UP element, proximal UP element, -35 element, -16, extended -10 element, TSS, GSS]

  9. The Data • E. coli and B. subtilis • Confirmed TSS locations within 250 bp of the nearest gene start • No overlapping reading frames • N = 492 (E. coli), 205 (B. subtilis) • 250 bp USRs available

  10. Beagle algorithm • Define a consensus promoter • e.g. <TTGACA (15, 21) TATAAT (4, 13) TSS> • Ordered pairs specify gap ranges • Parse the description and define PWMs and weighted gaps • Initially trivial • Refine using the confirmed TSS locations
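The description syntax on this slide can be illustrated with a small parser. This is a minimal sketch, not the Beagle parser itself: the class names `Motif`, `Gap` and `Anchor`, the functions `parse_description`, `initial_pwm` and `uniform_gap`, and the starting consensus probability of 0.7 are all assumptions made for illustration. "Initially trivial" is read here as a PWM that simply favours the consensus base plus uniform weights over the allowed gap lengths.

```python
import re
from dataclasses import dataclass

BASES = "ACGT"

@dataclass
class Motif:          # hypothetical: a fixed-width motif such as TTGACA
    consensus: str

@dataclass
class Gap:            # hypothetical: an inclusive range of gap lengths
    min_len: int
    max_len: int

@dataclass
class Anchor:         # hypothetical: a named anchor such as TSS
    name: str

# A token is either an ordered pair "(lo, hi)" or a run of letters.
TOKEN = re.compile(r"\(\s*(\d+)\s*,\s*(\d+)\s*\)|([A-Za-z]+)")

def parse_description(text):
    """Parse '<TTGACA (15, 21) TATAAT (4, 13) TSS>' into a list of elements."""
    body = text.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("description must be enclosed in <...>")
    elements = []
    for m in TOKEN.finditer(body[1:-1]):
        if m.group(1) is not None:
            lo, hi = int(m.group(1)), int(m.group(2))
            if lo > hi:
                raise ValueError("gap range must satisfy lo <= hi")
            elements.append(Gap(lo, hi))
        elif m.group(3) == "TSS":
            elements.append(Anchor("TSS"))
        else:
            elements.append(Motif(m.group(3).upper()))
    return elements

def initial_pwm(consensus, p=0.7):
    """Trivial starting PWM: consensus base gets p, the rest share 1 - p.
    Assumes the consensus uses only A, C, G and T."""
    return [{b: (p if b == c else (1 - p) / 3) for b in BASES} for c in consensus]

def uniform_gap(gap):
    """Trivial starting gap weights: every allowed length equally likely."""
    n = gap.max_len - gap.min_len + 1
    return {g: 1 / n for g in range(gap.min_len, gap.max_len + 1)}

elements = parse_description("<TTGACA (15, 21) TATAAT (4, 13) TSS>")
pwm35 = initial_pwm(elements[0].consensus)
gap1 = uniform_gap(elements[1])
```

Keeping the parsed elements as plain data objects means a new motif can be tried by editing the description string alone, which is the flexibility the slides argue for.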

  11. Beagle algorithm • For each USR in the training set: • Anchor the pattern to the known TSS location • Determine the best match based on the current model • Find the MLE of the model parameters based on the best matches from the training data • Test the refined definition on unseen data • 10 repeats × 10-fold cross-validation • Essentially TSS prediction • Iterate until improvement ceases
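The training loop on this slide might look roughly like the following. It is a sketch under stated assumptions rather than the authors' implementation: the model is fixed to the two-hexamer pattern from slide 10, gaps are counted as the bases between one element and the next, sequences contain only A, C, G and T, and the re-estimation adds a pseudocount of 1 (so it is a smoothed estimate, not a pure MLE). All function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def trivial_pwm(consensus, p=0.7):
    return [{b: (p if b == c else (1 - p) / 3) for b in BASES} for c in consensus]

def uniform_gap(lo, hi):
    return {g: 1 / (hi - lo + 1) for g in range(lo, hi + 1)}

def pwm_logp(pwm, site):
    return sum(math.log(col[b]) for col, b in zip(pwm, site))

def best_match(seq, tss, model):
    """Anchor the pattern at index tss and return the best-scoring alignment
    as (log score, -35 site, gap1, -10 site, gap2), or None if nothing fits."""
    pwm35, gap1, pwm10, gap2 = model
    w35, w10 = len(pwm35), len(pwm10)
    best = None
    for g2, p2 in gap2.items():
        s10 = tss - g2 - w10                 # start of the -10 site
        for g1, p1 in gap1.items():
            s35 = s10 - g1 - w35             # start of the -35 site
            if s35 < 0:
                continue                     # pattern runs off the sequence
            site35, site10 = seq[s35:s35 + w35], seq[s10:s10 + w10]
            score = (pwm_logp(pwm35, site35) + pwm_logp(pwm10, site10)
                     + math.log(p1) + math.log(p2))
            if best is None or score > best[0]:
                best = (score, site35, g1, site10, g2)
    return best

def fit_pwm(sites, width, pseudo=1.0):
    total = len(sites) + 4 * pseudo
    return [{b: (Counter(s[i] for s in sites)[b] + pseudo) / total for b in BASES}
            for i in range(width)]

def fit_gap(lengths, old, pseudo=1.0):
    counts, total = Counter(lengths), len(lengths) + pseudo * len(old)
    return {g: (counts[g] + pseudo) / total for g in old}

def refit(matches, model):
    """Re-estimate PWMs and gap weights from the current best matches."""
    pwm35, gap1, pwm10, gap2 = model
    return (fit_pwm([m[1] for m in matches], len(pwm35)),
            fit_gap([m[2] for m in matches], gap1),
            fit_pwm([m[3] for m in matches], len(pwm10)),
            fit_gap([m[4] for m in matches], gap2))

def train(data, model, max_iter=20):
    """Alternate best-match search and re-estimation until the total
    score stops improving. data is a list of (sequence, tss_index) pairs."""
    prev = -math.inf
    for _ in range(max_iter):
        matches = [m for m in (best_match(s, t, model) for s, t in data) if m]
        total = sum(m[0] for m in matches)
        if total <= prev + 1e-9:
            break
        prev, model = total, refit(matches, model)
    return model

# Toy upstream sequence: -35 site, 17 bp gap, -10 site, 7 bp gap, then the TSS.
seq = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "G" * 7 + "A"
tss = len(seq) - 1
start = (trivial_pwm("TTGACA"), uniform_gap(15, 21),
         trivial_pwm("TATAAT"), uniform_gap(4, 13))
hit = best_match(seq, tss, start)
refined = train([(seq, tss)] * 3, start)
```

Evaluation on unseen data (the 10 × 10-fold cross-validation on the slide) is left out here; it would wrap `train` and then check whether the best-scoring position in a held-out sequence coincides with its confirmed TSS.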

  12. TSS recognition (% accuracy) Guess which promoter boxes are more strongly conserved…

  13. Including UP elements • NNW15NN • AT-rich region • NNAAAWWTWTTNNAAANNN • Estrem et al. 1998 • NNAAAWWTWTTN – A6RNR • Gourse et al. 2000 • distal - proximal motif
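The consensus strings on this slide use IUPAC nucleotide codes (W = A or T, R = A or G, N = any base). As a rough illustration of what such a consensus matches, the sketch below turns an IUPAC string into a regular expression and reports every start position, including overlapping ones. The function names `iupac_to_regex` and `find_motif` are hypothetical, and exact matching is an assumption made for illustration; Beagle itself scores sites with PWMs rather than exact matches.

```python
import re

# Standard IUPAC nucleotide codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

def iupac_to_regex(motif):
    """Translate an IUPAC consensus such as 'TRTG' into a regex string."""
    return "".join(IUPAC[c] for c in motif.upper())

def find_motif(motif, seq):
    """Return all start positions (0-based) where the motif matches seq.
    The lookahead makes overlapping matches visible."""
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]

trtg_hits = find_motif("TRTG", "CCTATGTTGTG")
# A6RNR written out in full as AAAAAARNR, as on slide 8.
a6rnr_hits = find_motif("AAAAAARNR", "GGAAAAAAGCATT")
```

Exact matching of short degenerate motifs like TRTG fires very often by chance, which is one reason the slides anchor the search to a known TSS and weight gaps rather than scanning freely.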

  14. TSS recognition (% accuracy)

  15. Comparing E. coli and B. subtilis promoters [Panels: B. subtilis -35 element; B. subtilis -10 element; E. coli -35 element; E. coli -10 element] • E. coli has 7 known sigmas; B. subtilis 18…

  16. Motifs ‘in the Gap’ (σ70) • Extended -10 element • Consensus TGTATAAT • Strongly implicated in B. subtilis • Hypothesised as significant in 20% of E. coli promoters • Extended -16 element • Consensus TRTG

  17. TSS recognition (% accuracy)

  18. The Complete Picture [Diagram labels: αNTD, β/β′, αCTD I, αCTD II, αCTD II, αCTD II, σ70; positions -72, -62, -52, -40.5, -35, -10; UP element, AT rich, variable location]

  19. TSS recognition (% accuracy)

  20. TSS recognition (% accuracy) E. coli 43.3% 48.3% B. subtilis 61.2% 71.2% +AT rich 64.8% 62.6% +TRTG +AT rich 47.3% 41.6% +TG

  21. Conclusions • Beagle provides a simple bridge between experiment and computational discovery • Is the extended -16 motif really important in E. coli? • (Well, not in any general sense) • Fast, robust and flexible • Extensions • Combination of model organisms • Comparative genomics & regulation
