
Finding Sequence Motifs in Alu Transposons that Enhance the Expression of Nearby Genes

Kendra Baughman, York Marahrens’ Lab, UCLA



  1. Kendra Baughman, York Marahrens’ Lab, UCLA. Finding Sequence Motifs in Alu Transposons that Enhance the Expression of Nearby Genes

  2. Overview
      • Goal
      • Background
      • Prior Studies
      • Strategy
      • Results
      • Remaining Tasks
      • Future Directions

  3. Goal: Determine whether there are motifs present among Alu elements near highly expressed genes, and missing from Alu elements near poorly expressed genes, that might contribute to gene expression

  4. Background – Alu Elements
      • Repetitive sequence
      • Transposons (DNA sequences that make copies of themselves and insert elsewhere in the genome)
      • Over 1 million in the human genome
      • ~50 subfamilies, categorized by sequence differences

  5. Prior Studies
      • “Repetitive sequence environment distinguishes housekeeping genes,” Eller, Daniel et al., submitted
      • “Alu abundance positively correlates with gene expression level,” C.D. Eller et al., submitted

  6. Higher Alu concentration near widely expressed genes

  7. Higher Alu concentration near highly expressed genes

  8. Alu Subfamilies [Figure; axis labels: “# Alu in the Subfamily” and “Subfamily”]

  9. Data
      • Human gene expression levels from microarray data (Stan Nelson’s lab, UCLA)
      • Alu information from the UCSC Genome Browser, RepeatMasker tracks

  10. Goal, reiterated: Determine whether there are motifs present among Alu elements near highly expressed genes, and missing from Alu elements near poorly expressed genes, that might contribute to gene expression

  11. Strategy
      • Find Alu “near” high and low expression genes (within 20 kb)
      • Perform multiple sequence alignment on Alu sequences
      • Identify motifs preferentially conserved around highly expressed genes (these motifs could help the genes be highly expressed)

  12. Strategy
      • Find Alu “near” high and low expression genes (within 20 kb)
      • Perform multiple sequence alignment on Alu sequences
      • Identify motifs preferentially conserved around highly expressed genes (these motifs could help the genes be highly expressed)

  13. Screening the genes…
      • Used Perl scripts to extract information from MySQL databases
      • Grouped genes by expression level in R
      • Chose genes in the top and bottom 20%
      [Figure; axis labels: “Expression Level” and “Genes”]
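The selection of the top and bottom 20% of genes can be sketched as follows. The talk did this step in R; this Python version is only an illustration, and the function name `split_by_expression`, the gene names, and the expression values are all hypothetical.

```python
def split_by_expression(expression, fraction=0.2):
    """Return (low, high): genes in the bottom and top `fraction` by expression.

    `expression` maps gene name -> expression level (hypothetical input format).
    """
    ranked = sorted(expression, key=expression.get)  # ascending by level
    k = int(len(ranked) * fraction)                  # genes per tail
    low = ranked[:k]
    high = ranked[len(ranked) - k:]                  # empty when k == 0
    return low, high


# Made-up expression levels for ten placeholder genes.
expression = {f"g{i}": float(i) for i in range(1, 11)}
low, high = split_by_expression(expression)
```

With ten genes and a 20% cutoff, each tail holds two genes; the middle 60% is left out of the comparison.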

  14. Screening the Alu…
      • Used MySQL queries to determine flanking region
      • Used Perl scripts to screen Alu located within 20 kb of genes
      • Omitted Alu in overlapping flanking regions

      Percentages of Alu thrown out:

      Flanking region   Chrom1 (1st 20 Mb)   Chrom10   Chrom19 (1st 20 Mb)
      10 kb             3%                   6%        20%
      20 kb             7%                   7%        28%
      50 kb             17%                  11%       50%

      [Diagram labels: LO-gene, HI-gene, HI-Alu, ??-Alu, LO-Alu]
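A minimal sketch of the screening rule, under stated assumptions: an Alu counts as "near" a gene when it overlaps the gene span extended by the flank on both sides, coordinates are half-open, and an Alu that falls in the windows of more than one gene is thrown out. The talk used MySQL queries and Perl scripts; the record types, the function name `screen_alu`, and the example coordinates below are hypothetical.

```python
from collections import namedtuple

# Hypothetical record layouts; `group` is "HI" or "LO" (expression class).
Gene = namedtuple("Gene", "name chrom start end group")
Alu = namedtuple("Alu", "name chrom start end")


def screen_alu(genes, alus, flank=20_000):
    """Assign each Alu to the single gene whose flanking window it overlaps.

    Alu that hit no window are ignored; Alu that hit several windows
    (overlapping flanking regions) are collected in `dropped`.
    """
    kept = {"HI": [], "LO": []}
    dropped = []
    for alu in alus:
        hits = [
            g for g in genes
            if g.chrom == alu.chrom
            and alu.end > g.start - flank
            and alu.start < g.end + flank
        ]
        if len(hits) == 1:
            kept[hits[0].group].append(alu.name)
        elif len(hits) > 1:
            dropped.append(alu.name)
    return kept, dropped


genes = [
    Gene("hiGene", "chr1", 100_000, 110_000, "HI"),
    Gene("loGene", "chr1", 140_000, 150_000, "LO"),
]
alus = [
    Alu("alu1", "chr1", 85_000, 85_300),    # only in the HI window
    Alu("alu2", "chr1", 125_000, 125_300),  # in both windows: ambiguous
    Alu("alu3", "chr1", 160_000, 160_300),  # only in the LO window
    Alu("alu4", "chr1", 300_000, 300_300),  # near no gene
    Alu("alu5", "chr2", 85_000, 85_300),    # different chromosome
]
kept, dropped = screen_alu(genes, alus)
```

Here `alu2` plays the role of the "??-Alu" in the diagram: it sits within 20 kb of both a HI gene and a LO gene, so it cannot be assigned to either group.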

  15. Strategy
      • Find Alu “near” high and low expression genes (within 20 kb)
      • Perform multiple sequence alignment on Alu sequences
      • Identify motifs preferentially conserved around highly expressed genes (these motifs could help the genes be highly expressed)

  16. Alignment Process…
      • First alignment tool: ClustalW (slow, inaccurate)
      • Second alignment tool: T-COFFEE (can’t handle hundreds of sequences)
      • Third alignment tool: MUSCLE (aligning thousands of sequences = big gaps and processing limitations)
      • Chose to analyze by subfamily (S, Sp/q)
      • Aligned elements around highly expressed genes
      • Aligned elements around poorly expressed genes
      • Profile high/low alignment
      • Consensus sequence alignment

  17. Alignment viewed in Jalview

  18. Alignments of AluSp/q and AluS Elements [Figure: alignment panels for AluSp/q and AluS; labels: “High Alu”, “High conserv.”, “Low conserv.”]

  19. Strategy
      • Find Alu “near” high and low expression genes (within 20 kb)
      • Perform multiple sequence alignment on Alu sequences
      • Identify motifs preferentially conserved around highly expressed genes (these motifs could help the genes be highly expressed)

  20. Alu consensus sequences and frequency of the consensus base

      AluSp/q
      Frequency, Alu w/ a base:   *5547666896759699995769699999999999*9989979
      Frequency, All Alu:         0444762289674300448576809499545545409449808
      High Alu consensus:         TATCCACGCCTGCAAAATCTCAGCCACTCCCAAAGTTGCTGCG
      Low Alu consensus:          CANCC-CGCCT-CGTAATCCCAA--------AATGTT--TG-G
      Frequency, All Alu:         76044 55899 37444989894        454045  98 8
      Frequency, Alu w/ a base:   77488 66899 67444999995        455645  98 9

      AluS
      Frequency, Alu w/ a base:   596**65559458765699999978999999966566******
      Frequency, All Alu:         0860005458443600233333323333333345400000000
      High Alu consensus:         TGCTCAGAAATTTCTCGGCTCACTGCAACCTCCGTATCACCCC
      Low Alu consensus:          CG---A-AA--------------------CTCCGT--T---CT
      Frequency, Alu w/ a base:   56   5 69                    555655  6   99
      Frequency, All Alu:         55   4 58                    444544  0   77
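The two frequency rows can be read as per-column statistics of a multiple sequence alignment. The sketch below is one plausible interpretation, not the talk's actual procedure: for each column it takes the most common non-gap base as the consensus, then reports how often that base occurs among all aligned Alu and among only those Alu that have a base (not a gap) in that column. The function name `consensus_profile` and the toy alignment are hypothetical.

```python
from collections import Counter


def consensus_profile(aligned):
    """Return (consensus, freq_all, freq_with_base) for equal-length aligned rows.

    freq_all:       consensus-base count / number of sequences
    freq_with_base: consensus-base count / number of non-gap entries
    Columns that are all gaps get "-" and 0.0 in both lists.
    """
    total = len(aligned)
    consensus, freq_all, freq_with_base = [], [], []
    for column in zip(*aligned):
        bases = [b for b in column if b != "-"]
        if not bases:
            consensus.append("-")
            freq_all.append(0.0)
            freq_with_base.append(0.0)
            continue
        base, count = Counter(bases).most_common(1)[0]
        consensus.append(base)
        freq_all.append(count / total)
        freq_with_base.append(count / len(bases))
    return "".join(consensus), freq_all, freq_with_base


# Toy alignment, not real Alu sequence.
aligned = ["ACGT", "ACG-", "A-GT", "TCG-"]
consensus, freq_all, freq_with_base = consensus_profile(aligned)
```

The last column shows why both rows are useful: only half of the sequences carry the consensus base, yet every sequence that has a base there agrees with it. A motif candidate under this reading would be a stretch where the High Alu rows stay high while the Low Alu consensus is gapped or poorly supported.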

  21. Remaining Tasks
      • Analyze the remaining subfamilies
      • Determine whether identified motifs agree across subfamilies
      • BLAST motifs against all Alu sequences and correlate alignment scores with expression level

  22. Future Directions
      • Cluster alignments into a relationship tree to see if HI and LO Alu groups cluster differently from each other
      • Create a matrix of pairwise alignments and cluster these into a tree using nearest neighbour clustering
      • Use Hidden Markov Models or Gibbs sampling to identify sequence motifs (non-multiple sequence alignment method of motif finding)
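The pairwise-matrix idea can be sketched with nearest neighbour (single-linkage) clustering. This is an illustration under assumptions: the distance used here is the fraction of mismatches over positions where both sequences have a base, which stands in for whatever pairwise alignment score the project would use, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def mismatch_fraction(a, b):
    """Fraction of mismatches over positions where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no shared positions: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def single_linkage(seqs, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain.

    Cluster-to-cluster distance is the smallest pairwise distance between
    their members (nearest neighbour linkage).
    """
    dist = {
        frozenset(pair): mismatch_fraction(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }
    clusters = [{name} for name in seqs]

    def linkage(ij):
        i, j = ij
        return min(dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j])

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[i] |= clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


# Toy aligned sequences, not real Alu elements.
seqs = {
    "hi1": "ACGTACGT",
    "hi2": "ACGTACGA",
    "lo1": "TTGTCCGT",
    "lo2": "TTGTCCGA",
}
clusters = single_linkage(seqs, n_clusters=2)
```

Recording the order of merges instead of stopping at a fixed cluster count would give the relationship tree; the question of interest is then whether HI and LO Alu end up on separate branches.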

  23. Acknowledgements: Danny Eller, York Marahrens, Marc Suchard, Chiara Sabatti, SoCalBSI, NIH/NSF
