
Comparative Sequence Analysis

Comparative Sequence Analysis. www.dcode.org. Ivan Ovcharenko, Lawrence Livermore National Laboratory. BioQUEST Workshop, Beloit, June 2004. Topics: comparative genomics, evolution of noncoding elements, aligning vertebrate genomes, function of the human gene deserts.


Presentation Transcript


  1. Comparative Sequence Analysis. www.dcode.org. Ivan Ovcharenko, Lawrence Livermore National Laboratory. BioQUEST Workshop, Beloit, June 2004

  2. Comparative genomics • Evolution of noncoding elements • Aligning vertebrate genomes • Function of the human gene deserts • Redefining comparative sequence analysis • Phylogenetic shadowing • Transcriptional gene regulation

  3. The genome sequence: the ultimate code of life. ~50% is junk (repetitive elements), only 3% codes for proteins, and the function of the rest, ~47% (noncoding, nonrepetitive DNA), is unknown.
      >hg16_dna range=chr11:31781924-31785923
      TCAGGAACTTTGAAATGTTTTAAAACCCCAACTTTCTCCCCCATTTAAAC
      AGGCGGATTCATCGGCACTGGCCACCATATGGGCCCTTGGAGATCTATTG
      AGATGACCACCAACACTTGAATAGCGAGGGGCTGCTTTTCAGCGCTGCAC
      AATGCCCCGCGAGTAAGGGAAACTATTAAACTCCTGGGGCAGGAGCGTTG
      GCAAACTTTCGTGGGCAGAATTTTGAGGCTACAATGAGCGCGGACAACAA
      AAGGATTCTCTTGAGGCGTGCAGCGGGCCACATTGTGTTACAAGAAGCCC
      AGTCAACAGACTTTTCAGTGAAGTGTGTTAACCCCTCTGCTCTGCTATCA
      TTAATCACTGTCCGAAGAGCGGGCGCCTCCGTGCTATTTAGGGCGCTTGG
      CTGGGGGGATGGAGGGTGGATGGGGGGGCCAGGGCCCAGCATGGGGGGAG
      GCAGGGAGAGTGGACGGGGACCAGGGCTGGGTTCCTACATAGAGGAGATG
      GAGGGGAGGCAGGATGGAAACCAGCGGTGGGGGTGGAAGCAAGGGGGAAG
      GATTGGGGGGCCTGGGTTAGGGGAAAGACAGAGGGCGATGGAGGGAAAAA
      GAGGGCGATGGAGGGGAAAAGAAGGCTCAAAAAACATAGAGGCTAGAAAG
      GTATTTTTAAAAAAGGACAGAAAAGAATGCTGAGAGGAAAAAGAGACACG
      AGGGCCGAACAAGAGTGGGAGAGAGAGGAAAAGGAGGATGAGGGCCAGAG
      AATATTAGTAACTGAGCCCCATCTGGACTCTGGGTCTTTGCACTCCATCA
      GAAAGGTGGGGGTCGAGGAGGGCTACTTAGCTGAGGGAGACGCGCTCCGC
      TCACGTGTGCGGGCACAAGCGTCTGTGCTAATTTACTGCCCCAAGTTTCC
      GGGGACTTTTCAAAGCGTTTTTCAAGGGAAGAAATGAAGCGACCACCCCC
      ACCCCTCGCTTTATTTTCGGGTTTGGTGAAGAAGGAAGACTGGAAATAGC
      TCCTTTTGGCCAACTAGAAAGGCCGGAGGGTTATTGCTTTTGGAAAACAG
      ACAAAAATCTGTGCACATCTGGTATGGGGTGGGGGACACTGAGGAGAACA
      CAATGCCCATCTCCCCATGGCCACTCATGCCCATGCCTTCCTAGGGGCCC
      CATCTCGGTCCCTTTTCTGGCACATTCGATCTCGCCAATTAAACAAAGTT
      GCCCGAATCTGCCTCCGAAGAACCCCGCCGATAGCATGCTCTGCTCTCAT
      TTGCCTCTTTGACATTTTCTTAATTTTAAAACATGGAGATTCACATTCTT
      ATCCATGTTCTGTCTCACACAAACATACACACGGGTTTACACAGGCAGCA
      CGCGATCGCCGCCAGGCCCTGTGCTGCCTCCAGAACTGACACTTAAGAGA
      GAAAAGTCAGCAGGGACAGTAGAGCTCAATTTTAAATCTGGAAAAAAAAA
      AAAAAAAAAAAAAGATGGGAAGCGGGGATTGGAATTCCACAGCAAAAAGA
      AACCTGTCGCTGCAGGATCCCTTCTCTACCCCGCGGGGAGAGCGGCACGG
      AGACAGTTCATTACTTTAGAAGTGGCAACTGTTTGCAGCCAGGCGGTGAC
      CTAGCGGCTGCTCTTACATAAAATGGGTACATTTCCCCCCACTTTAGTGG
      ATTTGCCTTCCACTCTTAAAGCTTTTAACAAAATAAAACTAGAAGTTGGA
      TCTCGACTCCCCCACCCCCACGATAAACCTAAGTGGTGGACAATTAAGAT
      ATCTTCTTCAAAAGGCGCCCCCTCGGAGCCGCGCAAAGCAGGGGCCTTCA
      GTGGGTGCCGTTCACCTTCCAGCCTAATCCGTGAGAAAGCGAGTGAAAGC
      GCCTCCCATTATCCCAGCCCCAGGACCATCTGACGATGGGAATAGGATTT
      GTTTCCTGGAAGGAGGTGAGAGAGAGAGAGAGAGAGAGAGACAGAGAGAG

  4. Comparative Sequence Analysis. Biologically functional regions in the genome tend to stay conserved through evolution. Therefore, by aligning homologous sequences from different but related species, we can identify Evolutionary Conserved Regions (ECRs) of putative functional importance. [Figure: timeline panels labeled 1880s, 1920s, 1950s, 2000s.]

  5. Evolution of the genomic code. Genomic modifications drove evolution: mutations, insertions/deletions, duplications, rearrangements, … Functional regions of the genome accumulated fewer mutations, because natural selection eliminated individuals carrying mutations that altered the critical function of important elements. [Figure: a functional element (uppercase) over millions of years of evolution:]
      actgactgactgATATTGACAgtttgttgttgttaa
      agggacaaactgATATTGACAgt---ttgttgttaa
      aggg--aaactgATATTGACAgt---ttgaaattaa
      tggg--aaaccaATATTGACAgt-actcgaaattaa
      tggg--aaaccaATATTGACAgt-actcgaaatgta
      Functionally important elements in the DNA stayed conserved through evolution. How can we find evolutionarily conserved elements?
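The conserved block on this slide can be found mechanically by scanning the alignment column by column. The Python sketch below (function names are illustrative, not taken from any tool named in the talk) marks columns where every sequence carries the same base, ignoring case, and reports the longest uninterrupted run; gap columns (`-`) never count as conserved.

```python
def conserved_columns(rows):
    """Return a list of booleans: True where all rows share the same non-gap base."""
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("aligned rows must have equal length")
    flags = []
    for col in range(length):
        bases = {r[col].upper() for r in rows}
        flags.append(len(bases) == 1 and "-" not in bases)
    return flags


def longest_run(flags):
    """Return (start, end) of the longest run of True values, end exclusive."""
    best = (0, 0)
    start = None
    for i, flag in enumerate(flags + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


alignment = [
    "actgactgactgATATTGACAgtttgttgttgttaa",
    "agggacaaactgATATTGACAgt---ttgttgttaa",
    "aggg--aaactgATATTGACAgt---ttgaaattaa",
    "tggg--aaaccaATATTGACAgt-actcgaaattaa",
    "tggg--aaaccaATATTGACAgt-actcgaaatgta",
]
start, end = longest_run(conserved_columns(alignment))
print(start, end, alignment[0][start:end].upper())  # 12 23 ATATTGACAGT
```

The run extends two columns past the uppercase block (into `gt`) because those two bases happen to be identical in all five sequences as well; scattered single identical columns elsewhere are ignored by taking the longest run.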

  6. Sequence alignment
      Human ACTTTACGGGATCTATCTATACCGGTAACGTAATCCGATACCAGT
      Mouse ACTTTACGGGATCTCTCTATACCGGTAAAAAAAATTTAGT
      Step 1: find matches.
      Human ACTTTACGGGATCTATCTATACCGGTAACGTAATCCGATACCAGT
            |||||||||||||| ||||||||||||
      Mouse ACTTTACGGGATCTCTCTATACCGGTAAAAAAAATTTAGT
      Step 2: find mismatches.
      Human ACTTTACGGGATCTATCTATACCGGTAACGTAATCCGATACCAGT
            ||||||||||||||:||||||||||||
      Mouse ACTTTACGGGATCTCTCTATACCGGTAAAAAAAATTTAGT
      Step 3: insert gaps to linearize the alignment.
      Human ACTTTACGGGATCTATCTATACCGGTA----ACGT--AATCCGATACCAGT
            ||||||||||||||:||||||||||||    |::|             |||
      Mouse ACTTTACGGGATCTCTCTATACCGGTAAAAAAAATTT-----------AGT
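The three steps (find matches, find mismatches, insert gaps) are what a global alignment algorithm does in a single pass. The sketch below uses the textbook Needleman-Wunsch dynamic programme with a linear gap penalty; the scores (match +1, mismatch -1, gap -2) are arbitrary assumptions, and the aligners discussed later in the talk (BLASTZ, AVID) use more elaborate schemes, so the gap placement will not necessarily reproduce the slide.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two gapped strings and the optimal score.
    """
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match or mismatch
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell to rebuild the alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


def match_line(top, bottom):
    """'|' for identical bases, ':' for mismatches, ' ' opposite a gap."""
    return "".join(
        " " if "-" in (x, y) else ("|" if x == y else ":")
        for x, y in zip(top, bottom)
    )


human = "ACTTTACGGGATCTATCTATACCGGTAACGTAATCCGATACCAGT"
mouse = "ACTTTACGGGATCTCTCTATACCGGTAAAAAAAATTTAGT"
h, mo, s = global_align(human, mouse)
print(h)
print(match_line(h, mo))
print(mo)
print("score:", s)
```

Because the mouse sequence is five bases shorter, at least five gap columns must appear in the mouse row; exactly where they land depends on the scoring parameters and on how ties are broken during the traceback.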

  7. Conserved elements
      Human ACTTTACGGGATCTATCTATACCGGTA----ACGT--AATCCGATACCAGT
            ||||||||||||||:||||||||||||    |::|             |||
      Mouse ACTTTACGGGATCTCTCTATACCGGTAAAAAAAATTT-----------AGT
            CONSERVED                  DIVERGED
      Numeric criteria of conservation: a minimal percent identity over a minimal length. Current case: 95% / 30 bp. Common criteria: 70% / 100 bp. General: ????
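The "minimal percent identity over minimal length" rule can be applied with a sliding window. The sketch below is one simple reading of that rule, not necessarily how the ECR Browser or VISTA implement it: every window of `min_len` alignment columns in which at least `min_pct` percent of columns are identical (and not gaps) is flagged, and overlapping flagged windows are merged. The defaults correspond to the common 70% / 100 bp criterion; identity is compared in integers to avoid floating-point edge cases.

```python
def find_ecrs(seq_a, seq_b, min_pct=70, min_len=100):
    """Return merged (start, end) column intervals, end exclusive, in which every
    contributing window of min_len columns has at least min_pct % identity."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [
        1 if x == y and x != "-" else 0
        for x, y in zip(seq_a.upper(), seq_b.upper())
    ]
    n = len(matches)
    ecrs = []
    if n < min_len:
        return ecrs
    window = sum(matches[:min_len])  # identical columns in the current window
    for start in range(n - min_len + 1):
        if start > 0:
            # Slide the window one column to the right.
            window += matches[start + min_len - 1] - matches[start - 1]
        if 100 * window >= min_pct * min_len:
            end = start + min_len
            if ecrs and start <= ecrs[-1][1]:
                ecrs[-1] = (ecrs[-1][0], max(ecrs[-1][1], end))  # extend the open ECR
            else:
                ecrs.append((start, end))
    return ecrs


# Synthetic alignment: 120 identical columns followed by 100 mismatching ones.
a = "A" * 120 + "C" * 100
b = "A" * 120 + "G" * 100
print(find_ecrs(a, b))  # [(0, 150)]
```

Merging whole windows lets an ECR extend past the truly conserved core: in the example the last 30 columns of the reported interval are mismatches, because a qualifying window may contain up to 30% differences. Trimming each end back to the last identical column is a natural refinement.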

  8. [Figure: a long human/mouse pairwise alignment spanning several screens, printed in two columns with human and mouse coordinate rulers.] Huge alignments: how can we use them efficiently?

  9. 2 different ways to describe large genomic alignments

  10. Graphical conservation profiles
      1. Percent identity plots: PipMaker, http://bio.cse.psu.edu/pipmaker/ (Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R, Hardison R, Miller W. Genome Research, 2000). [Figure label: alignment block.]
      2. Smooth graphs: Vista, http://www.gsd.lbl.gov/vista (Mayor C, Brudno M, Schwartz JR, Poliakov A, Rubin EM, Frazer KA, Pachter LS, Dubchak I. Bioinformatics, 2000). The vertical coordinate gives the average percent identity in a 100-bp window centered at a given nucleotide; colored regions correspond to areas of evolutionary conservation (80%, 100 bp).
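The smooth Vista-style curve described here is a running average of identity. The sketch below computes, for every alignment column, the percentage of identical non-gap columns in a window of `window` columns around it. For an even window the centre falls half a column off, and windows are truncated at the ends of the alignment; both are simplifying assumptions, not Vista's documented behaviour. A prefix-sum array keeps the computation linear in alignment length.

```python
def identity_profile(seq_a, seq_b, window=100):
    """Percent identity in a window centred on each alignment column."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [1 if x == y and x != "-" else 0 for x, y in zip(seq_a.upper(), seq_b.upper())]
    prefix = [0]  # prefix[k] = number of identical columns among the first k
    for m in matches:
        prefix.append(prefix[-1] + m)
    n, half = len(matches), window // 2
    profile = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + window)
        profile.append(100.0 * (prefix[hi] - prefix[lo]) / (hi - lo))
    return profile


def conserved_positions(profile, min_pct=80.0):
    """Columns whose smoothed identity reaches the threshold; a full
    implementation would also enforce a minimum length (e.g. 100 bp)."""
    return [i for i, p in enumerate(profile) if p >= min_pct]


print(identity_profile("ACGT", "ACGA", window=3))  # [100.0, 100.0, 66.66666666666667, 50.0]
```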

  11. From Comparative Genomics to Genome Biology

  12. Experimental assessment of the biological function of evolutionary conserved regions. 5q31 region: 245 conserved elements (>70% identity over >100 bp), of which 155 are exons and 90 are noncoding. ECR-1: 401 bp, 84% identity. [Figure: conservation profile (50%-100%) across Cyclin I-homolog, KIF3, IL-4, IL-13 and RAD50; legend: exons, conserved non-transcribed sequences.]

  13. Removal of ECR-1 from the mouse genome. [Figure: the IL4/IL13 locus with ECR-1 flanked by LoxP sites; ECR-1 wild type vs. ECR-1 knockout.]

  14. Expression of three cytokines is reduced in the ECR-1 knockout. [Figure: locus map of ECR-1, IL4, IL13, Rad50 and IL5 (scale marks 10 kb, 6 kb, 120 kb); IL4, IL13 and IL5 levels (pg/ml) in wild type (WT) vs. -ECR1/-ECR1 mice.] Loots et al., Science (2000)

  15. Aligning Vertebrate Genomes

  16. The human GENOME is a huge book of life: a sequence of 3 billion letters from a short, 4-letter alphabet (A, C, T, and G). Its 24 CHROMOSOMES form the chapters; the average chromosome is ~100 million letters long. GENES are the short stories: every chromosome has ~1,000 genes, and every gene has a function in the human body. [Figure: a page of raw human genomic sequence.]

  17. Sequenced vertebrate genomes: human (x4), mouse (x4), rat (x4), fugu (x0.5), zebrafish (x2), tetraodon (x0.5). [Figure: vertebrate tree with divergence times of ~80 MY and ~400 MY.]

  18. Comparing Genomes 1. Mask out repetitive elements (RepeatMasker) 2. Map syntenic regions in two genomes (BLAT) 3. Align syntenic regions (BLASTZ) 4. Visualize alignments (ECR Browser)
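Step 1 is often handled with soft-masking: RepeatMasker can report repeats in lowercase instead of replacing them with N, and the genome FASTA files distributed by UCSC are soft-masked this way. Assuming lowercase-masked input, the sketch below measures the masked fraction and converts the sequence to hard-masked form (repeats replaced with N) before it is handed to an aligner; the function names are illustrative.

```python
def masked_fraction(seq):
    """Fraction of A/C/G/T bases that are soft-masked (written in lowercase)."""
    bases = [c for c in seq if c in "ACGTacgt"]
    if not bases:
        return 0.0
    return sum(c.islower() for c in bases) / len(bases)


def hard_mask(seq):
    """Replace soft-masked (lowercase) bases with N; uppercase bases are kept."""
    return "".join("N" if c.islower() else c for c in seq)


soft = "ACGTacgtacGT"  # toy sequence: the lowercase stretch stands for a repeat
print(masked_fraction(soft), hard_mask(soft))  # 0.5 ACGTNNNNNNGT
```

Keeping the soft-masked copy is useful because some aligners can use lowercase as a hint (e.g. not seeding in repeats) while still extending alignments through them.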

  19. Times to align the human and mouse genomes (3 Gb vs. 3 Gb):
      Mapping/aligning   Location               Time
      Blastz/Blastz      UCSC Genome Browser    1000 days (~3 years)
      Blat/Avid          Vista Genome Browser   1 month
      Blat/Blastz        ECR Browser            <5 days
      http://ecrbrowser.dcode.org/
      Why do we want to align genomes faster?

  20. Human vs. mouse and human vs. fugu genome comparisons. Human/mouse: 10% of the human genome is conserved, in over 1,000,000 ECRs (Evolutionary Conserved Regions). Human/fugu: 0.2% of the human genome is conserved, in 41,067 ECRs. Why do we observe so many conserved elements? How many of them are functional? Is fugu an ideal organism for finding regulatory elements in the human genome?

  21. A clean dataset of regulatory elements. Starting set: 14,680 non-exonic ECRs. Filters: gene predictions (Ensembl, Genscan, FGENESH++, Sanger22, Acembly, Twinscan); human/nonhuman mRNAs (4,110 ECRs); pseudogenes; each human/fugu ECR is required to have a corresponding human/mouse ECR. Result: 1,885 ECRs, of which 146 are promoter ECRs and 1,739 are distant regulatory ECRs.
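Each box in this filtering cascade is an interval-overlap test on genome coordinates. The sketch below shows the two kinds of step on a single chromosome: discarding ECRs that overlap any annotation (gene predictions, mRNAs, pseudogenes) and keeping only human/fugu ECRs that overlap some human/mouse ECR. Coordinates are half-open (start, end) pairs, the data are made-up toy values, and the binary-search approach is one reasonable choice, not the pipeline used in the talk.

```python
import bisect


def merge(intervals):
    """Sort and merge overlapping half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlaps_any(interval, merged, starts):
    """True if interval overlaps any interval in a merged, sorted list."""
    start, end = interval
    # Merged intervals have increasing ends, so only the last one that
    # starts before `end` needs checking.
    k = bisect.bisect_left(starts, end) - 1
    return k >= 0 and merged[k][1] > start


def filter_ecrs(ecrs, exclude, require):
    """Drop ECRs overlapping `exclude`; keep only those overlapping `require`."""
    ex, req = merge(exclude), merge(require)
    ex_starts, req_starts = [s for s, _ in ex], [s for s, _ in req]
    return [
        e for e in ecrs
        if not overlaps_any(e, ex, ex_starts) and overlaps_any(e, req, req_starts)
    ]


fugu_ecrs = [(100, 200), (500, 600), (900, 1000)]  # toy human/fugu ECRs
annotations = [(150, 180)]                         # e.g. a predicted exon
mouse_ecrs = [(480, 650), (1200, 1300)]            # toy human/mouse ECRs
print(filter_ecrs(fugu_ecrs, annotations, mouse_ecrs))  # [(500, 600)]
```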

  22. Gene Deserts in the Human Genome

  23. Gene Deserts in the Human Genome

  24. Identification of gene deserts. An intergenic interval contains no RefSeq or Ensembl genes (20k genes, 195k exons) and no sequence gaps. Gene deserts are the longest 3% of intergenic intervals; they cover 25% of the human genome sequence and range from 0.5 Mb to 4 Mb.
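The definition above translates into a few lines of code: sort gene coordinates, take the spaces between consecutive genes as intergenic intervals (dropping any that contain a sequence gap), and keep the longest 3%. The sketch below works on one chromosome with made-up coordinates; rounding the 3% cut-off up, and ignoring the stretches before the first and after the last gene, are simplifying assumptions.

```python
import math


def intergenic_intervals(genes, assembly_gaps=()):
    """Half-open intervals between consecutive genes, skipping any interval
    that overlaps an assembly gap. Overlapping genes are handled via `reach`."""
    genes = sorted(genes)
    intervals = []
    reach = genes[0][1] if genes else 0  # furthest gene end seen so far
    for start, end in genes[1:]:
        if start > reach:
            iv = (reach, start)
            if not any(g_start < iv[1] and g_end > iv[0] for g_start, g_end in assembly_gaps):
                intervals.append(iv)
        reach = max(reach, end)
    return intervals


def gene_deserts(intervals, top_fraction=0.03):
    """The longest `top_fraction` of intergenic intervals (at least one)."""
    k = max(1, math.ceil(len(intervals) * top_fraction))
    return sorted(intervals, key=lambda iv: iv[1] - iv[0], reverse=True)[:k]


# Toy chromosome: 20 closely spaced genes, then one gene ~1 Mb away.
genes = [(i * 1000, i * 1000 + 500) for i in range(20)] + [(1_000_000, 1_000_500)]
intervals = intergenic_intervals(genes)
print(len(intervals), gene_deserts(intervals))  # 20 [(19500, 1000000)]
```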

  25. Relative size of the human gene deserts (0.5 Mb … 4 Mb) compared with whole bacterial genomes:
      Mycoplasmas: 0.6 Mb, ~600 genes; the smallest living organisms (0.1 µm).
      Cyanobacteria (blue-green algae): 3.5 Mb, ~4,000 genes; convert CO2 into O2.
      E. coli: 4.6 Mb, ~4,500 genes; a common inhabitant of the human intestine.
      Yersinia pestis: 4.8 Mb, ~4,000 genes; causes plague.

  26. [Figure: three labeled human gene deserts, each marked as present in the mouse genome.]

  27. GC content and SNP density. SNP density (SNPs per Mb): gene deserts 459.0; regular intergenic intervals 316.5. Low GC content and increased SNP density suggest fewer functional elements in gene deserts.
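Both quantities on this slide are simple ratios. The sketch below computes the GC content of a sequence (ignoring N and other non-ACGT symbols) and SNP density in SNPs per megabase; the SNP counts in the example are invented, and only the two densities compared at the end come from the slide.

```python
def gc_content(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T), case-insensitive."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def snp_density(n_snps, length_bp):
    """SNPs per megabase."""
    return n_snps / (length_bp / 1_000_000)


print(gc_content("ACGTNNggcc"))        # 0.75
print(snp_density(918, 2_000_000))     # 459.0 (invented counts)
print(round(459.0 / 316.5, 2))         # 1.45: ratio of the two densities on the slide
```

So gene deserts carry roughly 45% more SNPs per megabase than regular intergenic intervals, which is the contrast the slide interprets as weaker selective constraint.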

  28. Gene deserts sequence conservation

  29. The DACH gene desert on human chr13. [Figure: DACH and its flanking intervals of 1,330 kb, 876 kb and 430 kb.] Over 1,000 human/mouse ECRs! Could any of them be functional?

  30. Identification of 13 distal human/fugu ECRs around Dachshund. [Figure: conservation profiles (50%-100% identity) over an 800 kb region containing Dachshund and FLJ, with a 30 kb close-up of Dachshund.]

  31. DACH gene expression pattern: brain/CNS, eyes, limbs.

  32. Testing the function of ECRs: LacZ transient transgenics. [Figure: reporter construct combining an ECR with the Hsp68 promoter and LacZ.] Nobrega et al., Science 2003

  33. How to find core enhancers without comparisons to distant species

  34. SOM: a highly conserved developmental transcription factor. 65 human/mouse ECRs, but NONE human/fugu ECRs and NONE human/chicken ECRs.

  35. Genes ‘flanked’ by human/fugu ECRs. [Figure: Gene A, Gene B and Gene C with flanking ECRs.] Only 5.6% of human genes are flanked by human/fugu noncoding ECRs.

  36. Human/mouse counterparts of human/fugu ECRs. [Figure legend: fugu ECRs, mouse ECRs, core ECRs.]

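  37. Core ECRs: human/mouse ECRs passing a 350 bp / 77% (length / percent identity) threshold. Core ECRs 1. recapitulate ~90% of the human/fugu ECRs and 2. reduce the number of putative enhancers (human/mouse ECRs) 10-fold. [Figure: ECR length vs. percent identity.] Ovcharenko et al., Genomics, in press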
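The core-ECR rule is a two-parameter filter, and "recapitulate ~90% of the human/fugu ECRs" is a sensitivity measure. The sketch below assumes both thresholds are minimums (at least 350 bp and at least 77% identity) and counts a human/fugu ECR as recapitulated when it overlaps some core ECR; the `Ecr` record layout and the toy data are hypothetical, and the pairwise overlap check is quadratic, which is fine for a sketch but not for genome-wide use.

```python
from typing import NamedTuple


class Ecr(NamedTuple):
    start: int        # human coordinate, half-open
    end: int
    identity: float   # percent identity of the human/mouse alignment


def core_ecrs(mouse_ecrs, min_len=350, min_identity=77.0):
    """Human/mouse ECRs long enough and conserved enough to count as 'core'."""
    return [
        e for e in mouse_ecrs
        if e.end - e.start >= min_len and e.identity >= min_identity
    ]


def recapitulated_fraction(fugu_ecrs, cores):
    """Share of human/fugu ECRs (start, end) that overlap at least one core ECR."""
    if not fugu_ecrs:
        return 0.0
    hits = sum(
        any(c.start < end and c.end > start for c in cores)
        for start, end in fugu_ecrs
    )
    return hits / len(fugu_ecrs)


mouse = [Ecr(0, 400, 80.0), Ecr(1000, 1200, 90.0), Ecr(2000, 2500, 70.0)]
cores = core_ecrs(mouse)   # the second is too short, the third too divergent
print(len(cores), recapitulated_fraction([(100, 150), (1050, 1100)], cores))  # 1 0.5
```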

  38. 65 human/mouse ECRs → core ECRs → 4 putative enhancers

  39. Phylogenetic Shadowing

  40. Comparative sequence analysis of closely related organisms: human vs. chimp, 98% sequence identity. How can we find functional elements?

  41. Different species accumulated differences independently
      Allen     CTCGTCCAGTCTGGAGTGCAGTGGCGCGATCGCAGCTCACCGCAATGTCCGCCTCCCGGG 147
      Green     CTCGTCCAGTCTGGAGTGCAGTGGCGCGATCGCAGCTCACCGCAACGTCCGCCTCCCAGG 76
      Human     CTTGCTCAGGCTGGAGTGCAGTGGCATGATCTTGGCACACTGCAACCTCCACCTCCCGTG 281
      Chimp     CTTGCTCAGGCTGGAGTGCAGTGGCATGATCTTGGCTCACCGCAACCTCCACCTCCCGTG 270
      Orangutan CTCGCTCAGGCTGGAGTGCAGTGGCGTGATCTTGGCTCACCGCAACCTCCACCCCCCGGG 193
      Colobus   TTCGTCCAGTCGGGAGTGCTGTGGCGCGATTGCAGCTCACGGCAACGTCCGCCTCCCGGG 214
      Douc      TTCATCCAGTCTGGAGTGCAGTGGCGCAATCGCAGCTCACCACAATGTCCGCTTCCCGGG 211
      Francois  TTCGTCCAGTCTGGAGTGCAGTGGCGCGATTGCAGCTCACCGCAACGTCCGCCTCCCGGG 77
      Drill     CTTGTCCAGTCTGGAGTGCAGTGGTGCGATCGCAGCTCACCGCAACGTCCGCCTCCCGGG 186
      Mangabey  CTCGTCCAGTCTGGAGTGCAGTGGTGCGATCGCAGCTCACCACAACGTCCGCCTCCCGGG 75
      Owl       TTCACCCAGGCTGGAGTACAGGGGCATGATCTCAGCTCACTGCAACCTCCACCTCCAAGG 191
      Squirrel  TTCACCCAGGCTAGAGTACAGTGGCATGATCTCAGCTCACTGCAACCTCCACCTCCAAGA 76
      Tamarin   TACCCCCCGGGTGGAATACCGGGGCATGATCTCAGCTCACTGCAACCTCCACCTCCCAGG 212
      Titi      TTCACCCAGGCTGGAGTACAGTGGCATGATCTCAGCTGACTGCAACCTTCACCTCCAAGG 202
                      * *    ** * * * **    **    **  **  ***  * * *  **

  42. A 2-state trainable HMM identifies conserved elements using the sequence of complete matches (the * row under the alignment on the previous slide).
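A minimal version of the two-state idea: each alignment column emits 1 for a complete match across all species (a `*` column) or 0 otherwise, from either a "conserved" or a "neutral" hidden state, and Viterbi decoding labels every column. The transition and emission probabilities below are fixed by hand and are illustrative assumptions; the model on the slide is trainable, which would mean estimating these parameters from data (for example with Baum-Welch) rather than setting them.

```python
import math


def match_track(rows):
    """1 where every aligned row has the same non-gap base (a '*' column), else 0."""
    return [
        int(len(set(col)) == 1 and "-" not in col)
        for col in zip(*(r.upper() for r in rows))
    ]


def viterbi_two_state(track, p_stay=0.95, p_match_cons=0.9, p_match_neut=0.4):
    """Most likely state path, 'C' (conserved) or 'N' (neutral), for a 0/1 track.

    All probabilities are illustrative defaults, not trained values;
    both states are equally likely at the first column.
    """
    if not track:
        return ""
    states = ("C", "N")
    p_match = {"C": p_match_cons, "N": p_match_neut}

    def emit(state, symbol):
        p = p_match[state]
        return math.log(p if symbol else 1.0 - p)

    def trans(prev, cur):
        return math.log(p_stay if prev == cur else 1.0 - p_stay)

    # Log-space dynamic programming with one back-pointer per column and state.
    score = {s: math.log(0.5) + emit(s, track[0]) for s in states}
    back = []
    for symbol in track[1:]:
        new_score, pointers = {}, {}
        for cur in states:
            prev = max(states, key=lambda s: score[s] + trans(s, cur))
            new_score[cur] = score[prev] + trans(prev, cur) + emit(cur, symbol)
            pointers[cur] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


rows = ["ACGTTGCA-T", "ACGTTGCAAT", "ACGATGCAAT"]  # toy alignment
print(match_track(rows))
print(viterbi_two_state([0] * 20 + [1] * 20 + [0] * 20))
```

With these settings an isolated complete-match column (common by chance when the species are closely related) stays neutral, because two state switches cost more than one better-fitting emission, while a sustained run of matches is labelled conserved.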

  43. Minimum number of primates? [Figure: comparison using mouse, 14 primates, and 1 primate (Alouatta seniculus).] I. Ovcharenko et al., Genome Research, 14(6), 2004

  44. Human/baboon alignments identify primate-specific regulatory elements. [Figure label: HB1.]

  45. Noncoding ECRs & Transcriptional Gene Regulation

  46. Transcriptional gene regulation. [Figure: Gene X controlled by regulatory module A in the limbs and by regulatory module B in the CNS.]

  47. Regulatory module structure. [Figure: transcription factors bound to a gene regulatory module, actgactgactgatattgacagtttgttgttgttaa.] Footprints (binding sites) are known for many transcription factors, and they are extremely short (~6-10 bp). Such short sites also occur by chance throughout the genome, so computational predictions of transcription factor binding sites are overwhelmed with false positives.
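The false-positive problem follows from simple arithmetic: under a uniform random-sequence model, one specific k-bp site is expected about 2(L - k + 1) / 4^k times in L bp when both strands are searched, i.e. roughly once every 2 kb for a 6-bp site and about 1.5 million times in a 3 Gb genome. The sketch below computes that expectation and counts exact matches on both strands; the motif GATAAG is an arbitrary example, and real scanners usually score degenerate matches with position weight matrices rather than exact strings.

```python
def expected_hits(seq_len, k, both_strands=True):
    """Expected chance occurrences of one specific k-mer in uniform random sequence."""
    positions = max(0, seq_len - k + 1)
    return positions * (2 if both_strands else 1) / 4 ** k


def reverse_complement(seq):
    return seq.upper()[::-1].translate(str.maketrans("ACGT", "TGCA"))


def count_site(seq, motif):
    """Exact, overlapping matches of motif on the forward and reverse strands."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    targets = {motif, reverse_complement(motif)}  # a palindrome is counted once per position
    return sum(seq[i:i + k] in targets for i in range(len(seq) - k + 1))


print(round(expected_hits(3_000_000_000, 6)))  # roughly 1.46 million chance sites
print(count_site("GATAAGCTTATC", "GATAAG"))    # forward site plus its reverse complement
```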

  48. Human ACTTTGATACATCTATCTATA
            ||||||||||||||:||||||
      Mouse ACTTTGATACATCTCTCTATA

      Human ACTTTGATACATCTATCTATA
            |||||
      Mouse ACTTT----------------

      Human ACTTTCCTACATCTATCTATA
            |||||::|||||||:||||||
      Mouse ACTTTGATACATCTCTCTATA

      Human -----GATACATCTATCTATA
                 |||||
      Mouse ACTTTGATAC-----------
