380 likes | 516 Views
MW 11:00-12:15 in Redwood G19 Profs: Serafim Batzoglou, Gill Bejerano TA: Cory McLean. Lecture 17. Ultraconservation. Sequence Conservation implies Function. (but which function/s?...). Comparative Genomics of Distantly related species:. functional region!. human.
E N D
MW 11:00-12:15 in Redwood G19 Profs: Serafim Batzoglou, Gill Bejerano TA: Cory McLean http://cs273a.stanford.edu [Bejerano Aut07/08]
Lecture 17 • Ultraconservation http://cs273a.stanford.edu [Bejerano Aut07/08]
Sequence Conservation implies Function • (but which function/s?...) Comparative Genomics of Distantly related species: functional region! human ...CTTTGCGA-TGAGTAGCATCTACTATTT... common ancestor ...ACGTGGGACTGACTA-CATCGACTACGA... anotherspecies Note: the inverse “no conservation no function”is a much weaker statement given current knowledge http://cs273a.stanford.edu [Bejerano Aut07/08]
How They Measured all human-mouse alignments human-mouse ancestral repeats alignment Difference: 5% of Human Genome [Mouse consortium, Nature 2002] http://cs273a.stanford.edu [Bejerano Aut07/08]
What They Found Human Genome: 3*109 letters 1.5% known function compare to other species >50% junk >5% human genome functional 3x more functional DNA than known! ~106 substrings do not code for protein What do they do then? [Science 2004 Breakthrough of the Year, 5th runner up] http://cs273a.stanford.edu [Bejerano Aut07/08]
Ultraconservation http://cs273a.stanford.edu [Bejerano Aut07/08]
Typical DNA Conservation levels Conserved elements between human and mouse are on average 85% identical. [mouse consortium, 2002] [Bejerano et al., Science 2004] http://cs273a.stanford.edu [Bejerano Aut07/08]
Ultraconserved Elements fish [Bejerano et al., Science 2004] http://cs273a.stanford.edu [Bejerano Aut07/08]
Ultraconserved Elements [Bejerano et al., Science 2004] http://cs273a.stanford.edu [Bejerano Aut07/08]
* * * * * No known function requires this much conservation ? CDS ncRNA TFBS seq. http://cs273a.stanford.edu [Bejerano Aut07/08]
Discovery can be fun (compared to 4 results day before our ScienceExpress paper) 58,300 http://cs273a.stanford.edu [Bejerano Aut07/08]
Ultra Conserved (UC) Elements • Any contiguous block of human-mouse-rat alignment that isidentical in all three species, syntenic and ≥200 bp. • (p=10-22 of finding one such element in slowest rate 2.9G neutral DNA) • Turns out there are 481(!) such blocks of sizes 200-779bp(total of 126Kb) in all chromosomes but 21, Y. *68 (61%) associated with alt-spliced exons http://cs273a.stanford.edu [Bejerano Aut07/08]
Ultra Clusters • exonic • non • possibly By joining two ultras into a cluster when separated <675Kb we obtained 89 clusters (each named after prominent gene/s) Non exonic elements tend to congregate in clusters. Exonic elements are distributed more randomly (tend to overlap an alt-spliced exon). http://cs273a.stanford.edu [Bejerano Aut07/08]
Genomic Distribution • exonic • non • possibly http://cs273a.stanford.edu [Bejerano Aut07/08]
481 UCEs identified Associate Ultras with Nearby Genes 93 type I genes 111 exonic 114 possibly exonic 256 non-exonic 100 introns 156 intergenic 225 type II genes http://cs273a.stanford.edu [Bejerano Aut07/08]
Functional Annotation of Related Genes p = 1.3 x 10-19 p = 4.8 x 10-10 p = 5.6 x 10-18 p = 1.2 x 10-4 p = 9.1 x 10-6 p = 4.4 x 10-6 ultra related genes Exonic 86 p = 0.39 p = 0.44 p = 0.77 p = 2.5 x 10-20 p = 1.8 x 10-15 p = 7.8 x 10-15 7 Non Exonic 218 Exonic – RNA processing @ transcription regulation Non Exonic – regulation of transcription at DNA level http://cs273a.stanford.edu [Bejerano Aut07/08]
Non Exonic Enhancers • The non exonic ultras are often found in “gene deserts”(140 / 256 >10Kb from a known gene; 88 > 100Kb away). • The genes flanking these ultras are GO enriched for development(p = 10-6), particularly early developmental tasks (p = 2-7 x 10-5) suggesting distal enhancer roles. • Indeed, uc.351 is contained in a proven enhancer of DACH, located 225Kb upstream of it [Nobrega et al., Science 2003]. ultra conserved http://cs273a.stanford.edu [Bejerano Aut07/08]
Zoom to uc.351, 225Kb upstream of DACH ultra conserved e.d 12.5 http://cs273a.stanford.edu [Bejerano Aut07/08]
Validating Regulatory Elements wild type Conserved Element Minimal Promoter Reporter Gene transgenic where is the reporter gene expressed? where is the wild type gene expressed? http://cs273a.stanford.edu [Bejerano Aut07/08]
Predictions and Proofs I Based on public domain genome wide data: ultraconservedelements larger subset does not one subset codes protein generate testable hypotheses for function from existing knowledge (2004) [Pennacchio et al., Nature, 2006] http://cs273a.stanford.edu [Bejerano Aut07/08]
The most conserved elements in the genome • If one concatenates ultras that are 1-2bp away, all 4 longest ultras (1044, 779, 731, 711bp) lie at 3’ end of POLA, near ARX, on chr X. The longest has 8 subs. and no indels (99.3%id) in chicken. • ARX is a homeobox gene involved in CNS development; defects in the gene are linked to epilepsy, mental retardation, autism and cerebral malformations. ultra conserved http://cs273a.stanford.edu [Bejerano Aut07/08]
Exonic uc’s correlate with Alt Splicing • 68 / 111 exonic ultras overlap an exon that shows clear evidence of being alternatively spliced. • Of the 59 GO annotated genes containing these elements: • 24 are RNA binding (p = 8.1 x 10-18), including HNRPU, HNRPDL, HNRPH1, HNRPK, HNRPM. • 16 contain the RNA recognition motif (p = 9.1 x 10-19), including SFRS1, SFRS3, SFRS6, SFRS7, SFRS10, SFRS11. • These ultras often overlap a short exon that is retained only in some tissues. • Such is the explicitly studied, uc.33 in PTBP2 (length 312bp) overlaps a 34bp exon included in the mature transcript only in the brain [Rahman et al., Genomics 2004] http://cs273a.stanford.edu [Bejerano Aut07/08]
Predictions and Proofs II Based on public domain genome wide data: ultraconservedelements larger subset does not one subset codes protein generate testable hypotheses for function from existing knowledge (2004) post transcriptional modification [Pennacchio et al., Nature, 2006] http://cs273a.stanford.edu [Bejerano Aut07/08]
Splicing Microarrays http://cs273a.stanford.edu [Bejerano Aut07/08]
Non-Sense Mediated Decay (NMD) http://cs273a.stanford.edu [Bejerano Aut07/08]
Of the 29 exonic ultraconserved elements in RNA-binding protein genes in human, 15 have human and/or mouse EST evidence suggesting the presence of AS-NMD in those regions. [Ni et al., Genes & Dev 2007 ] http://cs273a.stanford.edu [Bejerano Aut07/08]
Model for Homeostatic Auto/Cross-regulation [Ni et al., Genes & Dev 2007 ] http://cs273a.stanford.edu [Bejerano Aut07/08]
Similar Results = 100% conservation; associated with AS = normal splice = alternative splice = retained intron = normal stop codon = premature termination codon [Lareau et al., Nature 2007 ] http://cs273a.stanford.edu [Bejerano Aut07/08]
Ultraconserved Non-coding RNA About 1/3 of all ultras are expressed. Some are predicted to provide microRNA targets. A few are anti-correlated with miRNAexpression levels. A few even act as oncogenes. miRNA complementarity [Calin et al, Cancer Cell, 2007] http://cs273a.stanford.edu [Bejerano Aut07/08]
Ultras are Under Strong Human Selection Mutational cold spots? NO. Rare (new) mutations are introduced to the population. Fierce purifying selection? YES. Very few of these get anywhere near fixation. A A G A A humans chimp NonSyn DAF Ultra DAF [Katzman et al, Science ,2007] http://cs273a.stanford.edu [Bejerano Aut07/08]
Relation to Human Disease SHH LMBR1 1Mb Limb Lettice et al. HMG 2003 12: 1725-35 [Derti et al., Nature Genetics, 2006] http://cs273a.stanford.edu [Bejerano Aut07/08]
Link to Disease Remains Elusive http://cs273a.stanford.edu [Bejerano Aut07/08]
A Vertebrate Innovation? • Only 24 ultras can be partially traced back through direct sequence search to Ciona, C. Elegans or Drosophila. • All overlap coding exons from known genes (17 of which show clear evidence of alt-splicing inc. EIF2C1, DDX, BCL11A, EVI1, ZFR, CLK4, HNRPH1, GRIA3). • No intronic element in human was found to be coding in another species, although in some cases EST evidence indicates intron retention, presumably not as CDS. • Interestingly, ribosomal DNA (not part of the draft genomes) also harbors 6 ultraconserved elements in 18S, 28S. def def def http://cs273a.stanford.edu [Bejerano Aut07/08]
Genomic Distribution of Ultraconserved Elements • exonic • non • possibly Origins? http://cs273a.stanford.edu [Bejerano Aut07/08]
Computationally Driven Biology Simplified hypothesis generalize survey case study set BIO CS analyze experiment candidates http://cs273a.stanford.edu [Bejerano Aut07/08]
What we do understand.. • Ultraconserved elements exist. • They are maintained via strong on-going selection. • It is a heterogeneous bunch: • Some mediate splicing • Some regulate gene expression • Some express ncRNAs • (categories are not necessarily mutually exclusive) • Knockouts of four regulatory ultras did not lead to severe phenotypes (similar protein cases: Pbx2, Nkx6.2, Gli1) http://cs273a.stanford.edu [Bejerano Aut07/08]
What we don’t understand • Their functional density: • How did they come to be? • What is the selective advantage that lets them persist? http://cs273a.stanford.edu [Bejerano Aut07/08]
Broad Guess • It’s about 3-D structure. • Observation: rDNA (18S, 28S) have ultraconserved stretches,multiple constraints in a complex 3-D structure, the Ribosome. • ncRNA ultras: structure confers function • Splicing related ultras: the Splicosome • Cis-reg ultras: TSS 3-D proximity, chromatinand/or packed TFBS (Transcription factories?) TSS http://cs273a.stanford.edu [Bejerano Aut07/08]