Regulatory Motif Finding (II) Balaji S. Srinivasan CS 374 Lecture 18 12/6/2005
Overview • Biology of DNA binding motifs • Why motifs? • Overview of motif finding algorithms • Open problems in this area
Biology of Motifs • From last time…
Biology of Motifs • Given transcription factor (TF) of fixed sequence… • binding affected by • secondary, tertiary structure of DNA • methylation state • DNA binding motifs
Biology of Motifs • DNA Motifs (regulatory elements) • Binding sites for proteins • Short sequences (5-25 bp) • Up to 1000 bp (or farther) from gene • Inexactly repeating patterns
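Because motifs are short, inexactly repeating patterns, they are usually summarized as a matrix rather than a single string. The sketch below is a minimal illustration, not a method from the lecture: it builds a log-odds position weight matrix from a handful of made-up binding sites and slides it along a sequence. The site list, the pseudocount of 0.5, and the uniform background of 0.25 are all assumptions chosen for the example.

```python
import math

def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log-odds position weight matrix from equal-length example sites."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    scored = [
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)

# Hypothetical sites: similar to each other but not identical.
sites = ["TGACTCA", "TGAGTCA", "TGACTCT", "TGACTAA"]
pwm = build_pwm(sites)
score, start = best_hit(pwm, "CCGTATGACTCAGGCTA")
print(start, round(score, 2))
```

Scoring against a matrix tolerates substitutions at weakly constrained positions, which is the practical meaning of "inexactly repeating".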
Biology of Motifs • TF binding affected by • secondary, tertiary structure of DNA (dynamic chromosome: accessibility affects transcription) • methylation state (dynamic epigenome) • DNA binding motifs • Should be on your radar… • Why are motifs the research frontier? • sequence data exists • sequence is static, not dynamic
Biology of Motifs • Prokaryotes (immediate upstream regulation) • fewer TFs • long motifs • affinity depends on match • Eukaryotes (HARD; long-range regulation) • more TFs per gene • shorter motifs • MUCH more noncoding sequence • regulatory modules • long-range effects
Biology of Motifs • Transcription Factors • often dimer, tetramer: palindromic binding site • binding • stochastic • affinity = structural/sequence match • high affinity not always desirable • combinatorial regulation (esp. eukaryotes) • order important! • site spacing important!
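A dimeric TF often binds a site that reads the same on both strands, which is what "palindromic" means for DNA. As a small sketch (the function names are illustrative, not from the lecture), a site can be compared with its reverse complement, counting mismatches for near-palindromes:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(site):
    """True if the site equals its own reverse complement."""
    return site == reverse_complement(site)

def palindrome_mismatches(site):
    """Number of positions where the site differs from its reverse complement."""
    return sum(a != b for a, b in zip(site, reverse_complement(site)))

print(is_palindromic("GAATTC"))          # perfect palindrome (EcoRI site)
print(palindrome_mismatches("TGACTCA"))  # odd length: the center base always mismatches
```

An odd-length site can never be a perfect palindrome under this test, because no base is its own complement, so a mismatch count is the more useful measure there.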
Why motifs? • Given: all TF/motif pairs • Get: global genetic regulatory network (figure panels: microbial, eukaryotic)
Recap #1 • To figure out transcriptional control… • find transcription factor binding sites • Eukaryotes: hard b/c • much more noncoding sequence • shorter motifs • longer range interactions
Motif Finding Overview • Methods • 1 genome • sequence overrepresentation (NBT shootout, not good) • Functional Genomics • predict regulons (Segal, etc.) • N genomes • phylogenetic footprinting (Kellis, etc.) • N genomes + Func Genomics • PhyloCon (Wang & Stormo) • New ideas…
Motif Shootout • Nature Biotech Jan. 2005 • 13-way shootout • disappointing results • Useful in that • shows importance of using all info • benchmarking is clearly a trouble area
Motif Shootout • Conceptually • load FASTA hopper of intergenic sequence (upstreams) from 1 genome into black box • output: motif matrices • But… • how to pick sequences? • comparison? • functional clustering? • benchmarking?
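To make the "black box" concrete, here is a deliberately naive stand-in for the single-genome overrepresentation idea, not any of the tools in the shootout. It reads FASTA text, counts k-mers, and ranks them by observed count over the count expected from base composition alone. The independent-bases background model and the default k=6 are assumptions, and a real tool would emit motif matrices rather than a k-mer ranking.

```python
import math
from collections import Counter

def parse_fasta(text):
    """Parse FASTA text into {record_name: uppercase_sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records

def kmer_enrichment(sequences, k=6):
    """Rank k-mers by observed/expected count under an independent-bases model."""
    base_counts = Counter(b for seq in sequences for b in seq if b in "ACGT")
    total = sum(base_counts.values())
    freqs = {b: base_counts[b] / total for b in "ACGT"}
    observed, windows = Counter(), 0
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing N etc.
                observed[kmer] += 1
                windows += 1
    scores = {
        kmer: count / (windows * math.prod(freqs[b] for b in kmer))
        for kmer, count in observed.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy input; the sequences are invented for illustration.
toy = ">geneA upstream\nacgtTGACTCAggc\n>geneB upstream\nttTGACTCAacgg\n"
ranking = kmer_enrichment(list(parse_fasta(toy).values()), k=7)
print(ranking[:3])
```

The sketch also shows why the questions on this slide matter: the ranking depends entirely on which sequences go in and on the background model, and neither is given by the method itself.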
Motif Shootout • But… • how to pick sequences? • comparison? • functional clustering? • benchmarking? • So • not as useful as it seems… • huge, artificial limitations • “consider a spherical cow” • What if limitations removed?
Motifs via Functional Genomics • Coexpression • most popular (e.g. Segal 2003) • Functional clustering • then hunt upstream
Motifs via Functional Genomics • ChIP-chip • key idea: assay DNA segments where TF binds • direct test of motif binding (e.g. Laub 2002) • Disadvantages • one TF at a time • need an antibody!
Motifs via Functional Genomics • Coinheritance, etc. • predict regulons, then look upstream • heuristic network integration • will return to this point • decent signal in prokaryotes (Manson McGuire 2001)
Motifs via Phylogenetic Footprinting • Key idea • functional sequence evolves more slowly • conservation hierarchy (from ultraconserved to no conservation) • ultraconserved noncoding elements (Bejerano & Haussler 2004) • proteins, ncRNAs • DNA binding motifs • unconstrained, neutrally drifting regions
Motifs via Phylogenetic Footprinting • Phylogenetic footprint • “footprint” is conservation • simple version • multiple alignment of orthologous upstream regions • Problem: nonfunctional sequence drifts rapidly • multiple align difficult if only small % conserved • protein twilight zone: 30% identity • nucleic acids upstream regions: often much less…
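The "simple version" can be sketched as follows, assuming a multiple alignment of orthologous upstream regions is already in hand. This is the naive approach the slide describes, with a hypothetical helper name and a made-up alignment: the footprint is any run of columns that is identical in every species.

```python
def footprints(alignment, min_len=5):
    """Return (start, end, sequence) for runs of fully conserved, gap-free columns.

    `alignment` is a list of equal-length aligned strings, one per species.
    `end` is exclusive, so the footprint is alignment[0][start:end].
    """
    conserved = [
        len(set(column)) == 1 and column[0] != "-"
        for column in zip(*alignment)
    ]
    runs, start = [], None
    for i, ok in enumerate(conserved + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_len:
                runs.append((start, i, alignment[0][start:i]))
            start = None
    return runs

# Invented three-species alignment of an upstream region.
alignment = [
    "ACGTTGACTCAGG-A",
    "TCGATGACTCATGCA",
    "GCTTTGACTCAAGGA",
]
print(footprints(alignment))
```

This only works when the alignment is trustworthy, which is exactly what breaks down once the conserved fraction of the upstream region becomes small.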
Motifs via Phylogenetic Footprinting • Phylogenetic Footprint • Problem: multiple alignment of upstreams hits twilight zone • One solution • search for parsimonious substrings… • without direct alignment (Blanchette 2003)
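One way to see how a motif can be found without alignment is the simplified search below. It is not the tree-based parsimony algorithm of Blanchette 2003: it ignores the phylogeny entirely and just looks for k-mers in the first species that have a close match (by Hamming distance) somewhere in every other species. The function names, k, and the mismatch limit are illustrative assumptions, and every sequence is assumed to be at least k long.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmers(orthologs, k=6, max_mismatches=1):
    """K-mers from the first sequence with a near match in every other sequence.

    Returns sorted (total_mismatches, kmer) pairs; lower totals come first.
    """
    per_species = [kmers(seq, k) for seq in orthologs]
    hits = []
    for candidate in sorted(per_species[0]):
        cost = 0
        for others in per_species[1:]:
            best = min(hamming(candidate, other) for other in others)
            if best > max_mismatches:
                break  # no acceptable match in this species
            cost += best
        else:
            hits.append((cost, candidate))
    return sorted(hits)

# Unaligned, invented orthologous upstream regions.
orthologs = ["ACGTTGACTCAGGA", "TTGACTCATCCGG", "GGCATTGAGTCAAC"]
hits = shared_kmers(orthologs, k=7, max_mismatches=1)
print(hits)
```

A tree-aware method would instead charge substitutions along branches, so that closely related species agreeing with each other counts for less than distant species agreeing.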
Motifs via Phylogenetic Footprinting • Multiple genome alignment can work • need close enough species • Kellis 2003 (four yeasts, genome alignments) • Xie 2005 (“four” mammals, genome alignment) • Discussed last time • Key points • Genome wide search • Motif Conservation Score: null model based test
Recap • Many programs for motif search • most are useless! • Lesson: • must use comparative genomics (e.g. alignment) • …or functional genomics (e.g. expression) • what about both together??
Integrated Motif Finding • Recall • comparative genomics • one upstream region in N species • functional genomics • N upstream regions in one species • PhyloCon (Wang & Stormo 2003) • N upstreams in N species
Integrated Motif Finding • Phylocon • given N species • align upstream regions • key idea: align the alignments • Boosts sensitivity • LEU3 hard to find…
Integrated Motif Finding • Boosts sensitivity • LEU3 hard to find… • but align the alignments and the true motif pops out!
Integrated Motif Finding • Important features • no prior motif length required • profile approach matches distribution, not sample (robust to substitutions) • several alignments for each upstream are OK • does well vs. real data… • ALLR (average log-likelihood ratio) • Q: are 2 profile columns samples from the same distribution? • if so, that may be a matching motif position…
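The ALLR question can be made concrete with a short sketch. For two profile columns i and j with base counts n and estimated frequencies f, and background frequencies p, the score is usually written as ALLR = [Σ_b n_bj ln(f_bi / p_b) + Σ_b n_bi ln(f_bj / p_b)] / Σ_b (n_bi + n_bj). The code below follows that form; the pseudocount of 0.5 and the uniform background are assumptions added so the logarithms stay finite, not values taken from the lecture.

```python
import math

BASES = "ACGT"

def allr(counts_i, counts_j, background=None, pseudocount=0.5):
    """Average log-likelihood ratio between two profile columns.

    counts_i, counts_j: dicts of base -> observed count in each column.
    Positive scores suggest both columns were drawn from the same
    non-background distribution; negative scores suggest they were not.
    """
    background = background or {b: 0.25 for b in BASES}

    def frequencies(counts):
        total = sum(counts.get(b, 0) for b in BASES) + 4 * pseudocount
        return {b: (counts.get(b, 0) + pseudocount) / total for b in BASES}

    f_i, f_j = frequencies(counts_i), frequencies(counts_j)
    # Each column's counts are scored under the other column's distribution.
    numerator = sum(
        counts_j.get(b, 0) * math.log(f_i[b] / background[b])
        + counts_i.get(b, 0) * math.log(f_j[b] / background[b])
        for b in BASES
    )
    denominator = sum(counts_i.get(b, 0) + counts_j.get(b, 0) for b in BASES)
    return numerator / denominator

similar = allr({"A": 10}, {"A": 9, "G": 1})
different = allr({"A": 10}, {"C": 10})
print(similar > 0, different < 0)
```

Because each column is scored as a distribution, a column that is mostly A still matches another mostly-A column even if individual sequences carry a substitution, which is the robustness the slide points to.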
Open Questions • PhyloCon is a strong step in the right direction… • align the alignments • But how do we… • choose species? • choose upstreams? • validate motifs? • find TF/motif pairs?
Conclusion • Motifs important • static, tractable • want: genetic regulatory networks • Motif finder selection • Don’t: use 1 genome w/o comparison or func. genomics • Do: use alignment & func. genomics • PhyloCon (Wang & Stormo), MCS (Kellis) • best to date b/c they use N genes and M species