Identifying conserved promoter motifs and transcription factor binding sites in plant promoters. Endre Sebestyén, ARI-HAS, Martonvásár, Hungary 26th, November, 2009 RCPGD Annual Meeting. Transcription factor binding sites. TFs bind short, often degenerate DNA sequences
Identifying conserved promoter motifs and transcription factor binding sites in plant promoters. Endre Sebestyén, ARI-HAS, Martonvásár, Hungary. 26 November 2009, RCPGD Annual Meeting
Transcription factor binding sites • TFs bind short, often degenerate DNA sequences • Promoters are variable-length 5’ sequences • With TFBSs • TFBSs are usually conserved within a nonconserved surrounding sequence • Some well-known TFBSs • TATA box • GC box • CpG island • Lots of other, less general TFBSs • Similarly expressed genes or homologues should contain similar TFBSs
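A degenerate binding site is often written as a consensus in IUPAC nucleotide codes. The sketch below is a minimal illustration of matching such a consensus against a promoter; it is not taken from the talk. The function names and the example consensus strings (`TATAWAW`, `RCCGAC`) are assumptions chosen for illustration, and only the forward strand is scanned.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression string."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_sites(consensus, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead lets finditer report overlapping occurrences.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical promoter fragment with one TATA-like site.
print(find_sites("TATAWAW", "GGCTATAAATAGGC"))
```

A real scan would also search the reverse complement, since TFBSs can occur on either strand.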
TFBS search and promoter analysis • Wet-lab methods • DNase footprinting • Electrophoretic mobility shift assay • ChIP-chip, ChIP-Seq • In silico methods • Experimentally verified sites • Consensus sequences • Consensus matrices • De novo motif discovery • Oligo frequency • Phylogenetic footprinting • Other methods
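To make the "consensus matrices" item concrete, here is a hedged sketch of how a matrix can be built from a handful of aligned sites and used to score a promoter. The uniform background of 0.25, the pseudocount of 1 and the example sites are all assumptions for illustration; databases such as TRANSFAC or JASPAR ship their own matrices and scoring conventions.

```python
import math


def build_log_odds(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds matrix (one dict per position) from equal-length sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    matrix = []
    for i in range(length):
        column = [site[i] for site in sites]
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def best_hit(matrix, sequence):
    """Return (position, score) of the highest-scoring window.

    The sequence is assumed to contain only A, C, G and T.
    """
    width = len(matrix)
    scored = [
        (sum(matrix[i][sequence[p + i]] for i in range(width)), p)
        for p in range(len(sequence) - width + 1)
    ]
    score, position = max(scored)
    return position, score


# Hypothetical TATA-like sites, not real database entries.
example_sites = ["TATAAA", "TATATA", "TATAAA", "TTTAAA"]
pwm = build_log_odds(example_sites)
print(best_hit(pwm, "GGCGTATAAAGGC"))
```

Unlike a consensus string, the matrix keeps the relative weight of each base at each position, so near-matches still receive a graded score.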
Experimentally verified sites • TRANSFAC • JASPAR • PLACE • PlantCARE
De novo motif discovery • Orthologous gene groups • Evolutionarily conserved functional sites • Co-regulated genes • Same tissue, body part • Same developmental stage • Etc.
„Real” promoter structure • No general motifs • No TATA box, GC box, etc. • Lots of false-positive TFBSs • With both wet-lab and in silico methods • Sometimes no apparent common TFBSs between co-regulated genes
Database of Orthologous Promoters • Orthologous promoter sequence collections • Based on a BLAST search with first exons of reference species • Plants (Viridiplantae) • Reference species: Arabidopsis thaliana • Chordates • Reference species: Homo sapiens • 500/1000/3000 bp 5’ upstream regions • Conserved sequence regions • Annotations • Xrefs to other databases • Annotated transcription start sites
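The fixed-length 5’ upstream regions can be illustrated with a short sketch. This is a generic illustration and not the DoOP pipeline itself; the function names, the 0-based coordinate convention and the toy chromosome string are assumptions.

```python
# Translation table for complementing DNA (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chromosome, tss, strand, length=500):
    """Return up to `length` bp upstream of the transcription start site.

    `tss` is the 0-based index of the first transcribed base. The result is
    always given 5'->3' relative to the gene, and is shorter than `length`
    when the gene sits near a chromosome end.
    """
    if strand == "+":
        start = max(0, tss - length)
        return chromosome[start:tss]
    # Minus strand: upstream lies at higher coordinates on the forward strand.
    end = min(len(chromosome), tss + 1 + length)
    return reverse_complement(chromosome[tss + 1:end])


toy_chromosome = "AAACCCGGGTTT"
for size in (3, 5):
    print(size, upstream_region(toy_chromosome, 6, "+", size))
```

Calling the same function with 500, 1000 and 3000 would give the three promoter sizes listed on the slide.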
DoOP subsets • Cluster > Subset • Subset: collection of evolutionarily monophyletic sequences in a cluster • Plant subsets • Brassicaceae • Arabidopsis thaliana • Brassicaceae species • Eudicotyledons • Grape, Solanum species, papaya, tobacco • Magnoliophyta • Maize, rice • Viridiplantae
Gene types – Gene Ontology • Standardized annotation for genes • Biological process • What does it do? • Transcription, translation, stress response, etc • Cellular component • Where is it located? • Membrane, ribosome, cytosol, etc • Molecular function • How does it work? • Dehydrogenase, ATP binding, etc
Gene types – Gene Ontology • 500 bp promoters • Search for significantly enriched terms in annotation • Brassicaceae • Eudicotyledons • Magnoliophyta • Viridiplantae • BP: transcription, translation, protein folding, stress response • CC: plasma membrane, ribosome parts • MF: ATP/GTP binding, DNA binding, ribosome parts
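The slide does not say which statistic was used to call a GO term "significantly enriched". A common choice is the one-sided hypergeometric test, sketched below under that assumption; the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background set
    K: background genes annotated with the GO term
    n: genes in the subset being tested
    k: subset genes annotated with the GO term
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical counts: 40 of 200 subset genes carry a term that
# annotates 500 of 10000 background genes.
print(hypergeom_enrichment_p(40, 200, 500, 10000))
```

When many GO terms are tested at once, the resulting p-values would normally be corrected for multiple testing before any term is reported.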
Motif generation • Phylogenetic footprinting • Functional TFBSs should be conserved • Local sequence alignment • Define conserved regions
Motif generation • [Figure: motif generation across the Brassicaceae, eudicotyledons and Magnoliophyta subsets]
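The "define conserved regions" step of phylogenetic footprinting can be sketched as follows. The sketch assumes the orthologous promoters have already been aligned into equal-length, gapped strings, and it uses the simplest possible criterion (every sequence identical, no gaps, over a minimum run length). The threshold, function name and toy alignment are assumptions, not the parameters used for DoOP.

```python
def conserved_blocks(aligned, min_length=6):
    """Return (start, end, sequence) for runs of fully identical, gap-free columns.

    `aligned` is a list of equal-length aligned sequences using '-' for gaps.
    `end` is exclusive, and coordinates refer to alignment columns.
    """
    blocks = []
    start = None
    ncols = len(aligned[0])
    # Iterate one column past the end so a trailing run is closed as well.
    for col in range(ncols + 1):
        identical = (
            col < ncols
            and aligned[0][col] != "-"
            and all(seq[col] == aligned[0][col] for seq in aligned)
        )
        if identical and start is None:
            start = col
        elif not identical and start is not None:
            if col - start >= min_length:
                blocks.append((start, col, aligned[0][start:col]))
            start = None
    return blocks


# Toy alignment of three hypothetical orthologous promoter fragments.
toy_alignment = [
    "ACGTGCCGACTT-A",
    "TCGAGCCGACAT-A",
    "GC-TGCCGACTTGA",
]
print(conserved_blocks(toy_alignment))
```

In practice a looser criterion, such as a minimum percent identity over a sliding window, tolerates the degeneracy of real binding sites better than strict identity.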
TFBS databases • Lots of redundant data • Low quality, not updated • More than 100 different versions of the TATA box
Synthetic biology • Synthetic biology • iGEM competition • BioBricks • MIT Registry of Standard Biological Parts • UV-responsive promoter • Promoter expressed in roots • Etc. • Synthetic promoters • Define basic promoter elements • Build and use custom-made promoters • Gene expression more or less when and where you want it
SNP conservation • Gene expression levels change because • Regulatory elements change • Usually NOT protein-coding regions • Conserved promoter regions might be functional regulatory elements • Search for SNPs in these regions • These SNPs might be interesting for breeders, as they are likely to be functional ones
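Searching for SNPs inside conserved promoter regions amounts to an interval lookup. The sketch below assumes SNP positions and conserved regions are given in the same promoter coordinate system, with half-open `(start, end)` intervals; the function name and the example coordinates are hypothetical.

```python
def snps_in_conserved_regions(snp_positions, regions):
    """Return the SNP positions that fall inside any conserved region.

    `regions` is a list of half-open (start, end) intervals.
    """
    hits = []
    for pos in snp_positions:
        for start, end in regions:
            if start <= pos < end:
                hits.append(pos)
                break  # Count each SNP once even if regions overlap.
    return hits


# Hypothetical coordinates within one promoter.
conserved = [(10, 20), (50, 60)]
print(snps_in_conserved_regions([5, 10, 19, 20, 55], conserved))
```

For genome-scale data, sorting the intervals and using binary search or an interval tree would avoid the nested loop.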
A real example • Vilmos Soós, Endre Sebestyén, Angéla Juhász, János Pintér, Marnie E. Light, Johannes Van Staden, Ervin Balázs (2009) Stress-related genes define essential steps in the response of maize seedlings to smoke-water. Functional and Integrative Genomics, Volume 9, Number 2, Pages 231-242; doi:10.1007/s10142-008-0105-8 • Microarray experiments • Maize kernels (Mv 540) • 24 and 48 h – control vs smoke treated samples • Up and downregulated genes • Promoter sequences up to 1500 bp were extracted if available
Analysis of promoters • TRANSFAC database version 12.1 • Collection of TFBSs • More than 100 plant TFBSs • DRE element: GCCGAC • Scan for the TFBSs in the maize promoters • Up- and downregulated • Also count the frequencies of all 5-8mer sequences • In all available maize promoters, not only the up- or downregulated ones • Calculate the over- or underrepresentation of a TFBS as follows • Observed frequency in up- or downregulated promoters divided by the expected frequency in all promoters • If ratio > 1: overrepresented • If ratio < 1: underrepresented
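The observed/expected ratio described above can be sketched as follows. This is a minimal illustration assuming exact matches on the forward strand and frequencies normalised by the number of windows scanned; the function names and toy promoters are hypothetical, and the original analysis may have differed in these details.

```python
def count_occurrences(motif, sequences):
    """Return (hits, windows): exact matches of motif and windows scanned."""
    hits = 0
    windows = 0
    k = len(motif)
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            windows += 1
            if seq[i:i + k] == motif:
                hits += 1
    return hits, windows


def representation_ratio(motif, regulated, all_promoters):
    """Observed frequency in regulated promoters / expected frequency in all.

    Returns None when the ratio is undefined (nothing to scan, or the motif
    never occurs in the background set).
    """
    obs_hits, obs_windows = count_occurrences(motif, regulated)
    bg_hits, bg_windows = count_occurrences(motif, all_promoters)
    if obs_windows == 0 or bg_hits == 0:
        return None
    return (obs_hits / obs_windows) / (bg_hits / bg_windows)


# Toy data: the DRE element GCCGAC in two "regulated" promoters
# against a background of four promoters.
regulated = ["AAGCCGACAA", "TTGCCGACTT"]
background = regulated + ["AAAAAAAAAA", "CCCCCCCCCC"]
ratio = representation_ratio("GCCGAC", regulated, background)
print(ratio, "overrepresented" if ratio > 1 else "underrepresented")
```

A ratio alone says nothing about significance, so with small promoter sets it would usually be paired with a statistical test.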
Analysis of promoters • Results • Binding sites related to • Organogenesis • Meristem development • Housekeeping functions • Biotic stress • Cold and dehydration stress • ABA related motifs