Too many matches… A typical question: What are the potential TF sites involved in regulation of my gene of interest? A typical approach: "Let's run MatInspector over the promoter region of my gene."
Too many matches… A typical question: Where do I get my input promoter DNA sequence from? A typical approach: "Let's extract it from NCBI: 3 kb upstream of the TSS, to be sure to have the promoter…"
Too many matches… A typical result: Which of those matches are relevant? How do I get rid of all those "false positives"?
TF binding sites… Important facts to consider: There is not a single false positive match. MatInspector gives you all physical TF binding sites. A physical TFBS is found every 10 to 15 bp throughout the genome. A single isolated TF binding site carries no function. TFs work through complexes, which are represented at the sequence level by sets of TF binding sites in a defined distance relationship and orientation -> promoter frameworks.
TF binding sites… Okay, so what is a physical TF binding site? And what is a functional TF binding site?
False positives? A physical binding site is invariable. A physical binding site is a fixed part of the genome = weight matrix / IUPAC string. Physical binding sites can be detected by MatInspector. This DNA sequence usually can bind to its cognate protein(s). Physical binding sites have no function in transcription on their own.
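To make "physical binding site = IUPAC string" concrete, here is a minimal sketch that scans both strands of a sequence for an IUPAC consensus. It is not MatInspector's algorithm (a real tool would score against weight matrices rather than demand exact consensus matches); the function names and the example consensus `TATAWAW` are assumptions chosen purely for illustration.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def iupac_to_regex(consensus):
    # A lookahead is used so that overlapping matches are all reported.
    body = "".join(IUPAC[c] for c in consensus.upper())
    return re.compile("(?=(" + body + "))")


def scan(sequence, consensus):
    """Return sorted (start, strand, matched_text) tuples.

    Starts are 0-based positions on the + strand; a "-" hit means the
    reverse complement of the consensus matched at that position.
    """
    sequence = sequence.upper()
    hits = set()
    for strand, pattern in (("+", consensus), ("-", reverse_complement(consensus))):
        for m in iupac_to_regex(pattern).finditer(sequence):
            hits.add((m.start(), strand, m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    print(scan("GGTATAAATGG", "TATAWAW"))
```

Every hit such a scan reports is a physical match in the sense used above: the sequence is there, whether or not it has any function in a given context.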
Physical vs functional TFBS: A functional binding site depends on context! A functional binding site requires a cellular context: one binding site, five cell types... but binding proteins are present only in 2 cell types! -> no functional binding site in the other 3 cell types! A functional binding site requires a genomic context (module): even when binding proteins are present... biological function may require additional binding sites! Transcriptional function is defined by the cellular and genomic context.
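The cellular-context argument can be written down as a tiny filter. This is only a sketch under assumed inputs: the cell type names, factor names and the `expressed_factors` mapping are hypothetical, and passing the filter is a necessary condition, not proof of function, because the genomic context (additional sites) still has to fit.

```python
def candidate_contexts(site_factor, expressed_factors):
    """Return the cell types in which the site's binding factor is present.

    `expressed_factors` maps a cell type name to the set of transcription
    factors expressed in it (hypothetical input data).
    """
    return sorted(
        cell for cell, factors in expressed_factors.items()
        if site_factor in factors
    )


if __name__ == "__main__":
    # One binding site, five cell types, binding protein present in two.
    expressed = {
        "cell1": {"TF_X", "TF_Y"},
        "cell2": {"TF_Y"},
        "cell3": {"TF_X"},
        "cell4": set(),
        "cell5": {"TF_Z"},
    }
    print(candidate_contexts("TF_X", expressed))
```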
Transcriptional modules: A transcriptional module is the smallest functional unit. A transcriptional module consists of two or more TFBSs. Strand orientation, relative order and distance of TFBSs are important. A module also has a strand orientation and can shift within a promoter. Transcriptional modules are present in promoters and enhancers. Transcriptional modules integrate signals via the interacting TFs. The core promoter is just another module. [Diagram: TATA box, INR box; module elements F1 +, F2 -, F3 +/-]
Why does nature use modules? No common organization? Common modules! [Diagram: elements A, B, C shown in three arrangements]
Transcriptional modules: Promoter modules can work in three different ways: synergistic, antagonistic, synergistic. Module types shown: "Short range module" (distance ≤ 50 bp), "Composite elements", "Short range module" (distance ≤ 50 bp), "Looping module" (distance up to 300 bp). Binding affinity: High / Low is possible; High / Low is possible; High / Low is possible; High / High only.
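The idea that a module is defined by strand orientation, relative order and distance can be sketched as a search over individual site hits. Everything named here (`Site`, `find_modules`, the family labels) is hypothetical, and the default of 50 bp simply reuses the short-range distance mentioned above; this is not how any particular tool defines its modules.

```python
from typing import NamedTuple


class Site(NamedTuple):
    family: str   # TF family label, e.g. "F1" (hypothetical)
    start: int    # 0-based start position in the promoter
    end: int      # end position, exclusive
    strand: str   # "+" or "-"


def find_modules(sites, first, second, max_distance=50):
    """Pair every `first` site with each downstream `second` site.

    `first` and `second` are (family, strand) tuples. The distance is
    measured from the end of the first site to the start of the second,
    so overlapping sites (negative gap) are not reported.
    """
    upstream = [s for s in sites if (s.family, s.strand) == first]
    downstream = [s for s in sites if (s.family, s.strand) == second]
    modules = []
    for a in upstream:
        for b in downstream:
            gap = b.start - a.end
            if 0 <= gap <= max_distance:
                modules.append((a, b, gap))
    return modules


if __name__ == "__main__":
    hits = [
        Site("F1", 10, 20, "+"),
        Site("F2", 35, 45, "-"),
        Site("F2", 200, 210, "-"),
        Site("F2", 30, 40, "+"),   # wrong strand, ignored
    ]
    for a, b, gap in find_modules(hits, ("F1", "+"), ("F2", "-")):
        print(a.family, b.family, gap)
```

The sketch only looks for the module in one orientation. A module sitting on the opposite strand would show up with the element order reversed and both strands flipped, which would need a second pass.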
Transcriptional modules: Transcriptional modules define target genes of pathways. NFkappaB regulates a number of "target genes". NFkappaB is involved in regulation of target genes of several pathways. Modules are the basic elements of regulatory pathways and networks. [Diagram: NFkappaB target genes ICAM-1, IL-2, IL-8, IRF-1, IL-6, ELAM-1, SAA-1, SAA-2, IFN-ß, G-CSF, IP-10, HLA-A, HLA-B, E-Selectin, IL-1, grouped by modules built from NFkB, C/EBP, CREB and IRF-1 sites; some genes are induced by 2 pathways!]
Transcriptional modules: Transcription factor binding sites and the key-lock principle. [Diagram: TF binding sites in the distal promoter/enhancer, brought close by "DNA looping"; TF binding sites in the proximal promoter with a bound protein complex; core promoter with TATA and INR elements, TBP, TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH and RNA polymerase II]
Transcriptional modules: Transcription regulation mechanism. Transcription regulation implies a regulatory network. [Diagram: promoter, exon, primary transcript and protein complex, linking Gene A (transcript n), Gene B (transcript p) and Gene C (transcript m)]
Transcriptional modules: Context dependent expression by different protein complexes. Same lock, different keys: same gene, different biological context. [Diagram: the same core promoter (TATA, INR) with TBP, TFIIA, TFIIB, TFIID, TFIIE, TFIIF and TFIIH, shown twice]
Transcriptional modules: Context specific transcription regulation. Example: analysis of the RANTES promoter in different cell lines. There is experimentally verified evidence that TFBSs from modules which are crucial for regulation in one biological context (cell type) are totally irrelevant in another! Fessele, S., Maier, H., Zischek, C., Nelson, P.J., Werner, T. (2002) "Regulatory context is a crucial part of gene function" Trends in Genetics 18, 60-63 (MEDLINE 1181130)
Transcriptional modules: Modules contribute strongly to functional promoter analysis. Modules are usually linked to at least one known biological function. A module match in a promoter makes this gene a good candidate. A module match in a promoter does not prove the gene to be a target. Additional independent evidence is required to prove the target. A module match immediately suggests experimental verification. Module matches reduce experimental efforts by orders of magnitude.
Promoter sequences: Very interesting, but how does all this help me with my original question? The question still is: What are the potential TF sites involved in regulation of my gene of interest?
Promoter sequences: More things to consider before asking that question! There was another one: Where do I get my input promoter DNA sequence from? "Let's extract it from NCBI: 3 kb upstream of the TSS, to be sure to have the promoter…"
Promoter sequences: More things to consider… 3 kb is too large for meaningful analysis. Even going 10 kb upstream of the TSS is no guarantee of having the relevant promoter sequence. Multiple promoters are the rule, not an exception. The non-coding first exon is always part of the promoter. Huh? What does this mean? Where do I get this damn promoter now?
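If you do cut a promoter out of a genomic sequence yourself, the window has to be anchored on the TSS of the right transcript, respect the strand, and reach into the first exon. The sketch below shows only the coordinate arithmetic. The function names are hypothetical and the lengths (500 bp upstream, 100 bp downstream) are illustrative assumptions, not the defaults of any tool.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def promoter_window(tss, strand, upstream=500, downstream=100):
    """Return a 0-based, half-open (start, end) window around a TSS.

    `tss` is the 0-based position of the first transcribed base. The
    `downstream` part covers the start of the first exon.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand "upstream" means higher coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 0), end


def extract_promoter(chromosome, tss, strand, upstream=500, downstream=100):
    """Slice the window and return it in transcript orientation (5'->3')."""
    start, end = promoter_window(tss, strand, upstream, downstream)
    seq = chromosome[start:end].upper()
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


if __name__ == "__main__":
    toy = "AAAACCCCGGGGTTTT"
    print(extract_promoter(toy, tss=8, strand="+", upstream=4, downstream=2))
    print(extract_promoter(toy, tss=7, strand="-", upstream=4, downstream=2))
```

With alternative promoters, this would be run once per annotated TSS rather than once per gene.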
Alternative transcripts/promoters: Which promoter? One gene = one promoter? Genes usually have alternative transcripts with alternative promoters. [Diagram: three alternative transcript structures, each labelled "Gene A?"]
Alternative transcripts/promoters: Context dependent expression via different promoters. Example: Glucokinase. [Diagram: pancreatic promoter, hepatic promoter, coding exons]
Alternative transcripts/promoters: Comparative genomic map of Glucokinase (GCK). [Figure: Promoter set 1, Promoter set 2; pancreatic promoter, hepatic promoter. Data from ElDorado]
Alternative transcripts/promoters: Important facts to consider: Alternative promoter usage is often tied to regulation of tissue specific gene expression. Alternative promoter usage is of very high biological relevance. There are several examples where aberrant regulation of the identical primary transcript leads to severe biological effects.
Alternative transcripts/promoters: Aromatase: a switch in promoter usage is associated with disease (normal breast vs. breast cancer). The gene product is absolutely identical. The only difference is in the alternative promoter usage. At the transcript level this can be seen only in the non-coding first exon. [Diagram: aromatase gene with alternative first exons 1.1, 1.4, 1.f, 1.6, 1.3, coding exons II to X and two AATAAA polyadenylation signals]
The aim of in silico promoter analysis - summary. Promoter analysis: 1. Identification of the promoter sequence 2. Prediction of physical transcription factor binding sites 3. Functional context 4. Context dependent functional transcription factor binding sites [Diagram: context 1, context 2, context 3, ..., context n]
ElDorado promoter sequence retrieval: Yes! I know all of this! I just wanted to know where I can get my promoter sequence(s) easily! If you don't have one already, sign up for a free evaluation account first... then log in here: www.genomatix.de
ElDorado promoter sequence retrieval: Choose the organism. Either enter the locus ID or the gene name here… or an accession number… or an exact contig position… or choose a sequence file from your directory… or upload a file from your local disk… or copy and paste a raw sequence here. It can be cDNA or whatever you have. It will be mapped exactly to the genomes within seconds.
ElDorado promoter sequence retrieval: Input in this section delivers results based on gene name or keyword search. Over a million names, synonyms and gene IDs help to find what you want, fast! (HMGCS1, for example.) IMPORTANT! Affymetrix probe-set-ID input: Our annotation is NOT based on the Affymetrix NetAffx assignment! It is instead based on genomic mapping of each single probe. A transcript will be retrieved if at least one probe of the set (usually 11 probes) matches. For mixed probe sets (cross-hybridisation), all relevant transcripts will be retrieved, which might lead to a result with transcripts from different loci. Input in this section delivers results based on ultra fast sequence mapping. Copy and paste raw sequence data here (min. 15 nucleotides) or enter an accession number. In contrast to the entry of an accession number above, here the sequence is actually retrieved from the database and mapped onto the genome(s). NOTE: many EST based accession numbers have poor sequence homology and deliver no result.
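The probe-set rule stated above ("at least one probe of the set matches") amounts to a union over the single-probe mappings. The sketch below illustrates that rule only; the data structures, probe IDs and transcript IDs are hypothetical and say nothing about how ElDorado stores or computes its mappings.

```python
def transcripts_for_probe_set(probe_hits):
    """Union of all transcripts hit by at least one probe of the set.

    `probe_hits` maps a probe id to the set of transcript ids that the
    probe maps to on the genome (an empty set if it maps nowhere).
    """
    retrieved = set()
    for transcripts in probe_hits.values():
        retrieved |= set(transcripts)
    return retrieved


def loci_for_transcripts(transcripts, locus_of):
    """Return the set of loci that the retrieved transcripts belong to."""
    return {locus_of[t] for t in transcripts}


if __name__ == "__main__":
    probe_hits = {"probe1": {"T1"}, "probe2": set(), "probe3": {"T1", "T9"}}
    locus_of = {"T1": "locusA", "T9": "locusB"}
    found = transcripts_for_probe_set(probe_hits)
    loci = loci_for_transcripts(found, locus_of)
    # More than one locus indicates a mixed (cross-hybridising) probe set.
    print(sorted(found), sorted(loci), len(loci) > 1)
```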
ElDorado promoter sequence retrieval: … licensed customers can add their own sequence data … here you can choose which chip's probes to see...
ElDorado promoter sequence retrieval: This gives you an interactive graphical representation of the genomic context of your gene.
ElDorado promoter sequence retrieval: Mapping positions of Affymetrix single probes! Switch the display of components on and off. Scale/slide the retrieved genomic "window". Select regions of the graphics and save them to a file. Orange indicates your input, in this case a gene name. It is very informative when your query is based on sequence data: then you see the mapping positions. Everything is clickable, so just play around! Here you can scale the view.
ElDorado promoter sequence retrieval: Now we have zoomed into the promoter region. Clicking on this transcriptional start region (TSR)... displays this hyperlink to...
ElDorado promoter sequence retrieval: ...this profile of the different experimentally verified TSS (CAGE tags) in the different tissue types.
ElDorado promoter sequence retrieval: This is a table-like representation of all annotated elements. It is especially useful for quick and easy retrieval of the DNA sequence(s) of interest.
ElDorado promoter sequence retrieval: Tick/un-tick the boxes of what you would like to see, and then...
ElDorado promoter sequence retrieval: This, for instance... tells you that this SNP deletes three potential TF binding sites and creates a new one. A potentially regulatory SNP...
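What "a SNP deletes sites and creates a new one" means can be shown by comparing predicted sites on the reference and the variant sequence. This is a deliberately simplified sketch: it uses exact motif strings on the + strand instead of weight matrices, and the motif names and sequences are invented for the example.

```python
def find_sites(sequence, motifs):
    """Return {(name, start)} for every exact motif occurrence (+ strand only)."""
    sequence = sequence.upper()
    hits = set()
    for name, motif in motifs.items():
        start = sequence.find(motif)
        while start != -1:
            hits.add((name, start))
            start = sequence.find(motif, start + 1)
    return hits


def snp_effect(reference, position, alt_base, motifs):
    """Compare predicted sites before and after a single-base substitution.

    `position` is the 0-based index of the substituted base.
    """
    variant = reference[:position] + alt_base + reference[position + 1:]
    before = find_sites(reference, motifs)
    after = find_sites(variant, motifs)
    return {"lost": sorted(before - after), "gained": sorted(after - before)}


if __name__ == "__main__":
    motifs = {"M1": "TATAAA", "M2": "TAGAAA"}  # hypothetical motifs
    print(snp_effect("GGTATAAAGG", 4, "G", motifs))
```

Sites that do not overlap the substituted base appear in both sets and drop out of the comparison, so only the affected sites are reported.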
ElDorado promoter sequence retrieval: From here you can directly run a MatInspector analysis for this promoter... Again, play around with the interactive graphics... Click the symbols and jump right into MatBase, the TF knowledge base.
ElDorado promoter sequence retrieval: Now, finally, the first way to extract a promoter sequence... and/or any other element displayed in the list below. Choose your desired length. Unless you have good reason to change the length of the proximal promoter, leave the defaults!
ElDorado promoter sequence retrieval: This shows you all annotated alternative transcripts, plus all single-probe mappings of the Affymetrix probe sets, plus another way to extract your promoter sequence(s).
ElDorado promoter sequence retrieval: You know this already... Three different known transcripts for this locus... and four distinct promoters! How this comes about, I'll tell you in a minute.
ElDorado promoter sequence retrieval: Tick the promoter of interest... choose the format... and extract the sequence. Or submit the promoter directly to MatInspector for graphical analysis. It works on a single sequence, too. Or submit sequences directly to one of those tasks, but they make sense only with multiple sequences. More on that later!
ElDorado promoter sequence retrieval: But why do I have four promoters here? And two don't even have a transcript assigned, as it says here! And what's all that CompGen thing about? It is the multiple promoter thing I showed you before. Remember the GCK example, liver and pancreas? Now to the CompGen promoters: they are derived by a proprietary comparative genomics approach.
ElDorado promoter sequence retrieval: For our example we have a homologous locus assigned in chimp, macaque, human, rat, dog, cow, opossum, chicken, and zebrafish. The tick-boxes you know already... We need them for later promoter retrieval. Note the Promoter Set number! Exhaustive cross-mapping of all transcripts to all genomes of all organisms in ElDorado generates our homology groups.
ElDorado promoter sequence retrieval: Get a feeling for the degree of phylogenetic conservation of the respective promoter. See how much experimental evidence supports this promoter.
ElDorado promoter sequence retrieval: A Promoter Set represents phylogenetically conserved promoters. You should be familiar with this view by now. Here the orange indicates a promoter belonging to a promoter set. With these tick-boxes you can switch the display of the different Promoter Sets on and off.
ElDorado promoter sequence retrieval: Don't waste my time here! How do I get my promoter sequence now? And which one of all those promoters should I take? Well, which one? If you do not have any other information (experimental or from the literature), I would recommend that you consider all available alternative promoters for further analysis.
ElDorado promoter sequence retrieval: Don't waste my time here! How do I get my promoter sequence now? And which one of all those promoters should I take? I showed you two easy ways of retrieving a promoter sequence with two mouse clicks a few minutes ago. There are more... oh... you cannot access these options?