Biology of transcription factors. Lecture 5, Dec 2012. Regulatory Genomics, Weizmann Institute. Prof. Yitzhak Pilpel
First home assignment. Read this paper: Proc Natl Acad Sci U S A. 2006 Oct 3;103(40):14724-31. Epub 2006 Sep 26.
Necessity, sufficiency, hierarchy, TF-TF interaction (Ho et al., Nature 2002). Deduced network properties. [Figure: axes "Correlation" and "Expression Coherence"; labels G1, G2, MCB (Mbp1), MSE (Ndt80), URS1 (Ume6), SCB (Swi4), MCM1', SFF' (Fkh1)]
FOXP2 TF: a human regulator involved in speech
• In humans, mutations of FOXP2 cause a severe speech and language disorder.
• Positive selection for variability in humans compared to other vertebrates.
A Bayesian approach (conditional probability)
• Xi can be "1" to denote, for example:
  • The presence of motif m
  • Its distance from the TSS is < N
  • It is on the coding strand
  • It neighbors another motif m'
• Or "0" otherwise
• ei = being expressed in pattern i
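The conditional probability on this slide can be illustrated with simple counting. The sketch below is only a toy: the gene list and the function name `conditional_probability` are hypothetical, and a single binary feature X stands in for the several Xi a real network would combine.

```python
from collections import Counter

# Hypothetical toy data: (X, pattern) pairs. X = 1 if the promoter satisfies
# the chosen condition (e.g. motif m is present), 0 otherwise.
genes = [(1, 4), (1, 4), (1, 4), (1, 2), (0, 4), (0, 2), (0, 2), (0, 2)]


def conditional_probability(data, pattern, x_value):
    """Estimate P(e = pattern | X = x_value) as a ratio of counts."""
    joint = Counter(data)
    n_x = sum(count for (x, _), count in joint.items() if x == x_value)
    if n_x == 0:
        return None  # the condition never occurs, so no estimate is possible
    return joint[(x_value, pattern)] / n_x


p_with = conditional_probability(genes, pattern=4, x_value=1)
p_without = conditional_probability(genes, pattern=4, x_value=0)
print(p_with, p_without)
```

In this toy set the feature raises the estimated probability of pattern 4, which is the kind of dependence the network is meant to capture.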
Example: two rRNA processing motifs
• The two motifs work together
• The two motifs' orientation matters
The procedure
• Given that P(N|D) = P(N) * P(D|N) / P(D), where N is a network and D is the data:
• Search the space of possible networks N for one that maximizes this probability
• It is impossible to enumerate all possible networks, so an optimization algorithm is needed
• Use cross-validation: partition the data into 5 gene sets, learn the rules from all but one set and test on the left-out set, each time.
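The two ingredients of the procedure, scoring a candidate network and 5-fold cross-validation, can be sketched as follows. This is an illustration under stated assumptions, not the method used in the paper: the function names, the priors and likelihoods of the three candidate networks, and the gene identifiers are all hypothetical, and the candidates are enumerated only because there are three of them.

```python
import random


def five_fold_splits(gene_ids, k=5, seed=0):
    """Yield (training, held_out) gene lists; each gene is held out once."""
    genes = list(gene_ids)
    random.Random(seed).shuffle(genes)
    folds = [genes[i::k] for i in range(k)]
    for i in range(k):
        training = [g for j, fold in enumerate(folds) if j != i for g in fold]
        yield training, folds[i]


def posterior_score(prior, likelihood):
    """Unnormalized posterior P(N) * P(D|N); P(D) is the same for every N."""
    return prior * likelihood


# Hypothetical candidates: name -> (P(N), P(D|N)).
candidates = {"N1": (0.5, 0.02), "N2": (0.3, 0.05), "N3": (0.2, 0.04)}
best = max(candidates, key=lambda name: posterior_score(*candidates[name]))

splits = list(five_fold_splits(range(20)))
print(best, [len(held_out) for _, held_out in splits])
```

Because P(D) does not depend on N, comparing P(N) * P(D|N) is enough to rank the candidates. A real search would replace the `max` over a fixed dictionary with an optimization algorithm.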
For example: what does it take to belong to expression pattern (4)?
• Need to have RRPE and PAC
• If PAC is not within 140 bp of the ATG, but RRPE is within 240 bp, then the probability of pattern 4 is 22%
• If PAC is within 140 bp and RRPE is within 240 bp, then the probability is 100%
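The rules for pattern 4 can be written as a small lookup function. This sketch encodes only the cases stated on the slide. The function name and arguments are hypothetical, "within" is assumed to mean a distance less than or equal to the cutoff, and returning 0.0 when a motif is missing is an assumed reading of "need to have".

```python
def pattern4_probability(has_rrpe, has_pac, rrpe_dist=None, pac_dist=None):
    """Return P(pattern 4) for the cases given on the slide, else None.

    Distances are in bp from the ATG.
    """
    if not (has_rrpe and has_pac):
        return 0.0  # assumption: both motifs are treated as strictly required
    rrpe_near = rrpe_dist <= 240  # assumption: "within" means <=
    pac_near = pac_dist <= 140
    if rrpe_near and pac_near:
        return 1.0
    if rrpe_near and not pac_near:
        return 0.22
    return None  # the slide gives no value when RRPE is beyond 240 bp


print(pattern4_probability(True, True, rrpe_dist=200, pac_dist=100))
print(pattern4_probability(True, True, rrpe_dist=200, pac_dist=300))
```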
Regulation of basal transcription in the promoter of IL-18 binding protein (Hurgin V, Novick D, Rubinstein M, PNAS 2002). [Figure: luciferase (Luc) reporter constructs pGL3(1272), pGL3(1272 mGAS), pGL3(1272 mIRF-E), pGL3(1272 mC/EBP-E1), pGL3(1272 mC/EBP-E2) and pGL3(122); axes: promoter position (-1500 to -1 bp) and basal expression (0 to 40)] For basal expression: (1 AND 2) AND ((3L AND 3R) OR (NOT 3L AND NOT 3R))
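The logical condition for basal expression can be checked by enumerating its truth table. In this sketch the element numbering (1, 2, 3L, 3R) follows the slide, while the function and variable names are hypothetical.

```python
from itertools import product


def basal_expression(e1, e2, e3l, e3r):
    """(1 AND 2) AND ((3L AND 3R) OR (NOT 3L AND NOT 3R))."""
    return (e1 and e2) and ((e3l and e3r) or (not e3l and not e3r))


# Enumerate all 16 combinations and keep those that give basal expression.
on_states = [s for s in product([False, True], repeat=4) if basal_expression(*s)]
for state in on_states:
    print(dict(zip(["1", "2", "3L", "3R"], state)))
```

Only two of the 16 combinations pass: elements 1 and 2 must both be intact, and 3L and 3R must be either both intact or both absent. The second clause is an XNOR gate.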
Inferring various logical conditions (“gates”) on motif combinations
The Bayesian network predicts expression profiles very accurately
S. cerevisiae; S. mikatae, S. kudriavzevii, S. bayanus; S. castellii; S. kluyveri. Their intergenic sequences spanned 40 to 67% identity.
Nucleotide conservation in promoters is highest close to the TSS [Figure panels: TATA-containing genes; all genes]
Expression coherence score, intuition [Figure: four example gene sets with EC1 = 0, EC2 = 0.66, EC3 = 0.2, EC4 = 0.2; threshold distance D]
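The intuition can be made concrete with a short sketch. It assumes the usual definition of the score: the fraction of gene pairs in a set whose expression profiles lie closer than the threshold distance D. The Euclidean distance, the function name and the toy profiles are assumptions for illustration and do not reproduce the values in the figure.

```python
from itertools import combinations
from math import dist


def expression_coherence(profiles, threshold):
    """Fraction of gene pairs whose profiles are closer than `threshold`."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    close = sum(1 for a, b in pairs if dist(a, b) < threshold)
    return close / len(pairs)


# Hypothetical two-condition expression profiles for three small gene sets.
tight = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9)]
scattered = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
mixed = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]

for name, gene_set in [("tight", tight), ("scattered", scattered), ("mixed", mixed)]:
    print(name, expression_coherence(gene_set, threshold=0.5))
```

A set of genes sharing a functional motif is expected to score high, while a random set of genes should score near zero.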
The data
• Examined intergenic regions of human, mouse, rat and dog
• ~18,000 genes
• "Promoters": 4 kb centered on the TSS
• 3' UTRs based on RNA annotations
• 64 Mb and 15 Mb in total for promoters and 3' UTRs, respectively
• Negative control: introns, ~120 Mb
• % of alignable sequence:
  • promoters: 51% (44% upstream and 58% downstream of the TSS)
  • 3' UTRs: 73%
  • introns: 34%
  • entire genome: 28%
The phylogenetic trees • Questions: • How would addition of species affect analyses? • What if the sequences were not only mammalian?
An example: a known binding site of Err-α in the GABPA promoter • Question: • What is the "meaning" of the other conserved positions?
Discovery of new motifs: exhaustive enumeration of all 6-mers
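Exhaustive enumeration is feasible because there are only 4^6 = 4096 possible 6-mers. The sketch below lists them all and counts their occurrences in a set of sequences. The two sequences and the function names are hypothetical, and the scoring of each 6-mer (for example by cross-species conservation) is left out.

```python
from collections import Counter
from itertools import product


def all_kmers(k=6, alphabet="ACGT"):
    """Return every possible k-mer over the alphabet (4**k strings for DNA)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def count_kmers(sequences, k=6):
    """Count overlapping occurrences of each k-mer across all sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


promoters = ["TTGACCTTGAAA", "CCTGACCTTGGG"]  # hypothetical sequences
counts = count_kmers(promoters)
kmers = all_kmers()
print(len(kmers), counts["TGACCT"], counts.most_common(3))
```

6-mers that never occur simply have a count of zero, so every candidate in `kmers` can be looked up in `counts` without special handling.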
Applying the same method to 3' UTRs reveals strand-specific motifs
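One simple way to probe strand specificity is to compare how often a motif occurs on the given strand with how often its reverse complement occurs. The sketch below uses hypothetical sequences, a hypothetical motif and hypothetical function names, and it is not necessarily the test used in the study.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_occurrences(seq, motif):
    """Count overlapping occurrences of `motif` in `seq`."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def strand_counts(sequences, motif):
    """Return (forward, reverse) counts; reverse uses the reverse complement."""
    rc = reverse_complement(motif)
    forward = sum(count_occurrences(s, motif) for s in sequences)
    reverse = sum(count_occurrences(s, rc) for s in sequences)
    return forward, reverse


utrs = ["AAGCACTTAA", "GGGCACTTCC", "TTAAGTGCTT"]  # hypothetical 3' UTRs
forward, reverse = strand_counts(utrs, "GCACTT")
print(forward, reverse)
```

A motif acting at the RNA level is expected on the transcribed strand only, so a strong excess of forward over reverse counts suggests strand specificity. A motif acting on double-stranded DNA should show no such bias.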
The most studied human TFs: a severe bias towards disease-related regulators
The most TF-regulated biological processes: most knowledge comes from model organisms
A few structural families account for most human TFs. From motif to TF fold? Structure-function relationship: homeodomain-containing TFs are often associated with developmental processes, and those in the interferon regulatory factor family are generally associated with triggering immune responses against viral infections.
TF expression across tissues: TFs are always expressed at lower levels than other genes in the same tissue. Why?
Most TFs are either tissue specific or very ubiquitous. What types of combinations between TFs do we expect here?
The phylogenetic profiles of human TFs. For example: 13% of the human TFs are primate-specific, while only 2% of our metabolic enzymes are primate-specific.
The chromosomal arrangement of the human TFs [Figure labels: high TF density; Hox TFs]