CS 598SS Lecture 2 • Saurabh Sinha
Transcription • Process of making a single-stranded mRNA using double-stranded DNA as a template • Only genes are transcribed, not all DNA
Transcriptional regulation in action • Segmentation of fruitfly embryo • [Figure labels: Cavity with single cell; Early embryo; Adult fruitfly] Source: From DNA to Diversity, Carroll et al.
Some gene products are asymmetrically deposited by the mother • Target genes are expressed in “gapped” domains • Further refinement into a striped pattern
Asymmetry of gap genes • [Figure, built up over four slides: Gene G with Transcription Factor R (ACTIVATOR), then Transcription Factor R1 (REPRESSOR), then Transcription Factor R2 (REPRESSOR); region labels read “Gene off here”, “Gene on here”, “Gene off here”; in the final build a “Module” is marked next to Gene G]
Another example: eve stripe 2 module • [Figure: the module next to the gene, with binding sites labeled for repressors (Kr, Gt) and activators (bcd, Hb)] SOURCE: http://www.nyu.edu/fas/dept/biology/faculty/small/smallfig7_big.html
“Module” • A “module” has a cluster of binding sites that mediate the action of several transcription factors, to control a target gene’s expression • Modules are typically 200-1000 bp long • One or many occurrences of binding sites for transcription factors • Typically, 3-6 transcription factors are involved in regulating a module
Why “module”? (From Steve Small, NYU) • Expression pattern of even-skipped (eve) gene, with eve stripe 2 marked • Eve gene on Chromosome 2R • Regulatory sequence associated with eve stripe 2 • Reporter gene placed next to the regulatory sequence associated with eve stripe 2 • Reporter gene shows the same pattern!
Binding sites • Binding sites of transcription factor “Bicoid”, collected experimentally
Motif • T A A T C C C • Source: http://webdisk.berkeley.edu/~dap5/data_04/motifs/bicoid.gif
Motif • “Consensus string”: W A A T C C N • W = T or A; N = A, C, G, or T • Source: http://webdisk.berkeley.edu/~dap5/data_04/motifs/bicoid.gif
Motif • Common sequence “pattern” in the binding sites of a transcription factor • A succinct way of capturing variability among the binding sites
Alternative way to represent a motif • Position weight matrix (PWM), or simply “weight matrix”
Motif representation • Consensus string • May allow “degenerate” symbols in string, e.g., N = A/C/G/T; W = A/T; S = C/G; R = A/G; Y = T/C etc. • Position weight matrix • More powerful representation • Probabilistic treatment
The motif finding problem • Suppose a transcription factor (TF) controls five different genes • Each of the five genes should have binding sites for TF in its promoter region • [Figure: Gene 1 through Gene 5, each with binding sites for TF marked]
The motif finding problem • Now suppose we are given the promoter regions of the five genes G1, G2, … G5 • Can we find the binding sites of TF, without knowing about them a priori ? • Binding sites are similar to each other, but not necessarily identical • This is the motif finding problem • To find a motif that represents binding sites of an unknown TF
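One naive way to make the problem concrete, not necessarily the formulation used later in the course: try every candidate string of a fixed width and keep the one whose best match in each promoter needs the fewest mismatches in total. The sketch below assumes the motif width is known in advance; the function names and the five toy promoters are made up for illustration.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_match_distance(pattern, sequence):
    """Fewest mismatches between pattern and any same-length window of sequence."""
    width = len(pattern)
    return min(hamming(pattern, sequence[i:i + width])
               for i in range(len(sequence) - width + 1))

def find_consensus_motif(sequences, width):
    """Exhaustive search over all 4**width strings (only feasible for small width)."""
    best_pattern, best_score = None, None
    for letters in product("ACGT", repeat=width):
        pattern = "".join(letters)
        score = sum(best_match_distance(pattern, s) for s in sequences)
        if best_score is None or score < best_score:
            best_pattern, best_score = pattern, score
    return best_pattern, best_score

# Toy promoters (hypothetical), each carrying a site similar to TAATCC.
promoters = [
    "GCGTAATCCGAC",
    "ACTAATCCTGGA",
    "GGCAAATCCAGT",
    "CATAATCGGTCA",
    "TTGACTAATCCA",
]
motif, total_mismatches = find_consensus_motif(promoters, width=6)
```

The search cost grows as 4^width, which is why this is only a sketch and not a practical motif finder.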
A variant of motif finding • Given a motif (e.g., consensus string, or weight matrix), find the binding sites • For consensus string, problem is trivial • For weight matrix, not so trivial
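To see why the consensus-string case is simple, here is a minimal sketch: each degenerate symbol becomes a character class and the promoter is scanned with a regular expression. The symbol table covers only the symbols named in these slides; the function names and the promoter fragment are hypothetical.

```python
import re

# Degenerate symbols mentioned in the slides, as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "N": "[ACGT]",
}

def consensus_to_regex(consensus):
    """Translate a consensus string such as 'WAATCCN' into a regex."""
    return "".join(IUPAC[symbol] for symbol in consensus)

def find_consensus_sites(sequence, consensus):
    """Return 0-based start positions where the consensus matches."""
    # The lookahead makes matches zero-width, so overlapping sites are reported.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence)]

# Hypothetical promoter fragment, made up for illustration.
promoter = "GGTAATCCCATGAAATCCGTT"
sites = find_consensus_sites(promoter, "WAATCCN")  # [2, 12]
```

This scans one strand only; a fuller search would also scan the reverse complement.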
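Binding sites from a weight matrix motif • [Figure: two tables for motif W, the counts of each base in each column and the probability of each base in each column] • $W_k(b)$ = probability of base $b$ in column $k$ • Given a string $s = s_1 s_2 \ldots s_l$ of length $l$ • $\Pr(s \mid W) = \prod_{k=1}^{l} W_k(s_k)$ • Example ($l = 8$): $\Pr(\mathrm{CTAATCCG} \mid W) = 0.66 \times 0.88 \times 0.99 \times 0.99 \times 0.88 \times 0.99 \times 0.88 \times 0.11$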
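The calculation can be sketched in a few lines: count each base in each column of a set of aligned sites, turn the counts into probabilities, and multiply one probability per position to get Pr(s | W). The aligned sites below are made up (they are not the Bicoid data), and the pseudocount of 1 per base is an assumption added so that unseen bases do not get probability zero.

```python
BASES = "ACGT"

def build_weight_matrix(sites, pseudocount=1.0):
    """Column-wise base probabilities from equal-length aligned sites."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for k in range(width):
        column = [site[k] for site in sites]
        matrix.append({b: (column.count(b) + pseudocount) / total
                       for b in BASES})
    return matrix

def site_probability(s, matrix):
    """Pr(s | W): product over columns k of the probability of base s_k."""
    if len(s) != len(matrix):
        raise ValueError("string length must equal motif width")
    prob = 1.0
    for k, base in enumerate(s):
        prob *= matrix[k][base]
    return prob

# Toy aligned sites, hypothetical.
toy_sites = ["TAATCCC", "TAATCCG", "AAATCCC", "TAATCCA"]
W = build_weight_matrix(toy_sites)
p_good = site_probability("TAATCCC", W)  # close to the motif
p_bad = site_probability("GGGGGGG", W)   # far from the motif
```

For long motifs the product becomes very small, so implementations often sum log-probabilities instead; that is a design choice, not something stated on the slide.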
Binding sites from a weight matrix motif • Given promoter sequence S (e.g., 1000 base pairs long) • For each substring s of S whose length equals the motif width: compute Pr(s | W); if Pr(s | W) > some threshold, call s a binding site • Look at S as well as its “reverse complement” • The reverse complement of AGTTACACCA is TGGTGTAACT (that is what is on the other strand of DNA)
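A minimal sketch of this scan follows. The width-4 weight matrix, the threshold of 0.5, and the promoter string are all hypothetical values chosen for illustration; hits on the reverse strand are reported using the forward-strand start coordinate.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the other DNA strand, read in its own 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]

def site_probability(s, matrix):
    """Pr(s | W): product of per-column base probabilities."""
    prob = 1.0
    for k, base in enumerate(s):
        prob *= matrix[k][base]
    return prob

def scan(sequence, matrix, threshold):
    """Return (forward_start, strand, window, probability) for each hit."""
    width = len(matrix)
    hits = []
    strands = (("+", sequence), ("-", reverse_complement(sequence)))
    for strand, seq in strands:
        for i in range(len(seq) - width + 1):
            window = seq[i:i + width]
            p = site_probability(window, matrix)
            if p > threshold:
                # Map reverse-strand offsets back to forward coordinates.
                start = i if strand == "+" else len(seq) - i - width
                hits.append((start, strand, window, p))
    return hits

def column(preferred):
    """Hypothetical column: 0.85 for the preferred base, 0.05 for the rest."""
    return {b: (0.85 if b == preferred else 0.05) for b in "ACGT"}

W = [column(b) for b in "TAAT"]  # toy matrix, not a real motif
hits = scan("GGTAATCCATTAGG", W, threshold=0.5)
```

Here the forward strand has TAAT at position 2, and ATTA at position 8 reads as TAAT on the other strand, so both are reported.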
Regulatory network • [Figure: genetic regulatory network controlling the development of the body plan of the sea urchin embryo] Source: Davidson et al., Science, 295(5560):1669-1678.
Regulatory network • The computational problem is to unravel the entire regulatory network, using: • Sequence data • Other forms of data (e.g., information about which genes are on and which genes are off under different conditions)
DNA Replication http://www.ncc.gmu.edu/dna/repanim.htm
“Slippage” in replication SOURCE: http://www.virtuallaboratory.net/Biofundamentals/lectureNotes/Topic3-8_repair.htm
Tandem repeats • Short string repeated • Almost identical strings next to each other • Result of slippage during replication • Example sequence (the tandem repeat is marked on the original slide): ACGCGGACGGTGAACGCTGTATACTA
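As a rough illustration of what "strings next to each other" means computationally, the sketch below reports runs of exact adjacent copies of a short unit. It is a simplification: the slide speaks of almost identical copies, while this version tolerates no mismatches. The function name, the unit-length limits, and the test sequence are assumptions made for the example.

```python
def find_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=2):
    """Return (start, unit, copies) for exact tandem repeats, left to right."""
    repeats = []
    i = 0
    while i < len(seq):
        best = None
        for unit_len in range(min_unit, max_unit + 1):
            unit = seq[i:i + unit_len]
            if len(unit) < unit_len:
                break  # ran off the end of the sequence
            copies = 1
            # Count how many adjacent copies of the unit follow.
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            span = copies * unit_len
            if copies >= min_copies and (best is None or span > best[2] * len(best[1])):
                best = (i, unit, copies)
        if best:
            repeats.append(best)
            i += best[2] * len(best[1])  # skip past the reported repeat
        else:
            i += 1
    return repeats

# Made-up sequence with the unit AC repeated three times.
found = find_tandem_repeats("GGACACACTT")  # [(2, "AC", 3)]
```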
Tandem repeats • Very frequent in some genomes • Mechanism for evolution • Motif finding algorithms should consider the existence of tandem repeats • Why ?