Molecular Information Theory Niru Chennagiri Probability and Statistics Fall 2004 Dr. Michael Partensky
Overview • Why do we study Molecular Info. Theory? • What are molecular machines? • Power of Logarithm • Components of a Communication System • Discrete Noiseless System • Channel Capacity • Molecular Machine Capacity
Motivation • A needle-in-a-haystack problem. • How will you go about looking for the needle? • How much energy do you need to spend? • How fast can you find the needle? • Haystack = DNA, Needle = Binding site, You = Ribosome
What is a Molecular Machine? • One or more molecules or a molecular complex: not a macroscopic reaction. • Performs a specific function. • Energized before the reaction. • Dissipates energy during the reaction. • Gains information. • An isothermal engine.
Where is the candy? • Is it in the left four boxes? • Is it in the bottom four boxes? • Is it in the front four boxes? You need the answers to three yes/no questions to find the candy. Box labels: 000, 001, 010, 011, 100, 101, 110, 111 (one bit per question). Need $\log_2 8 = 3$ bits of information.
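Each yes/no answer halves the set of boxes that could hold the candy, so three answers single out one of eight boxes. The minimal sketch below assumes the questions are "Is it in the first half of the remaining boxes?" (a hypothetical stand-in for the left/bottom/front questions on the slide); `find_candy` is an illustrative name.

```python
def find_candy(candy_box: int, n_boxes: int = 8) -> tuple[int, int]:
    """Locate the candy by halving the candidate boxes with yes/no questions.

    Returns (box found, number of questions asked)."""
    candidates = list(range(n_boxes))
    questions = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        questions += 1
        # Ask: "Is the candy in this half?"  One yes/no answer = one bit.
        if candy_box in half:
            candidates = half
        else:
            candidates = candidates[len(candidates) // 2 :]
    return candidates[0], questions

for box in range(8):
    found, asked = find_candy(box)
    print(f"{box:03b}: found {found:03b} after {asked} questions")
```

Every box is found after exactly three questions, matching $\log_2 8 = 3$ bits.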
More candies… • Box labels: 00, 01, 10, 11, 00, 01, 10, 11 • Candy in both boxes labeled 01. • Need only $\log_2 8 - \log_2 2 = 2$ bits of information. In general, $m$ boxes with $n$ candies need $\log_2 m - \log_2 n = \log_2(m/n)$ bits of information.
Ribosomes: 2600 binding sites among the 4.7 million base pairs of the E. coli genome. Need $\log_2(4.7 \times 10^6) - \log_2 2600 \approx 10.8$ bits of information.
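The general rule $\log_2 m - \log_2 n$ covers both candy puzzles and the ribosome estimate. A short sketch that checks the arithmetic, assuming all positions are equally likely a priori (`bits_needed` is a hypothetical helper name):

```python
import math

def bits_needed(m: float, n: float) -> float:
    """Bits needed to find one of n marked positions among m equally likely ones."""
    return math.log2(m) - math.log2(n)

print(bits_needed(8, 1))                   # one candy in 8 boxes: 3.0
print(bits_needed(8, 2))                   # candy in both boxes labeled 01: 2.0
print(round(bits_needed(4.7e6, 2600), 1))  # ribosome binding sites: 10.8
```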
Information Source • Represented by a stochastic process • Mathematically modeled as a Markov chain • We are interested in ergodic sources: every sufficiently long sequence has the same statistical properties as every other sequence.
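To make "ergodic" concrete, the sketch below runs the same two-state Markov source twice with different random seeds and compares symbol frequencies. The transition probabilities (0.1 and 0.3) are arbitrary illustrative values, and `markov_source` is a hypothetical name; for this chain the long-run fraction of 1s should approach $p_{01}/(p_{01}+p_{10}) = 0.25$ in both runs.

```python
import random

def markov_source(p01: float, p10: float, length: int, seed: int) -> list[int]:
    """Emit a binary sequence from a two-state Markov chain.

    p01 = P(next symbol is 1 | current is 0); p10 = P(next is 0 | current is 1)."""
    rng = random.Random(seed)
    state = 0
    seq = []
    for _ in range(length):
        seq.append(state)
        flip = p01 if state == 0 else p10
        if rng.random() < flip:
            state = 1 - state
    return seq

# Two independent long runs of the same ergodic source
a = markov_source(0.1, 0.3, 100_000, seed=1)
b = markov_source(0.1, 0.3, 100_000, seed=2)
# Both frequencies of 1 should be close to the stationary value 0.25
print(sum(a) / len(a), sum(b) / len(b))
```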
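How much information is produced? A measure of uncertainty $H(p_1, \dots, p_n)$ should be: • Continuous in the probabilities $p_i$. • When all events are equally likely, a monotonically increasing function of the number of events. • When a choice is broken down into two successive choices, total $H$ = weighted sum of the individual $H$ values. The only function satisfying all three is $H = -K \sum_i p_i \log p_i$; with $K = 1$ and base-2 logarithms, $H$ is measured in bits.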
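The third requirement (grouping) can be checked numerically with Shannon's own example: choosing among probabilities 1/2, 1/3, 1/6 directly gives the same $H$ as first choosing between two halves and then, half of the time, choosing 2/3 : 1/3. A minimal sketch with $K = 1$ and base-2 logs (`entropy` is an illustrative name):

```python
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy H = -sum p log2 p in bits; terms with p = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Direct choice among three outcomes
direct = entropy([1/2, 1/3, 1/6])
# Two successive choices: 1/2 vs 1/2, then (weight 1/2) 2/3 vs 1/3
two_step = entropy([1/2, 1/2]) + 1/2 * entropy([2/3, 1/3])
print(direct, two_step)  # equal up to floating-point rounding (about 1.459 bits)
```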
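Properties of Entropy • $H = 0$ if and only if all but one of the $p_i$ are zero (the outcome is certain). • $H$ is never negative. • $H$ is maximal, $H = \log_2 n$, when all $n$ events are equally probable. • If $x$ and $y$ are two events (random variables), $H(x,y) \le H(x) + H(y)$, with equality if and only if they are independent. • Conditional entropy: $H_x(y) \le H(y)$.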
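The inequalities above can be checked on a small example. The sketch uses a hypothetical joint distribution of two correlated binary variables and computes the conditional entropy through the identity $H_x(y) = H(x,y) - H(x)$:

```python
import math

def entropy(probs) -> float:
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y), x and y in {0, 1} (rows: x, columns: y)
joint = [[0.4, 0.1],
         [0.1, 0.4]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]

h_xy = entropy([p for row in joint for p in row])
h_x, h_y = entropy(px), entropy(py)
h_y_given_x = h_xy - h_x  # conditional entropy H_x(y)

print(h_xy <= h_x + h_y)   # subadditivity
print(h_y_given_x <= h_y)  # knowing x cannot increase uncertainty about y
print(entropy([0.25] * 4), entropy([0.7, 0.1, 0.1, 0.1]))  # uniform gives the maximum, 2 bits
```

Because $x$ and $y$ are correlated here, both inequalities are strict (about 1.72 vs 2 bits, and 0.72 vs 1 bit).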
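Why is entropy important? • Entropy is a measure of uncertainty. • Entropy relation from thermodynamics: $S = -k_B \sum_i p_i \ln p_i = (k_B \ln 2)\, H$, with $H$ in bits. • Also from thermodynamics: $dS = dQ/T$. • Hence for every bit of information gained, the machine must dissipate at least $k_B T \ln 2$ joules.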
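To get a feel for the size of $k_B T \ln 2$, the sketch below evaluates it at an assumed temperature of 310 K (about 37 °C, chosen only for illustration) and multiplies by the 10.8 bits estimated for the ribosome. These are lower bounds on dissipation, not measured energies.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def min_energy_per_bit(temperature_k: float) -> float:
    """Minimum heat dissipated per bit of information gained: k_B T ln 2 (joules)."""
    return K_B * temperature_k * math.log(2)

e_bit = min_energy_per_bit(310.0)  # assumed temperature, about 37 °C
print(f"{e_bit:.3e} J per bit")
print(f"{10.8 * e_bit:.3e} J for 10.8 bits")
```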
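Information curve Information gain at position $l$ of the aligned binding sites: $R(l) = 2 - H(l)$ bits, where $H(l) = -\sum_{b \in \{A,C,G,T\}} f(b,l) \log_2 f(b,l)$ and 2 bits is the uncertainty before binding (four equally likely bases). Plotting $R(l)$ across the positions gives the information curve, and summing it gives the total information. For E. coli ribosome binding sites, the total information is about 11 bits, close to the 10.8 bits the ribosome needs to find its sites.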
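A minimal sketch of computing an information curve from aligned sites. The alignment below is a made-up toy example, not real E. coli data, and the sketch omits the small-sample correction normally applied when few sites are available; `information_curve` is a hypothetical name.

```python
import math
from collections import Counter

def information_curve(sites: list[str]) -> list[float]:
    """R(l) = 2 - H(l) bits at each position l of equal-length aligned sites.

    H(l) = -sum_b f(b, l) log2 f(b, l) over the bases seen at position l.
    No small-sample correction is applied (simplifying assumption)."""
    n = len(sites)
    curve = []
    for column in zip(*sites):
        counts = Counter(column)
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        curve.append(2.0 - h)
    return curve

# Hypothetical toy alignment of four sites
toy_sites = ["AGGAGA", "AGGAGC", "AGGTGG", "AGGCGT"]
curve = information_curve(toy_sites)
print([round(r, 2) for r in curve], sum(curve))
```

Fully conserved positions contribute 2 bits each, while the last position (all four bases present once) contributes 0 bits.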
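Channel capacity Source transmitting 0 and 1 (equally likely) at 1000 symbols/sec; 1 in 100 symbols is received in error. What is the actual rate of transmission? Need to apply a correction: correction = uncertainty in the sent symbol $x$ for a given received symbol $y$, which is the conditional entropy (equivocation) $H_y(x) = -(0.99 \log_2 0.99 + 0.01 \log_2 0.01) \approx 0.081$ bits/symbol $\approx 81$ bits/sec. Actual rate $R = H(x) - H_y(x) \approx 1000 - 81 = 919$ bits/sec.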
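The correction can be computed directly. The sketch assumes, as above, equiprobable 0s and 1s (so $H(x) = 1$ bit per symbol) and a symmetric error probability of 0.01; `binary_entropy` is an illustrative helper name.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a two-outcome event with probabilities p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

symbols_per_sec = 1000
error_prob = 0.01

source_rate = symbols_per_sec * 1.0                          # H(x): 1 bit per equiprobable symbol
equivocation = symbols_per_sec * binary_entropy(error_prob)  # H_y(x) in bits/sec
rate = source_rate - equivocation                            # R = H(x) - H_y(x)

print(round(equivocation, 1), round(rate, 1))  # about 80.8 and 919.2 bits/sec
```

For this symmetric binary channel, equiprobable inputs give the largest rate, so about 919 bits/sec is also its capacity.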
Channel capacity contd. For a continuous channel with white Gaussian noise: $C = W \log_2(1 + P/N)$ bits/sec, where $P/N$ is the signal-to-noise ratio and $W$ is the bandwidth. Shannon's theorem: as long as the rate of transmission is below $C$, the number of errors can be made as small as needed by suitable coding.
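A small sketch evaluating $C = W \log_2(1 + P/N)$. The 3 kHz bandwidth and 30 dB signal-to-noise ratio are hypothetical example values; note that the formula takes $P/N$ as a power ratio, so decibels must be converted first.

```python
import math

def awgn_capacity(bandwidth_hz: float, snr: float) -> float:
    """Capacity C = W log2(1 + P/N) in bits/sec for a band-limited channel
    with white Gaussian noise; snr is the power ratio P/N (not in dB)."""
    return bandwidth_hz * math.log2(1 + snr)

snr = 10 ** (30 / 10)  # 30 dB corresponds to P/N = 1000
print(round(awgn_capacity(3000, snr)))  # roughly 29,900 bits/sec
```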
Molecular Machine Capacity • Lock and key mechanism. • Each pin on the ribosome is a simple harmonic oscillator in a thermal bath. • The velocity of each pin is represented by a point in a 2-D velocity space. • More pins → more dimensions. • The distribution of points is spherically symmetric.
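Machine capacity • In high dimensions, almost all points lie in a thin spherical shell. • The radius of the shell is the speed (length of the velocity vector), so it is proportional to the square root of the energy. • Before binding: radius $\propto \sqrt{P + N}$, where $N$ is the thermal noise energy and $P$ is the energy the machine will dissipate. • After binding: radius $\propto \sqrt{N}$ (thermal noise only).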
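The thin-shell claim can be checked by simulation. The sketch draws thermal velocity vectors with independent Gaussian components (unit variance, i.e. $k_B T / m = 1$, an arbitrary choice of units) and measures how much the speed varies relative to its mean. With two dimensions per pin, 1000 dimensions corresponds to 500 pins. `shell_spread` is a hypothetical name.

```python
import math
import random
import statistics

def shell_spread(dims: int, samples: int = 500, seed: int = 0) -> float:
    """Relative spread (std / mean) of the speed |v| of random thermal velocities.

    Each of the dims velocity components is an independent unit Gaussian."""
    rng = random.Random(seed)
    speeds = [math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dims)))
              for _ in range(samples)]
    return statistics.pstdev(speeds) / statistics.fmean(speeds)

for d in (2, 20, 200, 1000):
    print(d, round(shell_spread(d), 3))
```

The relative spread falls roughly as $1/\sqrt{2D}$, so for many pins the points crowd into a shell whose radius is set by the total energy.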
Number of choices = number of ‘after’ spheres that fit inside the ‘before’ sphere ≈ Vol(before sphere) / Vol(after sphere). Machine capacity = $\log_2$(number of choices).
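The sketch below computes the capacity as the log of a volume ratio of $D$-dimensional balls, under these assumptions: 2 velocity dimensions per pin ($D = 2d$ for $d$ pins), a "before" radius $\sqrt{P + N}$ and an "after" radius $\sqrt{N}$, with energies in units of $N$. Because ball volume scales as $r^D$, the constants cancel and the result reduces to $d \log_2(1 + P/N)$, the same form as Shannon's $C = W \log_2(1 + P/N)$ with the number of pins in place of the bandwidth. Function names are hypothetical.

```python
import math

def log2_ball_volume(dims: int, radius: float) -> float:
    """log2 of the volume pi^(D/2) r^D / Gamma(D/2 + 1) of a D-dimensional ball.

    Computed in log space (math.lgamma) so that large D does not overflow."""
    return ((dims / 2) * math.log(math.pi) + dims * math.log(radius)
            - math.lgamma(dims / 2 + 1)) / math.log(2)

def machine_capacity(pins: int, p_over_n: float) -> float:
    """log2(Vol(before sphere) / Vol(after sphere)) in bits per operation.

    Assumes 2 dimensions per pin, before-radius sqrt(P + N) and
    after-radius sqrt(N), with energies measured in units of N."""
    dims = 2 * pins
    before = log2_ball_volume(dims, math.sqrt(p_over_n + 1.0))
    after = log2_ball_volume(dims, 1.0)
    return before - after

# Compare the volume-ratio result with pins * log2(1 + P/N)
for pins in (1, 10, 1000):
    print(pins, machine_capacity(pins, 3.0), pins * math.log2(1 + 3.0))
```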