Lab Meeting, June 7, 2001, John Wrobel
HIV-1 Reverse Transcriptase: p66/p51 heterodimer
HIV-1 RT with DNA template (p66 and p51 subunits)
66 kDa subunit: fingers, palm, thumb, connection, and RNaseH subdomains
Catalytic site (aspartic acid triad): D110, D185, D186
HIV-1 Reverse Transcriptase: dNTP binding site, NNRTI binding site, RNaseH active site (image from Arnold Lab)
Fingers & palm subdomains in MMLV RT: active site, fingers, palm
HIV-1 RT subunits (primary sequence)
p66: fingers (1-85), palm (85-120), fingers (120-151), palm (151-243), thumb (243-323), connection (323-438), RNaseH (438-560)
p51: fingers, palm, fingers, palm, thumb, connection (no RNaseH domain)
Region of mutagenesis in the fingers & palm of HIV-1 RT: mutagenic cassette = P95 to E203
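To keep track of which subdomain a mutated position falls in, the p66 domain map can be stored as a small lookup table. This is a minimal sketch, assuming each boundary number on the domain-map slide is the last residue of the preceding segment (the slide does not say which side a boundary belongs to); `P66_SEGMENTS`, `subdomain` and `segments_spanned` are hypothetical names.

```python
# Boundaries transcribed from the p66 domain map; treating each listed number
# as the last residue of the preceding segment is an assumption.
P66_SEGMENTS = [
    (1, 85, "fingers"),
    (86, 120, "palm"),
    (121, 151, "fingers"),
    (152, 243, "palm"),
    (244, 323, "thumb"),
    (324, 438, "connection"),
    (439, 560, "RNaseH"),
]


def subdomain(residue):
    """Return the name of the p66 subdomain segment containing a residue number."""
    for first, last, name in P66_SEGMENTS:
        if first <= residue <= last:
            return name
    raise ValueError(f"residue {residue} is outside 1-560")


def segments_spanned(first, last):
    """Return, in sequence order, every segment overlapping residues first..last."""
    return [name for lo, hi, name in P66_SEGMENTS if lo <= last and hi >= first]


if __name__ == "__main__":
    for res in (110, 185, 186):  # catalytic aspartates
        print(res, subdomain(res))
    print(segments_spanned(95, 203))  # mutagenic cassette P95..E203
```

Under this boundary assumption the catalytic aspartates D110, D185 and D186 all fall in the palm, and the P95 to E203 cassette crosses palm, fingers, palm, consistent with the slide's "fingers & palm".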
Collection of HIV-1 RT single missense mutations (residues 95-203)
Wild-type sequence:
95 PHPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLP 150
151 QGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIE 203
[Figure: conservative and non-conservative substitutions collected at each position of the wild-type sequence]
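A minimal sketch for working with a mutation collection like this one: the wild-type residues P95 to E203 as transcribed from the slide, indexed by HIV-1 RT residue number, plus a helper that builds the conventional wild-type/position/mutant label (e.g. M184V) and rejects a "mutant" identical to wild type. `WT_SEQ`, `wild_type` and `mutation_label` are hypothetical names.

```python
WT_START = 95  # first residue of the mutagenic cassette
# Wild-type residues transcribed from the slide (P95..E203).
WT_SEQ = (
    "PHPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLP"  # 95-150
    "QGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIE"  # 151-203
)


def wild_type(position):
    """Return the wild-type one-letter residue at an HIV-1 RT residue number."""
    if not WT_START <= position < WT_START + len(WT_SEQ):
        raise ValueError(f"position {position} is outside the 95-203 cassette")
    return WT_SEQ[position - WT_START]


def mutation_label(position, mutant):
    """Build a label such as 'M184V'; a missense mutation must change the residue."""
    wt = wild_type(position)
    if mutant == wt:
        raise ValueError(f"{wt}{position}{mutant} is not a missense mutation")
    return f"{wt}{position}{mutant}"


if __name__ == "__main__":
    print(wild_type(110), wild_type(185), wild_type(186))
    print(mutation_label(184, "V"))
```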
Problems with contact definition
• How can a contact be defined unambiguously?
• How many contacts does a residue make?
• Analysis of 3D structures requires a 3D definition of contacts
• Delaunay tessellation provides a unique way to define FOUR nearest neighbors in 3D as the vertices of tetrahedra
Voronoi tessellation: Voronoi tessellation partitions space into convex polytopes called Voronoi polyhedra. In proteins, a Voronoi polyhedron is the region of space around an atom such that each point of this region is closer to that atom than to any other atom.
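A minimal sketch of this definition, assuming atoms are plain (x, y, z) tuples: a point belongs to the Voronoi polyhedron of whichever atom is nearest to it. `voronoi_owner` is a hypothetical name; a point exactly equidistant from two atoms lies on their shared face, and `min` then simply returns the lower index.

```python
import math


def voronoi_owner(point, atoms):
    """Index of the atom whose Voronoi polyhedron contains `point` (its nearest atom)."""
    return min(range(len(atoms)), key=lambda i: math.dist(point, atoms[i]))


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
    print(voronoi_owner((1.0, 0.5, 0.0), atoms))  # nearest to the first atom
    print(voronoi_owner((2.5, 0.2, 0.0), atoms))  # nearest to the second atom
```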
Delaunay simplex: a group of 4 atoms whose Voronoi polyhedra meet at a common vertex forms a Delaunay simplex. Delaunay tessellation of a protein structure generates an aggregate of space-filling, non-overlapping, irregular tetrahedra.
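The Voronoi-vertex definition can be tested directly: a vertex shared by four Voronoi polyhedra is a point equidistant from those four atoms with no other atom closer, i.e. the center of their circumscribed sphere with no other atom inside it (the empty-circumsphere property). The sketch below enumerates every 4-subset by brute force, assuming points in general position (no 4 coplanar, no 5 on one sphere). It is O(n^5) and only illustrative, not the program used for the dataset; real tessellations are normally computed with a dedicated library such as Qhull. All function names are hypothetical.

```python
from itertools import combinations


def _det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def circumsphere(p0, p1, p2, p3, eps=1e-12):
    """Center and squared radius of the sphere through four 3D points (None if coplanar)."""
    # |c - p|^2 = |c - p0|^2 for p in (p1, p2, p3)  =>  2 (p - p0) . c = |p|^2 - |p0|^2
    rows = [[2.0 * (p[k] - p0[k]) for k in range(3)] for p in (p1, p2, p3)]
    rhs = [sum(p[k] ** 2 - p0[k] ** 2 for k in range(3)) for p in (p1, p2, p3)]
    det = _det3(rows)
    if abs(det) < eps:
        return None
    center = []
    for col in range(3):  # Cramer's rule: swap one column for the right-hand side
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        center.append(_det3(m) / det)
    r2 = sum((center[k] - p0[k]) ** 2 for k in range(3))
    return tuple(center), r2


def delaunay_tetrahedra(points, eps=1e-9):
    """Brute force: keep each 4-subset whose circumsphere has no other point inside."""
    simplices = []
    for quad in combinations(range(len(points)), 4):
        sphere = circumsphere(*(points[i] for i in quad))
        if sphere is None:
            continue  # coplanar quadruple: not a tetrahedron
        center, r2 = sphere
        inside = any(
            sum((points[j][k] - center[k]) ** 2 for k in range(3)) < r2 - eps
            for j in range(len(points))
            if j not in quad
        )
        if not inside:
            simplices.append(quad)
    return simplices


if __name__ == "__main__":
    pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (10, 10, 10)]
    print(delaunay_tetrahedra(pts))
```

For these five points the tessellation consists of two tetrahedra sharing the face formed by points 1, 2 and 3.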
Delaunay simplices: triangles in 2D, tetrahedra in 3D
Voronoi/Delaunay tessellation in 2D: a Delaunay simplex is defined by points whose Voronoi polyhedra share a common vertex. A Delaunay simplex is always a triangle in 2D space and a tetrahedron in 3D space. [Figure panels: Voronoi Tessellation, Delaunay Tessellation]
Defining nearest neighbors: each amino acid residue is represented by a single point (its Cα atom). The vertices of each simplex objectively define 4 nearest Cα atoms and therefore 4 nearest-neighbor residues.
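Following this reasoning, every tetrahedron links each of its 4 residues to the other 3, giving 6 residue pairs per simplex. A minimal sketch, assuming simplices arrive as tuples of residue indices (for example from a tessellation of Cα coordinates), turns them into a contact list and a per-residue contact count, which answers "how many contacts does a residue make?" under this definition. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def contacts_from_simplices(simplices):
    """Each tetrahedron contributes its 6 edges as residue-residue contacts."""
    edges = set()
    for simplex in simplices:
        for a, b in combinations(sorted(simplex), 2):
            edges.add((a, b))  # stored once, smaller index first
    return edges


def contacts_per_residue(edges):
    """Number of distinct Delaunay neighbors of each residue."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


if __name__ == "__main__":
    edges = contacts_from_simplices([(0, 1, 2, 3), (1, 2, 3, 4)])
    print(sorted(edges))
    print(dict(contacts_per_residue(edges)))
```

Edges shared by neighboring tetrahedra are counted only once, so two tetrahedra sharing a face give 9 contacts rather than 12.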
Differences: Voronoi polyhedra may differ topologically (they may have different numbers of faces & edges), whereas Delaunay simplices are always topologically equivalent (tetrahedra in 3D space).
5 classes of Delaunay simplices
• {4}: all residues of the simplex are consecutive in the protein sequence
• {3,1}: 3 residues are consecutive; the 4th is distant in sequence
• {2,2}: two pairs of consecutive residues that are distant from each other in sequence
• {2,1,1}: 2 residues are consecutive, and the other 2 are distant from the first 2 and from each other
• {1,1,1,1}: all 4 residues are distant from each other
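The five classes can be assigned mechanically by sorting the four sequence positions and splitting them into runs of consecutive residues. This sketch assumes a single chain and distinct positions, and treats "consecutive" as a sequence separation of exactly 1; `simplex_class` is a hypothetical name.

```python
def simplex_class(positions):
    """Label a simplex by the run lengths of its consecutive residues, e.g. '{3,1}'."""
    ps = sorted(positions)
    runs = [1]
    for prev, cur in zip(ps, ps[1:]):
        if cur - prev == 1:
            runs[-1] += 1  # extends the current consecutive stretch
        else:
            runs.append(1)  # sequence gap: start a new stretch
    runs.sort(reverse=True)
    return "{" + ",".join(str(r) for r in runs) + "}"


if __name__ == "__main__":
    for quad in [(10, 11, 12, 13), (12, 10, 11, 50), (10, 11, 40, 41),
                 (10, 11, 30, 60), (5, 20, 40, 60)]:
        print(quad, simplex_class(quad))
```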
Database
• A dataset of unique protein structures was identified (Pro Sci 3, 522)
• This dataset contains 322 protein chains (66,852 amino acids) with high crystallographic resolution, no apparent structural similarity, and low sequence identity (25%)
• Tessellation of the dataset generates 387,880 simplices
Statistical analysis of Delaunay simplex composition
• Quadruplet composition types: $20^4$ = 160,000
• Geometrical description of tetrahedra: composition is independent of sequence order
• The theoretical number of quadruplets is therefore reduced to 8,855
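The reduction from 160,000 to 8,855 is a count of multisets: once order is ignored, a quadruplet is a choice of 4 residue types out of 20 with repetition allowed, $\binom{20+4-1}{4} = \binom{23}{4}$. A short check that computes the formula and confirms it by collapsing every ordered quadruplet to its sorted composition (variable names are illustrative):

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Ordered quadruplets: 20 choices at each of 4 positions.
ORDERED = len(AMINO_ACIDS) ** 4
# Order-independent compositions: multisets of size 4 drawn from 20 types.
UNORDERED = math.comb(len(AMINO_ACIDS) + 4 - 1, 4)
# Brute-force confirmation: sort each ordered quadruplet and count distinct results.
COMPOSITIONS = {tuple(sorted(q)) for q in product(AMINO_ACIDS, repeat=4)}

if __name__ == "__main__":
    print(ORDERED, UNORDERED, len(COMPOSITIONS))
```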
Compositional propensities of Delaunay simplices
Ratio = observed frequency / expected frequency
• Ratio ≈ 1: the residue combination is non-specific to folding
• Ratio > 1: some forces bring these residues together, indicating some specificity
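Equation: $q_{ijkl} = \log\left(f_{ijkl} / p_{ijkl}\right)$
where q = log-likelihood factor; i, j, k, l = amino acid residues; $q_{ijkl}$ = q for a given quadruplet (the likelihood of finding those 4 particular residues in one simplex); $f_{ijkl}$ = observed frequency of occurrence of the quadruplet; $p_{ijkl}$ = its expected frequency under a random model.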
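The slides do not show how the expected frequency is computed. One common choice for this kind of analysis, assumed here rather than taken from the talk, is a random-mixing model $p_{ijkl} = C\,a_i a_j a_k a_l$, where $a_i$ is the frequency of residue type $i$ in the dataset and $C = 4!/\prod_n t_n!$ counts the distinct orderings ($t_n$ = number of copies of type $n$ in the quadruplet). The sketch also assumes a base-10 logarithm; all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def permutation_factor(quad):
    """C = 4! / prod(t_n!), where t_n = copies of residue type n in the quadruplet."""
    c = math.factorial(len(quad))
    for t in Counter(quad).values():
        c //= math.factorial(t)
    return c


def expected_frequency(quad, residue_freq):
    """Assumed random-mixing model: p_ijkl = C * a_i * a_j * a_k * a_l."""
    p = float(permutation_factor(quad))
    for aa in quad:
        p *= residue_freq[aa]
    return p


def log_likelihood(observed, expected):
    """q_ijkl = log10(f_ijkl / p_ijkl); the base-10 log is an assumption."""
    return math.log10(observed / expected)


def total_expected(residue_freq):
    """Sum of p_ijkl over all compositions; 1 when the a_i sum to 1."""
    types = sorted(residue_freq)
    return sum(expected_frequency(q, residue_freq)
               for q in combinations_with_replacement(types, 4))


if __name__ == "__main__":
    uniform = {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}
    print(expected_frequency("CCCC", uniform), expected_frequency("CGLR", uniform))
    print(total_expected(uniform))
```

Summing the expected values over all 8,855 compositions returning 1 is a quick sanity check that the permutation factor is right; a positive q then marks a quadruplet seen more often than this model predicts.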
Log-likelihood of amino acid quadruplets with different compositions

Rank   Quadruplet   Log-likelihood ratio
1      CCCC          3.081003
2      CCCY          2.13004
3      CCHH          1.960814
4      CCCG          1.782267
5      CCCH          1.742759
6      CCCW          1.724275
7      CCCS          1.724275
8      CCCQ          1.657329
...
8343   CDDL         -0.90166
8344   IRRV         -0.90217
8345   AEYY         -0.90535
8346   KKRV         -0.95081
8347   CKRS         -0.96133
8348   CEKP         -0.98433
8349   HKKS         -0.98472
8350   CGLR         -1.14737
The plot reveals a highly non-random distribution: for some quadruplets, observed frequencies are orders of magnitude higher (or lower) than expected from a random model.
PROCAM: Protein Core Alignment Map (view with Netscape)
My Project
Goal: combine protein chemistry & protein evolution
• Evolution of retroviral RTs: are hydrophobic cores found in the same place? (PROCAM)
• Keeping track of aa residues among retroviral RTs: conversion file (Microsoft Access)
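A residue-number conversion table of the kind kept in the conversion file can be derived from a pairwise alignment: walk both aligned sequences column by column, advance each residue counter only on non-gap characters, and record a correspondence wherever both columns hold a residue. This is a minimal sketch, assuming gaps are written as "-"; `residue_map` and the toy sequences are hypothetical and are not the contents of the actual Access file.

```python
def residue_map(aligned_a, aligned_b, start_a=1, start_b=1, gap="-"):
    """Map residue numbers of sequence A onto sequence B from one pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    num_a, num_b = start_a, start_b
    for ca, cb in zip(aligned_a, aligned_b):
        if ca != gap and cb != gap:
            mapping[num_a] = num_b  # both sequences have a residue in this column
        if ca != gap:
            num_a += 1
        if cb != gap:
            num_b += 1
    return mapping


if __name__ == "__main__":
    # Toy alignment: sequence A numbered from 10, sequence B from 50.
    print(residue_map("AC-DE", "A-GDE", start_a=10, start_b=50))
```

Residues facing a gap get no partner, which is why a lookup table (rather than a fixed offset) is needed to follow one position across related RTs.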
Retroviral tree: HSRV (spumavirus), MMLV (mammalian C-type), BLV (HTLV/BLV), RSV (avian C-type), MMTV (B-type), MPMV (D-type), HIV-1 (lentivirus)
Region of the Eickbush alignment (RT domain map: fingers, palm, fingers, palm, thumb, connection, RNaseH): the aligned region spans HIV-1 E44 to L234 and MMLV L82 to L273
MMLV conserved tetrahedra (negative), Kinemage view