Protein Analysis

1. Protein Analysis

2. Protein Analysis: Peptide Mapping; Structural/Functional Motifs; Secondary Structure Prediction

3. Motifs: Identification of Functional Domains

4. Identifying Functional Protein Domains: Search protein sequences against a database of defined functional motifs. Motifs are derived by aligning peptide regions that have been shown to share a common function. A sequence specification is derived from the alignment and can be used to search for similar motifs in other protein sequences.

5. Motif Sequence Specifications: The sequence specification is the same as for FindPatterns and is used as a consensus pattern in a search (Motifs). The sequence specification may also be defined as a profile constructed from a set of aligned sequences and used as part of a library of profiles in a search (ProfileScan).

6. Pattern Definitions: FindPatterns, Map, MapSort, MapPlot, and Motifs all let you search with ambiguous expressions. Expressions can include any legal GCG sequence character. Expressions can also specify: OR and NOT matching; begin and end constraints; repeat counts.

7. TAATA(N){20,30}ATG: TAATA, followed by 20 to 30 of any base, followed by ATG.

8. Repeats: Parentheses () enclose one or more symbols that can be repeated. Braces {} enclose numbers that tell how many times the symbol(s) must be found. (GA){2,10}: GA repeated 2 to 10 times. G{2,}: G repeated 2 to 350,000 times. (GAT){,10}: GAT repeated 0 to 10 times.

9. OR Matching: Enclose the different choices in parentheses and separate the choices with commas. RGF(Q,A)S means RGF, followed by either Q or A, followed by S. GAT(TG,T,G){1,4}A means GAT, followed by any combination of TG, T, or G repeated 1 to 4 times, followed by A.

10. NOT Matching: Use the ~ symbol. GC~CAT means GC, followed by any symbol except C, followed by AT. GC~(A,T)CC means GC, followed by any symbol except A or T, followed by CC.

11. BEGIN AND END Constraints: The pattern <GACCAT can only be found at the beginning of the sequence. The pattern GACCAT> can only be found at the end of the sequence.
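The pattern rules on the preceding slides (repeats, OR, NOT, begin/end constraints) map closely onto ordinary regular expressions. The sketch below is a hypothetical helper, `gcg_to_regex`, that translates that subset into a Python regex; it is not part of GCG, does not support nested parentheses or GCG's full ambiguity-code table, and assumes a single wildcard symbol (N for nucleotides, X for proteins).

```python
import re


def gcg_to_regex(pattern: str, wildcard: str = "N") -> str:
    """Translate a subset of GCG pattern syntax into a Python regex (hypothetical helper)."""

    def literal(text: str) -> str:
        # The wildcard symbol (assumed: N for DNA, X for protein) matches any symbol.
        return "".join("." if ch == wildcard else re.escape(ch) for ch in text)

    out = []
    i, stop = 0, len(pattern)
    if pattern.startswith("<"):          # BEGIN constraint
        out.append("^")
        i = 1
    anchored_end = pattern.endswith(">")
    if anchored_end:                     # END constraint
        stop -= 1

    while i < stop:
        ch = pattern[i]
        if ch == "~":                    # NOT: any symbol except those listed
            if pattern[i + 1] == "(":
                close = pattern.index(")", i)
                excluded = pattern[i + 2:close].replace(",", "")
                i = close + 1
            else:
                excluded = pattern[i + 1]
                i += 2
            out.append("[^" + excluded + "]")
        elif ch == "(":                  # OR: comma-separated alternatives
            close = pattern.index(")", i)
            choices = pattern[i + 1:close].split(",")
            out.append("(?:" + "|".join(literal(c) for c in choices) + ")")
            i = close + 1
        elif ch == "{":                  # repeat counts; {,n} means 0 to n
            close = pattern.index("}", i)
            counts = pattern[i + 1:close]
            if counts.startswith(","):
                counts = "0" + counts
            out.append("{" + counts + "}")
            i = close + 1
        else:
            out.append(literal(ch))
            i += 1

    if anchored_end:
        out.append("$")
    return "".join(out)
```

For example, `gcg_to_regex("GC~(A,T)CC")` yields `GC[^AT]CC`, which can be passed to `re.search`. For protein patterns, pass `wildcard="X"` so that N keeps its meaning of asparagine.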

12. Motifs: Uses the Prosite dictionary of peptide motifs to search for occurrences of each motif in a query sequence.

13. Prosite: Dictionary of protein sites and patterns (http://www.expasy.ch/prosite/). Distributed by EMBL and maintained by Dr. Amos Bairoch at the University of Geneva. Release 16.35 (13-Apr-2001): 1,462 motif descriptions. GCG is at release 16 (7/1999).

14. Prosite Files: Site name; site description; the sequence motif in FindPatterns format; an abstract file describing the motif, along with references.

15. Restrictions: Patterns are limited to 350 characters. Motifs does not introduce gaps. Mismatches can be tolerated with /Mis=n.
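Because Motifs does not introduce gaps, tolerating mismatches amounts to sliding the pattern along the sequence and counting differing positions. The sketch below shows that idea for a plain (non-ambiguous) pattern; `find_with_mismatches` is a hypothetical name, and GCG's exact /Mis=n handling of ambiguous patterns may differ.

```python
def find_with_mismatches(pattern: str, sequence: str, max_mismatches: int = 0):
    """Return (start, mismatches) for every ungapped placement with at most max_mismatches differences."""
    hits = []
    m = len(pattern)
    for start in range(len(sequence) - m + 1):
        window = sequence[start:start + m]
        # No gaps: compare position by position only.
        mismatches = sum(1 for a, b in zip(pattern, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```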

22. ProfileScan: Uses a database of profiles to scan query sequences for matching structural motifs. New profiles can be created with ProfileMake.

23. Validated Profiles: Profiles are derived from a group of sequences aligned at a common functional domain. All sequences used to create the profile align correctly to the profile. All sequences known to contain the motif score above the high level. The supplied profiles were validated by Dr. Michael Gribskov, San Diego Supercomputer Center.

24. Profile List: ProfileDIR contains 629 different profiles. To see a profile's documentation: analyze% to profiledir, then analyze% more profilename.prf

35. Isoelectric Plots the charge as a function of pH for any protein
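A charge-versus-pH curve like the one Isoelectric plots can be approximated with the Henderson-Hasselbalch equation, summing the fractional charge of each ionizable group. The pKa values below are one illustrative textbook set and are an assumption; GCG's Isoelectric program may use a different table. The function names are hypothetical.

```python
# Illustrative pKa values (assumed; not necessarily GCG's table).
POSITIVE_PKA = {"K": 10.5, "R": 12.4, "H": 6.0}
NEGATIVE_PKA = {"D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}
N_TERM_PKA, C_TERM_PKA = 9.69, 2.34


def net_charge(sequence: str, ph: float) -> float:
    """Estimate the net charge of a protein at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # free amino terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # free carboxyl terminus
    for aa in sequence.upper():
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def charge_curve(sequence: str, ph_values):
    """(pH, charge) points suitable for plotting charge as a function of pH."""
    return [(ph, net_charge(sequence, ph)) for ph in ph_values]


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection for the pH of zero net charge (charge decreases as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2
```

Because net charge falls monotonically with pH, a simple bisection is enough to locate the isoelectric point on the curve.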

42. CoilScan: Locates coiled-coil motifs, which are involved in protein-protein interactions. Uses a weight matrix derived from known coiled-coil structures to search for matching structures. Locates solvent-exposed coiled coils: parallel and antiparallel two-stranded coiled coils, and parallel three-stranded coiled coils.

43. Coiled-Coil Structures: Bundles of two or more alpha helices that are supercoiled together. Each alpha helix in a coiled coil is strongly amphipathic. The pattern of hydrophilic and hydrophobic amino acids repeats every seven residues (the heptad repeat). Five of the seven residue positions in the heptad are hydrophilic; positions 1 and 4 are hydrophobic.
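The heptad rule (hydrophobic residues at positions 1 and 4, conventionally labelled a and d) can be illustrated with a toy register search. This is not CoilScan's weight-matrix method; the hydrophobic residue set and the function names are assumptions chosen only to show how the seven-residue periodicity is tested.

```python
HYDROPHOBIC = set("AILMFVWC")  # assumed hydrophobic set (a simplification)


def heptad_score(sequence: str, register: int) -> float:
    """Fraction of a/d positions that are hydrophobic when residue 0 sits at heptad position `register` (0 = a)."""
    core = [aa for i, aa in enumerate(sequence.upper())
            if (i + register) % 7 in (0, 3)]  # heptad positions a and d
    if not core:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in core) / len(core)


def best_register(sequence: str):
    """Try all seven registers and return (register, score) for the best one."""
    scores = {r: heptad_score(sequence, r) for r in range(7)}
    best = max(scores, key=scores.get)
    return best, scores[best]
```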

44. CoilScan Settings: Use the largest window length (28) for predicting new coiled-coil segments. For higher resolution, use smaller window sizes to identify the ends of a coiled-coil segment with greater precision. -weight increases the weighting of the hydrophobic residues, reducing the chance of falsely detecting highly charged sequences.

50. HTHscan: Locates helix-turn-helix motifs, a signature of DNA-binding structures involved in gene regulation. Uses a weight matrix derived from known H-T-H structures: AraC (bacterial regulatory helix-turn-helix proteins), LysR (bacterial regulatory helix-turn-helix proteins), and homeobox domains.

55. SPScan: Locates secretory signal peptide motifs. Available weight matrices: eukaryotes, Gram-positive prokaryotes, Gram-negative prokaryotes.

56. SPScan Calculation: von Heijne's weight matrix method and McGeoch's criteria. The entire protein sequence is scanned for potential starting points; only methionines are considered as signal peptide starting points.

57. SPScan Calculation: Identify the n-region (charged region): 11 or fewer residues containing at least one charged amino acid residue (R, K, or E). Identify the h-region (uncharged region): hydrophobicity of >= 15 over an 8-residue Kyte-Doolittle window. Score with the weight matrix.
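The McGeoch-style screening steps on this slide can be sketched as follows. Several details are assumptions rather than SPScan's documented behavior: the n-region is taken as the 11 residues starting at the Met, the hydrophobicity threshold is applied to the *sum* of Kyte-Doolittle values in an 8-residue window, and the h-region is searched within a hypothetical `search_span` of 40 residues. The weight-matrix scoring step is not reproduced.

```python
# Kyte-Doolittle hydropathy values.
KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
      "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
      "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5}


def mcgeoch_candidates(sequence, n_len=11, window=8, min_hydro=15.0, search_span=40):
    """Return positions of Met residues that pass the n-region and h-region screens (sketch)."""
    seq = sequence.upper()
    hits = []
    for start, aa in enumerate(seq):
        if aa != "M":  # only methionines are considered starting points
            continue
        # n-region: at least one charged residue (R, K, or E) in the first n_len residues.
        if not any(r in "RKE" for r in seq[start:start + n_len]):
            continue
        # h-region: best summed Kyte-Doolittle score over any 8-residue window downstream.
        region = seq[start + 1:start + search_span]
        best = max((sum(KD.get(r, 0.0) for r in region[i:i + window])
                    for i in range(len(region) - window + 1)),
                   default=float("-inf"))
        if best >= min_hydro:
            hits.append(start)
    return hits
```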

62. Secondary Structure Prediction

63. Protein Secondary Structure Predictions: The primary sequence of a protein contains the information necessary to predict higher-order interactions among the constituent groups of that protein. Once the rules are known, we can let the computer do the work. What are the rules?

64. Considerations: Many of the measures used in structure prediction have been derived empirically. The data have been obtained from proteins whose structures were determined by X-ray crystallography, and the dataset is limited. Very little "hard" data is used to derive the mathematical formulas and constants used to make the predictions.

65. Chou-Fasman Predictions: Applies to soluble (globular) proteins. A numerical value is derived for each amino acid reflecting its conformational preference. The numbers are derived empirically.

66. Rules Derive "arbitrary" rules to determine local peptide conformation based on the amino acid sequence and the secondary structure propensity for each amino acid.

67. Helices: A cluster of 4 helical residues out of 6 nucleates a helix. Pro, Asp, and Glu occur at the amino-terminal end of a helix; Pro can occur only within the first three residues.

68. Helices: His, Lys, and Arg appear at the carboxy terminus of a helix. The helix is extended until it reaches a tetrapeptide breaker, i.e. a tetrapeptide whose average P-alpha falls below 1.0.
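The nucleation and extension rules on the two helix slides can be sketched in a few lines. The P-alpha values are the commonly cited Chou-Fasman helix propensities; treating P-alpha >= 1.03 as a "helix former" is an assumed simplification, and the Pro and end-residue preferences are omitted. Function names are hypothetical.

```python
# Commonly cited Chou-Fasman helix propensities (P-alpha); treat as illustrative.
P_ALPHA = {"E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13,
           "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00,
           "R": 0.98, "T": 0.83, "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67,
           "P": 0.57, "G": 0.57}
FORMER = 1.03  # assumed cutoff: P-alpha >= 1.03 counts as a helix former


def mean_p_alpha(peptide):
    """Average P-alpha of a short peptide (used for the tetrapeptide test)."""
    return sum(P_ALPHA[aa] for aa in peptide) / len(peptide)


def to_segments(mask):
    """Convert a per-residue True/False mask into half-open (start, end) runs."""
    segments, start = [], None
    for k, flag in enumerate(mask + [False]):
        if flag and start is None:
            start = k
        elif not flag and start is not None:
            segments.append((start, k))
            start = None
    return segments


def predict_helices(sequence):
    seq = sequence.upper()
    n = len(seq)
    helix = [False] * n
    for i in range(n - 5):
        # Nucleation: at least 4 helix formers among 6 consecutive residues.
        if sum(P_ALPHA[aa] >= FORMER for aa in seq[i:i + 6]) < 4:
            continue
        start, end = i, i + 6
        # Extend toward the C-terminus while the next tetrapeptide averages >= 1.0.
        while end < n and mean_p_alpha(seq[end:end + 4]) >= 1.0:
            end += 1
        # Extend toward the N-terminus with the same tetrapeptide test.
        while start > 0 and mean_p_alpha(seq[max(0, start - 4):start]) >= 1.0:
            start -= 1
        for k in range(start, end):
            helix[k] = True
    return to_segments(helix)
```

Because whole nucleation windows are marked, a predicted segment can overhang the true helix by a residue or two; the real method applies further boundary rules.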

69. Beta Sheets: A cluster of 3 beta formers out of 5 residues nucleates a sheet, which is extended until a tetrapeptide breaker with average P-beta < 1.00 is reached. Regions containing both alpha and beta formers are assigned as helical if P-alpha > P-beta and as sheet if P-beta > P-alpha.
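The sheet nucleation rule and the alpha/beta overlap rule can be sketched the same way. P-alpha and P-beta are the commonly cited Chou-Fasman values; the cutoff P-beta >= 1.05 for a "beta former" is an assumed simplification, and extension would follow the same tetrapeptide test as for helices. Function names are hypothetical.

```python
P_ALPHA = {"E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13,
           "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00,
           "R": 0.98, "T": 0.83, "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67,
           "P": 0.57, "G": 0.57}
P_BETA = {"V": 1.70, "I": 1.60, "Y": 1.47, "F": 1.38, "W": 1.37, "L": 1.30,
          "C": 1.19, "T": 1.19, "Q": 1.10, "M": 1.05, "R": 0.93, "N": 0.89,
          "H": 0.87, "A": 0.83, "S": 0.75, "G": 0.75, "K": 0.74, "P": 0.55,
          "D": 0.54, "E": 0.37}
BETA_FORMER = 1.05  # assumed cutoff for a beta former


def mean(table, peptide):
    return sum(table[aa] for aa in peptide) / len(peptide)


def beta_nuclei(sequence):
    """Start positions of 5-residue windows containing at least 3 beta formers."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 4)
            if sum(P_BETA[aa] >= BETA_FORMER for aa in seq[i:i + 5]) >= 3]


def resolve_overlap(peptide):
    """Assign a region containing both helix and sheet formers by comparing averages."""
    p_alpha, p_beta = mean(P_ALPHA, peptide.upper()), mean(P_BETA, peptide.upper())
    if p_alpha > p_beta:
        return "helix"
    if p_beta > p_alpha:
        return "sheet"
    return "undecided"
```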

70. Turns: Turns are predicted from tables giving the frequency of each of the 20 amino acids at each position of a 4-residue bend, and from the P-turn value of each amino acid.

71. Garnier-Osguthorpe-Robson: The conformational state of a particular residue is determined not only by the empirical values of the residue itself but also by the values of the 8 residues on each side of it. Cooperative effects are included at the outset, so fewer arbitrary rules about structure formation are needed.
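The key idea of GOR, that each residue's state is scored from a 17-residue window (itself plus 8 neighbors on each side), can be sketched as below. The information tables are not given on the slide, so `info` is a caller-supplied mapping of `(offset, amino_acid)` to a value for each state; the function name and the tie-breaking toward coil are assumptions.

```python
STATES = ("C", "H", "E")  # coil listed first so it wins ties (an assumption)


def gor_predict(sequence, info, half_window=8):
    """Assign each residue the state with the largest summed information value.

    info[state] maps (offset, amino_acid) -> value; missing entries count as 0.
    """
    seq = sequence.upper()
    prediction = []
    for i in range(len(seq)):
        scores = {}
        for state in STATES:
            total = 0.0
            # Contributions from the residue itself and up to 8 neighbors on each side.
            for offset in range(-half_window, half_window + 1):
                j = i + offset
                if 0 <= j < len(seq):
                    total += info[state].get((offset, seq[j]), 0.0)
            scores[state] = total
        prediction.append(max(STATES, key=scores.get))
    return "".join(prediction)
```

With a toy table in which only a neighbor contributes, e.g. `{"C": {}, "H": {(1, "A"): 0.5}, "E": {}}`, the residue *preceding* an alanine is called helical, which is the cooperative, context-dependent effect the slide describes.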

72. Including Biological Data: Decision constants can be included to weight (and therefore improve) the prediction when the amount of helix and sheet structure is known, for example from circular dichroism or Raman spectroscopy. Not available in GCG.

73. Accuracy of the Predictions: Q-score: the percentage of residues placed in the correct structural class. Three-state predictions (alpha, beta, coil): random = 33%. Four-state predictions also include turns: random = 25%.
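The Q-score is a per-residue percent agreement between predicted and observed states. A minimal sketch, assuming both are given as equal-length strings of state letters (e.g. H, E, C for the three-state case); the function name is hypothetical.

```python
def q_score(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("predicted and observed must be non-empty and equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For instance, `q_score("HHHEEC", "HHHCCC")` scores 4 of 6 residues as correct, about 66.7%, well above the 33% expected by chance for three states.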

74. Observed Accuracy Using structures known at the time the predictive methods were compiled (late 1970s), a Q score of approximately 68% was obtained (three state). Analysis of protein structures determined since then gave Q scores of around 55% for any of the predictive schemes (three state).

75. Observed Accuracy: Four-state predictions gave Q scores of around 45%. Internal beta sheets are more accurately predicted than external sheets: Q drops from 60% to 20% (three state).

76. Hydropathic Plots: Determine antigenic regions; determine membrane-spanning regions; determine protein folding patterns (residues on the interior vs. exterior of the protein).

77. Hopp-Woods: A measure directed toward the determination of antigenic sites. Values for each amino acid were derived from published values for the partitioning of individual amino acids between solvents (ethanol-water). Ethanol was chosen as a solvent that might resemble the solvent phase in the interior of a protein, but ethanol may not be the best choice.

78. Hopp-Woods: The values for some amino acids were altered based on empirical data to better reflect an association between hydrophilicity and known antigenic sites: proline, -1.4 to 0; aspartic acid, 2.5 to 3; glutamic acid, 2.5 to 3.

79. Hopp-Woods Window: A window of six was chosen as the approximate size of an antigenic determinant. GCG uses a default of 7.
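A Hopp-Woods analysis averages the hydrophilicity values over a sliding window and takes the highest point as the most likely antigenic region. The values below are the Hopp-Woods scale as commonly tabulated (note Pro = 0 and Asp = Glu = 3.0, matching the adjusted values above); the function names and the choice of returning only the single best window are illustrative assumptions.

```python
HOPP_WOODS = {"R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2,
              "Q": 0.2, "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5,
              "C": -1.0, "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8,
              "Y": -2.3, "F": -2.5, "W": -3.4}


def window_average(sequence, scale, window):
    """Average scale value for each full window along the sequence."""
    seq = sequence.upper()
    return [sum(scale[aa] for aa in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def most_hydrophilic_window(sequence, window=6):
    """Start index and average of the most hydrophilic window (hexapeptide by default)."""
    profile = window_average(sequence, HOPP_WOODS, window)
    best = max(range(len(profile)), key=profile.__getitem__)
    return best, profile[best]
```

Passing `window=7` reproduces GCG's default window size instead of the original hexapeptide.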

80. Kyte-Doolittle: Uses hydropathy values for individual amino acids based on water-vapor transfer free energies (ethanol is not a neutral, non-interacting solvent). Also uses empirical data on the partitioning of individual amino acids to the exterior or interior of proteins of known structure.

81. Kyte-Doolittle: Included subjective assessments of the hydropathic character of individual amino acids. The best window was found to be 7 to 11 residues, lowering the noise without smoothing out significant peaks.
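A Kyte-Doolittle hydropathy plot is a sliding-window average of the per-residue hydropathy values. The sketch below reports each window's average at its center residue; the default window of 9 follows the PepPlot panel description later in this deck and falls within the recommended 7 to 11 range. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values.
KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
      "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
      "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5}


def hydropathy_profile(sequence, window=9):
    """Return (center_position, mean_hydropathy) for each full window."""
    seq = sequence.upper()
    half = window // 2
    return [(i + half, sum(KD[aa] for aa in seq[i:i + window]) / window)
            for i in range(len(seq) - window + 1)]
```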

82. Flexibility: The flexibility measure is determined from the B-value (temperature factor) of the C-alpha atoms of the individual amino acids, reflecting the flexibility constraint on the alpha carbon. A formula was derived empirically to fit known flexibility data from crystal structures. Flexibility can be severely constrained by tertiary interactions such as S-S bonds.

83. Surface Probability: Uses an empirical determination of which amino acids were found at the surface of proteins of known structure.

84. Predicting Antigenic Sites: Surface features with a high degree of exposure to the solvent; hydrophilic regions; regions with high numbers of turns. Identifying the most probable antigenic sites helps predict which synthetic peptides are most likely to be useful for raising antibodies.

85. Antigenic Index: A measure combining all of the above data. Note that many of these measures are derived from similar empirical data. AI = 0.3*H + 0.15*S + 0.15*F + 0.2*C + 0.2*G, where H is hydrophilicity, S is surface probability, F is flexibility, and C and G are the Chou-Fasman and Garnier turn predictions.
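The antigenic index formula is a per-residue weighted sum of five profiles. A minimal sketch, assuming the five profiles have already been computed and aligned residue by residue (any normalization of the individual profiles is not shown); the function name is hypothetical.

```python
def antigenic_index(h, s, f, c, g):
    """AI = 0.3*H + 0.15*S + 0.15*F + 0.2*C + 0.2*G for each residue."""
    return [0.3 * hi + 0.15 * si + 0.15 * fi + 0.2 * ci + 0.2 * gi
            for hi, si, fi, ci, gi in zip(h, s, f, c, g)]
```

The weights sum to 1.0, so a residue scoring 1 on every input profile receives an index of 1.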

86. N-Glycosylation Sites: Asn-X-Ser or Asn-X-Thr. Only a minor probability when X = Asp, Trp, or Pro.
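The Asn-X-Ser/Thr sequon can be located with a regular expression. The lookahead lets overlapping sites (e.g. in NNST) both be reported, and each hit is flagged when X is Asp, Trp, or Pro, the low-probability cases listed on the slide. The function name is hypothetical.

```python
import re

LOW_PROBABILITY_X = set("DWP")  # X residues that give only a minor probability


def n_glycosylation_sites(sequence):
    """Return (position, motif, low_probability) for every Asn-X-Ser/Thr site."""
    seq = sequence.upper()
    sites = []
    # Zero-width lookahead so overlapping motifs are all found.
    for m in re.finditer(r"(?=(N.[ST]))", seq):
        motif = m.group(1)
        sites.append((m.start(), motif, motif[1] in LOW_PROBABILITY_X))
    return sites
```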

87. The Prediction Programs

88. Moment: Calculates the hydrophobic moment of a peptide, which may be predictive of amphipathic alpha-helical or beta-sheet structures.

89. Moment Calculation: Calculates the hydrophobicity of one side of the peptide chain as the amino acid residues are rotated through 180°. Plots a contour graph of the hydrophobicity versus the angle of rotation for each residue (or window of residues).

90. Moment Predictions: Alpha helices show a typical rotation per residue of 100°. A significant increase in hydrophobicity in the contour plot at 100° would indicate the possible existence of an amphipathic helix for the corresponding residues.

91. Moment Predictions: Beta sheets show a rotation of approximately 160°. Significant amphipathic hydrophobicity at a rotation of 160° would then indicate the possibility of a beta sheet for the corresponding residues.
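The hydrophobic moment at a given rotation angle is the length of the vector sum of each residue's hydrophobicity placed at that angle around the axis. The sketch below uses Kyte-Doolittle values purely as an assumed stand-in scale (Eisenberg's method uses its own consensus scale); evaluating it at 100° and 160° corresponds to the helix and sheet cases above. The function name is hypothetical.

```python
import math

# Kyte-Doolittle values used as an assumed stand-in hydrophobicity scale.
KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
      "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
      "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5}


def hydrophobic_moment(peptide, angle_deg):
    """Magnitude of the vector sum of hydrophobicities at angle_deg per residue."""
    delta = math.radians(angle_deg)
    seq = peptide.upper()
    sin_sum = sum(KD[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(KD[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum)
```

Scanning `angle_deg` over a range and plotting the result for successive windows gives the kind of moment-versus-angle contour described above.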

93. HelicalWheel: Plots the amino acids of a peptide sequence along a helical wheel in order to recognize regions of amphipathic helices. Residues are plotted at 100° offsets from the preceding residue.
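The wheel coordinates follow directly from the 100° offset: residue i sits at (i × 100) mod 360 degrees. A minimal sketch with a hypothetical function name; plotting is left out.

```python
def helical_wheel(peptide, step=100):
    """Return (residue, angle in degrees) for each residue on a helical wheel."""
    return [(aa, (i * step) % 360) for i, aa in enumerate(peptide)]
```

Since 18 × 100° = 1800° = 5 × 360°, the 19th residue lands back on the first residue's position; residues with similar angles cluster on the same face, which is what exposes an amphipathic helix.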

98. PepPlot Plots of Various Computer Predictions of Peptide Structure

99. PepPlot: Sequence; charged-polar-hydrophobic residues; beta forming-breaking residues; Chou-Fasman alpha-beta prediction; alpha forming-breaking residues

100. PepPlot: Chou-Fasman amino-end predictions; Chou-Fasman carboxy-end predictions; Chou-Fasman turn predictions; helical hydrophobic moment plot for alpha and beta; hydropathy and hydrophilicity

108. Panel A - The Sequence

110. Panel b - Residue Schematic: Hydrophilic, charged (green): down = acidic, up = basic. Hydrophilic, uncharged (red): short = amides, long = alcohols. Hydrophobic (blue): short = aliphatic, long = aromatic. Proline (black). Alanine, glycine, cysteine (unmarked).

112. Panel c - Beta Forming and Breaking Residues Chou-Fasman rules indicating amino acids which tend to form or break beta sheet structures.

114. Panel d - Alpha and Beta Prediction Curves Chou-Fasman rules indicating the propensity of the sequence to form an alpha helix or a beta sheet.

116. Panel e - Alpha Forming and Breaking Residues Chou-Fasman rules indicating amino acids which tend to form or break alpha helical structures.

118. Panel f - Amino End Association Chou-Fasman rules indicating amino acids which tend to be present at the amino ends of an alpha or beta structure.

120. Panel g - Carboxy end association Chou-Fasman rules indicating amino acids which tend to be present at the carboxy ends of an alpha or beta structure.

122. Panel h - Chou-Fasman Turn Predictions Chou-Fasman prediction of the likelihood of a turn.

124. Panel i - Helical Hydrophobic Moment: Eisenberg's hydrophobic moment prediction of the likelihood of an amphiphilic structure. Alpha helix: plot of the HM maximum for 90° to 110° of rotation using a window of eight residues. Beta sheet: plot of the HM maximum for 140° to 180° of rotation using a window of six residues.

126. Panel j - Hydropathy and Hydrophilicity Plot: Kyte and Doolittle (black curve): average hydrophobicity over a window of nine residues.

127. GES: Goldman, Engelman, and Steitz transbilayer helices (green curve). Identifies nonpolar transbilayer helices over a window of 20 residues, based on possible lipid-protein interactions and the helical arrangement of the hydrophobic residues.

129. PeptideStructure Compiles various pieces of information concerning protein structure for display using PlotStructure

134. PlotStructure Display of PeptideStructure Results

139. VSV G 1-100 CF

140. VSV G 1-100 G

141. VSV G

142. VSV G Garnier

144. VSV G Surface Probability

145. VSV G Flexibility

146. VSV G Antigenic Index

147. Flu HA

148. Towards the Holy Grail: Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (http://predictioncenter.llnl.gov/). Structure prediction of proteins whose coordinates are not yet publicly available.

149. CASP4, Asilomar, 12/2000: Are the models produced similar to the corresponding experimental structure? Is the mapping of the target sequence onto the proposed structure (i.e. the alignment) correct? Have similar structures that a model can be based on been identified? Are the details of the models correct? Has there been progress from the earlier CASPs? What methods are most effective? Where can future effort be most productively focused?

150. Web-based Prediction Tools: ExPASy, Swiss Institute of Bioinformatics (http://www.expasy.ch/); BCM Search Launcher, Baylor College of Medicine (http://searchlauncher.bcm.tmc.edu/); the PredictProtein server, Columbia University (http://maple.bioc.columbia.edu/)

151. Next Up Sequence Comparison
