Protein structure prediction. Einat Granot, Liron Atedgi. Protein folding. Protein folding is determined by the amino acid sequence. Why is knowing the folding important? Determine its functionality. Find distant evolutionary relationships. Design drugs. Protein structures. Primary structure
Protein structure prediction Einat Granot Liron Atedgi
Protein folding • Protein folding is determined by the amino acid sequence • Why is knowing the folding important? • Determine the protein's functionality • Find distant evolutionary relationships • Design drugs
Protein structures • Primary structure • Secondary structure • Tertiary structure
Two prediction methods • PSI-PRED: secondary structure prediction based on PSI-BLAST • GenTHREADER: tertiary structure prediction • Both were developed by the group of David T. Jones, University of Warwick
Methods: general format • Sequence alignment + additional data • Neural network • Structure prediction
Neural networks • Numerical inputs are passed through units to produce an output • Why do we call it a neural network? • Every unit performs a weighted calculation
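The "weighted calculation" performed by a single unit can be sketched as follows. This is a minimal illustration, not the networks used by the authors: the logistic activation, the function name `unit_output`, and the example weights are all assumptions made for the sake of the example.

```python
import math


def unit_output(inputs, weights, bias=0.0):
    """One network unit: weighted sum of the inputs, squashed to (0, 1).

    The logistic activation is an assumption; the slides only state that
    every unit performs a weighted calculation.
    """
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))


# Hypothetical inputs and weights, chosen only for illustration.
print(unit_output([0.5, 0.1, 0.9], [0.4, -0.6, 0.2]))
```

A full network chains many such units, feeding the outputs of one layer into the next as inputs.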
Neural network: hidden layer • As more hidden layers are added, the mean square error becomes lower
Neural network training • Network connections and weights are determined by a training process • Training is performed on samples of input and expected output • The learning algorithm is called back propagation
Network training & testing • After training we perform testing • Training and testing groups must be chosen very carefully • What problems can arise? • Insufficient training or testing • The testing group may be biased
A neural network is a "black box" • The specific algorithm of a working neural network is not known • It is hard to deduce new biological principles about the solved problem
PSI-PRED Secondary structure prediction
Secondary structure prediction • In DSSP: 8 secondary structure categories • In PSI-PRED these were merged into 3: Strand (E), Helix (H) and Coil (C)
AA:   RLMPHIKRSAIPVNHGQCRWEDNVDERTNCMIQYVLIMRD
Pred: CCCCCHHHCCCCCCEEEEEECCCCCCHHHHEEEEEECCCC
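The reduction from 8 DSSP categories to 3 states can be sketched as a simple lookup. The slides do not say which DSSP codes were merged into which state, so the mapping below (H and G to helix, E and B to strand, everything else to coil) is an assumption based on a commonly used reduction, and `reduce_dssp` is a hypothetical helper name.

```python
# Assumed mapping: DSSP codes not listed here are treated as coil (C).
DSSP_TO_3_STATE = {"H": "H", "G": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp_string):
    """Collapse a string of 8-state DSSP codes into the 3 states H, E, C."""
    return "".join(DSSP_TO_3_STATE.get(code, "C") for code in dssp_string)


print(reduce_dssp("HGIEBTS-"))  # HHCEECCC
```

Other reductions exist, and the choice affects reported accuracy, so scores are only comparable between methods that use the same mapping.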
PSI-PRED • Sequence alignment (find homologs) • Create protein profile • Insert into first neural network • Insert into second neural network • Final prediction
Sequence alignment • Finding homologs of the target protein using PSI-BLAST • Reminder: what is PSI-BLAST? • Position-Specific Iterated BLAST, which outputs a PSSM (position-specific scoring matrix)
PSI-BLAST pros & cons • Pros: • Sensitive to distant homologs • Reliable • Accessible from every workstation • Cons: • Sensitive to distant homologs, so the result might be biased • Sensitive to repetitive sequences
Solving PSI-BLAST problems • A special DB of 340,000 sequences was constructed for PSI-PRED • This DB contains only unique and non-repetitive sequences
PSI-PRED • Sequence alignment (find homologs) • Create protein profile • Insert into first neural network • Insert into second neural network • Final prediction
Create protein profile • PSI-PRED uses the PSSM produced by PSI-BLAST after 3 iterations • This matrix is processed by the transformation f(x) = 1 / (1 + e^{-x}), so the final values are between 0 and 1
PSSM – Output of PSI-BLAST Transformation
Create protein profile • The matrix size is M x 20, where M is the sequence length • An additional column is added to mark the N/C terminus, giving an M x 21 matrix
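The profile construction can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the scaling function is assumed to be the standard logistic function, the extra column is assumed to be 0 for real residues (it would be 1 only for padding beyond a terminus), and `build_profile` is a hypothetical name.

```python
import math


def logistic(x):
    """Standard logistic function, mapping any score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def build_profile(pssm):
    """Turn an M x 20 matrix of raw PSSM scores into an M x 21 profile.

    Assumption: the 21st value flags positions beyond the N/C terminus,
    so it is 0.0 for every real residue.
    """
    return [[logistic(score) for score in row] + [0.0] for row in pssm]


# Toy PSSM for a 2-residue sequence (values are made up).
toy_pssm = [[-3, 0, 5] + [0] * 17, [2, -1, 0] + [0] * 17]
profile = build_profile(toy_pssm)
print(len(profile), len(profile[0]))  # 2 21
```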
PSI-PRED • Sequence alignment (find homologs) • Create protein profile • Insert into first neural network • Insert into second neural network • Final prediction
Networks training & testing • 187 proteins were selected according to CATH and PSI-BLAST • CATH filters proteins according to the fold configuration of their domains (T-level) • This is considered to be a strict selection
First neural network • Each time, a window of 15 amino acids is fed into the first network • The outputs, 3 values per amino acid (one per secondary structure state), are collected into a 15 x 3 matrix
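Feeding the profile to the network 15 residues at a time amounts to sliding a window along the sequence. The sketch below assumes a window centered on each residue and padding rows whose 21st value is 1 to mark positions beyond the termini; both the padding scheme and the name `sliding_windows` are assumptions for illustration.

```python
def sliding_windows(profile, size=15):
    """Yield one window of `size` rows per residue, centered on that residue.

    `profile` is an M x 21 matrix. Positions that fall outside the sequence
    are filled with a padding row whose last value (the terminus flag) is 1.
    """
    half = size // 2
    pad_row = [0.0] * 20 + [1.0]
    padded = [pad_row] * half + list(profile) + [pad_row] * half
    for i in range(len(profile)):
        yield padded[i:i + size]


# Toy profile for a 3-residue sequence.
toy_profile = [[0.5] * 20 + [0.0] for _ in range(3)]
toy_windows = list(sliding_windows(toy_profile))
print(len(toy_windows), len(toy_windows[0]), len(toy_windows[0][0]))  # 3 15 21
```

Each window would then be flattened into 15 x 21 = 315 numerical inputs for the network.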
PSI-PRED • Sequence alignment (find homologs) • Create protein profile • Insert into first neural network • Insert into second neural network • Final prediction
Second neural network • The input to the 2nd network is the output of the 1st one • Again, another column is added to indicate the N/C terminus
Why do we need a second network? • Let's examine a possible prediction from the 1st network…
Seq  VLFLNDNLDDVVIGRPKRTYTAITL
Pred EEEECCCCHHHCCCHCCCEEEECC
What is the problem with this prediction? • A helix made of a single amino acid does not exist • The 2nd network maintains coherence between adjacent amino acids and improves the accuracy
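The problem in the example prediction can be located automatically by scanning for runs that are too short to be real. The sketch below is only a diagnostic, not what the second network does, and the minimum helix length of 3 is an assumed threshold chosen for illustration.

```python
from itertools import groupby


def short_segments(pred, state="H", min_len=3):
    """Return (start, length) for every run of `state` shorter than min_len."""
    found = []
    pos = 0
    for symbol, run in groupby(pred):
        length = len(list(run))
        if symbol == state and length < min_len:
            found.append((pos, length))
        pos += length
    return found


print(short_segments("EEEECCCCHHHCCCHCCCEEEECC"))  # [(14, 1)]
```

The isolated H at position 14 (0-based) is the physically implausible single-residue helix.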
PSI-PRED • Sequence alignment (find homologs) • Create protein profile • Insert into first neural network • Insert into second neural network • Final prediction
Final prediction • [Figure: image of the prediction, showing the degree of confidence, the secondary structure and the target sequence]
PSI-PRED evaluation • CASP: Critical Assessment of Techniques for Protein Structure Prediction experiments • At CASP3, PSI-PRED achieved the best results of all participating methods
PSI-PRED evaluation • Q3 average: PSI-PRED 76.3%, JPRED 72.4%, DSC 67.3% • Q3 score: the percentage of amino acids whose state is predicted correctly
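The Q3 score follows directly from its definition. The function below is a minimal sketch; the name `q3_score` and the toy strings are illustrative and not taken from any evaluation in the slides.

```python
def q3_score(predicted, observed):
    """Percentage of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3_score("HHHC", "HHCC"))  # 75.0
```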
Reasons for success • The use of PSI-BLAST • More sensitive (iterative algorithm) • More accurate (pairwise local alignments) • The use of neural networks • Strict selection for training & testing
Possible improvements • Larger databases (training & alignment) • Combination with other methods (JPRED) • Predicting more than 3 secondary structure states
GenTHREADER Tertiary structure Prediction
Threading methods • Trying to thread a target amino acid sequence onto a template 3D structure M Q S N I L D V R E R A Q T V L C N K
Templates collection • The target sequence is compared against a collection of sequences with known folds • The collection was taken from the Brookhaven Protein Data Bank and includes only unique sequences
GenTHREADER • Sequence alignment • Calculate threading potential • Insert into neural network • Final prediction
Sequence alignment • The target sequence is aligned against each of the templates twice: • Target profile against template sequence • Target sequence against template profile • The best result is taken
Creating a profile • Steps for creating a profile: • Alignment against the OWL DB (a DB of coding sequences) • Selection of sequences with an E-value lower than 0.01 • Construction of a profile using BLOSUM50
Creating a profile A L M P H I K R S A I P V N H G Y V I M Q C R W E D N S T K V
GenTHREADER • Sequence alignment • Calculate threading potential • Insert into neural network • Final prediction
Calculate threading potential • The threading potential includes: • Pairwise potential • Solvation potential
Pairwise potential • Potential for the interaction between two amino acids • Takes into account the analysis of known structures and energetically favorable configurations • A lower pairwise potential indicates a more favorable state
Solvation potential • Calculated per amino acid and proportional to its degree of burial • Degree of burial (DOB): the number of other amino acids located within a radius of 10 Å • Hydrophobic amino acids: a high DOB is preferred • Hydrophilic amino acids: a low DOB is preferred
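The degree of burial can be sketched as a neighbor count over 3D coordinates. This is an illustration only: the slides do not say which atom represents each amino acid, so one representative coordinate per residue is assumed, and `degree_of_burial` is a hypothetical name.

```python
import math


def degree_of_burial(coords, radius=10.0):
    """For each residue, count the other residues within `radius` angstroms.

    `coords` holds one (x, y, z) point per residue; which atom that point
    represents is an assumption left to the caller.
    """
    counts = []
    for i, a in enumerate(coords):
        neighbors = sum(
            1 for j, b in enumerate(coords)
            if j != i and math.dist(a, b) <= radius
        )
        counts.append(neighbors)
    return counts


# Three toy residues on a line: the third is far from the other two.
print(degree_of_burial([(0, 0, 0), (5, 0, 0), (20, 0, 0)]))  # [1, 1, 0]
```

A residue deep in the protein core gets a high count, which is the situation the solvation potential favors for hydrophobic amino acids.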
GenTHREADER • Sequence alignment • Calculate threading potential • Insert into neural network • Final prediction
Insert into neural network • The prediction is very complex, therefore a neural network is used
Neural network • Again, the 6 input parameters are converted to values between 0 and 1 using a function f(x) • The output is a value between 0 and 1 showing the confidence of the match
Network training & testing • The network was trained using pairs of proteins with known folding patterns • Again, the training and testing sets were kept separate to avoid bias
GenTHREADER • Sequence alignment • Calculate threading potential • Insert into neural network • Final prediction