Classifying Protein Secondary Structure Using Hybrid Neural Networks Instructor: Dr. Yanqing Zhang Presented by: Navin Viswanath, Vijaya S. Chitturi
Contents • Introduction • Functional Characteristics From Protein Structure • Protein Secondary Structure Prediction • Significance • Neural Network • Neural Network Architecture • Neural Training and Testing • Hybrid Neural Networks • Training The Network • Running the Network • Future Work
Introduction • Proteins are one of the most basic components in all organisms. • Proteins are made up of linear sequences of twenty natural amino acids joined together by peptide bonds.
Functional Characteristics From Protein Structure • Proteins have evolved under selection pressure to perform specific functions. • Protein secondary structure is classified into three types: - Alpha helix - Beta sheet - Coil
Significance • Today we have many more sequenced proteins than known protein structures. • The gap is rapidly increasing. Problem: Finding a protein's structure isn't that simple. Solution: A good start is to find the secondary structure.
Neural Networks • The technological idea: use neural networks, an attempt to imitate the construction of the human brain (assuming this is the way it works).
Neural Network • The basic structure of a neural network: • A large number of processors (“neurons”). • Highly connected. • Working together.
Neural Network What does a neuron do? • Gets signals from its neighbors. • Each signal has a different weight. • When the weighted signals reach a certain threshold, it sends a signal. [Figure: a neuron receiving signals S1, S2, S3 with weights W1, W2, W3]
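The neuron described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the name `NeuronFires`, the use of a plain weighted sum, and the hard threshold are not taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the slides): returns true ("sends a signal")
// when the weighted sum of the incoming signals reaches the threshold.
bool NeuronFires(const std::vector<double>& signals,
                 const std::vector<double>& weights,
                 double threshold) {
    // Each incoming signal must have exactly one weight.
    assert(signals.size() == weights.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < signals.size(); ++i) {
        sum += signals[i] * weights[i];
    }
    return sum >= threshold;
}
```

For example, signals S1 = 1, S2 = 0, S3 = 1 with weights 0.5, 0.9, 0.25 give a weighted sum of 0.75, so the neuron fires for any threshold up to 0.75.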
Neural Network General structure of a NN: • One input layer. • Some hidden layers. • One output layer. • This NN has one-directional flow.
Concepts of machine learning • Two main categories of learning: Supervised learning: Both the inputs and the outputs of a component can be observed. Unsupervised learning: The agent has no information about what the correct outputs are.
Network training and testing [Figure: flowchart with boxes Training set, Neural network, Correct, Incorrect, Back-propagation, Test set] • Training set - inputs for which we know the wanted output. • Back-propagation - algorithm for changing the “power” of the neurons' pulses. • Test set - inputs used for the final network performance test.
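The split between a training set and a test set can be sketched as follows. This is a deliberately reduced illustration, not the network from the slides: it trains a single sigmoid neuron with a gradient step on the squared error (the one-neuron special case of back-propagation). The names `Sample`, `SigmoidNeuron`, `Fit`, and `Accuracy`, and the 0.5 decision cut-off, are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One labeled example: an input and the wanted output (0 or 1).
struct Sample {
    double input;
    double target;
};

// Hypothetical single sigmoid neuron; not the CBPNet class from the slides.
struct SigmoidNeuron {
    double weight = 0.0;
    double bias = 0.0;

    double Run(double x) const {
        return 1.0 / (1.0 + std::exp(-(weight * x + bias)));
    }

    // One gradient step on the squared error for a single sample.
    void Train(const Sample& s, double learningRate) {
        const double out = Run(s.input);
        const double delta = (s.target - out) * out * (1.0 - out);
        weight += learningRate * delta * s.input;
        bias += learningRate * delta;
    }
};

// Repeatedly presents the training set to a fresh neuron.
SigmoidNeuron Fit(const std::vector<Sample>& trainingSet, int epochs,
                  double learningRate) {
    assert(!trainingSet.empty());
    SigmoidNeuron neuron;
    for (int epoch = 0; epoch < epochs; ++epoch) {
        for (const Sample& s : trainingSet) {
            neuron.Train(s, learningRate);
        }
    }
    return neuron;
}

// Fraction of test samples whose thresholded output matches the target.
double Accuracy(const SigmoidNeuron& neuron, const std::vector<Sample>& testSet) {
    assert(!testSet.empty());
    int correct = 0;
    for (const Sample& s : testSet) {
        const double predicted = neuron.Run(s.input) >= 0.5 ? 1.0 : 0.0;
        if (predicted == s.target) ++correct;
    }
    return static_cast<double>(correct) / testSet.size();
}
```

The test set is used only in `Accuracy`, never in `Fit`, so the reported figure reflects inputs the neuron has not seen during training.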
Hybrid Neural Network • The input sequence is fed to seven networks, one per class: H (output o1), G (o2), I (o3), E (o4), B (o5), T (o6), C (o7). • The final prediction is the class of the network with the largest output, Max(o1…o7).
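The Max(o1…o7) selection step could look like the sketch below. The function name `SelectClass` and the rule that the first network wins an exact tie are assumptions; the slides only specify that the maximum output decides the class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Class labels in the order of the outputs o1..o7 on the slide.
const std::vector<char> kClasses = {'H', 'G', 'I', 'E', 'B', 'T', 'C'};

// Hypothetical helper: returns the class whose detector network produced the
// largest output. On an exact tie the earlier network wins (an assumption).
char SelectClass(const std::vector<double>& outputs) {
    assert(outputs.size() == kClasses.size());
    std::size_t best = 0;
    for (std::size_t i = 1; i < outputs.size(); ++i) {
        if (outputs[i] > outputs[best]) best = i;
    }
    return kClasses[best];
}
```

With outputs (0.1, 0.2, 0.05, 0.9, 0.3, 0.2, 0.4), the fourth network has the largest output, so the predicted class is E.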
Assigning normalized values for amino acids • Alanine A = 0.40; Cysteine C = 0.50; Aspartic acid D = 0.75; Glutamic acid E = 0.80; Phenylalanine F = 0.15; Glycine G = 0.65; Histidine H = 0.99; Isoleucine I = 0.25; Lysine K = 0.90; Leucine L = 0.20; Methionine M = 0.30; Asparagine N = 0.70; Proline P = 0.45; Glutamine Q = 0.85; Arginine R = 0.95; Serine S = 0.55; Threonine T = 0.60; Valine V = 0.35; Tryptophan W = 0.05; Tyrosine Y = 0.10
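A lookup for these values might be written as below. The numbers are copied from the slide; the function names `EncodeResidue` and `EncodeSequence`, and the choice to reject unknown letters with an assertion, are assumptions made for this sketch.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps a one-letter amino acid code to the normalized
// value listed on the slide.
double EncodeResidue(char code) {
    static const std::unordered_map<char, double> kValues = {
        {'A', 0.40}, {'C', 0.50}, {'D', 0.75}, {'E', 0.80}, {'F', 0.15},
        {'G', 0.65}, {'H', 0.99}, {'I', 0.25}, {'K', 0.90}, {'L', 0.20},
        {'M', 0.30}, {'N', 0.70}, {'P', 0.45}, {'Q', 0.85}, {'R', 0.95},
        {'S', 0.55}, {'T', 0.60}, {'V', 0.35}, {'W', 0.05}, {'Y', 0.10}};
    const auto it = kValues.find(code);
    // Assumption: letters outside the twenty listed codes are a caller error.
    assert(it != kValues.end() && "unknown one-letter amino acid code");
    return it->second;
}

// Converts a whole sequence into the numeric inputs for the network.
std::vector<double> EncodeSequence(const std::string& sequence) {
    std::vector<double> encoded;
    encoded.reserve(sequence.size());
    for (char code : sequence) {
        encoded.push_back(EncodeResidue(code));
    }
    return encoded;
}
```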
Design of neural network • Layers: input layer, hidden layer, output layer.
class CBPNet {
public:
    CBPNet();
    ~CBPNet() {}
    float Train(float, float, float);
    float Run(float, float);
private:
    float m_fWeights[3][3]; // Weights for the 3 neurons.
    float Sigmoid(float);   // The sigmoid function.
};
Training the neural network • We generate an input file containing a protein sequence, which is used to train the neural network.
Training the neural network (contd.) • The hybrid neural network is trained so that each of the 7 networks recognizes one class of amino acids: H, G, I, E, B, T, or C. • Known sequences of amino acids are used to train the neural network. • The trained neural network is then run by feeding it random sequences of amino acids.
Running the network • To run the network, a sequence is first fed as input. • The input sequence is divided into windows, each of size 5 or 7. • The network then predicts the class of the amino acid at the middle of the window. • The advantage of using windows is that the prediction takes the neighboring amino acids into account. • Increasing the size of the window may increase the accuracy, since a larger neighborhood is considered for prediction.
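The windowing step can be sketched as follows. The slides do not say how positions near the ends of the sequence are handled, so this sketch simply skips any position that lacks a full window; that choice and the name `CenteredWindows` are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns every full window of the given odd size.
// The residue to classify is the middle one, window[windowSize / 2].
// Assumption: positions too close to either end get no window.
std::vector<std::string> CenteredWindows(const std::string& sequence,
                                         std::size_t windowSize) {
    // An odd size is required so that the window has a single middle residue.
    assert(windowSize % 2 == 1);
    std::vector<std::string> windows;
    for (std::size_t start = 0; start + windowSize <= sequence.size(); ++start) {
        windows.push_back(sequence.substr(start, windowSize));
    }
    return windows;
}
```

For the sequence ACDEFGH and a window size of 5, this yields ACDEF, CDEFG, and DEFGH, whose middle residues D, E, and F are the ones that would be classified.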
Running the network (contd.) • The classification into seven categories may help improve the accuracy of classifying into alpha helix, beta sheet, and coil.
Future Work • Selecting the class by taking the maximum may result in errors if two networks produce outputs that are almost equal. • Fuzzy logic may be used to select the output in such a case. • Normalized values for each amino acid may also be chosen based on the shape of the molecule.
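Detecting the near-tie case is a prerequisite for handling it. The sketch below only flags such a case and does not implement the fuzzy-logic selection proposed above; the name `IsAmbiguous` and the `margin` parameter are assumptions.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical helper: returns true when the two largest network outputs
// differ by less than the given margin, i.e. the maximum is not a clear winner.
bool IsAmbiguous(const std::vector<double>& outputs, double margin) {
    assert(outputs.size() >= 2);
    double best = std::numeric_limits<double>::lowest();
    double second = std::numeric_limits<double>::lowest();
    for (double value : outputs) {
        if (value > best) {
            second = best;
            best = value;
        } else if (value > second) {
            second = value;
        }
    }
    return (best - second) < margin;
}
```

A caller could fall back to a different selection rule whenever `IsAmbiguous` returns true and use the plain maximum otherwise.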
Thank you • Questions?