Use of Neural Networks to Predict and Analyze Membrane Proteins in the Proteome. Subrata Kumer Bose, Intelligent Systems Research Centre, London Metropolitan University, UK.
Abstract
• Transmembrane (TM) proteins are among the most understudied groups of proteins in biochemical research, because structural information about transmembrane regions is technically difficult to obtain. X-ray crystallography has yielded 3D structures for about 15000 proteins, but only about 30 of these are transmembrane proteins, even though TM proteins may account for about 30% of the proteome. This project seeks to contribute to knowledge and understanding in the field of neural networks through the development of a particular area of theory and the application of a novel methodology. It aims to develop software that analyses protein sequences for the presence of membrane-spanning regions using artificial neural network approaches. The expected benefits include a better understanding of how to create and train optimal neural networks for membrane protein datasets, which should be useful in both academia and industry.
Introduction: Bioinformatics
• Bioinformatics is the application of computer technology to the management of biological information. Computers are used to gather, store, analyze and integrate biological and genetic information, which can then be applied to gene-based drug discovery and development.
Introduction (continued): Membrane Proteins
• In recent years, many bioinformaticians have worked on the prediction of globular proteins, which make up roughly 75% of the proteome.
• However, membrane proteins, which make up 20-30% of the proteome and offer more novel targets for new drug development, have been largely ignored.
Introduction (continued): Data Mining
• Data mining (or, more precisely, knowledge extraction) can be described as the process of discovering previously unknown dependencies and relationships in data sets.
• A (learning) system may discover salient features in the input data whose importance was not previously recognized.
• It is now established that algorithms can be designed which extract understandable representations from trained neural networks, enabling them to be used for data mining (Browne et al. 2004).
Introduction (continued): Knowledge Extraction
• In the past, most data mining has been performed using symbolic artificial intelligence algorithms such as C4.5, C5 or CART.
• Neural Networks (NNs) have in the past been treated as ‘black boxes’: systems unable to explain the process by which a decision or output has been reached.
Objectives of the investigation
• The project seeks to develop software for analysing protein sequences for the presence of membrane-spanning regions using artificial neural network approaches. Beyond simply identifying membrane-spanning regions, the approach would be used to analyse biologically useful subsets of proteins with membrane-spanning regions, which would include:
• (i) The large family of G-protein coupled receptors (GPCRs). These form an important group of drug targets of interest to the pharmaceutical industry, and are the site of interaction of many hormones, neurotransmitters and other chemical stimuli around the body. Attempts have been made to develop methods for predicting the coupling specificity of GPCRs using hidden Markov models (Möller et al. 2001b), and the project would extend work in this area.
Objectives of the investigation (continued)
• (ii) Membrane proteins with distinct cellular locations. Prediction of the localization of membrane proteins to the Golgi apparatus has been attempted (Yuan and Teasdale 2002), and it would be useful to attempt analysis of proteins localized to other membrane compartments, such as the plasma membrane, endoplasmic reticulum, lysosomes and peroxisomes, to look for discriminating motifs in membrane-spanning regions in addition to known localizing signals.
Objectives of the investigation (continued)
• The methodology could also be applied to membrane proteins unique to bacteria and other micro-organisms, and could potentially identify new targets for antibiotics.
Background: the relationship of this investigation to previous work in the area
• A large number of researchers are investigating globular proteins because the data are easily available.
• The prediction of membrane protein structures is a key area that remains unsolved (Baldi et al. 2002).
• There have been several attempts over the last 20 years to develop tools for predicting membrane-spanning regions, reviewed recently by Möller et al. (2001a).
Current Tools: the relationship of this investigation to previous work in the area
• The problem of prediction is made topologically more complex by the presence of several transmembrane domains in many proteins, and the same authors (Möller et al., 2001b) conclude that current tools are far from achieving 95% reliability in prediction. The same group notes that the software developed so far falls broadly into two categories: local approaches and global approaches.
Neural Networks: Neuron
• An artificial neuron is an information processing element that operates in a manner resembling, in simplified form, the operation of a biological neuron. A collection of several such elements, connected together and able to process information in parallel, is a network of artificial neurons.
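The idea of a neuron as a small processing element can be sketched in a few lines. This is only an illustration: the threshold rule, the function names and the AND/OR weights are assumptions chosen for the example and do not come from the slides.

```python
def artificial_neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of the inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def neuron_layer(inputs, neurons):
    """A collection of neurons, each reading the same inputs independently."""
    return [artificial_neuron(inputs, w, t) for w, t in neurons]


# Two hypothetical neurons: the first behaves like AND, the second like OR.
neurons = [([1.0, 1.0], 2.0), ([1.0, 1.0], 1.0)]
outputs = [neuron_layer([a, b], neurons) for a in (0, 1) for b in (0, 1)]
```

Each neuron sees the same input signals and produces its own output, which is the sense in which a collection of such elements processes information in parallel.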
Neural Networks: Definition
• According to Haykin (1994), a neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
• Knowledge is acquired by the network through a learning process.
• Interneuron connection strengths, known as synaptic weights, are used to store the knowledge.
Architecture of the Neural Network: Network Used
(Figure: feed-forward network. Input signals x1, x2, ..., xm enter a layer of input neurons, followed by a layer of hidden neurons and a layer of output neurons producing outputs y1 and y2. Input: amino acid sequence. Output classes: membrane protein, nonmembrane protein.)
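The layered structure in the figure can be sketched as a forward pass. This is a minimal sketch under stated assumptions: the layer sizes, the random weights, the sigmoid transfer function and the choice of y1 as the "membrane" output are all hypothetical, since the slide gives only the figure labels.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_layer(n_inputs, n_neurons, rng):
    """One weight row per neuron; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_neurons)]


def forward(layer, signals):
    """Propagate signals through one fully connected layer."""
    return [sigmoid(sum(w * s for w, s in zip(row, signals)) + row[-1])
            for row in layer]


def classify(x, hidden_layer, output_layer):
    """Input signals -> hidden neurons -> two output neurons (y1, y2)."""
    hidden = forward(hidden_layer, x)
    y1, y2 = forward(output_layer, hidden)
    # Assumption: y1 stands for "membrane", y2 for "nonmembrane".
    label = "membrane" if y1 >= y2 else "nonmembrane"
    return label, (y1, y2)


rng = random.Random(0)
m, n_hidden = 8, 4  # hypothetical sizes, not taken from the slides
hidden_layer = make_layer(m, n_hidden, rng)
output_layer = make_layer(n_hidden, 2, rng)
x = [rng.random() for _ in range(m)]  # stand-in for an encoded sequence
label, (y1, y2) = classify(x, hidden_layer, output_layer)
```

With untrained random weights the label is meaningless; the point is only the flow of signals from x1..xm through the hidden layer to y1 and y2.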
Architecture of the Neural Network: Steps Involved
• Data Collection: ensuring that the correct data are gathered.
• Data Preparation: cleaning the data, and ensuring that they are in the appropriate format for Neural Connection.
• Design: choosing the best neural approach (here, the MLP).
• Training and Testing: building the application.
• Experimentation: tailoring the application to improve the results.
• Implementation: producing the results.
Architecture of the Neural Network: Data Collection
Architecture of the Neural Network: Data Preparation
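The encoding used in the project is not stated. As a sketch of what data preparation for sequence input often involves, the example below cleans a raw sequence, cuts it into fixed-width windows and one-hot encodes each residue. The window width, the helper names and the choice of one-hot encoding are assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def clean_sequence(raw):
    """Upper-case the sequence and drop whitespace; reject unknown letters."""
    seq = "".join(raw.split()).upper()
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unexpected residue codes: {sorted(unknown)}")
    return seq


def windows(seq, width):
    """Fixed-width windows, so every network input has the same length."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]


def one_hot(seq):
    """Encode each residue as 20 values containing a single 1.0."""
    vector = []
    for residue in seq:
        column = [0.0] * len(AMINO_ACIDS)
        column[AMINO_ACIDS.index(residue)] = 1.0
        vector.extend(column)
    return vector


seq = clean_sequence("mkt ll\nvA")  # made-up fragment, for illustration only
encoded = [one_hot(w) for w in windows(seq, 5)]  # hypothetical window width
```

Each window of 5 residues becomes a vector of 100 numbers, which could then be presented to the input layer of a network.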
Architecture of the Neural Network: Design
• The first consideration in designing the application was the neural technique to be adopted. This type of problem, where we want to link a set of inputs (a sequence of amino acids) to an output (membrane, nonmembrane, or a subclass of membrane protein), should be solved using a supervised neural technique. There are three supervised neural techniques in Neural Connection: the Radial Basis Function, the Bayesian Network and the Multi-Layer Perceptron (MLP).
Architecture of the Neural Network: Design (continued)
• MLPs are the most commonly used neural computing technique.
• The MLP differs from the Simple Perceptron in two major ways.
• Firstly, it has an additional layer of neurons between the input and output layers, known as the hidden layer. This layer vastly increases the learning power of the MLP.
• Secondly, it uses a transfer, or activation, function to modify the input to a neuron.
• The activation (the weighted sum of inputs) of hidden and output layer neurons is computed as in the Simple Perceptron, while the transfer function is a smooth non-linear function, usually the sigmoid function.
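A small worked case shows what the hidden layer and the sigmoid add. XOR cannot be computed by a single-layer perceptron, but two hidden sigmoid neurons feeding one output neuron can compute it. The weights below are hand-picked for illustration (they are not learned and are not taken from the project).

```python
import math


def sigmoid(z):
    """Smooth non-linear transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs (the activation), passed through the sigmoid."""
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)


def mlp_xor(x1, x2):
    # Hidden layer: one neuron approximates OR, the other NAND.
    h_or = neuron([x1, x2], [20.0, 20.0], -10.0)
    h_nand = neuron([x1, x2], [-20.0, -20.0], 30.0)
    # Output neuron approximates AND of the two hidden outputs.
    return neuron([h_or, h_nand], [20.0, 20.0], -30.0)


xor_table = {(a, b): round(mlp_xor(a, b)) for a in (0, 1) for b in (0, 1)}
```

Rounding the output gives 0, 1, 1, 0 for the four input pairs, which no choice of weights achieves without the hidden layer.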
Architecture of the Neural Network: Training and Testing
(Figure: training cycles.)
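Training over repeated cycles and testing on held-out examples can be sketched as follows. This is not the Neural Connection procedure: it is a plain online backpropagation loop on synthetic data, and the network size, learning rate, number of cycles and the toy labelling rule are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init(n_in, n_hidden, rng):
    """Small random weights; the last entry of each row is the bias."""
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return w_h, w_o


def predict(x, w_h, w_o):
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w_h]
    out = sigmoid(sum(w * h for w, h in zip(w_o, hidden)) + w_o[-1])
    return hidden, out


def train_cycle(data, w_h, w_o, rate):
    """One pass over the training set; returns the mean squared error seen."""
    total = 0.0
    for x, target in data:
        hidden, out = predict(x, w_h, w_o)
        err = target - out
        total += err * err
        delta_o = err * out * (1.0 - out)
        # Hidden deltas use the output weights before they are updated.
        delta_h = [delta_o * w_o[j] * h * (1.0 - h) for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            w_o[j] += rate * delta_o * h
        w_o[-1] += rate * delta_o
        for j, row in enumerate(w_h):
            for i, v in enumerate(x):
                row[i] += rate * delta_h[j] * v
            row[-1] += rate * delta_h[j]
    return total / len(data)


rng = random.Random(1)


def sample():
    """Synthetic example: label 1.0 when the four features sum to more than 2."""
    x = [rng.random() for _ in range(4)]
    return x, 1.0 if sum(x) > 2.0 else 0.0


examples = [sample() for _ in range(80)]
train, test = examples[:60], examples[60:]  # held-out test set
w_h, w_o = init(4, 3, rng)
errors = [train_cycle(train, w_h, w_o, rate=0.5) for _ in range(200)]
correct = sum((predict(x, w_h, w_o)[1] > 0.5) == (t == 1.0) for x, t in test)
accuracy = correct / len(test)
```

Plotting `errors` against the cycle number gives a training curve of the kind the slide title refers to, while `accuracy` is measured only on examples the network never saw during training.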
Results
Conclusions
• This technique demonstrates that it is possible to combine the generalization accuracy of NNs with the comprehensibility generated by the knowledge extraction method.
• Preliminary results will be analysed and further improvements will be designed.
Conclusions (continued)
• Modern data gathering techniques are producing vast amounts of data. However, data can be useless in the absence of understanding.
• The extraction of decision trees from trained NNs is an important addition to the data mining toolkit of knowledge extraction techniques (Browne and Sun 1999, 2001).
• The combination of NNs with an algorithm to extract knowledge from the trained networks potentially offers the ‘best of both worlds’ to those attempting to make predictions on their data and simultaneously understand it.
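The general idea of extracting a readable rule from a trained network can be sketched by treating the network as a black box, labelling sample inputs with it, and fitting the simplest possible decision tree (a single split) to those labels. This is not the extraction algorithm of Browne et al.; the stand-in network, its weights and the one-split tree are assumptions made purely for illustration.

```python
import math
import random


def network(x):
    """Stand-in for a trained network: hand-set, hypothetical weights."""
    z = 6.0 * x[0] - 1.0 * x[1] - 2.5
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(samples, labels):
    """Find the single test 'x[feature] > threshold' that best reproduces the labels."""
    best = None
    for feature in range(len(samples[0])):
        for threshold in sorted({s[feature] for s in samples}):
            agree = sum((s[feature] > threshold) == lab
                        for s, lab in zip(samples, labels))
            # The positive class may lie above or below the threshold.
            for positive_above, score in ((True, agree), (False, len(samples) - agree)):
                if best is None or score > best[0]:
                    best = (score, feature, threshold, positive_above)
    score, feature, threshold, positive_above = best
    return feature, threshold, positive_above, score / len(samples)


rng = random.Random(0)
samples = [[rng.random(), rng.random()] for _ in range(200)]
labels = [network(s) > 0.5 for s in samples]  # the network's own decisions
feature, threshold, positive_above, fidelity = fit_stump(samples, labels)
```

The resulting rule has the form "predict positive when input `feature` exceeds `threshold`", and `fidelity` is the fraction of samples on which the rule agrees with the network, a measure of how faithfully the readable rule mirrors the black box.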
References
• Bose, S., Browne, A., Hassan, K. and White, K. (2003). Knowledge Discovery in Bioinformatics using Neural Networks. Proceedings of the 6th International Conference on Computer and Information Technology, Dhaka, Bangladesh.
• Baldi, P. and Pollastri, G. (2002). Machine Learning Structural and Functional Proteomics. IEEE Intelligent Systems (Intelligent Systems in Biology II), March/April 2002.
• Browne, A., Hudson, B. D., Whitley, D. C., Ford, M. G. and Picton, P. (2003). Biological Data Mining with Neural Networks: Implementation & Application of a Flexible Decision Tree Extraction Algorithm to Genomic Problem Domains. Neurocomputing: Special Issue on Neural Networks in Bioinformatics (In Press). ISSN: 0925-2312.
• Browne, A., Hudson, B. D., Whitley, D. C., Ford, M. G. and Picton, P. (2004). Biological data mining with neural networks: Implementation and application of a flexible decision tree extraction algorithm to genomic problem domains. Neurocomputing (In Press). ISSN: 0925-2312.
• Browne, A. (2002). Representation and extrapolation in multi-layer perceptrons. Neural Computation, 14(7), 1739-1754. ISSN: 0899-7667.
• Browne, A. and Sun, R. (2001). Connectionist inference models. Neural Networks, 14(10), 1331-1355. ISSN: 0893-6080.
• Browne, A. and Sun, R. (1999). Connectionist variable binding. Expert Systems: The International Journal of Knowledge Engineering and Neural Networks, 16(3), 189-207. ISSN: 0266-4720.
• Browne, A. and Picton, P. (1999). Two analysis techniques for feed-forward networks. Behaviormetrika: Special Issue on Analysis of Knowledge Representations in Neural Network Models, 26(1), 75-87. ISSN: 0385-7417.
• Möller, S., Croning, M. D. R. and Apweiler, R. (2001a). Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics, 17: 646-653.
• Möller, S., Vilo, J. and Croning, M. D. R. (2001b). Prediction of the coupling specificity of G protein coupled receptors to their G proteins. Bioinformatics, 17: 174S-181S.
• Möller, S., Kriventseva, E. V. and Apweiler, R. (2000). A collection of well characterized integral membrane proteins. Bioinformatics, 16: 1159-1160.
• Yang, S. and Browne, A. (2002a). Multistage neural networks: Adaptive combination of ensemble results. Proceedings of the Fourth International Conference on Recent Advances in Soft Computing (RASC2002), Nottingham, UK.
• Yang, S. and Browne, A. (2002b). Multistage Neural Network Ensembles. Proceedings of the Third International Workshop on Multiple Classifier Systems, Cagliari, Italy, published as Lecture Notes in Computer Science 2364, Springer Verlag, Berlin, Heidelberg.