Genetic Algorithms Select Protein Features Most Predictive of Enzyme Function
Andrew Kernytsky, Burkhard Rost
Columbia University
Enzyme function prediction
Given a protein sequence, predict its Enzyme Commission (EC) number. Top-level EC classes: Oxidoreductases, Transferases, Hydrolases, Lyases, Isomerases, Ligases.
NC-IUBMB (1992). Recommendations of the International Union of Biochemistry on the Nomenclature and Classification of Enzymes. In: Enzyme Nomenclature. Academic Press, New York.
EC wheel figure: Porter CT, Bartlett GJ, Thornton JM. Nucleic Acids Res. 2004 Jan 1;32:D129–D133.
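A minimal Python sketch of how an EC number encodes this hierarchy: four dot-separated levels, the first of which picks one of the six classes listed above. The helper names `parse_ec` and `top_level_class` are hypothetical, and EC 3.4.21.4 (trypsin) serves only as an example. Newer releases of the nomenclature also define a seventh class (EC 7, translocases), which this sketch does not map.

```python
# Top-level EC classes as listed on the slide (1992 nomenclature: six classes).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec(ec: str) -> list[str]:
    """Split an EC number such as '3.4.21.4' into its four hierarchical levels."""
    parts = ec.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    return parts


def top_level_class(ec: str) -> str:
    """Map the first EC field to its class name (KeyError for classes not listed)."""
    return EC_CLASSES[int(parse_ec(ec)[0])]


print(parse_ec("3.4.21.4"), top_level_class("3.4.21.4"))
```

Predicting "the EC number" therefore means predicting a path down this tree, and a prediction can be right at the coarse levels while wrong at the finer ones.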
Limited local information
[Figure: example per-residue feature tracks; labels "All Global" and "All Intersection"]
AA      TAGHCVNYDYGAGCQSGSPV
Acc     bbbbbieeeiibbieeeeee
Cons    ..|....|......||....
Feat 4  HHHEEEEELLEEEEELLLLL
Feat 5  iiibbbbbbboooobbbbbb
Feat 6  36788842100000000123
Significant risk of overfitting during training: 10³+ features > 10² positive samples
Intersection properties (e.g., AA×sec) capture local information
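To make the global versus intersection distinction concrete, here is a minimal Python sketch. It assumes a global feature is amino-acid composition over the whole sequence and an intersection feature is the composition of (amino acid, secondary-structure state) pairs, i.e. AA×sec. These definitions, the H/E/L state alphabet, and the function names are assumptions for illustration, not the authors' exact encoding; the two example strings are the AA and Feat 4 tracks from the slide.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SEC_STATES = "HEL"  # assumed: helix, strand, loop (the Feat 4 track alphabet)


def global_composition(seq: str) -> dict[str, float]:
    """Global feature (assumed): amino-acid frequencies over the whole sequence."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def intersection_composition(seq: str, sec: str) -> dict[str, float]:
    """Intersection feature (assumed AA×sec): frequency of each (residue, state) pair."""
    if len(seq) != len(sec):
        raise ValueError("tracks must be aligned residue by residue")
    counts = Counter(zip(seq, sec))
    return {f"{aa}/{s}": counts[(aa, s)] / len(seq)
            for aa, s in product(AMINO_ACIDS, SEC_STATES)}


# Example segment: the AA and Feat 4 tracks shown on the slide.
seq = "TAGHCVNYDYGAGCQSGSPV"
sec = "HHHEEEEELLEEEEELLLLL"

g = global_composition(seq)
x = intersection_composition(seq, sec)
print(len(g), len(x))    # 20 global vs 60 intersection features
print(g["G"], x["G/E"])  # glycine overall vs glycine inside strands

# Joint classes grow multiplicatively, e.g. AA×sec×acc with 3 accessibility states.
n_joint = math.prod([len(AMINO_ACIDS), len(SEC_STATES), 3])
print(n_joint)
```

The global vector only says how much glycine the segment contains; the AA×sec vector also says where it sits (here, two of the four glycines fall in strands). The multiplicative growth of joint classes (60 for AA×sec, 180 once a three-state accessibility track is added) helps explain how the feature count can outstrip the number of positive training samples.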
Algorithm overview: Genetic Algorithm
[Figure: genome populations evolving from the 1st to the 4th generation]
Genomes: all possible combinations of feature classes, drawn from all intersection and global feature classes (e.g., AA, sec, AA×sec)
Fitness assessed: an inner learning algorithm (neural network OR SVM) scores each genome's selected features, computed from the protein sequence
GA evolution: selection, crossover, and mutation produce the next generation's genome population
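A minimal Python sketch of the loop described on this slide, under stated assumptions: each genome is a bit string with one bit per feature class, selection is a tournament, crossover is one-point, mutation flips bits, and the best genome is carried over each generation (elitism). The slide names selection, crossover, and mutation but not these specific operators. The inner neural network or SVM is replaced by a hypothetical `toy_fitness` with made-up weights, so its output says nothing about real enzyme data; the feature-class names are also illustrative.

```python
import random

# Hypothetical feature classes; a genome holds one on/off bit per class.
FEATURE_CLASSES = ["AA", "sec", "acc", "AA×sec", "AA×acc", "sec×acc"]


def toy_fitness(genome: tuple[int, ...]) -> float:
    """Hypothetical stand-in for the inner NN/SVM score; NOT the authors' fitness."""
    gains = {"AA": 0.05, "sec": 0.02, "acc": 0.01,
             "AA×sec": 0.10, "AA×acc": 0.04, "sec×acc": 0.00}
    chosen = [c for c, bit in zip(FEATURE_CLASSES, genome) if bit]
    # Reward informative classes; penalize feature-set size (overfitting risk).
    return 0.5 + sum(gains[c] for c in chosen) - 0.015 * len(chosen)


def tournament(pop, scores, rng, k=3):
    """Selection: the best of k randomly drawn genomes."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """One-point crossover of two parent genomes."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(genome, rng, rate):
    """Bit-flip mutation: each bit flips with probability `rate`."""
    return tuple(bit ^ 1 if rng.random() < rate else bit for bit in genome)


def evolve(fitness, n_bits, pop_size=20, generations=15, rate=0.1, seed=0):
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_bits)) for _ in range(pop_size)]
    history = []  # best fitness of each generation, including the final one
    for _ in range(generations + 1):
        scores = [fitness(g) for g in pop]
        best = max(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best])
        if len(history) > generations:
            break
        # Elitism: the best genome survives unchanged into the next generation.
        nxt = [pop[best]]
        while len(nxt) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            nxt.append(mutate(crossover(parent_a, parent_b, rng), rng, rate))
        pop = nxt
    return pop[best], history[-1], history


genome, score, history = evolve(toy_fitness, len(FEATURE_CLASSES))
print([c for c, bit in zip(FEATURE_CLASSES, genome) if bit], round(score, 3))
```

Because of elitism, the best fitness per generation never decreases. Plugging in a real fitness would mean training the inner learner on the selected feature classes and returning a held-out score; the size penalty in `toy_fitness` only mimics the overfitting concern raised on the previous slide.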
GA improves performance
[Figure: performance by EC level]
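The slide reports performance per EC level without stating the scoring rule. A minimal Python sketch of one common convention, assumed here: a prediction counts as correct at level k when its first k EC fields match the true EC number. The function names and the example predictions are hypothetical.

```python
def match_at_level(pred: str, true: str, level: int) -> bool:
    """Assumed criterion: the first `level` EC fields agree."""
    return pred.split(".")[:level] == true.split(".")[:level]


def accuracy_by_level(preds: list[str], trues: list[str]) -> dict[int, float]:
    """Fraction of proteins predicted correctly at each EC level 1..4."""
    if not trues or len(preds) != len(trues):
        raise ValueError("need equal-length, non-empty prediction and label lists")
    return {
        level: sum(match_at_level(p, t, level) for p, t in zip(preds, trues)) / len(trues)
        for level in range(1, 5)
    }


# Hypothetical predictions and labels, for illustration only.
preds = ["3.4.21.4", "3.4.22.1", "2.7.11.1"]
trues = ["3.4.21.4", "3.4.21.5", "1.1.1.1"]
print(accuracy_by_level(preds, trues))
```

Under this criterion accuracy can only stay the same or drop as the level gets deeper, since matching k+1 fields implies matching the first k.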
Balance between intersection and global features gives best performance