Genetic Algorithms Select Protein Features Most Predictive of Enzyme Function

Andrew Kernytsky, Burkhard Rost, Columbia University. Enzyme function prediction: given a protein sequence, predict the Enzyme Commission (EC) number. EC classes: Ligases, Isomerases, Oxidoreductases, Lyases, Transferases, Hydrolases.

Presentation Transcript


  1. Genetic Algorithms Select Protein Features Most Predictive of Enzyme Function. Andrew Kernytsky, Burkhard Rost, Columbia University.

  2. Enzyme function prediction. Given a protein sequence, predict the Enzyme Commission (EC) number. EC classes: Ligases, Isomerases, Oxidoreductases, Lyases, Transferases, Hydrolases. References: NC-IUBMB (1992) Recommendations of the International Union of Biochemistry on the Nomenclature and Classification of Enzymes. In: Enzyme Nomenclature. Academic Press, New York. EC wheel figure: Porter CT, Bartlett GJ, Thornton JM. Nucleic Acids Res. 2004 January 1; 32: D129–D133.

  3. Limited local information. Intersection properties capture local information. Significant risk of overfitting during training: 10³+ features > 10² positive samples.

     Per-residue feature tracks shown on the slide:

     AA      TAGHCVNYDYGAGCQSGSPV
     Acc     bbbbbieeeiibbieeeeee
     Cons    ..|....|......||....
     Feat 4  HHHEEEEELLEEEEELLLLL
     Feat 5  iiibbbbbbboooobbbbbb
     Feat 6  36788842100000000123

     Figure labels: All Global 1%; All Intersection 0.1%; 0.01%; 20%, 10%, 5%.

  4. Algorithm overview: genetic algorithm. Diagram labels: protein sequence (M S N L L K D F E V A Q C); all intersection and global feature classes (entries such as AA, sec, AA×sec); all possible combinations of feature classes (genomes); inner learning algorithm (neural network or SVM); fitness assessed (0.635, 0.688, 0.677); GA evolution (selection, crossover, mutation); 1st, 2nd, 3rd, and 4th generation genome populations.

  5. GA improves performance. [Figure: performance by EC level.]

  6. Balance between intersection and global features gives best performance
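Slide 3 contrasts global features with intersection features such as AA×sec. A minimal Python sketch of one way to compute both from aligned per-residue tracks follows. Two assumptions are not taken from the slides: features are encoded as composition fractions, and the track labelled "Feat 4" is treated as secondary structure (H/E/L). The function names are hypothetical.

```python
from collections import Counter


def global_composition(track):
    """Fraction of positions holding each symbol in one per-residue track."""
    counts = Counter(track)
    n = len(track)
    return {symbol: count / n for symbol, count in counts.items()}


def intersection_composition(track_a, track_b):
    """Fraction of positions holding each (symbol_a, symbol_b) pair.

    Pairing two tracks position by position keeps local context that the
    two global compositions lose when computed separately.
    """
    if len(track_a) != len(track_b):
        raise ValueError("tracks must be aligned to the same residues")
    counts = Counter(zip(track_a, track_b))
    n = len(track_a)
    return {pair: count / n for pair, count in counts.items()}


# Tracks copied from slide 3; reading "Feat 4" as secondary structure
# is an assumption made for this example.
aa = "TAGHCVNYDYGAGCQSGSPV"
sec = "HHHEEEEELLEEEEELLLLL"

aa_global = global_composition(aa)             # one value per amino acid seen
aa_x_sec = intersection_composition(aa, sec)   # one value per (AA, sec) pair seen
```

With a full 20-letter amino acid alphabet and 3 secondary structure states, a single AA×sec class already has up to 60 entries, which illustrates how intersecting tracks drives the feature count past the number of positive samples.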
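The loop on slide 4 (genomes of feature classes, fitness from an inner learner, then selection, crossover, and mutation) can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: the genome is a bit mask over feature classes, selection is by tournament, crossover is single-point, and `toy_fitness` is a hypothetical stand-in for training a neural network or SVM and scoring it on held-out data. Class names other than AA, sec, and AA×sec are invented for the example.

```python
import random

# AA, sec and AA×sec appear on slide 4; the other names are hypothetical.
FEATURE_CLASSES = ["AA", "sec", "acc", "AA×sec", "AA×acc", "sec×acc"]


def toy_fitness(genome):
    """Hypothetical stand-in for the inner learner's validation score.

    Pretends classes 0 and 3 are informative and penalises every other
    selected class, mimicking the cost of extra features.
    """
    useful = {0, 3}
    hits = sum(1 for i, bit in enumerate(genome) if bit and i in useful)
    extras = sum(genome) - hits
    return hits - 0.25 * extras


def tournament_select(population, scores, rng, k=3):
    """Return the fittest of k randomly drawn genomes."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of parent_a, tail of parent_b."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(genome, rng, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in genome]


def evolve(fitness, n_classes, pop_size=20, generations=15, seed=0):
    """Evolve bit-mask genomes; return the best genome seen and its score."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_classes)]
                  for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for generation in range(generations):
        scores = [fitness(genome) for genome in population]
        for genome, score in zip(population, scores):
            if score > best_score:
                best, best_score = genome, score
        if generation == generations - 1:
            break  # the last population has been scored; no need to breed
        population = [
            mutate(crossover(tournament_select(population, scores, rng),
                             tournament_select(population, scores, rng),
                             rng), rng)
            for _ in range(pop_size)
        ]
    return best, best_score


best_genome, best_score = evolve(toy_fitness, len(FEATURE_CLASSES))
selected = [name for name, bit in zip(FEATURE_CLASSES, best_genome) if bit]
```

In a real setting the fitness call dominates the run time, since every genome requires training and evaluating a model, so population size and generation count are usually chosen with that budget in mind.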
