Explore a study on predicting immune responses using peptide microarrays and machine learning techniques. Understand how epitopes, antibodies, and peptide arrays play a crucial role in immune response prediction.
Predicting immune response from peptide microarrays. Mitja Luštrek [1 (2)], Peter Lorenz [2], Felix Steinbeck [2], Georg Füllen [2], Hans-Jürgen Thiesen [2]. [1] Department of Intelligent Systems, Jožef Stefan Institute; [2] University of Rostock
Introduction • Immune response prediction • Interpretation
Peptide = part of protein = short sequence of amino acids = string of letters from a 20-letter alphabet (1 letter = 1 amino acid, 20 standard amino acids). Example: SNDIVLT. Image taken from EMBL website.
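The string view of a peptide can be sketched in a few lines. This is only an illustration: the constant `STANDARD_AMINO_ACIDS` and the helper `is_valid_peptide` are hypothetical names, not part of the presented work.

```python
# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def is_valid_peptide(sequence):
    """Return True if the string is non-empty and uses only the 20 standard letters."""
    return len(sequence) > 0 and all(ch in STANDARD_AMINO_ACIDS for ch in sequence)


peptide = "SNDIVLT"  # example peptide from the slide
print(is_valid_peptide(peptide))
```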
Epitope [Figure: an antigen protein with epitopes marked; antibodies bind to the epitopes; a peptide is shown as part of the antigen protein. Labels: Epitope, Antigen protein, Antibody binding, Antibody, Peptide]
Peptide arrays: peptides (15 amino acids) on a glass slide form the peptide array, to which the IVIg antibody mixture is applied. Bound antibodies are detected with an antibody against antibody + dye. Red = epitopes (bind antibodies), black = non-epitopes.
Introduction • Immune response prediction • Interpretation
Our task: machine learning. Training set: 13,638 peptides (3,420 epitopes). Test set: 13,640 peptides (3,421 epitopes). Balanced until the final testing.
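The set sizes above imply roughly three non-epitopes per epitope. The sketch below checks that arithmetic and shows one possible way to balance a set, by random undersampling of non-epitopes. The slide does not say how balancing was done, so `undersample` is an assumption, and both function names are hypothetical.

```python
import random


def class_ratio(n_total, n_epitopes):
    """Number of non-epitopes per epitope."""
    return (n_total - n_epitopes) / n_epitopes


def undersample(epitopes, non_epitopes, seed=0):
    """Assumed balancing scheme: keep as many random non-epitopes as there are epitopes.

    Requires len(non_epitopes) >= len(epitopes).
    """
    rng = random.Random(seed)
    kept = rng.sample(non_epitopes, len(epitopes))
    return epitopes, kept


train_ratio = class_ratio(13638, 3420)  # training set sizes from the slide
test_ratio = class_ratio(13640, 3421)   # test set sizes from the slide
print(round(train_ratio, 2), round(test_ratio, 2))
```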
Machine learning: attribute representations 1 ... 8 → ML (SVM (SMO), logistic regression) → classifiers 1 ... 8 → ML (linear regression) → meta classifier → final probability for epitope p
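The two-level scheme (eight base classifiers feeding a linear-regression meta classifier) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the base-classifier probabilities are made-up toy numbers, only two base classifiers are shown, and the linear regression is fitted with plain gradient descent rather than whatever tool the authors used. The names `fit_linear_meta` and `meta_predict` are hypothetical.

```python
def fit_linear_meta(base_probs, labels, lr=0.1, epochs=2000):
    """Fit p = b + sum(w_i * p_i) by least squares using batch gradient descent."""
    n = len(base_probs)
    n_feat = len(base_probs[0])
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y in zip(base_probs, labels):
            err = b + sum(wi * xi for wi, xi in zip(w, x)) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return w, b


def meta_predict(w, b, x):
    """Combine base-classifier probabilities and clip the result to [0, 1]."""
    raw = b + sum(wi * xi for wi, xi in zip(w, x))
    return min(1.0, max(0.0, raw))


# Toy data: each row holds the outputs of two hypothetical base classifiers.
base_probs = [[0.9, 0.8], [0.7, 0.9], [0.2, 0.3], [0.1, 0.2]]
labels = [1, 1, 0, 0]  # 1 = epitope, 0 = non-epitope
w, b = fit_linear_meta(base_probs, labels)
print(meta_predict(w, b, [0.9, 0.9]), meta_predict(w, b, [0.1, 0.1]))
```

In a real stacking setup the meta classifier would normally be trained on base-classifier outputs for peptides the base classifiers did not see during training, to avoid overfitting.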
Attribute representation 1 Amino-acid counts
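A minimal sketch of this representation, assuming one attribute per standard amino acid in a fixed alphabetical order of one-letter codes (the ordering and the name `amino_acid_counts` are illustrative choices):

```python
from collections import Counter

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_counts(peptide):
    """Return a 20-element vector with the count of each standard amino acid."""
    counts = Counter(peptide)
    return [counts[aa] for aa in STANDARD_AMINO_ACIDS]


vector = amino_acid_counts("SNDIVLT")
print(vector)
```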
Attribute representation 2 Amino-acid count differences
Attribute representation 3 Subsequence counts
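One possible reading of this representation is counting contiguous subsequences of a fixed length k. The slide states neither the length nor whether gaps are allowed, so both are assumptions here, and `subsequence_counts` is a hypothetical name.

```python
from collections import Counter


def subsequence_counts(peptide, k=2):
    """Count every contiguous subsequence of length k in the peptide."""
    return Counter(peptide[i:i + k] for i in range(len(peptide) - k + 1))


counts = subsequence_counts("SNDSN", k=2)
print(counts)
```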
Attribute representation 4 Amino-acid class counts
Attribute representation 5 Amino-acid class subsequence counts
Attribute representation 6 Amino-acid pair counts. Rationale: antibodies may bind in two places due to their two-chain structure.
Attribute representation 7 Amino acids at distances from first + first amino acid. Rationale: antibodies may bind in two places; the first amino acid is the most accessible on the peptide array.
Attribute representation 8 Average amino-acid properties
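As an illustration, a peptide can be summarized by the mean of a per-residue property. The slide does not say which properties were used; the Kyte-Doolittle hydropathy scale below is an assumed example, and `average_property` is a hypothetical helper.

```python
# Kyte-Doolittle hydropathy values (assumed example property).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def average_property(peptide, scale=KYTE_DOOLITTLE):
    """Mean value of the given property over all residues of the peptide."""
    return sum(scale[aa] for aa in peptide) / len(peptide)


print(average_property("SNDIVLT"))
```

Other scales (charge, volume, polarity) could be plugged in through the `scale` argument, giving one attribute per property.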
Attribute representation 9 (not used) Amino-acid counts with a difference. Equivalent for epitope prediction? • Count F as: 1 F, 0.8 W, 0.4 Y, ... • Count W as: 1 W, 0.7 F, 0.3 Y, ...
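The weighted counting idea can be sketched as follows. Only the weights shown on the slide (for F and W) are included; the elided weights are left out, and any amino acid without an entry falls back to an ordinary count. The names `SIMILARITY` and `weighted_count` are hypothetical.

```python
# Weights taken from the slide; the remaining entries are elided there and not filled in here.
SIMILARITY = {
    "F": {"F": 1.0, "W": 0.8, "Y": 0.4},
    "W": {"W": 1.0, "F": 0.7, "Y": 0.3},
}


def weighted_count(peptide, target):
    """Count `target`, giving partial credit to amino acids listed as similar to it."""
    weights = SIMILARITY.get(target, {target: 1.0})  # fallback: plain count
    return sum(weights.get(aa, 0.0) for aa in peptide)


print(weighted_count("FWY", "F"), weighted_count("FWY", "W"))
```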
Attribute representation 9 (not used) Amino-acid substitution matrix. Optimize with a genetic algorithm to maximize classification accuracy.
Results – test set. State of the art: SVM + string kernel (EL-Manzalawy et al., 2008), trained and tested on our data.
Balanced (epitope : non-epitope = 1 : 1): our results 0.883 / 83.7 %; EL-Manzalawy 0.868 / 82.0 %
Original (epitope : non-epitope = 1 : 3): our results 0.884 / 85.9 %; EL-Manzalawy 0.874 / 83.9 %