
Biological sequence analysis and information processing by artificial neural networks



  1. Biological sequence analysis and information processing by artificial neural networks

  2. Objectives
     [Diagram: Input → Neural network → Output]
     • Neural network:
       • is a black box that no one can understand
       • overpredicts performance

  3. Pairwise alignment

     >carp    Cyprinus carpio growth hormone, 210 aa
     >chicken Gallus gallus growth hormone, 216 aa
     Scoring matrix: BLOSUM50, gap penalties: -12/-2
     40.6% identity; Global alignment score: 487

              10        20        30        40        50        60        70
     carp    MA--RVLVLLSVVLVSLLVNQGRASDN-----QRLFNNAVIRVQHLHQLAAKMINDFEDSLLPEERRQLSKIFPLSFCNSD
             :: . : ...:.: . :  :.  . ::    :::.:.::::  :::. ..::  . .::..: .:  .:: :.
     chicken MAPGSWFSPLLIAVVTLGLPQEAAATFPAMPLSNLFANAVLRAQHLHLLAAETYKEFERTYIPEDQRYTNKNSQAAFCYSE
              10        20        30        40        50        60        70        80

              80        90       100       110       120       130       140       150
     carp    YIEAPAGKDETQKSSMLKLLRISFHLIESWEFPSQSLSGTVSNSLTVGNPNQLTEKLADLKMGISVLIQACLDGQPNMDDN
             : ::.:::..:..: ..:::.:. ::.:: : : ::. .:.:. :. ... ::: ::. ::..:.. : .: .
     chicken TIPAPTGKDDAQQKSDMELLRFSLVLIQSWLTPVQYLSKVFTNNLVFGTSDRVFEKLKDLEEGIQALMRELEDRSPR---G
              90       100       110       120       130       140       150       160

             170       180       190       200       210
     carp    DSLPLP-FEDFYLTM-GENNLRESFRLLACFKKDMHKVETYLRVANCRRSLDSNCTL
             .: : .. : . . .:. : ... ::.:::::.:::::::.: .::: .::::.
     chicken PQLLRPTYDKFDIHLRNEDALLKNYGLLSCFKKDLHKVETYLKVMKCRRFGESNCTI
             170       180       190       200       210
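
  A minimal sketch of how this alignment could be reproduced with Biopython's
  PairwiseAligner (assumptions: Biopython is installed, and only the first
  block of each sequence above is used as a stand-in for the full 210/216 aa
  sequences, so the printed score will differ from 487):

      from Bio import Align
      from Bio.Align import substitution_matrices

      # Global alignment with the scoring scheme from the slide:
      # BLOSUM50 with gap penalties -12 (open) / -2 (extend)
      aligner = Align.PairwiseAligner()
      aligner.mode = "global"
      aligner.substitution_matrix = substitution_matrices.load("BLOSUM50")
      aligner.open_gap_score = -12
      aligner.extend_gap_score = -2

      # First block of each growth hormone, gaps removed
      carp = "MARVLVLLSVVLVSLLVNQGRASDNQRLFNNAVIRVQHLHQLAAKMINDFEDSLLPEERRQLSKIFPLSFCNSD"
      chicken = "MAPGSWFSPLLIAVVTLGLPQEAAATFPAMPLSNLFANAVLRAQHLHLLAAETYKEFERTYIPEDQRYTNKNSQAAFCYSE"

      alignment = aligner.align(carp, chicken)[0]
      print(alignment.score)
      print(alignment)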

  4. HUNKAT (Danish: "female cat")

  5. Biological neural network

  6. Biological neuron

  7. Diversity of interactions in a network enables complex calculations
     • Similar in biological and artificial systems
     • Excitatory (+) and inhibitory (-) relations between computing units

  8. Biological neuron structure

  9. Transfer of biological principles to artificial neural network algorithms
     • Non-linear relation between input and output
     • Massively parallel information processing
     • Data-driven construction of algorithms
     • Ability to generalize to new data items

  10. Neural networks
      • Neural networks can learn higher order correlations!
        XOR function:
        0 0 => 0
        1 0 => 1
        0 1 => 1
        1 1 => 0
      [Plot: the four points (0,0), (1,0), (0,1), (1,1) in the (x1, x2) plane;
      no linear function can separate the points]
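
  A quick numerical illustration of this point (not from the slides): scan a
  grid of weights w1, w2 and thresholds t and check that no single linear
  threshold unit y = (w1*x1 + w2*x2 > t) reproduces XOR.

      import itertools
      import numpy as np

      xor = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 0}

      # Try every weight/threshold combination on a coarse grid
      grid = np.linspace(-10, 10, 41)
      found = any(
          all((w1 * x1 + w2 * x2 > t) == bool(y) for (x1, x2), y in xor.items())
          for w1, w2, t in itertools.product(grid, repeat=3)
      )
      print(found)  # False: no linear function separates the four points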

  11. Error estimates
      XOR:        0 0 => 0   1 0 => 1   0 1 => 1   1 1 => 0
      Predicted:  0          1          1          1
      Error:      0          0          0          1
      Mean error: 1/4
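
  The mean error computed directly, as the average absolute difference between
  target and prediction over the four patterns:

      targets = [0, 1, 1, 0]      # XOR truth table
      predictions = [0, 1, 1, 1]  # network output from the slide

      errors = [abs(t - p) for t, p in zip(targets, predictions)]
      print(sum(errors) / len(errors))  # 0.25, i.e. 1/4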

  12. Neural networks: linear function
      [Diagram: a network with no hidden layer; the two inputs connect
      directly to the output through weights v1 and v2]

  13. Neural networks: higher order function
      [Diagram: a network with a hidden layer; weights w11, w12, w21, w22
      connect the two inputs to two hidden units, and weights v1, v2 connect
      the hidden units to the output]

  14. Neural networks. How does it work?
      [Diagram: two inputs plus a constant bias input of 1; weights w11, w21,
      wt1 feed hidden unit 1, weights w12, w22, wt2 feed hidden unit 2, and
      weights v1, v2, vt feed the output unit]

  15. Neural networks: input (0 0)
      [Diagram: the network with hidden-unit weights (4, 4, -6) and (6, 6, -2)
      on (x1, x2, bias) and output weights (-9, 9, -4.5) on (O1, O2, bias)]
      o1 = -6 → O1 = 0; o2 = -2 → O2 = 0
      y1 = -4.5 → Y1 = 0

  16. Neural networks: inputs (1 0) and (0 1)
      o1 = -2 → O1 = 0; o2 = 4 → O2 = 1
      y1 = 4.5 → Y1 = 1

  17. Neural networks: input (1 1)
      o1 = 2 → O1 = 1; o2 = 10 → O2 = 1
      y1 = -4.5 → Y1 = 0

  18. What is going on?
      XOR function:
      0 0 => 0
      1 0 => 1
      0 1 => 1
      1 1 => 0
      [Diagram: the same network, with the hidden-unit outputs labelled y1 and y2]
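
  A minimal Python sketch of the forward pass on slides 15-17 (an assumption:
  the hidden units carry weights (4, 4, -6) and (6, 6, -2) on (x1, x2, bias)
  and the output unit carries (-9, 9, -4.5) on (O1, O2, bias), which is the
  assignment that reproduces every o, O, y and Y value shown; the activation
  is a step function, 1 when the summed input is positive):

      import numpy as np

      def step(x):
          # Threshold activation: output 1 when the summed input is positive
          return (x > 0).astype(int)

      W_hidden = np.array([[4.0, 4.0, -6.0],   # hidden unit 1: x1, x2, bias
                           [6.0, 6.0, -2.0]])  # hidden unit 2: x1, x2, bias
      v_out = np.array([-9.0, 9.0, -4.5])      # output unit: O1, O2, bias

      def forward(x1, x2):
          x = np.array([x1, x2, 1.0])          # append the constant bias input
          O = step(W_hidden @ x)               # hidden activations O1, O2
          y = v_out @ np.append(O, 1.0)        # summed input y1 to the output
          return int(y > 0)                    # final output Y1

      for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
          print(x1, x2, "=>", forward(x1, x2))  # reproduces the XOR table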

  19. What is going on?
      [Two plots: in input space (x1, x2) the four XOR points (0,0), (1,0),
      (0,1), (1,1) cannot be separated by a line; in hidden-unit space
      (y1, y2) the patterns fall on the points (0,0), (1,0) and (2,2), which
      a single line can separate]

  20. DEMO

  21. Training and error reduction

  22. Transfer of biological principles to neural network algorithms
      • Non-linear relation between input and output
      • Massively parallel information processing
      • Data-driven construction of algorithms

  23. Neural network training
      • A network contains a very large set of parameters
      • A network with 5 hidden neurons predicting binding for 9-mer peptides
        has 9 x 20 x 5 = 900 weights
      • Overfitting is a problem
      • Stop training when test performance is optimal (see the sketch below)
      [Plot: temperature vs. years, an example of fitting a noisy curve]
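
  A minimal, self-contained sketch of that stopping rule (illustrative, not
  from the slides): fit a model by gradient descent on synthetic data and keep
  the parameters from the epoch with the lowest held-out test error.

      import numpy as np

      rng = np.random.default_rng(0)
      x = rng.uniform(-1, 1, 100)
      y = 3 * x + rng.normal(scale=0.3, size=100)        # noisy targets
      x_tr, y_tr, x_te, y_te = x[:80], y[:80], x[80:], y[80:]

      w, b = 0.0, 0.0
      best_err, best_w, best_b = float("inf"), w, b
      for epoch in range(200):
          pred = w * x_tr + b
          w -= 0.1 * 2 * np.mean((pred - y_tr) * x_tr)   # gradient step for w
          b -= 0.1 * 2 * np.mean(pred - y_tr)            # gradient step for b
          test_err = np.mean((w * x_te + b - y_te) ** 2) # held-out error
          if test_err < best_err:                        # "stop" here: keep the
              best_err, best_w, best_b = test_err, w, b  # best-so-far weights
      print(round(best_err, 3), round(best_w, 2), round(best_b, 2))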

  24. Neural network training: cross validation
      Train on 4/5 of the data, test on the remaining 1/5
      => Produces 5 different neural networks, each with a different
      prediction focus
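
  A sketch of the 5-fold split (illustrative; the 25-element array stands in
  for real training examples):

      import numpy as np

      rng = np.random.default_rng(0)
      data = rng.permutation(25)          # stand-in for 25 training examples
      folds = np.array_split(data, 5)     # five equal parts

      for i, test_fold in enumerate(folds):
          train = np.concatenate([f for j, f in enumerate(folds) if j != i])
          # Train network i on `train` (4/5), evaluate it on `test_fold` (1/5)
          print(f"network {i}: {len(train)} train, {len(test_fold)} test")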

  25. Neural network training curve
      [Plot: performance vs. training time; the maximum test set performance
      marks the point where the network is most capable of generalizing]

  26. Network training: encoding of sequence data
      • Sparse encoding
      • Blosum encoding
      • Sequence profile encoding

  27. Sparse encoding of amino acid sequence windows

  28. Sparse encoding

      AAcid / Input neuron
          1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
      A   1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      R   0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      N   0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      D   0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      C   0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      Q   0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
      E   0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0
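
  A minimal sparse encoder following this table (the 9-mer peptide in the
  example is made up):

      import numpy as np

      ALPHABET = "ARNDCQEGHILKMFPSTWYV"   # amino acid order from the table

      def sparse_encode(peptide):
          # One block of 20 input neurons per residue: 1 at the residue's
          # position in the alphabet, 0 everywhere else
          enc = np.zeros((len(peptide), 20))
          for i, aa in enumerate(peptide):
              enc[i, ALPHABET.index(aa)] = 1.0
          return enc.flatten()

      print(sparse_encode("ALAKAVILK").shape)  # (180,): 9 residues x 20 neurons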

  29. BLOSUM encoding (BLOSUM62 matrix)

          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
      A   4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
      R  -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
      N  -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
      D  -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
      C   0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
      Q  -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
      E  -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
      G   0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
      H  -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
      I  -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
      L  -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
      K  -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
      M  -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
      F  -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
      P  -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
      S   1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
      T   0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
      W  -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
      Y  -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
      V   0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4

  30. Sequence encoding (continued)
      • Sparse encoding
        V:  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
        L:  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
        V·L = 0 (unrelated)
      • Blosum encoding
        V:  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
        L: -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
        V·L = 0.88 (highly related)
        V·R = -0.08 (close to unrelated)
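
  A numerical check of these numbers (an assumption: the slide's values look
  like normalized dot products, i.e. cosine similarities of the matrix rows;
  with the rows from slide 29 the results land close to the quoted values):

      import numpy as np

      # Rows for V, L and R copied from the matrix on slide 29
      V = np.array([0, -3, -3, -3, -1, -2, -2, -3, -3, 3, 1, -2, 1, -1, -2, -2, 0, -3, -1, 4])
      L = np.array([-1, -2, -3, -4, -1, -2, -3, -4, -3, 2, 4, -2, 2, 0, -3, -2, -1, -2, -1, 1])
      R = np.array([-1, 5, 0, -2, -3, 1, 0, -2, 0, -3, -2, 2, -1, -3, -2, -1, -1, -3, -2, -3])

      def cos(a, b):
          # Dot product normalized by the vector lengths
          return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

      print(round(cos(V, L), 2))  # ~0.87, close to the 0.88 on the slide
      print(round(cos(V, R), 2))  # ~-0.10, close to the -0.08 on the slide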

  31. Applications of artificial neural networks
      • Speech recognition
      • Prediction of protein secondary structure
      • Prediction of signal peptides
      • Post-translational modifications
        • Glycosylation
        • Phosphorylation
      • Proteasomal cleavage
      • MHC:peptide binding

  32. XOR function => higher order sequence correlations
      • Neural networks can learn higher order correlations!
      • What does this mean? Say that a peptide needs one and only one large
        amino acid in positions P3 and P4 to fill the binding cleft.
        How would you formulate this to test whether a peptide can bind?
        (one formulation is sketched below)
        S S => 0
        L S => 1
        S L => 1
        L L => 0
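
  One way to write down that rule (a sketch; the set of "large" residues is an
  assumption for illustration): a peptide fills the cleft when exactly one of
  P3 and P4 is large, which is precisely an XOR of two indicator variables.

      LARGE = set("FWYLIMKR")            # assumed "large" amino acids (illustrative)

      def fills_cleft(peptide):
          big_p3 = peptide[2] in LARGE   # position P3 (1-based numbering)
          big_p4 = peptide[3] in LARGE   # position P4
          return big_p3 != big_p4        # XOR: one and only one large residue

      print(fills_cleft("AASLAAAAA"))    # True:  S at P3, L at P4
      print(fills_cleft("AALLAAAAA"))    # False: large residues at both positions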

  33. What have we learned?
      • Neural networks are not as bad as their reputation
      • Neural networks can deal with higher order correlations
      • Be careful when training a neural network
      • Always use cross-validated training
