
Visualization of Influenza Protein Segment HA in Manifold Space

A visualization method for studying a family of viruses and inferring their ancestral relations. The method uses relative distances among patterns; it helps monitor mutation rates and evolution trends and supports medical antigen research.


Presentation Transcript


  1. Visualization of Influenza Protein Segment HA in Manifold Space. Cheng-Yuan Liou* and Wei-Chen Cheng, Department of Computer Science and Information Engineering, National Taiwan University, Republic of China. *cyliou@csie.ntu.edu.tw. March 24, 2010, 11:00-13:00, Hue City, Vietnam

  2. Idea: map patterns from a space of dimension D (D >> 2) to a space of dimension 2

  3. LDIM Energy function (the equation and its symbols were not preserved in this transcript) • [symbol lost]: distance between gene and gene • [symbol lost]: a distance in the manifold space

  4. LDIM Algorithm • Initialize the cell set • Find the minimum and maximum distances among all protein sequences • For each epoch t from [start lost] to [end lost] • Set [parameter lost] • For every pair of protein sequences, adjust their cell positions by [update formula lost] • End For • End For
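
The update formula on this slide did not survive in the transcript, so the following is only a generic sketch of the loop structure it describes: visit every pair of sequences in each epoch and nudge their two cells so that the distance between the cells moves toward the distance between the sequences. The squared-mismatch objective, the function name `embed`, and the parameters `lr` and `seed` are assumptions made for illustration; this is not the authors' LDIM energy or update rule.

```python
import math
import random

def embed(dist, dim=2, epochs=150, lr=0.05, seed=0):
    """Place one cell per item so that cell distances approximate dist[i][j].

    Generic pairwise stress reduction (an assumption, not the LDIM rule).
    """
    rng = random.Random(seed)
    n = len(dist)
    # Random initial cell positions.
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(epochs):
        for i in range(n):
            for j in range(i + 1, n):
                diff = [a - b for a, b in zip(pos[i], pos[j])]
                cur = math.sqrt(sum(c * c for c in diff)) or 1e-12
                # Positive when the cells are too far apart, negative when too close.
                err = cur - dist[i][j]
                for k in range(dim):
                    step = lr * err * diff[k] / cur
                    pos[i][k] -= step  # move cell i along the line joining the pair
                    pos[j][k] += step  # move cell j the opposite way
    return pos

# Toy target distances (a 3-4-5 triangle), not influenza data.
target = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
cells = embed(target, dim=2, epochs=2000, lr=0.1, seed=1)
```

Because each step depends only on one pair of cells and their target distance, the result is defined only up to translation and rotation of the whole layout.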

  5. LDIM Algorithm

  6. Difference between SOM and LDIM

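  7. Levenberg–Marquardt method for LDIM algorithm, Batch Mode (symbols and formulas were not preserved in this transcript): (1) Set time [value lost]. (2) Set [parameter lost]. (3) Calculate the error for all data. (4) Calculate the Jacobian matrix. (5) Initialize [parameter lost]. (6) Calculate the update function [formula lost] to obtain the moving vectors of all data; the result is a vector with [count lost] elements. (7) Try the update and rearrange [symbols lost]. (8) Recalculate the new error by [formula lost]. (9) If the new error is greater than the old error, set [parameter update lost] and go back to Step 6; otherwise continue to Step 10. (10) Update the real output. (11) Shrink the value of [parameter lost] by [formula lost]. (12) If [condition lost], then [update lost] and go back to Step 2; otherwise stop the procedure.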

  8. Levenberg–Marquardt method for LDIM algorithm, Batch Mode: Jacobian Matrix

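  9. Levenberg–Marquardt method for LDIM algorithm, Sequential Mode (used in this paper; symbols and formulas were not preserved in this transcript): (1) Set time [value lost]. (2) Set [parameter lost]. (3) For [index lost] from 1 to [count lost]: (4) Calculate the error for the current data item by [formula lost]. (5) Calculate the Jacobian matrix. (6) Initialize [parameter lost]. (7) Calculate the update function [formula lost] to obtain the moving vector of the data item; the result is a vector with [count lost] elements. (8) Try the update and rearrange [symbols lost]. (9) Recalculate the new error by [formula lost]. (10) If the new error is greater than the old error, set [parameter update lost] and go back to Step 7; otherwise continue to Step 11. (11) Update the real output. (12) End For. (13) If [condition lost], then [update lost] and go back to Step 2; otherwise stop the procedure.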
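
The accept-or-retry logic in the sequential steps above can be sketched with a textbook Levenberg–Marquardt move for a single cell while all other cells stay fixed. The residual definition (cell distance minus target distance), the factor of 10 used to raise or lower the damping, and the names `solve`, `residuals`, `lm_move` and `lam` are assumptions for illustration; the slide's own formulas are not available, so this should not be read as the authors' exact procedure.

```python
import math

def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def residuals(point, others, targets):
    """Cell distance minus target distance, one entry per fixed cell."""
    return [math.dist(point, o) - t for o, t in zip(others, targets)]

def lm_move(point, others, targets, lam=1e-2, max_tries=20):
    """Try one damped move of `point`; returns (new_point, new_error, new_lam)."""
    dim = len(point)
    r = residuals(point, others, targets)
    err = sum(v * v for v in r)
    # Jacobian rows: unit vectors pointing from each fixed cell to the moving cell.
    jac = []
    for o in others:
        d = math.dist(point, o) or 1e-12
        jac.append([(p - q) / d for p, q in zip(point, o)])
    jtj = [[sum(row[a] * row[b] for row in jac) for b in range(dim)] for a in range(dim)]
    jtr = [sum(row[a] * v for row, v in zip(jac, r)) for a in range(dim)]
    for _ in range(max_tries):
        damped = [[jtj[a][b] + (lam if a == b else 0.0) for b in range(dim)]
                  for a in range(dim)]
        delta = solve(damped, [-g for g in jtr])
        trial = [p + d for p, d in zip(point, delta)]
        new_err = sum(v * v for v in residuals(trial, others, targets))
        if new_err <= err:
            return trial, new_err, lam / 10.0  # accept the move and relax the damping
        lam *= 10.0  # error grew: damp harder and recompute the move
    return list(point), err, lam

# Toy example: three fixed cells and the target distances of a fourth cell.
others = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
targets = [5.0, 3.0, 4.0]
p, lam = [3.0, 2.0], 1e-2
for _ in range(50):
    p, err, lam = lm_move(p, others, targets, lam)
```

In the toy example the target distances are consistent with the point (4, 3), so the moving cell settles there. A full sequential pass would apply `lm_move` to each cell in turn.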

  10. Levenberg–Marquardt method for LDIM algorithm, Sequential Mode (used in this paper): Jacobian Matrix

  11. Computer Simulation for LDIM algorithms • Input: 100 H1N1 sequences • Aligned sequences length: 570 • Min length: 561 • Max length: 566 • Learning rate: 0.00025 • Epochs: 150

  12. 100 H1N1 sequences

  13. Pandemic (H1N1) 2009 Mortality Statistics. Source: Global Alert and Response, WHO http://www.who.int/csr/disease/swineflu/updates/en/index.html

  14. H5N1 Mortality. C: Cases, D: Deaths. http://www.who.int/csr/disease/avian_influenza/country/cases_table_2010_03_04/en/index.html

  15. Goal: This work presents a visualization method to study a family of viruses and to infer their grandmother (ancestor). The manifold reveals the relations among all family members. The grandmothers are useful sources for medical antigens. With this manifold, one can monitor the mutation rates and evolution trends of the viruses, and a new viral sample can be placed in the population according to its family relations.

  16. Data Information • Distance: Hamming distance after performing multiple alignment • Characters: 20 amino acids • Data: Influenza A virus subtypes H1N1, H1N2, H2N2, H3N2, H5N1, H7N2, H7N3, H7N7, H9N2 • Data source: NCBI Influenza Virus Resource
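
As a minimal sketch of the distance named on this slide, the Hamming distance between two aligned sequences is the number of positions at which they differ. Treating a gap character as an ordinary mismatch is an assumption here (the slide does not say how gaps are scored), and the toy fragments and the names `hamming_distance` and `distance_matrix` are hypothetical.

```python
from itertools import combinations

def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

def distance_matrix(seqs):
    """Symmetric matrix of pairwise Hamming distances."""
    n = len(seqs)
    dist = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = hamming_distance(seqs[i], seqs[j])
    return dist

# Toy aligned fragments (not real HA data); '-' marks an alignment gap.
toy = ["MKAIL-VL", "MKAILAVL", "MKTIL-VF"]
D = distance_matrix(toy)
```

A matrix like `D` is the only input a distance-based embedding needs, which is why the sequences must be aligned to a common length first.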

  17. Parameters of multiple alignment • Program: ClustalW2 (European Bioinformatics Institute) • Parameter setting: • Penalty for opening a gap: 10.0 • Penalty for extending a gap: 0.2 • Gap separation penalty: 4 • GONNET 250 matrix. Reference: Larkin MA, Blackshields G, Brown NP, et al. (2007): Clustal W and Clustal X version 2.0. Bioinformatics, 23(21):2947-8

  18. Protein HA (Hemagglutinin): This protein is responsible for binding the virus to the cell.

  19. Influenza A Segment HA: Timeline

  20. Pandemic (H1N1) 2009 Segment HA in 3D. The color shows the value on the z-axis.

  21. H1N1 Segment HA in 3D Space

  22. Ten H1N1 HA sequences that are closest to the center
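
One way to pick the sequences "closest to the center" is to rank the embedded cells by distance to their centroid. The slide does not define the center, so the centroid is an assumption here, and the function name `closest_to_center` and the toy coordinates are hypothetical.

```python
import math

def closest_to_center(points, labels, k=10):
    """Return the labels of the k points nearest to the centroid of all points."""
    dim = len(points[0])
    center = [sum(p[a] for p in points) / len(points) for a in range(dim)]
    ranked = sorted(zip(labels, points), key=lambda lp: math.dist(lp[1], center))
    return [label for label, _ in ranked[:k]]

# Toy 3D cell positions (not real embedding output).
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
names = ["s0", "s1", "s2", "s3"]
nearest = closest_to_center(pts, names, k=2)
```

With these toy points the centroid lies at x = 3.25, so `s2` and then `s1` are the two nearest.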

  23. Partial Phylogenetic Tree (H1N1)

  24. Avian Flu (H5N1) Segment HA in 3D. The color shows the time.

  25. H5N1 Segment HA in 3D Space

  26. Ten H5N1 HA sequences that are closest to the center

  27. Partial Phylogenetic Tree (H5N1)

  28. H1N1 & H3N2 & H5N1 Hemagglutinin (HA) in 3D

  29. Influenza A Hemagglutinin (HA) in 3D: Overview

  30. Influenza A Hemagglutinin (HA) in 3D: Detail

  31. Protein NA (Neuraminidase): This protein facilitates the release of progeny viruses from infected cells.

  32. Influenza A Segment NA: Timeline

  33. Pandemic (H1N1) 2009 Segment NA in 3D. The color shows the value on the z-axis.

  34. Avian Flu (H5N1) Segment NA in 3D. The color shows the time.

  35. H1N1 & H3N2 & H5N1 Neuraminidase (NA) in 3D

  36. Influenza A Neuraminidase (NA) in 3D: Overview

  37. Influenza A Neuraminidase (NA) in 3D: Detail

  38. Influenza A Neuraminidase (NA) in 3D: Detail

  39. Comparison of Mutation Rate Reference: [Manna and Liou (2007); Tutorial link (ppts); Jukes and Cantor (1969); Nei and Gojobori (1986)]

  40. Summary • This manifold is designed for information visualization. • It uses relative distances among patterns and is invariant under translation, rotation and scaling of the pattern coordinates. • It has a perfect energy function. We expect that it can preserve the physical meaning among patterns and reveal their various hidden relations maximally. • The initial setting of the algorithm is flexible. The computation can be parallelized and distributed. The cells can be trained in a sequential mode or a batch mode.

  41. Thanks for your attention.