410 likes | 429 Views
Explore a visualization method to study the family of viruses and infer their ancestors' relations, monitoring mutation rates and evolution trends, aiding in medical antigen research. Utilizes relative distances among patterns.
E N D
Visualization of Influenza Protein Segment HA in Manifold Space Cheng-Yuan Liou* and Wei-Chen Cheng Department of Computer Science and Information Engineering National Taiwan University Republic of China *cyliou@csie.ntu.edu.tw 24th, March, 2010 11:00-13:00 Hue City, Vietnam
Idea D>>2 Dimension: D Dimension: 2 2
LDIM Energy function • : distance between gene and gene • : a distance in the manifold space 3
LDIM Algorithm • Initialize the cell set • Find the minimum and maximum distances among all protein sequences, • For each epoch t from to • Set • For every pair protein sequences and , adjust their cell positions by • End For • End For 4
Levenberg–Marquardtmethod for LDIM algorithm Batch Mode • Set time • Set • Calculate the error for all data • Calculate Jacobian matrix • Initialize • Calculate the update function, , to have the moving vectors of every data, is a vector which have elements. • Try to update and rearrange to . • Recalculate the new errorby • If the new error is greater than old error , set and then go back to step 6, otherwise continue Step 10. • Update the real output • Shrink the value of by • if then and go back to Step 2, otherwise stop the procedure. 7
Levenberg–Marquardtmethod for LDIM algorithm Batch Mode Jacobian Matrix 8
Levenberg–Marquardtmethod for LDIM algorithm Sequential Mode, in this paper • Set time • Set • For from 1 to • Calculate the error for data by • Calculate Jacobian matrix • Initialize • Calculate the update function, , to have the moving vector of the data, is a vector which have elements. • Try to update and rearrange to . • Recalculate the new errorby • If the new error is greater than old error , set and then go back to step 7, otherwise continue Step 11. • Update the real output • End For • if then and go back to Step 2, otherwise stop the procedure. 9
Levenberg–Marquardtmethod for LDIM algorithm Sequential Mode, in this paper Jacobian Matrix 10
Computer Simulation for LDIM algorithms • Input: 100 H1N1 sequences • Aligned sequences length: 570 • Min length: 561 • Max length: 566 • Learning rate: 0.00025 • Epochs: 150 11
Pandemic (H1N1) 2009Mortality Statistics source: Global Alert and Response, WHO http://www.who.int/csr/disease/swineflu/updates/en/index.html 13
H5N1 Mortality C: Cases, D: Deaths http://www.who.int/csr/disease/avian_influenza/country/cases_table_2010_03_04/en/index.html
Goal This work presented a visualization method to study the family of viruses and to infer their grandmother (ancestor). This manifold reveals the relations among all family members. Their grandmothers are useful sources for medical antigen. By this manifold, one can monitor the mutation rates and the evolution trends of the viruses. A new viral sample can be placed in the population with its family relations. 15
Data Information Distance: Hamming distance after performing multiple alignment Characters: 20 amino acid Information of data:Influenza A virusessubtype H1N1, H1N2, H2N2, H3N2, H5N1, H7N2, H7N3, H7N7, H9N2 Data source: NCBI Influenza Virus Resource 16
Parameters of multiple alignment • Program: Clustalw2 • European Bioinformatics Institute • Parameter Setting: • Penalty for opening a gap : 10.0 • Penalty for extending a gap: 0.2 • The gap separation penalty: 4 • GONNET 250 matrix Larkin MA, Blackshields G, Brown NP, etc.: Clustal W and Clustal X version 2.0. Bioinformatics, 23(21):2947-8
Protein HA (Hemagglutinin) The function of this protein is responsible for bind the virus to the cell.
Pandemic (H1N1) 2009Segment HA in 3D The color shows value of z-axis.
Avain Flu (H5N1)Segment HA in 3D The color shows the time.
Overview Influenza AHemagglutinin (HA) in 3D 29
Influenza AHemagglutinin (HA) in 3D Detailed 30
Protein NA (Neuraminidase) The protein facilitates the release of progeny viruses from infected cells.
Pandemic (H1N1) 2009Segment NA in 3D The color shows value of z-axis.
Avain Flu (H5N1)Segment NA in 3D The color shows the time.
Influenza ANeuraminidase (NA) in 3D Overview 36
Influenza ANeuraminidase (NA) in 3D Detail 37
Influenza ANeuraminidase (NA) in 3D Detail 38
Comparison of Mutation Rate Reference: [Manna and Liou (2007); Tutorial link (ppts); Jukes and Cantor (1969); Nei and Gojobori (1986)]
Summary • This manifold is designed for information visualization. • It uses relative distances among patterns and is invariant under translation, rotation and scaling of the pattern coordinates. • It has a perfect energy function. We expect that it can preserve the physical meaning among patterns and reveal their various hidden relations maximally. • The initial setting of the algorithm is flexible. The computation can be parallelized and distributed. The cells can be trained in a sequential mode or a batch mode. 40