Phylogenetic Analysis of the SARS virus Zhang Louxin Dept of Math, NUS
Our Study
• Genome-wide analysis of the SARS virus: a) comparison to other coronaviruses; b) comparison between different strains of the SARS virus.
• Objectives: a) search for clues to where the SARS virus came from; b) provide a technique for tracking the step-by-step transmission of SARS.
Genome Structure of the SARS virus (Marra et al., Rota et al., Ruan et al., 2003)
The RNA genome contains about 30k bps and has five major open reading frames (ORFs):
• ORF1a and ORF1b: replicase polyprotein (13149, 7887 bps)
• S: spike glycoprotein (3768 bps)
• E: small envelope protein (231 bps)
• M: membrane glycoprotein (666 bps)
• N: nucleocapsid protein (1269 bps)
and 7 unknown ORFs, X1-X7 (total 2595 bps)
Part 1: Comparison to other coronaviruses
• In comparing the sequences of the SARS virus and each known coronavirus, we a) computed a pairwise alignment between the genomes of SARS-CoV and the coronavirus; b) assessed the quality of the alignment in a) using VISTA, which measures the match rate f(x) in the window [x-99, x].
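The windowed match rate f(x) described above can be sketched as follows. This is only an illustration of the idea as the slide states it, not VISTA's actual implementation. The function name `window_match_rate`, the treatment of gap columns as mismatches, and the toy sequences are assumptions made for the example.

```python
def window_match_rate(aligned_a, aligned_b, window=100):
    """Return f(x) for every x that has a full window [x - window + 1, x].

    aligned_a and aligned_b are two rows of a pairwise alignment
    (equal length, '-' for gaps). A column counts as a match only if
    both rows carry the same base; gap columns count as mismatches
    (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [int(a == b and a != "-") for a, b in zip(aligned_a, aligned_b)]
    if len(matches) < window:
        return []
    running = sum(matches[:window])        # matches in the first window
    rates = [running / window]
    for x in range(window, len(matches)):
        # slide the window one column to the right
        running += matches[x] - matches[x - window]
        rates.append(running / window)
    return rates


# Toy alignment with a window of 4 columns instead of 100.
toy_rates = window_match_rate("ACGTACGT", "ACGAACGA", window=4)
print(toy_rates)
```

With the default `window=100`, the first returned value corresponds to x = 99 (0-based), that is, the window [x-99, x]. A region would then be called conserved when f(x) stays above the chosen conservation level.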
[Figure (plot not preserved): Cow vs SARS_HK. Reference: SARS_HK; conservation level: 60%; highlighted segments: ORF1a, ORF1b, S, M, N.]
[Figure (plots not preserved): Human (229E), Pig, Turkey, Cow, Mouse, Bird. Reference: SARS_HK; conservation level: 50%; highlighted segments: ORF1a, ORF1b, S, M, N.]
Observations and Questions
• The SARS coronavirus is not a recombinant between known coronaviruses (Marra et al., Rota et al., Ruan et al., 2003).
• It probably evolved from an ancestor of the coronavirus family (in an unidentified host) for a long time before infecting humans in 2002.
Question 1: What is the host? Civet cat or cow?
Question 2: How did SARS-CoV evolve? Did it acquire genes from its host, and/or exchange genetic information with other viruses?
Bioinformatics Alone Is Not Enough
• Searches of GenBank and genomic databases (with BLAST) indicated that there are no significant sequence matches to ORF1a or to the other ORFs in the last third of the genome.
This motivates us to consider the other ORFs, X1-X7, which encode putative proteins and are located in the last third of the genome. We conducted a) BLAST searches; b) an analysis of nucleotide base differences among different strains of SARS-CoV. Our analysis shows clearly that these ORFs are unique to SARS-CoV.
BLAST Searches: No significant hits.
(Each hit is followed by its score in bits and its E-value.)
X1:
• gi|30794373|ref|NM_178804.2| Mus musculus slit homolog 2 (D... 44 0.35
• gi|20830572|ref|XM_132035.1| Mus musculus slit homolog 2 (D... 44 0.35
• gi|5532494|gb|AF144628.1|AF144628 Mus musculus SLIT2 (Slit2... 44 0.35
• gi|26343252|dbj|AK053145.1| Mus musculus 0 day neonate lung... 44 0.35
• gi|4151258|gb|AF074960.1|AF074960 Mus musculus neurogenic e... 44 0.35
• gi|26084277|dbj|AK034918.1| Mus musculus 12 days embryo emb... 44 0.35
X2:
• gi|4096106|gb|U23446.1|DMU23446 Drosophila melanogaster tri... 40 3.0
• gi|21689975|emb|AL670472.7| Mouse DNA sequence from clone R... 40 3.0
• gi|12743807|emb|AL365322.10| Mouse DNA sequence from clone ... 40 3.0
X3:
• gi|21212249|emb|AL672015.6| Mouse DNA sequence from clone R... 46 0.019
• gi|1065946|gb|U40800.1| Caenorhabditis elegans cosmid D2096... 42 0.29
• gi|17539537|ref|NM_069020.1| Caenorhabditis elegans putativ... 42 0.29
• gi|13625036|emb|AL390789.13| Human DNA sequence from clone ... 42 0.29
• gi|16041563|gb|AC067844.6| Homo sapiens chromosome 8, clone... 40 1.2
• gi|18855236|emb|AL645928.9| Mouse DNA sequence from clone R... 40 1.2
• gi|30267397|gb|AY261744.1| Rhingia campestris 28S ribosomal... 38 4.6
• gi|28372410|gb|AY228336.1| Homo sapiens Kell blood group (K... 38 4.6
• gi|21844631|gb|AC122049.2| Mus musculus clone RP24-456N19, ... 38 4.6
X4:
• gi|18464246|gb|AC104684.3| Homo sapiens BAC clone RP11-1N7 ... 44 0.15
• gi|30682091|ref|NM_120859.2| Arabidopsis thaliana formin ho... 40 2.4
• gi|23462974|gb|AC121780.3| Mus musculus chromosome 5 clone ... 40 2.4
• gi|19482381|gb|AC023886.7| Homo sapiens BAC clone RP11-402J... 40 2.4
• gi|19034012|gb|AC097520.2| Homo sapiens BAC clone RP11-562F... 40 2.4
• gi|15668150|gb|AC096670.1| Homo sapiens BAC clone RP11-438K... 40 2.4
• gi|28411192|emb|AL662879.4|HS279F22 Homo sapiens chromosome... 40 2.4
• gi|16973056|emb|AL590082.9| Human DNA sequence from clone R... 40 2.4
• gi|12657180|emb|AL390882.12| Human DNA sequence from clone ... 40 2.4
BLAST Searches (Cont'd)
(Each hit is followed by its score in bits and its E-value.)
X5:
• gi|8927595|gb|AC019026.12| Mus musculus chromosome 6 clone ... 42 0.40
• gi|28830180|gb|AC115682.2| Dictyostelium discoideum chromos... 40 1.6
• gi|21306527|gb|AC120337.4| Homo sapiens X BAC RP11-804N7 (R... 40 1.6
• gi|24286735|gb|AY160107.1| Dictyostelium discoideum nucleot... 40 1.6
• gi|18423447|ref|NM_124670.1| Arabidopsis thaliana pyruvate ... 38 6.3
• gi|30522907|gb|AC124906.3| Equus caballus clone CH241-268I1... 38 6.3
• gi|21403217|gb|AY084507.1| Arabidopsis thaliana clone 10991... 38 6.3
• gi|28850462|gb|AF277315.6| Homo sapiens chromosome X clone ... 38 6.3
• gi|27802686|gb|AY213194.1| Homo sapiens Cockayne syndrome 1... 38 6.3
• gi|25189039|gb|AC116179.6| Mus musculus chromosome 10 clone... 38 6.3
X6:
• gi|6513908|gb|AC005875.2|AC005875 citb_188_b_12, complete s... 40 1.9
• gi|21111045|gb|AE012104.1| Xanthomonas campestris pv. campe... 38 7.4
• gi|30142462|emb|AL954861.11| Zebrafish DNA sequence from cl... 38 7.4
• gi|18642949|gb|AC104083.3| Homo sapiens BAC clone RP11-588K... 38 7.4
• gi|14196413|gb|AC010881.8| Homo sapiens BAC clone RP11-289J... 38 7.4
• gi|7839913|gb|AC016678.4|AC016678 Homo sapiens BAC clone RP... 38 7.4
• gi|7740042|gb|AC005703.2|AC005703 Homo sapiens chromosome 1... 38 7.4
• gi|26095929|dbj|AK053670.1| Mus musculus 0 day neonate eyeb... 38 7.4
• gi|23337151|emb|AL606479.16| Mouse DNA sequence from clone ... 38 7.4
X7:
• gi|22833002|gb|AE003426.2| Drosophila melanogaster chromoso... 40 1.3
• gi|27676657|ref|XM_218459.1| Rattus norvegicus similar to h... 40 1.3
• gi|27597031|gb|AC131788.3| Mus musculus chromosome 7 clone ... 40 1.3
• gi|21397252|gb|AC104148.5| Drosophila melanogaster X BAC RP... 40 1.3
• gi|18129378|gb|AC098575.6| Drosophila melanogaster X BAC RP... 40 1.3
• gi|6634463|emb|AL117344.12|HSJ395C13 Human DNA sequence fro... 40 1.3
Variations among the SARS-CoV strains: Nucleotide base differences
Vietnam_CDC     tagtggggtt cagtttcgtg ccattccgta ccct
Singapore_2774  tagcggggtt cagtctcgaa tcattttgta ctct
Toronto_02      tagcggggtt cagtctcgta ctatgctata ctct
HongKong        cagcaccgtt caagttcata ctattctgaa ttct
CUHK            tatcggggcc cagtcgtgtg ctactctgta ctcc
Beijing_01      tagcgggtct tcgtcgcgta ctgctctgtc cctt
Guangzhou_01    tagcgggtcc cagtcgcgta ctactctgta ctcc
Taiwan_01       tggcggggtt cagtctcgta ctattctgta ctct
Summary:
a) Differences occur in 34 positions;
b) There are 10 positions where the sequence variant appears in more than one sequence;
c) The base differences could be sequencing errors, mutational noise, or mutational sites in the SARS genomes.
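The two counts in the summary can be rechecked directly from the eight rows listed above. The sketch below is an illustration only: the function name `site_summary` is an assumption, and "variant shared by more than one sequence" is read here as "the second most frequent base in a column occurs in at least two strains".

```python
from collections import Counter

# The 34 variable positions of each strain, as listed on the slide.
strains = {
    "Vietnam_CDC":    "tagtggggtt cagtttcgtg ccattccgta ccct",
    "Singapore_2774": "tagcggggtt cagtctcgaa tcattttgta ctct",
    "Toronto_02":     "tagcggggtt cagtctcgta ctatgctata ctct",
    "HongKong":       "cagcaccgtt caagttcata ctattctgaa ttct",
    "CUHK":           "tatcggggcc cagtcgtgtg ctactctgta ctcc",
    "Beijing_01":     "tagcgggtct tcgtcgcgta ctgctctgtc cctt",
    "Guangzhou_01":   "tagcgggtcc cagtcgcgta ctactctgta ctcc",
    "Taiwan_01":      "tggcggggtt cagtctcgta ctattctgta ctct",
}


def site_summary(rows):
    """Count variable columns and columns whose variant is shared.

    Returns (variable, shared): `variable` counts columns with more
    than one distinct base; `shared` counts those columns in which the
    second most frequent base occurs in at least two rows.
    """
    cleaned = [row.replace(" ", "") for row in rows]
    variable = shared = 0
    for column in zip(*cleaned):
        counts = Counter(column).most_common()
        if len(counts) > 1:
            variable += 1
            if counts[1][1] >= 2:
                shared += 1
    return variable, shared


variable_sites, shared_sites = site_summary(strains.values())
print(variable_sites, shared_sites)  # 34 10
```

Columns where the variant occurs in a single strain are the ones most easily explained by sequencing error or noise, which is why the shared columns are counted separately.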
Simple Statistics
• All differences appear in 78 positions.
• Reference: their consensus sequence.
No Significant Differences in the X's
• The total length of the X's is 2595 bps (8.65% of the genome).
• The expected number of differences in these regions is about 5 to 6.
• Therefore, the X's could be useful as markers for inferring the evolutionary history of the SARS virus.
Part 2: Phylogenetic analysis in tracking the step-by-step transmission
• One unusual property of viruses is that they evolve rapidly. This property is unfortunate from the perspective of creating a vaccine. However, their rapid evolution enables fine-scale phylogenetic analysis. For example, it has been used for a) finding the origins of the HIV viruses (Gao et al. '99, Hahn et al. '00); b) tracking person-to-person transmission of the HIV virus (Ou et al. '92); c) deriving the transmission model of influenza (Fitch et al. '97, Bush et al. '99).
Phylogenetic Analysis
A molecular phylogeny summarizes the genetic variations of the molecular sequences being studied (Marra et al., 2003).
The rationale behind disease tracking: if A infects B and then B infects C, then the virus in C is more similar to the virus in B than to the virus in A.
Phylogenetic Analysis of 8 Strains
a) Compute pairwise alignments over the ORF1b region;
b) Find pairwise distances using the Jukes-Cantor model;
c) Draw the phylogeny using the neighbor-joining method.
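The distance step can be illustrated with a short sketch. Under the Jukes-Cantor model, an observed proportion p of differing sites is corrected to d = -(3/4) ln(1 - 4p/3) substitutions per site. The function names and the toy sequences below are assumptions for illustration; they are not the ORF1b data or the code used in the study.

```python
import math


def p_distance(aligned_a, aligned_b):
    """Proportion of differing sites, ignoring columns that contain a gap."""
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(aligned):
    """Pairwise Jukes-Cantor distances for a dict of aligned sequences."""
    names = list(aligned)
    return {
        (m, n): jukes_cantor(p_distance(aligned[m], aligned[n]))
        for m in names for n in names
    }


# Hypothetical toy alignment, for illustration only.
toy = {"s1": "acgtacgtac", "s2": "acgtacgtaa", "s3": "acgaacgtaa"}
toy_matrix = distance_matrix(toy)
print(round(toy_matrix[("s1", "s3")], 4))
```

For closely related strains p is small, so the corrected distance stays close to p. A matrix of this form is the input that a neighbor-joining routine would take to build the tree.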
Phylogenetic Analysis of 8 Strains (Cont'd)
The phylogeny was constructed over the S-E-M-N regions using the parsimony method.
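The slide does not say which parsimony procedure or software was used. As a hedged illustration of the parsimony criterion, the sketch below scores a given rooted binary tree with Fitch's algorithm, one standard way to count the minimum number of base changes a tree requires. The tree shapes, taxon names, and alignment columns are hypothetical.

```python
def fitch(tree, column):
    """Return (state_set, changes) for one alignment column.

    `tree` is a leaf name (str) or a pair (left, right) of subtrees;
    `column` maps each leaf name to its base at this position.
    """
    if isinstance(tree, str):                 # leaf: observed base, no change
        return {column[tree]}, 0
    left_states, left_cost = fitch(tree[0], column)
    right_states, right_cost = fitch(tree[1], column)
    common = left_states & right_states
    if common:                                # children agree: no extra change
        return common, left_cost + right_cost
    # children disagree: take the union and count one change
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, columns):
    """Total number of changes over all alignment columns."""
    return sum(fitch(tree, column)[1] for column in columns)


# Hypothetical example: four taxa, two columns, two candidate trees.
columns = [
    {"A": "t", "B": "t", "C": "c", "D": "c"},
    {"A": "g", "B": "g", "C": "g", "D": "a"},
]
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
score_1 = parsimony_score(tree_1, columns)
score_2 = parsimony_score(tree_2, columns)
print(score_1, score_2)  # 2 3
```

A parsimony search prefers the tree with the smaller score, here `tree_1`. Real analyses search over many candidate trees rather than comparing two by hand.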
Phylogenetic Analysis of 8 Strains (Cont'd)
Our work indicates that it is possible to track the step-by-step transmission of SARS using molecular data. However, much work remains to be done. For example, we need to study the effects of mutational noise on the phylogenetic analysis.
Conclusion
a) Our genome-wide analysis indicates that the regions that encode unknown proteins in the last third of the SARS genome could be useful as markers for studying the virus.
b) Phylogenetic analysis can be used for tracking the step-by-step transmission of SARS.
The work was discussed with Louis Chen, Choi Kwok Pui, David Chew, Phil Long, P. Kolatkar, Vega Vinsensius, Lin Chin-Yo. The work was done with Li Quan and Wu YongHui.