Identification of human-to-human transmissibility factors in PB2 proteins of influenza A by large-scale mutual information analysis. Sixth International Conference on Bioinformatics (InCoB2007) Hong Kong, 28 th August 2007. Olivo Miotto
Identification of human-to-human transmissibility factors in PB2 proteins of influenza A by large-scale mutual information analysis • Sixth International Conference on Bioinformatics (InCoB2007), Hong Kong, 28th August 2007 • Olivo Miotto (Institute of Systems Science and Yong Loo Lin School of Medicine, National University of Singapore) • AT Heiny, Tan Tin Wee (Yong Loo Lin School of Medicine, National University of Singapore) • J Thomas August (Johns Hopkins University School of Medicine) • Vladimir Brusic (Cancer Vaccine Center, Dana-Farber Cancer Inst.)
Outline • Background • Mutual Information Analysis • Materials and Methods • Results • Discussion and conclusions
Avian Flu: is the pandemic coming? Can H5N1 viruses spread amongst humans?
The Influenza A Virus • Serological subtyping: HA (haemagglutinin), 16 subtypes; NA (neuraminidase), 9 subtypes • Other labelled components: viral RNA, matrix protein • Figure source: http://www.roche.com/pages/facets/10/viruse.htm
Avian vs Human Influenza • Humans • Only 4 subtypes transmitted human-to-human (H2H) • Avian-to-human (A2H) infection in a small number of subtypes • Affects the respiratory tract • Wild waterfowl • Natural pool • Over 100 subtypes observed • Affects the digestive tract • Often asymptomatic
Influenza Circulation • Hosts shown: wild waterfowl, domestic poultry, swine, humans • Avian-to-avian (A2A): > 100 subtypes • Human-to-human (H2H): only 4 subtypes • cf. Webster RG et al. (1992). Microbiol Rev, 56(1), 152-179.
Avian origins of pandemic strains From: Belshe RB (2005) N Engl J Med. 2005;353:2209-2211.
Pressing Questions • What are the mechanisms of adaptation to human hosts? • Which genes/products are involved? • Can we identify mutations responsible for the capability to infect humans? • Can we identify mutations responsible for adaptation to human-to-human transmission? • Can we elucidate the role of such mutations? • Can we assess the pandemic potential of current H5N1 (and/or other strains)?
Study goals • Analyze all influenza protein sequence data available • Historical data • Whole genome • Use statistical approaches to identify amino acid sites that characterize H2H transmissibility • Compare H2H with non-H2H (A2A) • Create an "adaptation map" • Use the information acquired to characterize individual isolates and strain evolution • Map out the emergence of characteristic mutations • Assess a strain's potential for H2H transmissibility
Why PB2? • Initial study performed on PB2 protein • Internal protein, component of RNP • Some experimentally determined functional regions • Well-known involvement of the E627K mutation in mammalian and cold-temperature adaptation • Figure from: http://www.omedon.co.uk/influenza/influenza/ • Subbarao EK, London W, Murphy BR (1993) J Virol, 67(4), 1761-1764.
Outline • Background • Mutual Information Analysis • Materials and Methods • Results • Discussion and conclusions 2
Information Theory • Information entropy H is a measure of uncertainty: $H = -\sum_{e \in E} p_e \log_2 p_e$ • where e is an event from a possible set E, and $p_e$ is the probability of e occurring • Lower entropy -> more predictable outcome • Entropy is affected by • the number of outcomes • their relative probabilities • Shannon CE (1948) Bell System Tech J, 27: 379-423, 623-656.
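A short worked example may help show how both the number of outcomes and their relative probabilities move the entropy. It assumes the standard Shannon definition with a base-2 logarithm (entropy in bits); the base is an assumption, chosen because it is consistent with the 0 to 1.0 range quoted later for MI between two sets.

```latex
% Shannon entropy in bits (base-2 logarithm assumed)
H = -\sum_{e \in E} p_e \log_2 p_e

% Two equally likely outcomes
H = -\left(0.5 \log_2 0.5 + 0.5 \log_2 0.5\right) = 1

% Two outcomes, one dominant: more predictable, lower entropy
H = -\left(0.9 \log_2 0.9 + 0.1 \log_2 0.1\right) \approx 0.469

% Four equally likely outcomes: more outcomes, higher entropy
H = -4 \times \left(0.25 \log_2 0.25\right) = 2
```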
Entropy in multiple alignments • In a multiple sequence alignment, we can treat each alignment site as a separate "variable" • Each residue observed at that site is a separate "event" • The "event probability" is the fraction of sequences in the alignment that contain the residue • H = 0 at fully conserved positions • Single, 100% predictable outcome • H increases when • several residues are observed at the same position, and/or • their probabilities are more evenly distributed
Entropy is a measure of diversity • Both full sequences and sequence fragments can be used in entropy computation • Figure: entropy of Influenza A PB2 protein, based on an alignment of 3,132 sequences
Entropy in Sequence Alignments • Figure legend: Z = zero entropy, M = medium entropy, H = high entropy
Comparing Alignments • Figure: avian sequences compared with human sequences • Legend: C = characteristic sites, Z = zero entropy, N = non-characteristic
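The per-site entropy described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: the function names and the toy alignment are made up, the logarithm is assumed to be base 2, and gap characters (if any) are simply counted as another residue.

```python
from collections import Counter
from math import log2

def column_entropy(column):
    """Shannon entropy (bits) of the residues observed at one alignment site."""
    counts = Counter(column)
    total = len(column)
    # p * log2(1/p) summed over residues; equals -sum(p * log2(p)).
    return sum((n / total) * log2(total / n) for n in counts.values())

def alignment_entropy(sequences):
    """Entropy at every site of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*sequences)]

# Toy alignment, made up for illustration (not real PB2 data).
toy_alignment = ["MKRT", "MKRA", "MERA", "MEKS"]
profile = alignment_entropy(toy_alignment)
```

In this toy case the first site is fully conserved and gets zero entropy, while the last site, with three residues, scores highest.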
Mutual Information • Mutual Information (MI) uses information entropy to measure the relationship between two variables • The higher the MI, the more information about variable A can be obtained by knowing the value of variable B • $MI(A,B) = H(A) + H(B) - H(A,B)$ • where H(A) and H(B) are entropies of A and B, • and H(A,B) is the joint entropy of A and B • Joint entropy is computed by considering each combination of the two variables as a separate outcome
Using MI to detect Characteristic Sites • At a characteristic site, the residue observed is strongly associated with a set of sequences • E.g.: Arg -> Avian, Thr -> Human • This association is explored by measuring MI of • The residue observed at a site • The label of the set in which it is observed • MI is in range 0 – 1.0 • MI = 0.0 -> residue and set label are statistically independent: residues occur with the same frequencies in both sets • MI = 1.0 -> Residues observed in one set are never observed in the other, and vice versa
Spikes indicate characteristic sites • Figure: MI and entropy plotted along the PB2 protein • Sets compared: A2A (719 sequences) and H2H (1650 sequences)
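To make the two extremes concrete, here is a minimal sketch of MI between the residue at one site and the set label, computed as H(residue) + H(label) - H(residue, label). The function names and toy columns are hypothetical, and a base-2 logarithm is assumed.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (bits) of a list of hashable outcomes."""
    counts = Counter(items)
    total = len(items)
    return sum((n / total) * log2(total / n) for n in counts.values())

def site_mutual_information(residues_a, residues_b):
    """MI (bits) between the residue at a site and the set it was seen in."""
    residues = list(residues_a) + list(residues_b)
    labels = ["A"] * len(residues_a) + ["B"] * len(residues_b)
    # Joint entropy: each (residue, label) pair is a separate outcome.
    joint = list(zip(residues, labels))
    return entropy(residues) + entropy(labels) - entropy(joint)

# Toy columns: one site from each set, made up for illustration.
separated = site_mutual_information("RRRR", "TTTT")  # disjoint residues
shared = site_mutual_information("RRTT", "RRTT")     # identical distributions
```

Note that with this plain definition the upper bound of 1.0 is reached only when the two sets are the same size; for unequal sets the maximum is the entropy of the label, which is below 1 bit. The slides do not say how unequal set sizes were handled, so any normalisation or subsampling step would be an additional assumption.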
Outline • Background • Mutual Information Analysis • Materials and Methods • Results • Discussion and conclusions 3
Source Sequences • Comprehensive set of PB2 proteins • 3,132 protein sequences with accompanying metadata: • Host • Subtype • Country of isolation • Year of Isolation • Extracted from NCBI Protein and Nucleotide databases • (all proteins > 40,000 sequences) • Automated aggregation, metadata extraction and metadata cleaning - using the ABK software • Multiple sequence alignment (MSA) using Muscle 3.6 • Manually verified and corrected metadata and MSA
Datasets • Three subsets produced for comparison • A2A • Avian sequences for all subtypes, except those that circulate amongst humans (H1N1, H2N2, H3N2, H1N2) and H5N1 • H1N1H • Human sequences for H1N1 • HxN2H • Human sequences for H2N2, H3N2, H1N2 • To retain alignment, subsets are extracted from a single MSA • H1N1 and HxN2 are separate co-circulating lineages • Webster RG et al. (1992). Microbiol Rev, 56(1), 152-179.
Identification of characteristic sites • Compare each of H1N1H, HxN2H against A2A • Pick sites with high MI (>0.4) • Identify characteristic variants of human transmission: • At least 4x more frequent in human than in avian set • Appear in at least 2% of human sequences • Identify avian characteristic variants • Discard site if >5% of human sequences contain avian variants • All sites with >2% avian variants were verified by hand • Merge catalogues of sites for H1N1H and HxN2H • Keep only sites that are present in both catalogues
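The subset rules above can be written as a small selection function. This is a sketch under stated assumptions: the record fields (`host`, `subtype`, `aligned_seq`) and the host values "Avian" and "Human" are hypothetical stand-ins for whatever the cleaned metadata actually uses.

```python
HUMAN_CIRCULATING = {"H1N1", "H2N2", "H3N2", "H1N2"}
HXN2 = {"H2N2", "H3N2", "H1N2"}

def assign_subset(host, subtype):
    """Return 'A2A', 'H1N1H', 'HxN2H', or None if the record is not used."""
    host = host.lower()
    if host == "avian" and subtype not in HUMAN_CIRCULATING and subtype != "H5N1":
        return "A2A"
    if host == "human" and subtype == "H1N1":
        return "H1N1H"
    if host == "human" and subtype in HXN2:
        return "HxN2H"
    return None

def split_records(records):
    """Group already-aligned sequences by subset, keeping the shared alignment."""
    subsets = {"A2A": [], "H1N1H": [], "HxN2H": []}
    for rec in records:
        name = assign_subset(rec["host"], rec["subtype"])
        if name is not None:
            subsets[name].append(rec["aligned_seq"])
    return subsets

# Toy records with placeholder sequences, for illustration only.
toy_records = [
    {"host": "Avian", "subtype": "H9N2", "aligned_seq": "MER-"},
    {"host": "Avian", "subtype": "H5N1", "aligned_seq": "MERK"},
    {"host": "Human", "subtype": "H3N2", "aligned_seq": "MKRK"},
    {"host": "Human", "subtype": "H1N1", "aligned_seq": "MKRR"},
    {"host": "Swine", "subtype": "H1N1", "aligned_seq": "MERR"},
]
toy_subsets = split_records(toy_records)
```

Because rows are selected from one alignment rather than re-aligned per subset, column indices stay comparable across the three subsets.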
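The filtering steps listed above might be sketched as follows for a single site. The thresholds are those quoted on the slide; everything else is an assumption: the function names are hypothetical, MI is taken as a precomputed input, and avian characteristic variants are defined symmetrically (at least 4x more frequent in avian than in human), which the slide does not spell out. The manual verification step is not modelled.

```python
from collections import Counter

def frequencies(column):
    """Fraction of sequences carrying each residue at one site."""
    counts = Counter(column)
    total = len(column)
    return {res: n / total for res, n in counts.items()}

def characteristic_variants(human_col, avian_col, mi, mi_min=0.4,
                            ratio=4.0, min_human=0.02, max_avian_in_human=0.05):
    """Return (human_variants, avian_variants) for a site, or None if rejected."""
    if mi <= mi_min:
        return None
    h = frequencies(human_col)
    a = frequencies(avian_col)
    # Human variants: in >= 2% of human sequences and >= 4x the avian frequency.
    human_variants = {r for r, f in h.items()
                      if f >= min_human and f >= ratio * a.get(r, 0.0)}
    # Assumption: avian variants are defined by the mirrored 4x rule.
    avian_variants = {r for r, f in a.items() if f >= ratio * h.get(r, 0.0)}
    avian_in_human = sum(h.get(r, 0.0) for r in avian_variants)
    if not human_variants or avian_in_human > max_avian_in_human:
        return None
    return human_variants, avian_variants

def merge_catalogues(cat_h1n1, cat_hxn2):
    """Keep only site positions present in both catalogues."""
    return sorted(set(cat_h1n1) & set(cat_hxn2))

# Toy columns, made up for illustration.
accepted = characteristic_variants("K" * 97 + "E" * 3, "E" * 95 + "K" * 5, mi=0.8)
rejected = characteristic_variants("K" * 90 + "E" * 10, "E" * 95 + "K" * 5, mi=0.8)
```

In the toy data the first site passes with K as the human variant and E as the avian variant; the second is discarded because 10% of the human sequences carry the avian variant.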
Outline • Background • Mutual Information Analysis • Materials and Methods • Results • Discussion and conclusions 4
Results: 17 characteristic sites • Table labels: Chen 2006, Naffakh 2000 • Chen GW et al. (2006) Emerg Infect Dis, 12(9), 1353-1360. • Naffakh N et al. (2000) J Gen Virol, 81, 1283-1291.
Functional Atlas of PB2 Adaptations • Functional regions labelled: Nuclear Localization Signal, PB1 binding, NP binding, RNA cap binding • Characteristic site positions: 9, 44, 64, 81, 105, 199, 271, 292, 368, 475, 567, 588, 613, 627, 661, 674, 702 • Residue labels for the A2A and H2H rows (as extracted from the figure): NT DE A S M T T MV VM TA S A A TI IV T K R M L DE N I AV VA TI K E T A T AS R K • Figure source: http://www-micro.msb.le.ac.uk/3035/Orthomyxoviruses.html
Reconstructing adaptation timelines • Characteristic sites can show an "adaptation signature" • A summary of mutations necessary for H2H adaptation • We can then characterize any PB2 sequence at these sites • Example shown: Spanish Flu, H1N1, A/Brevig Mission/1/1918
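One way to picture an adaptation signature is to read off the residue at each catalogued site and mark it as human-like, avian-like, or neither. The sketch below uses a hypothetical three-site catalogue on a toy sequence; the positions and residues are placeholders, not the 17 PB2 sites, and the sequence is assumed to use the same numbering as the catalogue.

```python
# Hypothetical catalogue: 1-based position -> (avian residues, human residues).
TOY_CATALOGUE = {
    2: ({"D"}, {"N"}),
    5: ({"A"}, {"S", "T"}),
    7: ({"E"}, {"K"}),
}

def signature(sequence, catalogue):
    """One symbol per catalogued site: 'H' (human), 'A' (avian), '?' (other)."""
    marks = []
    for position in sorted(catalogue):
        avian, human = catalogue[position]
        residue = sequence[position - 1]
        if residue in human:
            marks.append("H")
        elif residue in avian:
            marks.append("A")
        else:
            marks.append("?")
    return "".join(marks)

def count_h2h(sequence, catalogue):
    """Number of catalogued sites carrying a human-characteristic residue."""
    return signature(sequence, catalogue).count("H")

# Toy sequences, made up for illustration.
avian_like = signature("MDRIAGEL", TOY_CATALOGUE)
human_like = signature("MNRITGKL", TOY_CATALOGUE)
mixed = signature("MDRISGQL", TOY_CATALOGUE)
```

Sorting isolates by year and printing their signatures side by side would give a rough text version of the timelines described in the talk.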
Human Timeline over 3 pandemics • H1N1 1918-1957, H2N2 1957-1968, H3N2 1968-now • 1918: Mostly avian signature • 1940s: Fully H2H signature • 1957, 1968: No disruption by pandemics, no introduction of avian PB2 protein • Sporadic avian/swine infections • Remarkable stability, to present day
Swine Influenza Timeline • Evidence of avian and human mutations • Supports role of swine as “mixing vessel”
H5N1: Timeline 1997-2006 • Presents H2H mutations more frequently than other avian strains • H2H mutations usually do not persist • H5N1 not “becoming” H2H
Outline • Background • Mutual Information Analysis • Materials and Methods • Results • Discussion and conclusions 5
Discussion: Methodology • Detection of characteristic sites by MI has greater resolving power than previous approaches • Allows multiple characteristic variants at a site • MI method allows large-scale analysis • Thousands of sequences, strong support for findings • Fragments can also be used • Sequence signatures are effective for recapitulating strain characteristics and understanding trends • Good metadata is necessary for quality analysis • Luckily, this is largely available for Influenza • Other viruses have poorer coverage
Discussion: Human Sequences • H2H variants show remarkable historical stability • Resilience to HA and NA changes suggests limited interplay in adaptation between internal and external proteins • Location of characteristic sites in binding domains suggests complex interactions are involved in adaptation to H2H transmission • Cataloguing characteristic sites in other RNP proteins may shed new light on their roles • Both current lineages of PB2 (H1N1, HxN2) have evolved from the same source (1918 Spanish Flu) • No evidence of PB2 interchange between the two lineages
Discussion: Avian Sequences • Avian strains rarely show any H2H mutation • 77% contain none (H5N1 excluded) • Only one sequence had 3 out of 17 mutations • Spanish Flu had 5 H2H mutations • Could be the minimum set, probably not optimal • H5N1 repeatedly exhibits H2H mutations, but they do not “stick” • May account for its ability to jump the species barrier • May indicate that H5N1 PB2 is far from suited for H2H • Even the E627K mutation was not conserved • Reassortment is still possible, but how pathogenic?
Future Developments • Full Catalogue of Influenza Characteristic Sites • Preliminary results: • Characterization of subgroups of Influenza • Application of the method to other viruses • Release of AVANA tool
Acknowledgements and Thanks • Institute of Systems Science, NUS • Funding support for this conference • Asif M Khan • KN Srinivasan • Testing and feedback on AVANA tool • Partial Grant Support: • National Institute of Allergy and Infectious Diseases, NIH • Grant No. 5 U19 AI56541, Contract No. HHSN2662-00400085C • ImmunoGrid Project • EC Contract FP6-2004-IST-4, No. 028069