“Exploration of adenovirus biology and diversity with genome sequencing and analysis; Explorations of genomes with application and development of bioinformatic tools” Don Seto, Ph.D., Associate Professor, Bioinformatics and Computational Biology, School of Computational Sciences, George Mason University. September 27, 2005
Scientific interests • Genomics and bioinformatics: Human adenovirus diversity and natural history based on genome determinations and analysis. Evolution and patho-epidemiology. • New insights: adaptive evolution and host jumping. • Comparative virology: Adenovirus and Poxvirus genomes. • Genome informatics: Software tools development and applications. GeneOrder, CoreGenes, automated genome annotation and multiple whole genome nucleotide/coding sequence alignments, as applied to • Virus genomes (ca. 36,000 to 350,000 bases). • Small bacterial genomes (< 2Mb). • Larger bacterial genomes (> 2Mb).
Informatics: development of software tools Problem: Shortage of tools to analyze whole genomes. Tools developed and under development • GeneOrder • GO 1, 2, 3 have been developed to examine genomes the size of viruses, mitochondria, chloroplasts and small-genome bacteria (<2Mb) • GO aligns two genomes with respect to coding regions that are similar or identical • GO 4 is under development for analysis of ‘regular’ sized bacteria (>2Mb) • CoreGenes • CG 1 has been developed to examine viruses, mitochondria and chloroplasts • CG 2 and 3 for <2Mb and >2Mb genomes are in development Tools under development • Automated genome annotator • using adenovirus genomes as a model for more complex genomes • Multiple sequence alignment of large whole genomes • Dr. Xiaoqiu Huang (ISU), combining nucleotide and coding sequences
GeneOrder analysis: Example of a completed genome analysis tool (but still being optimized) • Two genomes are compared to each other • Each point is a coding sequence • Identifies gene order and synteny (comparable areas of the genomes) • Co-linear arrangements suggest possible gene expression, function and evolution relationships • Identifies regions of genomic rearrangement events
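The comparison described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the GeneOrder implementation: the function names, the 0.8 threshold, and the use of `difflib` as a stand-in for a real protein similarity search are all hypothetical choices made for the sketch.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity score in [0, 1]; a stand-in for a real protein comparison."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def gene_order_points(genome_a, genome_b, threshold=0.8):
    """Return (i, j) index pairs of coding sequences that look similar.

    genome_a and genome_b are lists of (name, protein_sequence) tuples,
    listed in the order the genes occur along each genome.
    Each returned pair corresponds to one point on the two-genome plot.
    """
    points = []
    for i, (_, seq_a) in enumerate(genome_a):
        for j, (_, seq_b) in enumerate(genome_b):
            if similarity(seq_a, seq_b) >= threshold:
                points.append((i, j))
    return points


def is_colinear(points):
    """True if the matched genes occur in the same relative order in both genomes."""
    js = [j for _, j in sorted(points)]
    return all(x < y for x, y in zip(js, js[1:]))
```

Plotting the returned pairs gives a diagonal for co-linear genomes; points that fall off the diagonal mark candidate rearrangements.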
Poxvirus genomes: GeneOrder analysis • Done, but still being optimized. • Up to 350,000-nucleotide virus genomes • including smallpox and related genomes • related but diverse viruses (FPV = fowl virus; GP = goat virus; vaccinia = human virus) • Public health, e.g., SARS, influenza, avian influenza and new dog flu • Biothreats: smallpox and recombinants
“MAP” alignment of variola and vaccinia consensus sequences • In development. • Dr. Huang’s MAP series of alignment tools are not currently optimal for adenovirus genomes at 36,000 nucleotides. • A collaboration will optimize these nucleotide alignments and include a GeneOrder component, e.g., coding sequences as anchors/references.
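One common way to use coding sequences as anchors is to keep only the matches that appear in the same order in both genomes, then align the stretches between them. The sketch below assumes that approach; it is not taken from MAP or GeneOrder, and the function name and input format are hypothetical.

```python
from bisect import bisect_left


def chain_anchors(anchors):
    """Pick the longest set of anchors that is strictly increasing in both genomes.

    anchors: list of (pos_a, pos_b) start coordinates of matched coding sequences.
    Uses the standard longest-increasing-subsequence technique on pos_b
    after sorting by pos_a.
    """
    # Sorting ties on pos_a by descending pos_b prevents two anchors with
    # the same pos_a from both entering the chain.
    anchors = sorted(set(anchors), key=lambda p: (p[0], -p[1]))
    tails = []      # tails[k] = smallest pos_b that ends a chain of length k + 1
    tails_idx = []  # index into anchors of the anchor stored in tails[k]
    prev = [-1] * len(anchors)
    for idx, (_, pos_b) in enumerate(anchors):
        k = bisect_left(tails, pos_b)
        if k == len(tails):
            tails.append(pos_b)
            tails_idx.append(idx)
        else:
            tails[k] = pos_b
            tails_idx[k] = idx
        prev[idx] = tails_idx[k - 1] if k > 0 else -1
    # Walk back from the end of the longest chain.
    chain = []
    idx = tails_idx[-1] if tails_idx else -1
    while idx != -1:
        chain.append(anchors[idx])
        idx = prev[idx]
    return chain[::-1]
```

Anchors left out of the chain are the ones that break co-linearity, so they are also a quick list of candidate rearrangements.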
Informatics, but where is the biology? Adenovirus • As a family, adenoviruses infect all vertebrates. • In humans, they can infect a variety of tissues and organs. • They cause a range of diseases, including GI problems and respiratory diseases such as Acute Respiratory Disease and potentially fatal pneumonia. • Our group: first new genome sequences in 20 years, with comprehensive annotations. • Competition: Ads are now important as disease agents, as human gene therapy vectors, and as vaccine vectors.
Adenovirus genomes: Sequence to “thorough” coding annotations
Genome annotation algorithm: Data mining of genes and features -Quality control on DNA sequencing and assembly. -Identity of genome signature probes for molecular diagnostics. -Closure for completing genomics and bioinformatics portions (basic research). -Extend legacy 2, 5, 12, 17, 40. -New Ad1, 3, FS3(N), 4, 4vac, FS4(A#1), FS4(AF#2), 6, 7, 7vac, FS7(N), 14, 16, 21, 34, 50 and BT4/5 [=FS5].... 17 total. Identification of patho-epidemiology and evolution features: Ad4 zoonosis and genome recombinations.
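The idea of a genome signature probe can be illustrated as a search for short subsequences found in one genome and absent from the others. This is a minimal sketch under that assumption, not the group's method: the function names and the default `k` are hypothetical, and it ignores reverse complements and near matches, which a real probe design would have to handle.

```python
def kmers(seq: str, k: int) -> set:
    """All length-k substrings of a nucleotide sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_kmers(target: str, others, k: int = 8) -> set:
    """k-mers present in the target genome and absent from every other genome."""
    background = set()
    for seq in others:
        background |= kmers(seq, k)
    return kmers(target, k) - background
```

For example, `signature_kmers("ACGTACGTTTGA", ["ACGTACGTAAAA"], k=4)` keeps only the four k-mers that overlap the differing 3' end of the toy target.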
Manuscript and analysis phase: Annotation and analyses • [Figure: gene map showing coding and non-coding regions] • From analysis, get....
Ad4 jumped from chimp to human (zoonosis) • Evolution: rapid adaptation to new host in <50 years • Implications for gene therapy/vaccine vector development • [Figure: gene order/synteny maps of HAdV-4, SAdV-25, HAdV-7 and HAdV-5] Color-coded gene order/synteny: -Blue fibers- L5 -Brown E3 “d-1” -Brown-green E3 (E3 genes counter host response)
Analysis and manuscript phase: Evolution and Vaccine, Field (epidemic), Coinfection and BreakThrough strains • Natural history and molecular phylogeny of field strains. • Whole genome phylogeny analysis with Dr. Marc Allard (GTU). • Evolution and patho-epidemiology of adenoviruses. • Evolution rates of genomes and genes. • Ad4FS- rapid adaptive evolution, ‘super’ virus now implicated in 99+% of acute respiratory disease cases. • Genome jumps! (recombinations x2). • Identity of epidemic strains. • Relationship and value of vaccine strains. • Effectiveness of ‘current’ vaccines: Half-life of efficacy. • Biology of “BreakThrough” strains and coinfection strains [new!]
Example 1- genomics data mining: Vaccine strains • Ad4prototype vs Ad4vaccine: Two insertions into vaccine strain. • Ala insertion as a result of a GCG insertion into Ad4vaccine at 25989. • T insertion at 28423 of Ad4vaccine with no apparent coding/regulatory effects. • Ad4vaccine is essentially Ad4prototype “as is.” Few genome differences. • Ad7prototype and Ad7vaccine: Very similar, but different...... • Ad7prototype is the “Gomen” strain; Ad7vaccine is essentially the “Greider” strain (two contemporary prototype strains). • Vaccines- Wyeth attenuation not by virus manipulation but by ‘gut’ vs lung inoculation. • New data here suggest this is not optimal- genome recombinations are common. • “Super” mutant! [Ad4FSs currently!]
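GCG is an alanine codon in the standard genetic code, so an in-frame GCG insertion adds one Ala and leaves the rest of the protein unchanged. The sketch below shows this on a made-up 15-base reading frame; the toy sequence, the offsets, and the helper names are hypothetical and are not the Ad4 coordinates quoted on the slide.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 0; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def insert_at(dna: str, offset: int, fragment: str) -> str:
    """Return dna with fragment inserted before the base at the 0-based offset."""
    return dna[:offset] + fragment + dna[offset:]


toy_orf = "ATGAAACCCGGGTAA"                        # hypothetical ORF: M K P G stop
print(translate(toy_orf))                          # MKPG*
print(translate(insert_at(toy_orf, 6, "GCG")))     # MKAPG*  (one Ala added, frame kept)
print(translate(insert_at(toy_orf, 6, "T")))       # MKSRV   (frameshift)
```

The last line shows the contrast: a single-base insertion inside an open reading frame would shift the frame, whereas the three-base GCG insertion does not.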
Example 2- genomics data mining: “BreakThrough” strain • Strain X- is it 4, left and nasty, or 5, right and mild???? • “BreakThrough Ad4/5” strain- new isolate of adenovirus. • Isolated from a vaccinated recruit or a vaccinated population. • Serotyped as Ad4 by microneutralization (old). • Genome determination and analysis (new) suggest it is Ad5. • BLAST; genome alignments; hexon and fiber trees. • But also contains genome of Ad21. • Other such coinfections contain 2, 3, 4, 5, 6 simultaneous Ads!! • Recall genome recombinations and “super” strains....
Also, other studies- hexon analysis: Genomics to crystals • Map critical regions onto crystal structures of proteins, like hexon. • Hexons are coat proteins that antibodies can bind and neutralize, so: Why does this antiserum not work against AdX but does against AdY?
And- Molecular modeling: Fibers and CAR 1) D-D-A-D P-P-P-P 2) S-S-S-P 3) P-P-P-P 4) K-K-K-K From sequence alignment of.... Fibers can dictate cell types to infect, so Why does AdX infect lung and not GI?
Molecular modeling: Comparative fibers • Species A, C, E bind CAR; Species B1 and B2 do not.
Accomplishments and in progress • 17 genomes sequenced and analyzed; several published. • Families of genes in the process of analysis. • Ad4 is a result of zoonosis. • Ad4FS have adaptively evolved to human host. • accounts for ‘super’ virus, from <10% to 99+% of ARD cases. • through genome recombinations. • Genome recombinations are common between adenoviruses. • Coinfections of adenoviruses occur naturally. • Vaccine development must take the above into account. • or else, ‘super’ virus. • Implications in gene therapy and in vaccine vector development. • Whole genome analysis tools GeneOrder and CoreGenes. • Multiple Sequence (nucleotide and coding anchors) Alignment tool.