Analyses of ORFans in microbial and viral genomes
Journal club presentation on Mar. 14
Albert Yu
ORFan
• Definition: an ORF with no detectable sequence similarity to other ORFs in the database considered
• Nearly all genomes have ORFans (df %)
• The more genomes are sequenced, the more ORFans are found
• Most are annotated as hypothetical proteins of unknown function (no experimental evidence)
ORFan (continued)
• More data… ORFans appear to be real, functional proteins
  • 3D structure
  • Conserved in closely related species (Ka/Ks)
• Origin of ORFans?
  • Laterally transferred viral genes (especially from phages)?
[Figure: viral genome and microbial genome]
Question: the origin of ORFans
• Hypothesis to test: ORFans have been acquired through lateral gene transfer from viruses
• Approach: find homologs of these microbial ORFans within the virus sequence database
Genome-wide quantitative study
• BLASTP
• 277 microbial genomes
• 1456 viral genomes
• H(g): the number of genomes having at least one homolog of ORFan g
• U(g): uniqueness, the genomic distance between the genomes with ORFan g
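As a rough illustration of how H(g) could be tallied from BLASTP results, here is a minimal sketch. The input format (tuples of query ORF, subject ORF, E-value), the function name, and the E-value cutoff are all assumptions made for this example; the slides do not state the thresholds or tooling actually used.

```python
from collections import defaultdict


def count_genomes_with_homologs(hits, orf_to_genome, max_evalue=1e-3):
    """Return H(g) for every ORF g.

    hits: iterable of (query_orf, subject_orf, evalue) tuples, for example
          parsed from BLASTP tabular output (hypothetical input format).
    orf_to_genome: dict mapping each ORF id to the genome it belongs to.
    max_evalue: illustrative cutoff, not taken from the slides.
    """
    genomes = defaultdict(set)
    # Every ORF matches itself, so its own genome always counts (H >= 1).
    for orf, genome in orf_to_genome.items():
        genomes[orf].add(genome)
    # Add the genome of every subject that passes the E-value cutoff.
    for query, subject, evalue in hits:
        if evalue <= max_evalue:
            genomes[query].add(orf_to_genome[subject])
    return {orf: len(found) for orf, found in genomes.items()}


# Toy data: a1 has one significant hit in genome B and one weak hit in C.
orf_to_genome = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
hits = [("a1", "b1", 1e-20), ("a1", "c1", 5.0), ("a2", "a1", 1e-10)]
h_values = count_genomes_with_homologs(hits, orf_to_genome)
```

Counting the ORF's own genome keeps this consistent with the convention H = 1 for an ORF that has no homolog in any other genome.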
Classification of ORFans
• Singleton: no homolog anywhere (H = 1, BLASTP hits = 1)
• Paralogous: homologs only in the same genome (H = 1, BLASTP hits > 1)
• Orthologous: homologs only within very closely related microbial genomes (H > 1, U <= 0.1, threshold chosen by observation)
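The three classes above can be sketched as a small decision function. This is only an illustration: the function and parameter names are hypothetical, the BLASTP hit count is assumed to include the self-hit, and treating everything that fails all three rules as a non-ORFan is an assumption of this sketch rather than something stated on the slide.

```python
def classify_orf(h, blastp_hits, u, u_max=0.1):
    """Classify one ORF from its H value, BLASTP hit count, and U value.

    h: H(g), number of genomes with at least one homolog (own genome included).
    blastp_hits: number of BLASTP hits, assumed to include the self-hit.
    u: U(g), genomic distance between the genomes containing the ORF.
    u_max: the U threshold from the slide (0.1, chosen by observation).
    """
    if h == 1:
        # Confined to one genome: alone (singleton) or with paralogs.
        return "singleton" if blastp_hits == 1 else "paralogous"
    if u <= u_max:
        # Found in several genomes, but only very closely related ones.
        return "orthologous"
    # Assumption of this sketch: anything else is not an ORFan.
    return "non-ORFan"


labels = [
    classify_orf(h=1, blastp_hits=1, u=0.0),
    classify_orf(h=1, blastp_hits=3, u=0.0),
    classify_orf(h=4, blastp_hits=4, u=0.05),
    classify_orf(h=20, blastp_hits=25, u=0.7),
]
```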
The U-value for all ORFs in prokaryote genomes
In total:
• ORFs: 818906
• ORFans: 110186
  • S (singleton): 64324 (7.8%)
  • P (paralogous): 10419 (1.3%)
  • O (orthologous): 35443 (4.3%)
[Figure: U-value distribution; labels: S or P, 0.64, O]
• ORFans-VH% (OVH): % of ORFans having homologs in viruses (0% to 63.8%)
• Non-ORFans-VH% (NOVH): % of non-ORFans having homologs in viruses (4.1% to 18.2%)
• The strength of the hypothesis = the difference between these two VH%
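To make the two percentages concrete, here is a minimal sketch of how OVH and NOVH might be computed for one genome. The input format and function name are hypothetical, and reading "strength" as the simple difference OVH minus NOVH is an interpretation made for this example.

```python
def viral_homolog_percentages(orfs):
    """Compute (OVH, NOVH, OVH - NOVH) for one genome.

    orfs: iterable of (is_orfan, has_viral_homolog) boolean pairs,
          one pair per ORF (hypothetical input format).
    """
    # is_orfan -> [number with a viral homolog, total number]
    counts = {True: [0, 0], False: [0, 0]}
    for is_orfan, has_viral_homolog in orfs:
        counts[is_orfan][1] += 1
        if has_viral_homolog:
            counts[is_orfan][0] += 1

    def pct(with_hit, total):
        # Report 0.0 for an empty class instead of dividing by zero.
        return 100.0 * with_hit / total if total else 0.0

    ovh = pct(*counts[True])
    novh = pct(*counts[False])
    return ovh, novh, ovh - novh


# Toy genome: 2 ORFans (1 with a viral homolog), 4 non-ORFans (1 with one).
toy = [(True, True), (True, False),
       (False, True), (False, False), (False, False), (False, False)]
ovh, novh, strength = viral_homolog_percentages(toy)
```

Under this reading, a clearly positive difference would favour the viral-origin hypothesis, while a value near zero or below would not.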
Percentages of microbial ORFs with homologs in viruses
[Figure: OVH (red) and NOVH (blue) across 24 phylogenetic clades; labeled groups: Gamma-proteobacteria, Firmicutes, Bacteria, Archaea]
The average % of OVH and NOVH in various groups
• 6.6% vs 0.8% (148)
• 10% vs 9% (63)
• 8.5% vs 2.7% (66)
Conclusion
• Most OVH << NOVH: current evidence supporting the hypothesis is weak
• Firmicutes and Gamma-proteobacteria have the highest number of homologs in viruses (the viral database is biased)
Viral database bias
• 1456 viruses
• 280 phages (109: Gamma-proteobacteria; 102: Firmicutes; 69: others)
• Sampling?
• 277 microbial genomes
• 1456 viruses; All-virus-DB: 43566 ORFs
• 280 phages (20%); Phage-DB: 18368 ORFs (42%)
ORFans:
• All-virus: 13078 (30%) (vs. All-virus-DB); 8200 (vs. all nr, env-nr)
• All-phage: 6765 (vs. All-virus-DB); 7047 (vs. Phage-DB)
Some characteristics of ORFans
• Bacterial ORFans are shorter than non-ORFans on average
• Bacterial ORFans have significantly lower GC3 content than non-ORFans
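GC3 is the G+C fraction at third codon positions. A minimal sketch of that calculation follows, assuming the input is an in-frame coding sequence starting at the first codon position. The function name is hypothetical, and whether the stop codon is included is left to the caller.

```python
def gc3_content(cds):
    """Fraction of G or C at third codon positions of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    third_positions = cds[2:usable:3]  # every third base, starting at index 2
    if not third_positions:
        return 0.0
    return sum(base in "GC" for base in third_positions) / len(third_positions)


# Codons ATG, GCC, AAA have third bases G, C, A, so two of three are G or C.
example = gc3_content("ATGGCCAAA")
```

Comparing the mean of this value between ORFans and non-ORFans of the same genome is one way the GC3 comparison above could be reproduced.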
The length of viral ORFans and non-ORFans
• Length: non-ORFans > ORFans
• Length: ORFans < non-ORFans
• GC3%: ORFans < non-ORFans
The number of ORFs per genome in 1456 viruses
• Focusing on phages: higher %
Growth in the number of phage ORFans (consistent)
• Keeps increasing (38.4%)
• Will it drop to 0?
• Each microbial species is a host for at least 10 phage species, so phage diversity is at least 10 times higher than microbial diversity
• Only 280 phage genomes are in the database (low phage sampling)
Virus sampling bias between and within groups
[Figure label: fewer than 5 phages]
The H-value percentages for all phage ORFs and prokaryotic ORFs
[Figure: H-value distributions for prokaryotes and phages. ORFans: 38.4% (phages), 9.1% (prokaryotes); ortho: 32.4% and 11.3%]
Counts given as prophage / prokaryotic homologs / total:
• Phage non-ORFans: 4397 (61.5%) / 7150 (63.8%) / 11212
• Phage ORFans: 589 (44.7%) / 1317 (18.7%) / 7047
• Phage ORFs: 4987 (58.9%) / 8467 (46.4%) / 18248
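The percentages in these three rows appear to be nested ratios: the first is the prophage count as a share of the prokaryotic-homolog count, and the second is the prokaryotic-homolog count as a share of the row total. The short sketch below recomputes them under that reading; the dictionary keys and the function name are labels chosen for this example.

```python
# (prophage, prokaryotic homologs, total) as listed on the slide
rows = {
    "phage non-ORFans": (4397, 7150, 11212),
    "phage ORFans": (589, 1317, 7047),
    "phage ORFs": (4987, 8467, 18248),
}


def nested_percentages(prophage, prok_homologs, total):
    """Return (prophage share of homologs, homolog share of total) in %."""
    return (
        round(100 * prophage / prok_homologs, 1),
        round(100 * prok_homologs / total, 1),
    )


recomputed = {name: nested_percentages(*counts) for name, counts in rows.items()}
for name, (prophage_pct, homolog_pct) in recomputed.items():
    print(f"{name}: {prophage_pct}% prophage, {homolog_pct}% with prokaryotic homologs")
```

Read this way, 18.7% of phage ORFans have prokaryotic homologs, compared with 63.8% of phage non-ORFans.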