Biorange Meeting 2007-04-03 • PhyloPat: phylogenetic pattern analysis of eukaryotic genes & Immunophyle: its application on the evolution of the immune system • Tim Hulsen
Introduction (1) • Phylogenetic patterns show presence/absence of genes over a certain set of species: e.g. for 10 species: 0011101011 • Very useful for all kinds of evolutionary analyses: • Origin of certain genes • Deletion of certain genes • Clustering of genes with similar patterns: likely to have similar function / be in same pathway
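A minimal sketch of how such a pattern can be built, assuming a fixed species order and a known set of species in which the gene is present. The helper name `phylogenetic_pattern` and the placeholder species labels are illustrative assumptions, not part of PhyloPat:

```python
def phylogenetic_pattern(species_order, present_in):
    """Return a presence/absence string with one character per species."""
    present = set(present_in)
    return "".join("1" if sp in present else "0" for sp in species_order)


# Hypothetical ordered list of 10 species (placeholder labels sp1..sp10).
species = ["sp%d" % i for i in range(1, 11)]

# A gene found in species 3, 4, 5, 7, 9 and 10 gives the slide's example pattern.
pattern = phylogenetic_pattern(species, ["sp3", "sp4", "sp5", "sp7", "sp9", "sp10"])
print(pattern)
```

The species order matters: the same gene yields a different string if the species are listed differently, so all patterns in one analysis must share one ordering.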
Introduction (2) • Earlier phylogenetic pattern initiatives: • Phylogenetic Pattern Search (PPS), incorporated into COG (Natale et al., 2000) • Extended Phylogenetic Patterns Search (EPPS) (Reichard & Kaufmann, 2003) • Incorporated into OrthoMCL-DB (Chen et al., 2006) • All applied to proteins, not to genes! • PhyloPat: phylogenetic pattern analysis of eukaryotic genes
Method • Genes: easier to check for lineage-specific expansions (no alternative transcripts or splice forms); less redundant • Originally performed on Ensembl (EnsMart) database v40: 21 fully available genomes (i.e. no Pre! versions or low coverage genomes), from S. cer. to H. sap. • Makes use of the accurate Ensembl orthology pipeline (combination of BLAST, SW, MUSCLE and PHYML) • Single linkage clustering algorithm: creates orthologous groups containing ALL genes in Ensembl
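Single linkage clustering over pairwise orthologies can be sketched with a union-find structure: any two genes connected by a chain of orthology pairs end up in the same group, and genes without orthologs form singleton groups. This is an illustrative sketch, not the PhyloPat implementation; the gene IDs and the function name `single_linkage_groups` are hypothetical:

```python
def single_linkage_groups(genes, ortholog_pairs):
    """Group genes so that every orthology pair links its two members."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the group representative (with path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in ortholog_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical gene IDs and orthology pairs.
genes = ["hs1", "mm1", "rn1", "hs2", "dm1", "sc1"]
pairs = [("hs1", "mm1"), ("mm1", "rn1"), ("hs2", "dm1")]
groups = single_linkage_groups(genes, pairs)
print(groups)
```

Note that single linkage is transitive: `hs1` and `rn1` share a group even though no direct pair links them, which is why a single spurious orthology can merge two otherwise separate groups.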
Results • 446,825 genes were clustered into 147,922 groups, using 3,164,088 orthologies from 21 species • Species ordered from ‘low’ to ‘high’, i.e. by approximate distance to human • Can be queried in several ways • Output in HTML, Excel or plain text format
Web interface http://www.cmbi.ru.nl/phylopat
Pattern/ID Search • Binary string: 0=absent, 1=present, *=absent/present, e.g. ‘00000********11111111’: must be absent in non-chordata, must be present in all mammals • MySQL regular expression: e.g. ‘^0*1{10}0*$’ gives all genes that occur only in ten consecutive species • Input list of Ensembl/EMBL IDs (PhyloPat contains EMBL to Ensembl mapping)
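The two query styles can be sketched as follows, assuming patterns are stored as 21-character strings. Python's `re` module is used here as a stand-in for MySQL's regular expression matching (the two example expressions use syntax that both accept), and `query_to_regex` is a hypothetical helper name:

```python
import re


def query_to_regex(query):
    """Translate a 0/1/* query string into an anchored regular expression."""
    if not set(query) <= set("01*"):
        raise ValueError("query may only contain 0, 1 and *")
    # '*' means the gene may be absent or present in that species.
    return re.compile("^" + query.replace("*", "[01]") + "$")


wildcard = query_to_regex("00000********11111111")
print(bool(wildcard.match("000000011111111111111")))  # last 8 species all present
print(bool(wildcard.match("000000011110010101011")))  # gaps among the last 8 species

# Regular-expression query from the slide: exactly ten consecutive species.
ten_run = re.compile(r"^0*1{10}0*$")
print(bool(ten_run.match("000000011111111110000")))
print(bool(ten_run.match("000000011111111111111")))
```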
Oligo-/Polypresent Genes • Oligopresent: present in only one/two species (oligo=few), e.g. ‘000000010000000000100’ • These two species should be highly related: • C. sav + C. int: 1737; div. 100 Mya (Boffelli et al., 2004) • T. nig + T. rub: 1572; div. 85 Mya (Yamanoue et al., 2006) • A. gam + A. aeg: 1058; div. 140 Mya (Service, 1993) • P. tro + H. sap: 887; div. 6 Mya (Glazko & Nei, 2003) • R. nor + M. mus: 713; div. 20 Mya (Springer et al., 2003) • Polypresent: present in all species, except for one/two (poly=many), e.g. ‘111110111110111111111’ • These two species should be related too; similar analysis possible
Omnipresent genes • Omnipresent: present in all 21 species (omni=all): ‘111111111111111111111’ • Currently 1001 omnipresent groups • Tend to have very general/important functions, mostly involved in transcription/translation
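The three categories follow directly from counting the 1s and 0s in a pattern. A minimal sketch, where `classify_pattern` is a hypothetical helper name and the thresholds are the one/two-species limits from the slides:

```python
def classify_pattern(pattern):
    """Label a presence/absence string as omni-, oligo- or polypresent."""
    n_present = pattern.count("1")
    n_absent = pattern.count("0")
    if n_absent == 0:
        return "omnipresent"      # present in all species
    if n_present in (1, 2):
        return "oligopresent"     # present in only one or two species
    if n_absent in (1, 2):
        return "polypresent"      # absent from only one or two species
    return "other"


for p in ("000000010000000000100",
          "111110111110111111111",
          "111111111111111111111"):
    print(p, classify_pattern(p))
```

For very short patterns (four species or fewer) the oligo- and polypresent definitions overlap; the sketch then reports "oligopresent" first, which is an arbitrary choice.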
FatiGO analysis • FatiGO: connection with GO terms, KEGG pathways, InterPro domains, etc. (Al-Shahrour et al., 2004) • Analysis of all human genes in output by just one mouse click • e.g. omnipresent genes:
Other possibilities • Anti-correlating patterns: e.g. ‘001111100011000000000’ and ‘110000011100111111111’ could be completely different, or very similar (analogous)! • Easy homology-inferred functional annotation (using information from other genes in the same lineage)
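Two patterns anti-correlate perfectly when one is the bitwise complement of the other. A short sketch of that check (the function names are illustrative assumptions):

```python
def complement(pattern):
    """Swap presence and absence in a pattern string."""
    return pattern.translate(str.maketrans("01", "10"))


def is_anticorrelated(a, b):
    """True if b is present exactly where a is absent, and vice versa."""
    return len(a) == len(b) and complement(a) == b


a = "001111100011000000000"
b = "110000011100111111111"
print(is_anticorrelated(a, b))
```

A looser criterion, such as allowing a few mismatching species, would need a distance measure instead of exact equality; the slides do not say which criterion PhyloPat applies.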
Case study: Hox genes (1) • Hox genes determine where limbs and other body segments will grow in a developing embryo • Should exist mostly in vertebrates • Expansion in teleost fish species (species 8-11); seven Hox clusters instead of the mammalian four • Search Ensembl database for human genes with term ‘hox’ in annotation • 44 genes found -> enter in PhyloPat -> 32 groups found (PP######)
Case study: Hox genes (2)
PPID      # genes per species    phylogenetic pattern   gene name(s)
PP022041  011111136562233233222  011111111111111111111  MSX1, MSX2
PP024984  001000011111001111111  001000011111001111111  HOXC4
PP027791  001110023343233333333  001110011111111111111  TLX1, TLX2, TLX3
PP049478  000000221153112322223  000000111111111111111  HOXB8, HOXC8, HOXD8
PP053824  000000011120010101011  000000011110010101011  HOXD11
PP053827  000000022211111111111  000000011111111111111  HOXA10
PP053828  000000021111212122222  000000011111111111111  HOXC13, HOXD13
PP053829  000000063341122222222  000000011111111111111  HOXA1, HOXB1
PP053830  000000011110010111111  000000011110010111111  HOXB4
PP053832  000000021111011111111  000000011111011111111  HOXA5
PP053833  000000021110111111011  000000011110111111011  HOXB2
PP053834  000000031101011111111  000000011101011111111  HOXD3
PP053835  000000021110111111101  000000011110111111101  HOXA9
PP053836  000000021111111111111  000000011111111111111  HOXA3
PP053838  000000021110101111111  000000011110101111111  HOXC12
PP053839  000000011111111110111  000000011111111110111  HOXD4
PP053840  000000021111201011101  000000011111101011101  HOXC11
PP053842  000000043221111111111  000000011111111111111  HOXA13
PP053844  000000032231011111111  000000011111011111111  HOXB5
PP053845  000000021111111111011  000000011111111111011  HOXB3
PP053846  000000021121111111111  000000011111111111111  HOXD10
PP053847  000000022211111111111  000000011111111111111  HOXA2
PP053849  000000034151132333323  000000011111111111111  HOXA6, HOXB6, HOXC6
PP053853  000000011101111111011  000000011101111111011  HOXA4
PP053854  000000032252223133213  000000011111111111111  HOXB9, HOXC9, HOXD9
PP053858  000000011120011111111  000000011110011111111  HOXA11
PP070659  000000000121212222222  000000000111111111111  HOXA7, HOXB7
PP075622  000000000010001111111  000000000010001111111  HOXC5
PP084287  000000000001101111111  000000000001101111111  HOXC10
PP085049  000000000001011011111  000000000001011011111  HOXD1
PP087941  000000000000111011111  000000000000111011111  HOXD12
PP089685  000000000000111111111  000000000000111111111  HOXB13
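The phylogenetic pattern column is the gene-count column reduced to presence/absence, and counts above 1 mark possible lineage-specific expansions. A minimal sketch, assuming every count is a single digit as in the table above (a count of 10 or more would need a delimited format); the function names are illustrative:

```python
def counts_to_pattern(counts):
    """Reduce a per-species gene-count string to a 0/1 presence pattern."""
    return "".join("0" if c == "0" else "1" for c in counts)


def expanded_positions(counts):
    """Return 1-based species positions holding more than one gene."""
    return [i for i, c in enumerate(counts, start=1) if int(c) > 1]


# Two rows taken from the table above.
print(counts_to_pattern("011111136562233233222"))  # PP022041
print(counts_to_pattern("000000021111201011101"))  # PP053840
print(expanded_positions("000000021111201011101"))
```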
Case study: Hox genes (3)
PPID(s)                        name   members (cluster A-D)           first sp.   position
PP053829,085049                HOX1   HOXA1, HOXB1, HOXD1             T. nigrov.  anterior
PP053847,053833                HOX2   HOXA2, HOXB2                    T. nigrov.  anterior
PP053836,053845,053834         HOX3   HOXA3, HOXB3, HOXD3             T. nigrov.  PG3
PP053832,053844,075622         HOX5   HOXA5, HOXB5, HOXC5             T. nigrov.  central
PP053849                       HOX6   HOXA6, HOXB6, HOXC6             T. nigrov.  central
PP053835,053854                HOX9   HOXA9, HOXB9, HOXC9, HOXD9      T. nigrov.  posterior
PP053827,084287,053846         HOX10  HOXA10, HOXC10, HOXD10          T. nigrov.  posterior
PP053858,053840,053824         HOX11  HOXA11, HOXC11, HOXD11          T. nigrov.  posterior
PP053838,087941                HOX12  HOXC12, HOXD12                  T. nigrov.  posterior
PP053842,089685,053828         HOX13  HOXA13, HOXB13, HOXC13, HOXD13  T. nigrov.  posterior
PP053853,053830,024984,053839  HOX4   HOXA4, HOXB4, HOXC4, HOXD4      A. gamb.    central
PP027791                       TLX    TLX1, TLX2, TLX3                A. gamb.
PP070659                       HOX7   HOXA7, HOXB7                    G. acul.    central
PP049478                       HOX8   HOXB8, HOXC8, HOXD8             C. intest.  central
PP022041                       MSX    MSX1, MSX2                      C. eleg.
First species: T. nigrov. = ‘first’ vertebrate; A. gamb. = non-vertebrate; G. acul. = vertebrate; C. intest. = non-vertebrate; C. eleg. = non-vertebrate
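The "first sp." column can be derived from the patterns: for a family spread over several PhyloPat groups, it is the earliest species position at which any of its groups shows a 1. A sketch under that assumption (the helper names are hypothetical, and positions are reported as 1-based indices because the slides do not list the species order explicitly):

```python
def first_present(pattern):
    """Return the 1-based position of the first species with the gene, or None."""
    index = pattern.find("1")
    return None if index == -1 else index + 1


def first_present_in_family(patterns):
    """Earliest first-occurrence position over all groups of one family."""
    positions = [p for p in map(first_present, patterns) if p is not None]
    return min(positions) if positions else None


# HOX4 family: patterns of PP053853, PP053830, PP024984 and PP053839.
hox4 = ["000000011101111111011",
        "000000011110010111111",
        "001000011111001111111",
        "000000011111111110111"]
print(first_present_in_family(hox4))
```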
Conclusions • PhyloPat: quick and easy tool for phylogenetic pattern search on the complete Ensembl database • Also usable for study of lineage-specific expansions of genes • Updated immediately with each new Ensembl version: • v41: 5 new species • v42: 1 new species • v43: 4 new species + extra option: gene neighborhood
Gene neighborhood • Conservation of gene order = functionally related • Equal color = belonging to same orthologous group
Where to find PhyloPat • Web interface: http://www.cmbi.ru.nl/phylopat (accessible through www.cmbi.ru.nl and www.nbic.nl) • Publication: Hulsen T., Groenen P.M.A., de Vlieg J. BMC Bioinformatics 2006, 7: 398 http://www.biomedcentral.com/1471-2105/7/398 • Powered by Ensembl: http://www.ensembl.org/info/about/ensembl_powered.html
Chicken-human immunogenomics project (part of Biorange SP3.2.2) In collaboration with Martien Groenen, Hinri Kerstens (Animal Sciences Group, Wageningen UR) • Goals: • study evolution of genes/proteins involved in immune system, from chicken to human • check for expansions and deletions in families • zoom in to interesting families
Proteins -> Genes • Earlier initiatives: based on proteins (Protein World, IPI, ParAlign, MCL) • Disadvantages: • Large-scale computations needed for orthology determination • Difficult to study lineage-specific expansions because of alternative transcripts, isoforms • Difficult to connect to WUR synteny data • -> Genes: connect to PhyloPat tool
PhyloPat (dis)advantages • Advantages: • Uses the accurate orthology determination of Ensembl (BLAST/SW, MUSCLE, PHYML); single linkage clustering done by ourselves • No alternative transcripts, isoforms • Easy to connect to WUR synteny data • 26 species, from S. cer. to H. sap. • Disadvantage: • Genome information sometimes incomplete (but Pre-versions and low coverage genomes are not included)
Immunophyle • Application to immune system: parse through PhyloPat set using IRIS database • Take all HUGO IDs from IRIS database, input in PhyloPat (v41) -> 585 immunologic lineages containing 18,933 genes from 26 species • Divided into 22 immunologic categories from IRIS database (adaptive immunity, innate immunity, inflammation, chemotaxis, etc.) • Connected to GO, InterPro, KEGG, etc. by FatiGO
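The selection step can be pictured as a filter: keep every orthologous group that contains at least one gene from the input ID list. This is only a sketch of that idea; the group IDs, the group contents and the function name `select_lineages` are hypothetical, and the real Immunophyle pipeline may differ:

```python
def select_lineages(groups, query_ids):
    """Keep the groups that contain at least one of the queried gene IDs."""
    query = set(query_ids)
    return {ppid: sorted(genes)
            for ppid, genes in groups.items()
            if query & set(genes)}


# Hypothetical groups and a hypothetical list of immune-related gene symbols.
groups = {
    "PP_A": ["TLR2", "TLR6"],
    "PP_B": ["HOXA1", "HOXB1"],
    "PP_C": ["MYD88"],
}
immune_ids = ["TLR2", "MYD88", "IL6"]
lineages = select_lineages(groups, immune_ids)
print(lineages)
```

Because a whole group is kept as soon as one member matches, the selected lineages can contain more genes than the input list, which is consistent with 585 lineages holding 18,933 genes.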
Immunophyle • http://www.cmbi.ru.nl/immunophyle
Example pathway: Toll-like receptors • GeneGo MetaCore, canonical pathway • Interspecies differences can possibly be explained by looking at number of orthologs for each gene in the pathway
Example pathway: Toll-like receptors • Check Immunophyle for each gene involved in the TLR pathway: • Green: ‘first’ occurrence • Red: deletion
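Both markings can be read off a pattern: the ‘first’ occurrence is the first 1, and every 0 after it is a candidate deletion. A hedged sketch with a hypothetical helper name; note that an absence after the first occurrence may also reflect incomplete genome information rather than a true deletion:

```python
def annotate_pattern(pattern):
    """Return (first occurrence, candidate deletions) as 1-based positions."""
    first = pattern.find("1")
    if first == -1:
        return None, []
    # Any absence after the first occurrence is a candidate deletion.
    deletions = [i + 1 for i in range(first + 1, len(pattern))
                 if pattern[i] == "0"]
    return first + 1, deletions


# Pattern of PP053824 (HOXD11) from the Hox case study.
first, deletions = annotate_pattern("000000011110010101011")
print(first, deletions)
```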
Current/future directions • Differences in immune system between model organism and man cannot be explained only by looking at numbers of orthologs -> connect to literature, expression data, protein interaction data, structural data • Zoom in to families with help of immunologists
Acknowledgements • Peter Groenen • Wilco Fleuren … and others (Martien Groenen, Hinri Kerstens, Erik Franck, Arnold Kuzniar, etc.) for suggestions!