POSTER JO 60. First release of HOGENOM, a database of homologous genes from complete genome. Simon Penel, Laurent Duret, Pascal Calvat, Jean-François Dufayard, Guy Perrière, Manolo Gouy. Equipe Bioinformatique et Génomique Evolutive, Laboratoire de Biométrie et Biologie Evolutive
POSTER JO 60
First release of HOGENOM, a database of homologous genes from complete genome
Simon Penel, Laurent Duret, Pascal Calvat, Jean-François Dufayard, Guy Perrière, Manolo Gouy.
Equipe Bioinformatique et Génomique Evolutive, Laboratoire de Biométrie et Biologie Evolutive, Université Claude Bernard - Lyon 1
Homologous Genes Databases
Research fields:
• Proteome/genome comparative analysis
• Phylogenetic studies
• Orthology/Paralogy relationship assignments
• Development of generic databases, specialised databases
• HOVERGEN: families of homologous vertebrate genes
• HOBACGEN: families of homologous bacterial genes
• NureBase, RTKdb, Hoppsigen, Mitalib, Polymorphix..
The HoGenom database: Homologous Genes Families from fully Sequenced Organisms
European project TEMBLOR
Contents:
• Nucleic and protein sequences
• Sequence annotations
• Taxonomic data
• Protein multiple alignments
• Phylogenetic trees
The HoGenom database: Building of Database
Data selection
Protein sequences from the European Bioinformatics Institute:
• Proteome sets (Human, Mouse, Rat, etc.)
• SwissProt
• TrEMBL
• TrEMBL-new
[Diagram labels: "1 sequence, 1 species" and "1 sequence, many species"]
The HoGenom database: Building of Database
Similarity search
• Filtering (SEG)
• Local pairwise alignments: BLASTP, BLOSUM62, E ≤ 10⁻⁴
• Parallelised calculations at IN2P3
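The E-value cut-off of the similarity search can be illustrated with a minimal sketch. The three-column tab-separated layout (query, subject, E-value), the `Hit` type and both function names are assumptions made for this example, as is the removal of self-hits; none of them come from the poster or from the BLAST output format.

```python
from typing import Iterable, List, NamedTuple

E_VALUE_MAX = 1e-4  # threshold stated on the poster (E <= 10^-4)


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated lines: query, subject, E-value (hypothetical layout)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        query, subject, evalue = line.split("\t")[:3]
        hits.append(Hit(query, subject, float(evalue)))
    return hits


def keep_significant(hits: Iterable[Hit], e_max: float = E_VALUE_MAX) -> List[Hit]:
    """Keep hits at or below the E-value threshold, dropping self-hits (assumption)."""
    return [h for h in hits if h.evalue <= e_max and h.query != h.subject]


example = ["A\tB\t1e-10", "A\tC\t0.5", "A\tA\t0.0", "# comment", "B\tC\t1e-4"]
significant = keep_significant(parse_hits(example))
```

Only the A/B and B/C pairs remain in `significant` here: A/C exceeds the threshold and A/A is a self-hit.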
The HoGenom database: Building of Database
Clustering into families
• Criteria: HSP ≥ 80% length, Similarity ≥ 50%
• Diagram: sequence A matches B and C, so A, B and C are clustered into one protein family
1 : Clustering of complete sequences into families
2 : Including partial sequences to the families defined previously
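Step 1 above can be sketched as single-linkage clustering with a union-find structure. This assumes that families are the transitive closure of pairs passing both thresholds, which the diagram suggests but the poster does not state. The `Pair` layout, with coverage and similarity already computed as fractions, and the function name are hypothetical. Step 2 (partial sequences) is not covered.

```python
from typing import Dict, Iterable, List, Tuple

MIN_COVERAGE = 0.80    # HSP >= 80% of the length (poster criterion)
MIN_SIMILARITY = 0.50  # similarity >= 50% (poster criterion)

# (seq_a, seq_b, coverage, similarity), both values as fractions (assumed layout)
Pair = Tuple[str, str, float, float]


def cluster_families(sequences: List[str], pairs: Iterable[Pair]) -> List[List[str]]:
    """Group sequences into families by single linkage over qualifying pairs."""
    parent: Dict[str, str] = {s: s for s in sequences}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, coverage, similarity in pairs:
        if coverage >= MIN_COVERAGE and similarity >= MIN_SIMILARITY:
            parent[find(a)] = find(b)  # merge the two clusters

    families: Dict[str, List[str]] = {}
    for s in sequences:
        families.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in families.values())


families = cluster_families(
    ["A", "B", "C", "D"],
    [("A", "B", 0.90, 0.60), ("A", "C", 0.85, 0.55), ("C", "D", 0.50, 0.90)],
)
```

In this example C and D stay apart because their pair covers only 50% of the length, even though its similarity is high.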
The HoGenom database: Building of Database
Alignments and trees
• Multiple alignment of each protein family (sequences A to G in the diagram): CLUSTAL W, default parameters
• Phylogenetic tree: BIONJ (neighbor joining), observed divergence
• Partial sequences: distance matrix with missing values
• Rooting: mid-point
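The way partial sequences can lead to missing values in an observed-divergence matrix can be sketched as follows. This is only an illustration of the distance input, not of BIONJ itself. Using `-` as the gap character, skipping every gapped column, and representing a missing value as `None` are assumptions of this example.

```python
from typing import List, Optional

GAP = "-"  # assumed gap character in the aligned sequences


def observed_divergence(a: str, b: str) -> Optional[float]:
    """Fraction of differing residues over columns where neither sequence has a gap.

    Returns None (a missing value) when the two sequences share no column,
    as can happen with two non-overlapping partial sequences.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == GAP or y == GAP:
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        return None
    return differing / compared


def distance_matrix(aligned: List[str]) -> List[List[Optional[float]]]:
    """Symmetric matrix of observed divergences, with None for missing entries."""
    n = len(aligned)
    matrix: List[List[Optional[float]]] = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = observed_divergence(aligned[i], aligned[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


matrix = distance_matrix(["ACDE--", "ACDF--", "----GH"])
```

Here the first two sequences differ at one of four shared columns, while the third shares no column with either, so its distances to them are missing.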
The HoGenom database: Contents
• 117 organisms
• 423 577 proteins, 527 925 CDS
• 41 907 families
Organisms listed:
• Arabidopsis thaliana (plant)
• Caenorhabditis elegans (nematode)
• Drosophila melanogaster (fly)
• Encephalitozoon cuniculi (microsporidia)
• Guillardia theta (alga)
• Homo sapiens (man)
• Mus musculus (mouse)
• Rattus norvegicus (rat)
• Saccharomyces cerevisiae (yeast)
• Schizosaccharomyces pombe (fungus)
[Chart figures, category labels lost in extraction: 16, 10, 91; 9%, 31%, 60%]
Querying the databases
• WWW Query: query on sequences and families according to multiple criteria
• Cross Taxa: query on families according to complex taxonomic criteria
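A cross-taxa criterion of the kind described above could look like the following minimal sketch. The family identifiers, the in-memory dictionary and the `cross_taxa_query` function are hypothetical and do not reflect the actual HoGenom query interface.

```python
from typing import Dict, Iterable, List, Set


def cross_taxa_query(
    families: Dict[str, Set[str]],
    required: Iterable[str],
    excluded: Iterable[str] = (),
) -> List[str]:
    """Return families containing every required taxon and no excluded taxon."""
    required_set = set(required)
    excluded_set = set(excluded)
    return sorted(
        family_id
        for family_id, taxa in families.items()
        if required_set <= taxa and not (excluded_set & taxa)
    )


# Hypothetical family identifiers mapped to the species found in each family.
example_families = {
    "FAM1": {"Homo sapiens", "Mus musculus"},
    "FAM2": {"Homo sapiens", "Saccharomyces cerevisiae"},
    "FAM3": {"Mus musculus"},
}
result = cross_taxa_query(
    example_families,
    required=["Homo sapiens"],
    excluded=["Saccharomyces cerevisiae"],
)
```

The query keeps only the family that contains a human sequence and no yeast sequence.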
POSTER JO-60 to be continued…