
Databases of homologous gene families: new developments and web interfaces.

Simon Penel, Julien Grassot, Laurent Duret, Manolo Gouy, Guy Perrière. Pôle Bio-Informatique Lyonnais. Equipe Bioinformatique et Génomique Evolutive, Laboratoire de Biométrie et Biologie Evolutive.







  1. Databases of homologous gene families: new developments and web interfaces. Simon Penel, Julien Grassot, Laurent Duret, Manolo Gouy, Guy Perrière. Pôle Bio-Informatique Lyonnais; Equipe Bioinformatique et Génomique Evolutive, Laboratoire de Biométrie et Biologie Evolutive, Université Claude Bernard - Lyon 1

  2. Homologous gene databases
     Uses of homologous genes: identification of important regions in genomic sequences; evolution at the molecular level; species phylogeny; function prediction.
     Research fields:
     • Proteome/genome comparative analysis
     • Phylogenetic studies
     • Orthology/paralogy relationship assignments
     • Development of generalist databases and specialised databases:
       • HOVERGEN: families of homologous vertebrate genes
       • HOBACGEN: families of homologous bacterial genes
       • NureBase, RTKdb, Hoppsigen, Mitalib, ...

  3. The HoGenom database: homologous gene families of fully sequenced organisms
     • European project TEMBLOR
     • Extension of HOVERGEN and HOBACGEN to all organisms whose complete genome sequence has been determined
     • Structured under the ACNUC retrieval system (M. Gouy): flat files and index files
     • Integrates:
       • Protein multiple alignments
       • Phylogenetic trees
       • Taxonomic data
       • Nucleic and protein sequences
       • Sequence annotations
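HoGenom is stored as ACNUC flat files plus index files. The actual ACNUC index format is not described in this presentation, so the sketch below only illustrates the general flat-file-plus-index idea: it maps each accession number (from the AC lines) to the byte offset of its entry in a SWISS-PROT-style flat file, so a single entry can be read back with one seek. The function names `build_offset_index` and `fetch_entry` are hypothetical, and this is not ACNUC's implementation.

```python
import io
from typing import BinaryIO, Dict


def build_offset_index(flat: BinaryIO) -> Dict[str, int]:
    """Map every accession number to the byte offset where its entry starts.

    Assumes SWISS-PROT-style entries: an 'ID' line opens an entry, 'AC' lines
    list accessions separated by ';', and a '//' line closes the entry.
    """
    index: Dict[str, int] = {}
    entry_start = 0
    offset = 0
    for line in iter(flat.readline, b""):
        if line.startswith(b"ID "):
            entry_start = offset
        elif line.startswith(b"AC "):
            for acc in line[5:].decode().split(";"):
                acc = acc.strip()
                if acc:
                    index[acc] = entry_start
        offset += len(line)  # track byte offsets ourselves
    return index


def fetch_entry(flat: BinaryIO, index: Dict[str, int], accession: str) -> str:
    """Seek straight to one entry and return its text up to the '//' line."""
    flat.seek(index[accession])
    lines = []
    for line in iter(flat.readline, b""):
        lines.append(line.decode())
        if line.startswith(b"//"):
            break
    return "".join(lines)


if __name__ == "__main__":
    # Toy two-entry flat file (hypothetical content).
    demo = io.BytesIO(b"ID   E1\nAC   Q9DCD0;\n//\nID   E2\nAC   P00349;\n//\n")
    idx = build_offset_index(demo)
    print(fetch_entry(demo, idx, "P00349"))
```

Keeping only offsets in the index means the flat file stays the single source of truth, and lookups cost one seek instead of a scan.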

  4. Building of HoGenom
     • Selection of the protein sequences of fully sequenced organisms from the EBI proteome site
     • Sequence comparison with BLAST on the whole sequence dataset
     • Clustering of the sequences into gene families on the basis of sequence similarity (transitive association)
     • For each family: protein alignment and phylogenetic tree
     • Addition of the gene family information to the protein sequence annotations (ACNUC protein database)
     • EMBL cross-reference calculations and selection of the nucleotide sequences
     • Addition of the gene family information to the EMBL/GenBank nucleotide annotations (ACNUC nucleotide database)
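Clustering by transitive association means that if A is similar to B and B is similar to C, then A, B and C end up in the same family even when A and C share no direct BLAST hit. This is single-linkage clustering, i.e. taking the connected components of the similarity graph. The sketch below does this with a union-find structure, assuming the BLAST hits have already been filtered into a list of similar pairs (the thresholds used for HoGenom are not given here). Sequences with no partner remain singletons, reported as orphans. All names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set forest over sequence identifiers."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_families(
    sequences: Iterable[str], hits: Iterable[Tuple[str, str]]
) -> Tuple[List[List[str]], List[str]]:
    """Return (families, orphans): families have >= 2 members, orphans are singletons."""
    seqs = list(sequences)
    uf = UnionFind()
    for s in seqs:
        uf.find(s)  # register every sequence, even those without hits
    for a, b in hits:
        uf.union(a, b)  # each similar pair merges two components
    groups: Dict[str, List[str]] = defaultdict(list)
    for s in seqs:
        groups[uf.find(s)].append(s)
    families = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    orphans = sorted(g[0] for g in groups.values() if len(g) == 1)
    return families, orphans
```

For example, the hits A-B and B-C place A, B and C in one family although A and C were never compared as a pair.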

  5. Protein sequence annotations (Hogenprot: Q9DCD0)
     ID   Q9DCD0     PRELIMINARY;      PRT;   483 AA.
     AC   Q9DCD0;
     DT   01-JUN-2001 (TrEMBLrel. 17, Created)
     DT   01-JUN-2001 (TrEMBLrel. 17, Last sequence update)
     DT   01-MAR-2002 (TrEMBLrel. 20, Last annotation update)
     DE   0610042A05RIK PROTEIN.
     GN   0610042A05RIK.
     OS   Mus musculus (Mouse).
     OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
     OC   Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
     OX   NCBI_TaxID=10090
     RN   [1]
     RP   SEQUENCE FROM N.A.
     RC   STRAIN=C57BL/6J; TISSUE=KIDNEY;
     RX   MEDLINE=21085660; PubMed=11217851;
     RA   Kawai J., Shinagawa A., Shibata K., Yoshino M., Itoh M., Ishii Y.,
     ----
     RA   Hayashizaki Y.;
     RT   "Functional annotation of a full-length mouse cDNA collection.";
     RL   Nature 409:685-690(2001).
     CC   -!- CATALYTIC ACTIVITY: 6-PHOSPHO-D-GLUCONATE + NADP(+) = D-RIBULOSE
     CC       5-PHOSPHATE + CO(2) + NADPH.
     CC   -!- PATHWAY: HEXOSE MONOPHOSPHATE SHUNT.
     CC   -!- SIMILARITY: BELONGS TO THE 6-PHOSPHOGLUCONATE DEHYDROGENASE
     CC       FAMILY.
     CC   -!- GENE_FAMILY: HBG000005 [ FAMILY / ALN / TREE ]
     DR   EMBL; AK002894; BAB22439.1; -.
     DR   HSSP; P00349; 2PGD.
     DR   MGD; MGI:1914101; 0610042A05Rik.
     DR   InterPro; IPR001744; 6PGD.
     DR   Pfam; PF00393; 6PGD; 1.
     DR   PRINTS; PR00076; 6PGDHDRGNASE.
     DR   PROSITE; PS00461; 6PGD; 1.
     DR   PRODOM; Q9DCD0.
     DR   SWISS-2DPAGE; Q9DCD0.
     KW   NADP; Oxidoreductase; Pentose shunt.
     FT   DOMAIN        5     60       PRODOM:2001.3:PD001594 134
     FT   DOMAIN       63    296       PRODOM:2001.3:PD001025 91
     FT   DOMAIN      316    469       PRODOM:2001.3:PD001549 79
     SQ   SEQUENCE   483 AA;  53247 MW;  CD0A3F72EEC2831E CRC64;
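The family assignment is carried by an extra `CC -!- GENE_FAMILY:` topic added to the SWISS-PROT/TrEMBL entry. A minimal sketch of extracting it, together with the NCBI taxon identifier from the OX line, assuming the line layout shown above; `family_and_taxon` is a hypothetical helper, not part of ACNUC or any SWISS-PROT tool.

```python
import re
from typing import Optional, Tuple

# Patterns assume the layout of the entry above: 'CC   -!- GENE_FAMILY: HBG...'
# and 'OX   NCBI_TaxID=<digits>'.
FAMILY_RE = re.compile(r"^CC\s+-!- GENE_FAMILY:\s+(HBG\d+)", re.MULTILINE)
TAXID_RE = re.compile(r"^OX\s+NCBI_TaxID=(\d+)", re.MULTILINE)


def family_and_taxon(entry: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (HoGenom family id, NCBI taxon id); None for whichever is absent."""
    fam = FAMILY_RE.search(entry)
    tax = TAXID_RE.search(entry)
    return (
        fam.group(1) if fam else None,
        int(tax.group(1)) if tax else None,
    )


SAMPLE = (
    "ID   Q9DCD0     PRELIMINARY;      PRT;   483 AA.\n"
    "OX   NCBI_TaxID=10090\n"
    "CC   -!- GENE_FAMILY: HBG000005 [ FAMILY / ALN / TREE ]\n"
    "//\n"
)

if __name__ == "__main__":
    print(family_and_taxon(SAMPLE))
```

Anchoring the patterns at line starts with `re.MULTILINE` avoids matching the same words inside free-text comments.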

  6. Nucleotide sequence annotations (Hogennucl: AK002894.PE1)
     AK002894.PE1        Location/Qualifiers
     FT   CDS_pept        76..1527
     FT                   /codon_start=1
     FT                   /db_xref="MGD:MGI:1914101"
     FT                   /db_xref="SWISS-PROT:Q9DCD0"
     FT                   /note="data source:SPTR, source key:P52209, evidence:ISS"
     FT                   /note="homolog to 6-PHOSPHOGLUCONATE DEHYDROGENASE,
     FT                   DECARBOXYLATING (EC 1.1.1.44)"
     FT                   /note="putative"
     FT                   /transl_table=1
     FT                   /gene_family="HBG000005"
     FT                   /protein_id="BAB22439.1"
     FT                   /translation="MAQADIALIGLAVMGQNLILNMNDHGFVVCAFNRTVSKVDDFLAN
     FT                   EAKGTKVVGAQSLKDMVSKLKKPRRVILLVKAGQAVDDFIEKLVPLLDTGDIIIDGGNS
     FT                   EYRDTTRRCRDLKAKGILFVGSGVSGGEEGARYGPSLMPGGNKEAWPHIKAIFQAIAAK
     FT                   VGTGEPCCDWVGDEGAGHFVKMVHNGIEYGDMQLICEAYHLMKDVLGMRHEEMAQAFEE
     FT                   WNKTELDSFLIEITANILKYRDTDGKELLPKIRDSAGQKGTGKWTAISALEYGMPVTLI
     FT                   GEAVFARCLSSLKEERVQASQKLKGPKVVQLEGSKKSFLEDIRKALYASKIISYAQGFM
     FT                   LLRQAATEFGWTLNYGGIALMWRGGCIIRSVFLGKIKDAFERNPELQNLLLDDFFKSAV
     FT                   DNCQDSWRRVISTGVQAGIPMPCFTTALSFYDGYRHEMLPANLIQAQRDYFGAHTYELL
     FT                   TKPGEFIHTNWTGHGGSVSSSSYNA"
          atggcccaag ctgacattgc actgatcgga ctggctgtca tgggccagaa cttaattttg        60
          aacatgaatg atcatggatt tgtggtctgt gctttcaata ggacagtctc caaagtcgat       120
          ….
          ccctgcttca ctactgccct ctccttctat gatgggtaca gacacgagat gctgccagca      1320
          aacctcatcc aggctcaacg ggattacttt ggggctcaca cctatgaact cttaaccaaa      1380
          ccgggagaat ttatccacac caactggacg ggccacgggg gcagtgtgtc atcctcttca      1440
          tacaatgcct ag                                                          1452
     //
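In the nucleotide entry the family appears as a `/gene_family` qualifier on the CDS feature, and long qualifier values continue over several FT lines. The sketch below collects the qualifiers of one feature, joining continuation lines with a space for free text and with nothing for `/translation`. It also checks the arithmetic linking the two entries: the CDS spans 76..1527, i.e. 1452 nt or 484 codons, which is 483 amino acids once the stop codon is excluded, matching the 483 AA of Q9DCD0. The helper names are hypothetical, and the length function assumes a simple `start..end` location that includes the stop codon.

```python
from typing import Dict, List


def parse_qualifiers(ft_lines: List[str]) -> Dict[str, List[str]]:
    """Collect /name=value qualifiers from the FT lines of ONE feature.

    A line starting with '/' opens a qualifier; other lines continue the
    previous one. The feature-key line (e.g. 'CDS_pept  76..1527') is skipped.
    """
    entries: List[list] = []  # [name, [pieces]]
    for line in ft_lines:
        text = line[2:].strip()  # drop the 'FT' line code
        if text.startswith("/"):
            name, _, value = text[1:].partition("=")
            entries.append([name, [value]])
        elif entries:
            entries[-1][1].append(text)
    quals: Dict[str, List[str]] = {}
    for name, pieces in entries:
        sep = "" if name == "translation" else " "  # protein sequence: no spaces
        quals.setdefault(name, []).append(sep.join(pieces).strip('"'))
    return quals


def cds_protein_length(location: str) -> int:
    """Amino-acid count for a simple 'start..end' CDS that includes the stop codon."""
    start, end = (int(x) for x in location.split(".."))
    return (end - start + 1) // 3 - 1
```

`parse_qualifiers` returns lists because qualifiers such as `/db_xref` and `/note` may occur several times in one feature.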

  7. HoGenom ACNUC contents, 8th September 2003
     • 117 fully sequenced organisms
     • Data sources:
       • Protein data from the EBI: non-redundant complete proteome sets (SWISS-PROT, TrEMBL, TrEMBLnew), http://www.ebi.ac.uk/proteome, June 2003
       • Genomic data from EMBL, June 2003
     • HoGenom proteins: 423,577 sequences
     • HoGenom nucleotide sequences: 448,582 CDS

  8. The 117 organisms and 423,577 protein sequences (chart; values shown: 16, 10 and 91 organisms; 9%, 31% and 60%)
     The 10 eukaryotes: Arabidopsis thaliana (plant), Caenorhabditis elegans (nematode), Drosophila melanogaster (fly), Encephalitozoon cuniculi (microsporidia), Guillardia theta (alga), Homo sapiens (human), Mus musculus (mouse), Rattus norvegicus (rat), Saccharomyces cerevisiae (yeast), Schizosaccharomyces pombe (fungus)

  9. 423,577 protein sequences
     • Sequences belonging to a family: 305,514 (72%), in 41,907 families
     • Orphan sequences: 115,373 (27%)
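The percentages on this slide can be rechecked from the raw counts. The short calculation below also derives two figures the slide does not state: the average family size (about 7.3 sequences per family), and the fact that family members plus orphans add up to 420,887, leaving 2,690 sequences (about 0.6%) that fall in neither category. The slide gives no reason for this gap, so it is reported here only as arithmetic.

```python
# Counts taken from the slide (HoGenom, September 2003).
TOTAL = 423_577
IN_FAMILY = 305_514
ORPHANS = 115_373
FAMILIES = 41_907

share_in_family = IN_FAMILY / TOTAL        # fraction of sequences in a family
share_orphans = ORPHANS / TOTAL            # fraction of orphan sequences
unassigned = TOTAL - IN_FAMILY - ORPHANS   # sequences counted in neither group
mean_family_size = IN_FAMILY / FAMILIES    # average members per family

if __name__ == "__main__":
    print(f"in families: {share_in_family:.1%}")
    print(f"orphans:     {share_orphans:.1%}")
    print(f"unassigned:  {unassigned} ({unassigned / TOTAL:.1%})")
    print(f"mean family size: {mean_family_size:.2f}")
```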

  10. Access to HoGenom is available at the PBIL: http://pbil.univ-lyon1.fr/ Web page of HoGenom : http://pbil.univ-lyon1.fr/databases/hogenom.html

  11. Database access on the Web: two main WWW interfaces
     • WWW-Query
       • Multiple query on sequences (Guy Perrière)
       • Multiple query on families: http://pbil.univ-lyon1.fr/search/query_fam.php
     • Cross-Taxa
       • Search of families according to complex taxonomic criteria
       • Selection of families: http://pbil.univ-lyon1.fr/search/cross_fam.php

  12. Cross-Taxa: selection of gene families. Example: selecting families of animal-specific genes, which yields a list of families
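The exact criteria and options of Cross-Taxa are not described in these slides. As an illustration of the kind of selection in the example, the sketch below keeps the families whose members all belong to a given taxon (here Metazoa, i.e. animal-specific families) and that are present in at least a minimum number of distinct species. The data layout, the family identifiers `FAM_A` to `FAM_C` and the abbreviated lineages are hypothetical toy data; in HoGenom the lineages would come from the OC lines of each entry.

```python
from typing import Dict, List, Tuple

# (species name, taxonomic lineage as listed in the OC lines)
Member = Tuple[str, List[str]]


def select_families(
    families: Dict[str, List[Member]], inside: str, min_species: int = 2
) -> List[str]:
    """Return ids of families whose members ALL fall inside `inside`
    and that span at least `min_species` distinct species."""
    selected = []
    for fam_id, members in families.items():
        species = {sp for sp, _ in members}
        only_inside = all(inside in lineage for _, lineage in members)
        if only_inside and len(species) >= min_species:
            selected.append(fam_id)
    return sorted(selected)


# Hypothetical toy data with abbreviated lineages.
MOUSE = ["Eukaryota", "Metazoa", "Chordata", "Mammalia"]
FLY = ["Eukaryota", "Metazoa", "Arthropoda", "Insecta"]
YEAST = ["Eukaryota", "Fungi", "Ascomycota"]

TOY = {
    "FAM_A": [("Mus musculus", MOUSE), ("Drosophila melanogaster", FLY)],
    "FAM_B": [("Mus musculus", MOUSE), ("Saccharomyces cerevisiae", YEAST)],
    "FAM_C": [("Mus musculus", MOUSE)],
}

if __name__ == "__main__":
    print(select_families(TOY, "Metazoa"))
```

Here `FAM_B` is rejected because it contains a yeast member, and `FAM_C` because it is found in a single species.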

  13. A list of families (screenshot)

  14. Displaying a family

  15. Family Page

  16. Application to other databases
     • Any sequence database can be structured under ACNUC and queried with WWW-Query. Currently available: SWISS-PROT, EMBL, GenBank, etc.
     • Any family database can be structured under ACNUC and queried with WWW-Query and Cross-Taxa. For example, an ACNUC version of the HAMAP database developed by SWISS-PROT is currently available at the PBIL.

  17. Cross-references with external databases
     From one sequence to its associated family over HTTP: a URL link displays the family, alignment and phylogenetic tree associated with a sequence accession number, e.g.
     http://pbil.univ-lyon1.fr/cgi-bin/acnuc-link-ac2fam?db=HAMAPprot&query=Q8ZY16
     Example: sequence Q8ZY16 in NiceProt, with cross-references to HAMAP-ACNUC and HOBACGEN.
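Because the link is a plain HTTP GET with two parameters, an external database can generate it for any accession number. The sketch below rebuilds the example URL from its `db` and `query` parameters with the standard library. Only `db=HAMAPprot` appears in the slides; the `db` values for other ACNUC family databases are not given here. The service dates from 2003 and may no longer respond. `family_link` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Endpoint taken from the slide example.
BASE = "http://pbil.univ-lyon1.fr/cgi-bin/acnuc-link-ac2fam"


def family_link(db: str, accession: str) -> str:
    """Build an accession-to-family URL from the db/query parameters."""
    return BASE + "?" + urlencode({"db": db, "query": accession})


if __name__ == "__main__":
    print(family_link("HAMAPprot", "Q8ZY16"))
```

`urlencode` percent-escapes the values, which matters if an accession or database name ever contains reserved characters.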

  18. Acknowledgements
     People from BBE: Laurent Duret, Manolo Gouy, Julien Grassot, Simon Penel, Guy Perrière
     SWISS-PROT group: Alexandre Gattiker
     This project is supported by:
     • the European Commission (TEMBLOR)
     • the Rhône-Alpes region (Projet Thématiques Prioritaires)
