The STRING database

The STRING database. Michael Kuhn, EMBL Heidelberg. Protein interactions, illustrated with tryptophan synthase beta chain (E. coli K-12), drawn from many sources: genomic context, curated knowledge, experimental evidence and literature. 373 genomes (only completely sequenced genomes); 1.5 million genes.


Presentation Transcript


  1. The STRING database • Michael Kuhn, EMBL Heidelberg

  2. protein interactions

  3. example • Tryptophan synthase beta chain • E. coli K-12

  4. many sources

  5. genomic context

  6. curated knowledge

  7. experimental evidence

  8. literature

  9. 373 genomes • (only completely sequenced genomes)

  10. 1.5 million genes • (not proteins)

  11. Genome Reviews

  12. RefSeq

  13. Ensembl

  14. model organism databases

  15. data integration

  16. genomic context methods

  17. gene fusion

  18. gene neighborhood

  19. phylogenetic profiles

  20. [figure: cell, cellulosomes, cellulose]

  21. automatic inference of interactions

  22. correct interactions

  23. wrong associations

  24. gene fusion • score: sequence similarity
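The gene fusion method infers that two genes interact when a single gene in another genome looks like the two of them joined together, and the slide scores such an event by sequence similarity. A minimal sketch, assuming hits are already available in a hypothetical `Hit` format (target coordinates plus a normalized similarity), that a fusion is a target hit by both genes on non-overlapping regions, and that the event is scored by the weaker of the two similarities (an illustrative choice, not necessarily STRING's):

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """One similarity hit of a query gene on a target gene (hypothetical format)."""
    query: str         # query gene, e.g. "trpB"
    target: str        # gene in another genome that the query aligns to
    t_start: int       # first aligned residue on the target
    t_end: int         # last aligned residue on the target (inclusive)
    similarity: float  # normalized similarity in [0, 1] (assumed)


def fusion_score(hits: Iterable[Hit], gene_a: str,
                 gene_b: str) -> Optional[Tuple[str, float]]:
    """Best fusion evidence for (gene_a, gene_b) as (target, score), or None.

    A fusion event is taken to be one target gene hit by gene_a and gene_b on
    non-overlapping regions; it is scored by the weaker of the two similarities.
    """
    by_target = {}
    for h in hits:
        if h.query in (gene_a, gene_b):
            by_target.setdefault(h.target, []).append(h)

    best = None
    for target, target_hits in by_target.items():
        a_hits = [h for h in target_hits if h.query == gene_a]
        b_hits = [h for h in target_hits if h.query == gene_b]
        for ha in a_hits:
            for hb in b_hits:
                disjoint = ha.t_end < hb.t_start or hb.t_end < ha.t_start
                if disjoint:
                    score = min(ha.similarity, hb.similarity)
                    if best is None or score > best[1]:
                        best = (target, score)
    return best


# Toy hits: "fusedX" looks like an A-B fusion; "otherY" only matches trpB.
hits = [
    Hit("trpA", "fusedX", 1, 260, 0.80),
    Hit("trpB", "fusedX", 280, 670, 0.90),
    Hit("trpB", "otherY", 1, 390, 0.95),
]
print(fusion_score(hits, "trpA", "trpB"))  # ('fusedX', 0.8)
```

The target names and numbers are made up; only the shape of the test (one gene covering both partners in separate regions) reflects the method.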

  25. gene neighborhood • score: sum of intergenic distances
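For gene neighborhood, the slide's raw score is the sum of intergenic distances: genes packed tightly in the same orientation (as in an operon) are more likely to be functionally linked. A minimal sketch, assuming a linear chromosome, a hypothetical `Gene` record with 1-based inclusive coordinates, and the rule that the run between the two genes must not switch strand:

```python
from typing import List, NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    start: int   # 1-based first base
    end: int     # last base (inclusive)
    strand: str  # "+" or "-"


def neighborhood_score(genes: List[Gene], name_a: str,
                       name_b: str) -> Optional[int]:
    """Sum of intergenic distances (bp) across the run from name_a to name_b.

    Treats the chromosome as linear. Returns None if either gene is missing or
    if the run between them switches strand. Smaller sums mean tighter packing;
    overlapping genes contribute negative distances.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    index = {g.name: i for i, g in enumerate(ordered)}
    if name_a not in index or name_b not in index:
        return None
    i, j = sorted((index[name_a], index[name_b]))
    run = ordered[i:j + 1]
    if len({g.strand for g in run}) != 1:
        return None  # a strand switch breaks the putative operon
    return sum(nxt.start - prev.end - 1 for prev, nxt in zip(run, run[1:]))


# Invented coordinates, for illustration only.
genome = [
    Gene("trpC", 100, 999, "+"),
    Gene("trpB", 1010, 1900, "+"),
    Gene("trpA", 1950, 2800, "+"),
    Gene("geneX", 3000, 3500, "-"),
]
print(neighborhood_score(genome, "trpC", "trpA"))  # 59
```

Note that a smaller value is stronger evidence here, the opposite direction from the fusion similarity score.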

  26. phylogenetic profiles

  27. SVD • singular value decomposition • (removes redundancy)

  28. score: Euclidean distance

  29. all scores are “raw scores”

  30. not comparable • sequence similarity • sum of intergenic distances • Euclidean distance

  31. benchmarking • calibrate against “gold standard” • (KEGG)

  32. raw scores

  33. probabilistic scores • e.g. “70% chance for an association”
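Benchmarking turns incomparable raw scores into probabilities: predicted pairs are grouped by raw score, and each group is assigned the fraction of its pairs that the gold standard (here KEGG) confirms. A minimal binning sketch, assuming the gold standard is given as a true/false label per pair (e.g. "same KEGG pathway") and using hypothetical bin edges; how STRING actually builds its reference set and bins or smooths the curve is not specified in these slides:

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def calibrate(raw_scores: Sequence[float], in_gold_standard: Sequence[bool],
              bin_edges: Sequence[float]) -> List[Optional[float]]:
    """Fraction of gold-standard pairs per raw-score bin (None for empty bins).

    Bin b holds scores s with bin_edges[b-1] <= s < bin_edges[b]; the first
    and last bins are open-ended.
    """
    n_bins = len(bin_edges) + 1
    totals = [0] * n_bins
    true_hits = [0] * n_bins
    for score, is_true in zip(raw_scores, in_gold_standard):
        b = bisect_right(bin_edges, score)
        totals[b] += 1
        true_hits[b] += int(is_true)
    return [h / t if t else None for h, t in zip(true_hits, totals)]


def to_probability(score: float, bin_edges: Sequence[float],
                   table: List[Optional[float]]) -> Optional[float]:
    """Look up the calibrated probability for a new raw score."""
    return table[bisect_right(bin_edges, score)]


# Toy benchmark: raw scores of predicted pairs, and whether each pair
# shares a pathway in the reference set (made-up labels).
raw = [0.1, 0.2, 0.35, 0.5, 0.55, 0.65, 0.7, 0.8, 0.9, 0.95]
gold = [False, False, True, False, False, True, True, True, True, False]
edges = [0.3, 0.6]
table = calibrate(raw, gold, edges)
print(to_probability(0.75, edges, table))  # 0.8, i.e. an "80% chance"
```

Because the mapping is learned empirically per bin, it works the same whether a method's raw score grows with confidence (sequence similarity) or shrinks with it (sum of intergenic distances, Euclidean distance), which is what makes the different channels comparable afterwards.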

  34. curated knowledge

  35. KEGG • Kyoto Encyclopedia of Genes and Genomes

  36. Reactome

  37. GO • Gene Ontology
