1 / 55

Web Services as Integrators of Public Chemistry Databases

Web Services as Integrators of Public Chemistry Databases. Gary Wiggins School of Informatics Indiana University wiggins@indiana.edu. Interdisciplinary Science. “The boundary between the known and the unknown is where science flourishes.” --Michael Shermer

elspeth
Download Presentation

Web Services as Integrators of Public Chemistry Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Web Services as Integrators of Public Chemistry Databases Gary Wiggins School of Informatics Indiana University wiggins@indiana.edu

  2. Interdisciplinary Science • “The boundary between the known and the unknown is where science flourishes.” --Michael Shermer • “There are known knowns… There are known unknowns… There are unknown unknowns…” --Donald Rumsfeld • BUT…. • What about UNKNOWN knowns?

  3. zzzzzCAS • “In an industry first, Chemical Abstracts Service (CAS) has unveiled a revolutionary new literature searching tool which will permit scientists to search and retrieve the world’s chemical literature—including patents and obscure technical reports—in their sleep.” --Author unknown

  4. Overview of the Talk • Introduction to Web Services • Public Databases and Data Repositories for Chemistry • Projects Underway or Planned at Indiana University

  5. Web Services Introduction • What are “Web Services”? • A distributed invocation system built on Grid computing • Independent of platform and programming language • Built on existing Web standards • A service oriented architecture with • Interfaces based on Internet protocols • Messages in XML (except for binary data attachments)

  6. Service Oriented Architecture (SOA) • Goal is to achieve loose coupling among interacting software agents • Define service: a unit of work done by a service provider to achieve desired end results for a service consumer • Both provider and consumer are roles played by software agents on behalf of their owners.

  7. Web Services Architectures • Individual services are registered globally • Broken down into individual services with inputs and outputs specified • Services are published • Services are requested • Open registry, publishing, and requesting

  8. Service-Oriented Architecture • From Curcin et al. DDT, 2005, 10(12),867

  9. Web Services for Science • Invisible Services, Semantic Web, and Grid • Easy-to-use tools for any scientist • High throughput, resource intensive computing done for low cost/resources • Shared community • Collaborations between labs and fields • Shared data • Shared tools

  10. e-Science and the Grid 1 • e-Science:Major UK Program • global collaboration in key areas of science and the next generation of infrastructure that will enable it • reflects growing importance of international laboratories, satellites and sensors and their integrated analysis by distributed teams • total investment of some £200M over the five-year period from 2001 to 2006 • CyberInfrastructure: the analogous US initiative • Grid Technology: supports e-Science & Cyberinfrastructure

  11. e-Science and the Grid 2

  12. Basic Architectures:Servlets/CGI and Web Services Browser Browser GUI Client Web Server HTTP GET/POST WSDL SOAP Web Server WSDL Web Server WSDL WSDL SOAP JDBC JDBC DB or MPI Appl. DB or MPI Appl.

  13. When To Use Web Services? • Applications do not have severe restrictions on reliabilityandspeed. • Two or more organizations need to cooperate. • One needs to write an application that uses another’s service. • Services can be upgraded independently of clients. • Services can be easily expressed with simple request/responsesemantics and simple state.

  14. Web Services Benefits • Web services provide a clean separation between a capability and its user interface. • Increase in productivity • Increase in flexibility • Rapid return on investment • Integration across multiple applications

  15. Web Services Advantages • Output in human- and computer-readable formats • I/O formats based on standard Internet protocols • Resources accessible server to server allow automated I/O • Integration based on specific services: you select services or data needed without downloading the entire data set

  16. Web Services Advantages • Description protocols provide details of service provided and interface components • Semantic Web standards increase efficiency • Use a central registry and standardized description of services • Quality and status of the information is dynamically available

  17. Web Services Drawbacks • Based on new technologies • Time and commitment required to learn • Standards still in a state of flux • Issues with quality of data, (and for chemistry, quantity of open data), security, and privacy

  18. Components of Web Services • Protocols • WSDL • SOAP • UDDI • XML as a basis for the protocols • Ontologies • OWL: Ontology Web Language • Semantic Web

  19. WSDL: Web Service Definition Language • Describes a service’s interface to clients • Services register themselves with Web Services • WSDL describes how to contact and interact with services • I/O, operations and messages to aid interaction with client

  20. SOAP: Simple Object Access Protocol • Maps abstract WSDL service descriptions to concrete implementations • Flexible protocol to communicate information between server and server or client and server using XML • Supports Remote Procedure Calls • Allows layers (security, authentication, transactions) over the basic SOAP elements

  21. UDDI: Universal Description, Discovery, and Integration • Provides ways for clients and services to interact with other services • Uses XML • Defines the means of access, e.g., • URL • E-Mail • Defines services hosted by an entity • Business-oriented tags • Uses SOAP for communicating

  22. XML and Web services • XML lends itself to distributed computing: • It’s just a data description. • Platform, programming language independent • Web Services Description Language (WSDL) • Describes how to invoke a service • Can bind to SOAP, other protocols for actual invocation • Simple Object Access Protocol (SOAP) • Wire protocol extension for conveying RPC calls • Can be carried over HTTP, SMTP

  23. RDF: Resource Description Framework • A standard for statements describing resources • RDF statements consist of a • Subject: the resource being described • Predicate: the property ascribed to the resource • Object: the entity to which the resource bears that property • An alternative to UDDI: said to more flexible and extensible than UDDI

  24. RDFS: Resource Description Framework Schema • RDF statements ascribe properties to resources, but RDF provides no means for describing those properties. • RDFS defines classes and properties that are used to describe classes, properties, and other resources. • RDFS vocabulary descriptions are written in RDF. • RDFS provides a means of specifying a vocabulary for a particular domain.

  25. OWL: Web Ontology Language • Builds on RDF and RDFS and adds a means for richer descriptions of properties and classes • Disjoint classes • Cardinality of classes • Characteristics of relations, like symmetry

  26. Standards for Web Services • Business Process Execution Language for Web Services (BPEL4WS) • Ontology Web Language Semantics (OWL-S) • Web Service Modeling Ontology (WSMO)

  27. Standards Setting Bodies: OASIS • OASIS: Organization for Advancement of Structured Information Standards • ebXML: e-business XML • UDDI: Universal Description, Discovery and Integration • Global Grid Forum • community of users, developers, and vendors leading the global standardization effort for grid computing

  28. Standards Setting Bodies: W3C • W3C: World Wide Web Consortium • OWL: Ontology Web Language • RDF/RDFS: Resource Description Framework/Schema • SOAP: Simple Object Access Protocol • URI/URL/URN: Universal Resource Identifier/Locator/Name • WSDL: Web Service Definition Language • XML: eXtensible Markup Language

  29. Web Services Integration Projects: Biosciences • myGrid • http://www.mygrid.org.uk/ • BIOPIPE • http://biopipe.org/ • BioMOBY • http://biomoby.org/

  30. Web Services for Chemistry: Problems • Performance and scalability • Proprietary data • Competition from high-performance desktop applications -- Geoff Hutchison, it’s a puzzle blog, 2005-01-05 • ALSO: • Lack of a substantial body of trustworthy Open Access databases • Non-standard chemical data formats (over 40 in regular use and requiring normalization to one another)

  31. Necessary Ingredients in Chemistry • Chemical communities to assemble Open Access databases • Well-defined quality assurance procedures performed by distributed peer-review systems • Software underlying the databases needs to be open source.

  32. BlueObelisk.org • A group of chemists, programmers, and informaticians working collaboratively on projects such as: • Chemistry Development Kit (CDK) • JChemPaint • Jmol • JUMBO • NMRShiftDB • Octet • Open Babel • QSAR • World Wide Molecular Matrix (WWMM)

  33. Components of the Semantic Web for Chemistry • XML – eXtensible Markup Language • RDF – Resource Description Framework • RSS – Rich Site Summary • Dublin Core – allows metadata-based newsfeeds • OWL – for ontologies • BPEL4WS – for workflow and web services • Murray-Rust et al. Org. Biomol. Chem. 2004, 2, 3192-3203.

  34. Chemistry Databases on the Web • Marc Nicklaus lists 37 databases as of October 2001 • Must have structure searching and at least 100 molecules • http://cactus.nci.nih.gov/ncidb2/chem_www.html • SoaringBear’s List has 15 databases • http://geocities.com/soaringbear/biomed/chem.html

  35. Institutional Repositories • NARSTO Quality Systems Science Center • http://cdiac.esd.ornl.gov/programs/NARSTO/ • Pollutant species in the troposphere over North America • Part of the Carbon Dioxide Information Analysis Center at ORNL • NARSTO Data and Information Sharing Tool • http://mercury.ornl.gov/narsto/

  36. Public Databases • Developmental Therapeutics Program/NCI • Some assay data for download • Structures for over 200,000 compounds • http://dtp.nci.nih.gov/docs/dtp_search.html • Zinc and other screening databases • NIST computational chemistry database • Environmental fate and exposure databases

  37. Other Public Databases 1 • ChemExper Chemical Directory • > 200,000 substances; > 10,000 IR spectra • http://chemexper.com/ • HIC-Up; Hetero-Compound Identification Centre – Uppsala • 5384 substances as of 1/15/05 • http://xray.bmc.uu.se/hicup/ • Chemicals with Pharmaceutical Activity; a 3D Structural Database • 400 3D structures • http://www.chem.ox.ac.uk/mom/chemical-database/

  38. Isatin in ChemExper

  39. Isatin ChemExper IR Spectrum

  40. Isatin from HIC-Up

  41. Other Public Databases 2 • Cheminformatics.org • 41 data sets in 9 categories as of 8/18/05 • http://www.cheminformatics.org/ • WebReactions • http://webreactions.net/

  42. Other Public Databases 3 • MolTable • http://www.moltable.org/ • MatWeb Materials Property Data • http://www.matweb.com/index.asp?ckck=1 • Spectral Database for Organic Compounds (SDBS) • Over 32,000 compounds • Has EI-MS, FT-IR, 1H NMR, 13C NMR, Raman, ESR • http://www.aist.go.jp/RIODB/SDBS/cgi-bin/cre_index.cgi • NMRShiftDB (Christoph Steinbeck) • 14,753 structures as of 8/19/05 • Features peer-reviewed submission of data sets • http://www.nmrshiftdb.org/

  43. Comment on Link to PubChem • “It is great to know that - with PubChem - there will be something like a single point of entry for people interested in structures and their properties. We are looking forward to link our datasets to PubChem structures.” --Christoph Steinbeck, commenting on the NMRShiftDB, CHMINF-L, 19 April 2005

  44. Other Public Databases:Commercial Teasers • FTIRsearch.com (Thermo Electron) • Demo file of 575 spectra from 87,000 in the full database • https://ftirsearch.com/default3.htm • ChemACX • 30 of >350 suppliers catalog data • http://chemacx.cambridgesoft.com/chemacx/index.asp • Sunset Molecular Discovery, LLC • Wombat (World of Molecular BioAcTivity) • 117,007 entries with over 230,000 biological activities • Wombat PK • Database for Clinical Pharmacokinetics: 643 substances with 4668 measurements • Three sample files from Wombat containing 341 Histamine-1 receptor antagonists • http://www.sunsetmolecular.com/

  45. Indiana University Existing Projects • System for the Integration of Bioinformatics Services (SIBIOS) • http://sibios.engr.iupui.edu • PlatCom: A Platform for Computational Comparative Genomics • http://bio.informatics.indiana.edu/sunkim/Platcom/ • Reciprocal Net • http://www.reciprocalnet.org/index.html

  46. Indiana University Planned Projects • Design of a Grid-based distributed data architecture • Development of tools for HTS data analysis and virtual screening • Database for quantum mechanical simulation data • Chemical prototype projects • Novel routes to enzymatic reaction mechanisms • Mechanism-based drug design • Data-inquiry-based development of new methods in natural product synthesis

  47. Community Grids Lab

  48. Open Systems Lab

  49. Web Services Future • Depends on • Adoption of standards • Incorporation of WS in current and newly developed applications • Security, privacy, quality of data issues • Development of WS tools and resources for e-Science

  50. “Tripos Embraces Web Services” • Goal: to make it easier to incorporate Tripos’ high-throughput workflows • Tripos’ Service-Oriented Informatics (SOI) • In partnership with SciTegic: Web services ensure software compatibility between the companies’ products.

More Related