
Integration of the Biological Databases into Grid-Portal Environments

Integration of the Biological Databases into Grid-Portal Environments. Michal Kosiedowski, Michal Malecki, Cezary Mazurek, Pawel Spychala, Marcin Wolski. Agenda: Introduction; PROGRESS Grid-Portal Environment; Data Management System; Enabling SRS resources within DMS; Case study; Conclusions.



Presentation Transcript


  1. Integration of the Biological Databases into Grid-Portal Environments
     Michal Kosiedowski, Michal Malecki, Cezary Mazurek, Pawel Spychala, Marcin Wolski

  2. Agenda
     • Introduction
     • PROGRESS Grid-Portal Environment
     • Data Management System
     • Enabling SRS resources within DMS
     • Case study
     • Conclusions

  3. R&D Center
     PSNC was established in 1993 and is an R&D center in:
     • New Generation Networks: POZMAN and PIONIER networks; 6-NET, ATRIUM, Muppet
     • HPC and Grids: GRIDLAB, CROSSGRID, VLAB, PROGRESS, Clusterix, HPCEuropa
     • Portals and Content Management Tools: Polish Educational Portal "Interkl@sa", Multimedia City Guide, Digital Library Framework, Interactive TV

  4. PROGRESS (1)
     Project partners:
     • PSNC
     • IBCh Poznan
     • SUN Microsystems Poland
     • Cyfronet AMM, Krakow
     • Technical University Lodz
     Co-funded by The State Committee for Scientific Research (KBN) and SUN Microsystems Poland

  5. PROGRESS (2)
     The PROGRESS project produced a set of open source tools for use by:
     • Grid constructors
     • Computational application developers
     • Computing portal operators

  6. PROGRESS (3)
     Map labels: Gdańsk, Wrocław
     • Cluster of 80 processors
     • Networked storage of 1.3 TB
     • Software: ORACLE, HPC Cluster Tools, Sun ONE, Sun Grid Engine, Globus

  7. Data Management Issues
     • Hide the data management complexity from the end users
     • Use new standards defined by grid organizations
     • Co-operate with different kinds of client applications
     • Provide seamless access to data and information for grid computing
     • Enable intuitive and efficient methods for resource exploration
     • Provide a friendly data management interface for administrators and scientists

  8. PROGRESS

  9. Web Services and PROGRESS
     Diagram labels: PORTLETS, GRID SERVICE PROVIDER, WS, GRID RESOURCE BROKER, DATA MANAGEMENT

  10. Data Management System
     • A distributed system enabling the management of grid data files
     • Stores files in distributed storage modules of various types: generic filesystems, archivers, relational databases
     • Uses metadata to describe files
     • Allows access to data banks, such as a mirror of the Sequence Retrieval System (SRS)
     • Exposes its functionality through the Data Broker Service

  11. DMS Functionality
     • Virtual file system keeping the data organized in a tree structure
     • Metadirectories: organize other objects into a hierarchy
     • Metafiles: represent a logical view of computational data regardless of their physical location
     • DMS provides its services in the form of a Web Services API (Data Broker Service)
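
The tree of metadirectories and metafiles described on this slide can be pictured with a minimal Python sketch. The class names, the `resolve` helper and the example.org URLs are assumptions made for illustration; they are not the DMS API, which the slides only describe at a high level.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Metafile:
    """Logical file; its physical copies are listed as location URLs."""
    name: str
    locations: List[str] = field(default_factory=list)


@dataclass
class Metadirectory:
    """Node that groups metafiles and other metadirectories."""
    name: str
    children: Dict[str, Union["Metadirectory", Metafile]] = field(default_factory=dict)

    def add(self, node):
        # Names must be unique within one metadirectory.
        if node.name in self.children:
            raise ValueError(f"{node.name!r} already exists in {self.name!r}")
        self.children[node.name] = node
        return node


def resolve(root: Metadirectory, path: str):
    """Walk a '/'-separated logical path starting at the root."""
    node = root
    for part in filter(None, path.split("/")):
        if not isinstance(node, Metadirectory) or part not in node.children:
            raise KeyError(path)
        node = node.children[part]
    return node


# Hypothetical example data: one logical file with two physical copies.
root = Metadirectory("")
home = root.add(Metadirectory("home"))
seq = home.add(Metafile("sequence.fasta", ["ftp://storage01.example.org/data/0001"]))
seq.locations.append("https://storage02.example.org/data/0001")
```

The logical path `/home/sequence.fasta` stays the same no matter how many physical locations are attached to the metafile, which is the point of separating the two views.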

  12. DMS Functionality
     Web Services interface for storing, accessing, describing and delivering data:
     • Directory management: e.g. add, remove and rename directories, retrieve root and current path, change path
     • File management: e.g. add, remove and rename files; add, remove and retrieve physical file locations
     • Metadata management: e.g. retrieve the list of schemes and attributes, assign schemes to files and edit values
     • External data source management: e.g. retrieving databank content, resolving entries, exploring databanks
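
As an illustration of the metadata operations listed on this slide (listing schemes, assigning a scheme to a file, editing values), here is a small in-memory sketch. `MetadataStore` and its method names are hypothetical, and the three-attribute scheme is only loosely modeled on Dublin Core; the real DMS exposes these operations through its Web Services interface.

```python
class MetadataStore:
    """In-memory stand-in for scheme-based file description."""

    def __init__(self):
        self.schemes = {}    # scheme name -> tuple of attribute names
        self.assigned = {}   # (file path, scheme name) -> {attribute: value}

    def define_scheme(self, name, attributes):
        self.schemes[name] = tuple(attributes)

    def list_schemes(self):
        return sorted(self.schemes)

    def assign(self, path, scheme):
        attrs = self.schemes[scheme]  # raises KeyError for an unknown scheme
        # Start with every attribute unset; keep values if already assigned.
        self.assigned.setdefault((path, scheme), dict.fromkeys(attrs))

    def set_value(self, path, scheme, attribute, value):
        values = self.assigned[(path, scheme)]
        if attribute not in values:
            raise KeyError(attribute)
        values[attribute] = value

    def get_values(self, path, scheme):
        # Return a copy so callers cannot modify the store by accident.
        return dict(self.assigned[(path, scheme)])


store = MetadataStore()
store.define_scheme("dc-subset", ["title", "creator", "format"])
store.assign("/home/sequence.fasta", "dc-subset")
store.set_value("/home/sequence.fasta", "dc-subset", "format", "fasta")
```

Keeping the scheme definition separate from the per-file values means a scheme can be assigned to many files while each file carries its own values.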

  13. DMS Architecture

  14. Data Broker
     • Serves as an interface (Web Services) for external clients, such as the HPC Portal and the grid resource broker
     • Mediates in the flow of all requests directed to the DMS
     • Authorizes the client that submitted the request
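
The broker's two roles, authorizing the client and mediating every request, can be sketched as a small dispatcher. Everything here (`DataBroker`, `AccessDenied`, the operation name and the client names) is a hypothetical stand-in, not the actual Data Broker Service.

```python
class AccessDenied(Exception):
    """Raised when a client is not allowed to run an operation."""


class DataBroker:
    def __init__(self, authorize):
        self._authorize = authorize  # callable(client, operation) -> bool
        self._handlers = {}          # operation name -> handler function

    def register(self, operation, handler):
        self._handlers[operation] = handler

    def handle(self, client, operation, **params):
        if operation not in self._handlers:
            raise LookupError(f"unknown operation: {operation}")
        # Authorization happens before the request reaches any module.
        if not self._authorize(client, operation):
            raise AccessDenied(f"{client} may not call {operation}")
        return self._handlers[operation](**params)


# Hypothetical policy: only the portal may list directories.
allowed = {("hpc-portal", "list_directory")}
broker = DataBroker(lambda client, op: (client, op) in allowed)
broker.register("list_directory", lambda path: [f"{path}/sequence.fasta"])

listing = broker.handle("hpc-portal", "list_directory", path="/home")
```

Because every request passes through `handle`, the authorization check cannot be bypassed by calling a module directly, which mirrors the single-entry-point role the slide gives the broker.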

  15. Metadata Repository
     • Central and single point of metadata management
     • Responsible for all metadata operations and for metadata storage and maintenance
     • Stores the following kinds of information:
       - metadata about resources: data files, their physical locations and the possible ways to access them
       - metadata about rights: all information related to rights (users, their groups, access rights)
       - metadata describing the standards for file description, e.g. Dublin Core (DC)
       - metadata about services: data brokers, data containers
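
The rights metadata mentioned on this slide (users, their groups, access rights) suggests a lookup in which a user's rights combine direct grants with grants inherited from groups. The following sketch assumes that model; the function name, the `user:`/`group:` prefixes and the right names are illustrative only.

```python
def effective_rights(user, groups, grants):
    """Combine a user's direct grants with those of every group they belong to.

    groups: group name -> set of user names
    grants: principal ("user:<name>" or "group:<name>") -> set of rights
    """
    rights = set(grants.get(f"user:{user}", ()))
    for group, members in groups.items():
        if user in members:
            rights |= set(grants.get(f"group:{group}", ()))
    return rights


# Hypothetical rights metadata.
groups = {"bioinformatics": {"alice", "bob"}}
grants = {
    "group:bioinformatics": {"read"},
    "user:alice": {"write"},
}
alice_rights = effective_rights("alice", groups, grants)
```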

  16. Data Container
     • Enables access to physical data
     • Data can be stored on various media types
     • Data can be organized as files on generic filesystems, BLOBs in databases or files on data tapes
     • All Containers possess a uniform interface regardless of the media types they manage
     • A Container does not perform file transfers itself; it uses external services such as ftp, https, gass and gridftp
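
The uniform container interface can be sketched as an abstract base class whose implementations differ in storage medium but all answer the same question: where can an external transfer service pick up this file? The class names, the `access_url` method and the staging directory are assumptions for illustration, not the DMS container API.

```python
from abc import ABC, abstractmethod


class DataContainer(ABC):
    """Uniform interface: hand out a URL, leave the transfer to an external service."""

    @abstractmethod
    def access_url(self, file_id: str, protocol: str) -> str:
        ...


class FilesystemContainer(DataContainer):
    def __init__(self, host, base_dir, protocols):
        self.host = host
        self.base_dir = base_dir
        self.protocols = set(protocols)

    def access_url(self, file_id, protocol):
        if protocol not in self.protocols:
            raise ValueError(f"unsupported protocol: {protocol}")
        return f"{protocol}://{self.host}{self.base_dir}/{file_id}"


class BlobContainer(DataContainer):
    """Files kept as database BLOBs (a dict stands in for the database here)."""

    def __init__(self, host, blobs, staging_dir="/staging"):
        self.host = host
        self.blobs = blobs
        self.staging_dir = staging_dir

    def access_url(self, file_id, protocol):
        if file_id not in self.blobs:
            raise KeyError(file_id)
        # A real container would export the BLOB to the staging area here.
        return f"{protocol}://{self.host}{self.staging_dir}/{file_id}"


containers = [
    FilesystemContainer("storage01.example.org", "/data", ["ftp", "https"]),
    BlobContainer("db.example.org", {"0001": b"MKTAYIAK"}),
]
urls = [c.access_url("0001", "ftp") for c in containers]
```

A caller iterates over containers without knowing which medium sits behind each one, which is what the uniform interface buys.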

  17. Proxy (SRS Container)
     • Enables access to external scientific databases
     • Includes both Repository functionality (listing entries, retrieving attached metadata, building queries) and Data Container functionality (downloading files)
     • DMS treats the Proxy as a separate, independent module that manages read-only data
     • In the PROGRESS grid-portal environment, the Proxy (named SRS Container) enables access to SRS resources

  18. Administrative Portal
     • Web application allowing users to handle grid data management with a web browser
     • An intuitive interface for executing a superset of DMS services
     • An effective way to explore huge SRS resources
     • On-line help

  19. SRS Resources in PSNC
     • Genbank: Release (about 32 million entries), Updates (about 2 million entries)
     • EMBL (European Molecular Biology Laboratory): Release (about 42 million entries), Updates (about 2 million entries)
     • PDB (Protein Data Bank)
     • Swissprot: Swissprot Release, Swissprot New, SPTREMBL, REMTREMBL

  20. SRS Installation
     • Installation uses multiple storage resources
     • Data access interface delivered via a common portal (srs.man.poznan.pl)
     • Administrative tasks (retrieval and data preparation) split across multiple machines
     • Parallel data retrieval from remote resources
     • Offline data indexing and packing on a computational machine (0.5 TB storage)
     • Compressed online data (2 × 250 GB storage)
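
Parallel retrieval from several remote resources, as listed on this slide, could look roughly like the sketch below. It uses a thread pool from the Python standard library; the `retrieve_all` helper, the databank names and URLs, and the stand-in `fake_fetch` function are assumptions, since the slides do not say how the PSNC installation implements this step.

```python
from concurrent.futures import ThreadPoolExecutor


def retrieve_all(sources, fetch, max_workers=4):
    """Fetch every source concurrently; return {name: result} in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as its input.
        results = pool.map(fetch, sources.values())
        return dict(zip(sources.keys(), results))


def fake_fetch(url):
    # Stand-in for a real download (e.g. an FTP mirror job).
    return f"fetched {url}"


# Hypothetical source list.
sources = {
    "databank-a": "ftp://mirror-a.example.org/release/",
    "databank-b": "ftp://mirror-b.example.org/release/",
}
fetched = retrieve_all(sources, fake_fetch)
```

Threads suit this kind of job because the work is dominated by waiting on the network rather than by computation.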

  21. SRS Installation - Schema
     Diagram labels: SRS, srs.man.poznan.pl, bellis-e.man.poznan.pl, viola.man.poznan.pl, storage 01, storage 02, flatfiles, index, offindex, offline, online, indexing

  22. SRS Container
     • Shell-based access to the SRS: SRS operations are sent via a shell command
     • Access interface based on Web Services: internal functionality is delivered using SOAP communication
     • Data access via the ftp, gsiftp and gass protocols: data are accessed using external file servers integrated with the SRS module
     • Advanced caching system: databanks and entries are cached and reused in subsequent user requests
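
The combination of shell-based access and caching can be sketched as a wrapper that builds a command line, runs it once per (databank, entry) pair, and serves repeated requests from memory. The class, the command template `srs-query-tool` and the injected `runner` are hypothetical; the slides do not give the actual SRS command or the cache design.

```python
import subprocess


class CachedShellQuery:
    """Run a shell command per entry and cache its output for later requests."""

    def __init__(self, command_template, runner=None):
        # command_template: list of argv parts, may contain {databank} / {entry}
        self.command_template = command_template
        self._runner = runner or self._run_shell
        self._cache = {}
        self.misses = 0

    @staticmethod
    def _run_shell(argv):
        # Default runner: execute the command and return its standard output.
        done = subprocess.run(argv, capture_output=True, text=True, check=True)
        return done.stdout

    def query(self, databank, entry_id):
        key = (databank, entry_id)
        if key not in self._cache:
            argv = [part.format(databank=databank, entry=entry_id)
                    for part in self.command_template]
            self._cache[key] = self._runner(argv)
            self.misses += 1
        return self._cache[key]


# Hypothetical command name; a fake runner replaces the real shell call.
calls = []

def fake_runner(argv):
    calls.append(argv)
    return "entry text for " + argv[-1]

srs = CachedShellQuery(["srs-query-tool", "{databank}", "{entry}"], runner=fake_runner)
first = srs.query("swissprot", "P12345")
second = srs.query("swissprot", "P12345")  # served from the cache
```

Passing the command as an argument list rather than a single string avoids shell quoting problems when entry identifiers contain unusual characters.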

  23. Portal Interface – databanks list

  24. Portal Interface – databank content

  25. Portal Interface - searching

  26. Portal Interface – search results

  27. Portal Interface – copying entries

  28. Portal Interface – file properties

  29. DMS Installation Requirements
     • Java virtual machine: Java(TM) 2 Runtime Environment, Standard Edition 1.4.1 or higher is recommended
     • Database server: DMS works with the Oracle and PostgreSQL engines
       - Oracle: Oracle8i or higher is recommended
       - PostgreSQL: version 7.3 or higher is required, with the additional extensions chkpass and tablefunc from the contrib package, and plpgsql support

  30. Usage scenario: PROGRESS HPC Portal
     • SRS resources can be used as input for grid jobs that are created, configured and submitted for execution in the grid through the PROGRESS HPC Portal
     • An example application is AminSim (amino acid sequence similarity), developed by Prof. Jacek Blazewicz's group at the Institute of Computing Science, Poznan University of Technology

  31. AminSim portlet (1)

  32. AminSim portlet (2)

  33. AminSim portlet (3)

  34. AminSim portlet (4)

  35. Conclusions
     • SRS resources have been integrated with the distributed file structure of DMS and enabled for use within a grid-portal environment (PROGRESS HPC Portal)
     • A web interface (DMS Portal) makes exploring SRS resources more efficient:
       - fast copying of interesting entries directly to the user's home directory
       - merging files
       - saving files in various formats (e.g. Fasta)
     • The universal access layer to the scientific databases may be successfully used to connect other data sources to the Data Management System
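
To make "merging files" and "saving in Fasta format" concrete, here is a minimal sketch that writes sequence records in FASTA layout (a `>` header line followed by the sequence wrapped to a fixed width) and concatenates several records into one file body. The function names and the sample records are invented for illustration; this is not the DMS Portal's implementation.

```python
def to_fasta(entry_id, description, sequence, width=60):
    """Format one record: '>' header line, then the sequence wrapped to `width`."""
    header = f">{entry_id} {description}".rstrip()
    seq = "".join(sequence.split())  # drop whitespace and line breaks
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header, *lines]) + "\n"


def merge_fasta(records, width=60):
    """Concatenate several (id, description, sequence) records into one FASTA text."""
    return "".join(to_fasta(*record, width=width) for record in records)


# Hypothetical records; width=4 only to make the wrapping visible.
merged = merge_fasta(
    [("P1", "demo protein", "MKTA YIAK"), ("P2", "", "QRQI")],
    width=4,
)
```

With these inputs, `merged` holds two records: `P1` spread over two sequence lines and `P2` on one.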

  36. Contact info
     • More information about DMS: http://dms.progress.psnc.pl
     • DMS Portal demo: http://dms.progress.psnc.pl/docs/demo.htm
     • More information about PROGRESS: http://progress.psnc.pl
     • Download it now: http://progress.psnc.pl
     • Mail the DMS team: szd@man.poznan.pl
     • Mail the PROGRESS team: progress@man.poznan.pl
