Facilitating access to the scientific data service with the use of the Data Management System

Learn how the Data Management System facilitates seamless access to scientific data services, hiding complexity and enabling efficient resource exploration. Explore its functionality, architecture, and deployment in various projects, enhancing grid computing worldwide.

Presentation Transcript


  1. Facilitating access to the scientific data service with the use of the Data Management System
     Cezary Mazurek, mazurek@man.poznan.pl

  2. Agenda
     • Introduction
     • Data management issues
     • Data Management System – functionality and architecture
     • Accessing SRS resources
     • Conclusions

  3. R&D Center
     PSNC was established in 1993 and is an R&D Center in:
     • New Generation Networks: POZMAN and PIONIER networks; 6-NET, ATRIUM, Muppet
     • HPC and Grids: GRIDLAB, CROSSGRID, VLAB, PROGRESS projects; Clusterix, HPCEuropa
     • Portals and Content Management Tools: Polish Educational Portal "Interkl@sa", Multimedia City Guide, Digital Library Framework, Interactive TV

  4. PROGRESS (1)
     Project Partners:
     • SUN Microsystems Poland
     • PSNC IBCh Poznań
     • Cyfronet AMM, Kraków
     • Technical University Łódź
     Co-funded by The State Committee for Scientific Research (KBN) and SUN Microsystems Poland

  5. PROGRESS (2)
     Deployment phase (2004):
     • Grid constructors
     • Computational applications developers
     • Computing portals operators
     Enabling access to the global grid through deployment of PROGRESS open source packages

  6. PROGRESS (3)
     • Cluster of 80 processors
     • Networked Storage of 1.3 TB
     • Software: ORACLE, HPC Cluster Tools, Sun ONE, Sun Grid Engine, Globus
     [Map labels: Gdańsk, Wrocław]

  7. Polish Optical Internet PIONIER

  8. Data Management Issues
     • Hiding the data management complexity from the end user
     • Ability to use new standards defined by grid organizations
     • Cooperation with different kinds of applications
     • Providing seamless access to data and information for grid computing
     • Enabling an intuitive and efficient method of resource exploration
     • Providing an easy-to-use data management interface for administrators and scientists

  9. PROGRESS

  10. PROGRESS Communication
      [Diagram] Components: HPC Portal, Grid Service Provider, Grid Resource Broker, Data Management System
      Calls labelled in the diagram: saveJob(), getApplications(), getTemplates(), saveTaskOfJob(), saveStdOfTask(), submitJob() (shown twice), getUserJobs(), getJobStatus(), changeTaskStatus(), listUserDirectory(), addUserFile(), getUserFileLocation()

  11. Web Services and Progress
      [Diagram labels: PORTLETS, GRID SERVICE PROVIDER, WS, GRID RESOURCE BROKER, DATA MANAGEMENT]

  12. Data Management System
      • A distributed system enabling the management of grid data files
      • Stores files in distributed storage modules of various types: generic filesystems, archivers, relational databases
      • Uses metadata to describe files
      • Allows access to data banks, such as a mirror of the Sequence Retrieval System
      • Exposes its functionality through the Data Broker service

  13. DMS Functionality
      • Virtual file system keeping the data organized in a tree structure:
        Metadirectories: organize other objects into a hierarchy
        Metafiles: represent a logical view of computational data regardless of its physical storage location
      • DMS provides its services to front-end applications in the form of a Web Services API
      • DMS is a middleware system belonging to the collective layer as well as the resource layer (Data Container), according to the grid services view
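The tree of metadirectories and metafiles described on this slide can be pictured with a small in-memory model. This is only a sketch: the class names, the `resolve` helper and the example hosts are assumptions made for illustration, not the DMS data model or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Metafile:
    """Logical file; physical copies are recorded only as location strings."""
    name: str
    locations: List[str] = field(default_factory=list)


@dataclass
class Metadirectory:
    """Node that groups metafiles and other metadirectories."""
    name: str
    children: Dict[str, Union["Metadirectory", Metafile]] = field(default_factory=dict)

    def add(self, node):
        self.children[node.name] = node
        return node


def resolve(root: Metadirectory, path: str):
    """Walk a '/'-separated logical path starting at the root metadirectory."""
    node = root
    for part in [p for p in path.split("/") if p]:
        if not isinstance(node, Metadirectory) or part not in node.children:
            raise KeyError(path)
        node = node.children[part]
    return node


# Hypothetical content: one logical file with two physical locations.
root = Metadirectory("")
home = root.add(Metadirectory("home"))
seq = home.add(Metafile("seq.fasta", ["gsiftp://storage01.example.org/blobs/17"]))
seq.locations.append("ftp://storage02.example.org/data/seq.fasta")
```

The logical path `/home/seq.fasta` stays the same however many physical copies are registered, which is the separation between logical view and physical storage that the slide describes.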

  14. DMS Functionality
      Web Services interface for storing, accessing, describing and delivering data:
      • directory mgmt.: e.g. add, remove and rename directories, retrieve root and current path, change path
      • file mgmt.: e.g. add, remove and rename files, add, remove and retrieve physical file location
      • metadata mgmt.: e.g. retrieve list of schemes and attributes, assign schemes to files and edit values
      • external datasource mgmt.: e.g. retrieving databank content, resolving entries, exploring databanks

  15. DMS Architecture

  16. Data Broker
      • Serves as a Web Services interface for external clients, such as the HPC Portal and the Grid Resource Broker
      • Mediates the flow of all requests directed to the DMS
      • Authorizes the client that submitted the request
      • The Data Broker is distributed in the data management environment
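The broker's two roles, authorizing the caller and then mediating the request, can be sketched as follows. The class, the rights table and the handler registry are hypothetical; only the operation name `listUserDirectory` is taken from the communication slide, and its signature here is assumed.

```python
from typing import Callable, Dict, Set


class AccessDenied(Exception):
    """Raised when a client is not allowed to call an operation."""


class DataBroker:
    """Hypothetical broker: authorizes a client, then forwards the request."""

    def __init__(self, rights: Dict[str, Set[str]]):
        self._rights = rights  # client id -> names of operations it may call
        self._handlers: Dict[str, Callable[..., object]] = {}

    def register(self, operation: str, handler: Callable[..., object]) -> None:
        self._handlers[operation] = handler

    def handle(self, client: str, operation: str, *args):
        # Authorization happens before any back-end module is touched.
        if operation not in self._rights.get(client, set()):
            raise AccessDenied(f"{client} may not call {operation}")
        return self._handlers[operation](*args)


broker = DataBroker({"hpc-portal": {"listUserDirectory"}})
broker.register("listUserDirectory", lambda user: [f"/home/{user}/job1.out"])
listing = broker.handle("hpc-portal", "listUserDirectory", "alice")
```

Keeping the check in one place means every back-end handler can assume the request it receives has already been authorized.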

  17. Metadata Repository
      • Central and single point of metadata management
      • Responsible for all metadata operations and for metadata storage and maintenance
      • It stores the following kinds of information:
        metadata about resources: data files, their physical locations and the possible ways to access them
        metadata about rights: all information related to rights (users, their groups, access rights)
        metadata describing the standards of file description, e.g. Dublin Core (DC)
        metadata about services: data brokers, data containers
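The four kinds of metadata listed on this slide could be held in a structure like the one below. It is a rough sketch under stated assumptions: the field names, the `describe` method and the sample values are invented for illustration, and only three Dublin Core elements (`title`, `creator`, `date`) are used.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceRecord:
    logical_path: str
    locations: List[str]  # physical locations / access URLs
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class MetadataRepository:
    resources: Dict[str, ResourceRecord] = field(default_factory=dict)
    rights: Dict[str, Dict[str, str]] = field(default_factory=dict)  # path -> user -> "r" / "rw"
    schemes: Dict[str, List[str]] = field(default_factory=dict)      # scheme -> attribute names
    services: Dict[str, str] = field(default_factory=dict)           # service name -> kind

    def describe(self, path: str, scheme: str, values: Dict[str, str]) -> None:
        """Attach attribute values to a file, rejecting names outside the scheme."""
        unknown = set(values) - set(self.schemes[scheme])
        if unknown:
            raise ValueError(f"attributes not in scheme {scheme}: {sorted(unknown)}")
        self.resources[path].attributes.update(values)


repo = MetadataRepository()
repo.schemes["Dublin Core"] = ["title", "creator", "date"]
repo.services["container-01"] = "data container"
repo.resources["/home/alice/seq.fasta"] = ResourceRecord(
    "/home/alice/seq.fasta", ["ftp://storage01.example.org/seq.fasta"])
repo.rights["/home/alice/seq.fasta"] = {"alice": "rw"}
repo.describe("/home/alice/seq.fasta", "Dublin Core",
              {"title": "Sample sequence", "creator": "alice"})
```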

  18. Data Container
      • Enables access to physical data
      • Data is arranged in Data Containers and can be stored on various media types
      • Data can be organized as files on generic filesystems, BLOBs in databases or files on data tapes
      • Each container exposes a uniform interface regardless of the media type it manages
      • Containers do not perform file transfers themselves; they use external services and daemons such as FTP, HTTPS, GASS and GridFTP
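A uniform container interface over different media might look like the sketch below. Everything here is an assumption for illustration: the class and method names, the host names, and the use of SQLite as a stand-in for a relational database holding BLOBs. In line with the slide, the containers only store data and report a location; the transfer itself is assumed to be handled by an external service.

```python
import sqlite3
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class DataContainer(ABC):
    """Uniform interface; the storage medium is an implementation detail."""

    @abstractmethod
    def store(self, file_id: str, data: bytes) -> None: ...

    @abstractmethod
    def location(self, file_id: str) -> str:
        """URL that an external transfer service is assumed to serve."""


class FilesystemContainer(DataContainer):
    def __init__(self, host: str, base_dir: Path):
        self.host, self.base_dir = host, base_dir

    def store(self, file_id: str, data: bytes) -> None:
        (self.base_dir / file_id).write_bytes(data)

    def location(self, file_id: str) -> str:
        return f"ftp://{self.host}/{file_id}"


class BlobContainer(DataContainer):
    def __init__(self, host: str):
        self.host = host
        self.db = sqlite3.connect(":memory:")  # stand-in for a real database server
        self.db.execute("CREATE TABLE blobs (id TEXT PRIMARY KEY, data BLOB)")

    def store(self, file_id: str, data: bytes) -> None:
        self.db.execute("INSERT OR REPLACE INTO blobs VALUES (?, ?)", (file_id, data))

    def location(self, file_id: str) -> str:
        return f"https://{self.host}/blobs/{file_id}"


with tempfile.TemporaryDirectory() as tmp:
    containers = [FilesystemContainer("fs.example.org", Path(tmp)),
                  BlobContainer("db.example.org")]
    for c in containers:
        c.store("result.dat", b"ACGT")  # same call, different media
    urls = [c.location("result.dat") for c in containers]
```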

  19. Proxy (SRS Container)
      • Enables access to external scientific databases
      • Combines Repository functionality (listing entries, retrieving attached metadata, building queries) with Data Container functionality (downloading files)
      • DMS treats the Proxy as a separate, independent module that manages read-only data
      • Within the PROGRESS grid-portal environment the Proxy (named SRS Container) enables access to SRS resources
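The combination of repository-like and container-like behaviour over read-only data can be illustrated with a toy proxy. The class, its methods and the sample databank are hypothetical; a plain dictionary stands in for the external database.

```python
from typing import Dict, List


class ReadOnlyError(Exception):
    """Raised on any attempt to modify proxied data."""


class DatabankProxy:
    """Hypothetical read-only proxy over an external databank."""

    def __init__(self, databanks: Dict[str, Dict[str, str]]):
        # databank name -> entry id -> entry text (stands in for the external source)
        self._databanks = databanks

    # Repository-like side: listing and querying
    def list_entries(self, databank: str) -> List[str]:
        return sorted(self._databanks[databank])

    def query(self, databank: str, text: str) -> List[str]:
        return [entry for entry, body in sorted(self._databanks[databank].items())
                if text in body]

    # Data Container-like side: delivering content
    def download(self, databank: str, entry: str) -> bytes:
        return self._databanks[databank][entry].encode("utf-8")

    def store(self, *_args) -> None:
        raise ReadOnlyError("proxy data is read-only")


proxy = DatabankProxy({"demo": {"E1": "kinase ACGT", "E2": "lipase TTGA"}})
hits = proxy.query("demo", "kinase")
payload = proxy.download("demo", "E2")
```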

  20. Administrative Portal
      • Web application that lets the user work with DMS through a web browser
      • An intuitive interface that allows a superset of the DMS services to be executed
      • Basic and extended interface (depending on user privileges)
      • An effective way to explore huge SRS resources
      • Online, context-sensitive help

  21. SRS
      • Sequence Retrieval System
      • Platform for integrating biological databases
      • Delivers a uniform data querying interface for retrieving resources
      • Integrates applications that perform computational tasks on data stored in SRS resources

  22. SRS Resources in PSNC
      • Genbank: Release (about 32 million entries), Updates (about 2 million entries)
      • EMBL (European Molecular Biology Laboratory): Release (about 42 million entries), Updates (about 2 million entries)
      • PDB (Protein Data Bank)
      • Swissprot: Swissprot Release, Swissprot New, SPTREMBL, REMTREMBL

  23. SRS Installation
      • The installation uses different storage resources
      • Data access interface delivered via a common portal (srs.man.poznan.pl)
      • Administrative tasks (retrieval and data preparation) split across different machines
      • Parallel data retrieval from remote resources
      • Offline data indexing and packing on a computational machine (0.5 TB storage)
      • Compressed online data (2 × 250 GB storage)
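Parallel retrieval from several remote resources can be sketched with a thread pool. This is not the PSNC retrieval tooling: `retrieve_all` and `fake_fetch` are hypothetical names, and the fetch function is a stub standing in for a real download.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def retrieve_all(sources: List[str], fetch: Callable[[str], bytes],
                 workers: int = 4) -> Dict[str, bytes]:
    """Fetch every source concurrently; results keep the order of `sources`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(sources, pool.map(fetch, sources)))


def fake_fetch(name: str) -> bytes:
    # Stub for a real transfer from a remote databank mirror.
    return f"flatfile for {name}".encode()


flatfiles = retrieve_all(["genbank", "embl", "pdb"], fake_fetch)
```

Threads suit this case because the work is dominated by waiting on the network rather than by computation.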

  24. SRS Installation - Schema
      [Diagram labels: SRS, srs.man.poznan.pl, bellis-e.man.poznan.pl, viola.man.poznan.pl, storage 01, storage 02, flatfiles, index, offindex, offline, online, indexing]

  25. SRS Container
      • Shell-based access to the SRS: operations to be executed by SRS mechanisms are sent via shell commands
      • Access interface based on Web Services: internal functionality is delivered using SOAP communication
      • Data access via the ftp, gsiftp and gass protocols: data is accessed through external file servers integrated with the SRS module
      • Advanced caching system: databanks and entries are cached and reused in subsequent user requests
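The caching behaviour, where entries fetched once are reused by later requests, can be illustrated with a minimal cache keyed by databank and entry. The `EntryCache` class and the `slow_lookup` stub are assumptions for illustration; the slide does not describe the actual cache design or eviction policy.

```python
from typing import Callable, Dict, Tuple


class EntryCache:
    """Hypothetical cache: each (databank, entry) pair is fetched at most once."""

    def __init__(self, fetch: Callable[[str, str], str]):
        self._fetch = fetch
        self._cache: Dict[Tuple[str, str], str] = {}
        self.misses = 0

    def get(self, databank: str, entry: str) -> str:
        key = (databank, entry)
        if key not in self._cache:
            self.misses += 1  # only a miss reaches the slow back end
            self._cache[key] = self._fetch(databank, entry)
        return self._cache[key]


def slow_lookup(databank: str, entry: str) -> str:
    # Stub for an expensive query against the databank.
    return f"{databank}:{entry}:ACGT"


cache = EntryCache(slow_lookup)
first = cache.get("embl", "X00001")
second = cache.get("embl", "X00001")  # served from the cache
```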

  26. Portal Interface – databanks list

  27. Portal Interface – databank content

  28. Portal Interface - searching

  29. Portal Interface – search results

  30. Portal Interface – copying entries

  31. Portal Interface – file properties

  32. DMS Installation Requirements
      • Java virtual machine: Java(TM) 2 Runtime Environment, Standard Edition 1.4.1 or higher is recommended
      • Database server: DMS works with the Oracle and PostgreSQL engines
        Oracle: Oracle8i or higher is recommended
        PostgreSQL: version 7.3 or higher is required, with these additions: the chkpass and tablefunc extensions from the contrib package, and plpgsql support

  33. Conclusions
      • SRS resources have been integrated with the distributed file structure of DMS
      • A web interface makes exploring SRS resources more efficient:
        fast copying of interesting entries directly to the user's home directory
        merging files
        saving files in a different format (e.g. Fasta)
      • The universal access layer to the scientific databases may be successfully used to connect other data sources to the Data Management System (e.g. digital libraries)
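Saving an entry in Fasta format and merging several entries into one file can be sketched as below. The function name and the sample records are hypothetical, and the 60-column line width is an assumed convention rather than anything the slides specify.

```python
import textwrap


def to_fasta(entry_id: str, description: str, sequence: str, width: int = 60) -> str:
    """Format one record: a '>' header line, then the sequence in fixed-width lines."""
    header = f">{entry_id} {description}".rstrip()
    return "\n".join([header, *textwrap.wrap(sequence, width)]) + "\n"


records = [
    to_fasta("E1", "sample kinase", "ACGT" * 20),
    to_fasta("E2", "", "TTGA"),
]
merged = "".join(records)  # merging is plain concatenation of complete records
```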

  34. In Closing
      • Check http://dms.progress.psnc.pl for more information about DMS
      • Download it now: http://progress.psnc.pl/English/opensource.html
      • Mail the DMS team: szd@man.poznan.pl