Grids for Chemical Informatics

Explore the concept of Grids, their role in meeting data-deluge challenges, and how they support scientific work through distributed services and education. Learn about applications such as storm forecasting, on-demand data mining, and more.

Presentation Transcript


  1. Grids for Chemical Informatics
     Randall Bramley, Geoffrey Fox, Dennis Gannon, Beth Plale
     Computer Science, Informatics, Physics
     Pervasive Technology Laboratories, Indiana University, Bloomington, IN 47401

  2. What is a Grid?
     • Name borrowed from the power grid.
     • The concept: a ubiquitous information and computation resource.
     • A definition: a network of compute and data resources that has been supplemented with a layer of services providing uniform and secure access to a set of applications of interest to a distributed community of users.
     • Grids may be wide-area or enterprise-scale.

  3. Scientific Challenges
     [Figure labels: Storms Forming; Forecast Model; Streaming Observations; Data Mining; On-Demand; Storm predictions]
     • The current and future generations of scientific problems are:
       • Data-oriented
       • Increasingly stream-based
       • Often in need of petabyte archives
       • In need of on-demand computing resources
       • Conducted by geographically distributed teams of specialists who don’t want to become experts in grid computing.

  4. Information/Knowledge Grids
     • Tens to thousands of distributed data sources (instruments, file systems, curated databases …)
     • Data deluge: from 1 petabyte/year (now) to hundreds of petabytes/year (2012)
       • Moore’s law for sensors
     • Filters may be assigned dynamically (on demand), for example:
       • Run an image-processing algorithm on a telescope image
       • Run a gene-sequencing algorithm on compiled data
     • Needs a decision-support front end with “what-if” simulations
     • Metadata (provenance) is critical for annotating data
     • Integrate across experiments, as in multi-wavelength astronomy
     [Figure: Data deluge comes from pixels/year available]
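The idea of filters assigned on demand, with provenance metadata recorded alongside the data, can be illustrated with a small Python sketch. It is not drawn from any project named in these slides: the `Record` type, the filter registry `FILTERS`, and the two toy filters (`threshold`, `normalize`) are hypothetical stand-ins for real instrument-processing codes such as the telescope and gene-sequencing examples.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Record:
    """A data item plus a provenance trail describing how it was produced."""
    source: str
    payload: List[float]
    provenance: List[str] = field(default_factory=list)


# Hypothetical registry: filters a user can attach to a stream on demand.
FILTERS: Dict[str, Callable[[List[float]], List[float]]] = {
    # keep only values above a fixed cutoff
    "threshold": lambda xs: [x for x in xs if x > 0.5],
    # scale by the largest value (skipped for empty or zero-max input)
    "normalize": lambda xs: [x / max(xs) for x in xs] if xs and max(xs) else xs,
}


def apply_filters(stream: Iterable[Record], names: List[str]) -> Iterator[Record]:
    """Apply the requested filters in order, logging each step as provenance."""
    for rec in stream:
        payload = rec.payload
        trail = list(rec.provenance)
        for name in names:
            payload = FILTERS[name](payload)
            trail.append(f"{name} at {datetime.now(timezone.utc).isoformat()}")
        yield Record(rec.source, payload, trail)


# Example: one observation from a hypothetical telescope feed.
observations = [Record("telescope-1", [0.2, 0.9, 0.7])]
filtered = list(apply_filters(observations, ["threshold", "normalize"]))
```

Keeping the provenance trail on each record, rather than in a separate log, is one simple way to make the annotation travel with the data as it moves between services.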

  5. Internet-Scale Distributed Services
     • Grids use Internet technology to manage sets of network-connected resources
     • Classic Web: independent one-to-one access to individual resources
     • Grids integrate and manage multiple Internet-connected resources: people, sensors, computers, data systems
     • Grids are built on top of commodity Web service technology with broad industry support
     • Organization can be explicit, as in:
       • TeraGrid, which federates many supercomputers;
       • CrisisGrid, which federates first responders, commanders, sensors, GIS, (tsunami) simulations, and science/public data
     • Organization can be implicit, such as curated databases and simulation resources that “harmonize a community”

  6. The Architecture of Gateway Grids
     [Figure: layered architecture]
       The User’s Desktop
       Grid Portal Server
       Gateway Services: Proxy Certificate Server/Vault, User Metadata Catalog, Application Workflow, Application Deployment, Application Events, Resource Broker, App. Resource Catalogs, Replica Mgmt
       Core Grid Services (OGSA-like Layer): Security Services, Information Services, Self Management, Resource Management, Execution Management, Data Services
       Physical Resource Layer
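The Resource Broker in this architecture has to decide where an application runs. A minimal Python sketch of one possible policy follows; the `Resource` fields, the catalog contents, and the "most free CPUs wins" rule are assumptions for illustration, not the behaviour of any particular gateway.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """One entry in a (hypothetical) application resource catalog."""
    name: str
    free_cpus: int
    app_deployed: bool  # is the requested application installed here?


def choose_resource(catalog: List[Resource], cpus_needed: int) -> Optional[Resource]:
    """Among resources with the app and enough CPUs, return the one with most free CPUs."""
    candidates = [r for r in catalog if r.app_deployed and r.free_cpus >= cpus_needed]
    return max(candidates, key=lambda r: r.free_cpus, default=None)


catalog = [
    Resource("cluster-a", free_cpus=16, app_deployed=True),
    Resource("cluster-b", free_cpus=64, app_deployed=False),
    Resource("cluster-c", free_cpus=32, app_deployed=True),
]
picked = choose_resource(catalog, cpus_needed=8)       # cluster-c
too_big = choose_resource(catalog, cpus_needed=128)    # None
```

A real broker would also consult the security and information services in the core layer; this sketch leaves them out to keep the selection rule visible.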

  7. Let’s look at a few real examples (about a dozen … many more exist!)

  8. BIRN – Biomedical Informatics Research Network

  9. Mesoscale Meteorology
     • NSF LEAD project: building the tools needed to make accurate predictions of tornadoes and hurricanes
     • Data exploration and Grid workflow

  10. Workflow in the LEAD Grid
     [Figure: Katrina output]

  11. Renci Bio Gateway
     • Provides access to biotechnology tools running on a back-end Grid
     • Leverages state-wide investment in bioinformatics, undergraduate and graduate education, and faculty research
     • Another portal
     • Soon: national evolutionary synthesis center

  12. X-Ray Crystallography

  13. SERVOGrid

  14. SERVOGrid Requirements
     • Seamless access to data repositories and large-scale computers
     • Integration of multiple data sources, including sensors, databases, and file systems, with the analysis system
       • Including filtered OGSA-DAI (Grid database access)
     • Rich metadata generation and access, with a SERVOGrid-specific schema extending OpenGIS (Geography as a Web service) standards and using the Semantic Grid
     • Portals with a component model for user interfaces and Web control of all capabilities
     • Collaboration to support worldwide work
     • Basic Grid tools: workflow and notification
     • NOT metacomputing

  15. Grid of Grids: Research Grid and Education Grid
     [Figure labels: Field Trip Data; Database; GIS Grid; Discovery Services; Repositories; Federated Databases; Streaming Data; Sensors; Sensor Grid; Database Grid; Research; Education; SERVOGrid Compute Grid; Customization Services: From Research to Education; Data Filter Services; Research Simulations; Analysis and Visualization Portal; Education Grid; Computer Farm]

  16. Integrating Archived Web Feature Services and Google Maps
     Google Maps can be integrated with Web Feature Service archives to filter and browse seismic records.
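Filtering seismic records held behind a Web Feature Service typically means sending a GetFeature request restricted to a map region, such as the area a user is viewing in Google Maps. The Python sketch below only builds such a request URL from the key-value parameters of the OGC WFS 1.1.0 interface; the endpoint `https://example.org/wfs` and the feature type name `seismic:events` are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlencode


def wfs_getfeature_url(endpoint: str, type_name: str,
                       bbox: Tuple[float, float, float, float]) -> str:
    """Build a WFS GetFeature request limited to a bounding box.

    Note: BBOX axis order depends on the WFS version and CRS in use;
    check the server's capabilities document before relying on it.
    """
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.1.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
        "BBOX": ",".join(str(v) for v in bbox),
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and layer; the box (lon/lat) roughly covers California.
url = wfs_getfeature_url("https://example.org/wfs", "seismic:events",
                         (-125.0, 32.0, -114.0, 42.0))
```

The map client would then parse the returned features and draw them as markers; that rendering step is outside this sketch.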

  17. MyGrid - Bioinformatics

  18. The Williams Workflows
     [Figure: workflows A, B, C]
     A: Identification of overlapping sequence
     B: Characterisation of nucleotide sequence
     C: Characterisation of protein sequence
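Workflows like A, B and C are chains of analysis steps in which later steps consume earlier results. A minimal Python sketch of a dependency-ordered runner is shown below, using the standard-library `graphlib` module (Python 3.9+). The stage functions are placeholders, and the A → B → C dependency chain is an assumption made for illustration; the slide does not say how the three workflows connect.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_workflow(stages: Dict[str, Callable[[dict], object]],
                 deps: Dict[str, Set[str]]) -> dict:
    """Run every stage after its dependencies; each stage sees all earlier results."""
    results: dict = {}
    for name in TopologicalSorter(deps).static_order():
        results[name] = stages[name](results)
    return results


# Placeholder stages standing in for the real bioinformatics services.
stages = {
    "A": lambda r: "overlap",                   # identify overlapping sequence
    "B": lambda r: f"nucleotide({r['A']})",     # characterise nucleotide sequence
    "C": lambda r: f"protein({r['B']})",        # characterise protein sequence
}
results = run_workflow(stages, deps={"B": {"A"}, "C": {"B"}})
```

Expressing the workflow as a dependency graph rather than a fixed sequence lets independent steps be added later without rewriting the runner.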

  19. [Figure labels: BioInformatics Grid; Chemical Informatics Grid; Sequencing Tools; Biocomplexity Simulations; BIS … …; HTS Tools; Quantum Calculations; CIS; Domain Specific Grids/Services; Compute/Supercomputer; Information/Knowledge; Portals; Services; Collaboration; MIS; Instrument/Sensor; Application Services; Policy; Data Access/Storage; Metadata; Discovery; Core Low Level Grid Services; Security; Messaging; Workflow; Management; Physical Network]
     M(B,C)IS is a Molecular (Bio, Chem) Information System supporting specific metadata (CML, CellML, SBML) and physical representations.

  20. Comments on Grid Components
     • Support GT4 and WS-I+(+); support Java and .NET
     • Portals: all services will have a portlet interface
     • Compute Grid: some sort of Condor Grid (as used by Cambridge)
     • Supercomputer Grid: (extended) TeraGrid
     • Workflow, Metadata, Information Management: learn from Taverna, link with BPEL-style workflow, link with other Semantic Grid/metadata services
     • Instruments: learn from CIMA/Reciprocal Net, compare with Sensors in LEAD/SERVOGrid
     • MIS/CIS: see if the idea is sensible; in any case we need CML, LSID, and molecular visualization
     • Application Services: need a wizard; support “filters” (Wild) and loosely coupled simulations (Baik)
     • Data: link to PubChem and Bioinformatics; link to the Baik database
     • Discovery: extended UDDI
     • Security: review any special requirements and the status of PubChem, caBIG, myGrid, etc.
     • Collaboration, Management, Messaging, Policy: nothing special needed
