
A Software Architecture for Highly Data-Intensive Systems

Chris A. Mattmann (mattmann@usc.edu), USC Center for Software Engineering Annual Research Review, March 2004. Special thanks to Dan Crichton, Steve Hughes, and Sean Kelly for some of the slides!


Presentation Transcript


  1. A Software Architecture for Highly Data-Intensive Systems Chris A. Mattmann mattmann@usc.edu USC Center for Software Engineering Annual Research Review March 2004 Special thanks to Dan Crichton, Steve Hughes, and Sean Kelly for some of the slides!

  2. Overview • Motivation • Problem Statement • OODT: A Software Architecture and Middleware for Data-Intensive Systems • Evaluation: Science Problems • Planetary Science • Cancer Research • Conclusion

  3. Motivation

  4. Problem Statement • Information Integration in Data-Intensive Systems • Needed to support data access, distribution, processing and retrieval across existing heterogeneous data sources • NASA’s Planetary Data System • NCI’s Early Detection Research Network • Software and Techniques exist to perform Information Integration • But….. • No Software Re-use • No Design Methods to start from • No mapping of integration techniques to software components, interaction mechanisms, or arrangements of components • Lack of Re-use and software standards for information integration in data-intensive systems has forced systems to be “built from scratch” • Little or no interoperability with other software systems • Programmer almost always “in the loop” • New GDS proposal accompanies most new NASA mission proposals

  5. Our Approach • A Software Architecture for Data-Intensive Systems • Data Architecture • Data Dictionary • Resource Profiles • Software Architecture • Components: Product Servers, Profile Servers, Query Servers • Connector: Messaging Layer • Configurations of Product/Profile/Query Servers • ..and a middleware implementation based on the software architecture • Middleware leverages existing distributed object middleware frameworks such as CORBA, RMI • We’re currently working on a SOAP version • Built and maintained at the Jet Propulsion Laboratory • Yes, the Mars folks • Architecture+middleware = OODT (Object Oriented Data Technology) • Middleware being developed at JPL • Architecture being formalized at USC-CSE

  6. Data Dictionary • Common Data Model containing • Data Elements which the user is interested in querying for • Data Elements which the user would like to retrieve • Challenge: • Integrate data sources linked in by exploiting the Data Dictionary structure • Map common data model to data source models across data-intensive system • Use a common data element structure • ISO-11179 Specification and Standardization of Data Elements • Handles the integration of data models across the system, but still need to integrate software interfaces
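The mapping from a common data model to per-source data models can be illustrated with a small sketch. It is not OODT code: the source names (`imaging_node`, `geosciences_node`), the local field names, and the `translate_query` helper are all hypothetical, and the `DataElement` record only loosely follows the spirit of ISO-11179.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One entry of the common data dictionary (hypothetical structure)."""
    name: str        # common name that users query with
    definition: str  # human-readable meaning
    datatype: type   # expected Python type of values


# Hypothetical common data dictionary shared by the whole system.
DATA_DICTIONARY = {
    "TARGET_NAME": DataElement("TARGET_NAME", "Body that was observed", str),
    "MISSION_NAME": DataElement("MISSION_NAME", "Mission that produced the data", str),
}

# Hypothetical per-source mappings: common element name -> local field name.
SOURCE_MAPPINGS = {
    "imaging_node": {"TARGET_NAME": "tgt", "MISSION_NAME": "msn"},
    "geosciences_node": {"TARGET_NAME": "target_body", "MISSION_NAME": "mission"},
}


def translate_query(source, common_query):
    """Rewrite a {common element: value} query into one source's field names."""
    mapping = SOURCE_MAPPINGS[source]
    local_query = {}
    for element, value in common_query.items():
        spec = DATA_DICTIONARY[element]  # KeyError: element is not in the dictionary
        if not isinstance(value, spec.datatype):
            raise TypeError(f"{element} expects {spec.datatype.__name__}")
        local_query[mapping[element]] = value
    return local_query


print(translate_query("imaging_node", {"TARGET_NAME": "MARS"}))
```

The user only ever sees the common element names; each data source keeps its own schema, and the dictionary plus a per-source mapping bridges the two.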

  7. Resource Profiles • Provide mechanisms for describing data systems, data products, etc., including • Common data attributes using Dublin Core data elements (e.g. Title, Author, Subject) to describe electronic resources • Mechanisms for describing where the data is located and how to access it • Domain data elements that are useful for describing the product (e.g. TARGET_NAME, MISSION_NAME, INSTRUMENT_NAME, etc.) • Enable “search and retrieval” of distributed data products • Searches against a Profile Server yield information regarding the characteristics of distributed resources (e.g. descriptive information about the product, access information, etc.)

  8. Resource Profiles Example • Query: “country = US and windspeed > 120”

     Western profile (Matches! Its windspeed range of 3 to 146 can satisfy windspeed > 120):

       <profile>…
         <resAttributes>…
           <resLocation>urn:eda:rmi:Western…
         <profileElement>
           <elemName>country</elemName>…
           <elemValue>US</elemValue>…
         <profileElement>
           <elemName>state</elemName>…
           <elemValue>WA</elemValue>
           <elemValue>CA</elemValue>…
         <profileElement>
           <elemName>windspeed</elemName>…
           <elemMinValue>3</elemMinValue>
           <elemMaxValue>146</elemMaxValue>…

     Southern profile (no match: its maximum windspeed of 89 cannot satisfy windspeed > 120):

       <profile>…
         <resAttributes>…
           <resLocation>urn:eda:rmi:Southern…
         <profileElement>
           <elemName>country</elemName>…
           <elemValue>US</elemValue>…
         <profileElement>
           <elemName>state</elemName>…
           <elemValue>LA</elemValue>
           <elemValue>TX</elemValue>…
         <profileElement>
           <elemName>windspeed</elemName>…
           <elemMinValue>1</elemMinValue>
           <elemMaxValue>89</elemMaxValue>…
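The matching step in this example can be sketched as follows. This is an illustrative reading of the slide, not the OODT implementation: the class names, the `may_satisfy` method, and the tuple form of the query are assumptions. The idea is that a profile matches when every predicate could be satisfied by the values or value range the profile advertises.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProfileElement:
    """One <profileElement>: enumerated values or a min/max range."""
    name: str
    values: frozenset = frozenset()      # from <elemValue> entries
    min_value: Optional[float] = None    # from <elemMinValue>
    max_value: Optional[float] = None    # from <elemMaxValue>

    def may_satisfy(self, op, operand):
        """True if the described resource could hold a value satisfying the predicate."""
        has_range = self.min_value is not None and self.max_value is not None
        if op == "=":
            if self.values:
                return operand in self.values
            return has_range and self.min_value <= operand <= self.max_value
        if op == ">":
            return has_range and self.max_value > operand
        if op == "<":
            return has_range and self.min_value < operand
        raise ValueError(f"unsupported operator: {op}")


@dataclass
class Profile:
    label: str      # stands in for the (truncated) resLocation URN on the slide
    elements: dict  # element name -> ProfileElement


def matches(profile, predicates):
    """All predicates are joined by AND, as in the slide's query."""
    for name, op, operand in predicates:
        element = profile.elements.get(name)
        if element is None or not element.may_satisfy(op, operand):
            return False
    return True


def make_profile(label, states, wind_min, wind_max):
    elements = [
        ProfileElement("country", values=frozenset({"US"})),
        ProfileElement("state", values=frozenset(states)),
        ProfileElement("windspeed", min_value=wind_min, max_value=wind_max),
    ]
    return Profile(label, {e.name: e for e in elements})


PROFILES = [
    make_profile("Western", {"WA", "CA"}, 3, 146),
    make_profile("Southern", {"LA", "TX"}, 1, 89),
]
QUERY = [("country", "=", "US"), ("windspeed", ">", 120)]

matching = [p.label for p in PROFILES if matches(p, QUERY)]
print(matching)  # ['Western']
```

A match here only says the resource may hold qualifying data; the actual records still have to be fetched from the data source behind the profile.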

  9. Components • Product Server • Responsible for abstracting heterogeneous data source interfaces • Attach a Product Server to each data source that is integrated • Provides a common query interface across heterogeneous data sources • Profile Server • Describe data resources using resource profiles • Allow data resources to be discovered and located at query-time • Query Server • Tie it all together • Uses Profile Servers to discover data resources which could potentially satisfy a query • Queries discovered data resources (such as Product Servers) and collects obtained data products to return to the user
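How the three component types cooperate can be sketched in a few lines. This is a minimal in-process sketch under stated assumptions, not the OODT API: all class and method names are hypothetical, queries are reduced to equality criteria, and the two "heterogeneous" sources are simply a list of dictionaries and a CSV string.

```python
import csv
import io


class ProductServer:
    """Common query interface placed in front of one data source."""

    def query(self, criteria):
        raise NotImplementedError


class DictProductServer(ProductServer):
    """Wraps a source that stores records as Python dictionaries."""

    def __init__(self, records):
        self._records = records

    def query(self, criteria):
        return [r for r in self._records
                if all(r.get(k) == v for k, v in criteria.items())]


class CsvProductServer(ProductServer):
    """Wraps a source that stores records as CSV text."""

    def __init__(self, csv_text):
        self._csv_text = csv_text

    def query(self, criteria):
        rows = csv.DictReader(io.StringIO(self._csv_text))
        return [dict(row) for row in rows
                if all(row.get(k) == v for k, v in criteria.items())]


class ProfileServer:
    """Holds descriptions of resources so they can be discovered at query time."""

    def __init__(self):
        self._profiles = []  # (advertised values per element, product server)

    def register(self, described, server):
        self._profiles.append((described, server))

    def discover(self, criteria):
        return [server for described, server in self._profiles
                if all(v in described.get(k, ()) for k, v in criteria.items())]


class QueryServer:
    """Discovers candidate resources, queries them, and collects the results."""

    def __init__(self, profile_servers):
        self._profile_servers = profile_servers

    def query(self, criteria):
        results = []
        for profile_server in self._profile_servers:
            for product_server in profile_server.discover(criteria):
                results.extend(product_server.query(criteria))
        return results


west = DictProductServer([{"state": "WA", "station": "A1"},
                          {"state": "CA", "station": "B2"}])
south = CsvProductServer("state,station\nLA,C3\nTX,D4\n")

profiles = ProfileServer()
profiles.register({"state": {"WA", "CA"}}, west)
profiles.register({"state": {"LA", "TX"}}, south)

query_server = QueryServer([profiles])
print(query_server.query({"state": "TX"}))
```

The caller never learns which source held the answer or what format it used; only the Product Server attached to each source knows that.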

  10. Connectors • Messaging Layer • Each OODT component registers itself with a Component Registry • Allows Components to define and provide services • Components defined by unique URNs • Transfers OODT Query Object containing • OODT Style Query • (Keyword = Value) predicates joined by logical operators (AND, OR, etc) • The result list to be populated
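The query object and registry described above might look roughly like this. It is a sketch, not the OODT messaging layer: the real connector runs over distributed object middleware such as CORBA or RMI, whereas here delivery is a plain function call, and the class names and the URN `urn:example:weather` are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Predicate:
    """A single (keyword = value) term."""
    keyword: str
    value: str

    def evaluate(self, record):
        return record.get(self.keyword) == self.value


@dataclass
class And:
    left: object
    right: object

    def evaluate(self, record):
        return self.left.evaluate(record) and self.right.evaluate(record)


@dataclass
class Or:
    left: object
    right: object

    def evaluate(self, record):
        return self.left.evaluate(record) or self.right.evaluate(record)


@dataclass
class Query:
    """Carries the query expression plus the result list to be populated."""
    expression: object
    results: list = field(default_factory=list)


class ComponentRegistry:
    """Maps unique URNs to components (here: callables that accept a Query)."""

    def __init__(self):
        self._components = {}

    def register(self, urn, component):
        if urn in self._components:
            raise ValueError(f"URN already registered: {urn}")
        self._components[urn] = component

    def send(self, urn, query):
        """Deliver the query object to the component registered under urn."""
        return self._components[urn](query)


RECORDS = [
    {"country": "US", "state": "WA"},
    {"country": "US", "state": "TX"},
    {"country": "CA", "state": "BC"},
]


def weather_product_server(query):
    """Hypothetical component: fills in the query's result list."""
    query.results.extend(r for r in RECORDS if query.expression.evaluate(r))
    return query


registry = ComponentRegistry()
registry.register("urn:example:weather", weather_product_server)

expression = And(Predicate("country", "US"),
                 Or(Predicate("state", "WA"), Predicate("state", "CA")))
answered = registry.send("urn:example:weather", Query(expression))
print(answered.results)  # [{'country': 'US', 'state': 'WA'}]
```

Passing one object that holds both the question and the accumulating answer lets several components append to the same result list as the query travels through a configuration.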

  11. Configurations: Example

  12. Configurations: Example (2)

  13. Configurations: Example (3)

  14. Planetary Science • Planetary Data System • Official NASA “Active” Archive for all Planetary Data • Data ingestion required as part of Announcement of Opportunity (AO) for a mission • 9 Nodes with data located at discipline sites • Common Data Architecture • Different data systems located at the sites • Prior to October 2002, no ability to find and share data between PDS nodes • Data distribution via CD ROM • Limited electronic distribution

  15. OODT PDS Deployment

  16. Early Detection Research Network • OODT’s success has led to interagency agreements with both NIH and NCI • OODT has provided the NCI with a bioinformatics infrastructure for sharing data across the nation • Currently deployed at 10 of 31 NCI Research Institutions for the Early Detection Research Network (EDRN) • Providing real-time access to distributed, heterogeneous databases • Created a national virtual repository for biospecimens (now an NCI Director Initiative) • Now integrating new datasets: validation studies, images, biomarkers, etc. • Meets Federal security regulations • Operational September 2002 • Same core software framework as deployed in planetary, earth and engineering

  17. OODT EDRN Deployment

  18. Conclusion • OODT is… • A novel software architecture to describe data-intensive systems • Integration, search, retrieval and discovery of heterogeneous data stored in heterogeneous domain data sources • A reference implementation of the above software architecture • Java-based middleware • C++, Perl, Python, PHP Client APIs • A process for annotating and creating standard metadata models to describe heterogeneous data based on data standards • Dublin Core • ISO-11179

  19. Referred Papers • Mattmann C, Ramirez P, Crichton D, Hughes J.S. Packaging Data Products using Data Grid Middleware for Deep Space Mission Systems. Accepted for publication at the 8th International Conference on Space Operations, Montreal, Canada, 2004. • Mattmann C, Freeborn D, Crichton D. Towards a Distributed Information Architecture for Avionics Data. In Proceedings of the 2nd International IADIS Conference on the World-Wide-Web and Internet, Volume II, pp 829-832. Algarve, Portugal, 2003. • Crichton D, Hughes J.S, Kelly S. A Science Data System Architecture for Information Retrieval. Clustering and Information Retrieval. Kluwer Academic Publishers. December 2003. (Book chapter on OODT) • Crichton D, Hughes J.S, Kelly S, Ramirez P. A Component Framework Supporting Peer Services for Space Data Management. 2002 IEEE Aerospace Conference. Big Sky, Montana. March 2002. • Crichton D, Downing G, Hughes J.S, Kincaid H, Srivastava S. An Interoperable Data Architecture for Data Exchange in a Biomedical Research Network. 14th IEEE Symposium on Computer-Based Medical Systems. July 2001. • Crichton D, Hughes J.S, Hardman S, Kelly S. A Distributed Component Framework for Data Product Interoperability. 17th CODATA International Conference, Baveno, Italy. October 2000. • Crichton D, Hughes J.S, Kelly S, Hyon J. Science Search and Retrieval using XML. Second National Conference on Scientific and Technical Data, Washington D.C., National Academy of Sciences. March 2000.

  20. Questions? • Contacts • OODT Website: http://oodt.jpl.nasa.gov • Principal Investigator • Dan Crichton (Dan.Crichton@jpl.nasa.gov) • Co-Investigator • Steve Hughes (Steve.Hughes@jpl.nasa.gov) • Programmer/Research Grunt • Me (chris.mattmann@jpl.nasa.gov) • Thanks for your attention!

  21. Backup Slides


  23. Object Oriented Data Technology • Object-Oriented Data Technology (OODT) • Funded in 1998 by NASA’s Office of Space Science to develop a national software framework for sharing data across heterogeneous, distributed data repositories • Develop… • A common data and software framework to enable data sharing across multiple science and engineering disciplines • A reusable software architecture across data management projects • Reusable software components with common interfaces • Interfaces to enable new components to be plugged in • Mechanism to wrap legacy data system components with minimal impact • OODT should provide… • Science domain independence (use in engineering, science and biomedicine) • Data location independence (describe what you want, not how/where to get it)
