Or A talk on Data Grids and DGL. A Data Storage Language for the Requirements of Rebels and Misfits. Arun Jagatheesan San Diego Supercomputer Center University of California, San Diego. HPTS Workshop Asilomar, California, 25-28 September 2005.
He has 44 slides and 20 minutes. No infotainment slides either – Boring! Talk Outline • “Next Hype in Grids” • My belief system before we begin • Meet my friends – Rebels and Misfits • File Systems, Databases, Datagrids • Mapping physical data to logical view • Mapping physical data and storage to logical view • SRB Statistics • Mapping physical data, storage and processes to logical view • Data Grid Language • Conclusion • What Now = work and sacrifices; What Next = Vision
Disclaimer and Warning • My own opinion or thoughts • Arun says so… (can be wrong?) • Based on my current knowledge and understanding • As of September 2005 – current knowledge and level of understanding (can change?) • My belief system • I believe in Data Grids for Inter/Intra/Multi-Organizational Unstructured Data Management (biased?) • My belief might not be in sync with your belief, but it can co-exist with your favorite technology
Meet my friends – Rebels and Misfits • Esoteric Requirements from “High-end” users • To keep them alive, they need more… more of everything • Requirements not broadly felt or required in industry • They push the existing technology to the limits • From the existing technology’s perspective… • These folks are nuts! • The existing technology was not designed for these requirements • My friends become rebels or misfits from the existing technology’s perspective
Mapping physical data to logical view • Hierarchical view, independent of network, disk, sector, track, fragments • Rule: Storage Abstraction – Hide storage resources
Mapping physical data to logical view • Relational view (assume it's a database), independent of network, disk, sector, track, fragments • Thanks to rebels and misfits in the airline industry who wanted transactional capabilities
NIH BIRN SRB Data Grid • Biomedical Informatics Research Network • Access and analyze biomedical image data • Data resources distributed throughout the country • Medical schools and research centers across the US • Stable high performance grid based environment • Coordinate data sharing • Federate collections • Support data mining and analysis
Mapping distributed data & storage to logical view 25 Universities or Research Hospitals, Multiple heterogeneous storage resources
Approach we have taken in Data Grids • Logical Schema (view) is independent of physical schema • Just like databases or even file systems • Physical Resources are provided in the form of logical resources in the logical view • This is very different from databases (maybe similar to tablespaces) • A database is used for mapping • Data path, network, access permissions, metadata, storage type, logical storage resource, physical storage resources • Used for digital libraries, persistent archives and data grids
Data Grid Resource Providers • Grid Resource Providers (GRP) providing content and/or storage • [Diagram: a GRP holding /txt3.txt]
Data Grid Administrative Domain • Administrative domain with one or more Grid Resource Providers • Could include their data centers • [Diagram: a Research Lab domain with a GRP holding /txt3.txt]
Data Grid Administrative domains • University: data + storage (10) • Storage-R-Us Resource Providers: data + storage (50) • Research lab: data + storage (40) • [Diagram: several GRPs across the three domains, holding /txt3.txt, /…/text1.txt, /…//text2.txt]
Data Grid: Logical view of data & resources • Logical Namespace (need not be the same as the physical view of resources): data + storage (100) • /home/arun.sdsc/exp1 • /home/arun.sdsc/exp1/text1.txt • /home/arun.sdsc/exp1/text2.txt • /home/arun.sdsc/exp1/text3.txt • Underlying domains: University, data + storage (10); Storage-R-Us Resource Providers, data + storage (50); Research Lab, data + storage (40) • [Diagram: several GRPs across the three domains, holding /txt3.txt, /…/text1.txt, /…//text2.txt]
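The logical namespace idea can be sketched in a few lines of Python. This is only an illustration, not how SRB implements its catalog: the `Replica` and `Catalog` classes, the resource names and every physical path below are hypothetical (the physical paths on the slides are elided and are not reconstructed here). Only the logical paths under /home/arun.sdsc/exp1 come from the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Replica:
    """One physical copy of a file. All values used below are made up."""
    domain: str         # administrative domain that owns the storage
    resource: str       # hypothetical logical resource name
    physical_path: str  # hypothetical location inside that resource


@dataclass
class Catalog:
    """Toy stand-in for the mapping database behind the logical view."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_path: str, replica: Replica) -> None:
        # A logical name may be backed by any number of physical replicas.
        self.entries.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path: str) -> list:
        """Return every physical replica behind one logical name."""
        return list(self.entries.get(logical_path, []))

    def list_collection(self, collection: str) -> list:
        """List logical names in a collection, wherever the bytes live."""
        prefix = collection.rstrip("/") + "/"
        return sorted(p for p in self.entries if p.startswith(prefix))


catalog = Catalog()
catalog.register("/home/arun.sdsc/exp1/text1.txt",
                 Replica("University", "univ-disk", "/vault/a/0001"))
catalog.register("/home/arun.sdsc/exp1/text2.txt",
                 Replica("Storage-R-Us", "sru-tape", "/pool7/obj/0002"))
catalog.register("/home/arun.sdsc/exp1/text2.txt",
                 Replica("Research Lab", "lab-disk", "/data/x/0002"))
catalog.register("/home/arun.sdsc/exp1/text3.txt",
                 Replica("Research Lab", "lab-disk", "/data/x/0003"))

for path in catalog.list_collection("/home/arun.sdsc/exp1"):
    print(path, "->", [r.domain for r in catalog.resolve(path)])
```

The user only ever sees the three names under one collection; which domain, resource or path holds each copy is a lookup in the catalog.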
SDSC SRB User Community (Major US) • National Science Digital Library (NSDL) • National Optical Astronomy Observatory (NOAO) • ROADNet • Purdue University • SCCOOS, USA • Scientific Rich Media Archive • Salk Institute • Strand Map Service, USA • UC Berkeley Library • UCSD Library • University of Houston • Persistent Archives Test bed • University of Wisconsin, Madison • WebBase, Stanford University • Yale University Library • BaBar, Stanford Linear Accelerator Center (SLAC) • California Digital Library (CDL) • Center for Integrated Space Weather Modeling (CISM) • CVC, Visualization Portal • LDC Data Storage • NIH Bio Informatics Research Network (BIRN) • NSF Southern California Earthquake Center (SCEC) • National Archives and Records Administration (NARA) • National Aeronautics and Space Administration Centers (NASA) • National Virtual Observatory (NVO) • Npackage, NSF Middleware Initiative (NMI)
SDSC SRB User Community • Academia Sinica, Taiwan • Australian National University • Bio-Lab, University of Genoa, Italy • Council for the Central Laboratory of the Research Councils (CCLRC), UK • CC-IN2P3, France • Distributed Framework, Singapore • Distributed Aircraft Maintenance Environment (DAME), UK • eMinerals Project, UK • eScience, Belfast Center • Fraunhofer ITWM, Germany • High Energy Accelerator Organization, KEK, Japan • K* Grid Computing, Korea • KEK Computing Center, Japan • Lyon, France • NorGrid, Norway • Nanyang Data Grid, Singapore • NCHC, Taiwan • Queensland University of Technology (QUT), Australia • Rutherford Appleton Laboratory (RAL), UK • T-Systems, Germany • UK eScience Project, UK • UniGrid, Poland • UMK, Poland • Virtual Laboratory for eScience, Netherlands
Total data brokered by SDSC SRB 358 TB 324 TB 682 TB
Mapping distributed data, storage and processes to logical view
Long-run Processes in Data Grid • Data Grid ILM • Data Grid Triggers • Data Gridflows
Data Grid (Enterprise Utility) Physical Resources managed by autonomous administrative domains of the same enterprise (ABCZ.com) 3rd Party IT Department US IT Department Asia ABCZ.com US Data center ABCZ.com Asia
Data Grid (Enterprise Utility) Each project has a data grid instance consisting of Logical Resources with different SLAs offered by IT department Project 1 Project 2 3rd Party IT Department US IT Department Asia ABCZ.com US Data center ABCZ.com Asia
Change is Constant • Changes in access patterns • Based on the number of users accessing a data set • Domains which want to access data • Data Value • The value of a data set (collections?) for a particular domain, based on its business model and users’ access patterns • Each domain will have a different value based on its users and its role in a data grid
“Data Value” based on users When more users access a project’s data, its data value increases, so move that data to a faster storage type Project1 Project2 Project3 Project4 3rd Party IT Department US IT Department Asia ABCZ.com US Data center ABCZ.com Asia
“Data Value” based on domain When more users from the same domain access the data, the data value for that particular data in that particular domain increases, so replicate the data to resources in that domain. (converse is also true) Project1 Project2 Project3 Project4 3rd Party IT Department US IT Department Asia ABCZ.com US Data center ABCZ.com Asia
“Data Value” based on role The 3rd party data center has no users who use the data, but is interested in holding a replica of any data (or deleted data) for long-term preservation Project1 Project2 Project3 Project4 3rd Party IT Department US IT Department Asia ABCZ.com US Data center ABCZ.com Asia
Data Grid ILM • ILM = Information Lifecycle Management (Sales Jargon) • Dynamic re-orientation of data placement and data retention policies (rules) • Based on “business value of data” and storage cost • HSM = Hierarchical Storage Management, based on “data freshness”. ILM goes one step further • Applying this concept to a Data Grid is very tricky, as different autonomous domains have different business rules
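The three "data value" rules from the preceding slides (by users, by domain, by role) can be written down as a small placement policy. The sketch below is an assumption-laden illustration, not the talk's ILM implementation: the function name `plan_placement`, the two thresholds and the action strings are all invented for the example.

```python
from collections import Counter

# Assumed thresholds; the talk gives no numbers.
FAST_TIER_THRESHOLD = 10  # distinct users before data is promoted
REPLICATE_THRESHOLD = 5   # users from one remote domain before replication


def plan_placement(accesses, home_domain, archive_domain=None):
    """Turn observed accesses for one data set into placement actions.

    accesses: iterable of (user, domain) pairs.
    Returns a list of human-readable action strings.
    """
    accesses = set(accesses)  # count each (user, domain) pair once
    actions = []

    # Value based on users: many distinct users, so use faster storage.
    users = {user for user, _ in accesses}
    if len(users) >= FAST_TIER_THRESHOLD:
        actions.append("move to faster storage")

    # Value based on domain: replicate toward domains with many users.
    per_domain = Counter(domain for _, domain in accesses)
    for domain, count in sorted(per_domain.items()):
        if domain != home_domain and count >= REPLICATE_THRESHOLD:
            actions.append(f"replicate to {domain}")

    # Value based on role: a preservation site wants a copy regardless.
    if archive_domain is not None:
        actions.append(f"keep preservation replica in {archive_domain}")
    return actions


asia_heavy = [(f"user{i}", "Asia") for i in range(6)] + [("user_us", "US")]
print(plan_placement(asia_heavy, home_domain="US", archive_domain="3rd Party"))
```

Each autonomous domain would plug in its own thresholds and rules, which is exactly why applying ILM across a data grid is harder than within one organization.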
Data Grid Triggers • Similar to triggers in databases • Based on ECA concepts • Event • Condition • Action • Example • Event = Insert new file in collection (“/ourProject/data”) • Condition = (color = “blue” && galaxy = “Andromeda”) • Action = Run (selectiveDataReplicator.dgl)
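A minimal sketch of the Event-Condition-Action idea, using the slide's example. The `Trigger` class and `fire` function are hypothetical names for illustration; a real data grid would evaluate triggers inside the grid and actually run the DGL document, whereas this sketch only returns the names of the actions that would run.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trigger:
    event: str                         # e.g. "insert"
    collection: str                    # collection the trigger watches
    condition: Callable[[dict], bool]  # predicate over the file's metadata
    action: str                        # name of the DGL document to run


def fire(triggers, event, logical_path, metadata):
    """Return the actions whose event, collection and condition all match."""
    to_run = []
    for t in triggers:
        in_collection = logical_path.startswith(t.collection.rstrip("/") + "/")
        if t.event == event and in_collection and t.condition(metadata):
            to_run.append(t.action)
    return to_run


replicate_blue_andromeda = Trigger(
    event="insert",
    collection="/ourProject/data",
    condition=lambda m: m.get("color") == "blue" and m.get("galaxy") == "Andromeda",
    action="selectiveDataReplicator.dgl",
)

# The file name below is made up for the example.
print(fire([replicate_blue_andromeda], "insert",
           "/ourProject/data/image1.fits",
           {"color": "blue", "galaxy": "Andromeda"}))
```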
Data Grid Language • Requirement • Data Grid ILM process • The long-run process that has to be run is described in DGL • Data Grid Triggers • Action part of the ECA (Event-Condition-Action) logic • Data Gridflows • Step-by-step execution of a long-run process on the Data Grid • Analogous to SQL in relational databases • Long-run procedures stored and executed in the Data Grid itself • Captures the “Infrastructure Execution Logic”
DGL Request Annotations about the Data Grid Request Can be either a Flow or a Status Query
DGL Requests (2 types) • Data Grid Flow • An XML structure that describes the execution logic, associated procedural rules and DGL variables. Can be a synchronous or asynchronous flow • Status Query • An XML structure used to query the execution status of any gridflow or sub-flow at any level of granularity. Status Queries can be made for both synchronous and asynchronous flows
Flow Scoped Variables that can control the flow Logic used by the sub-members Sub-members that are the real execution statements
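To make the shape of a request concrete, here is a sketch that builds XML with Python's standard `xml.etree.ElementTree`. It mirrors only what the slides state: a request is either a flow or a status query, and a flow has scoped variables, logic, and sub-members. Every element and attribute name (`request`, `flow`, `variables`, `logic`, `members`, `step`, `statusQuery`, `mode`) and the step names are hypothetical; this is not the actual DGL schema.

```python
import xml.etree.ElementTree as ET


def build_flow(name, variables, logic, steps):
    """Build a hypothetical flow element: variables, logic, then sub-members."""
    flow = ET.Element("flow", {"name": name})
    vars_el = ET.SubElement(flow, "variables")  # scoped variables
    for key, value in variables.items():
        ET.SubElement(vars_el, "variable", {"name": key}).text = str(value)
    ET.SubElement(flow, "logic", {"type": logic})  # how sub-members are run
    members = ET.SubElement(flow, "members")       # the execution statements
    for step in steps:
        ET.SubElement(members, "step", {"name": step})
    return flow


def build_request(flow, asynchronous=False):
    """Wrap a flow in a request marked synchronous or asynchronous."""
    mode = "asynchronous" if asynchronous else "synchronous"
    request = ET.Element("request", {"mode": mode})
    request.append(flow)
    return request


def build_status_query(flow_name, step_name=None):
    """Ask for the status of a whole flow or, optionally, one step in it."""
    attrs = {"flow": flow_name}
    if step_name is not None:
        attrs["step"] = step_name
    request = ET.Element("request")
    ET.SubElement(request, "statusQuery", attrs)
    return request


flow = build_flow(
    name="replicateNewData",
    variables={"collection": "/ourProject/data"},
    logic="sequential",
    steps=["findNewFiles", "replicate"],
)
print(ET.tostring(build_request(flow, asynchronous=True), encoding="unicode"))
print(ET.tostring(build_status_query("replicateNewData", "replicate"), encoding="unicode"))
```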
DGL-Response Responses can be synchronous or asynchronous
Conclusion • Data Grids are for real – they manage Inter/Intra/Multi-organizational unstructured data (files, streams, …) • Data Grids extend the database concepts and internally use a database • A language like Data Grid Language mentioned here is necessary for the proliferation and automation of Data Grid Management Systems (DGMS) • Reference: Paper in VLDB Workshop on Data Management in Grids
We are SDSC SRB Arun is here! - Shameless Self promotion Not in picture: Many students
Additional Thanks (Ignorance is bliss) • My Advisor: “You already graduated, and have a job at a research firm. Now why are you writing to MS Research? Whom did you write to?” • Me: “I wrote to two people. The first person works on social communities; we can use service brokering for them. I have not got any response from him. But there is another person who did respond. His last name is the color “Gray” and his web page is very cheesy with music in the background. I guess he does not do much computer science – he works with astronomers.”
Contact Info Arun Jagatheesan arun@sdsc.edu Or srb@sdsc.edu http://www.sdsc.edu/srb/