Networked Storage Technologies. Douglas Thain, University of Wisconsin, thain@cs.wisc.edu. GriPhyN NSF Project Review, 29-30 January 2003, Chicago.
The Problem of Remote I/O (diagram: a job running on remote CPUs reaches its storage across an unreliable Internet) • Survive disconnections. • Hide high latencies. • Hide bursty throughput. • Audit progressive results. • Ensure consistency between job and storage. • Arbitrate between users. • Make it easy.
NeST Turns Raw Storage into a Storage Appliance (diagram: applications, a command-line tool, an admin or owner using a web browser, and DaP reach NeST over HTTP, NFS, Chirp, and GridFTP, either directly, through a user-level adapter, or through the OS kernel; NeST advertises itself with ClassAds to a storage matchmaker) • Storage served by NeST is allocable, auditable, authentic, and accessible.
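To make "allocable" and "ClassAds to a storage matchmaker" concrete, here is a minimal sketch under stated assumptions. It assumes an allocation is a time-limited space reservation (called a `Lot` here) and that an appliance advertises its free space so a matchmaker can pick a server for a request. All names (`Lot`, `StorageAppliance`, `reserve`, `advertise`, `match`) are hypothetical, plain dicts stand in for ClassAds, and none of this is NeST's actual protocol or the ClassAd language.

```python
from dataclasses import dataclass, field


@dataclass
class Lot:
    """Hypothetical space reservation: an owner holds size_bytes until expires_at."""
    owner: str
    size_bytes: int
    expires_at: float


@dataclass
class StorageAppliance:
    """Hypothetical appliance with a fixed capacity carved into reservations."""
    name: str
    capacity_bytes: int
    lots: list = field(default_factory=list)

    def free_bytes(self, now):
        # Expired reservations give their space back before free space is computed.
        self.lots = [lot for lot in self.lots if lot.expires_at > now]
        return self.capacity_bytes - sum(lot.size_bytes for lot in self.lots)

    def reserve(self, owner, size_bytes, duration_s, now):
        """Grant a time-limited reservation, or return None if space is short."""
        if size_bytes > self.free_bytes(now):
            return None
        lot = Lot(owner, size_bytes, now + duration_s)
        self.lots.append(lot)
        return lot

    def advertise(self, now):
        """Describe this appliance as a plain dict standing in for a ClassAd."""
        return {"Name": self.name, "Type": "Storage", "FreeBytes": self.free_bytes(now)}


def match(request_bytes, ads):
    """Return the storage ad with the most free space that fits the request, or None."""
    fits = [ad for ad in ads if ad["Type"] == "Storage" and ad["FreeBytes"] >= request_bytes]
    return max(fits, key=lambda ad: ad["FreeBytes"], default=None)


# Usage: reserve most of nest-a, then ask the matchmaker to place a 40-byte request.
a = StorageAppliance("nest-a", capacity_bytes=100)
b = StorageAppliance("nest-b", capacity_bytes=50)
a.reserve("alice", 80, duration_s=60, now=0)  # nest-a now has 20 bytes free
best = match(40, [a.advertise(now=0), b.advertise(now=0)])  # only nest-b fits
```

Tying free space to expiring reservations means an abandoned allocation eventually returns its space without an administrator stepping in, which is one reasonable reading of what makes shared storage safely allocable.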
DaP Makes Data Transfer a Managed Job (diagram: users submit, query, and remove requests in DaP's data movement queue; DaP handles allocation, activation, and transfer between NeST servers, each fronting its own storage)
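Treating a transfer as a managed job means it can be queued, inspected, cancelled, and retried like a batch job. The sketch below assumes a simplified life cycle (queued, space allocated at the destination, transferred or failed) and injects the allocation and transfer steps as callables so the queue logic stands alone. The class, states, and site paths are hypothetical illustrations, not DaP's real interface.

```python
import itertools
from enum import Enum


class State(Enum):
    QUEUED = "queued"
    ALLOCATED = "allocated"
    TRANSFERRED = "transferred"
    REMOVED = "removed"
    FAILED = "failed"


class DataPlacementQueue:
    """Hypothetical sketch of a managed transfer queue; not DaP's actual interface."""

    def __init__(self, allocate, transfer):
        self._allocate = allocate  # callable(dest, size) -> bool: reserve space at the destination
        self._transfer = transfer  # callable(src, dest) -> bool: move the data
        self._ids = itertools.count(1)
        self._jobs = {}

    def submit(self, src, dest, size):
        """Queue a transfer request and return its job id."""
        job_id = next(self._ids)
        self._jobs[job_id] = {"src": src, "dest": dest, "size": size, "state": State.QUEUED}
        return job_id

    def query(self, job_id):
        """Report the current state of a request."""
        return self._jobs[job_id]["state"]

    def remove(self, job_id):
        """Cancel a request that has not started yet; return whether it was cancelled."""
        job = self._jobs[job_id]
        if job["state"] is State.QUEUED:
            job["state"] = State.REMOVED
            return True
        return False

    def run_once(self):
        """Make one scheduling pass over queued requests."""
        for job in self._jobs.values():
            if job["state"] is not State.QUEUED:
                continue
            if not self._allocate(job["dest"], job["size"]):
                continue  # no space yet: leave it queued and retry on a later pass
            job["state"] = State.ALLOCATED
            ok = self._transfer(job["src"], job["dest"])
            job["state"] = State.TRANSFERRED if ok else State.FAILED


# Usage: the destination only has room for requests of 100 units or less.
q = DataPlacementQueue(allocate=lambda dest, size: size <= 100,
                       transfer=lambda src, dest: True)
small = q.submit("siteA:/data/run1", "siteB:/data/run1", 50)
large = q.submit("siteA:/data/run2", "siteB:/data/run2", 500)
dropped = q.submit("siteA:/data/run3", "siteB:/data/run3", 10)
q.remove(dropped)
q.run_once()
```

After one pass the small request is transferred, the large one stays queued until space appears, and the cancelled one is never attempted: the same submit, query, and remove controls a user expects for compute jobs.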
Building the Grid (diagram: a job on remote CPUs, an adapter, DaP, and several NeST servers with their storage, connected by flows labeled Chirp input, Chirp output, Kangaroo output, Chirp reservation, GridFTP transfer, status and supervision, and policy and control requests)
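The "Kangaroo output" path addresses the first two goals on the remote I/O slide, surviving disconnections and hiding latency. A minimal sketch, assuming Kangaroo-style output means the job's writes return immediately into a spool and a background mover forwards them in order whenever the link is up. The real system spools to disk and may hop through intermediate nodes; this version is in-memory and single-hop, and `WriteBehindSpool` and `FakeLink` are hypothetical names.

```python
from collections import deque


class WriteBehindSpool:
    """Hypothetical write-behind spool: accept writes now, deliver them later."""

    def __init__(self, send):
        self._send = send  # callable(path, offset, data) -> bool; False means the link is down
        self._pending = deque()

    def write(self, path, offset, data):
        # Return to the job immediately; the data waits in the spool.
        self._pending.append((path, offset, data))

    def flush(self):
        # Forward spooled writes in order and stop at the first failure,
        # so later writes never overtake earlier ones. Returns how many were delivered.
        sent = 0
        while self._pending:
            if not self._send(*self._pending[0]):
                break
            self._pending.popleft()
            sent += 1
        return sent

    def pending(self):
        return len(self._pending)


class FakeLink:
    """Stand-in for an unreliable network path to remote storage."""

    def __init__(self):
        self.up = False
        self.received = []

    def send(self, path, offset, data):
        if not self.up:
            return False
        self.received.append((path, offset, data))
        return True


# Usage: the job writes while disconnected, and the data drains once the link returns.
link = FakeLink()
spool = WriteBehindSpool(link.send)
spool.write("out.dat", 0, b"abc")
spool.write("out.dat", 3, b"def")
```

Preserving write order on failure is the important design choice: a retry loop that skipped a failed write would leave remote output inconsistent with what the job believes it wrote.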
Ph.D. Research Enabled by GriPhyN • NeST: Network Storage Technologies (John Bent), http://www.cs.wisc.edu/condor/nest • DaP: Data Placement Manager (Tevfik Kosar), http://www.cs.wisc.edu/condor/dap • Distributed I/O using Grid Services (Douglas Thain), http://www.cs.wisc.edu/~thain • Grid Security (Ian Alderman) • Too many MS and BS students to list!
Future Work: • I/O-CPU Specialization in Workloads: automatically provision a cluster with the correct number of storage and worker nodes. • DaP/DAG Integration: converge the technologies for reliable data scheduling and reliable job scheduling. • Error Management: what happens when something goes wrong? Backup/retry/pause/inform? • Security: an online CA that issues task-specific certificates just in time for the work to be done.
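The error-management question is left open on the slide. One possible policy, sketched here purely as an illustration: classify failures as transient or permanent, retry transient ones with exponential backoff, and inform someone when an error is permanent or retries run out. The exception classes, the function name, and the default limits are all assumptions; `sleep` and `inform` are injectable so the policy can be tested without waiting.

```python
import time


class TransientError(Exception):
    """Hypothetical: an error expected to clear on its own (e.g., a dropped connection)."""


class PermanentError(Exception):
    """Hypothetical: an error that retrying cannot fix (e.g., permission denied)."""


def run_with_policy(task, max_attempts=4, base_delay=1.0, sleep=time.sleep, inform=print):
    """Run task(); retry transient failures with doubling delays, escalate the rest."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except PermanentError as err:
            inform(f"permanent failure: {err}")  # retrying is pointless; tell someone now
            raise
        except TransientError as err:
            if attempt == max_attempts:
                inform(f"giving up after {attempt} attempts: {err}")
                raise
            sleep(delay)  # pause before the next attempt
            delay *= 2
```

For example, a task that fails twice with `TransientError` and then succeeds returns normally after pauses of 1.0 and 2.0 seconds, while a `PermanentError` is reported and re-raised on the first attempt.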
Publications Enabled by GriPhyN • “Architectural Implications of Pipeline and Batch Sharing in Scientific Workloads”, UW-CS-TR 1463, 2003 (also in review). http://www.cs.wisc.edu/condor/doc/profiling-tr.pdf • “The Case for Sparse Files”, UW-CS-TR 1464, 2003. http://www.cs.wisc.edu/~thain/library/sparse.pdf • “Error Management in the Pluggable File System”, UW-CS-TR 1448, 2002. http://www.cs.wisc.edu/condor/doc/pfs-tr.pdf • “Flexibility, Manageability, and Performance in a Grid Storage Appl”, HPDC 2002. http://www.cs.wisc.edu/condor/nest/papers/nest-hpdc02.pdf • “Error Scope on a Computational Grid: Theory and Practice”, HPDC 2002. http://www.cs.wisc.edu/condor/doc/error-scope.pdf • “Exploiting Gray-Box Knowledge of Buffer-Cache Management”, USENIX 2002. http://www.cs.wisc.edu/wind/Publications/dust-usenix02.pdf • “Gathering at the Well: Creating Communities for Grid I/O”, SC 2001. http://www.cs.wisc.edu/condor/doc/community-sc2001.pdf • “The Kangaroo Approach to Data Movement on the Grid”, HPDC 2001. http://www.cs.wisc.edu/condor/doc/kangaroo-hpdc10.pdf
The Real Value: “Why don’t you go down to visit Fermi next Wednesday?”