
Storage, Data Movement, Grid, Network subgroup

This report outlines the workshop's key points on dynamic data storage, robust data movers, and dataflow automation in large-scale environments. It covers placement strategies, quality of service, access controls, and future data management challenges including costs, gaps, and priorities.


Presentation Transcript


  1. Storage, Data Movement, Grid, Network subgroup • DOE Data Management Workshop, Day 1, 5-22-04

  2. Plan • Outline of the report • Additions, refinements, and corrections • Costs and gaps • Classification into development, hardening, deployment

  3. Storage, Data Movement, Grid, Network (initial) • Dynamic data storage and caching • Robust terabyte-scale data movers • Dataflow automation between components • Multi-resolution data movement

  4. The whole system environment in the large (even across a WAN) • Placement depending on • Separate (more abstract, higher level) • Mechanism (apropos to layering) • Policy (apropos to layering) • Includes “Storage Management” • Placement QoS and QoS derived from policy • Management of replicas • Access • Robust and performant at large volumes of data (x500-x1000 in 5-10 yrs). • N.B.: faster than the evolution of disk speed • Dynamic data storage and caching • Includes pre-staging. • Support for sending the function and/or query to the data. • Access QoS • Security: authorization, authentication, and access control. • Dataflow automation between components • E.g., apropos to workflows and systematic integration.

  5. Specialized specific needs • Multi-resolution Data movement • Fine-grained object access and latencies

  6. Gaps, costs, priorities • Cost • $: >100,000 • $$: >1,000,000 • $$$: >10,000,000 • Priority: low, med, high • High: barrier to science • Med: substantial cost or waste • Low: annoying • Type of work • RD, HP, DS
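The rating legend on this slide can be written down as a small data structure. The following is a minimal sketch, not anything the workshop produced: the names `Rating`, `parse_rating`, and `rank` are hypothetical, the priority code `L` is assumed by analogy with the `H` and `M` codes used on later slides, and the work-type abbreviations RD, HP, DS are kept unexpanded because the slides do not spell them out.

```python
from dataclasses import dataclass

# Lower bound in dollars for each cost tier, as listed on the slide.
COST_FLOOR = {"$": 100_000, "$$": 1_000_000, "$$$": 10_000_000}

# Priority codes; "L" is an assumed code, the slides only show H and M in use.
PRIORITY = {
    "H": "barrier to science",
    "M": "substantial cost or waste",
    "L": "annoying",
}

# Work-type abbreviations exactly as they appear in the slides.
WORK_TYPES = ("RD", "HP", "DS")


@dataclass(frozen=True)
class Rating:
    cost: str        # "$", "$$", or "$$$"
    priority: str    # "H", "M", or "L"
    work: tuple      # subset of WORK_TYPES

    @property
    def cost_floor(self) -> int:
        """Minimum dollar cost implied by the tier."""
        return COST_FLOOR[self.cost]


def parse_rating(cost: str, priority: str, work: str) -> Rating:
    """Build a Rating from tag strings such as ("$$$", "H", "RD,HP,DS")."""
    types = tuple(work.replace(",", " ").split())
    if cost not in COST_FLOOR or priority not in PRIORITY:
        raise ValueError(f"unknown cost or priority: {cost!r}, {priority!r}")
    unknown = [t for t in types if t not in WORK_TYPES]
    if unknown:
        raise ValueError(f"unknown work type(s): {unknown}")
    return Rating(cost, priority, types)


def rank(gaps):
    """Sort (name, Rating) pairs: highest priority first, then highest cost."""
    order = {"H": 0, "M": 1, "L": 2}
    return sorted(gaps, key=lambda g: (order[g[1].priority], -g[1].cost_floor))
```

For example, `rank([("Security", parse_rating("$", "M", "RD DS")), ("Access", parse_rating("$$$", "H", "RD HP DS"))])` puts the access item first, since it is rated high priority.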

  7. Gaps, Costs, Priorities • Placement depending on… ($$$; H; RD, HP, DS) • Storage management (storage space availability, quality, etc.) • Permanence at the archival scale • Investigation of how to do this apropos to scientific storage systems • Analogs to industry: information life cycle management • Appropriate mix of exposed interfaces and hints, with a preference for standard interfaces (as opposed to parochial, per-system interfaces) • Automatic and manual configurations need to be investigated • Including hints about future accesses.

  8. Physical Considerations • How to deal with the increase of capacity per device ($$; H; RD?, DS?) • No aspects of performance expand with Moore’s law • Possibly mitigated by placement strategy: • Mixed “temperature” on the same spindle

  9. Gaps, Costs, and Priorities • Management of replicas ($$; H; RD, HP, DS) • Movement of files • Movement of namespaces • Less-than-whole-file level replication • Consistency of replicated files • Write once (immutable file) is an important use case. • Investigation of utility of mutable files • Trade-off of version management vs. mutable files
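One reason the write-once case is attractive: if a file never changes after it is written, checking replica consistency reduces to comparing content digests against a reference copy. A minimal sketch of that idea follows, assuming replicas are small enough to hold in memory as bytes; the function names and the site labels are hypothetical, and SHA-256 is an arbitrary choice of digest rather than anything the slides specify.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content digest of one copy of an immutable file."""
    return hashlib.sha256(data).hexdigest()


def inconsistent_replicas(reference: bytes, replicas: dict) -> list:
    """Return the sites whose copy differs from the reference copy.

    `replicas` maps a site name (hypothetical label) to that site's bytes.
    """
    want = digest(reference)
    return sorted(site for site, data in replicas.items() if digest(data) != want)
```

Mutable files do not allow this shortcut, because a differing digest may be a legitimate newer version rather than corruption, which is where the version-management trade-off on the slide comes in.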

  10. Gaps, Costs, and Priorities • Access (movement) ($$$; H; RD, HP, DS) • Access requirements are increasing faster than the evolution of disk speed. • Exploitation of IP and non-IP based networking • Access contention on physically large volumes • Latency vs. small-grained access • Investigate supporting sending the function and/or query to the data. • Investigate supporting virtual data techniques • Investigation of choice of copies and choice of path • Investigation of where to put compression in system architectures.
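The "choice of copies" item and the "latency vs. small-grained access" item interact, which a toy cost model makes visible. The sketch below assumes transfer time is simply latency plus size divided by bandwidth; the `Replica` fields, the function names, and the numbers in the usage note are all hypothetical illustrations, not measurements or a proposed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str                     # hypothetical site label
    latency_s: float              # per-request latency in seconds
    bandwidth_bytes_per_s: float  # sustained throughput to the client


def transfer_time(replica: Replica, size_bytes: float) -> float:
    """Estimated time to fetch size_bytes: fixed latency plus streaming time."""
    return replica.latency_s + size_bytes / replica.bandwidth_bytes_per_s


def choose_replica(replicas, size_bytes: float) -> Replica:
    """Pick the copy with the lowest estimated transfer time for this request."""
    return min(replicas, key=lambda r: transfer_time(r, size_bytes))
```

With a nearby slow copy (1 ms latency, 10 MB/s) and a distant fast copy (100 ms latency, 1 GB/s), a 1 KB object is best served from the nearby copy, while a 1 GB file is best served from the distant one. Under this model, small-grained access is dominated by latency and bulk movement by bandwidth, so the best copy depends on the request size.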

  11. Gaps, costs… • Security: authorization, authentication, and access control ($; M; RD, DS) • Investigate expression of access control • And how it moves with the data. • Dataflow automation between components ($$; H; RD, HP, DS) • API for wide area distributed computing, exposing as apropos many items mentioned • Scheduling, access optimization analogous to query optimization.

  12. Gaps, Costs, Priorities • Multi-resolution data movement • Restricted to framework and not solving specific problems ($, ?, ?)? • Important use case for Office of Science • Investigate if a special case of moving functions to the data (appropriate framework)

  13. Grid
