NCAR storage accounting and analysis possibilities
David L. Hart, Pam Gillman, Erich Thanhardt
NCAR CISL
July 22, 2013
dhart@ucar.edu
Why storage accounting?
• Big Data
• Increasing cost of storage with respect to compute
• NSF data management plan mandate
• Tools for users
• Some info is better than no info
• Some process is better than ad hoc fire drills
• Supports allocation processes
Accounting for archive storage
• NCAR has “charged” users for archive use for many years.
• Archive accounting has institutional inertia
• NCAR HPSS details, June-July 2013
Archive storage record
• Activity date – date record was collected
• Activity type – Read, Write, Storage
• Unix uid
• Project code – project to charge
• Number of files
• Bytes – read, written, or stored
• Class of service – e.g., single-copy, dual-copy
• DNS – of client host
• Frequency – interval, in days, between accounting runs
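As a minimal sketch, the record fields listed above could be modeled as a small Python data class. The field names, types, and all example values (uid, project code, host name, counts) are assumptions made for illustration; the slides do not show the actual schema.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class ActivityType(Enum):
    """The three activity types named on the slide."""
    READ = "Read"
    WRITE = "Write"
    STORAGE = "Storage"


@dataclass(frozen=True)
class ArchiveRecord:
    """One archive accounting record (field names are hypothetical)."""
    activity_date: date          # date the record was collected
    activity_type: ActivityType  # Read, Write, or Storage
    uid: int                     # Unix uid
    project_code: str            # project to charge
    num_files: int               # number of files
    num_bytes: int               # bytes read, written, or stored
    class_of_service: str        # e.g., "single-copy" or "dual-copy"
    client_dns: str              # DNS name of the client host
    frequency_days: int          # days between accounting runs


# Hypothetical example: a weekly storage snapshot for one user and project.
example = ArchiveRecord(
    activity_date=date(2013, 7, 7),
    activity_type=ActivityType.STORAGE,
    uid=12345,
    project_code="P00000001",
    num_files=1200,
    num_bytes=3 * 10**12,
    class_of_service="dual-copy",
    client_dns="client.example.org",
    frequency_days=7,
)
```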
Collecting data from HPSS
• Read/write activity
  • Analyze logs from HSI and HTAR (since May 2013). Logs are archived daily and processed weekly.
• Storage activity
  • Weekly DB2 table scan and separate post-processing steps.
• Accounting system impact
  • Approx. 6,000 records per week
• Major accounting requirements
  • Use of HPSS accounting hooks to associate the NCAR project code with the HPSS file “account”
  • Accounting system and HPSS enforce the requirement that every user has a “default project” to which files are charged if no other project is provided
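The “default project” rule and the weekly roll-up of read/write activity can be sketched as follows. This is an illustration only: the tuple layout of the parsed transfers, the function names, and the project codes are hypothetical, the real HSI/HTAR log format is not shown in the slides, and each transfer is assumed to cover exactly one file.

```python
from collections import defaultdict


def resolve_project(username, requested_project, default_projects):
    """Return the project to charge, falling back to the user's default."""
    if requested_project:
        return requested_project
    if username not in default_projects:
        # Every user is required to have a default project.
        raise ValueError(f"user {username!r} has no default project")
    return default_projects[username]


def summarize_transfers(transfers, default_projects):
    """Sum files and bytes per (username, project, activity type).

    `transfers` is an iterable of already-parsed tuples:
    (username, requested_project_or_None, activity, num_bytes).
    """
    totals = defaultdict(lambda: [0, 0])
    for username, requested_project, activity, num_bytes in transfers:
        project = resolve_project(username, requested_project, default_projects)
        entry = totals[(username, project, activity)]
        entry[0] += 1          # one file per transfer (assumption)
        entry[1] += num_bytes
    return {key: tuple(value) for key, value in totals.items()}


# Hypothetical data for one week of activity.
default_projects = {"alice": "P0000001"}
transfers = [
    ("alice", "P0000002", "Write", 4_000),
    ("alice", None, "Write", 1_000),
    ("alice", None, "Write", 2_500),
    ("alice", None, "Read", 500),
]
weekly = summarize_transfers(transfers, default_projects)
```

Transfers that name no project are charged to the default project, so they roll up under `P0000001` here.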
Accounting for disk storage
• Focus on long-term project spaces, which are allocated
• But mechanism captures scratch snapshots, too!
• GLADE total storage, June-July 2013
Disk storage record
• Event time – date record was collected
• Project directory
• Group – Unix group
• Username
• Number of files
• kB used
• Period – reporting interval, in days
• QOS – a quality of service field (for future use)
Collecting data from GPFS
• File systems don’t have a concept of “project”, but GPFS has a notion of “file sets”
  • Leverage file sets to map to project spaces
• For scratch, work, home: report per-user data
• Process runs weekly and provides a storage snapshot
• With GPFS tools, the process takes only a few minutes to complete; a full file system scan is not required
• Accounting system impact
  • Approx. 4,000 records per week
• Major accounting requirements
  • Agreements and processes between GLADE administrators and User Services about how spaces are created
  • Deviation would break the system
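The file-set-to-project mapping described above might look like the sketch below. The input shapes, file set names, and project codes are hypothetical, and the sketch assumes per-file-set totals have already been obtained from GPFS tooling. File sets with no agreed mapping are reported separately, since they are the kind of deviation that would break the accounting.

```python
def map_filesets_to_projects(fileset_usage, fileset_to_project):
    """Roll per-file-set usage up to projects.

    fileset_usage:      {fileset_name: (num_files, kb_used)}
    fileset_to_project: {fileset_name: project_code}
    Returns (per_project_totals, sorted list of unmapped file sets).
    """
    per_project = {}
    unmapped = []
    for fileset, (num_files, kb_used) in fileset_usage.items():
        project = fileset_to_project.get(fileset)
        if project is None:
            # A space created outside the agreed process.
            unmapped.append(fileset)
            continue
        files, kb = per_project.get(project, (0, 0))
        per_project[project] = (files + num_files, kb + kb_used)
    return per_project, sorted(unmapped)


# Hypothetical weekly snapshot.
fileset_usage = {
    "proj_a": (1_000, 500_000),
    "proj_a_extra": (200, 100_000),
    "proj_b": (50, 2_000_000),
    "mystery": (10, 42),
}
fileset_to_project = {
    "proj_a": "P0000001",
    "proj_a_extra": "P0000001",
    "proj_b": "P0000002",
}
per_project, unmapped = map_filesets_to_projects(fileset_usage, fileset_to_project)
```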
Storage growth over time (1)
[Figures: HPSS growth in 2013; GLADE growth in 2013]
Storage growth over time (3)
User reports show project usage by week, with a per-user breakdown
Aggregate behavior (1)
Net growth, 3/3-4/7: ~261 TB
Aggregate behavior (2)
Data written, 3/3-4/7: 594 TB
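The two aggregate figures can be combined in a short calculation. This is an interpretation rather than something the slides state: it assumes net growth equals data written minus data removed (deleted or overwritten) over the same period, and that both figures count bytes the same way.

```python
# Figures from the two "Aggregate behavior" slides (3/3-4/7).
net_growth_tb = 261    # approximate net growth in holdings
data_written_tb = 594  # total data written

# Assumption: net growth = data written - data removed.
implied_removed_tb = data_written_tb - net_growth_tb  # 333 TB

# Share of the written volume that was offset by removals.
removed_fraction = implied_removed_tb / data_written_tb

print(f"Implied removals: {implied_removed_tb} TB "
      f"({removed_fraction:.0%} of data written)")
```

Under that assumption, more than half of the volume written in the period was offset by removals.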
What is “Big Data”?
[Figure: average file size vs. total data holdings]
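The two quantities being compared can both be derived from the file counts and byte counts in the accounting records. A minimal sketch, with hypothetical project codes and numbers:

```python
def average_file_size(num_bytes, num_files):
    """Average bytes per file, or None when there are no files."""
    if num_files == 0:
        return None
    return num_bytes / num_files


# Hypothetical per-project holdings: project -> (num_files, num_bytes).
holdings = {
    "P0000001": (10, 1_000),          # few small files
    "P0000002": (4, 8_000_000_000),   # few very large files
    "P0000003": (0, 0),               # empty project
}

# Pair each project's total holdings with its average file size.
profile = {
    project: (num_bytes, average_file_size(num_bytes, num_files))
    for project, (num_files, num_bytes) in holdings.items()
}
```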
Managing “orphaned” files
• Verifying accounting records lets site operators identify files owned by inactive users or inactive projects
• On July 7, HPSS accounting showed 177 users with 885 TB of “orphaned” files
• Early outreach to users and project leads does translate to deletions and fewer files for which an owner cannot be found
• Users are required to be “actively engaged” in the disposition of their archive holdings. www2.cisl.ucar.edu/docs/hpss/policies
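The orphan check described above can be sketched as a filter over storage records. The record layout, user names, and project codes are hypothetical, and the sets of active users and projects are assumed to come from the site's user and allocation databases.

```python
def find_orphans(holdings, active_users, active_projects):
    """Return (orphaned records, affected users, total orphaned bytes).

    A record counts as orphaned when its user or its project is inactive.
    """
    orphans = [
        h for h in holdings
        if h["username"] not in active_users
        or h["project"] not in active_projects
    ]
    users = {h["username"] for h in orphans}
    total_bytes = sum(h["bytes"] for h in orphans)
    return orphans, users, total_bytes


# Hypothetical storage records from one accounting run.
holdings = [
    {"username": "alice", "project": "P0000001", "bytes": 100},
    {"username": "bob", "project": "P0000001", "bytes": 250},    # inactive user
    {"username": "alice", "project": "P0000009", "bytes": 400},  # inactive project
]
orphans, orphan_users, orphan_bytes = find_orphans(
    holdings,
    active_users={"alice"},
    active_projects={"P0000001"},
)
```

The list of affected users is what an outreach step to users and project leads would start from.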