
Fragmentation in Large Object Repositories

Russell Sears, Catharine van Ingen. CIDR 2007. This work was performed at Microsoft Research San Francisco with input from the NTFS and SQL Server teams.


Presentation Transcript


  1. Fragmentation in Large Object Repositories • Russell Sears, Catharine van Ingen • CIDR 2007 • This work was performed at Microsoft Research San Francisco with input from the NTFS and SQL Server teams

  2. Background • Web services store large objects for users • e.g., Wikipedia, Flickr, YouTube, GFS, Hotmail • Replicate BLOBs or files • No update-in-place • Benchmark before deployment • Then encounter storage performance problems • We set out to make some sense of this [Diagram labels: Clients, Application Servers, DB (metadata), Object Stores, Replication / Data scrubbing]

  3. Problems with partial updates • Multiple changes per application request • Atomicity (distributed transactions) • Most updates change object size • Must fragment, or relocate data • Reading / writing the entire object addresses these issues
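The whole-object alternative to partial updates can be sketched as follows, assuming a store that keeps each object as one file. The function names `put_object` and `get_object` are hypothetical, and the write-to-temporary-then-rename pattern is a common way to replace a file atomically, not something this slide prescribes.

```python
import os
import tempfile


def put_object(directory, name, data):
    """Replace the whole object: write a new copy, then swap it into place."""
    # The temporary file lives in the same directory so the rename stays
    # on one volume.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the new copy to disk before the swap
        # Readers see either the old object or the new one, never a mix.
        os.replace(tmp_path, os.path.join(directory, name))
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def get_object(directory, name):
    """Read the whole object back."""
    with open(os.path.join(directory, name), "rb") as f:
        return f.read()
```

Because every put writes a fresh copy of a possibly different size, the allocator has to find new space on each update, which is the behavior the later slides measure.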

  4. Experimental Setup • Single storage node • Compared filesystem, database • NTFS on Windows Server 2003 R2 • SQL Server 2005 beta • Repeatedly update (free, reallocate) objects • Randomly chose sizes, objects to update • Unrealistic, easy to understand • Measured throughput, fragmentation
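A toy version of this workload may help make the setup concrete. The sketch below is not the authors' test harness: the first-fit allocator, block counts, and size range are all assumptions chosen for illustration. It only mirrors the loop the slide describes: pick a random object, free it, reallocate it with a random size, and track fragmentation.

```python
import random


def allocate(free_runs, size):
    """First fit; if no single run is large enough, split across runs."""
    for i, (start, length) in enumerate(free_runs):
        if length >= size:
            if length == size:
                free_runs.pop(i)
            else:
                free_runs[i] = (start + size, length - size)
            return [(start, size)]
    if sum(length for _, length in free_runs) < size:
        raise MemoryError("volume full")
    extents, remaining = [], size
    while remaining > 0:
        start, length = free_runs[0]
        take = min(length, remaining)
        extents.append((start, take))
        remaining -= take
        if take == length:
            free_runs.pop(0)
        else:
            free_runs[0] = (start + take, length - take)
    return extents


def release(free_runs, extents):
    """Return extents to the free list and coalesce adjacent runs."""
    free_runs.extend(extents)
    free_runs.sort()
    merged = []
    for start, length in free_runs:
        if merged and merged[-1][0] + merged[-1][1] == start:
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((start, length))
    free_runs[:] = merged


def simulate(volume_blocks=10_000, n_objects=50, min_size=50, max_size=150,
             updates=500, seed=0):
    """Run the free/reallocate loop; sizes are in blocks (assumed values)."""
    rng = random.Random(seed)
    free_runs = [(0, volume_blocks)]
    objects = [allocate(free_runs, rng.randint(min_size, max_size))
               for _ in range(n_objects)]
    for _ in range(updates):
        victim = rng.randrange(n_objects)
        release(free_runs, objects[victim])
        objects[victim] = allocate(free_runs, rng.randint(min_size, max_size))
    age = updates / n_objects  # average updates per object
    # Free runs are kept coalesced, so each extent is one fragment.
    fragments = sum(len(e) for e in objects) / n_objects
    return age, fragments, free_runs, objects
```

The numbers this produces say nothing about NTFS or SQL Server; the point is only the shape of the experiment.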

  5. Reasoning about time • Existing metrics • Wall clock time: requires trace to be meaningful, cannot compare different workloads • Updates per volume: coupled to volume size • Storage age: average number of updates per object
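Storage age as defined on this slide is simple to compute. The sketch below assumes update counts are tracked per object in a dictionary; the function name and data layout are illustrative.

```python
def storage_age(update_counts):
    """Average number of updates per object (the slide's definition)."""
    if not update_counts:
        return 0.0
    return sum(update_counts.values()) / len(update_counts)


# Three objects updated 4, 0, and 2 times: storage age is 6 / 3.
print(storage_age({"a": 4, "b": 0, "c": 2}))  # 2.0
```

Unlike a raw update count, this value does not grow just because the volume holds more objects.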

  6. Read performance • Clean system • SQL good small object performance (inexpensive opens) • NTFS significantly faster with objects >>1MB • SQL degraded quickly • NTFS small object performance was low, but constant [Figure: read throughput (MB/s) for NTFS and SQL at 0, 2, and 4 updates per object; panels: 256 KB objects, 1 MB objects]

  7. 10MB object fragmentation • NTFS approaching asymptote • SQL Server degrades linearly • No BLOB defragmenter [Figure: fragments/object versus storage age (0 to 10) for SQL Server and NTFS]
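The fragments/object metric plotted here can be computed from an extent map. The sketch below assumes each object is described by a list of `(start_block, block_count)` pairs in logical order; how such a map is obtained from NTFS or SQL Server is outside its scope.

```python
def count_fragments(extents):
    """Count physically contiguous runs in an object's extent list."""
    fragments = 0
    prev_end = None
    for start, length in extents:
        if length <= 0:
            continue  # ignore empty extents
        if start != prev_end:
            fragments += 1  # a jump on disk starts a new fragment
        prev_end = start + length
    return fragments


def mean_fragments_per_object(objects):
    """Average fragment count over a mapping of name to extent list."""
    if not objects:
        return 0.0
    return sum(count_fragments(e) for e in objects.values()) / len(objects)
```

Two extents that happen to sit back to back on disk count as one fragment, since a sequential read crosses them without a seek.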

  8. Rules of Thumb • Classic pitfalls • Low free space (< 10%) • Repeated allocation and deallocation (High storage age) • One new problem • Small volumes (< 100-1000x object size) • Implicit tuning knobs • Size of write requests
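These rules of thumb can be turned into a simple health check. In the sketch below the 10% free-space floor and the 100x volume-to-object ratio come from the slide; the storage-age threshold of 4.0 is an assumption, since the slide only says "high".

```python
def layout_warnings(volume_bytes, free_bytes, object_bytes, storage_age,
                    min_free_fraction=0.10, min_volume_ratio=100,
                    high_storage_age=4.0):
    """Return the rules of thumb that a volume currently violates."""
    warnings = []
    if free_bytes < min_free_fraction * volume_bytes:
        warnings.append("low free space")
    if storage_age >= high_storage_age:  # threshold is an assumed value
        warnings.append("high storage age")
    if volume_bytes < min_volume_ratio * object_bytes:
        warnings.append("small volume")
    return warnings
```

Passing `min_volume_ratio=1000` applies the stricter end of the slide's 100-1000x range.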

  9. Append is expensive! • Neither system can take advantage of final object size during allocation • Both APIs provide “append” • Leave gaps for future appends • Place objects without knowing length • Observe same behavior with single and random object sizes
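A deliberately naive model shows why unknown final sizes hurt layout. The sketch below uses a toy bump allocator that hands out the next free block on every append; neither NTFS nor SQL Server allocates this way, and both function names are hypothetical. It contrasts interleaved appends with placement when the final length is known up front.

```python
def count_runs(blocks):
    """Count runs of consecutive block numbers."""
    runs, prev = 0, None
    for b in blocks:
        if prev is None or b != prev + 1:
            runs += 1
        prev = b
    return runs


def append_layout(n_objects, chunks_per_object):
    """Objects grow one block at a time, interleaved, size unknown."""
    layout = {obj: [] for obj in range(n_objects)}
    next_block = 0
    for _ in range(chunks_per_object):
        for obj in range(n_objects):
            layout[obj].append(next_block)
            next_block += 1
    return layout


def known_size_layout(n_objects, chunks_per_object):
    """Each object is placed in one piece because its length is known."""
    return {obj: list(range(obj * chunks_per_object, (obj + 1) * chunks_per_object))
            for obj in range(n_objects)}
```

With two objects of four blocks each, the interleaved case leaves every object in four fragments, while the known-size case leaves each in one. Leaving gaps for future appends, as the slide mentions, is one way real systems try to soften this.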

  10. Conclusions • Get/put storage is important in practice • Storage age • Metric for comparing implementations and workloads • Fragmentation behaviors vary significantly • Append leads to poor layout

  11. ----BACKUP SLIDES----

  12. Theory vs. Practice • Theory focuses on contiguous layout of objects of known size • Objects that are allocated in groups are freed in groups • Good allocation algorithms exploit this • Generally ignored for average case results • Leads to pathological behavior in some cases

  13. Small volumes • Small objects / Large volumes • Percent free space • Large objects / Small volumes • Number of free objects
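A short calculation illustrates why percent free space stops being the useful number on small volumes. The volume and object sizes below are made-up examples, and `free_object_slots` is a hypothetical helper.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def free_object_slots(volume_bytes, free_percent, object_bytes):
    """How many whole objects fit in the free space, ignoring fragmentation."""
    free_bytes = volume_bytes * free_percent // 100
    return free_bytes // object_bytes


# Both volumes are 10% free, but they hold very different numbers of
# free 10 MiB "slots".
large = free_object_slots(40 * GIB, 10, 10 * MIB)
small = free_object_slots(400 * MIB, 10, 10 * MIB)
print(large, small)
```

On the small volume only a handful of objects fit in the free space, so the allocator has very few placement choices even though the percentage looks the same.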

  14. Efficient Get/Put • No update-in-place • Partial updates complicate apps • Objects change size • Pipeline requests • Small write buffers, I/O parallelism [Diagram: application server, with steps numbered 1 to 4]

  15. Lessons learned • Target systems avoid update-in-place • No use for database data models • Quantified fragmentation behavior • Across implementations, workloads • Common APIs complicate allocation • Filesystem / BLOB API is too expressive

  16. [Diagram: application server, with steps numbered 1 to 4]

  17. Example systems • SharePoint • Everything in the database, one copy per version • Wikipedia • One blob per document version; images are files • Flickr / YouTube • GFS • Scalable append; chunk data into 64MB files • Hotmail • Each mailbox is stored as a single opaque BLOB

  18. The folklore is accurate, so why do application designers… • …benchmark, then deploy the “wrong” technology? • …switch to the “right one” a year later? • …then switch back?!? • Performance problems crop up over time

  19. Conclusions • Existing systems vary widely • Measuring clean systems is inadequate, but standard practice • Support for append is expensive • Unpredictable storage is difficult to reliably scale and manage • See paper for more information about predicting and managing fragmentation in existing systems

  20. Comparing data layout strategies • Study the impact of • Volume size • Object size • Workload • Update strategies • Maintenance tasks • System implementation • Need a metric that is independent of these factors

  21. Related work • Theoretical results • Worst case performance is unacceptable • Average case good for certain workloads • Structure in deallocation requests leads to poor real-world performance • Buddy system • Place structural limitations on file layout • Bounds fragmentation, fails on large files
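The structural limitation a buddy system imposes can be shown with its size rounding. The sketch below assumes a plain binary buddy allocator with a 4 KiB minimum block. The slide does not say why the approach fails on large files, so the sketch only illustrates the rounding cost.

```python
def buddy_block_size(request_bytes, min_block=4096):
    """Smallest power-of-two multiple of min_block that holds the request."""
    size = min_block
    while size < request_bytes:
        size *= 2
    return size


def internal_waste(request_bytes, min_block=4096):
    """Bytes allocated but unused because of power-of-two rounding."""
    return buddy_block_size(request_bytes, min_block) - request_bytes
```

In this model an object one byte larger than 10 MiB is rounded up to a 16 MiB block, so the rounding waste grows with object size.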

  22. Introduction • Content-rich web services require large, predictable and reliable storage • Characterizing fragmentation behavior • Opportunities for improvement

  23. Data intensive web applications • Simple data model (BLOBs) • Hotmail: user mailbox • Flickr: photograph(s) • Replication • Instead of backup • Load balancing • Scalability [Diagram labels: Clients, Application Servers, DB (metadata), Object Stores, Replication / Data scrubbing]

  24. Databases vs. Filesystems • Manageability should be primary concern • No need for advanced storage features • Disk bound • Folklore • File opens are slow • Database interfaces stream data poorly

  25. Clean system performance • Single node • Used network APIs • Random workload • Get/put one object at a time • Large objects lead to sequential I/O [Figure: read throughput (MB/sec) versus object size (256K, 512K, 1M) for SQL Server and NTFS]

  26. Revisiting Fragmentation • Data intensive web services • Long term predictability • Simple data model: get/put opaque objects • Performance of existing systems • Opportunities for improvement

  27. Introduction • Large object updates and web services • Replication for scalability, reliability • Get / put vs. partial updates • Storage age • Characterizing fragmentation behavior • Comparing multiple approaches • State-of-the-art approach: • Lay out data without knowing final object size • Change the interface?
