Diamond: A Storage Architecture for Early Discard in Interactive Search Larry Huston, et al. FAST ’04 Jan. 26th, 2006 Speaker: Sehwan Lee
Contents • Introduction • Background and Motivation • Diamond Architecture • Diamond Applications • Prototype Implementation • Experimental Evaluation • Related Work • Conclusion
Introduction • Goal • To enable interactive search of nonindexed data • Diamond ‘Early Discard’ technique • Focus • Pure brute-force interactive search
Background and Motivation • Limitations of Indexing • Infeasible manual indexing • High-dimensional representation • Sophisticated queries • Complex user needs
Background and Motivation • Importance of Early Discard
Background and Motivation • Self-Tuning for Hardware Evolution • Flexibility of active disk • Well-suited for ‘early discard’ • Two mechanisms of early discard • Application generates specialized early discard code • Dynamically adapt the evaluation of early discard code • Two aspects of early discard • Adaptive partitioning of computation between the storage devices and the host computer • Dynamic ordering of search terms to minimize the total computation time
Background and Motivation • Exploiting the Structure of Search • Search tasks • Only require read access • Typically permit stored objects to be examined in any order • Efficient for parallelism • Do not require maintaining state between objects • Efficient for parallelism
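The three properties above (read-only access, arbitrary order, no state between objects) are what make the search easy to spread across workers. A minimal sketch in Python, assuming a hypothetical stateless predicate `matches` and an in-memory list standing in for stored objects; it illustrates the idea only and is not Diamond's implementation:

```python
from concurrent.futures import ThreadPoolExecutor


def matches(obj):
    # Hypothetical stateless predicate: the result depends only on the object.
    return obj % 7 == 0


def search_partition(partition):
    # Each worker only reads its own partition; no state is shared.
    return [obj for obj in partition if matches(obj)]


def parallel_search(objects, workers=4):
    # Objects may be examined in any order, so a simple striped split is fine.
    partitions = [objects[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(search_partition, partitions)
    # Sort only to make the output deterministic for the reader.
    return sorted(hit for part in results for hit in part)


hits = parallel_search(list(range(50)))
```

Because no partition depends on another, the number of workers can change without affecting which objects match.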
Diamond Architecture • Diamond Architecture • Searchlet • Contains all of the domain-specific knowledge needed for early discard • Is a proxy of the application that can execute within the back end
Diamond Architecture • Searchlets • Searchlet Structure • A set of filters + some configuration state • Creating Searchlets • A domain application generates searchlets in response to a user’s query in a number of ways • Domain experts implement a library of filter functions • A domain application generates code on the fly
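The "set of filters + some configuration state" structure can be sketched as follows. This is an illustrative Python model, not the Diamond Searchlet API: the `Searchlet` class, the filter names, and the object fields (`width`, `red_fraction`) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A filter takes an object and the searchlet's configuration and returns
# True to keep the object or False to discard it.
Filter = Callable[[dict, Dict[str, float]], bool]


@dataclass
class Searchlet:
    filters: List[Filter]
    config: Dict[str, float] = field(default_factory=dict)

    def accepts(self, obj: dict) -> bool:
        # all() stops at the first failing filter, so an object is
        # discarded as early as possible.
        return all(f(obj, self.config) for f in self.filters)


def wide_enough(obj, config):
    # Hypothetical filter: cheap size check.
    return obj["width"] >= config["min_width"]


def mostly_red(obj, config):
    # Hypothetical filter: colour check on a precomputed statistic.
    return obj["red_fraction"] >= config["min_red"]


searchlet = Searchlet(
    filters=[wide_enough, mostly_red],
    config={"min_width": 640, "min_red": 0.3},
)

photos = [
    {"name": "a", "width": 800, "red_fraction": 0.5},
    {"name": "b", "width": 320, "red_fraction": 0.9},
    {"name": "c", "width": 1024, "red_fraction": 0.1},
]
kept = [p["name"] for p in photos if searchlet.accepts(p)]
```

Keeping the thresholds in `config` rather than in the filter code mirrors the idea that an application can reuse a library of filters and vary only the configuration per query.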
Diamond Architecture • Key Interfaces • Three APIs to isolate components • Searchlet API • Used by applications to interact with Diamond • Filter API • Used by filter code to interact with the storage run-time environment • Associative DMA • Isolates the host and the storage implementations • This abstracts the transport mechanism and flow control between the host and the storage run-time system
Diamond Architecture • Host and Storage Systems • The host system • Where the domain application executes • The storage system • Provides a generic infrastructure for searchlet execution
Diamond Applications • Suitable characteristics for Diamond application • The user is searching for specific instances of data that match a query rather than aggregate statistics about the set of matching data items • The user’s criteria for a successful match are often subjective, potentially ill-defined, and typically influenced by the partial results of the query • The mapping between the user’s needs and the matching objects is too complex for it to be captured by a batch operation
Diamond Applications • SnapFind Description • Goal • To enable users to interactively search through large collections of unlabeled photographs • by quickly specifying searchlets that roughly correspond to semantic content • to create complex image queries by combining simple filters that scan images for patches containing particular color distributions, shapes or visual textures • Infeasible indexing • Different search filter at query time • High-dimensional content
Diamond Applications • SnapFind Usage Experience • Example task • Retrieve photos from an unlabeled collection based on semantic content • 2 cases using same GUI • Purely manual search • Using SnapFind
Prototype Implementation • Dynamic Partitioning of Computation • The Diamond storage runtime decides whether to evaluate a searchlet locally or at the host computer • Two methods for partitioning computation • CPU Splitting • Queue Back-Pressure
Prototype Implementation • Filter Ordering • Average time to process an object through a series of filters F0 … Fn • C = c(F0) + P(F0)·c(F1) + P(F1|F0)·P(F0)·c(F2) + P(F2|F1,F0)·P(F1|F0)·P(F0)·c(F3) + … • Partial Ordering • Partial ordering linear extension • Ordering Policies • Independent • Hill climbing (HC) • Best filter first (BFF)
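The cost expression above can be evaluated directly once it is simplified. The sketch below assumes filters pass objects independently, so each conditional probability reduces to the filter's own pass rate; the filter names, costs, and pass rates are invented for illustration. The `ratio_order` helper uses the standard cost / (1 − pass rate) ranking for independent filters and is not claimed to be Diamond's HC or BFF policy.

```python
from itertools import permutations


def expected_cost(order, cost, pass_rate):
    # C = c(F0) + P(F0) c(F1) + P(F0) P(F1) c(F2) + ...
    # (independence assumed, so P(Fi | earlier filters) = P(Fi)).
    total = 0.0
    reach = 1.0  # probability that an object reaches the current filter
    for f in order:
        total += reach * cost[f]
        reach *= pass_rate[f]
    return total


def best_order_brute_force(cost, pass_rate):
    # Exhaustive search; only practical for a handful of filters.
    return min(permutations(cost),
               key=lambda order: expected_cost(order, cost, pass_rate))


def ratio_order(cost, pass_rate):
    # Rank by cost per unit of discard probability, cheapest first.
    def rank(f):
        discard = 1.0 - pass_rate[f]
        return cost[f] / discard if discard > 0 else float("inf")
    return tuple(sorted(cost, key=rank))


# Hypothetical filters: cost in ms per object, pass rate as a fraction.
cost = {"color": 1.0, "texture": 4.0, "face": 20.0}
pass_rate = {"color": 0.5, "texture": 0.2, "face": 0.1}

best = best_order_brute_force(cost, pass_rate)
ranked = ratio_order(cost, pass_rate)
```

With these made-up numbers, running the cheap, moderately selective filter first costs far less on average than starting with the expensive one, which is why the ordering matters for brute-force search.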
Experimental Evaluation • Description of Searchlets • Test queries
Experimental Evaluation • Description of Searchlets • Filters
Experimental Evaluation • Disk and Host Processing Power
Experimental Evaluation • Disk and Host Processing Power
Experimental Evaluation • Impact of Dynamic Partitioning
Experimental Evaluation • Impact of Filter Ordering
Experimental Evaluation • Using Diamond on Large Datasets
Related Work • On interactive data analysis • On approximate query processing
Conclusion • Diamond is a system that supports interactive data analysis of large, complex data sets • To efficiently perform brute-force search, the Diamond architecture uses early discard to push filter processing to the edges of the system • The Diamond architecture enables the system to adapt to different hardware configurations by dynamically adjusting where computation is performed