BREAKOUT SESSION: Deep Analytics Pipeline
3rd Workshop on Big Data Benchmarking, July 16-17, Xi'an, China
DEEP ANALYTICAL PIPELINE – FURTHER DEVELOPMENT (1)
• The pipeline should not be extended to more domains, since the query types are similar
• Loading aspect of data:
  • Differentiate between staging servers and the analytic server
  • Raw data has to be in place; loading time should be restricted to the "batch" processing
  • Sanity checks should be included to detect bottlenecks, e.g., a client that cannot produce the required amount of data
  • Pre-configuration of some parameters
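The sanity check mentioned above (a client that cannot produce data fast enough) could look roughly like the following. This is only a minimal sketch, not part of the session's proposal: the names `measure_rate`, `client_is_bottleneck`, and `toy_generator`, and the idea of expressing the requirement as records per second, are assumptions made for illustration.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_rate(produce: Callable[[], Iterable[bytes]]) -> Tuple[int, float]:
    """Run a data generator once and return (records produced, elapsed seconds)."""
    start = time.perf_counter()
    count = sum(1 for _ in produce())
    elapsed = time.perf_counter() - start
    return count, elapsed


def client_is_bottleneck(records: int, elapsed_s: float, required_rate: float) -> bool:
    """True if the client produced fewer records per second than the run requires."""
    if elapsed_s <= 0:
        # Too fast to measure, so the client is not the limiting factor.
        return False
    return records / elapsed_s < required_rate


def toy_generator(n: int = 10_000) -> Iterable[bytes]:
    """Hypothetical stand-in for a benchmark data generator."""
    for i in range(n):
        yield f"row-{i}".encode()


if __name__ == "__main__":
    records, elapsed = measure_rate(toy_generator)
    # The required rate is an assumed, pre-configured parameter of the run.
    if client_is_bottleneck(records, elapsed, required_rate=1_000.0):
        print("Sanity check failed: data generation client is the bottleneck")
    else:
        print("Sanity check passed")
```

Running such a check before the timed load would keep a slow generator from being misread as a slow system under test.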
DEEP ANALYTICAL PIPELINE – FURTHER DEVELOPMENT (2)
• Multi-tenancy is excluded from the pipeline
• Stream-based queries are excluded, but stream-based loading is desirable to ensure velocity
• Reference implementations need data specifications in order to hook in between the stages
DEEP ANALYTICAL PIPELINE – OPEN ISSUES
• How to deal with different metrics between the stages?
• Different kinds of inputs for data generation?