PAST: Processing and Storage of Time Series
Eleni Tzirita Zacharatou, Jasmina Malicevic, Nikolaos Kokolakis, Eric Beguet, Puneet Sharma, Saurabh Jain, Mihaela Turcu, Nicolas Tran, Thomas Mühlematter
What are time series?
[Figure: plot of an example time series; horizontal axis 0 to 500, vertical axis 23 to 29]
Sample values: 25.2250 25.2500 25.2500 25.2750 25.3250 25.3500 25.3500 25.4000 25.4000 25.3250 25.2250 25.2000 25.1750 .. .. 24.6250 24.6750 24.6750 24.6250 24.6250 24.6250 24.6750 24.7500
• A time series is a collection of observations made sequentially in time
• They are EVERYWHERE (financial data, meteorological data…)
• People measure things… things change over time!
Tasks
• Query by content: retrieving data of interest
  • e.g. “Find past sales patterns that resemble last month”
  • e.g. “List all time series with temperature value 70-80”
• Clustering
(Need for) Preprocessing & Transformation
• Subjectivity
• Different sampling rates
• Noise, missing data
[Figure: two series, A and B, each shown with its average value]
• Normalization
• Amplitude scaling
• Resampling
• Digital filters
• DFT
• Different distance measures
System needs: TRANSFORMATIONS
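The normalization step can be sketched as follows. The slides do not say which normalization PAST applies, so this example assumes z-normalization (zero mean, unit standard deviation), a common choice when series such as A and B have different average values. The object and method names are hypothetical and are not part of the PAST API.

```scala
object Normalize {
  // Z-normalization (an assumption, not confirmed by the slides):
  // subtract the mean, then divide by the population standard deviation.
  // A constant series has zero deviation and is mapped to all zeros.
  def zNormalize(xs: Seq[Double]): Seq[Double] = {
    require(xs.nonEmpty, "series must not be empty")
    val mean = xs.sum / xs.size
    val std = math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / xs.size)
    if (std == 0.0) xs.map(_ => 0.0) else xs.map(x => (x - mean) / std)
  }
}
```

After this step, two series with different offsets and amplitudes can be compared by shape alone, for example `Normalize.zNormalize(Seq(1.0, 2.0, 3.0))` and `Normalize.zNormalize(Seq(10.0, 20.0, 30.0))` give the same result up to rounding.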
(Need for) Compression & Indexing
• Very large datasets
• High-dimensional data
System needs: COMPRESSION, INDEXING, TRANSFORMATIONS
System Overview
• On top of Spark
• Development in Scala and Java
• Offline framework
• Support for:
  • Custom backends
  • Custom data types
  • Pluggable indexes
Piece-wise Linear Representation (PLR)
• Divide the time series into a set of disjoint segments
• Model each segment using regression
• For each modeled segment, store:
  • Start time, end time
  • Minimum value, maximum value
  • Model coefficients
Tunable parameters: the degree N of the polynomial curve and the maximum Mean Absolute Error
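The segmentation described above can be sketched as a greedy sliding window. This is a minimal illustration under stated assumptions, not the PAST implementation: it fixes the polynomial degree at N = 1 (a straight line), uses the sample index as the time value, and grows each segment until the mean absolute error of the fitted line exceeds the bound. `Plr`, `Segment` and `segment` are hypothetical names.

```scala
object Plr {
  // One modeled segment: index range, value range, and line coefficients
  // (value is approximated by intercept + slope * index).
  final case class Segment(start: Int, end: Int, min: Double, max: Double,
                           intercept: Double, slope: Double)

  // Least-squares line through the points (i, ys(i)) for i in [lo, hi].
  // Returns (intercept, slope); a single point yields a flat line.
  private def fit(ys: IndexedSeq[Double], lo: Int, hi: Int): (Double, Double) = {
    val xs = (lo to hi).map(_.toDouble)
    val vs = ys.slice(lo, hi + 1)
    val mx = xs.sum / xs.length
    val my = vs.sum / vs.length
    val sxx = xs.map(x => (x - mx) * (x - mx)).sum
    val sxy = xs.zip(vs).map { case (x, y) => (x - mx) * (y - my) }.sum
    val slope = if (sxx == 0.0) 0.0 else sxy / sxx
    (my - slope * mx, slope)
  }

  // Mean absolute error of the line a + b * i over [lo, hi].
  private def meanAbsError(ys: IndexedSeq[Double], lo: Int, hi: Int,
                           a: Double, b: Double): Double =
    (lo to hi).map(i => math.abs(ys(i) - (a + b * i))).sum / (hi - lo + 1)

  // Greedy sliding window: extend the current segment by one point at a time
  // while the refitted line stays within maxError, then start a new segment.
  def segment(ys: IndexedSeq[Double], maxError: Double): Vector[Segment] = {
    val out = Vector.newBuilder[Segment]
    var start = 0
    while (start < ys.length) {
      var end = start
      var model = fit(ys, start, end)
      var growing = true
      while (growing && end + 1 < ys.length) {
        val candidate = fit(ys, start, end + 1)
        if (meanAbsError(ys, start, end + 1, candidate._1, candidate._2) <= maxError) {
          end += 1
          model = candidate
        } else growing = false
      }
      val vs = ys.slice(start, end + 1)
      out += Segment(start, end, vs.min, vs.max, model._1, model._2)
      start = end + 1
    }
    out.result()
  }
}
```

For the series 0, 1, 2, 3, 10, 8, 6, 4 with a bound of 0.01, this yields two segments: indices 0 to 3 with slope 1, and indices 4 to 7 with slope -2. A looser bound produces fewer, longer segments, which is the compression trade-off the tunable parameters control.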
Querying compressed data
• Supported queries:
  • Time point or range query
  • Value point or range query
  • Composite query
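One way these queries could be answered from the stored segment metadata alone is sketched below. It assumes each segment keeps its start time, end time, minimum and maximum as listed on the PLR slide; the names `SegmentQuery`, `Meta`, `byTime`, `byValue` and `composite` are hypothetical. A point query is a range query whose two bounds are equal.

```scala
object SegmentQuery {
  // Per-segment metadata from the PLR slide (model coefficients omitted here).
  final case class Meta(start: Int, end: Int, min: Double, max: Double)

  // Time range query: segments whose [start, end] overlaps [t0, t1].
  def byTime(segs: Seq[Meta], t0: Int, t1: Int): Seq[Meta] =
    segs.filter(s => s.start <= t1 && s.end >= t0)

  // Value range query: segments whose [min, max] overlaps [lo, hi].
  // These are candidates only: the segment's model would still have to be
  // evaluated to find which points, if any, fall inside the range.
  def byValue(segs: Seq[Meta], lo: Double, hi: Double): Seq[Meta] =
    segs.filter(s => s.min <= hi && s.max >= lo)

  // Composite query: both conditions must hold for the same segment.
  def composite(segs: Seq[Meta], t0: Int, t1: Int, lo: Double, hi: Double): Seq[Meta] =
    byValue(byTime(segs, t0, t1), lo, hi)
}
```

Because only metadata is inspected, segments that cannot match are discarded without reconstructing any values from the model.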
SAX Representation
Cardinality promotion: {1, 1, 0, 0} => {11, 11, 01, 00}
Tunable parameters: word size and alphabet size (cardinality)
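The promotion example above can be reproduced with a small sketch. It assumes the usual SAX construction: the series is z-normalized, reduced to one mean per frame (PAA), and each mean is mapped to a symbol using breakpoints that split a standard normal distribution into equally likely regions. Only cardinalities 2 and 4 are covered, with breakpoints rounded to two decimals. All names are hypothetical.

```scala
object Sax {
  // Breakpoints for cardinalities 2 and 4 only (rounded to two decimals).
  // Any other cardinality is not supported by this sketch.
  private val breakpoints: Map[Int, Vector[Double]] = Map(
    2 -> Vector(0.0),
    4 -> Vector(-0.67, 0.0, 0.67)
  )

  // Piecewise Aggregate Approximation: the mean of each of wordSize frames.
  // Assumes the series length is a multiple of wordSize.
  def paa(xs: IndexedSeq[Double], wordSize: Int): Vector[Double] = {
    require(xs.nonEmpty && wordSize > 0 && xs.length % wordSize == 0,
      "series length must be a positive multiple of wordSize")
    val frame = xs.length / wordSize
    xs.grouped(frame).map(f => f.sum / f.length).toVector
  }

  // Symbol = number of breakpoints at or below the value.
  def symbols(paaValues: Seq[Double], cardinality: Int): Vector[Int] = {
    val bps = breakpoints(cardinality)
    paaValues.map(v => bps.count(_ <= v)).toVector
  }

  // Demotion drops low-order bits, so a symbol at cardinality 4 can be
  // compared with one at cardinality 2. Cardinalities must be powers of two.
  def demote(symbol: Int, from: Int, to: Int): Int =
    symbol >> (Integer.numberOfTrailingZeros(from) - Integer.numberOfTrailingZeros(to))

  // Binary rendering of a symbol, padded to log2(cardinality) bits.
  def toBits(symbol: Int, cardinality: Int): String = {
    val width = Integer.numberOfTrailingZeros(cardinality)
    val bits = Integer.toBinaryString(symbol)
    "0" * (width - bits.length) + bits
  }
}
```

For the PAA values 1.2, 0.9, -0.3, -1.1, `Sax.symbols(_, 2)` gives 1, 1, 0, 0 and `Sax.symbols(_, 4)` rendered with `toBits` gives 11, 11, 01, 00, matching the slide. Each promoted symbol keeps the original bit as its prefix, which is why `demote` recovers the lower-cardinality word.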
Indexing SAX
“Similar” time series => same SAX word
Tunable parameter: number of time series in a terminal node
• Approximate search: terminal node with the same SAX representation as the query
• Exact search: uses approximate search for pruning
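The two search modes can be illustrated with a heavily simplified sketch. It flattens the index into a map from SAX word to the series sharing that word, so node splitting and the terminal-node capacity are omitted. The word function is a stand-in (one cardinality-2 symbol per value, no PAA), and the exact search prunes by abandoning a distance computation early once it exceeds the best distance found by the approximate search, rather than by a SAX lower-bound distance. All names are hypothetical, and all series are assumed to have the same length.

```scala
object SaxIndex {
  type Series = Vector[Double]

  // Stand-in for a SAX word: 1 if the value is >= 0, else 0, per value.
  def word(s: Series): Vector[Int] = s.map(v => if (v >= 0.0) 1 else 0)

  // "Terminal nodes", flattened into a map from word to matching series.
  def build(data: Seq[Series]): Map[Vector[Int], Seq[Series]] = data.groupBy(word)

  // Squared Euclidean distance, abandoned once it exceeds bound.
  // An abandoned result is always greater than bound.
  def dist(a: Series, b: Series, bound: Double): Double = {
    var sum = 0.0
    var i = 0
    while (i < a.length && sum <= bound) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  // Nearest candidate to q, starting from an optional best-so-far pair
  // of (series, squared distance).
  private def best(cands: Iterable[Series], q: Series,
                   init: Option[(Series, Double)]): Option[(Series, Double)] =
    cands.foldLeft(init) { (cur, c) =>
      val bound = cur.map(_._2).getOrElse(Double.MaxValue)
      val d = dist(c, q, bound)
      if (d < bound) Some((c, d)) else cur
    }

  // Approximate search: look only at series with the same word as the query.
  def approximate(index: Map[Vector[Int], Seq[Series]], q: Series): Option[(Series, Double)] =
    best(index.getOrElse(word(q), Seq.empty), q, None)

  // Exact search: scan everything, seeded with the approximate answer so
  // that most distance computations are abandoned early.
  def exact(index: Map[Vector[Int], Seq[Series]], q: Series): Option[(Series, Double)] =
    best(index.values.flatten, q, approximate(index, q))
}
```

The approximate answer is not always the true nearest neighbour: a query just above zero can be closer to a series just below zero, which has a different word. The exact search covers that case while still benefiting from the bound the approximate search supplies.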
Command Line Utility
• Scala console tweaking
  • Pseudo-SQL statements starting with a single quote (')
  • Conversion to Scala
  • Execution
• Data insertion
  • From CSV
    scala> 'INSERT csv("path/to/file") INTO timeseries;
  • Using Scala variables
    scala> val dna = scala.io.Source.fromPath("path/to/dna").map({ case 'A' => 1; case 'C' => 2; .... })
    scala> 'CREATE humanDNA (encodedBase BYTE) BACKEND RowStore
    scala> 'INSERT @dna INTO humanDNA
• Column selection
    scala> 'SELECT column1, column3 FROM timeseriesY WHERE column1 > 2 AND column1 < 3
    scala> import past.Transformations
    scala> 'SELECT @Transformations.mean(columnX) FROM timeseriesY
Thank You!