Debellor – Data Mining Platform with Stream Architecture. Marcin Wojnarski, Warsaw University, Poland
Outline • Debellor – data mining platform • Motivation • Main features • Architecture: • Cell • data streaming • multi-threading • Available in ver. 0.6 • Future releases • Summary
Debellor • Language: Java • Licence: open source (GPL) • Download: www.debellor.org • Debello – to conquer (Latin). Debellor – conqueror of data
Debellor – data mining platform [Diagram: Debellor together with Rseslib, LibSVM, Weka, TA-Lib, own…, own…]
Motivation • Demand for more complex algorithms • Necessity to combine elementary algorithms
Motivation • Data Processing Network (DPN) [Diagram: processing network with nodes Load, Load, Preprocess, Preprocess, Predict, Visualize, Save]
Motivation • Committee of algorithms [Diagram: Classifier A, Classifier B and Classifier C combined by Voting]
Motivation • Nested algorithms [Diagram: K-means nested inside an RBF neural network]
Requirements • Versatile • Efficient • Simple
Features of Debellor • All types of data processing algorithms • Extendible data types • Stream architecture (handles large data sets) • Multi-threading • Immutability of data objects (for safety)
Algorithm = Cell
Cell cell = new RseslibClassifier("C45");
cell.set("pruning", "true");
Cell – data source
cell.open();
Sample s1 = cell.next(), s2 = cell.next(), ...
cell.close();
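To make the open/next/close protocol concrete, here is a minimal sketch of a cell acting as a data source. It is not Debellor's real API: `SimpleSample` and `ListSourceCell` are hypothetical names, and the convention that `next()` returns `null` once the stream is exhausted is an assumption of this sketch. The sample class is immutable, in the spirit of the "immutability of data objects" feature.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable data object (not Debellor's Sample class).
final class SimpleSample {
    private final double[] features;

    SimpleSample(double... features) {
        this.features = features.clone(); // defensive copy: callers cannot mutate it later
    }

    double get(int i) { return features[i]; }

    int size() { return features.length; }
}

// Hypothetical cell acting as a data source over an in-memory list.
final class ListSourceCell {
    private final List<SimpleSample> data;
    private int position = -1; // -1 while the cell is closed

    ListSourceCell(List<SimpleSample> data) {
        this.data = Collections.unmodifiableList(new ArrayList<>(data));
    }

    void open() { position = 0; }

    // Returns the next sample, or null at the end of the stream (an assumption of this sketch).
    SimpleSample next() {
        if (position < 0) throw new IllegalStateException("cell is not open");
        return position < data.size() ? data.get(position++) : null;
    }

    void close() { position = -1; }
}
```

Returning `null` is only one possible end-of-stream convention; a real platform might signal it differently.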
Cell – data receiver
cell.setSource(anotherCell);
[Diagram: anotherCell feeds data into cell]
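The receiver side can be sketched the same way. In this hypothetical example (the `Cell` interface, `ListSource` and `ScaleCell` are illustrative names, not Debellor classes, and samples are reduced to single `Double` values), `setSource` only stores a reference; data moves when the downstream cell calls `next()`, which in turn pulls exactly one value from its source.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal cell interface: open the stream, pull values, close it.
interface Cell {
    void open();
    Double next(); // null marks the end of the stream (assumption of this sketch)
    void close();
}

// Source cell over a fixed list of values.
final class ListSource implements Cell {
    private final List<Double> values;
    private Iterator<Double> it;

    ListSource(List<Double> values) { this.values = List.copyOf(values); }

    public void open() { it = values.iterator(); }
    public Double next() { return it.hasNext() ? it.next() : null; }
    public void close() { it = null; }
}

// Receiver cell: pulls one value from its source each time it is asked for one.
final class ScaleCell implements Cell {
    private final double factor;
    private Cell source;

    ScaleCell(double factor) { this.factor = factor; }

    void setSource(Cell source) { this.source = source; }

    public void open() { source.open(); }

    public Double next() {
        Double x = source.next(); // pull exactly one value from upstream
        return x == null ? null : x * factor;
    }

    public void close() { source.close(); }
}
```

Because each `next()` call fetches one value on demand, a chain of such cells holds only the values currently in flight rather than the whole data set.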
Trainable Cell
cell.setSource(…);
cell.learn();
[Diagram: EMPTY cell becomes TRAINED cell after learn()]
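The EMPTY to TRAINED transition can be sketched as a cell whose `learn()` streams once over its source and stores a parameter. This is a hypothetical illustration: `CenteringCell` learns only a mean and subtracts it, and the rule that `open()` fails on an untrained cell is an assumption, not documented Debellor behaviour.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal cell interface; values are single Doubles for brevity.
interface Cell {
    void open();
    Double next(); // null marks the end of the stream (assumption of this sketch)
    void close();
}

// Source cell over a fixed list of values.
final class ListSource implements Cell {
    private final List<Double> values;
    private Iterator<Double> it;

    ListSource(List<Double> values) { this.values = List.copyOf(values); }

    public void open() { it = values.iterator(); }
    public Double next() { return it.hasNext() ? it.next() : null; }
    public void close() { it = null; }
}

// Hypothetical trainable cell: EMPTY until learn() has streamed over the source once.
final class CenteringCell implements Cell {
    private Cell source;
    private Double mean; // null = EMPTY, non-null = TRAINED

    void setSource(Cell source) { this.source = source; }

    void learn() {
        double sum = 0;
        long count = 0;
        source.open();
        for (Double x = source.next(); x != null; x = source.next()) {
            sum += x;
            count++;
        }
        source.close();
        if (count == 0) throw new IllegalStateException("no training data");
        mean = sum / count;
    }

    boolean isTrained() { return mean != null; }

    public void open() {
        if (!isTrained()) throw new IllegalStateException("cell is EMPTY: call learn() first");
        source.open();
    }

    public Double next() {
        Double x = source.next();
        return x == null ? null : x - mean; // apply the learned parameter
    }

    public void close() { source.close(); }
}
```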
Data Streaming • [Diagram: cells A and B, BATCH mode vs STREAM mode] • In streaming, each cell is responsible for asking its source for data
Benefits of streaming: training of k-means [Diagram: X, X, crash!]
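The k-means slide illustrates why streaming matters: a batch implementation needs the whole data set in memory. The slides do not say which k-means variant Debellor uses, so this sketch swaps in MacQueen-style sequential k-means, which makes a single pass and keeps only k centers and their counts. `StreamingKMeans` and its methods are hypothetical names, and seeding the centers from the first k points is a simplification.

```java
import java.util.Iterator;

// Sketch of single-pass (MacQueen-style) k-means over a stream of points.
// Memory use is O(k * d), independent of how many points the stream delivers.
final class StreamingKMeans {
    private final double[][] centers;
    private final long[] counts;
    private int initialized = 0;

    StreamingKMeans(int k) {
        centers = new double[k][];
        counts = new long[k];
    }

    void update(double[] x) {
        if (initialized < centers.length) { // the first k points seed the centers
            centers[initialized] = x.clone();
            counts[initialized++] = 1;
            return;
        }
        int j = nearest(x);
        counts[j]++;
        for (int d = 0; d < x.length; d++) {
            centers[j][d] += (x[d] - centers[j][d]) / counts[j]; // running mean of assigned points
        }
    }

    // Consumes the stream one point at a time; nothing is buffered.
    void train(Iterator<double[]> stream) {
        while (stream.hasNext()) update(stream.next());
    }

    // Index of the closest seeded center by squared Euclidean distance.
    int nearest(double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < initialized; j++) {
            double dist = 0;
            for (int d = 0; d < x.length; d++) {
                double diff = x[d] - centers[j][d];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = j; }
        }
        return best;
    }

    double[] center(int j) { return centers[j].clone(); }
}
```

Any `Iterator` can feed `train`, including one that reads points lazily from disk, so the data set never has to fit in memory.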
Multi-threading [Diagram: cells A and B both running in Thread_1]
Multi-threading • A.newThread(); [Diagram: cell A now runs in Thread_2, cell B in Thread_1]
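One way to picture `A.newThread()` is to run cell A's production loop in a second thread and let B pull from a bounded buffer. The sketch below shows that general technique (a producer thread plus `java.util.concurrent.ArrayBlockingQueue`); it is an assumption, not a description of how Debellor implements `newThread()`. `ThreadedSource` is a hypothetical name, and upstream items are assumed to be non-null.

```java
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical wrapper: the upstream producer (cell A) runs in its own thread
// (Thread_2) while the consumer (cell B, Thread_1) pulls items from a bounded buffer.
final class ThreadedSource<T> {
    private static final Object END = new Object(); // end-of-stream marker
    private final BlockingQueue<Object> buffer;

    ThreadedSource(Iterator<T> upstream, int capacity) {
        buffer = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                while (upstream.hasNext()) {
                    buffer.put(upstream.next()); // blocks while the buffer is full
                }
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true); // do not keep the JVM alive for an abandoned stream
        producer.start();
    }

    // Returns the next item, or null once the upstream is exhausted.
    @SuppressWarnings("unchecked")
    T next() {
        try {
            Object item = buffer.take(); // blocks until the producer delivers
            if (item == END) {
                buffer.put(END); // re-insert the marker so later calls also return null
                return null;
            }
            return (T) item;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for data", e);
        }
    }
}
```

The buffer lets A and B work in parallel, while its capacity caps how far A can run ahead of B, so memory use stays bounded as in single-threaded streaming.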
Available in version 0.6 • Rseslib algorithms: • classifiers (~20 algorithms) • Weka algorithms: • ARFF reader • classifiers (~60) • filters (47) • Debellor algorithms: • Train&Test evaluation • k-means for large data (stream-based) • Data types: • numeric and symbolic features • vectors of features, vectors of vectors of …
Future releases • Multi-input & multi-output cells • Composite cells (e.g. meta-learning) • Serialization and copying • …
Summary • Platform • Stream architecture • Extendible • Multi-threaded • Weka & Rseslib partially integrated
Home: www.debellor.org