FAME-DBMS: Challenges and Solutions

Norbert Siegmund, Syed Saif ur Rahman, Sagar Sunkle {nsiegmun,srahman,sagar.sunkle}@cs.iti.uni-magdeburg.de

Overview • Domain analysis • Requirements and challenges of FAME-DBMS • Architecture of FAME-DBMS (Syed Saif ur Rahman) • Tailor-made query processing (Sagar Sunkle)

Domain Analysis • Functionality provided • Customizability • Special algorithms and implementations • Special architecture • Estimated/measured footprint • Targeted embedded systems

Domain Analysis - PicoDBMS • PicoDBMS: 24 KB to 48 KB (not implemented) • Special storage formats for smart cards

Domain Analysis - PicoDBMS • Footprint and functionality

Domain Analysis – Comet DBMS • DBMS components implemented with aspect-oriented programming (AOP), ~20 KB • Optional locking, B-tree, and “HardTime” support

Domain Analysis – Cougar DBMS • Split Sensor-DBMS (relational DBMS & sensor DB) • Special architecture & schema • Extended SQL

Domain Analysis – TinyDB • Tightly coupled with the OS (TinyOS) • Query processing for sensor networks (~64 KB) • No complete DBMS functionality

Domain Analysis • Lessons learned • Special implementations for special environments (e.g., PicoDBMS) • Different architectures for different scenarios • Limited customizability of DBMS functionality • Each scenario requires a subset of Codd’s 9 rules, e.g.: • Transactions in smart cards • Integration in data nodes • Data security • Etc. • Redevelopment of existing functionality for different devices and scenarios

Requirements I • Current state • New features (functionality) introduce variability • Tailoring databases to the functional requirements of stakeholders • Is that enough? • How should different systems (e.g., embedded devices) be handled? • How should different quality requirements (e.g., maintainability) be fulfilled? • How can production costs of embedded systems be reduced, or performance limits be reached?

Requirements II • Environments • Different binary size constraints • Different processing power • Different power consumption • Economic requirements • Different response-time requirements (e.g., e-commerce) • Different code quality (e.g., to reduce the effort of software evolution) • Different importance of reliability, etc.

Requirements III • Missing customizability towards non-functional requirements • Multiple implementations of one defined functionality • For example, fast sorting vs. power-saving sorting vs. minimal-footprint sorting • Need for extended tailoring of DBMS • Tailoring functionality • Tailoring non-functional properties of a DBMS
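To make “multiple implementations of one defined functionality” concrete, here is a minimal C++ sketch, assuming the alternatives are selected at compile time through a template parameter. The names `FastSort`, `SmallSort`, and `KeyIndex` are hypothetical and not taken from FAME-DBMS; a power-saving variant is omitted because its behavior depends on the target hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sorting variants (illustrative only, not FAME-DBMS code).
// Both provide the same functionality but favour different
// non-functional properties.

// Variant favouring speed: delegates to std::sort, O(n log n).
struct FastSort {
    template <typename T>
    static void sort(std::vector<T>& v) { std::sort(v.begin(), v.end()); }
};

// Variant favouring a small code footprint: plain insertion sort,
// O(n^2) comparisons but very little code and no extra memory.
struct SmallSort {
    template <typename T>
    static void sort(std::vector<T>& v) {
        for (std::size_t i = 1; i < v.size(); ++i) {
            T key = v[i];
            std::size_t j = i;
            while (j > 0 && key < v[j - 1]) {
                v[j] = v[j - 1];  // shift larger elements one slot right
                --j;
            }
            v[j] = key;
        }
    }
};

// The component that needs sorting is parameterised by the variant,
// so the choice is fixed when the product is configured and compiled.
template <typename SortPolicy>
class KeyIndex {
public:
    void add(int key) { keys_.push_back(key); }
    const std::vector<int>& sorted() {
        SortPolicy::sort(keys_);
        return keys_;
    }
private:
    std::vector<int> keys_;
};
```

Because both `sort` functions are templates, only the variant that is actually instantiated generates code, which matters when binary size is the constraint being tailored.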

FAME-DBMS • FAME-DBMS: Family of Embedded DataBase Management Systems • Highly customizable data management solution for extremely resource-constrained systems • Based on the software product line approach • Implemented with FeatureC++, an extension of C++ that supports feature-oriented programming • Funded by the DFG (German Research Foundation)

Challenges I • Resource-constrained systems require special implementations • Only a restricted binary size is supported • Sensors: below 100 KB • Data nodes: under 200 KB • PDAs: 1 MB to 1 GB • Power consumption • Batteries • Wireless communication • Inaccessible locations • Highly constrained main memory size • Minimal use of the program stack

Challenges II • Hardware and OS diversity • Tailor-made solutions for storing and loading data • Different communication mechanisms • Domain requirements • Data storage • Sensors (tuple) vs. data nodes (tables) vs. PDAs (database) • Data access • Sensors (get) vs. data nodes (get, put, delete, update, aggregate) vs. PDAs (SQL variants) • Query execution and optimization • Sensors (function calls) vs. data nodes (indexes) vs. PDAs (query optimization)

Challenges III • Handling the variability • Redesign of the implementation • Exponential growth of feature interactions • Compatibility with multiple operating systems • Increasing complexity of the configuration process • Combining multiple software product lines • FAME-DBMS and OS (e.g., Ciao) • FAME-DBMS and client product line
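To see why feature interactions grow exponentially, assume $n$ optional features with no constraints between them (an upper bound; a real feature model excludes many combinations). The number of possible configurations, of feature pairs, and of feature subsets of size two or more that could interact are:

$$
N_{\text{config}} = 2^{n}, \qquad
N_{\text{pairs}} = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad
N_{\text{interact}} = 2^{n} - n - 1
$$

For $n = 20$ this gives $N_{\text{config}} = 2^{20} = 1{,}048{,}576$, $N_{\text{pairs}} = 190$, and $N_{\text{interact}} = 1{,}048{,}576 - 21 = 1{,}048{,}555$, which is why exhaustively testing every configuration quickly becomes infeasible.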

Next Presentation • Solution: • Using the software product line approach to support • Feature-oriented implementation (functional requirements) • Alternative implementations (non-functional requirements) • How is it implemented? • Next talk by Syed Saif ur Rahman

Development Approach • Traditional DBMS • Developed decades ago • To handle large amounts of data • Evolved over time • Full of layers/functionality coupled with monolithic engines • Embedded DBMS are built • As slimmed-down versions of large DBMS, or • From the ground up • We reject the slimmed-down approach for embedded DBMS

Introduction • Basic database management system • Simple access API: put, get, delete of (key, value) pairs • All data in one “table”, no columns → designed for extensibility • Using feature-oriented programming • Build software by composing features that are expressed in a modular way • Feature: basic block of user-relevant functionality • Platforms • Windows • NutOS (BTnode) • Linux/any platform supporting C++

Why FOP in the Embedded Domain • To support • Small footprint • Multi-platform support • To manage • Complexity • To achieve • Re-configurability

System Characteristics • Feature-oriented implementation • Highly re-configurable • Low complexity • Reduced footprint • Multi-platform support • API-based access
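A feature-oriented implementation can be approximated in plain C++ with mixin layers, where each optional feature is a class template that refines the class it is stacked on. This is only a sketch under that assumption: it is not FeatureC++ syntax, and `BaseStore`, `WithDelete`, `WithStatistics`, `MinimalStore`, and `FullStore` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Base feature: minimal storage offering put/get.
class BaseStore {
public:
    void put(int key, const std::string& value) { data_[key] = value; }
    std::string get(int key) const {
        auto it = data_.find(key);
        return it == data_.end() ? std::string() : it->second;
    }
protected:
    std::map<int, std::string> data_;
};

// Optional feature "Delete": adds a new operation to the refined class.
template <typename Super>
class WithDelete : public Super {
public:
    bool remove(int key) { return this->data_.erase(key) > 0; }
};

// Optional feature "Statistics": refines put() to count writes and then
// delegates to the implementation it refines.
template <typename Super>
class WithStatistics : public Super {
public:
    void put(int key, const std::string& value) {
        ++writes_;
        Super::put(key, value);
    }
    unsigned writes() const { return writes_; }
private:
    unsigned writes_ = 0;
};

// Two products configured from the same feature modules.
using MinimalStore = BaseStore;
using FullStore = WithStatistics<WithDelete<BaseStore>>;
```

Composition happens entirely at compile time through non-virtual calls, so a product that leaves out a feature carries neither its code nor a dispatch overhead, which fits the footprint goals listed above.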

Embedded System: BTnode • Developed at ETH Zurich • Microcontroller: Atmel ATmega128L (8 MHz @ 8 MIPS) • Memory: 64 + 180 KB RAM, 128 KB flash ROM, 4 KB EEPROM • Support for Bluetooth and low-power radio • PC connectivity via serial/COM over USB • Terminal input/output via the standard C functions printf/scanf

High-Level System Design • OS abstraction layer • Hides platform-dependent implementation • Buffer layer • Page buffering • Management of used and free pages • Access/storage layer • Provides API-based access • Un-indexed sorted file implementation • B+-tree index • Page implementation
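A minimal sketch of the OS abstraction and buffer layers, assuming an abstract page-store interface that each platform implements and a fixed pool of frames managed through a free list. `PageStore`, `MemoryPageStore`, `BufferManager`, and the 64-byte page size are hypothetical; the in-memory store only stands in for a real file or flash backend, and page replacement is deliberately left out, so `fetch` returns `nullptr` when every frame is in use.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

constexpr std::size_t kPageSize = 64;  // tiny pages, for illustration only
using Page = std::array<std::uint8_t, kPageSize>;

// OS abstraction layer: upper layers only see readPage/writePage.
class PageStore {
public:
    virtual ~PageStore() = default;
    virtual void readPage(std::uint32_t id, Page& out) = 0;
    virtual void writePage(std::uint32_t id, const Page& in) = 0;
};

// Stand-in platform implementation that keeps pages in RAM.
class MemoryPageStore : public PageStore {
public:
    void readPage(std::uint32_t id, Page& out) override {
        auto it = pages_.find(id);
        if (it != pages_.end()) out = it->second; else out.fill(0);
    }
    void writePage(std::uint32_t id, const Page& in) override { pages_[id] = in; }
private:
    std::map<std::uint32_t, Page> pages_;
};

// Buffer layer: a fixed pool of frames plus a list of free frames.
class BufferManager {
public:
    BufferManager(PageStore& store, std::size_t frames)
        : store_(store), frames_(frames) {
        for (std::size_t i = frames; i > 0; --i) free_.push_back(i - 1);
    }
    // Return the buffered page, loading it into a free frame if needed;
    // nullptr when all frames are used (no replacement in this sketch).
    Page* fetch(std::uint32_t id) {
        auto it = used_.find(id);
        if (it != used_.end()) return &frames_[it->second];
        if (free_.empty()) return nullptr;
        std::size_t f = free_.back();
        free_.pop_back();
        store_.readPage(id, frames_[f]);
        used_[id] = f;
        return &frames_[f];
    }
    // Write the page back and return its frame to the free list.
    void release(std::uint32_t id) {
        auto it = used_.find(id);
        if (it == used_.end()) return;
        store_.writePage(id, frames_[it->second]);
        free_.push_back(it->second);
        used_.erase(it);
    }
    std::size_t freeFrames() const { return free_.size(); }
private:
    PageStore& store_;
    std::vector<Page> frames_;                    // fixed-size frame pool
    std::vector<std::size_t> free_;               // indices of unused frames
    std::map<std::uint32_t, std::size_t> used_;   // page id -> frame index
};
```

Porting to another platform would then mean writing one more `PageStore` subclass, while the buffer layer above it stays unchanged.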

FAME-DBMS - Feature Diagram

Feature Diagram (Minimal configurations)

Binary Size Results • Windows: un-indexed 17 KB, indexed 19 KB • Linux: un-indexed 47 KB, indexed 63 KB • NutOS (BTnode): un-indexed 40 KB, indexed 41 KB

Problems Identified • Lack of support for C++ in the embedded environment • Existing sample code for BTnode only available in C • Different behavior of code in the embedded environment and on different operating systems • Embedded-environment constructs must be considered in the code • Limited stack size of BTnode

Current State • API-based access • Supports B+-tree index • Single-table database • Currently supports three platforms • Windows • Linux • NutOS (BTnode) • Records stored as key-value pairs • Alternative page replacement strategies • LFU • LRU
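The alternative page replacement strategies can be sketched as interchangeable policy classes that a page cache receives as a compile-time parameter. `LruPolicy`, `LfuPolicy`, and `PageCache` are hypothetical names, and this LFU variant forgets a page's access count once the page is evicted, which is only one of several common LFU designs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <list>
#include <map>
#include <set>

using PageId = std::uint32_t;

// LRU: evict the page whose most recent access is the oldest.
class LruPolicy {
public:
    void touch(PageId id) {
        auto it = pos_.find(id);
        if (it != pos_.end()) order_.erase(it->second);
        order_.push_back(id);                  // back = most recently used
        pos_[id] = std::prev(order_.end());
    }
    PageId victim() const { return order_.front(); }  // requires a resident page
    void erase(PageId id) {
        auto it = pos_.find(id);
        if (it == pos_.end()) return;
        order_.erase(it->second);
        pos_.erase(it);
    }
private:
    std::list<PageId> order_;
    std::map<PageId, std::list<PageId>::iterator> pos_;
};

// LFU: evict the page with the fewest accesses (lowest id on ties).
class LfuPolicy {
public:
    void touch(PageId id) { ++count_[id]; }
    PageId victim() const {                    // requires a resident page
        auto best = count_.begin();
        for (auto it = count_.begin(); it != count_.end(); ++it)
            if (it->second < best->second) best = it;
        return best->first;
    }
    void erase(PageId id) { count_.erase(id); }
private:
    std::map<PageId, unsigned> count_;
};

// Page cache parameterised by the replacement strategy (capacity >= 1).
template <typename Policy>
class PageCache {
public:
    explicit PageCache(std::size_t capacity) : capacity_(capacity) {}
    // Returns true on a hit; a miss on a full cache evicts one page.
    bool access(PageId id) {
        bool hit = resident_.count(id) > 0;
        if (!hit) {
            if (resident_.size() == capacity_) {
                PageId v = policy_.victim();
                policy_.erase(v);
                resident_.erase(v);
                evicted_ = v;
            }
            resident_.insert(id);
        }
        policy_.touch(id);
        return hit;
    }
    PageId lastEvicted() const { return evicted_; }
private:
    std::size_t capacity_;
    std::set<PageId> resident_;
    Policy policy_;
    PageId evicted_ = 0;
};
```

For the access sequence 1, 1, 2, 3 with two frames, `PageCache<LruPolicy>` evicts page 1 (least recently used) while `PageCache<LfuPolicy>` evicts page 2 (accessed only once).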

Future Directions • Development in progress • Multiple tables • Multiple columns • Transaction manager • Feature-oriented query processing support (next presentation by Sagar Sunkle) • Planned extensions • Features for distributed DBMS • Recovery manager • More indexes • Alternative implementations

Feature-Oriented Query Processing • Earlier work: “Generating Highly Customizable SQL Parsers” • SQL evolution • Simple data retrieval >> call-level interface (ODBC and related) >> procedural functionality (PL/SQL, Transact-SQL) >> SQL embedded in mainstream languages (e.g., Java) >> the other way round, calling Java methods from SQL applications (JRT) >> XML • Further evolution • Support for XQuery/XPath/RDF/Semantic Web… • SQL:2003 is already old.

Managing SQL Functionality • Features to the rescue • Feature-oriented development of an SQL engine • Tackle each important issue separately • Parsing SQL queries • Semantic analysis of SQL queries • Following SPLE guidelines, anticipate change and provide for customizability • We have already created customizable SQL parsers • SQL grammar is really complex, with ≈1800 productions in core SQL alone • The vote on the SQL:2008 standard specification is already out.

Feature-Oriented Decomposition of SQL • SQL Foundation further decomposed into various SQL statement classes • Data manipulation statements • Data definition statements • Query expressions etc. • Continue decomposing SQL to finer levels of granularity: individual statements and clauses

Decomposing SELECT Statement for Parsing • Example: SELECT ColumnA FROM TableB • [Figure: SQL features mapped to SQL subgrammars]

Feature Composition for Parsing • Compose grammars representing features • Obtain a grammar representing a feature at a higher level • Challenges • Composition depends on the type of grammar (LL, LR/LALR) • Abstract syntax tree generation is another important factor • SQL:2003 standards are reference documents; every vendor has its own SQL syntax • Different domains ≈ additional syntax • Sensor networks >> SELECT x FROM a DURATION b EVERY c • This differs from the SQL:2003 standard!
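Grammar composition can be illustrated, very loosely, with a hand-written recursive-descent parser whose base grammar `SELECT ident FROM ident` is refined by an optional sensor-network feature that adds `DURATION n EVERY m`. This is a sketch under assumptions: the slides describe composing grammars and generating parsers from them, whereas here the refinement is a virtual extension point, and `BaseSelectParser`, `SensorSelectParser`, and `Query` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical AST for the simple query.
struct Query {
    std::string column, table;
    int duration = 0, every = 0;  // only set by the sensor feature
};

// Base feature grammar: SELECT ident FROM ident
class BaseSelectParser {
public:
    virtual ~BaseSelectParser() = default;
    std::optional<Query> parse(const std::string& text) {
        tokens_.clear();
        pos_ = 0;
        std::istringstream in(text);
        for (std::string t; in >> t;) tokens_.push_back(t);  // whitespace tokens
        Query q;
        if (!keyword("SELECT") || !ident(q.column) ||
            !keyword("FROM") || !ident(q.table) || !parseTail(q))
            return std::nullopt;
        if (pos_ != tokens_.size()) return std::nullopt;  // trailing input
        return q;
    }

protected:
    // Extension point: the base grammar accepts no further clauses.
    virtual bool parseTail(Query&) { return true; }

    // Case-insensitive keyword match.
    bool keyword(const std::string& kw) {
        if (pos_ >= tokens_.size()) return false;
        std::string t = tokens_[pos_];
        for (char& c : t) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        if (t != kw) return false;
        ++pos_;
        return true;
    }
    bool ident(std::string& out) {
        if (pos_ >= tokens_.size()) return false;
        const std::string& t = tokens_[pos_];
        if (!std::isalpha(static_cast<unsigned char>(t[0]))) return false;
        out = t;
        ++pos_;
        return true;
    }
    bool number(int& out) {
        if (pos_ >= tokens_.size() || tokens_[pos_].size() > 9) return false;
        for (char c : tokens_[pos_])
            if (!std::isdigit(static_cast<unsigned char>(c))) return false;
        out = std::stoi(tokens_[pos_]);
        ++pos_;
        return true;
    }

    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};

// Optional feature: sensor-network clauses DURATION n EVERY m.
class SensorSelectParser : public BaseSelectParser {
protected:
    bool parseTail(Query& q) override {
        if (pos_ == tokens_.size()) return true;  // clauses are optional
        return keyword("DURATION") && number(q.duration) &&
               keyword("EVERY") && number(q.every);
    }
};
```

With the base feature alone, `select x from a duration 10 every 2` is rejected as trailing input; the parser composed with the sensor feature accepts it and still accepts plain `SELECT ColumnA FROM TableB`.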

How About Query Processing? • Apply the feature concept to query processing >> “tailor-made query processing” • But query processing differs substantially between • Standard and distributed databases • Embedded devices and sensor networks • We must abstract from the various query processing techniques to obtain a feature model

Top-Level Feature Diagram • We already have the parsing component • Query tree rewrite consists of translating/transforming the original tree into relational expression/algebra forms • Optimization consists of establishing a search space of execution plans and evaluating it via cost estimation • Code generation expresses the final execution plan as executable code
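The optimization step can be illustrated with a toy cost-based choice between two candidate plans for a single-table equality predicate. The statistics, the page-read cost model, and the names `TableStats`, `Plan`, `enumeratePlans`, and `choosePlan` are hypothetical assumptions for illustration, not the FAME-DBMS design.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical statistics an optimizer might read from a catalog.
struct TableStats {
    double pages;        // pages occupied by the table
    double selectivity;  // fraction of records matching the predicate
    bool hasIndex;       // is there a B+-tree on the predicate column?
    double indexHeight;  // B+-tree levels to descend
};

struct Plan {
    std::string name;
    double cost;  // estimated page reads (toy cost model)
};

// Build the search space of candidate plans and estimate each one's cost.
std::vector<Plan> enumeratePlans(const TableStats& s) {
    std::vector<Plan> plans;
    plans.push_back({"FullScan", s.pages});  // read every page once
    if (s.hasIndex)                          // descend tree, read matching pages
        plans.push_back({"IndexLookup",
                         s.indexHeight + std::ceil(s.selectivity * s.pages)});
    return plans;
}

// Pick the candidate with the lowest estimated cost.
Plan choosePlan(const TableStats& s) {
    std::vector<Plan> plans = enumeratePlans(s);
    Plan best = plans.front();
    for (const Plan& p : plans)
        if (p.cost < best.cost) best = p;
    return best;
}
```

With 100 pages, a B+-tree of height 3, and 1% selectivity, the index lookup is estimated at 3 + 1 = 4 page reads and beats the 100-page scan; at 100% selectivity the scan is cheaper. A real optimizer would plug in richer plan alternatives and cost models, which is where the pluggable heuristics and domain-specific optimizations of a feature model would apply.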

Further Decomposition

Challenges and Benefits • Challenges • Extremely intricate decomposition process; whether a feature is mandatory or optional depends on standard/distributed/embedded query processing • Many internal dependencies are yet to be resolved • Subject to the same constraints when implemented for FAME-DBMS • Benefits • Customizability in all aspects of query processing • Selection of specific rewrite/simplification rules • Support for nested and other specific types of queries • Pluggable heuristics • Domain-specific optimizations • Pluggable types

Future Work • Using the JastAdd extensible compiler system to implement query processing modularly • Implementing a coarse-grained query processor to begin with • Creating the query processing component for FAME-DBMS

Conclusion • Domain analysis • Current state of embedded DBMS • Requirements and challenges • Constraints in embedded devices • Granularity in database functionality • Handling variability • Implementation • Step-wise development of DBMS functionality • Extensibility of underlying architecture • Query processing • SQL decomposition • Tailor-made query engine

Future Work • Integrating software product lines into FAME-DBMS • LinkedList SPL • B-Tree SPL (B-Tree, B+-Tree, etc.) • Communication SPL (decryption, encryption, communication types) • SQL SPL • Collaboration with third-party SPLs • OS SPL • Client SPL • Extension to data description • Feature-oriented ER diagrams
