A Key-Value-based Persistence Model for Sensor Networks. By: Marcello Alves de Sales Junior, Master of Science in Computer Science. Advisor: Prof. Arno Puder, Ph.D. Committee Chair: Prof. Marguerite Murphy, Ph.D. Department of Computer Science.
Outline • Motivation and Literature Review • A Taxonomy for Data Persistence • NetBEAMS: A Case Study • Empirical Analysis for Technology Selection • DSP Data Persistence: Design and Architecture • Experimental Results: Correct Behavior and Performance • Conclusions and Future Works
Motivation • Persistence for NetBEAMS Sensor Network Infrastructure • Component-based sensor network for environmental monitoring (Puder et al.) • Biologists are the main users of the system • What types of database systems? • Use the traditional relational data model? • Bai et al. propose domain-specific programming languages
State of the Art in Data Persistence • Sensor Networks [Akyildiz et al.] • Infrastructure (Topology) and Node Types • Size and Lifetime • Sensor Network Nodes [Romer et al.] • Deployment and Mobility • Size, Resources, Cost, Energy, Heterogeneity • Communication Mode • Coverage and Connectivity • [wp] [snc]
Persistence Storage for Sensor Networks • How the Collected Data is Used • Real-time Data Stream • Data Archival • Storage Location for Collected Data • Local or External • Data-Centric • Query Processing Used • In-Network • Centralized • Data Volume Produced [ns]
Data Models and Query Engines • Tabular Data Model: flat files • Comma-separated values • Text Comparison • No Index • Relational Data Model: binary files • Data Normalization • Structured Query Language (SQL) • Lee et al. proposed new operators for SQL • Structured Data Model: structured XML documents • XML Schema: document structure • XML XPath: data retrieval • Diagram labels: Database System, Data Sink
How Is Collected Data Described? • Data Stream: 334 55.45 -23.44 119.394 44 1 22 | 5/ • Ledlie et al. propose the use of Data Provenance • Metadata: data about data • What was collected? • Temperature = 54.3 : data • Scale = 'fahrenheit' : metadata • When was the data collected? • Valid Time = Collected at 10:34am • Transaction Time = Time of Arrival • From where was the data collected? • GPS Coordinates: (12.342, -145.304) • Site: 'lower-pier' : metadata
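A minimal sketch of how one raw reading could be wrapped with the what/when/where metadata listed on this slide. The helper `describeReading`, the field names, the calendar date, and the reading of the GPS pair as (latitude, longitude) are illustrative assumptions, not the NetBEAMS schema.

```javascript
// Hypothetical helper: wraps one raw value with descriptive metadata.
function describeReading(value, validTime, arrivalTime) {
  return {
    // What: the data itself plus its scale as metadata.
    observation: { temperature: value, scale: "fahrenheit" },
    // When: valid time (collection) versus transaction time (arrival).
    time: { valid: validTime, transaction: arrivalTime },
    // Where: coordinates and site name taken from the slide;
    // the latitude/longitude order is an assumption.
    sensor: {
      location: { latitude: 12.342, longitude: -145.304 },
      site: "lower-pier",
    },
  };
}

// The date is assumed; the slide only gives the time 10:34am.
const collectedAt = new Date(2009, 11, 17, 10, 34);
const arrivedAt = new Date(2009, 11, 17, 10, 35);
const doc = describeReading(54.3, collectedAt, arrivedAt);
console.log(JSON.stringify(doc, null, 2));
```

Keeping the scale and site next to the value means a later reader does not need a separate schema to interpret the number.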
Problem: recent oil spill in the San Francisco Bay (Oct 2009) [sfb09] • Correlations between the collected data and the oil spill • Describing historical data events • Data Annotation • Liu et al. annotate video frames from sensor cameras • Descriptive Metadata • YouTube Video Tag • Tags for Web 2.0 “Junk” Data [an]
2. Data Persistence in Sensor Networks: a Proposed Taxonomy
2. A Taxonomy for Data Persistence • Taxonomy (Greek τάξις, taxis, meaning 'order' or 'arrangement', and νόμος, nomos, meaning 'law' or 'science') [Wikipedia]: • Practice and science of classification • Represented by hierarchical diagrams • Relationships between the root and branches • Diagram labels: Taxon, Taxa
3. NetBEAMS: A Case Study • NetBEAMS: Data collection using Data Sensor Platform (DSP) • Automates operation of SF-BEAMS • SF-BEAMS: single-star sensor network – data archival • Nodes geographically fixed • Single-hop communication • Production intervals: 1, 6 or 15 minutes • Heterogeneous Devices • Coverage: Tiburon coast • 1 Data Sink (RTC Labs)
3. NetBEAMS: A Case Study • NetBEAMS Gateway Node • YSI Sonde + Gumstix Embedded System + GSM Modem • Centralized Data Sink: RTC Labs
Device Used by NetBEAMS • YSI 6600EDS V2: COTS Water Quality Monitoring • 13 Measurement parameters • 1 year's worth of raw data • Max 23.99 Mb at 1/min • 483,840 samples per year • 5 YSI in current deployment [ysi]
SF-BEAMS Classification
NetBEAMS Data Collection Scenario • Missing Component! • Collected Data: 12.20 192 179 55 88.40 0.09 0.084 0.059 7.98 -79.6 99.5 8.83 0.4 8.7 • DSP Messages
Non-Functional Requirements • Open-Source • Free of charge • Easy to Scale (Data Partitioning) • Accessibility (API) • Cope with RTC's Small Volume of Data
4. Technology Selection Empirical Analysis
4. Empirical Analysis for Technology Selection • Technologies used by the literature reviewed • MySQL: Jacob Nikom used it in a Linux cluster for sensor networks; • TinyDB: Madden et al. and Lee et al. used it for sensor networks; • DB2: Sow et al. used it as a hybrid approach of XML and Relational models to persist and query biometric events; • mongoDB: Buyya et al. reported it among new trends in persistence in the cloud.
Use traditional Relational Databases • Tony Bain questions the adoption of the Relational Model • Traditional approach: 30 years • Accommodates changes? • Try adding entities • Try adding properties • Changes to the schema • Maintain schema normalized • Change Software Layers
Schema-less: Key-Value Pair Data Model • Data Collections: “denormalized” data • No Data Integrity = Data located on same physical space • Diagram labels: Annotation, Observation, Provenance
KVP Databases: better support horizontal data partitioning • Shenker et al. survey Data-Centric Storage • Targeted Query vs Global Query • Diagram labels: Tiburon, CA; Berkeley, CA; South Bay, CA • Collected Sensor Data, Region 1: Master Shard, Shard 2 • Collected Sensor Data, Region 2: Master Shard • Collected Sensor Data, Region 3: Master Shard, Shard 2 • Operations: Projection, Count
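The difference between a targeted and a global query over region-partitioned data can be sketched in memory. This is not how mongoDB routes queries internally; the `shards` object, the sample pH values, and both function names are hypothetical stand-ins for illustration.

```javascript
// Hypothetical in-memory stand-in for region-partitioned shards.
const shards = {
  "Tiburon": [{ site: "Tiburon", pH: 7.9 }, { site: "Tiburon", pH: 8.1 }],
  "Berkeley": [{ site: "Berkeley", pH: 7.7 }],
  "South Bay": [{ site: "South Bay", pH: 8.0 }, { site: "South Bay", pH: 7.5 }],
};

// Targeted query: the partition key (region) is known,
// so only one shard is scanned.
function targetedQuery(region, predicate) {
  return (shards[region] || []).filter(predicate);
}

// Global query: no partition key, so every shard is visited
// and the partial counts are summed.
function globalCount(predicate) {
  return Object.values(shards).reduce(
    (total, docs) => total + docs.filter(predicate).length,
    0
  );
}

console.log(targetedQuery("Tiburon", (d) => d.pH > 8).length); // 1
console.log(globalCount((d) => d.pH >= 7.9)); // 3
```

The less data a shard holds, the fewer documents a targeted query has to scan, which is the scaling argument made for data-centric storage.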
5. DSP Data Sensor Platform: Design and Architecture
5. DSP Data Persistence: Design and Architecture • Persistence Scenario for NetBEAMS – Solution
Data Model Design: mongoDB Document Instance • Data Manipulation: Programming Language Abstraction • “Dot Notation” • sensor.location.latitude = 37.89 • time.transaction = Dec 17, 2009 • observation.pH = 7.11 • Diagram labels: Where, When, What
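Dot notation addresses a field inside a nested document by joining the keys with periods. The sketch below shows the idea on plain JavaScript objects; `getPath` and `setPath` are hypothetical helpers written for illustration and are not part of any mongoDB API.

```javascript
// Hypothetical helper: reads a nested field using a dotted path.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), doc);
}

// Hypothetical helper: writes a nested field, creating
// intermediate objects along the dotted path as needed.
function setPath(doc, path, value) {
  const keys = path.split(".");
  const last = keys.pop();
  let node = doc;
  for (const key of keys) {
    if (typeof node[key] !== "object" || node[key] === null) {
      node[key] = {};
    }
    node = node[key];
  }
  node[last] = value;
  return doc;
}

const sample = {};
setPath(sample, "sensor.location.latitude", 37.89); // where
setPath(sample, "time.transaction", new Date(2009, 11, 17)); // when
setPath(sample, "observation.pH", 7.11); // what

console.log(getPath(sample, "sensor.location.latitude")); // 37.89
console.log(getPath(sample, "observation.pH")); // 7.11
```

Because the path is just a string, new properties can be added to a document without declaring them in a schema first.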
Adding DSP Data Component Adding mongoDB
Deployment of the DSP Data Persistence • As External Storage Single Server • As Data-Centric Distributed Server
6. Experimental Results: Correct Behavior and Performance
6. Experimental Results: Correct Behavior and Performance • Goal: Simulate RTC Environment • Experiment Setup - Infrastructure • Key-Value definition • Randomly Generated YSI Sonde Data (R0) • Simulates Different Types of Storages using Virtualization; • Workload • Compatible data volume used by RTC • 1 YSI = 483,840 documents = First Round • 5 YSI = 5 * 483,840 = 2,419,200 = Consecutive Rounds
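The workload figures can be checked with a few lines of arithmetic. The document counts come from the slide; the observation that 483,840 samples equals 336 days at one sample per minute is derived here and is not stated in the source.

```javascript
const DOCS_PER_SONDE = 483840; // documents for 1 YSI, from the slide
const SONDES = 5; // YSI sondes in the current deployment
const MINUTES_PER_DAY = 24 * 60;

// Consecutive rounds: all five sondes.
const totalDocs = DOCS_PER_SONDE * SONDES;

// Derived: days of data covered by one sonde at 1 sample per minute.
const daysCovered = DOCS_PER_SONDE / MINUTES_PER_DAY;

console.log(totalDocs); // 2419200
console.log(daysCovered); // 336
```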
Scenarios • Use Cases as Agile User Stories – Persona, Action, Result • (R1) “As a marine biologist, I would like to search observations by filtering values of the sensor device's properties, such as water temperature and salinity on December 17, 2009, so that I can find associated values to the observation.”
// Dotted keys must be quoted; JavaScript months are zero-based (11 = December)
db.SondeDataContainer.find({
  "observation.Salinity": 0.01,
  "observation.WaterTemperature": 46.47,
  "time.valid": new Date(2009, 11, 17)
})
Programming Language: mongoDB Abstraction to Access Data
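One detail worth checking when building date filters in the mongo shell is that the JavaScript `Date` constructor counts months from zero. The sketch below shows the rollover; `calendarDate` is a hypothetical convenience wrapper, not a mongoDB function.

```javascript
// Hypothetical helper: accepts a calendar month (1-12) and converts
// it to the zero-based month index that Date expects.
function calendarDate(year, month, day) {
  return new Date(year, month - 1, day);
}

const dec17 = calendarDate(2009, 12, 17);

// Month index 12 is past December, so the date rolls into the next year.
const rolled = new Date(2009, 12, 17);

console.log(dec17.getFullYear(), dec17.getMonth(), dec17.getDate()); // 2009 11 17
console.log(rolled.getFullYear(), rolled.getMonth(), rolled.getDate()); // 2010 0 17
```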
Scenarios • (U1) “As an estuarine ecologist, I would like to annotate observations from the time the ‘oil spill’ occurred in the San Francisco Bay, so that I can maintain historical evidence of the impact of such event.”
// Window: October 12 to November 13, 2009 (zero-based months); multi tags every match
db.SondeDataContainer.update(
  { "time.valid": { $gte: new Date(2009, 9, 12), $lt: new Date(2009, 10, 13) } },
  { $set: { tag: "oil spill" } },
  { multi: true }
)
Programming Language: mongoDB Abstraction to Access Data
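The semantics of the annotation step (select by a half-open time range, then set a tag) can be sketched without a database. The function `tagRange`, the sample documents, and the October 12 to November 13, 2009 window are assumptions for illustration; a real deployment would run the update against mongoDB instead.

```javascript
// Hypothetical in-memory equivalent of a ranged update with $set:
// tags every document whose valid time falls in [from, to).
function tagRange(docs, from, to, tag) {
  let modified = 0;
  for (const doc of docs) {
    const valid = doc.time.valid;
    if (valid >= from && valid < to) {
      doc.tag = tag;
      modified += 1;
    }
  }
  return modified;
}

// Sample observations (values are made up for the sketch).
const observations = [
  { time: { valid: new Date(2009, 9, 1) }, observation: { pH: 7.9 } },
  { time: { valid: new Date(2009, 9, 30) }, observation: { pH: 7.4 } },
  { time: { valid: new Date(2009, 10, 20) }, observation: { pH: 7.8 } },
];

const changed = tagRange(
  observations,
  new Date(2009, 9, 12), // October 12, 2009
  new Date(2009, 10, 13), // November 13, 2009 (exclusive)
  "oil spill"
);

console.log(changed); // 1
console.log(observations.map((d) => d.tag)); // [ undefined, 'oil spill', undefined ]
```

Adding the `tag` field touches only the matching documents; the others keep their original shape, which is what a schema-less model permits.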
Implementation fulfills all the taxonomies • 1.35GB Claimed Disk Space • ~25,091 Inserts/min • Retrieval ~milliseconds • Update Varies (Depends on Partition Size, Dataset) • Simpler Implementation of Use Cases • Data accessibility • Different APIs, different languages • Key-Value Data Model • No schema changes to modify data design • Trade-off between Disk Storage (commodity) and performance
Data-Centric approach • Scales in terms of disk space available • Decreased processing time • Less data in a shard, faster query processing • Novel approach: alternative to existing ones • New Data Model Taxonomy
7. Conclusions and Future Works • How Important is Data Collection • Environmental Sensor Networks: Hazard Alerts • How to describe data: Data Provenance guidelines • Important descriptions: annotations, tags • Contributions • Data Persistence in Sensor Networks Taxonomies • Novel Approach: KVP data model for sensor networks data • Implementation for External and Data-Centric Storages • Technology ready for Cloud Computing
Future Works • Data-Centric Deployment with MapReduce Application • Sorting, subsets
Future Works • RTC gathers data by time period; Data are mostly repeated • Wang et al surveyed efficient schedulers for Sensor Networks; • Yin et al and Chen et al showed the use of Data Clustering before sending data to data sink; • Creation of a DSP Data Clustering before persisting data; • Research Problems • In-network storage/query using KVP databases • Partitioned Data nodes • Event-Based application developed on top of YSI Sonde Data • “observation.Battery” carries the battery life-time information;
References • Arno Puder, Teresa Johnson, Kleber Sales, Marcello de Sales, and Dale Davidson. A component-based sensor network for environmental monitoring. In SNA-2009: 1st International Conference on Sensor Networks and Applications, pages 54–60, San Francisco, CA, USA, November 2009. The International Society for Computers and Their Applications - ISCA. • I. F. Akyildiz, Weilian Su, Y. Sankarasubramaniam, and E. Cayirci. A survey on sensor networks. Communications Magazine, IEEE, 40(8):102–114, Aug 2002. • K. Romer and F. Mattern. The design space of wireless sensor networks. IEEE Wireless Communications, 11(6):54–61, December 2004. • Seungjae Lee, Changhwa Kim, and Sangkyung Kim. New database operators for sensor networks. In SERA '07: Proceedings of the 5th ACIS International Conference on Software Engineering Research, Management & Applications, pages 689–696, Washington, DC, USA, 2007. IEEE Computer Society. • Jonathan Ledlie, Chaki Ng, and David A. Holland. Provenance-aware sensor data storage. In ICDEW '05: Proceedings of the 21st International Conference on Data Engineering Workshops, page 1189, Washington, DC, USA, 2005. IEEE Computer Society. • Xiaotao Liu, Mark Corner, and Prashant Shenoy. SEVA: Sensor-enhanced video annotation. ACM Trans. Multimedia Comput. Commun. Appl., 5(3):1–26, 2009. • Jacob Nikom. Real-time sensor data warehouse architecture using MySQL. In MySQL Users Conference. O'Reilly Media, Inc., April 2005. • [sfb09] Oil spills into S.F. Bay south of Bay Bridge. http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/10/30/BA9B1ACTST.DTL, October 2009. • Images • [sd] http://www.zess.uni-siegen.de/ipp_home/ipp/research/master-student-topics/ • [snc] http://www.dei.unipd.it/~schenato/pics/SensorNetwork.jpg • [ns] http://www.cc.gatech.edu/projects/disl/specialProjects/figure1.gif • [an] http://eurekr.com/pics/AnnotatinganImageinWPF_A7D8/image.png • [ysi] http://www.ckjorc.org/cn/admin/news/edit/UploadFile/200681616301130.jpg
References • Daby M. Sow, Lipyeow Lim, Min Wang, and Kyu Hyun Kim. Persisting and querying biometric event streams with hybrid relational-XML DBMS. In DEBS '07: Proceedings of the 2007 inaugural international conference on Distributed event-based systems, pages 189–197, New York, NY, USA, 2007. ACM. • Samuel R. Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong. TinyDB: an acquisitional query processing system for sensor networks. ACM Trans. Database Syst., 30(1):122–173, 2005.
Department of Computer Science • A Key-Value-based Persistence Model for Sensor Networks • ? • Marcello de Sales, Master of Science in Computer Science (msales@sfsu.edu) • http://code.google.com/p/netbeams • http://www.netbeams.org • “The brick walls are not there to keep us out. The brick walls are there to give us a chance to show how badly we want something. Because the brick walls are there to stop the people who don't want it badly enough.” Dr. Randy Pausch
DSP in practice = NetBEAMS Use Cases • Data Payload for the YSI Sonde 6600V2 • SondeDataType: representation for the collected data • SondeDataContainer: collection of the collected data
Data Sensor Platform (DSP) Message Structure • DSP Message • Header • Producer • Consumer • Body • Message Content • DSP Messages Container • Package of DSP Messages
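The message layout on this slide (a header naming producer and consumer, a body carrying the content, and a container packaging several messages) could be modeled as plain objects. This is only a sketch: the function names, the field names, and the component names used as producer and consumer are assumptions, and the actual DSP implementation may represent messages differently.

```javascript
// Hypothetical constructor for a DSP message: header plus body.
function createMessage(producer, consumer, content) {
  return {
    header: { producer, consumer },
    body: { content },
  };
}

// Hypothetical constructor for a container: a package of DSP messages.
function createContainer(messages) {
  return { messages: [...messages] };
}

// Component names and readings below are made up for the sketch.
const first = createMessage("ysi-sonde", "data-persistence", { pH: 7.98 });
const second = createMessage("ysi-sonde", "data-persistence", { pH: 8.02 });
const container = createContainer([first, second]);

console.log(container.messages.length); // 2
console.log(container.messages[0].header.producer); // ysi-sonde
```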
Data Sensor Platform (DSP) Communication Mechanism • DSP Broker • Local delivery • Remote delivery • Gateway Component • DSP Matcher • Filtering based on rules • Independent Per Host
3. NetBEAMS: A Case Study • DSP Data Persistence Component Requirements • Open-Source • Support Data-Centric • Free of charge • Accessibility (API) • Cope with RTC's Small Volume of Data
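The split between a rule-based matcher and a broker that delivers locally or remotely can be illustrated with a small sketch. Everything here is hypothetical: the rule format, the host names, and the `match` and `route` functions are assumptions chosen for illustration, since the slide does not describe the actual rule syntax or delivery API.

```javascript
// Hypothetical rules: each one maps a producer to a destination host.
const rules = [
  { producer: "ysi-sonde", deliverTo: "gateway-1" },
  { producer: "ysi-sonde", deliverTo: "datasink.example" },
  { producer: "camera", deliverTo: "gateway-1" },
];

// Matcher: keeps only the rules that apply to this message.
function match(message, ruleSet) {
  return ruleSet.filter((rule) => rule.producer === message.header.producer);
}

// Broker: decides between local and remote delivery for each match,
// based on whether the destination is the current host.
function route(message, ruleSet, localHost) {
  return match(message, ruleSet).map((rule) => ({
    host: rule.deliverTo,
    mode: rule.deliverTo === localHost ? "local" : "remote",
  }));
}

const message = { header: { producer: "ysi-sonde" }, body: { content: {} } };
const deliveries = route(message, rules, "gateway-1");
console.log(deliveries);
```

Because the rule set is passed in as an argument, each host can hold its own rules, in line with the matcher being independent per host.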