Smart Home Technologies Data Management and Databases
Databases for Smart Homes • Requirements • Database Types • Database Technologies • Smart Home Databases • Data Mining
Data Storage Requirements • Sensor data • Temperature (15 @ 8 Kbps) • Humidity (15 @ 8 Kbps) • Gas (15 @ 8 Kbps) • Light (15 @ 8 Kbps) • Motion (15 @ 8 Kbps) • Pressure (100 @ 8 Kbps) • Microphone (15 @ 500 Kbps) • Camera (15 @ 10 Mbps)
Data Storage Requirements • User data • Multimedia • Phone messages/conversations (500 Kbps – 10 Mbps) • Music (500 Kbps) • TV/Radio broadcasts (500 Kbps – 10 Mbps) • Home movies (10 Mbps) • Images • Computer • Programs • Data files • Operating systems
Data Storage Issues • Issues • Query frequency and type • Sampling/recording rates • 205 sensors (158,900 Kbps) • Multimedia recordings • Simultaneous playback • Analysis, prediction, decision-making queries • Transaction granularity • Historical data, decay • Security and privacy • Centralized vs. distributed
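The slide's "205 sensors (158,900 Kbps)" follows from the per-type figures in the Data Storage Requirements list. The Python sketch below checks that arithmetic and extrapolates it to raw storage per day. It assumes decimal units (1 Mbps = 1,000 Kbps) and continuous, uncompressed recording of every stream. The names `SENSORS`, `total_rate_kbps`, and `daily_bytes` are illustrative only.

```python
# Sensor types from the "Data Storage Requirements" slide: (count, Kbps per sensor).
# Assumption: decimal units, so 10 Mbps = 10,000 Kbps.
SENSORS = {
    "temperature": (15, 8),
    "humidity": (15, 8),
    "gas": (15, 8),
    "light": (15, 8),
    "motion": (15, 8),
    "pressure": (100, 8),
    "microphone": (15, 500),
    "camera": (15, 10_000),
}

def total_sensors(sensors):
    return sum(count for count, _ in sensors.values())

def total_rate_kbps(sensors):
    return sum(count * kbps for count, kbps in sensors.values())

def daily_bytes(rate_kbps):
    """Bytes per day if every stream is recorded continuously and uncompressed."""
    bits_per_second = rate_kbps * 1_000
    return bits_per_second * 86_400 // 8

rate = total_rate_kbps(SENSORS)
print(total_sensors(SENSORS), rate)  # 205 158900
print(daily_bytes(rate) / 1e12, "TB per day")
```

Under those assumptions the raw sensor streams alone come to roughly 1.7 TB per day. The 15 cameras account for 150,000 of the 158,900 Kbps.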
What Data to Store • Type of Data • Raw data • Pre-processed • Compressed • Frequency of Data Storage for Sensor Data • Tradeoff between precision and quantity
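The slides leave the storage policy open. One common way to trade precision for quantity is deadband (change-threshold) filtering: a new reading is stored only when it differs from the last stored value by more than a threshold. The Python sketch below is an illustration of that technique, not the method the slides prescribe. The readings and thresholds are hypothetical.

```python
def deadband_filter(samples, threshold):
    """Store a sample only when it moves more than `threshold` away from the last stored value.

    samples: (time, value) pairs in time order. The first sample is always stored.
    """
    stored = []
    for t, v in samples:
        if not stored or abs(v - stored[-1][1]) > threshold:
            stored.append((t, v))
    return stored

# Hypothetical temperature readings, one per minute.
readings = [(0, 20.0), (1, 20.1), (2, 20.1), (3, 20.6), (4, 21.3), (5, 21.4), (6, 21.2)]
for threshold in (0.0, 0.25, 1.0):
    print(threshold, len(deadband_filter(readings, threshold)))
# prints: 0.0 6, then 0.25 3, then 1.0 2
```

A larger threshold stores fewer rows, but any change smaller than the threshold is lost for good.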
Sensor Data Example • 9/8/2002 2:0:1 AM~A5 (Coffee Maker) ON • 9/8/2002 1:6:59 AM~A9 (A/C) ON • 9/8/2002 3:58:52 AM~A0 (Stereo) ON • 9/8/2002 5:57:0 AM~A2 (Kitchen Light) ON • 9/8/2002 3:1:42 AM~A5 (Coffee Maker) OFF • 9/8/2002 7:8:3 AM~A3 (Stove) ON • 9/8/2002 12:54:52 PM~A10 (Bathroom Light) ON • 9/8/2002 4:58:5 AM~A0 (Stereo) OFF • 9/8/2002 8:1:20 AM~A3 (Stove) OFF • 9/8/2002 9:6:10 AM~A8 (Computer) ON • 9/8/2002 10:8:19 AM~A4 (Bathtub Heater) ON • 9/8/2002 11:9:4 AM~A0 (Stereo) ON • 9/8/2002 9:4:5 AM~A8 (Computer) OFF • 9/8/2002 10:9:4 AM~A4 (Bathtub Heater) OFF • 9/8/2002 2:2:5 PM~A10 (Bathroom Light) OFF • 9/8/2002 2:52:37 PM~A0 (Stereo) OFF • 9/8/2002 4:2:0 PM~A9 (A/C) OFF
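Logs like this one are usually parsed and paired up before they are stored or mined. The Python sketch below assumes the `timestamp~ID (name) STATE` layout shown above and a month/day/year date. It sorts the records (they are not in time order) and totals each device's ON time. It reports anomalies instead of dropping them. In this log the Computer's OFF (9:04:05 AM) precedes its ON (9:06:10 AM), and the Kitchen Light never turns off.

```python
import re
from datetime import datetime

# The device log from the slide, verbatim.
LOG = """\
9/8/2002 2:0:1 AM~A5 (Coffee Maker) ON
9/8/2002 1:6:59 AM~A9 (A/C) ON
9/8/2002 3:58:52 AM~A0 (Stereo) ON
9/8/2002 5:57:0 AM~A2 (Kitchen Light) ON
9/8/2002 3:1:42 AM~A5 (Coffee Maker) OFF
9/8/2002 7:8:3 AM~A3 (Stove) ON
9/8/2002 12:54:52 PM~A10 (Bathroom Light) ON
9/8/2002 4:58:5 AM~A0 (Stereo) OFF
9/8/2002 8:1:20 AM~A3 (Stove) OFF
9/8/2002 9:6:10 AM~A8 (Computer) ON
9/8/2002 10:8:19 AM~A4 (Bathtub Heater) ON
9/8/2002 11:9:4 AM~A0 (Stereo) ON
9/8/2002 9:4:5 AM~A8 (Computer) OFF
9/8/2002 10:9:4 AM~A4 (Bathtub Heater) OFF
9/8/2002 2:2:5 PM~A10 (Bathroom Light) OFF
9/8/2002 2:52:37 PM~A0 (Stereo) OFF
9/8/2002 4:2:0 PM~A9 (A/C) OFF
"""

# timestamp ~ device id (device name) state
RECORD = re.compile(r"^(?P<ts>.+?)~(?P<dev>A\d+) \((?P<name>[^)]+)\) (?P<state>ON|OFF)$")

def parse(log):
    """Parse records into (timestamp, device name, state) tuples sorted by time."""
    events = []
    for line in log.strip().splitlines():
        m = RECORD.match(line.strip())
        if m is None:
            raise ValueError(f"unrecognized record: {line!r}")
        # Assumption: month/day/year with a 12-hour clock; fields need not be zero-padded.
        ts = datetime.strptime(m["ts"], "%m/%d/%Y %I:%M:%S %p")
        events.append((ts, m["name"], m["state"]))
    events.sort()
    return events

def on_durations(events):
    """Sum ON time per device in seconds.

    Also returns OFF records with no preceding ON, and devices still ON at the end.
    """
    on_since, totals, orphans = {}, {}, []
    for ts, name, state in events:
        if state == "ON":
            on_since.setdefault(name, ts)
        elif name in on_since:
            start = on_since.pop(name)
            totals[name] = totals.get(name, 0) + int((ts - start).total_seconds())
        else:
            orphans.append((ts, name))
    return totals, orphans, sorted(on_since)

totals, orphans, still_on = on_durations(parse(LOG))
print(totals["Coffee Maker"])  # 3701 (1 h 1 min 41 s)
print(still_on)                # ['Computer', 'Kitchen Light']
```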
Multimedia Example • Digital Silhouettes (Predictive Networks) • Predicting web surfing behavior ($$$) • Microsoft (2002): tracking TV viewing preferences • 140 data items for each user • Demographics (50) • Subcategories within gender, age, income, education, occupation, and race • Content preferences (90) • e.g., golf, music, yoga
Database Types / Data Models • Relational • OO • Hybrid (Object-Relational) • Temporal • Deductive • Others • Spatial, …
Example Data Representations • Relational • Flat tables of atomic attributes, linked by foreign-key relationships • OO • Complex data representations • Multivalued, composite attributes • Temporal • Relational model: add valid start and end dates to each table (versions of the information and when each was valid) • Includes time, events, durations…
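The temporal approach of adding valid start and end dates to a relational table can be tried with Python's built-in `sqlite3` module. In the sketch below, the `thermostat_setting` table and its values are hypothetical. Periods are treated as half-open [valid_start, valid_end), with NULL marking the current version. Timestamps are ISO-8601 strings, which compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each row is one version of a setting plus the period during which it was valid.
conn.execute("""
    CREATE TABLE thermostat_setting (
        room        TEXT NOT NULL,
        target_f    INTEGER NOT NULL,
        valid_start TEXT NOT NULL,
        valid_end   TEXT            -- NULL while this version is still current
    )
""")
conn.executemany(
    "INSERT INTO thermostat_setting VALUES (?, ?, ?, ?)",
    [
        ("living", 68, "2002-09-08T00:00", "2002-09-08T07:00"),
        ("living", 70, "2002-09-08T07:00", "2002-09-08T22:00"),
        ("living", 65, "2002-09-08T22:00", None),
    ],
)

def setting_as_of(conn, room, when):
    """Return the target temperature valid for `room` at time `when`, or None."""
    row = conn.execute(
        """SELECT target_f FROM thermostat_setting
           WHERE room = ? AND valid_start <= ?
             AND (valid_end IS NULL OR ? < valid_end)""",
        (room, when, when),
    ).fetchone()
    return None if row is None else row[0]

print(setting_as_of(conn, "living", "2002-09-08T12:30"))  # 70
```

Half-open periods mean that at exactly 07:00 only the new version (70) matches. This avoids two rows being valid at the same instant.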
Operations • DDL/DML (data definition/manipulation languages) • SQL • OQL • Update operations • Built-in insert, delete, update • Stored procedures for triggers, active (ECA) rules
Example Operations for Temporal Databases • INCLUDES • Rows valid in a certain time period • BEFORE/AFTER a time condition • Set operations • Union, intersection of 2 time periods
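These operations reduce to comparisons on time periods. The Python sketch below uses a hypothetical `Period` class and assumes half-open periods [start, end) measured in minutes after midnight. The method names mirror the slide's operations, but they are not the API of any particular temporal DBMS.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Period:
    """Half-open time period [start, end), here in minutes after midnight."""
    start: int
    end: int

    def __post_init__(self):
        if self.start >= self.end:
            raise ValueError("a period needs start < end")

    def includes(self, t: int) -> bool:
        return self.start <= t < self.end

    def before(self, other: "Period") -> bool:
        return self.end <= other.start

    def after(self, other: "Period") -> bool:
        return other.before(self)

    def intersection(self, other: "Period") -> Optional["Period"]:
        lo, hi = max(self.start, other.start), min(self.end, other.end)
        return Period(lo, hi) if lo < hi else None

    def union(self, other: "Period") -> List["Period"]:
        """One merged period if the two overlap or touch, else both in time order."""
        if self.end < other.start or other.end < self.start:
            return sorted([self, other], key=lambda p: p.start)
        return [Period(min(self.start, other.start), max(self.end, other.end))]

morning = Period(7 * 60, 9 * 60)    # 07:00-09:00
cooking = Period(8 * 60, 10 * 60)   # 08:00-10:00
night = Period(22 * 60, 24 * 60)    # 22:00-24:00
print(morning.intersection(cooking))  # Period(start=480, end=540)
print(morning.union(cooking))         # [Period(start=420, end=600)]
print(morning.before(night), morning.includes(9 * 60))  # True False
```

The union of two disjoint periods is not a single period, so `union` returns a list. A temporal DBMS typically represents such results as a set of periods.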
Active DB • Event-Condition-Action rules • Allow for decisions to be made in the database instead of a separate application • Relational • Implemented as triggers • Challenges • Rule consistency • (2+ rules do not contradict each other) • Guaranteed termination • Trigger loops (T1 <-> T2, each firing the other)
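In a relational DB, an ECA rule becomes a trigger. The minimal sketch below uses Python's built-in `sqlite3` module. The tables, the 80-degree threshold, and the A/C command are hypothetical. The trigger's `AFTER INSERT` is the event, its `WHEN` clause is the condition, and its body is the action.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reading (sensor TEXT, value REAL, ts TEXT);
    CREATE TABLE command (device TEXT, action TEXT, fired_at TEXT);

    -- Event: a reading is inserted.
    -- Condition: it is a temperature above 80 (hypothetical threshold).
    -- Action: queue a command to turn the A/C on.
    CREATE TRIGGER cool_when_hot
    AFTER INSERT ON reading
    FOR EACH ROW
    WHEN NEW.sensor = 'temperature' AND NEW.value > 80
    BEGIN
        INSERT INTO command (device, action, fired_at) VALUES ('A/C', 'ON', NEW.ts);
    END;
""")

conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [("temperature", 76.0, "13:00"), ("temperature", 82.5, "14:00"), ("humidity", 90.0, "14:00")],
)
print(conn.execute("SELECT device, action, fired_at FROM command").fetchall())
# [('A/C', 'ON', '14:00')]
```

This single trigger writes only to `command`, which has no triggers of its own, so it cannot start a T1 <-> T2 loop. With several interacting rules, both consistency and termination have to be checked explicitly.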
Smart Home Active DB Example • Java, Postgres, Jess rules • Event classification (local & composite) • Data Manipulation Events • TV show being viewed (channel, time, genre…) • Temporal Events (instance, recurring) • Set temp to 70 degrees at 7:00am workdays • Exception Events • Power failure • Behavioral Events • Time children get home from school; dinner time
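A recurring temporal event such as "set temp to 70 degrees at 7:00am workdays" needs a way to compute when it fires next. The Python sketch below is illustrative, not the Java/Jess implementation the slide mentions. It assumes workdays are Monday through Friday. The function name `next_occurrence` is hypothetical.

```python
from datetime import datetime, time, timedelta

def next_occurrence(now, at=time(7, 0), workdays=range(0, 5)):
    """Next datetime strictly after `now` at clock time `at` on one of `workdays`.

    Assumption: workdays are Monday (0) through Friday (4), using datetime.weekday().
    """
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    while candidate.weekday() not in workdays:
        candidate += timedelta(days=1)
    return candidate

# Sunday, 8 September 2002: the next firing is Monday morning.
print(next_occurrence(datetime(2002, 9, 8, 12, 0)))  # 2002-09-09 07:00:00
```

A scheduler would sleep until the returned time, raise the temporal event, and let the rule's action (setting the thermostat) run. That action is not shown here.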
Distributed vs. Centralized • Centralized database can produce a bottleneck • Large volume of data input • Large database • Large volume of queries • In distributed databases, data consistency, replication, and retrieval can be more problematic • Consistency of schemas • Retrieval when the data location is not known • Communication overhead to ensure database consistency
SmartHome Database Architecture • Centralized vs. distributed? • Answer: Both • Central storage of high demand, persistent data • Distributed storage of low demand, dynamic data • Distributed queries • Push processing toward sensors • Adaptive, hierarchical organization • End-effector autonomy (“smart sensor”)
Database Systems • Commercial • DB2 • Empress • Informix • Oracle • MS Access • MS SQL • Sybase • Free • Berkeley DB • PostgreSQL • MySQL
UTA MavHome DB • Active • Reactive & proactive (e.g., to predict) • Distributed • Information collection agents • Rules • Local agents: what data each needs to collect • Distributed: coordinate overall monitoring of collected information • Continuous monitoring of events • Extension of SNOOP
Microsoft Easy Living DB (2002) • Relational • Fast & robust, but awkward for some data • World Model DB describes: • Computing devices • People and their personal preferences/settings • Services • Rooms and doorways • Serves as an abstraction layer between sensors and the applications that use sensor data • e.g., adding new sensors requires no change to applications
Stanford Interactive Workspace • Uses LORE • A semi-structured XML DB system • Still available, but work stopped in 2000 • Stores a catalog of (index to) • documents, images, 3-D models, application-specific domain models
Sensor Database Systems • COUGAR project • www.cs.cornell.edu/database/cougar • Query processing over ad-hoc sensor networks • Small database component (QueryProxy) at each sensor • Sensor clusters provide local aggregations (e.g., min, max, mean) • Assumes centralized index of all data sources
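Local aggregation in sensor clusters works because min, max, and mean can be computed from small partial results. The mean, however, must be carried as (sum, count), because an average of averages is wrong when clusters have different sizes. The Python sketch below shows the generic technique. It is not COUGAR's QueryProxy API. The cluster readings are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class Partial:
    """Partial aggregate a cluster head can forward instead of raw readings."""
    count: int
    total: float
    minimum: float
    maximum: float

    @staticmethod
    def of(readings):
        readings = list(readings)
        if not readings:
            raise ValueError("need at least one reading")
        return Partial(len(readings), sum(readings), min(readings), max(readings))

    def merge(self, other):
        """Combine two partials; the result is the same as aggregating all raw readings."""
        return Partial(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self):
        return self.total / self.count

# Hypothetical readings from three sensor clusters.
clusters = [[20.5, 21.0, 22.5], [19.0, 19.5], [23.0]]
overall = reduce(Partial.merge, (Partial.of(c) for c in clusters))
print(overall.count, overall.minimum, overall.maximum, round(overall.mean, 3))  # 6 19.0 23.0 20.917
```

Because `merge` is associative, partials can be combined hierarchically (sensor, then cluster, then home) in any grouping.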
Siemens Netabase • “The network is the database.” • Navas and Wynblatt, ACM SIGMOD 2001 • Sensor networks • Large number of data sources (10⁵) • Volatile data and data organization • “Thin” data servers on scaled-down hardware • Netabase approach • Query decomposition • Characteristic routing (à la IP routing) • Local joins • Query evaluation
Siemens Netabase • www.netabasesoftware.com
Data Warehouses • Repositories for data mining activities • Aggregates/summaries of data help efficiency • Optimized for decision support, not transaction processing • Definition (Elmasri, page 900) • “A subject-oriented, integrated, non-volatile, time-variant collection of data in support of management’s decisions” • Replace “management” with “smart home agents”
Warehouse Properties • Very large: 100 gigabytes to many terabytes • Tends to include historical data • Workload: mostly complex queries that access lots of data, and do many scans, joins, aggregations. Tend to look for "the big picture". • Updates pumped to warehouse in batches (overnight) • Data may be heavily summarized and/or consolidated in advance (must be done in batches too, must finish overnight). • Research work has been done (e.g. "materialized views") -- a small piece of the problem. 02.15.04 from http://redbook.cs.berkeley.edu/lec28.html
Data Warehouses • Data Cleaning • Data Migration: simple transformation rules (replace "gender" with "sex") • Data Scrubbing: use domain-specific knowledge (e.g., zip codes) to modify data. Try parsing and fuzzy matching from multiple sources. • Data Auditing: discover rules and relationships (or signal violations thereof). Not unlike data mining. • Data Loading • Can take a very long time! (Sorting, indexing, summarization, integrity constraint checking, etc.) Parallelism is a must. • Full load: like one big transaction – change from old data to new is atomic. • Incremental loading ("refresh") makes sense for big warehouses, but the transaction model is more complex – have to break the load into lots of transactions, and commit them periodically to avoid locking everything. Need to be careful to keep metadata & indices consistent along the way. 02.15.04 from http://redbook.cs.berkeley.edu/lec28.html
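Incremental loading breaks the load into many transactions and commits each one. The Python sketch below does this with the built-in `sqlite3` module. The `fact_reading` table, the batch size, and the generated rows are hypothetical. Using the connection as a context manager commits each batch on success and rolls back only that batch on error.

```python
import sqlite3
from itertools import islice

def incremental_load(conn, rows, batch_size=1000):
    """Insert rows into the hypothetical fact_reading table, one transaction per batch.

    Each batch commits on its own, so locks are held only briefly. If a batch fails,
    only that batch is rolled back; earlier batches stay committed, so the refresh as
    a whole is not atomic. Returns the number of batches committed.
    """
    it = iter(rows)
    batches = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return batches
        with conn:  # commit on success, roll back this batch on an exception
            conn.executemany("INSERT INTO fact_reading VALUES (?, ?, ?)", batch)
        batches += 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_reading (sensor TEXT, hour INTEGER, avg_value REAL)")
rows = [("temperature", h, 20.0 + h / 10) for h in range(2500)]
print(incremental_load(conn, rows, batch_size=1000))  # 3
print(conn.execute("SELECT COUNT(*) FROM fact_reading").fetchone()[0])  # 2500
```

The tradeoff is the one the slide names: a failure midway leaves a partially refreshed warehouse. Metadata and indices must therefore be kept consistent with whatever has been committed so far.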
Data Warehouses 02.15.04 from http://redbook.cs.berkeley.edu/lec28.html
Data Mining Definition • Discovery of new information in terms of patterns or rules from vast amounts of data • Extracts patterns that can’t readily be found by asking the right questions (queries) • TOO MUCH DATA FOR HUMANS • Emerged from • Artificial Intelligence: machine learning, neural nets, genetic algorithms • Statistics • Operations Research
Data Mining Steps • Data selection: pick the data needed • Data cleansing • Fix bad data (e.g., spelling, zip codes) • Hard to deal with missing, erroneous, conflicting, redundant data • Enrichment • Add data (e.g., age, gender, income) • Data transformation • Aggregate (e.g., zip codes → regions) • Data mining • Reporting on discovered knowledge
Types of Results • Association rules • Buy diapers → buy lots of beer • Sequential patterns • Buy house → buy furniture within months • Classification trees • Types of buyers (upscale, bargain-conscious, …) • Why do it? • Make more money • Science & medicine
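An association rule such as "diapers → beer" is usually scored by support (how often the items occur together) and confidence (how often the right side occurs given the left side). The brute-force Python sketch below computes both and keeps only single-item rules for brevity. It is not the Apriori algorithm or any library's API. The baskets and thresholds are hypothetical.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Estimated P(rhs | lhs) = support(lhs ∪ rhs) / support(lhs)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

def rules(transactions, min_support, min_confidence):
    """All one-item -> one-item rules meeting both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        if support(transactions, {a, b}) < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            c = confidence(transactions, {lhs}, {rhs})
            if c >= min_confidence:
                found.append((lhs, rhs, c))
    return found

# Hypothetical shopping baskets.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "beer", "chips"},
]
for lhs, rhs, c in rules(baskets, min_support=0.4, min_confidence=0.7):
    print(f"{lhs} -> {rhs}  confidence={c:.2f}")
```

With these five baskets the rules found are chips → beer (confidence 1.00) and diapers ↔ beer (0.75 in each direction).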
Data Mining Goals • Find patterns to predict future events • Find major groupings • Groupings of buyers, stars, diseases … • Find which group something belongs to • creditworthiness
Data Mining Results • Association rules • Classification hierarchies • Clustering • Sequential patterns • Patterns within time series • Type of result, inputs & algorithms vary • Often interested in some combination of these types of knowledge
Clustering • Unsupervised learning techniques • Training samples are unclassified • Vs. supervised learning (classification) • Drug categories for depression • Categories of TV viewers • Categories of buyers (likely, unlikely) • Categories of households? • Single male, mother/children, conventional (M/D/kids), DINKs.
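The slides do not name a clustering algorithm. k-means (Lloyd's algorithm) is one common unsupervised choice. The Python sketch below groups hypothetical TV viewers by weekday and weekend viewing hours. For determinism, it seeds the centroids with the first k points rather than choosing them randomly.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical viewers: (weekday TV hours per day, weekend TV hours per day).
viewers = [(1.0, 0.5), (4.0, 3.5), (1.2, 0.8), (0.8, 0.6), (4.2, 3.8), (3.9, 3.6)]
centroids, clusters = kmeans(viewers, k=2)
print(clusters)
# [[(1.0, 0.5), (1.2, 0.8), (0.8, 0.6)], [(4.0, 3.5), (4.2, 3.8), (3.9, 3.6)]]
```

No labels are given as input: the "light" and "heavy" viewer groups emerge from the data. This is the difference from supervised classification.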
Sequential Patterns • Detecting associations among events with certain temporal relationships • Example: • Cardiac bypass for blocked arteries • AND within 18 months, high blood urea • THEN kidney failure likely in next 18 months • Particularly important in smart homes
Sequential Pattern Discovery • Sequence of itemsets • Grocery store purchases by 1 person (3 itemsets) • {soy milk, bread, chocolate}, {bananas, chocolate}, {lettuce, tomato, chocolate} • Two of its subsequences • {soy milk, bread, chocolate}, {bananas, chocolate} • {bananas, chocolate}, {lettuce, tomato, chocolate}
Sequential Pattern Discovery • The support for a sequence S is the % of the given set U of sequences of which S is a subsequence. • That is: in what fraction of the sequences in U does S show up? • Find all subsequences from the given sequence sets that have a user-defined minimum support. • The sequence S1, S2, …, Sn is a predictor of the “fact” that a customer who buys itemset S1 is likely to buy itemset S2, then S3, … • Prediction support based on frequency of this sequence in the past • Many research issues in creating good algorithms
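Support for a sequence depends on a containment test. S is a subsequence of T if each itemset of S is a subset of a distinct itemset of T, and the order is preserved. The Python sketch below implements that test greedily, matching each itemset as early as possible, and computes support over a set U. The first sequence in `U` is the grocery example from the previous slide. The other three are hypothetical.

```python
def is_subsequence(s, t):
    """True if each itemset of s is contained in a distinct, later itemset of t, in order."""
    pos = 0
    for itemset in s:
        while pos < len(t) and not set(itemset) <= set(t[pos]):
            pos += 1
        if pos == len(t):
            return False
        pos += 1
    return True

def sequence_support(s, sequences):
    """Fraction of the sequences in U that contain s as a subsequence."""
    return sum(is_subsequence(s, t) for t in sequences) / len(sequences)

shopper = [{"soy milk", "bread", "chocolate"}, {"bananas", "chocolate"}, {"lettuce", "tomato", "chocolate"}]
U = [shopper, [{"bread"}, {"chocolate"}], [{"bananas"}, {"lettuce", "tomato"}], [{"soy milk"}]]
print(sequence_support([{"bread"}, {"chocolate"}], U))  # 0.5
print(sequence_support([{"bananas", "chocolate"}, {"lettuce", "tomato", "chocolate"}], U))  # 0.25
```

Each sequence in U counts at most once, however many times S appears inside it. With a minimum support of 0.5, only the first pattern would be kept.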
Patterns Within Time Series • Finding 2 patterns that occur over time • 2003 stock prices of Choice Homes and Home Depot • 2 products show same sales pattern in summer but different one in winter • Solar magnetic wind patterns may predict earth atmospheric changes
Time Series Pattern Discovery • Time series are sequences of events • Event could be a transaction (closing daily stock price) • Look at sequences over n days, or • Longest period in which change is no greater than 1% • Comparing • Must define similarity measures
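"Longest period in which change is no greater than 1%" needs a precise definition before it can be computed. The Python sketch below assumes the definition is a day-over-day relative change of at most 1% between consecutive closing prices, and finds the longest such run in one pass. The prices are hypothetical.

```python
def longest_stable_run(prices, max_change=0.01):
    """Longest stretch of consecutive prices whose day-over-day relative change is <= max_change.

    Returns (start_index, end_index), inclusive; a single price is a run of length 1.
    """
    if not prices:
        raise ValueError("need at least one price")
    best = (0, 0)
    start = 0
    for i in range(1, len(prices)):
        if abs(prices[i] - prices[i - 1]) / prices[i - 1] > max_change:
            start = i  # a big move ends the current run; a new one starts here
        if i - start > best[1] - best[0]:
            best = (start, i)
    return best

# Hypothetical closing prices for eight trading days.
closes = [100.0, 100.5, 101.0, 104.0, 104.5, 104.2, 104.9, 103.0]
print(longest_stable_run(closes))  # (3, 6)
```

A different definition (for example, total drift from the first day of the run) would need a different algorithm. This is one reason similarity measures must be defined explicitly.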
Other Approaches in Data Mining • Neural nets • Infer a function from a set of examples • Non-parametric curve-fitting • Interpolates to solve new problems • Supervised & unsupervised algorithms • Capabilities • classification • time-series prediction • Disadvantages • can’t see what it learned (not declarative)
Other Approaches in Data Mining • Genetic algorithms • Set up • Representation (strings over an alphabet) • Evaluation (fitness) function • Parameters: # of generations, cross-over rate, mutation rate, etc. • Randomized (probabilistic operators), parallel search over search space • Used for problem solving and clustering
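The set-up above maps directly onto a small program: bit-string representation, a fitness function, and parameters for generations, crossover rate, and mutation rate. The Python sketch below is a minimal GA under those assumptions. Tournament selection, one-point crossover, and keeping the best individual (elitism) are illustrative choices. The fitness used here ("OneMax", count the 1 bits) is a toy problem, not a smart-home task.

```python
import random

def genetic_search(fitness, length, population=30, generations=60,
                   crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Minimal generational GA over bit strings (lists of 0/1); returns the best found."""
    rng = random.Random(seed)  # seeded for reproducible runs
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(population)]

    def tournament():
        # Pick two individuals at random and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        best = max(pop, key=fitness)
        children = [best[:]]  # elitism: the best individual survives unchanged
        while len(children) < population:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, length)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each bit with probability mutation_rate.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

best = genetic_search(fitness=sum, length=20)  # OneMax: fitness = number of 1 bits
print(sum(best), best)
```

The operators are probabilistic, and a whole population is explored in each generation. This is the "randomized, parallel search" the slide describes. Elitism guarantees the best fitness never decreases from one generation to the next.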