Smart Home Technologies: Decision Making
Motivation • Intelligent Environments are aimed at improving the inhabitants’ experience and task performance • Provide appropriate information • Automate functions in the home • Prediction techniques can only determine what would happen next, not what should happen next. • Automated functions can be different from inhabitant actions • Computer has to determine actions that would optimize inhabitant experience
Decision Making • Decision making attempts to determine the actions the system should take in the current situation • Should a function be automated? • What should be done next? • Decisions should be based on the current context and the requirements of the inhabitants • Pre-programmed timers alone are not sufficient for automation • The decision maker has to take into account the stream of data
Decision Making in Intelligent Environments • Example Decision Making Tasks in Intelligent Environments: • Automation of physical devices • Turn on lights • Regulate heating and air conditioning • Control media devices • Automate lawn sprinklers • Automate robotic components (vacuum cleaner, etc) • Control of information devices • Provide recipe services in the kitchen • Construct shopping lists • Decide which types of alarms to display (and where)
Decision Making in Intelligent Environments • Objectives of decision making: • Optimize inhabitant productivity • Minimize operating costs • Maximize inhabitant comfort • The decision making process has to be safe • Decisions made can never endanger inhabitants or cause damage • Decisions should be within the range accepted by the inhabitants
Example Task • Should a light be turned on? • Decision Factors: • Inhabitant’s location (current and future) • Inhabitant’s task • Inhabitant’s preferences • Time of the day • Other inhabitants • Energy efficiency • Security • Possible Decisions: • Turn on • Do not automate
Decision Making Approaches • Pre-programmed decisions • Timer-based automation • Reactive decision making systems • Decisions are based on condition-action rules • Decisions are driven by the available facts • Goal-based decision making systems • Decisions are made in order to achieve a particular outcome • Utility-based decision making systems • Decisions are made in order to maximize a given performance measure
Qualities of a Decision Maker • Ideal • Complete: always makes a decision • Correct: the decision is always right • Natural: knowledge is easily expressed • Efficient • Rational • Decisions are made to maximize performance
Decision-Making Techniques • Reactive Decision Making • Rule-based expert system • Goal-Based Decision Making • Planning • Decision theoretic Decision Making • Belief Networks • Markov decision process • Learning Techniques • Neural Networks • Reinforcement Learning
Rule-Based Decision Making • Decisions are made based on rules and facts • Facts represent the state of the environment • Represented in first-order predicate logic • Condition-action rules represent heuristic knowledge about what to do • Rules are implications that derive actions from logic sentences about facts • Inference mechanism • Deduction (modus ponens): $\{A,\ A \rightarrow B\} \vdash B$ • The left-hand sides of the rules are matched against the set of facts • Rules whose left-hand side matches are active
Rule-Based Inference • Rules define what actions should be executed for a given set of conditions (facts) • Actions can either be external actions (“automation”) or internal updates of the set of facts (“state update”) • Rules are often heuristics provided by an expert • Multiple rules can be active at any given time • Conflict resolution to decide which rule to fire • Scheduling of active rules to perform sequence of actions
Example • Facts: • CurrentTime = 6:30 • Location(CurrentTime,bedroom) • CurrentDay = Monday • Rules: • Internal actions: (CurrentDay=Monday)^(CurrentTime>6:00) ^(CurrentTime<7:00)^(Location(CurrentTime,bedroom)) ->Set(Location(NextTime,bathroom)) • External actions: (Location(NextTime,X)) -> Action(TurnOnLight,X)
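The example above can be traced with a few lines of code. The following is a minimal forward-chaining sketch, not the behavior of any particular expert system shell: the dictionary encoding of facts, the function names, and the simple "fire until nothing changes" loop are all assumptions made for illustration.

```python
from datetime import time

# Facts from the example, encoded as a dictionary (an illustrative encoding).
facts = {
    "CurrentTime": time(6, 30),
    "CurrentDay": "Monday",
    "Location(CurrentTime)": "bedroom",
}

def rule_predict_bathroom(f):
    """Internal rule: Monday, between 6:00 and 7:00, in the bedroom."""
    if (f.get("CurrentDay") == "Monday"
            and time(6, 0) < f["CurrentTime"] < time(7, 0)
            and f.get("Location(CurrentTime)") == "bedroom"):
        return ("set", "Location(NextTime)", "bathroom")
    return None

def rule_turn_on_light(f):
    """External rule: turn on the light at the predicted next location."""
    room = f.get("Location(NextTime)")
    if room is not None:
        return ("action", "TurnOnLight", room)
    return None

def run_rules(facts, rules):
    """Fire matching rules until no new fact or action is produced."""
    facts = dict(facts)
    actions = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            result = rule(facts)
            if result is None:
                continue
            kind, name, value = result
            if kind == "set" and facts.get(name) != value:
                facts[name] = value          # internal state update
                changed = True
            elif kind == "action" and (name, value) not in actions:
                actions.append((name, value))  # external automation action
                changed = True
    return facts, actions

new_facts, actions = run_rules(facts, [rule_predict_bathroom, rule_turn_on_light])
print(actions)
```

The internal rule fires first and adds the predicted location to the fact set, which in turn activates the external rule. A real shell would add conflict resolution when several rules are active at once.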
Rule-Based Expert Systems • Intended to simulate (and automate) human reasoning process • Domain is modeled in first-order logic • State is represented by a set of facts • Internal rules model behavior of the environment • Experts provide sets of heuristic condition-action rules • Rules with internal actions can model reasoning process • Rules with external actions indicate decisions the expert would make • The system can optionally be provided with queries by including them in the facts set.
Internal Rules • Internal rules have to model the behavior of the system • Persistence over time E.g.: (Location(CurrentTime,X))^(NoMove(CurrentTime)) -> Set(Location(NextTime,X)) • Dynamic behavior of devices E.g.: (Temperature(CurrentTime,X))^(HeatingOn) -> Set(Temperature(NextTime,X+2)) • Behavior of the inhabitants E.g.: (Location(CurrentTime,bedroom)) ^(CurrentTime>23:00) ^(LightOn(CurrentTime, bedroom)) -> Action(TurnOffLight, bedroom)
Logic Inference Systems and Expert System Shells • Logic programming systems provide inference capabilities. • Examples: • Prolog • OTTER • Expert system shells provide the infrastructure to build complete expert systems • Examples: • CLIPS (for C) • JESS (for Java)
Example System: IRoom [Kul02] • Initial versions of the MIT IRoom project used JESS as an inference engine to make decisions about activating devices • For example: If a person enters the room and the room is empty then turn on the light • Rules are programmed by the system designer before the room is used and then refined based on experience [Kul02] Ajay Kulkarni. Design Principles of a Reactive Behavioral System for the Intelligent Room. 2002.
Rule-Based Decision Making • Characteristics • Complete and correct (given complete rules) • Natural (given expert specified rules) • Advantages • Permits the system to be programmed relatively efficiently by an expert • Can address relatively complex systems • Problems • Quality of the rules is essential • Behavior of the environment mimics the expert • Anticipating all possible contexts is difficult
Planning Decisions • A planning system searches for a sequence of actions that achieves a defined goal. • States can be represented as logic sentences • Actions are defined as operators (symbolic representations of the conditions and effects of actions) which contain: • Preconditions of actions • Effects of actions • A goal is a set of states • The planning system uses constraints to efficiently search for a sequence of operators that leads from the start state to a goal state.
Example • Initial State: (Location(bedroom))^(Light(bathroom,off)) • Goal: Happy(Inhabitant) • Action 1: MakeHappy Precondition: (Location(X))^(Light(X,on)) Effect: Add: Happy(Inhabitant) • Action 2: TurnOnLight(X) Precondition: Light(X,off) Effect: Delete: Light(X,off), Add: Light(X,on) • Action 3: Move(X, Y) Precondition: (Location(X))^(Light(Y,on)) Effect: Delete: Location(X), Add: Location(Y) • Plan: Action 2, Action 3, Action 1
Example • [Figure: plan graph] Start: Location(bedroom), Light(bathroom,off) → TurnOnLight → Location(bedroom), Light(bathroom,on) → MoveTo → Location(bathroom), Light(bathroom,on) → MakeHappy → Happy(Inhabitant) → Finish
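The plan in this example can be found mechanically. The sketch below uses plain breadth-first forward search over sets of facts, which is only one simple way to plan and is not how the partial order planners named on the next slide work. The operators are grounded by hand for the bathroom (an assumption made to keep the code short), and each is stored as preconditions, delete list, and add list.

```python
from collections import deque

# Hand-grounded operators: name -> (preconditions, delete list, add list).
OPERATORS = {
    "TurnOnLight(bathroom)": (
        {"Light(bathroom,off)"},
        {"Light(bathroom,off)"},
        {"Light(bathroom,on)"},
    ),
    "Move(bedroom,bathroom)": (
        {"Location(bedroom)", "Light(bathroom,on)"},
        {"Location(bedroom)"},
        {"Location(bathroom)"},
    ),
    "MakeHappy": (
        {"Location(bathroom)", "Light(bathroom,on)"},
        set(),
        {"Happy(Inhabitant)"},
    ),
}

def plan(initial, goal, operators):
    """Breadth-first search from the initial state to any state containing the goal."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:                      # all goal facts hold
            return steps
        for name, (pre, delete, add) in operators.items():
            if pre <= state:                   # operator is applicable
                successor = frozenset((state - delete) | add)
                if successor not in visited:
                    visited.add(successor)
                    frontier.append((successor, steps + [name]))
    return None                                # no plan exists

steps = plan(
    {"Location(bedroom)", "Light(bathroom,off)"},
    {"Happy(Inhabitant)"},
    OPERATORS,
)
print(steps)
```

The search returns the operators in the order Action 2, Action 3, Action 1, matching the plan given in the example.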
Example Planning Systems • Partial Order Planners • Derive plans without having to find the actions in execution order • SNLP (Univ. of Washington) • GraphPlan (CMU) • Builds and prunes a graph of possible plans • Conditional Planners • Derive plans under uncertainty by constructing plans that work under given conditions • UCPOP (Univ. of Washington) • Partial Order Planner with Universal quantification and Conditional effects • CPOP • Sensory GraphPlan (CMU)
Planning Decisions • Characteristics • Complete and correct (given complete rules) • Relatively natural formulation • Advantages • Permits a sequence of actions to be found that performs a given task • Goals can be defined easily • Problems • Requires complete description of the system • Uncertainty is difficult to handle • Planning is generally very complex
Decision Theory • Decision theory addresses rational decision making under uncertainty • Uncertainty is represented using probabilities • Uncertainty due to incomplete observability • Uncertainty due to nondeterministic action outcomes • Uncertainty due to nondeterministic system behavior • Utility theory is used to achieve rational decisions • Utility is a measure of the expected “value” of a given situation or decision • Rational decisions are the ones that yield the highest expected utility in the current situation
Modeling Uncertainty • The current situation can be represented as a belief state, i.e. as a probability distribution over the states indicating the likelihood that any given state $x_i$ is the current state: $\{(x_1, P(x_1)), (x_2, P(x_2)), \ldots, (x_n, P(x_n))\}$ • The probability of a state can be expressed as the joint probability of all state attributes: $P(x) = P(a_1, a_2, \ldots, a_n)$ • Uncertainties from incomplete observability and nondeterminism can be modeled as conditional probabilities • State transition model: $P(x_i \mid x_j, d)$, the probability of reaching state $x_i$ from state $x_j$ under decision $d$ • Observation model: $P(o \mid x)$
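Bayes Rule • Bayes rule makes it possible to invert cause and effect when calculating probabilities: $P(c \mid e) = \frac{P(e \mid c)\, P(c)}{P(e)}$ • It is often easier to estimate $P(e \mid c)$ than $P(c \mid e)$ • The probability of a state given a set of sensor readings, $P(x \mid o)$, can be calculated from the observation probabilities $P(o \mid x)$: $P(x \mid o) = \frac{P(o \mid x)\, P(x)}{P(o)}$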
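A small numeric illustration of the inversion may help. In the sketch below the two rooms, the motion sensor, and all probabilities are invented for illustration; only the use of Bayes rule itself comes from the slide.

```python
def posterior(prior, likelihood, observation):
    """Return P(x | o) for every state x, given P(x) and P(o | x)."""
    unnormalized = {x: likelihood[x][observation] * p for x, p in prior.items()}
    evidence = sum(unnormalized.values())      # P(o), the normalizing constant
    return {x: v / evidence for x, v in unnormalized.items()}

# Hypothetical prior P(x) over the inhabitant's location.
prior = {"bedroom": 0.5, "bathroom": 0.5}

# Hypothetical observation model P(o | x) for a bathroom motion sensor.
likelihood = {
    "bedroom": {"motion": 0.2, "none": 0.8},
    "bathroom": {"motion": 0.9, "none": 0.1},
}

belief = posterior(prior, likelihood, "motion")
print(belief)
```

With these numbers, observing motion shifts most of the probability mass to the bathroom, even though both rooms were equally likely beforehand.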
Utility Theory • Utilities $U(A)$ represent the “value” of a given situation or decision $A$ and model preferences • The utility function for a particular system is not unique • Only relative differences between utility values are important • $U(A) > U(B) \Leftrightarrow$ A is preferred to B • $U(A) = U(B) \Leftrightarrow$ the agent is indifferent between A and B • Utilities for uncertain situations can be calculated as the expected value of the utility over all possibilities: $U(\{(x_1, P(x_1)), \ldots, (x_n, P(x_n))\}) = \sum_i P(x_i)\, U(x_i)$
Rational Decisions • The rational decision is the one that leads to the highest expected utility • Rational decision making in decision theory requires • A complete causal model of the environment, $P(x_i \mid x_j, d)$ • Complete knowledge of the observation (sensor) model, $P(o \mid x_i)$ • Knowledge of the utility function for all states, $U(x_i)$
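Choosing the decision with the highest expected utility can be sketched for the earlier light example. The outcome names, probabilities, and utility values below are hypothetical and serve only to show the calculation.

```python
def expected_utility(outcomes, utility):
    """Sum of P(x) * U(x) over the possible outcome states x."""
    return sum(p * utility[x] for x, p in outcomes.items())

# Hypothetical utilities U(x) of the possible outcome states.
utility = {
    "lit_and_needed": 1.0,
    "lit_not_needed": -0.5,    # wasted energy
    "dark_and_needed": -1.0,   # inhabitant has to act manually
    "dark_not_needed": 0.0,
}

# Hypothetical outcome model: distribution over outcomes for each decision.
outcome_model = {
    "turn_on": {"lit_and_needed": 0.7, "lit_not_needed": 0.3},
    "do_not_automate": {"dark_and_needed": 0.7, "dark_not_needed": 0.3},
}

scores = {d: expected_utility(o, utility) for d, o in outcome_model.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Because only relative differences between utilities matter, rescaling all values in `utility` by a positive constant would leave the chosen decision unchanged.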
Markov Decision Processes • Markov Decision Processes (MDPs) form a probabilistic model of all possible system behavior • MDPs can be described by a tuple <S, A, T, R> representing states, actions, transition probabilities, and reinforcements. • The system has to obey the Markov assumption: $P(x_{t+1} \mid x_t, d_t, x_{t-1}, d_{t-1}, \ldots, x_0) = P(x_{t+1} \mid x_t, d_t)$ • Reinforcement represents the instantaneous change in utility obtained in a given state • Models costs and payoffs • Is generally sparse and delayed
Utility Function for MDPs • In an MDP, the utility of a state under a given policy can be defined as the expected sum of discounted reinforcements • The optimal utility function U* can be computed using Value iteration • Optimal policy (decision strategy) can be extracted from the utility function
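The three statements on this slide can be written out as equations. The forms below are the standard textbook ones and are given as a sketch: they assume a discount factor $\gamma$, a policy $\pi$, reinforcement $R(x_i)$ attached to states, and a transition model $P(x_j \mid x_i, d)$ in which $x_i$ is the current state and $x_j$ the successor.

```latex
% Utility of a state under policy \pi: expected sum of discounted reinforcements
U^{\pi}(x_i) = E\left[ \sum_{t=0}^{\infty} \gamma^{t} R(x_t) \,\middle|\, \pi,\ x_0 = x_i \right]

% Value iteration: repeated update that converges to the optimal utility U^{*}
U_{k+1}(x_i) = R(x_i) + \gamma \max_{d} \sum_{j} P(x_j \mid x_i, d)\, U_k(x_j)

% Optimal policy extracted from the optimal utility function
\pi^{*}(x_i) = \arg\max_{d} \sum_{j} P(x_j \mid x_i, d)\, U^{*}(x_j)
```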
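MDP Example • $S = \{(1,1), (1,2), \ldots, (4,3)\}$ • $A = \{\uparrow, \downarrow, \leftarrow, \rightarrow\}$ • T: P(intended direction) = 0.8, P(each direction at a right angle to the intended one) = 0.1 • R: +1 at the goal, -1 at the trap, -0.04 in all other states • $\gamma = 1$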
MDP Example • [Figure: two panels, Optimal Utilities and Optimal Policy]
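Value iteration for a grid world of this kind can be sketched as follows. Several details are assumptions taken from the standard textbook version of the 4x3 world rather than from the slide: states are (column, row) pairs, the goal is at (4,3), the trap at (4,2), there is a wall at (2,2), bumping into a wall leaves the state unchanged, and the step reinforcement is -0.04.

```python
GAMMA = 1.0
STEP_REWARD = -0.04                            # assumed cost of non-terminal states
COLS, ROWS = 4, 3
WALL = (2, 2)                                  # assumed obstacle position
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}        # assumed goal and trap positions

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
SIDEWAYS = {"up": ("left", "right"), "down": ("left", "right"),
            "left": ("up", "down"), "right": ("up", "down")}
STATES = [(c, r) for c in range(1, COLS + 1) for r in range(1, ROWS + 1)
          if (c, r) != WALL]

def step(state, move):
    """Deterministic result of a move; stay in place when blocked."""
    target = (state[0] + MOVES[move][0], state[1] + MOVES[move][1])
    return target if target in STATES else state

def q_value(state, move, U):
    """Expected utility of the successor: 0.8 intended, 0.1 for each side."""
    side_a, side_b = SIDEWAYS[move]
    return (0.8 * U[step(state, move)]
            + 0.1 * U[step(state, side_a)]
            + 0.1 * U[step(state, side_b)])

def value_iteration(tol=1e-6):
    U = {s: 0.0 for s in STATES}
    while True:
        new_U = {}
        for s in STATES:
            if s in TERMINALS:
                new_U[s] = TERMINALS[s]        # terminal utility is its reward
            else:
                new_U[s] = STEP_REWARD + GAMMA * max(
                    q_value(s, m, U) for m in MOVES)
        delta = max(abs(new_U[s] - U[s]) for s in STATES)
        U = new_U
        if delta < tol:
            return U

U = value_iteration()
policy = {s: max(MOVES, key=lambda m: q_value(s, m, U))
          for s in STATES if s not in TERMINALS}
print(round(U[(1, 1)], 3), policy[(1, 1)])
```

The policy is read off the converged utilities by picking, in each state, the move with the highest expected successor utility.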
Markov Decision Processes • Characteristics • Complete and Correct • Advantages • Takes into account transition uncertainty • Makes optimal decisions • Automatically calculates the utility function • Problems • Requires complete probabilistic description of the system • Requires complete observability of the state
Partially Observable MDPs • Partially Observable Markov Decision Processes (POMDPs) extend MDPs by permitting states to be only partially observable. • Systems can be represented by a tuple <S, A, T, R, O, V>, where <S, A, T, R> is an MDP and O, V relate observations to the states that produce them • $O = \{o_i\}$ is the set of observations • $V$: $V(x, o) = P(o \mid x)$ • To determine an optimal policy, an optimal utility function over the belief states has to be computed
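Since a POMDP policy is defined over belief states, the system has to keep its belief up to date after every decision and observation. The sketch below shows the standard two-step update (predict with T, then correct with V). The two rooms, the "wait" decision, and all probabilities are hypothetical.

```python
def belief_update(belief, decision, observation, T, V):
    """New belief b'(x') proportional to V(x', o) * sum_x T(x' | x, d) * b(x)."""
    states = list(belief)
    # Prediction step: push the belief through the transition model.
    predicted = {
        x2: sum(T[(x1, decision)][x2] * belief[x1] for x1 in states)
        for x2 in states
    }
    # Correction step: weight by the observation model and normalize.
    unnormalized = {x: V[x][observation] * predicted[x] for x in states}
    total = sum(unnormalized.values())
    return {x: v / total for x, v in unnormalized.items()}

# Hypothetical transition model T: P(next state | state, decision).
T = {
    ("bedroom", "wait"): {"bedroom": 0.7, "bathroom": 0.3},
    ("bathroom", "wait"): {"bedroom": 0.1, "bathroom": 0.9},
}
# Hypothetical observation model V(x, o) = P(o | x).
V = {
    "bedroom": {"bath_motion": 0.1, "no_motion": 0.9},
    "bathroom": {"bath_motion": 0.8, "no_motion": 0.2},
}

belief = {"bedroom": 1.0, "bathroom": 0.0}
belief = belief_update(belief, "wait", "bath_motion", T, V)
print(belief)
```

The belief never collapses to a single state unless the models contain probabilities of exactly 0 or 1, which is why the utility function has to be defined over the whole belief space.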
POMDPs • Characteristics • Complete and Correct • Advantages • Takes into account all uncertainty • Makes optimal decisions • Problems • Requires complete probabilistic description of the system • Optimal solution is so far intractable (dynamic decision networks and approximation techniques exist and work for small state spaces)
Learning Decisions • Learning techniques permit decisions to be learned from past experience and feedback from the inhabitants or the environment. • Supervised learning • Requires the desired decision to be specified during training • Reinforcement learning • Learns by experimentation from scalar reward feedback • Inhabitant feedback (e.g. device interactions) • Explicit environment feedback (e.g. energy consumption) • Implicit feedback (e.g. prediction of comfort of inhabitant)
Feedforward Neural Networks • Neural networks are a supervised learning mechanism that can be trained to make decisions based on a set of training examples. • Training for reactive decisions involves the presentation of a set of examples $(x_i, d(x_i))$, where $d(x_i)$ is the desired decision to be made in configuration $x_i$. • Training for goal-based or utility-based decisions involves learning a model that maps the input $(x_i, d(x_i))$ to the outcome of the action, $f(x_i, d(x_i))$, and then selecting the decision with the best outcome.
Example System: Regulation in the Adaptive House [DLRM94] • Neural network learns to regulate the lights in the house to maintain a given light intensity. • Learns a network that predicts the light intensity if a given set of lights are turned on • Input: • The current light device levels (7 inputs) • The current light sensor levels (4 inputs) • The new light device levels (7 inputs) • Output: • The new light sensor levels (4 outputs) [DLRM94] Dodier, R. H., Lukianow, D., Ries, J., & Mozer, M. C. (1994). A comparison of neural net and conventional techniques for lighting control. Applied Mathematics and Computer Science, 4, 447-462.
Example System: Regulation in the Adaptive House (continued) • Decisions are made by comparing the output of the network for all possible decisions (i.e. combinations of lights to be turned on) with the desired light intensity and taking the decision whose prediction most closely matches it. • [Figure: network diagram with the labels Set point $p$, State $x_i$, Decision $d$, and Prediction $f(x_i, d)$]
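The selection step described above can be sketched independently of the network. In the code below, `toy_predict` is a hypothetical linear stand-in for the trained predictor $f(x_i, d)$, with three lights and two sensors instead of the seven and four used in the Adaptive House; the gains and sensor readings are invented.

```python
from itertools import product

def choose_decision(state, set_point, predict, n_lights):
    """Try every on/off combination and keep the one closest to the set point."""
    best, best_error = None, float("inf")
    for decision in product((0, 1), repeat=n_lights):
        predicted = predict(state, decision)
        error = sum((p - s) ** 2 for p, s in zip(predicted, set_point))
        if error < best_error:
            best, best_error = decision, error
    return best

# Hypothetical contribution of each light to each of the two sensors.
GAINS = [(0.5, 0.1), (0.1, 0.5), (0.3, 0.3)]

def toy_predict(state, decision):
    """Stand-in for the learned model: ambient level plus light contributions."""
    return [
        state[k] + sum(on * gain[k] for on, gain in zip(decision, GAINS))
        for k in range(len(state))
    ]

ambient = [0.1, 0.1]          # current sensor levels (hypothetical)
set_point = [0.6, 0.2]        # desired sensor levels (hypothetical)
decision = choose_decision(ambient, set_point, toy_predict, n_lights=3)
print(decision)
```

Exhaustive enumeration grows as 2 to the power of the number of lights, which is manageable for seven binary lights but would need a smarter search for many devices or continuous dimmer levels.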
Neural Networks • Characteristics • Efficient • Advantages • Can learn arbitrary decision functions from training data • Generalizes to new situations • Fast decision making • Problems • Requires training data that contains desired decision or a goal/objective • Requires design of sufficient input representation
Reinforcement Learning • Reinforcement learning learns an optimal decision strategy from trial and error and sparse reward feedback. • On-line method to solve Markov Decision Processes (or, with extensions, POMDPs). • Reward, R, is a signal encoding the instantaneous feedback to the system. • The system learns a mapping from states to decisions, $\pi^*(x_i)$, which optimizes the expected utility.
Q-Learning • Q-learning is one of the most widely used reinforcement learning techniques for MDPs. • Learns a utility function for state-action pairs • $Q(x, d)$ • Utility $U(x) = \max_d Q(x, d)$ • Learns by experimentation. • Updates $Q(x_i, d)$ after each observed transition from state $x_i$ by comparing the expected utility of $(x_i, d)$ with the expectation computed after observing the actual outcome $x_j$: $Q(x_i, d) \leftarrow Q(x_i, d) + \alpha\,\bigl(R(x_i) + \gamma \max_{d'} Q(x_j, d') - Q(x_i, d)\bigr)$, where $\alpha$ is the learning rate and $\gamma$ the discount factor • Decisions are made to optimize Q-values • $\pi(x) = \arg\max_d Q(x, d)$
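The update rule can be tried out on a toy problem. The sketch below is tabular Q-learning on a hypothetical five-state corridor with a rewarding terminal state at the right end; the environment, the epsilon-greedy exploration, and the parameter values are assumptions for illustration and are unrelated to the systems described on the following slides.

```python
import random

N_STATES = 5                       # states 0..4; state 4 is terminal
ACTIONS = ("left", "right")
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def reward(x):
    """R(x): +1 in the terminal state, a small cost everywhere else."""
    return 1.0 if x == N_STATES - 1 else -0.1

def transition(x, d):
    """Deterministic corridor: left is blocked at state 0."""
    return max(0, x - 1) if d == "left" else x + 1

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = {(x, d): 0.0 for x in range(N_STATES - 1) for d in ACTIONS}
    for _ in range(episodes):
        x = 0
        while x != N_STATES - 1:
            # Epsilon-greedy choice between exploring and exploiting.
            if rng.random() < EPSILON:
                d = rng.choice(ACTIONS)
            else:
                d = max(ACTIONS, key=lambda a: Q[(x, a)])
            nxt = transition(x, d)
            # Utility of the successor: terminal reward or best Q-value.
            if nxt == N_STATES - 1:
                future = reward(nxt)
            else:
                future = max(Q[(nxt, a)] for a in ACTIONS)
            # Q(x,d) <- Q(x,d) + alpha * (R(x) + gamma * max Q(x',d') - Q(x,d))
            Q[(x, d)] += ALPHA * (reward(x) + GAMMA * future - Q[(x, d)])
            x = nxt
    return Q

Q = q_learning()
policy = {x: max(ACTIONS, key=lambda a: Q[(x, a)]) for x in range(N_STATES - 1)}
print(policy)
```

No transition model is passed to the learner; it only sees the observed successor state and reward, which is what makes the method usable on-line.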
Example System: Regulation in the Adaptive House [Moz98] • Neural network regulators can control lighting and heating to achieve a given set point • Set point is learned using reinforcement • Energy usage • Inhabitant interactions with light switches or thermostats [Moz98] Mozer, M. C. The neural network house: An environment that adapts to its inhabitants. In Proc. AAAI Spring Symposium on Intelligent Environments (pp. 110-114). Menlo Park, CA, 1998.
Example System: MavHome • Uses Q-learning on a state space including device status and the Active LeZi prediction. • State $s_t$ at time $t$: $s_t = (x_t, p_t)$ • Reinforcement includes multiple metrics • Energy usage • Number of inhabitant-device interactions • Decisions are device interactions and an action representing the decision not to perform an action. • The system is event-driven, making a decision every time an event happens. • Learner is pre-trained using the Active LeZi predictor.
Example System: MavHome • Example task: getting up in the morning and taking a shower.
Example System: MavHome • Home learns to automate light activations so as to minimize energy usage without increasing the number of inhabitant interactions
Reinforcement Learning • Characteristics • Optimal policies (given enough training) • Advantages • Can learn optimal decision strategies without explicit training • Can deal with multiple objectives • Problems • Trial and error learning can lead to spurious actions leading to potential safety issues • Requires complete state space representations • Can be very complex