Fusing Machine Learning & Control Theory With Applications to Smart Buildings & ActionWebs

UC Berkeley ActionWebs Meeting
November 03, 2010
By Jeremy Gillula
[Some rights reserved unless otherwise noted; see http://tinyurl.com/2qn665]

Talk Outline
• Current “State of the Art”
  • Reinforcement learning and apprenticeship learning
  • Reachability for guaranteed safe mode switching
• Motivation
• Goals – Combining Machine Learning and Control Theory
• Existing Approaches
• Current Research
• Extensions
• Conclusions
• Questions
ActionWebs Talk (J. Gillula)

“An Application of Reinforcement Learning to Aerobatic Helicopter Flight” (Abbeel et al., 2007)
• Use linear regression to learn parameters of given model
• Use differential dynamic programming to solve the MDP
  • Generate trajectory using current policy and nonlinear dynamics
  • Compute new policy using LQR and linearized dynamics around that trajectory
• Reward function generated using apprenticeship learning
• Analysis:
  • Great performance
  • No formal safety analysis
  • Required some hand-tweaking for stability (e.g. hand-chosen reward weights)
  • Easily generalizable
[Video from Abbeel et al. 2007]
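
The two inner steps on this slide (fit a model by linear regression, then compute a policy with LQR on linear dynamics) can be sketched roughly as follows. This is a minimal illustration, not the method of Abbeel et al.: it assumes a purely linear discrete-time model, and the function names, the double-integrator test system, and the cost weights are all hypothetical. A full DDP loop would re-linearize around each new trajectory and repeat.

```python
import numpy as np

def fit_linear_model(states, inputs, next_states):
    """Least-squares fit of x_next ~= A x + B u from logged transitions."""
    n = states.shape[1]
    Z = np.hstack([states, inputs])              # each row is [x^T, u^T]
    theta, *_ = np.linalg.lstsq(Z, next_states, rcond=None)
    return theta[:n].T, theta[n:].T              # theta stacks [A^T; B^T]

def lqr_backward_pass(A, B, Q, R, horizon):
    """Finite-horizon LQR gains (u_t = -K_t x_t) via backward Riccati recursion."""
    P = Q.copy()                                 # terminal cost-to-go
    gains = []
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()                              # gains[0] applies at t = 0
    return gains

# Hypothetical test system: a double integrator with a 0.1 s time step.
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])
states = rng.normal(size=(50, 2))
inputs = rng.normal(size=(50, 1))
next_states = states @ A_true.T + inputs @ B_true.T   # noise-free data

A_hat, B_hat = fit_linear_model(states, inputs, next_states)
gains = lqr_backward_pass(A_hat, B_hat, np.eye(2), np.eye(1), horizon=200)
closed_loop = A_hat - B_hat @ gains[0]
```

With noise-free data the regression recovers the true matrices, and the first-step gain stabilizes the learned model. Nothing in this sketch says anything about safety, which is the gap the talk points at.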

“Design of Guaranteed Safe Maneuvers Using Reachable Sets…” (Gillula et al., 2010)
• Safe, assuming the model is accurate and disturbances stay within worst-case bounds
• Used reachability analysis via level-set methods to design and perform a safe backflip

“A…Hamilton–Jacobi Formulation of Reachable Sets for Continuous Dynamic Games” (Mitchell et al., 2005)
• Create a level set function φ(x) such that:
  • Boundary of keep-out set K is defined implicitly by φ(x) = 0
  • φ(x) is negative inside the region and positive outside
• Reachability as a game:
  • Disturbance attempts to force system into unsafe region, control attempts to stay safe
• Solution can be found via a Hamilton-Jacobi-Bellman PDE
[Figure from Tomlin 2009]
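
The slide's own equations are not available here, so the following is a sketch of the standard level-set formulation from Mitchell et al. (2005), written in assumed notation: φ₀ is the implicit surface function of the keep-out set K, f(x, u, d) the system dynamics, u the control, d the disturbance, and t ≤ 0 runs backward from the terminal time.

```latex
% Keep-out set and terminal condition (notation assumed, not taken from the slide)
K = \{\, x : \phi_0(x) \le 0 \,\}, \qquad \phi(x, 0) = \phi_0(x)

% Time-dependent Hamilton-Jacobi PDE; control maximizes, disturbance minimizes
\frac{\partial \phi}{\partial t}(x,t)
  + \min\!\bigl[\, 0,\; H\bigl(x, \nabla_x \phi(x,t)\bigr) \bigr] = 0,
\qquad
H(x, p) = \max_{u \in \mathcal{U}} \; \min_{d \in \mathcal{D}} \; p^{\top} f(x, u, d)

% Backward reachable (unsafe) set over a horizon \tau \ge 0
\mathcal{G}(\tau) = \{\, x : \phi(x, -\tau) \le 0 \,\}
```

Because the control tries to keep φ positive while the disturbance tries to drive it negative, the game form of this equation is often called Hamilton-Jacobi-Isaacs. The min with 0 ensures the unsafe set can only grow as the horizon lengthens.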

“Design of Guaranteed Safe Maneuvers Using Reachable Sets…” (Gillula et al., 2010)
• Analysis:
  • Decent performance
  • Formal safety analysis
  • Required human input for choosing design parameters
  • Difficult to generalize
[Figure labels: Recovery, Drift, Impulse]

Motivation: “Machine Learning” Techniques vs. “Control Theory” Techniques

Goals/Research Statement
• How can we get high performance on complicated systems while still guaranteeing safety?
• Take advantage of “Machine Learning” techniques for performance
  • Data-driven models (potentially nonparametric)
  • Data-driven, sampling-based techniques for estimation and control
• While getting “Control Theory”-style safety guarantees
  • Formal, principled analyses of safety
• Several possible approaches
  • Adapt data-driven methods to existing safety-analysis techniques
  • Closely couple data-driven methods with techniques for generating safety guarantees
  • Use data-driven techniques in the context of existing safety-analysis techniques
  • Other alternatives

“System Identification of Post Stall Aerodynamics for UAV Perching” (Hoburg and Tedrake, 2009)
Approach: Adapt data-driven methods to existing safety-analysis techniques
• Nonlinear and transient aerodynamics in perching
  • Need to learn model from data
• Use physically-inspired basis functions
  • Nonlinear functions of state x, z, µ, etc.
• Compute least-squares fit for every combination of n basis functions: [equation not preserved]
• Analysis/Extensions:
  • Use standard control theory techniques to generate safety guarantees
  • Use lasso or other regularization to choose basis functions
[Figures from Hoburg and Tedrake 2009]
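
The "least-squares fit for every combination of n basis functions" step can be illustrated with an exhaustive subset search. This is a sketch under stated assumptions: the candidate basis functions and the synthetic target below are hypothetical stand-ins, not the aerodynamic terms used by Hoburg and Tedrake.

```python
import itertools
import numpy as np

def best_subset_fit(Phi, y, n):
    """Try every n-column subset of the basis matrix Phi and keep the
    least-squares fit with the smallest squared residual."""
    best = None
    for cols in itertools.combinations(range(Phi.shape[1]), n):
        sub = Phi[:, list(cols)]
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
        err = float(np.sum((sub @ coef - y) ** 2))
        if best is None or err < best[0]:
            best = (err, cols, coef)
    return best

# Hypothetical candidate basis functions evaluated on sample points x.
x = np.linspace(-1.0, 1.0, 40)
Phi = np.column_stack([np.ones_like(x), x, x**2, np.sin(3 * x), x**3])

# Synthetic "measurements" built from columns 2 and 3 only.
y = 2.0 * x**2 - 0.5 * np.sin(3 * x)

err, cols, coef = best_subset_fit(Phi, y, n=2)
```

The search cost grows combinatorially with the number of candidates, which is one reason the slide suggests lasso or another regularizer as an extension.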

“Predictive Guidance Intercept Using The Neural EKF Tracker” (Stubberud and Kramer, 2007)
Approach: Closely couple data-driven methods with techniques for generating safety guarantees
• Augmented process model is: [equation not preserved]
• Use an adaptive EKF to learn the error: [equation not preserved]
• Let augmented state be: [equation not preserved; annotated “NN weights”]
• Then: [equation not preserved]

“Predictive Guidance Intercept Using The Neural EKF Tracker” (Stubberud and Kramer, 2007)
Approach: Closely couple data-driven methods with techniques for generating safety guarantees
• Then associated Jacobian is: [equation not preserved]
  so state estimation and NN training are coupled
• Normal EKF analysis follows
• Analysis:
  • Learns model error
  • Learning done online
  • But combining ML and control theory tools can be tricky
    • E.g. augmented system is not observable
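
The equations for the neural EKF are not available here. The block below is a generic sketch of how such an augmented-state construction is commonly written, not the exact formulation of Stubberud and Kramer. All symbols are assumptions: f is the nominal process model, g(x, w) is a neural network with weights w that models the error of f, and noise terms are omitted.

```latex
% Process model with a learned error term (assumed notation)
x_{k+1} = f(x_k) + g(x_k, w_k)

% Augmented state: physical state stacked with the NN weights
\bar{x}_k = \begin{bmatrix} x_k \\ w_k \end{bmatrix},
\qquad
\bar{x}_{k+1} = \begin{bmatrix} f(x_k) + g(x_k, w_k) \\ w_k \end{bmatrix}

% Jacobian of the augmented dynamics used in the EKF prediction step
\bar{F}_k =
\begin{bmatrix}
  \dfrac{\partial f}{\partial x} + \dfrac{\partial g}{\partial x} & \dfrac{\partial g}{\partial w} \\[6pt]
  0 & I
\end{bmatrix}
```

The off-diagonal block ∂g/∂w is what ties the weight update to the state estimate: a single EKF update corrects both at once, which is the coupling the slide refers to.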

Safely Learning A Bounded System
Approach: Use data-driven techniques in the context of existing safety-analysis techniques
• Learning unknown dynamics of a target vehicle via observation
• Limited field of view
  • Safety = always keeping target in view, i.e. [equation not preserved]
• Bounded system
  • Assume target dynamics are autonomous and bounded, i.e. [equation not preserved]
  • Measurement model given by: [equation not preserved]
[Pioneer image courtesy University of Queensland, http://tinyurl.com/38dje6f]

Safely Learning A Bounded System
Approach: Use data-driven techniques in the context of existing safety-analysis techniques
• Problem statement
  (1) Learn target dynamics
  (2) Minimize error: [equation not preserved]
  (3) Maintain target in view: [equation not preserved]
• For (1) use machine learning:
  • Fixed model w/ linear regression
  • Physically inspired basis functions
  • Neural network
• (1) leads to (2) via EKF, UKF, or PF
• (3) requires controlling our vehicle’s position and height
[Pioneer image courtesy University of Queensland, http://tinyurl.com/38dje6f]

Safely Learning A Bounded System
Approach: Use data-driven techniques in the context of existing safety-analysis techniques
• For (3) use reachability:
  • Unsafe set
  • Treat target motion as adversarial disturbance
  • Augmented system dynamics: [equation not preserved]
• Result:
  • Can use any learning/tracking algorithm
  • Reachability only kicks in on the border of the unsafe set
[Pioneer image courtesy University of Queensland, http://tinyurl.com/38dje6f]
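
The "reachability only kicks in on the border" idea can be illustrated with a one-dimensional toy problem. Everything here is a hypothetical sketch, not the system from the talk: r is the target's position relative to the observer, the field of view is |r| <= R_MAX, the observer speed is bounded by U_MAX, and the target speed by D_MAX < U_MAX. For this toy case the safe set and safe control are written by hand rather than computed from a level-set solver.

```python
import numpy as np

R_MAX = 1.0    # half-width of the field of view (hypothetical)
U_MAX = 1.0    # observer speed bound (hypothetical)
D_MAX = 0.5    # target speed bound, treated as adversarial disturbance
DT = 0.05      # simulation time step
MARGIN = DT * (U_MAX + D_MAX)   # largest change in r over one step

def phi(r):
    """Level-set style function: positive in view, negative out of view."""
    return R_MAX - abs(r)

def switched_control(r, u_learning):
    """Follow the learning/exploration command unless near the border."""
    if phi(r) > MARGIN:
        return float(np.clip(u_learning, -U_MAX, U_MAX))
    return float(U_MAX * np.sign(r))   # safe action: chase the target

def simulate(steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    r, trace = 0.0, []
    for _ in range(steps):
        u_explore = rng.uniform(-U_MAX, U_MAX)      # stand-in for any learner
        d = D_MAX * np.sign(r) if r != 0 else D_MAX  # target flees outward
        u = switched_control(r, u_explore)
        r = r + DT * (d - u)                         # relative dynamics
        trace.append(r)
    return np.array(trace)

trace = simulate()
```

The exploration policy is arbitrary, yet the target never leaves the field of view, because the safe action overrides it only inside a thin band near the border. The margin is chosen so that one discrete step cannot jump across that band.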

Caveat
• What follows is pure brainstorming
• Feedback and suggestions are welcome

Safely Learning A Bounded System
Approach: Use data-driven techniques in the context of existing safety-analysis techniques
• Possible extension: safe autonomous data collection/learning
  • Attempt to learn/modify building model (or control policy) online
  • Start w/ basic physics model (or control policy)
  • Assume bounded errors as disturbance
  • Reachability allows any exploration policy to be followed while the system is safe
[Image courtesy Jorge Ortiz, http://tinyurl.com/2dnz5jl]

Safely Learning A Bounded System
Approach: Use data-driven techniques in the context of existing safety-analysis techniques
• Limited acceptable range
  • Safety = always keeping target states within acceptable tolerances, i.e. [equation not preserved]
• Bounded system
  • Assume target dynamics are bounded, i.e. [equation not preserved]
• Problem statement
  (1) Learn system dynamics
  (2) Minimize error: [equation not preserved]
  (3) Maintain target states in safe region: [equation not preserved]
• Proposed Approach
  • Use machine learning
  • Use the results of (1) with optimal control
  • Use reachability
[Image courtesy Jorge Ortiz, http://tinyurl.com/2dnz5jl]

Safely Learning A Bounded System ActionWeb
Approach: Use data-driven techniques in the context of existing safety-analysis techniques
• Difficulties:
  • Reachable set calculations for high dimensions
  • And they need to be online
[Image courtesy David Culler, http://tinyurl.com/2bcaqnh]

Safely Learning A Bounded System ActionWeb
Approach: Use data-driven techniques in the context of existing safety-analysis techniques
• Solution: Building decomposition
  • Decompose building into separate rooms
  • Model each room in parallel
  • Treat interactions between rooms as bounded adversarial inputs
  • Still fits in machine learning framework (can still model interactions)
  • Still fits in reachability framework (can still calculate safe sets)
[Image courtesy Claire Tomlin, http://tinyurl.com/26bpcl8]
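
One way to picture the decomposition is a per-room check that a comfort band can be held against the worst-case heat flow from neighboring rooms. This is a brainstorming-level sketch in the spirit of the slide: the scalar thermal model dT/dt = leak * (T_ambient - T) + u + d, the `Room` fields, and all numbers are hypothetical, and the check below is a hand-derived scalar invariance condition rather than a general reachable-set computation.

```python
from dataclasses import dataclass

@dataclass
class Room:
    leak: float            # heat exchange rate with ambient (1/h)
    u_min: float           # strongest cooling rate (K/h, negative)
    u_max: float           # strongest heating rate (K/h)
    coupling_bound: float  # bound on neighbor heat flow |d| (K/h)

def comfort_band_is_invariant(room, t_lo, t_hi, t_ambient):
    """True if the controller can hold T in [t_lo, t_hi] for every
    neighbor interaction with |d| <= coupling_bound."""
    # Upper edge: neighbors heat as hard as allowed, controller cools fully.
    rate_at_hi = room.leak * (t_ambient - t_hi) + room.u_min + room.coupling_bound
    # Lower edge: neighbors cool as hard as allowed, controller heats fully.
    rate_at_lo = room.leak * (t_ambient - t_lo) + room.u_max - room.coupling_bound
    return rate_at_hi <= 0.0 and rate_at_lo >= 0.0

# Each room is checked independently, so the checks can run in parallel.
rooms = {
    "office": Room(leak=0.2, u_min=-3.0, u_max=3.0, coupling_bound=1.0),
    "server_room": Room(leak=0.1, u_min=-1.0, u_max=1.0, coupling_bound=2.5),
}
results = {
    name: comfort_band_is_invariant(room, t_lo=20.0, t_hi=24.0, t_ambient=30.0)
    for name, room in rooms.items()
}
```

Treating the coupling as adversarial is conservative: a room may fail this check even though its real neighbors would never behave that badly. A learned interaction model could tighten `coupling_bound` over time.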

Conclusions
• Combining Machine Learning and Control Theory
  • Achieving high performance on complicated systems while still guaranteeing safety
• Possible Approaches:
  • Adapt data-driven methods to existing safety-analysis techniques
  • Closely couple data-driven methods with techniques for generating safety guarantees
  • Use data-driven techniques in the context of existing safety-analysis techniques
• Extension to smart buildings and ActionWebs

Questions?
