STANFORD
Models in Prior Work
• Predict velocities and angular rates with a function f learned from data.
• Obtain position and orientation by numerical integration.
Shortcomings
• From physics we have:
• The body coordinate frame is different at every time step. This makes inertia highly non-linear in the state and very difficult to capture or learn from data.
• For most physical systems, forces and torques have a fairly simple relation to the inputs and current state. This simplicity is lost through the change of coordinate frame.
First Autonomous Funnel
• Aerobatic maneuver.
• Method: model-based reinforcement learning.
• Simulator:
  • Acceleration prediction.
  • Longer time-scale criterion.
• Acknowledgments: the control work is joint with Adam Coates and Ben Tse. (Paper forthcoming.)
Figure: rotation between the body coordinate frames at times t and t+1, with accelerations.
Video available.
Simulator Accuracy
Our acceleration prediction model
• Predict accelerations with a function f learned from data.
• Obtain velocity, angular rates, position and orientation by numerical integration.
Advantages
• No need to learn inertia from data: constraints from physics are incorporated explicitly.
• The relation between state, inputs and accelerations is not cluttered by the change of coordinate frame, and is thus easier to learn from data.
Standard learning criteria
• Frequency-domain fitting: requires a linear model; used in CIFER (industry standard).
• Minimizing the one-step prediction error. For f linear in the state s and inputs u, f can be found by linear regression.
Longer time-scale criterion
• Accuracy of simulation over longer time-scales is important for control. A longer time-scale criterion, with H the time-scale of interest, was suggested in [Abbeel & Ng, 2004].
• Maximizing this criterion with the EM algorithm is expensive in our continuous state-action space setting. We present a simple and fast algorithm for (approximately) minimizing the average squared error over a certain duration.
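The criteria above can be compared on a linear model. The following Python/NumPy sketch assumes the linear form s_{t+1} ≈ A s_t + B u_t (A and B as in the algorithm sketch on this poster) and measures the longer time-scale error as the average squared error of the 1- to H-step predictions, obtained by repeatedly applying ŝ_{t+k+1|t} = A ŝ_{t+k|t} + B u_{t+k}; the exact normalization is an assumption. All function names are hypothetical. `fit_one_step` is the one-step criterion solved by linear regression; `fit_multi_step` is an illustrative alternating least-squares heuristic in the spirit of the two-step iteration described in the sketch, not the authors' full algorithm.

```python
import numpy as np


def fit_one_step(S, U):
    """One-step criterion: least-squares fit of s_{t+1} ~ A s_t + B u_t.

    S: (T, n) states, U: (T, m) inputs. Returns A (n, n) and B (n, m).
    """
    X = np.hstack([S[:-1], U[:-1]])                 # regressors (s_t, u_t)
    W, *_ = np.linalg.lstsq(X, S[1:], rcond=None)   # solves X @ W ~ s_{t+1}
    n = S.shape[1]
    return W[:n].T, W[n:].T


def predictions(A, B, S, U, H):
    """P[t, k] = hat{s}_{t+k|t} for k = 0..H, started from every t in 0..T-H-1."""
    T = S.shape[0]
    P = np.empty((T - H, H + 1, S.shape[1]))
    P[:, 0] = S[: T - H]                            # start from the observed s_t
    for k in range(H):
        # hat{s}_{t+k+1|t} = A hat{s}_{t+k|t} + B u_{t+k}
        P[:, k + 1] = P[:, k] @ A.T + U[k : T - H + k] @ B.T
    return P


def h_step_error(A, B, S, U, H):
    """Average squared error of the 1..H-step predictions (assumed normalization)."""
    T = S.shape[0]
    P = predictions(A, B, S, U, H)
    sq = [np.sum((P[:, k] - S[k : T - H + k]) ** 2) for k in range(1, H + 1)]
    return sum(sq) / (H * (T - H))


def fit_multi_step(S, U, H, iters=10):
    """Hypothetical alternating heuristic, not the authors' exact algorithm.

    Repeats: (1) compute multi-step predictions with the current A, B;
    (2) re-fit A, B by regressing s_{t+k+1} on (hat{s}_{t+k|t}, u_{t+k}).
    """
    A, B = fit_one_step(S, U)
    T, n = S.shape
    for _ in range(iters):
        P = predictions(A, B, S, U, H)
        X = np.vstack([np.hstack([P[:, k], U[k : T - H + k]]) for k in range(H)])
        Y = np.vstack([S[k + 1 : T - H + k + 1] for k in range(H)])
        W, *_ = np.linalg.lstsq(X, Y, rcond=None)
        A, B = W[:n].T, W[n:].T
    return A, B


# Usage on synthetic, noise-free data from a known linear system.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.0], [0.5]])
T = 200
U = rng.normal(size=(T, 1))
S = np.zeros((T, 2))
for t in range(T - 1):
    S[t + 1] = A_true @ S[t] + B_true @ U[t]

A_hat, B_hat = fit_multi_step(S, U, H=5)
print(h_step_error(A_hat, B_hat, S, U, H=5))  # close to 0 for noise-free data
```

On noise-free data both fits recover A and B; with real flight data they generally differ, and `h_step_error` gives one way to compare them at the time-scale H that matters for control.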
Sketch of algorithmic idea (see paper for full algorithm)
• Model:
• One-step prediction at time t:
• One-step prediction at time t+1:
• Two-step prediction at time t:
• Therefore, multiple-step dynamics can be approximated by a linear combination of one-step dynamics.
• Our algorithm iterates the following two steps:
  • Compute the estimate of s_{t+1} given s_t, u_t, u_{t+1} for the current model A, B.
  • Estimate
Figure panels: Bergen Industrial Twin; XCell Tempest.
Observations
• The acceleration prediction model is significantly better. Reasons:
  • It captures gravity exactly.
  • It captures inertia, and thus the side-slip effects in the data.
• The longer time-scale criterion outperforms CIFER, which in turn outperforms the one-step criterion.
• The differences are more significant for the Tempest than for the Bergen, since the Bergen data is mostly around hover.
Legend
• Linear model, one-step prediction error.
• Linear model, frequency-domain fit with CIFER.
• Linear model, longer time-scale prediction error.
• Acceleration model, one-step prediction error.
• Acceleration model, longer time-scale prediction error.
Helicopter State and Inputs
• 12-D state: position, orientation (roll, pitch, yaw), velocity, angular rates.
• 8-D state:
• u1, u2: The longitudinal (front-back) and lateral (left-right) cyclic pitch controls cause the helicopter to pitch forward/backward or roll sideways.
• u3: The tail rotor collective pitch control affects the tail rotor thrust, and can be used to yaw (turn) the helicopter.
• u4: The main rotor collective pitch control affects the main rotor's thrust.
Figure: body coordinate frame attached to the helicopter, labeled with orientation (roll, pitch, yaw), angular rates, position and velocity. Symmetries are encoded using body (= robot-centric) coordinates.
Learning Vehicular Dynamics, with Application to Modeling Helicopters
Overview
• Model-based reinforcement learning has been very successful.
• State of the art:
  • Reinforcement learning returns policies that fly well in simulation.
  • Remaining helicopter failures are typically caused by inaccurate simulation.
• Key technical challenge: building an accurate simulator.
• Our approach:
  • Encode all constraints known from physics (gravity, inertia, etc.). Learn only the parts of the model not determined by physics.
  • Explicitly learn a simulator that is predictive at long time-scales.
• Result:
  • Significantly improved helicopter model.
  • First autonomous funnel (aerobatic maneuver) using our model.
Figure: RC helicopters (Bergen Industrial Twin, XCell Tempest).
Pieter Abbeel, Varun Ganapathi, Andrew Y. Ng
Conclusion
• Key technical challenge for model-based reinforcement learning applied to helicopters: building an accurate simulator.
• Our approach:
  • By using an acceleration-based approach, we can encode all constraints known from physics (gravity, inertia, etc.) and learn only the parts of the model not determined by physics.
  • Explicitly learn a simulator that is predictive at long time-scales.
• Result:
  • Significantly improved helicopter model.
  • First autonomous funnel (aerobatic maneuver) using our model.
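The acceleration-based model can be sketched as a single simulation step. This minimal Python/NumPy example assumes a z-up world frame, a body-to-world rotation matrix R, explicit Euler integration, and a hypothetical `accel_model(v_body, omega, u)` that stands in for the learned part and returns non-gravitational linear and angular accelerations in body coordinates. Gravity is added exactly in the world frame, and the change of body frame between time steps is handled by the integration rather than learned from data. All names here are illustrative, not the authors' code.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # world frame, z up (assumed convention), m/s^2


def skew(w):
    """Matrix [w]x such that skew(w) @ v equals np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def rotation_increment(omega, dt):
    """Exact rotation over dt for a constant body angular rate (Rodrigues' formula)."""
    rate = np.linalg.norm(omega)
    if rate * dt < 1e-12:
        return np.eye(3)
    K = skew(omega / rate)
    theta = rate * dt
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def step(pos, vel, R, omega, u, accel_model, dt):
    """One Euler step of a hypothetical acceleration-based simulator.

    pos, vel: world-frame position and velocity; R: body-to-world rotation;
    omega: body-frame angular rate; u: inputs (u1..u4).
    """
    v_body = R.T @ vel                                   # velocity in body coordinates
    a_body, alpha_body = accel_model(v_body, omega, u)   # learned part (assumed interface)
    acc_world = R @ a_body + GRAVITY                     # gravity added exactly
    new_pos = pos + dt * vel
    new_vel = vel + dt * acc_world                       # world-frame integration absorbs
                                                         # the rotating-frame (inertial) terms
    new_R = R @ rotation_increment(omega, dt)            # body rate: right-multiply
    new_omega = omega + dt * alpha_body
    return new_pos, new_vel, new_R, new_omega


def zero_model(v_body, omega, u):
    """Placeholder for a learned model: predicts no aerodynamic acceleration."""
    return np.zeros(3), np.zeros(3)


pos, vel, R, omega = np.zeros(3), np.zeros(3), np.eye(3), np.zeros(3)
for _ in range(100):  # 1 s at dt = 0.01 s
    pos, vel, R, omega = step(pos, vel, R, omega, np.zeros(4), zero_model, 0.01)
print(vel)  # free fall: z-velocity is about -9.81 m/s after 1 s
```

In practice `accel_model` would be fit to flight data (for example, by linear regression on body-frame quantities); `zero_model` here only checks that the integration reproduces free fall.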