Book Information:

Series: IEEE Press Series on Computational Intelligence
Title: Reinforcement Learning and Approximate Dynamic Programming for Feedback Control
Editors: Frank L. Lewis, Derong Liu
Publisher: Wiley-IEEE Press
Publication Date: December 2012
Language: English
ISBN: 9781118104200
Pages: 648

About the Book:

  Reinforcement learning (RL) and adaptive dynamic programming (ADP) have been among the most active research fields in science and engineering for modern complex systems. This book describes the latest RL and ADP techniques for decision and control in human-engineered systems, covering both single-player decision and control and multi-player games. Edited by pioneers of RL and ADP research, the book brings together ideas and methods from many fields and provides important and timely guidance on controlling a wide variety of systems, from robots and industrial processes to economic decision-making.

Table of Contents:
Preface
Contributors
Part I: Feedback Control Using RL and ADP
Chapter 1: Reinforcement Learning and Approximate Dynamic Programming (RLADP)—Foundations, Common Misconceptions, and the Challenges Ahead
  1.1 Introduction
  1.2 What is RLADP?
  1.3 Some Basic Challenges in Implementing ADP
Chapter 2: Stable Adaptive Neural Control of Partially Observable Dynamic Systems
  2.1 Introduction
  2.2 Background
  2.3 Stability Bias
  2.4 Example Application
Chapter 3: Optimal Control of Unknown Nonlinear Discrete-Time Systems Using the Iterative Globalized Dual Heuristic Programming Algorithm
  3.1 Background Material
  3.2 Neuro-Optimal Control Scheme Based on the Iterative ADP Algorithm
  3.3 Generalization
  3.4 Simulation Studies
  3.5 Summary
Chapter 4: Learning and Optimization in Hierarchical Adaptive Critic Design
  4.1 Introduction
  4.2 Hierarchical ADP Architecture with Multiple-Goal Representation
  4.3 Case Study: The Ball-and-Beam System
  4.4 Conclusions and Future Work
Chapter 5: Single Network Adaptive Critics Networks—Development, Analysis, and Applications
  5.1 Introduction
  5.2 Approximate Dynamic Programming
  5.3 SNAC
  5.4 J-SNAC
  5.5 Finite-SNAC
  5.6 Conclusions
Chapter 6: Linearly Solvable Optimal Control
  6.1 Introduction
  6.2 Linearly Solvable Optimal Control Problems
  6.3 Extension to Risk-Sensitive Control and Game Theory
  6.4 Properties and Algorithms
  6.5 Conclusions and Future Work
Chapter 7: Approximating Optimal Control with Value Gradient Learning
  7.1 Introduction
  7.2 Value Gradient Learning and BPTT Algorithms
  7.3 A Convergence Proof for VGL(1) for Control with Function Approximation
  7.4 Vertical Lander Experiment
  7.5 Conclusions
Chapter 8: A Constrained Backpropagation Approach to Function Approximation and Approximate Dynamic Programming
  8.1 Background
  8.2 Constrained Backpropagation (CPROP) Approach
  8.3 Solution of Partial Differential Equations in Nonstationary Environments
  8.4 Preserving Prior Knowledge in Exploratory Adaptive Critic Designs
  8.5 Summary
Chapter 9: Toward Design of Nonlinear ADP Learning Controllers with Performance Assurance
  9.1 Introduction
  9.2 Direct Heuristic Dynamic Programming
  9.3 A Control Theoretic View on the Direct HDP
  9.4 Direct HDP Design with Improved Performance Case 1—Design Guided by a Priori LQR Information
  9.5 Direct HDP Design with Improved Performance Case 2—Direct HDP for Coordinated Damping Control of Low-Frequency Oscillation
  9.6 Summary
Chapter 10: Reinforcement Learning Control with Time-Dependent Agent Dynamics
  10.1 Introduction
  10.2 Q-Learning
  10.3 Sampled Data Q-Learning
  10.4 System Dynamics Approximation
  10.5 Closing Remarks
Chapter 11: Online Optimal Control of Nonaffine Nonlinear Discrete-Time Systems without Using Value and Policy Iterations
  11.1 Introduction
  11.2 Background
  11.3 Reinforcement Learning Based Control
  11.4 Time-Based Adaptive Dynamic Programming-Based Optimal Control
  11.5 Simulation Result
Chapter 12: An Actor-Critic-Identifier Architecture for Adaptive Approximate Optimal Control
  12.1 Introduction
  12.2 Actor-Critic-Identifier Architecture for HJB Approximation
  12.3 Actor-Critic Design
  12.4 Identifier Design
  12.5 Convergence and Stability Analysis
  12.6 Simulation
  12.7 Conclusion
Chapter 13: Robust Adaptive Dynamic Programming
  13.1 Introduction
  13.2 Optimality Versus Robustness
  13.3 Robust-ADP Design for Disturbance Attenuation
  13.4 Robust-ADP for Partial-State Feedback Control
  13.5 Applications
  13.6 Summary
Part II: Learning and Control in Multiagent Games
Chapter 14: Hybrid Learning in Stochastic Games and Its Application in Network Security
  14.1 Introduction
  14.2 Two-Person Game
  14.3 Learning in NZSGs
  14.4 Main Results
  14.5 Security Application
  14.6 Conclusions and Future Work
Chapter 15: Integral Reinforcement Learning for Online Computation of Nash Strategies of Nonzero-Sum Differential Games
  15.1 Introduction
  15.2 Two-Player Games and Integral Reinforcement Learning
  15.3 Continuous-Time Value Iteration to Solve the Riccati Equation
  15.4 Online Algorithm to Solve Nonzero-Sum Games
  15.5 Analysis of the Online Learning Algorithm for NZS Games
  15.6 Simulation Result for the Online Game Algorithm
  15.7 Conclusion
Chapter 16: Online Learning Algorithms for Optimal Control and Dynamic Games
  16.1 Introduction
  16.2 Optimal Control and the Continuous-Time Hamilton–Jacobi–Bellman Equation
  16.3 Online Solution of Nonlinear Two-Player Zero-Sum Games and the Hamilton–Jacobi–Isaacs Equation
  16.4 Online Solution of Nonlinear Nonzero-Sum Games and Coupled Hamilton–Jacobi Equations
Part III: Foundations in MDP and RL
Chapter 17: Lambda-Policy Iteration: A Review and a New Implementation
  17.1 Introduction
  17.2 Lambda-Policy Iteration without Cost Function Approximation
  17.3 Approximate Policy Evaluation Using Projected Equations
  17.4 Lambda-Policy Iteration with Cost Function Approximation
  17.5 Conclusions
Chapter 18: Optimal Learning and Approximate Dynamic Programming
  18.1 Introduction
  18.2 Modeling
  18.3 The Four Classes of Policies
  18.4 Basic Learning Policies for Policy Search
  18.5 Optimal Learning Policies for Policy Search
  18.6 Learning with a Physical State
Chapter 19: An Introduction to Event-Based Optimization: Theory and Applications
  19.1 Introduction
  19.2 Literature Review
  19.3 Problem Formulation
  19.4 Policy Iteration for EBO
  19.5 Example: Material Handling Problem
  19.6 Conclusions
Chapter 20: Bounds for Markov Decision Processes
  20.1 Introduction
  20.2 Problem Formulation
  20.3 The Linear Programming Approach
  20.4 The Martingale Duality Approach
  20.5 The Pathwise Optimization Method
  20.6 Applications
  20.7 Conclusion
Chapter 21: Approximate Dynamic Programming and Backpropagation on Timescales
  21.1 Introduction: Timescales Fundamentals
  21.2 Dynamic Programming
  21.3 Backpropagation
  21.4 Conclusions
Chapter 22: A Survey of Optimistic Planning in Markov Decision Processes
  22.1 Introduction
  22.2 Optimistic Online Optimization
  22.3 Optimistic Planning Algorithms
  22.4 Related Planning Algorithms
  22.5 Numerical Example
Chapter 23: Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning
  23.1 Introduction
  23.2 The Framework
  23.3 The Feature Adaptation Scheme
  23.4 Convergence Analysis
  23.5 Application to Traffic Signal Control
  23.6 Conclusions
Chapter 24: Feature Selection for Neuro-Dynamic Programming
  24.1 Introduction
  24.2 Optimality Equations
  24.3 Neuro-Dynamic Algorithms
  24.4 Fluid Models
  24.5 Diffusion Models
  24.6 Mean Field Games
  24.7 Conclusions
Chapter 25: Approximate Dynamic Programming for Optimizing Oil Production
  25.1 Introduction
  25.2 Petroleum Reservoir Production Optimization Problem
  25.3 Review of Dynamic Programming and Approximate Dynamic Programming
  25.4 Approximate Dynamic Programming Algorithm for Reservoir Production Optimization
  25.5 Simulation Results
  25.6 Concluding Remarks
Chapter 26: A Learning Strategy for Source Tracking in Unstructured Environments
  26.1 Introduction
  26.2 Reinforcement Learning
  26.3 Light-Following Robot
  26.4 Simulation Results
  26.5 Experimental Results
  26.6 Conclusions and Future Work
References
Index

