About the Book:
Reinforcement learning (RL) and adaptive dynamic programming (ADP) have become one of the most critical research fields in science and engineering for modern complex systems. This book describes the latest RL and ADP techniques for decision and control in human-engineered systems, covering both single-player decision and control and multi-player games. Edited by pioneers of RL and ADP research, the book brings together ideas and methods from many fields and provides important and timely guidance on controlling a wide variety of systems, such as robots and industrial processes, as well as on economic decision-making.
Table of Contents:
Preface
Contributors
Part I: Feedback Control Using RL and ADP
Chapter 1: Reinforcement Learning and Approximate Dynamic Programming (RLADP)—Foundations, Common Misconceptions, and the Challenges Ahead
1.1 Introduction
1.2 What is RLADP?
1.3 Some Basic Challenges in Implementing ADP
Chapter 2: Stable Adaptive Neural Control of Partially Observable Dynamic Systems
2.1 Introduction
2.2 Background
2.3 Stability Bias
2.4 Example Application
Chapter 3: Optimal Control of Unknown Nonlinear Discrete-Time Systems Using the Iterative Globalized Dual Heuristic Programming Algorithm
3.1 Background Material
3.2 Neuro-Optimal Control Scheme Based on the Iterative ADP Algorithm
3.3 Generalization
3.4 Simulation Studies
3.5 Summary
Chapter 4: Learning and Optimization in Hierarchical Adaptive Critic Design
4.1 Introduction
4.2 Hierarchical ADP Architecture with Multiple-Goal Representation
4.3 Case Study: The Ball-and-Beam System
4.4 Conclusions and Future Work
Chapter 5: Single Network Adaptive Critics Networks—Development, Analysis, and Applications
5.1 Introduction
5.2 Approximate Dynamic Programming
5.3 SNAC
5.4 J-SNAC
5.5 Finite-SNAC
5.6 Conclusions
Chapter 6: Linearly Solvable Optimal Control
6.1 Introduction
6.2 Linearly Solvable Optimal Control Problems
6.3 Extension to Risk-Sensitive Control and Game Theory
6.4 Properties and Algorithms
6.5 Conclusions and Future Work
Chapter 7: Approximating Optimal Control with Value Gradient Learning
7.1 Introduction
7.2 Value Gradient Learning and BPTT Algorithms
7.3 A Convergence Proof for VGL(1) for Control with Function Approximation
7.4 Vertical Lander Experiment
7.5 Conclusions
Chapter 8: A Constrained Backpropagation Approach to Function Approximation and Approximate Dynamic Programming
8.1 Background
8.2 Constrained Backpropagation (CPROP) Approach
8.3 Solution of Partial Differential Equations in Nonstationary Environments
8.4 Preserving Prior Knowledge in Exploratory Adaptive Critic Designs
8.5 Summary
Chapter 9: Toward Design of Nonlinear ADP Learning Controllers with Performance Assurance
9.1 Introduction
9.2 Direct Heuristic Dynamic Programming
9.3 A Control Theoretic View on the Direct HDP
9.4 Direct HDP Design with Improved Performance Case 1—Design Guided by a Priori LQR Information
9.5 Direct HDP Design with Improved Performance Case 2—Direct HDP for Coordinated Damping Control of Low-Frequency Oscillation
9.6 Summary
Chapter 10: Reinforcement Learning Control with Time-Dependent Agent Dynamics
10.1 Introduction
10.2 Q-Learning
10.3 Sampled Data Q-Learning
10.4 System Dynamics Approximation
10.5 Closing Remarks
Chapter 11: Online Optimal Control of Nonaffine Nonlinear Discrete-Time Systems without Using Value and Policy Iterations
11.1 Introduction
11.2 Background
11.3 Reinforcement Learning Based Control
11.4 Time-Based Adaptive Dynamic Programming-Based Optimal Control
11.5 Simulation Result
Chapter 12: An Actor-Critic-Identifier Architecture for Adaptive Approximate Optimal Control
12.1 Introduction
12.2 Actor-Critic-Identifier Architecture for HJB Approximation
12.3 Actor-Critic Design
12.4 Identifier Design
12.5 Convergence and Stability Analysis
12.6 Simulation
12.7 Conclusion
Chapter 13: Robust Adaptive Dynamic Programming
13.1 Introduction
13.2 Optimality Versus Robustness
13.3 Robust-ADP Design for Disturbance Attenuation
13.4 Robust-ADP for Partial-State Feedback Control
13.5 Applications
13.6 Summary
Part II: Learning and Control in Multiagent Games
Chapter 14: Hybrid Learning in Stochastic Games and Its Application in Network Security
14.1 Introduction
14.2 Two-Person Game
14.3 Learning in NZSGs
14.4 Main Results
14.5 Security Application
14.6 Conclusions and Future Work
Chapter 15: Integral Reinforcement Learning for Online Computation of Nash Strategies of Nonzero-Sum Differential Games
15.1 Introduction
15.2 Two-Player Games and Integral Reinforcement Learning
15.3 Continuous-Time Value Iteration to Solve the Riccati Equation
15.4 Online Algorithm to Solve Nonzero-Sum Games
15.5 Analysis of the Online Learning Algorithm for NZS Games
15.6 Simulation Result for the Online Game Algorithm
15.7 Conclusion
Chapter 16: Online Learning Algorithms for Optimal Control and Dynamic Games
16.1 Introduction
16.2 Optimal Control and the Continuous Time Hamilton–Jacobi–Bellman Equation
16.3 Online Solution of Nonlinear Two-Player Zero-Sum Games and the Hamilton–Jacobi–Isaacs Equation
16.4 Online Solution of Nonlinear Nonzero-Sum Games and Coupled Hamilton–Jacobi Equations
Part III: Foundations in MDP and RL
Chapter 17: Lambda-Policy Iteration: A Review and a New Implementation
17.1 Introduction
17.2 Lambda-Policy Iteration without Cost Function Approximation
17.3 Approximate Policy Evaluation Using Projected Equations
17.4 Lambda-Policy Iteration with Cost Function Approximation
17.5 Conclusions
Chapter 18: Optimal Learning and Approximate Dynamic Programming
18.1 Introduction
18.2 Modeling
18.3 The Four Classes of Policies
18.4 Basic Learning Policies for Policy Search
18.5 Optimal Learning Policies for Policy Search
18.6 Learning with a Physical State
Chapter 19: An Introduction to Event-Based Optimization: Theory and Applications
19.1 Introduction
19.2 Literature Review
19.3 Problem Formulation
19.4 Policy Iteration for EBO
19.5 Example: Material Handling Problem
19.6 Conclusions
Chapter 20: Bounds for Markov Decision Processes
20.1 Introduction
20.2 Problem Formulation
20.3 The Linear Programming Approach
20.4 The Martingale Duality Approach
20.5 The Pathwise Optimization Method
20.6 Applications
20.7 Conclusion
Chapter 21: Approximate Dynamic Programming and Backpropagation on Timescales
21.1 Introduction: Timescales Fundamentals
21.2 Dynamic Programming
21.3 Backpropagation
21.4 Conclusions
Chapter 22: A Survey of Optimistic Planning in Markov Decision Processes
22.1 Introduction
22.2 Optimistic Online Optimization
22.3 Optimistic Planning Algorithms
22.4 Related Planning Algorithms
22.5 Numerical Example
Chapter 23: Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning
23.1 Introduction
23.2 The Framework
23.3 The Feature Adaptation Scheme
23.4 Convergence Analysis
23.5 Application to Traffic Signal Control
23.6 Conclusions
Chapter 24: Feature Selection for Neuro-Dynamic Programming
24.1 Introduction
24.2 Optimality Equations
24.3 Neuro-Dynamic Algorithms
24.4 Fluid Models
24.5 Diffusion Models
24.6 Mean Field Games
24.7 Conclusions
Chapter 25: Approximate Dynamic Programming for Optimizing Oil Production
25.1 Introduction
25.2 Petroleum Reservoir Production Optimization Problem
25.3 Review of Dynamic Programming and Approximate Dynamic Programming
25.4 Approximate Dynamic Programming Algorithm for Reservoir Production Optimization
25.5 Simulation Results
25.6 Concluding Remarks
Chapter 26: A Learning Strategy for Source Tracking in Unstructured Environments
26.1 Introduction
26.2 Reinforcement Learning
26.3 Light-Following Robot
26.4 Simulation Results
26.5 Experimental Results
26.6 Conclusions and Future Work
References
Index