About the Book:
Reinforcement learning (RL) and adaptive dynamic programming (ADP) have become one of the most critical research fields in science and engineering for modern complex systems. This book describes the latest RL and ADP techniques for decision and control in human-engineered systems, covering both single-player decision and control and multi-player games. Edited by pioneers of RL and ADP research, the book brings together ideas and methods from many fields and provides important and timely guidance on controlling a wide variety of systems, such as robots and industrial processes, as well as on economic decision-making.
Table of Contents:
Preface
Contributors
Part I: Feedback Control Using RL and ADP
Chapter 1: Reinforcement Learning and Approximate Dynamic Programming (RLADP)—Foundations, Common Misconceptions, and the Challenges Ahead
1.1 Introduction
1.2 What is RLADP?
1.3 Some Basic Challenges in Implementing ADP
Chapter 2: Stable Adaptive Neural Control of Partially Observable Dynamic Systems
2.1 Introduction
2.2 Background
2.3 Stability Bias
2.4 Example Application
Chapter 3: Optimal Control of Unknown Nonlinear Discrete-Time Systems Using the Iterative Globalized Dual Heuristic Programming Algorithm
3.1 Background Material
3.2 Neuro-Optimal Control Scheme Based on the Iterative ADP Algorithm
3.3 Generalization
3.4 Simulation Studies
3.5 Summary
Chapter 4: Learning and Optimization in Hierarchical Adaptive Critic Design
4.1 Introduction
4.2 Hierarchical ADP Architecture with Multiple-Goal Representation
4.3 Case Study: The Ball-and-Beam System
4.4 Conclusions and Future Work
Chapter 5: Single Network Adaptive Critics Networks—Development, Analysis, and Applications
5.1 Introduction
5.2 Approximate Dynamic Programming
5.3 SNAC
5.4 J-SNAC
5.5 Finite-SNAC
5.6 Conclusions
Chapter 6: Linearly Solvable Optimal Control
6.1 Introduction
6.2 Linearly Solvable Optimal Control Problems
6.3 Extension to Risk-Sensitive Control and Game Theory
6.4 Properties and Algorithms
6.5 Conclusions and Future Work
Chapter 7: Approximating Optimal Control with Value Gradient Learning
7.1 Introduction
7.2 Value Gradient Learning and BPTT Algorithms
7.3 A Convergence Proof for VGL(1) for Control with Function Approximation
7.4 Vertical Lander Experiment
7.5 Conclusions
Chapter 8: A Constrained Backpropagation Approach to Function Approximation and Approximate Dynamic Programming
8.1 Background
8.2 Constrained Backpropagation (CPROP) Approach
8.3 Solution of Partial Differential Equations in Nonstationary Environments
8.4 Preserving Prior Knowledge in Exploratory Adaptive Critic Designs
8.5 Summary
Chapter 9: Toward Design of Nonlinear ADP Learning Controllers with Performance Assurance
9.1 Introduction
9.2 Direct Heuristic Dynamic Programming
9.3 A Control Theoretic View on the Direct HDP
9.4 Direct HDP Design with Improved Performance Case 1—Design Guided by a Priori LQR Information
9.5 Direct HDP Design with Improved Performance Case 2—Direct HDP for Coordinated Damping Control of Low-Frequency Oscillation
9.6 Summary
Chapter 10: Reinforcement Learning Control with Time-Dependent Agent Dynamics
10.1 Introduction
10.2 Q-Learning
10.3 Sampled Data Q-Learning
10.4 System Dynamics Approximation
10.5 Closing Remarks
Chapter 11: Online Optimal Control of Nonaffine Nonlinear Discrete-Time Systems without Using Value and Policy Iterations
11.1 Introduction
11.2 Background
11.3 Reinforcement Learning Based Control
11.4 Time-Based Adaptive Dynamic Programming-Based Optimal Control
11.5 Simulation Result
Chapter 12: An Actor-Critic-Identifier Architecture for Adaptive Approximate Optimal Control
12.1 Introduction
12.2 Actor-Critic-Identifier Architecture for HJB Approximation
12.3 Actor-Critic Design
12.4 Identifier Design
12.5 Convergence and Stability Analysis
12.6 Simulation
12.7 Conclusion
Chapter 13: Robust Adaptive Dynamic Programming
13.1 Introduction
13.2 Optimality Versus Robustness
13.3 Robust-ADP Design for Disturbance Attenuation
13.4 Robust-ADP for Partial-State Feedback Control
13.5 Applications
13.6 Summary
Part II: Learning and Control in Multiagent Games
Chapter 14: Hybrid Learning in Stochastic Games and Its Application in Network Security
14.1 Introduction
14.2 Two-Person Game
14.3 Learning in NZSGs
14.4 Main Results
14.5 Security Application
14.6 Conclusions and Future Work
Chapter 15: Integral Reinforcement Learning for Online Computation of Nash Strategies of Nonzero-Sum Differential Games
15.1 Introduction
15.2 Two-Player Games and Integral Reinforcement Learning
15.3 Continuous-Time Value Iteration to Solve the Riccati Equation
15.4 Online Algorithm to Solve Nonzero-Sum Games
15.5 Analysis of the Online Learning Algorithm for NZS Games
15.6 Simulation Result for the Online Game Algorithm
15.7 Conclusion
Chapter 16: Online Learning Algorithms for Optimal Control and Dynamic Games
16.1 Introduction
16.2 Optimal Control and the Continuous Time Hamilton–Jacobi–Bellman Equation
16.3 Online Solution of Nonlinear Two-Player Zero-Sum Games and the Hamilton–Jacobi–Isaacs Equation
16.4 Online Solution of Nonlinear Nonzero-Sum Games and Coupled Hamilton–Jacobi Equations
Part III: Foundations in MDP and RL
Chapter 17: Lambda-Policy Iteration: A Review and a New Implementation
17.1 Introduction
17.2 Lambda-Policy Iteration without Cost Function Approximation
17.3 Approximate Policy Evaluation Using Projected Equations
17.4 Lambda-Policy Iteration with Cost Function Approximation
17.5 Conclusions
Chapter 18: Optimal Learning and Approximate Dynamic Programming
18.1 Introduction
18.2 Modeling
18.3 The Four Classes of Policies
18.4 Basic Learning Policies for Policy Search
18.5 Optimal Learning Policies for Policy Search
18.6 Learning with a Physical State
Chapter 19: An Introduction to Event-Based Optimization: Theory and Applications
19.1 Introduction
19.2 Literature Review
19.3 Problem Formulation
19.4 Policy Iteration for EBO
19.5 Example: Material Handling Problem
19.6 Conclusions
Chapter 20: Bounds for Markov Decision Processes
20.1 Introduction
20.2 Problem Formulation
20.3 The Linear Programming Approach
20.4 The Martingale Duality Approach
20.5 The Pathwise Optimization Method
20.6 Applications
20.7 Conclusion
Chapter 21: Approximate Dynamic Programming and Backpropagation on Timescales
21.1 Introduction: Timescales Fundamentals
21.2 Dynamic Programming
21.3 Backpropagation
21.4 Conclusions
Chapter 22: A Survey of Optimistic Planning in Markov Decision Processes
22.1 Introduction
22.2 Optimistic Online Optimization
22.3 Optimistic Planning Algorithms
22.4 Related Planning Algorithms
22.5 Numerical Example
Chapter 23: Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning
23.1 Introduction
23.2 The Framework
23.3 The Feature Adaptation Scheme
23.4 Convergence Analysis
23.5 Application to Traffic Signal Control
23.6 Conclusions
Chapter 24: Feature Selection for Neuro-Dynamic Programming
24.1 Introduction
24.2 Optimality Equations
24.3 Neuro-Dynamic Algorithms
24.4 Fluid Models
24.5 Diffusion Models
24.6 Mean Field Games
24.7 Conclusions
Chapter 25: Approximate Dynamic Programming for Optimizing Oil Production
25.1 Introduction
25.2 Petroleum Reservoir Production Optimization Problem
25.3 Review of Dynamic Programming and Approximate Dynamic Programming
25.4 Approximate Dynamic Programming Algorithm for Reservoir Production Optimization
25.5 Simulation Results
25.6 Concluding Remarks
Chapter 26: A Learning Strategy for Source Tracking in Unstructured Environments
26.1 Introduction
26.2 Reinforcement Learning
26.3 Light-Following Robot
26.4 Simulation Results
26.5 Experimental Results
26.6 Conclusions and Future Work
References
Index