
Some New Algorithms of Reinforcement Learning and Their Theoretical Study

Author: YinZuoZuo
Supervisor: WangHanXing
School: Shanghai University
Major: Operations Research and Cybernetics
Keywords: Reinforcement learning; Markov decision processes; Algorithm; Risk-sensitive; Optimal policy
CLC: O234
Type: PhD thesis
Year: 2006
Downloads: 243
Citations: 1

Abstract


This dissertation consists of two parts. In the first part we propose several new algorithms for reinforcement learning; the motivation is to address difficulties that confront existing algorithms, such as the curse of dimensionality and slow computation. In the second part we investigate theoretical questions about the optimality equations and optimal solutions of reinforcement learning under the notion of risk sensitivity.

First, a new reinforcement learning (RL) algorithm, which we call the forgetting algorithm, is proposed. The idea rests on the following observation: in existing RL algorithms, which parts of the state-value function are updated is determined only by the temporal record of state visits, so the system must retain value estimates for every state it has encountered, whether look-up tables or function approximators are used. Memory requirements therefore grow exponentially with the size of the problem, and the algorithm's speed degrades accordingly. Motivated by this, we introduce principles of forgetting from the psychology of memory into value-function reinforcement learning, in particular into the SARSA(λ) algorithm, and establish a class of forgetting algorithms suited to value-function reinforcement learning.
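The abstract does not specify the forgetting rule, so the following Python sketch is only one plausible reading: tabular SARSA(λ) in which table entries that have not been visited recently are purged, keeping memory bounded. The class name `ForgettingSarsaLambda` and the `forget_after` threshold are illustrative assumptions, not the dissertation's actual construction.

```python
import random
from collections import defaultdict

class ForgettingSarsaLambda:
    """Tabular SARSA(lambda) with a simple, hypothetical forgetting rule:
    (state, action) entries not visited for `forget_after` steps are
    dropped, so the value table (and the trace-update loop) stays bounded."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, lam=0.9,
                 epsilon=0.1, forget_after=10_000):
        self.actions = list(actions)
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.epsilon, self.forget_after = epsilon, forget_after
        self.t = 0
        self.q = defaultdict(float)       # (state, action) -> value estimate
        self.trace = defaultdict(float)   # (state, action) -> eligibility
        self.last_seen = {}               # (state, action) -> last visit time

    def act(self, state):
        # epsilon-greedy over the current (possibly pruned) table;
        # .get avoids inserting spurious zero entries on lookup
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, s, a, r, s2, a2):
        self.t += 1
        self.last_seen[(s, a)] = self.t
        delta = (r + self.gamma * self.q.get((s2, a2), 0.0)
                 - self.q.get((s, a), 0.0))
        self.trace[(s, a)] += 1.0         # accumulating eligibility trace
        for key in list(self.trace):
            self.q[key] += self.alpha * delta * self.trace[key]
            self.trace[key] *= self.gamma * self.lam
            if self.trace[key] < 1e-4:    # drop negligible traces
                del self.trace[key]
        # "forgetting" step: purge entries that have gone stale
        stale = [k for k, seen in self.last_seen.items()
                 if self.t - seen > self.forget_after]
        for key in stale:
            self.q.pop(key, None)
            self.last_seen.pop(key, None)
```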
Second, we propose a new reinforcement learning algorithm based on utility clustering, which we call U-Clustering; several notions from POMDPs are used in its model. The main problems with the U-Tree algorithm are the randomness involved in creating fringe nodes and the computational load of the statistical tests, and none of its extensions has resolved them. U-Clustering needs neither the creation nor the statistical testing of fringe nodes: it first clusters instances according to the observed action values along the instance chain, then selects features for every cluster, and finally compresses those features; the compressed features become the new leaf nodes.

Third, the dissertation contributes a dynamic merge algorithm for solving partially observable Markov decision processes (POMDPs). The notion of a region is introduced and a region system is built on the state space of the environment. The agent pursues its optimization goal independently on each region of the system, much as several agents working in parallel would, and the optimal value functions obtained on the individual regions are then merged to yield an optimal solution of the POMDP. We also analyze the complexity of the new algorithm and test it in simulation: on the well-known New York Driving test bed, the results show that the dynamic merge algorithm is effective for problems with large state spaces.

Fourth, we propose a generalized average algorithm for reinforcement learning agents, based on the notion of risk-sensitive Markov decision processes. The motivation is to gain robustness for a class of reinforcement learning algorithms, potentially at the price of some optimality: robustness matters when the theoretical model does not match the physical system, or when the effectiveness of control actions drifts over time. We study the dynamic programming algorithm of reinforcement learning obtained by replacing the maximum (or supremum) operator with a generalized average operator, and we discuss its convergence.

Fifth, we contribute a new idea, a risk-sensitive evolutionary policy iteration algorithm for reinforcement learning, and discuss the optimality of the policies it produces. The idea is prompted by the curse of dimensionality: in Markov decision processes with very large action spaces, improving the policy by ordinary policy iteration or value iteration is impractical. We focus on obtaining optimal or good policies when the action space is very large and the state space comparatively small. During policy improvement the algorithm operates on policies directly, via policy transitions, rather than maximizing value functions over the whole action space.

Finally, we investigate the optimality equations of multi-time-scale risk-sensitive Markov decision processes. Such models are increasingly important for large-scale practical control problems, where the agent must cope with more complex environments. We adopt a multi-time-scale Markov decision process model, bring in the notion of risk sensitivity, and propose, for the first time, multi-time-scale risk-sensitive Markov decision processes as a new class of problems for reinforcement learning algorithms based on Markov models. Using utility functions and risk sensitivity, we study the two-time-scale risk-sensitive Markov decision control case, derive the formal Bellman optimality equation for two-time-scale risk-sensitive Markov decision processes, and prove that the optimal value function satisfies this equation. Related results and their proofs are also given.
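To make the generalized average backup concrete, here is a minimal Python sketch of value iteration in which the hard max over actions is replaced by a soft average. The abstract does not name the operator, so the log-mean-exp below (which tends to the ordinary max as β → ∞) is only one plausible instance; the function name and parameters are assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def generalized_average_value_iteration(P, R, gamma=0.95, beta=5.0,
                                        iters=1000, tol=1e-8):
    """Value iteration with the max-over-actions backup replaced by a
    generalized average (here: log-mean-exp with inverse temperature beta).

    P: transition probabilities, shape (A, S, S), P[a, s, s'] = Pr(s'|s, a)
    R: expected rewards, shape (S, A)
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum('asn,n->as', P, V).T
        # Soft backup: (1/beta) * log( mean_a exp(beta * Q[s, a]) ).
        # The operator is a non-expansion, so the gamma-contraction of
        # value iteration (and hence convergence) is preserved.
        V_new = (logsumexp(beta * Q, axis=1) - np.log(A)) / beta
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V
```

For the final chapter's setting, the abstract does not reproduce the two-time-scale equation itself; as the standard single-time-scale starting point that it generalizes, the finite-horizon risk-sensitive Bellman equation under an exponential utility with risk factor β reads:

```latex
% Classical finite-horizon risk-sensitive Bellman equation (single time
% scale, exponential utility). The dissertation's two-time-scale
% generalization is not given in the abstract and is not reproduced here.
V_t(s) \;=\; \max_{a \in A} \frac{1}{\beta}
  \log \sum_{s' \in S} P(s' \mid s, a)\,
  \exp\!\bigl(\beta \, [\, r(s, a, s') + V_{t+1}(s') \,]\bigr)
```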

Related Dissertations

  1. Research on Scheduling of Whole-set Orders in JSP Based on Differential Evolution Algorithm,F273
  2. Research on Graph-Based Algorithm for TagSNPs Selection,Q78
  3. Research and Realization on Synchronization Technology of High Sensitivity GNSS Software Receiver,P228.4
  4. Development of the Platform for Compressor Optimization Design and Aerodynamic Optimization Design in the Transonic Compressor,TH45
  5. Effectiveness Evaluation on the Jointed Combat of the Multiple Missiles and Research on Combinatorial Optimization Algorithm,TJ760.1
  6. The Inductive Load Based Vehicle Body Network Control System,U463.6
  7. Research on Optimal Control of Elevator Group Based upon Ant Colony Algorithm,TU857
  8. Research on Temperature Control Technology of Laser Diode with Thermoelectric Cooler,TN248.4
  9. The AES Algorithm and Its Implementation in DSP,TN918.1
  10. Research on UWB Location Technology Using UWB Radio Signal,TN929.5
  11. The Algorithm of DFT with a Subset of Output Points Based on TS101 and Its Software Implementation,TN911.72
  12. Study on Estimation of Two Dimensional Direction of Arrival Using a DBF Receiver,TN851
  13. Optimizing and Realizing Research on Video Compression in TV Guidance System,TN919.81
  14. Study on Channel Coding Algorithm in IEEE802.16e,TN911.22
  15. Research on Decoding Algorithm for LDPC Codes,TN911.22
  16. Research on Parallel Frequent Graph Pattern Mining,TP311.13
  17. The Fatigue State Recognition of the Driver Based on Eye Detection,TP391.41
  18. Research and Implementation on Content-Based Clothing Image Retrieval,TP391.41
  19. Research on Removal Algorithm of Shadows in Image Segmentation,TP391.41
  20. Research on Navigation System Related Technology for Moving Objects under Dynamic Environment,TP301.6
  21. Research and Implementation for Method of AUTOSAR System Modeling,TP311.52

CLC: Mathematical Sciences and Chemistry > Mathematics > Control theory, information theory (mathematical theory) > Machine learning theory