Adaptive Behavior, 6 (2)

Adaptive Behavior
Volume 6, Number 2
Fall 1997

Table of Contents

Juan C. Santamaría, Richard S. Sutton, and Ashwin Ram
Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces
Adaptive Behavior, 6 (2), 163-217.

Marco Wiering and Jürgen Schmidhuber
HQ-Learning
Adaptive Behavior, 6 (2), 219-246.

Philip Goetz and Deborah Walters
The Dynamics of Recurrent Behavior Networks
Adaptive Behavior, 6 (2), 247-283.

Ezequiel A. Di Paolo
An Investigation into the Evolution of Communication
Adaptive Behavior, 6 (2), 285-324.

Nick Jakobi
Evolutionary Robotics and the Radical Envelope-of-Noise Hypothesis
Adaptive Behavior, 6 (2), 325-368.

Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces
By Juan C. Santamaría, Richard S. Sutton, and Ashwin Ram

Abstract
A key element in the solution of reinforcement learning problems is the value function. The purpose of this function is to measure the long-term utility or value of any given state. The function is important because an agent can use this measure to decide what to do next. A common problem in reinforcement learning when applied to systems having continuous state and action spaces is that the value function must operate with a domain consisting of real-valued variables, which means that it should be able to represent the value of infinitely many state and action pairs. For this reason, function approximators are used to represent the value function when a closed-form solution of the optimal policy is not available. In this article, we extend a previously proposed reinforcement learning algorithm so that it can be used with function approximators that generalize the value of individual experiences across both state and action spaces. In particular, we discuss the benefits of using sparse coarse-coded function approximators to represent value functions and describe in detail three implementations: cerebellar model articulation controllers, instance-based, and case-based.
Additionally, we discuss how function approximators having different degrees of resolution in different regions of the state and action spaces may influence the performance and learning efficiency of the agent. We propose a simple and modular technique that can be used to implement function approximators with nonuniform degrees of resolution so that the value function can be represented with higher accuracy in important regions of the state and action spaces. We performed extensive experiments in the double-integrator and pendulum swing-up systems to demonstrate the proposed ideas.

Key Words: reinforcement learning; function approximation; memory-based methods; continuous domains; optimal control; resource preallocation
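
The idea of a sparse coarse-coded value function that generalizes across both state and action can be illustrated with a small tile-coding sketch, the scheme that underlies cerebellar model articulation controllers. This is not the authors' implementation: the class name `TileCodedQ`, the parameters `n_tilings`, `tile_width`, and `step_size`, and the one-dimensional state and action are all assumptions chosen for illustration, and the tilings here have uniform resolution.

```python
import math


class TileCodedQ:
    """Illustrative sparse coarse-coded Q(s, a) for a scalar state and a scalar action."""

    def __init__(self, n_tilings=8, tile_width=0.5, step_size=0.1):
        self.n_tilings = n_tilings
        self.tile_width = tile_width
        # Split the step size across tilings so that one update corrects
        # the prediction by a fraction `step_size` of the error.
        self.alpha = step_size / n_tilings
        self.weights = {}  # sparse storage: only updated tiles get an entry

    def _active_tiles(self, state, action):
        """Return one active tile per tiling for the pair (state, action)."""
        tiles = []
        for t in range(self.n_tilings):
            # Each tiling is shifted by a different fraction of a tile width.
            offset = t * self.tile_width / self.n_tilings
            s_idx = math.floor((state + offset) / self.tile_width)
            a_idx = math.floor((action + offset) / self.tile_width)
            tiles.append((t, s_idx, a_idx))
        return tiles

    def value(self, state, action):
        """Q estimate: the sum of the weights of the active tiles."""
        return sum(self.weights.get(tile, 0.0)
                   for tile in self._active_tiles(state, action))

    def update(self, state, action, target):
        """Move the estimate for (state, action) toward `target`."""
        error = target - self.value(state, action)
        for tile in self._active_tiles(state, action):
            self.weights[tile] = self.weights.get(tile, 0.0) + self.alpha * error


q = TileCodedQ()
q.update(0.3, 0.1, target=1.0)
nearby = q.value(0.35, 0.1)   # shares most tiles with the updated pair
far_away = q.value(5.0, 0.1)  # shares no tiles with the updated pair
```

After a single update, a nearby state-action pair inherits part of the learned value because it shares tiles with the updated pair, while a distant pair is unaffected. Nonuniform resolution, as discussed in the abstract, would require warping the inputs or varying the tile widths, which this sketch does not attempt.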

HQ-Learning
By Marco Wiering and Jürgen Schmidhuber

Abstract
HQ-learning is a hierarchical extension of Q(lambda)-learning designed to solve certain types of partially observable Markov decision problems (POMDPs). HQ automatically decomposes POMDPs into sequences of simpler subtasks that can be solved by memoryless policies learnable by reactive subagents. HQ can solve partially observable mazes with more states than those used in most previous POMDP work.

Key Words: reinforcement learning; hierarchical Q-learning; POMDPs; non-Markov; subgoal learning

The Dynamics of Recurrent Behavior Networks
By Philip Goetz and Deborah Walters

Abstract
If behavior networks, which use spreading activation to select actions, are analogous to connectionist methods of pattern recognition, then we suggest that recurrent behavior networks, which use energy minimization, are analogous to Hopfield networks. Hopfield networks memorize patterns by making them attractors. We argue that, similarly, each behavior of a recurrent behavior network should be an attractor of the network, to inhibit fruitless, repeated switching between different behaviors in response to small changes in the environment and in motivations. We demonstrate that the performance in a test domain of the Do the Right Thing recurrent behavior network is improved by redesigning it to create desirable attractors and basins of attraction. We further show that this performance increase is correlated with an increase in persistence and a decrease in undesirable behavior switching.

Key Words: action selection; decision making; attractors; behavior networks; pattern recognition; nonlinearity
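
The statement that Hopfield networks memorize patterns by making them attractors can be made concrete with a minimal sketch. It shows a generic textbook Hopfield network with Hebbian weights, not the recurrent behavior network studied in the article; the function names `train_hopfield` and `recall` and the eight-unit pattern are assumptions made for illustration.

```python
def train_hopfield(patterns):
    """Build Hebbian outer-product weights with zero self-connections."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights, state, max_sweeps=10):
    """Update units one at a time until a full sweep changes nothing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            # Local field: weighted sum of the other units' current states.
            field = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


stored = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([stored])

noisy = list(stored)
noisy[0] = -noisy[0]  # perturb the stored pattern by flipping one unit
recovered = recall(weights, noisy)
```

The perturbed state lies inside the basin of attraction of the stored pattern, so the updates pull it back to that pattern. This is the kind of persistence against small perturbations that the abstract argues each behavior should have.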

An Investigation into the Evolution of Communication
By Ezequiel A. Di Paolo

Abstract
This article presents a theoretical criticism of current approaches to the study of the evolution of communication. In particular, two very common preconceptions about the subject are analyzed: the role of natural selection in the definition of the phenomenon of communication and the metaphor of communication as information exchange. An alternative characterization is presented in terms of autopoietic theory, which avoids the mentioned preconceptions. In support of this view, the evolution of coordinated activity is studied in a population of artificial agents playing an interactional game. Dynamical modeling of this evolutionary process based on game-theoretical considerations shows the existence of an evolutionarily stable strategy in the total lack of coordinated activity which, however, may be unreachable due to the presence of a periodic attractor. In a computational model of the same game, action coordination evolves even with individual costs against it, due to the presence of spatial structuring processes. A detailed explanation of this phenomenon, which does not require kin selection, is presented. In an extended game, recursive coordination evolves nontrivially when the participants share all the relevant information, demonstrating that the metaphor of information exchange can be misleading. It is shown that agents engaged in this sort of interaction are able to perform beyond their individual capabilities.

Key Words: evolution of communication; autopoiesis; action coordination; spatiotemporal constraints

Evolutionary Robotics and the Radical Envelope-of-Noise Hypothesis
By Nick Jakobi

Abstract
For several years now, various researchers have endeavored to apply artificial evolution to the automatic design of control systems for real robots. One of the major challenges they face concerns the question of how to assess the fitness of evolving controllers when each evolutionary run typically involves hundreds of thousands of such assessments. This article outlines new ways of thinking about and building simulations upon which such assessments can be performed. It puts forward sufficient conditions for the successful transfer of evolved controllers from simulation to reality and develops a potential methodology for building simulations in which evolving controllers are forced to satisfy these conditions if they are to be reliably fit. It is hypothesized that as long as simulations are built according to this methodology, it does not matter how inaccurate or incomplete they are: Controllers that have evolved to be reliably fit in simulation still will transfer into reality. Two sets of experiments are reported, both of which involve minimal look-up table-based simulations built according to these guidelines. In the first set, controllers were evolved that allowed a Khepera robot to perform a simple memory task in the real world. In the second set, controllers were evolved for the Sussex University gantry robot that were able to distinguish visually a triangle from a square, under extremely noisy real-world conditions, and to steer the robot toward the triangle. In both cases, controllers that were reliably fit in simulation displayed extremely robust behavior when downloaded into reality.

Key Words: simulations; evolutionary robotics; robot-environment interactions; neural networks

Copyright © 2008, ISAB. All rights reserved.