Adaptive Behavior, 4 (1)
Adaptive Behavior
Volume 4, Number 1
Summer 1995

Table of Contents

Jean-Louis Deneubourg
Editorial

Mance E. Harmon, Leemon C. Baird III, and A. Harry Klopf
Reinforcement Learning Applied to a Differential Game
Adaptive Behavior, 4 (1), 3-28.

Federico Cecconi, Filippo Menczer, and Richard K. Belew
Maturation and the Evolution of Imitative Learning in Artificial Organisms
Adaptive Behavior, 4 (1), 29-50.

Maja J. Matarić
Designing and Understanding Adaptive Group Behavior
Adaptive Behavior, 4 (1), 51-80.

Inman Harvey
Relearning and Evolution in Neural Networks

Pages 1-2
Editorial
By Jean-Louis Deneubourg

Reinforcement Learning Applied to a Differential Game
By Mance E. Harmon, Leemon C. Baird III, A. Harry Klopf

Abstract
An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile and plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of both the residual-gradient and non-residual-gradient forms of advantage updating and Q-learning are compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating also is demonstrated to converge regardless of the time step duration; Q-learning is unable to converge as the time step duration grows small.

Key Words: reinforcement learning; advantage updating; dynamic programming; differential games

Maturation and the Evolution of Imitative Learning in Artificial Organisms
By Federico Cecconi, Filippo Menczer, Richard K. Belew

Abstract
The traditional explanation of delayed maturation age, as part of an evolved life history, focuses on the increased costs of juvenile mortality due to early maturation. Prior quantitative models of these trade-offs, however, have addressed only morphological phenotypic traits, such as body size. We argue that the development of behavioral skills prior to reproductive maturity also constitutes an advantage of delayed maturation and thus should be included among the factors determining the trade-off for optimal age at maturity. Empirical support for this hypothesis from animal field studies is abundant. This article provides further evidence drawn from simulation experiments. Latent energy environments (LEE) are a class of tightly controlled environments in which learning organisms are modeled by neural networks and evolve according to a type of genetic algorithm. An advantage of this artificial world is that it becomes possible to discount all nonbehavioral costs of early maturity in order to focus exclusively on behavioral consequences. Despite large selective costs imposed on parental fitness due to prolonged immaturity, the optimal age at maturity is shown to be significantly delayed when offspring learn from their parents' behavior via imitation.

Key Words: age at maturity; imitative learning; latent energy environments (LEE); genetic algorithms; neural networks; Baldwin effect; genetic assimilation

Designing and Understanding Adaptive Group Behavior
By Maja J. Matarić

Abstract
This article proposes the concept of basis behaviors as ubiquitous general building blocks for synthesizing artificial group behavior in multiagent systems and for analyzing group behavior in nature. We demonstrate the concept through examples implemented both in simulation and on a group of physical mobile robots. The basis behavior set we propose, consisting of avoidance, safe-wandering, following, aggregation, dispersion, and homing, is constructed from behaviors commonly observed in a variety of species in nature. The proposed behaviors are manifested spatially but have an effect on more abstract modes of interaction, including the exchange of information and cooperation. We demonstrate how basis behaviors can be combined into higher-level group behaviors commonly observed across species. The combination mechanisms we propose are useful for synthesizing a variety of new group behaviors, as well as for analyzing naturally occurring ones.

Key Words: group behavior; robotics; ethology; social interaction; collective intelligence; foraging

Pages 81-84
Relearning and Evolution in Neural Networks
By Inman Harvey

18:18 GMT; 22/03/08
Copyright © 2008, ISAB. All rights reserved.