Adaptive Behavior, 4 (1)


Adaptive Behavior

Volume 4, Number 1

Summer 1995

Table of Contents

 

Jean-Louis Deneubourg

Editorial

 

Mance E. Harmon, Leemon C. Baird III, and A. Harry Klopf

Reinforcement Learning Applied to a Differential Game

Adaptive Behavior, 4 (1), 3-28.

 

Federico Cecconi, Filippo Menczer, and Richard K. Belew

Maturation and the Evolution of Imitative Learning in Artificial Organisms

Adaptive Behavior, 4 (1), 29-50.

 

Maja J. Matarić

Designing and Understanding Adaptive Group Behavior

Adaptive Behavior, 4 (1), 51-80.

 

Inman Harvey

Relearning and Evolution in Neural Networks

Adaptive Behavior, 4 (1), 81-84.


Pages 1-2

Editorial

By Jean-Louis Deneubourg


Pages 3-28

Reinforcement Learning Applied to a Differential Game

By Mance E. Harmon, Leemon C. Baird III, A. Harry Klopf

Abstract

An application of reinforcement learning to a linear-quadratic differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile and plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of both the residual-gradient and non-residual-gradient forms of advantage updating and Q-learning is compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating is also demonstrated to converge regardless of the time step duration; Q-learning is unable to converge as the time step duration grows small.
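
The time-step sensitivity mentioned at the end of the abstract can be illustrated with a small sketch. It is not the authors' missile-and-plane game; it assumes a hypothetical one-dimensional, single-controller linear-quadratic problem (dynamics dx/dt = u, cost rate x^2 + u^2, optimal cost-to-go V(x) = x^2), and the function names are invented for illustration. Under those assumptions, the one-step Q-values of two different actions differ only by an amount proportional to the time step, so the information that separates good actions from bad ones shrinks as the step shrinks, whereas the same gap divided by the time step, a rescaling in the spirit of advantages, stays of order one.

```python
# Toy illustration only: an assumed 1-D linear-quadratic control problem,
# not the two-player game studied in the article.
#   dynamics:   dx/dt = u
#   cost rate:  x^2 + u^2
#   optimal cost-to-go (from the Hamilton-Jacobi-Bellman equation): V(x) = x^2
#   optimal action: u = -x


def q_value(x: float, u: float, dt: float) -> float:
    """One-step Euler approximation of Q: cost over dt plus V at the next state."""
    next_x = x + u * dt
    return (x**2 + u**2) * dt + next_x**2


def action_gap(x: float, dt: float) -> float:
    """Q of the idle action (u = 0) minus Q of the optimal action (u = -x)."""
    return q_value(x, 0.0, dt) - q_value(x, -x, dt)


def scaled_gap(x: float, dt: float) -> float:
    """The same gap divided by dt; this stays O(1) as dt shrinks."""
    return action_gap(x, dt) / dt


if __name__ == "__main__":
    for dt in (1e-1, 1e-2, 1e-3):
        print(f"dt={dt:g}  Q gap={action_gap(1.0, dt):.6f}  "
              f"gap/dt={scaled_gap(1.0, dt):.6f}")
```

In this toy setting the raw gap works out to x^2 dt (1 - dt), which vanishes with dt, while the rescaled gap approaches x^2. A function approximator that must represent Q to fixed precision therefore has less and less signal for ranking actions as dt decreases.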

Key Words

reinforcement learning; advantage updating; dynamic programming; differential games


Pages 29-50

Maturation and the Evolution of Imitative Learning in Artificial Organisms

By Federico Cecconi, Filippo Menczer, Richard K. Belew

Abstract

The traditional explanation of delayed maturation age, as part of an evolved life history, focuses on the increased costs of juvenile mortality due to early maturation. Prior quantitative models of these trade-offs, however, have addressed only morphological phenotypic traits, such as body size. We argue that the development of behavioral skills prior to reproductive maturity also constitutes an advantage of delayed maturation and thus should be included among the factors determining the trade-off for optimal age at maturity. Empirical support for this hypothesis from animal field studies is abundant.
This article provides further evidence drawn from simulation experiments. Latent energy environments (LEE) are a class of tightly controlled environments in which learning organisms are modeled by neural networks and evolve according to a type of genetic algorithm. An advantage of this artificial world is that it becomes possible to discount all nonbehavioral costs of early maturity in order to focus exclusively on behavioral consequences. Despite large selective costs imposed on parental fitness due to prolonged immaturity, the optimal age at maturity is shown to be significantly delayed when offspring learn from their parents' behavior via imitation.

Key Words

age at maturity; imitative learning; latent energy environments (LEE); genetic algorithms; neural networks; Baldwin effect; genetic assimilation


Pages 51-80

Designing and Understanding Adaptive Group Behavior

By Maja J. Matarić

Abstract

This article proposes the concept of basis behaviors as ubiquitous general building blocks for synthesizing artificial group behavior in multiagent systems and for analyzing group behavior in nature. We demonstrate the concept through examples implemented both in simulation and on a group of physical mobile robots. The basis behavior set we propose, consisting of avoidance, safe-wandering, following, aggregation, dispersion, and homing, is constructed from behaviors commonly observed in a variety of species in nature. The proposed behaviors are manifested spatially but have an effect on more abstract modes of interaction, including the exchange of information and cooperation. We demonstrate how basis behaviors can be combined into higher-level group behaviors commonly observed across species. The combination mechanisms we propose are useful for synthesizing a variety of new group behaviors, as well as for analyzing naturally occurring ones.

Key Words

group behavior; robotics; ethology; social interaction; collective intelligence; foraging


Pages 81-84

Relearning and Evolution in Neural Networks

By Inman Harvey


