Implementation on a neural network architecture


Visual attention mechanisms

The attentional module consists of a neural network that integrates top-down attentional cues with bottom-up saliency of the objects perceived (see Figure 5).


Figure 5.
Basic computational unit of the attentional module. For each object location, saliency is computed from the contrast of two low-level features: color and speed. Inhibition of return allows attention to switch away from the currently attended location. The top-down cues are the direction of the demonstrator's gaze and the position of the demonstrator's hand.

The location of the focus of attention results from a process of multiple constraint satisfaction. The constraints are defined from psychological data showing that: (1) infants follow the gaze of the caretaker, (2) they prefer watching moving objects over static objects, and (3) they gaze at hands, or at objects with high color contrast, rather than at other static objects (see Goga and Billard, 2006c, for a detailed description of the attentional mechanism implementation). The behavior exhibited by the learner during the demonstration of the seriated cups task can be seen in the demo below.
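As a rough illustration of this constraint-satisfaction process, the sketch below combines the bottom-up features (color and speed contrast), the top-down cues (gaze direction and hand position), and inhibition of return into a single saliency score per object location. The function name, the weighting coefficients, and the linear combination are illustrative assumptions, not the actual attentional network.

```python
import numpy as np

def select_focus(color_contrast, speed, gaze_cue, hand_cue, inhibition,
                 w_color=1.0, w_speed=1.0, w_gaze=2.0, w_hand=2.0):
    """Pick the most salient object location (illustrative sketch).

    All arguments are 1-D arrays indexed by object location:
      color_contrast, speed : bottom-up features
      gaze_cue, hand_cue    : top-down cues (1 where the demonstrator's
                              gaze/hand points, 0 elsewhere)
      inhibition            : inhibition-of-return trace suppressing the
                              previously attended location
    The weights are placeholders, not the model's actual parameters.
    """
    saliency = (w_color * color_contrast + w_speed * speed
                + w_gaze * gaze_cue + w_hand * hand_cue
                - inhibition)
    return int(np.argmax(saliency))

# Example with three object locations: location 1 is moving and gazed at.
color = np.array([0.2, 0.8, 0.4])
speed = np.array([0.0, 1.0, 0.0])
gaze  = np.array([0.0, 1.0, 0.0])
hand  = np.array([0.0, 0.0, 1.0])
ior   = np.zeros(3)

focus = select_focus(color, speed, gaze, hand, ior)
ior[focus] += 1.5   # inhibition of return lets attention switch at the next step
```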


Object recognition module

The object recognition module is responsible for the visual awareness of the objects present in the environment. An object recognition unit corresponds to a distinct location of an object, and a subnetwork is formed when several objects are present at the same location (see Figure 6).


Figure 6.
Object recognition network corresponding to one location in space. Recognition of an object is based on the integration of five features: size, shape, color, speed, and rotation angle. If an object moves along a trajectory, the network architecture does not change.

The weights in this layer learn the representation of the objects present at a given location. The module detects events occurring on any dimension/feature of the system. Whenever a new event is detected, the superior level (see below) is invoked to handle the appearance of the new object/event in the system.
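One possible way to picture this event detection is sketched below: each location stores a learned feature vector per object over the five features of Figure 6, and a new event is signaled whenever the observed features match none of the stored representations. The class name, distance measure, and threshold are assumptions for illustration, not the actual recognition network.

```python
import numpy as np

FEATURES = ["size", "shape", "color", "speed", "rotation_angle"]

class LocationNetwork:
    """Object-recognition subnetwork for one location in space (sketch).

    Stores one weight vector (over the five features) per object seen
    at this location. Threshold and learning rate are illustrative.
    """
    def __init__(self, threshold=0.3, lr=0.1):
        self.weights = []          # one learned feature vector per object
        self.threshold = threshold
        self.lr = lr

    def observe(self, features):
        """Return True if a new object/event is detected at this location."""
        x = np.asarray(features, dtype=float)
        for w in self.weights:
            if np.linalg.norm(x - w) < self.threshold:
                w += self.lr * (x - w)   # refine the stored representation
                return False             # known object, no new event
        self.weights.append(x.copy())    # unseen feature combination:
        return True                      # signal a new event to the CA layer
```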


Cell assembly module

A cell assembly (CA) receives two types of signals: one from the saliency map and one from the object recognition module. When a new event is detected at a location that is in the focus of attention, the cell assembly layer is informed and makes one of two decisions (a minimal sketch follows the list):

  • it creates a cell assembly unit to represent the new category, or
  • it learns the event into the representation of an existing category.
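A minimal sketch of this create-or-learn decision is given below, assuming a simple distance threshold on the event's feature vector; the actual decision criterion and learning rule are those of the cited papers, and all names and parameters here are placeholders.

```python
import numpy as np

class CellAssemblyLayer:
    """Decides, for each attended new event, whether to create a new
    cell assembly (category) or to fold the event into an existing one.
    The similarity threshold and learning rate are illustrative only."""

    def __init__(self, threshold=0.5, lr=0.1):
        self.assemblies = []       # one feature vector per cell assembly
        self.threshold = threshold
        self.lr = lr

    def handle_event(self, features):
        """Return the index of the cell assembly that represents the event."""
        x = np.asarray(features, dtype=float)
        if self.assemblies:
            dists = [np.linalg.norm(x - a) for a in self.assemblies]
            best = int(np.argmin(dists))
            if dists[best] < self.threshold:
                # learn the event into the existing category
                self.assemblies[best] += self.lr * (x - self.assemblies[best])
                return best
        # otherwise create a new cell assembly for the new category
        self.assemblies.append(x.copy())
        return len(self.assemblies) - 1
```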

Three sets of weights are adapted at this level (an illustrative sketch of these updates follows the list), corresponding to:

  • learning of the category/CA features
  • learning of the precedence relationships
  • learning of size relations.
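Purely to illustrate what these three weight sets relate, the sketch below applies a simple Hebbian-style update to each of them whenever one cell assembly is activated after another. The matrix layout, update rules, and parameters are hypothetical; the real learning algorithm is the one described in the references below.

```python
import numpy as np

class CAWeights:
    """Three weight sets of the cell-assembly layer (hypothetical Hebbian
    sketch, not the actual update rules of the model)."""

    def __init__(self, n_assemblies, n_features, lr=0.1):
        self.features   = np.zeros((n_assemblies, n_features))    # CA feature prototypes
        self.precedence = np.zeros((n_assemblies, n_assemblies))  # CA_i activated before CA_j
        self.size_rel   = np.zeros((n_assemblies, n_assemblies))  # sign of size(i) - size(j)
        self.lr = lr

    def update(self, prev_ca, cur_ca, cur_features, size_diff):
        """prev_ca/cur_ca: indices of the previously and currently active CA.
        cur_features: observed feature vector of the current object.
        size_diff: size of the previous object minus size of the current one."""
        # (1) move the category prototype toward the observed features
        self.features[cur_ca] += self.lr * (cur_features - self.features[cur_ca])
        if prev_ca is not None:
            # (2) strengthen the precedence link prev -> current
            self.precedence[prev_ca, cur_ca] += self.lr
            # (3) record the relative size between the two categories
            self.size_rel[prev_ca, cur_ca] += self.lr * np.sign(size_diff)
```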

For a detailed description of the learning algorithm, see Marian and Billard (2004) and Goga and Billard (2006a).