Presented at the 7th Annual Computational Neuroscience Meeting (CNS*98), Santa Barbara (CA), July 26-30th 1998


Figure-ground segregation of coherent motion in V1: A model based on the role of intra-cortical and extra-cortical feedbacks

William H.A. Beaudot, PhD

KYBERVISION Consulting, Research & Development, Montreal, Canada.

INTRODUCTION

An important use of visual motion is the segregation of a complex scene into its perceptually coherent and distinct moving components. Recent computational theories have expressed the perception of coherent visual motion as an optimization process constrained by the conflicting demands of integration and segmentation of local motion [1-2]. They tell us very little, however, about the nature of the cortical mechanisms underlying our spatially accurate perception of coherent motion. Recent anatomical and physiological experiments have nevertheless provided strong evidence that V1 processing is subject to contextual modulation [3] through intra- or extra-cortical connections [4]. They also suggest a perceptual correlate of this modulation in terms of figure-ground segregation as early as area V1 [5], a view at odds with a strictly hierarchical functional organization of the visual system [6]. This work investigates the role of the interactions between visual areas V1 and MT in the figure-ground segregation of coherent motion, with the aim of revealing the nature of the integrative processes performed by area V1 in the segmentation of visual motion.


METHODS

A neuromorphic model of the retino-cortical motion processing stream is proposed which incorporates both feedforward and feedback mechanisms. The feedforward stream follows a gradient scheme for local motion estimation and its integration for global motion estimation, while the feedback stream applies local constraints of both smoothness and discontinuity onto the local motion stage. The functional properties of this model were derived analytically, and are illustrated by real-time simulations on image sequences representing real scenes and psychophysical stimuli.
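As a rough illustration of the gradient scheme mentioned above, the following 1-D sketch estimates a single velocity from two frames by least squares on the brightness-constancy constraint I_t + v I_x = 0. It is not the model's implementation: the function names, the sinusoidal test pattern and the finite-difference derivatives are assumptions chosen for brevity.

```python
import math

def gradient_velocity(frame0, frame1, eps=1e-9):
    """Least-squares 1-D velocity (pixels/frame) from two frames.

    Minimizes sum((It + v * Ix)**2), giving v = -sum(Ix*It) / sum(Ix*Ix).
    """
    num = 0.0
    den = 0.0
    for i in range(1, len(frame0) - 1):
        # Central spatial difference, averaged over the two frames.
        ix = 0.25 * ((frame0[i + 1] - frame0[i - 1]) +
                     (frame1[i + 1] - frame1[i - 1]))
        # Temporal difference between the two frames.
        it = frame1[i] - frame0[i]
        num += ix * it
        den += ix * ix
    return -num / (den + eps)

def sinusoid(shift, n=64, period=16.0):
    """Hypothetical test pattern: a sinusoid displaced by `shift` pixels."""
    return [math.sin(2 * math.pi * (x - shift) / period) for x in range(n)]

# The second frame is the first one shifted by half a pixel.
v = gradient_velocity(sinusoid(0.0), sinusoid(0.5))
print(v)  # close to 0.5, up to finite-difference error
```

With real images the same estimate would be computed over local windows, which is where the smoothness and discontinuity constraints of the feedback stream become relevant.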

Structure of the model. Each neural layer of the model (6 in the retina, 3 x n in V1, and n in MT, where n is the number of orientations considered) is modeled as a resistive network of leaky integrators. The spatial characteristics of the neurons are reflected by the coupling between neighboring units, and their temporal characteristics by their membrane properties. Synaptic interactions are treated as linear when excitatory or inhibitory, and as multiplicative when modulatory.
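A minimal sketch of one such layer, assuming a 1-D chain, nearest-neighbour resistive coupling and forward-Euler integration, is given below. The parameter names (coupling, tau, dt) and their values are illustrative assumptions, not values from the model.

```python
def step(v, inp, coupling=1.0, tau=10.0, dt=1.0):
    """One Euler step of tau * dV/dt = -V + I + coupling * (V_left + V_right - 2V)."""
    n = len(v)
    out = []
    for i in range(n):
        # Replicated borders: an edge unit sees itself as its missing neighbour.
        left = v[max(i - 1, 0)]
        right = v[min(i + 1, n - 1)]
        lateral = coupling * (left + right - 2.0 * v[i])  # resistive coupling
        dv = (-v[i] + inp[i] + lateral) / tau             # leaky integration
        out.append(v[i] + dt * dv)
    return out

def settle(inp, steps=500, **kw):
    """Iterate the layer from rest until it is close to steady state."""
    v = [0.0] * len(inp)
    for _ in range(steps):
        v = step(v, inp, **kw)
    return v

# Response to a point input: the coupling sets the spatial spread of the
# response, and tau sets how fast it builds up.
v = settle([0.0] * 10 + [1.0] + [0.0] * 10)
```

In this sketch a larger coupling widens the spatial profile of the response, while a larger tau slows its build-up, which is the sense in which coupling and membrane properties carry the spatial and temporal characteristics.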

Plausibility of the model. We propose that this model correlates with the following neural organization of the visual pathway:

i) Retinal spatiotemporal processing provides parvo- and magno-cellular inputs with different band-pass characteristics to area V1.

ii) Oriented band-pass filtering of the luminance parvo-cellular signal provides V1 orientation-selective simple cells responses.

iii) These cells interact nonlinearly with the magno-cellular inputs to produce V1 direction-selective simple cells (SDS) responses.

iv) These V1 SDS cells drive the V1 direction-selective complex cells (CDS), which are interconnected by lateral excitatory connections.

v) The spatial convergence and temporal integration of V1 CDS signals provide the responses of MT direction-selective cells with large receptive fields.

vi) The reciprocal retinotopic connections from MT modulate the gain of the feedback loop formed by the excitatory intra-cortical connectivity between V1 CDS cells.

Figure: Retino-cortical model of the motion processing stream.

Simulations. The capabilities of the model are illustrated on kinetic contours defined by coherent random-dot motion. A spatial shape composed of coherently moving random dots, embedded in a background of incoherently moving random dots, is successfully segmented by the V1 CDS cells in the presence of the MT feedback, whereas deactivating this feedback inevitably makes the shape disappear. The feedback onto V1 also significantly increases the signal-to-noise ratio at the level of area MT, and speeds up its activation in the presence of coherent motion.
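A stimulus of this kind could be generated along the following lines. This is a sketch under stated assumptions: the square figure region, the dot count, the unit speed and the function names are hypothetical, and only the 50% coherence level is taken from the figure caption of this abstract.

```python
import math
import random

def make_dots(n_dots=2000, size=64.0, coherence=0.5,
              direction=0.0, speed=1.0, seed=0):
    """Return dots as (x, y, vx, vy, in_figure) tuples.

    Dots inside the central square move in `direction` with probability
    `coherence`; all other dots move in a random direction.
    """
    rng = random.Random(seed)
    dots = []
    for _ in range(n_dots):
        x, y = rng.uniform(0, size), rng.uniform(0, size)
        in_figure = (size / 4 <= x < 3 * size / 4 and
                     size / 4 <= y < 3 * size / 4)
        if in_figure and rng.random() < coherence:
            angle = direction                      # coherent dot
        else:
            angle = rng.uniform(0, 2 * math.pi)    # incoherent dot
        dots.append((x, y, speed * math.cos(angle),
                     speed * math.sin(angle), in_figure))
    return dots

def mean_velocity(dots, in_figure):
    """Average (vx, vy) over the dots inside or outside the figure."""
    sel = [d for d in dots if d[4] == in_figure]
    n = len(sel)
    return (sum(d[2] for d in sel) / n, sum(d[3] for d in sel) / n)

dots = make_dots()
fig_v = mean_velocity(dots, True)      # biased towards `direction`
ground_v = mean_velocity(dots, False)  # close to zero on average
```

The shape is therefore defined only by the statistics of motion: no single dot distinguishes figure from ground, and the difference appears only after local motion signals are pooled.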

Figure panels: Retinal Stage, Cortical Stage. The dashed outline only delineates the spatial shape formed by 50% coherently moving random dots.

Note: the figures, in which arrows represent the motion estimated in the different layers, are currently of low quality. Improved versions will be provided as soon as possible, together with QuickTime movies showing temporal simulations of the model.


Properties

This particular scheme has three important consequences for the representation of visual motion:

1) Area MT imposes a local smoothness constraint onto the representation of the optical flow by V1 CDS cells, and the aperture problem is solved through the selection of the optimal receptive field which integrates the local motion in a way consistent with the global estimation;

2) The overall effect of this space-variant alteration of the receptive fields is an effective enhancement of V1 CDS responses at locations where the local motion is compatible with the global motion, and a dramatic decrease where it is not. This spatially nonlinear mechanism then preserves the discontinuities present in the "true" motion across space, and thus promotes a shape-from-motion segmentation as early as area V1;

3) This figure-ground segregation of coherent motion is the result of an optimization process which converges in the presence of stationary motion, and which is supported by the dynamics induced by the reciprocal connections between areas V1 and MT.

The explanation of this result rests on the specific nature of the feedback from area MT onto area V1 in the proposed model. MT feedback onto V1 does not merely transmit a re-entrant signal; it is an active signal which modulates V1 processing according to a confidence measure of motion coherence. Responses of MT cells increase in the presence of coherent motion, and thereby favour the spatial and temporal integration of local motion through the V1 CDS intra-cortical network. Conversely, in the absence of coherent motion, the weaker responses of MT cells limit such integration to the direct input from V1 SDS cells, which carry the local estimate. The enhancement or depression of V1 CDS responses according to the strength of MT responses reflects these changes in the spatiotemporal properties of V1 CDS receptive fields (gain, space and time constants).
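One way to see how a single modulatory signal can change gain, space constant and time constant together is the following linearized sketch. The equation, the symbols (C_i for CDS activity, S_i for SDS input, g for the MT-controlled loop gain, w for the lateral weight) and the 1-D nearest-neighbour coupling are assumptions introduced here for illustration; they are not the model's actual equations.

```latex
% Assumed 1-D linear CDS network, with 0 \le 2gw < 1 for stability.
\tau \frac{dC_i}{dt} = -C_i + S_i + g\,w\,(C_{i-1} + C_{i+1})

% Spatially uniform input S_i = S: steady state and effective time constant.
C_\infty = \frac{S}{1 - 2gw}, \qquad
\tau_{\mathrm{eff}} = \frac{\tau}{1 - 2gw}

% Point input: the steady-state response decays as \rho^{|i|}, where
g w\,\rho^2 - \rho + g w = 0
\;\Rightarrow\;
\rho = \frac{1 - \sqrt{1 - 4 g^2 w^2}}{2 g w}, \qquad
\lambda = -\frac{1}{\ln \rho}
```

Under these assumptions, raising g increases the steady-state gain, the effective time constant of the uniform mode and the space constant \lambda together, while for g = 0 each unit simply follows its direct input S_i.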


RESULTS

We have proposed a neuromorphic model of the motion pathway based on these cortical mechanisms. The feedforward stream alone provides a spatially "fine" representation of local motion in area V1 and a spatially "coarse" representation of global motion in area MT. It is, however, unable to deal with the aperture problem, and consequently cannot support any spatial segmentation based on coherent motion. In contrast, the feedback from MT onto V1 induces a dynamic competition between the local and global representations of motion, which promotes the emergence of an intermediate representation of the visual flow compatible with both. Once these dynamics have converged, the responses of V1 CDS cells provide a fine representation of "true" motion, simultaneously solving the aperture problem and the figure-ground segregation of coherent motion.
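The interplay between the fine V1 representation and the coarse MT representation can be mimicked with a deliberately simplified 1-D toy, sketched below. It is not the model itself and does not reproduce its simulations: the network equation, the rectified and clipped MT signal, the parameter values and all function names are assumptions, and the toy only shows enhancement of the coherent region, not the aperture-problem solution.

```python
import random

def simulate(feedback, n=60, fig=(20, 40), steps=1000, seed=1,
             w=0.5, g_max=0.8, radius=5, tau=10.0, dt=1.0):
    """Toy V1 CDS layer whose lateral loop gain is set by pooled 'MT' activity."""
    rng = random.Random(seed)
    # SDS drive: local motion along the preferred direction. The mean is 0.5
    # inside the figure (50% coherence) and 0 outside, plus static noise.
    s = [(0.5 if fig[0] <= i < fig[1] else 0.0) + rng.uniform(-0.2, 0.2)
         for i in range(n)]
    c = [0.0] * n  # CDS activity
    for _ in range(steps):
        new_c = []
        for i in range(n):
            # "MT": coarse pooling of CDS activity over a large receptive field.
            lo, hi = max(0, i - radius), min(n, i + radius + 1)
            mt = sum(c[lo:hi]) / (hi - lo)
            # Feedback sets the gain of the excitatory loop, within [0, g_max].
            g = g_max * min(max(mt, 0.0), 1.0) if feedback else 0.0
            left = c[i - 1] if i > 0 else 0.0
            right = c[i + 1] if i < n - 1 else 0.0
            dc = (-c[i] + s[i] + g * w * (left + right)) / tau
            new_c.append(c[i] + dt * dc)
        c = new_c
    return s, c

def contrast(c, fig=(20, 40)):
    """Mean response inside the figure minus mean response outside."""
    inside = c[fig[0]:fig[1]]
    outside = c[:fig[0]] + c[fig[1]:]
    return sum(inside) / len(inside) - sum(outside) / len(outside)

s, c_ff = simulate(feedback=False)  # feedforward only: CDS follows SDS
_, c_fb = simulate(feedback=True)   # with the MT-controlled loop gain
print(contrast(c_ff), contrast(c_fb))
```

In this toy the pooled signal is large only where local motion is consistent over the pooling window, so the loop gain, and with it the spatial integration, rises inside the coherent region and stays low elsewhere.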


CONCLUSION

The perception of coherent visual motion can be expressed as an optimization problem constrained by the conflicting and simultaneous demands of integration and segmentation of local motion, and requires a nonlinear regularization approach [7]. By implementing such a computational scheme, the proposed model accounts for the figure-ground segregation of coherent motion as early as area V1 through the modulation of its intra-cortical connections by MT feedback. The model is also compatible with the recent anatomical and physiological evidence of contextual modulation within area V1 [3-5], and: 1) It demonstrates that area V1 is able to support the figure-ground segregation of coherent motion at high spatial resolution; 2) It suggests, by the simplicity of its neural structure, that excitatory intra-cortical connectivity and its modulation by a higher cortical signal are a critical feature of the cortical circuitry involved in visual segmentation; 3) Although initially based on a theoretical background and developed in the context of neuromorphic engineering, the model also provides some experimentally testable hypotheses about the nature of the integrative processes performed by area V1, as suggested by recent theories [8-9].


REFERENCES

1. Poggio, Torre & Koch. Computational vision and regularization theory. Nature 317, 314-319 (1985).
2. Yuille & Grzywacz. A computational theory for the perception of coherent visual motion. Nature 333, 71-74 (1988).
3. Zipser, Lamme & Schiller. Contextual modulation in primary visual cortex. J. Neurosci. 16, 7376-7389 (1996).
4. Hupé, James, Girard & Bullier. Feedback connections from V2 modulate intrinsic connectivity within V1. Soc. Neurosci. Abs. 406.15 (1997).
5. Lamme, van Dijk & Spekreijse. Contour from motion processing occurs in primary visual cortex. Nature 363, 541-543 (1993).
6. Bullier & Nowak. Parallel versus serial processing: new vistas on the distributed organization of the visual system. Current Opinion in Neurobiology 5, 497-503 (1995).
7. Schnorr & Sprengel. A nonlinear regularization approach to early vision. Biological Cybernetics 72, 141-149 (1994).
8. Mumford. On the computational architecture of the neocortex: II The role of cortico-cortical loops. Biological Cybernetics 66, 241-251 (1992).
9. Lee, Mumford, Romero & Lamme. The role of the primary visual cortex in higher level vision. Vision Research 38, 2429-2454 (1998).

© 1998 KyberVision Consulting, R&D