A reinforcement learning approach to instrumental contingency degradation in rats

Alain Dutech, Etienne Coutureau, Alain R. Marchand
Journal of Physiology-Paris. 2011-08-01; 105(1-3): 36-44
DOI: 10.1016/j.jphysparis.2011.07.017

J Physiol Paris. 2011 Jan-Jun. Epub 2011 Aug 31.

Goal-directed action involves a representation of action consequences. Adapting
to changes in action-outcome contingency requires the prefrontal region. Indeed,
rats with lesions of the medial prefrontal cortex do not adapt their free-operant
response when food delivery becomes unrelated to lever-pressing. The present
study explores the bases of this deficit through a combined behavioural and
computational approach. We show that lesioned rats retain some behavioural
flexibility and stop pressing if this action prevents food delivery. We attempt
to model this phenomenon in a reinforcement learning framework. The model assumes
that distinct action values are learned in an incremental manner in distinct
states. The model represents states as n-tuples of events, emphasizing sequences
rather than the continuous passage of time. Probabilities of lever-pressing and
visits to the food magazine observed in the behavioural experiments are first
analyzed as a function of these states, to identify sequences of events that
influence action choice. Observed action probabilities appear to be essentially
a function of the last event that occurred, with reward delivery and waiting
significantly facilitating magazine visits and lever-pressing, respectively.
Behavioural sequences of normal and lesioned rats are then fed into the model,
action values are updated at each event transition according to the SARSA
algorithm, and predicted action probabilities are derived through a softmax
policy. The model captures the time course of learning, as well as the
differential adaptation of normal and prefrontal-lesioned rats to contingency
degradation with the same parameters for both groups. The results suggest that
simple temporal-difference algorithms with low learning rates can largely account
for instrumental learning and performance. Prefrontal-lesioned rats appear to
differ from control rats mainly in their low rates of visits to the magazine
after a lever press, and their inability to initially detect weak contingency
changes.
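The abstract names the model's main ingredients: states built from recent events, SARSA updates at each event transition, and a softmax policy that turns action values into predicted action probabilities. The following is a minimal Python sketch of that pipeline, assuming a tabular model in which the state is the tuple of the last n events and each observed action is updated with the standard SARSA rule Q(s,a) ← Q(s,a) + α[r + γQ(s′,a′) − Q(s,a)]. The helper names (`softmax`, `replay_sarsa`), the action set, the "start" padding event, the reward of 1 per food delivery and the default values of alpha, gamma and beta are hypothetical illustrations, not the authors' implementation or fitted parameters.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical action set: the abstract names lever-pressing and magazine
# visits; "wait" is added here as an explicit third option (assumption).
ACTIONS: Tuple[str, ...] = ("press", "visit", "wait")
State = Tuple[str, ...]


def softmax(values: Sequence[float], beta: float) -> List[float]:
    """Softmax with inverse temperature beta (max subtracted for stability)."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def replay_sarsa(
    actions: Sequence[str],
    food: Sequence[bool],
    n: int = 1,
    alpha: float = 0.05,  # low learning rate (illustrative value)
    gamma: float = 0.9,   # discount factor (illustrative value)
    beta: float = 2.0,    # softmax inverse temperature (illustrative value)
) -> Tuple[Dict[Tuple[State, str], float], List[float]]:
    """Replay an observed behavioural sequence through tabular SARSA.

    actions[t] is the action the animal took at step t; food[t] is True if
    food was delivered right after it (contingently or not). The state before
    each action is the tuple of the last n events. Returns the Q table and,
    for each step, the softmax probability of the action actually taken.
    """
    # Build the event-based state (an n-tuple) preceding each observed action.
    history: List[str] = ["start"] * n  # hypothetical padding event
    states: List[State] = []
    for a, f in zip(actions, food):
        states.append(tuple(history[-n:]))
        history.append(a)
        if f:
            history.append("food")

    q: Dict[Tuple[State, str], float] = defaultdict(float)  # Q(s, a) starts at 0
    predicted: List[float] = []
    for t, a in enumerate(actions):
        s = states[t]
        # Predicted choice probabilities under the current action values.
        probs = softmax([q[(s, b)] for b in ACTIONS], beta)
        predicted.append(probs[ACTIONS.index(a)])
        r = 1.0 if food[t] else 0.0
        if t + 1 < len(actions):
            # SARSA target uses the next *observed* state-action pair.
            target = r + gamma * q[(states[t + 1], actions[t + 1])]
        else:
            target = r  # end of session treated as terminal
        q[(s, a)] += alpha * (target - q[(s, a)])
    return dict(q), predicted


if __name__ == "__main__":
    # Toy session: every lever press is followed by food, then a magazine visit.
    toy_actions = ["press", "visit"] * 50
    toy_food = [True, False] * 50
    q_table, p = replay_sarsa(toy_actions, toy_food)
    print(round(p[0], 3), round(p[98], 3))
```

Because the observed sequence is replayed rather than simulated, the softmax here only scores how likely the model found each action the animal actually chose; it never selects actions itself. Passing food flags that are unrelated to the actions gives a crude way to mimic contingency degradation within this sketch.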

Copyright © 2011. Published by Elsevier Ltd.

PMID: 21907801 [Indexed for MEDLINE]