Virginie Fresno, Alain Marchand et al. in Scientific Reports

Neurobiological control of exploration during learning

Dopamine blockade impairs the exploration-exploitation trade-off in rats
François Cinotti#, Virginie Fresno#, Nassim Aklil, Etienne Coutureau, Benoît Girard, Alain R. Marchand* & Mehdi Khamassi*.
# Contributed equally as first authors
* Contributed equally as senior authors
Scientific Reports volume 9, Article number: 6770 (2019)


Exploration is an essential component of trial-and-error learning. In a novel situation, one has to try several options to discover which one is the most advantageous (exploration). But once the situation is familiar, selecting the most profitable option (exploitation) is usually better. Still, exploration may be needed again if the current option becomes less favorable.

Dopamine is released in the brain in response to unexpected rewards. During learning, this signal supports the progressive selection of the best actions in a given situation. However, recent collaborative work by a neurobiology team in Bordeaux (INCIA/DECAD) and a modelling team in Paris (ISIR) shows that dopamine can also regulate exploration during the course of learning.

In theory, exploration falls into two categories: “directed” exploration, which orients actions toward uncertain or less well-known options, and “random” exploration, which means trying a random action from time to time. Although simpler to implement, the latter type has been far less studied. The researchers from INCIA and ISIR hypothesized that cerebral dopamine levels could directly affect random exploration.
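One common way to formalize random exploration in reinforcement-learning models is a softmax choice rule, in which a single parameter (the inverse temperature, here called `beta`) sets how often a lower-valued option is chosen. The sketch below is only an illustration of that idea: the article does not say which rule the authors used, and the function name, the option values and the `beta` settings are all hypothetical.

```python
import math

def softmax_probs(values, beta):
    """Choice probabilities under a softmax rule.

    beta is the inverse temperature (an illustrative parameter name):
    a high beta favours the best-valued option (exploitation),
    a low beta flattens the probabilities (more random exploration).
    """
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical learned values for three options.
values = [0.8, 0.1, 0.1]

low_exploration = softmax_probs(values, beta=8.0)   # mostly picks option 0
high_exploration = softmax_probs(values, beta=1.0)  # picks options 1 and 2 more often
```

In this formulation, lowering `beta` increases random exploration without changing what has been learned about each option, which is the kind of dissociation the hypothesis relies on.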

They tested this hypothesis by inhibiting brain dopamine in rats during a choice task requiring exploration. Rats had to identify which of three levers yielded the highest probability of reward. Because the best lever changed regularly, the rats needed to explore and relearn often.
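The structure of such a task can be sketched as a three-armed bandit. The reward probabilities and the 24-trial block length below are taken from the figure caption of this article; how the best lever is picked for each new block is not stated, so the rule in `next_best_lever` is an assumption, as are all function names.

```python
import random

# Reward probabilities (best lever, other lever, other lever), from the figure caption.
EASY = (7 / 8, 1 / 16, 1 / 16)
DIFFICULT = (5 / 8, 3 / 16, 3 / 16)
BLOCK_LENGTH = 24  # trials during which the best lever does not change

def make_block(best_lever, probs):
    """Return one reward probability per lever, the highest one on best_lever."""
    high, low, _ = probs
    return [high if lever == best_lever else low for lever in range(3)]

def pull(lever, block_probs, rng):
    """Return 1 if pressing this lever is rewarded on this trial, else 0."""
    return 1 if rng.random() < block_probs[lever] else 0

def next_best_lever(current, rng):
    """Assumption: the new best lever is drawn at random among the other two."""
    return rng.choice([lever for lever in range(3) if lever != current])

# Example: one easy block in which a hypothetical rat always presses lever 0.
rng = random.Random(0)
block = make_block(best_lever=0, probs=EASY)
rewards = [pull(0, block, rng) for _ in range(BLOCK_LENGTH)]
```

Because `next_best_lever` never returns the current lever, an agent that keeps pressing the previously best lever earns much less after each block change, which is what makes renewed exploration necessary.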

The results show that reducing cerebral dopamine increases exploration, in line with the proposed hypothesis. Thus, if rewards become less frequent and dopamine levels in the brain decrease, this could act as a trigger to explore novel options. The researchers further analyzed the rats' behavior using computational models and compared the effects of dopamine on directed and random exploration. In all models, dopamine essentially affected the parameter regulating random exploration, while directed exploration and learning rate did not change.

These results show that simple mechanisms to regulate learning may have evolved in mammals, allowing behavioral flexibility without the need to systematically compute some forms of uncertainty such as those used in directed exploration.

 

Schematic of the task. Left: Choices by the rat are rewarded with different probabilities in easy trials (7/8; 1/16; 1/16) and difficult trials (5/8; 3/16; 3/16). Learning is measured as the proportion of choices of the best lever during 24-trial blocks in which the best lever and the reward probabilities do not change. Exploration is measured as the proportion of choice shifts after a reward. Both difficult trials and dopamine blockade decrease performance and increase exploration. Right: A simple Q-learning model with forgetting accurately reproduces the rats' behavior. The random exploration parameter of the model increases significantly when dopamine is blocked. Learning parameters are not affected.
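The caption names two ingredients that can be sketched in a few lines: a Q-learning update with forgetting, and the exploration measure (shifts after a reward). The version below is a minimal illustration under stated assumptions, not the authors' implementation: here "forgetting" means that the values of unchosen levers decay toward zero, and the names `alpha`, `forget`, `update_q` and `win_shift` are hypothetical.

```python
def update_q(q, choice, reward, alpha, forget):
    """One trial of Q-learning with forgetting (assumed form).

    The chosen lever's value moves toward the reward at rate alpha;
    the values of unchosen levers decay toward 0 at rate forget.
    """
    new_q = []
    for lever, value in enumerate(q):
        if lever == choice:
            new_q.append(value + alpha * (reward - value))
        else:
            new_q.append((1.0 - forget) * value)
    return new_q

def win_shift(choices, rewards):
    """Proportion of rewarded trials followed by a switch to another lever."""
    wins = 0
    shifts = 0
    for t in range(len(choices) - 1):
        if rewards[t] == 1:
            wins += 1
            if choices[t + 1] != choices[t]:
                shifts += 1
    return shifts / wins if wins else 0.0

# Example: a rewarded press on lever 0, starting from equal values.
q = update_q([0.5, 0.5, 0.5], choice=0, reward=1, alpha=0.2, forget=0.1)
```

Combined with a choice rule, these two functions would be enough to simulate an agent on the task and compute the same exploration measure that the caption reports for the rats.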

08/05/19