An Extended Actor-Critic Architecture with Phasic Behavioral Inhibition: The Case of Dopamine-Serotonin Interaction
Aya Hussein Ahmad Mousa
آية حسين أحمد موسى
The actor-critic architecture, based on temporal difference (TD) algorithms, has played a critical role in reinforcement learning. The actor represents the policy structure and the critic represents the value function. The TD prediction error signal is used as a teaching signal for both the actor and critic modules. Current models of the actor-critic architecture assume that only the unmodified TD signal can serve as a teaching signal for the actor and critic modules. In this thesis, we introduce an extended version of the actor-critic architecture that addresses the effect of two kinds of reinforcement signals: the TD signal and the behavioral inhibition signal. We argue that the role of the behavioral inhibition signal is to produce phasic opposition of the TD signal in order to ascertain the significance of learning and to fortify consolidation. Based on this logic, we construct a new neurocomputational model of the basal ganglia, a brain region. This model addresses the effects of the neurotransmitters dopamine and serotonin in the reinforcement learning process. The dopamine function is represented by a TD prediction error signal, while serotonin is simulated as a behavioral inhibition signal whose role is to phasically inhibit the TD prediction error signal. We use major depressive disorder (MDD) and selective serotonin reuptake inhibitor (SSRI) antidepressants as experimental representations of variable levels of dopamine and serotonin to study their interaction in reinforcement learning. We use three different modeling approaches to simulate experimental reinforcement learning data: (1) a TD-only model, (2) a TD and risk prediction model, and (3) our proposed TD and behavioral inhibition model. Simulation results show that our proposed model simulates experimental reinforcement learning data from MDD and SSRIs significantly better than the other two modeling approaches.
This extended actor-critic architecture has potential applications in both robotics and neuroscience.
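The idea of a TD teaching signal that is phasically opposed by a behavioral inhibition signal can be illustrated with a minimal sketch. It is not the thesis model: the abstract does not state the functional form of the inhibition, so the proportional opposition used here (the teaching signal is the TD error minus a fraction of itself on randomly chosen "phasic" trials) is an assumption, as are the single-state two-choice task, the learning rates, and all names (`td_error`, `inhibited_signal`, `run_bandit`, `inhibition_strength`, `inhibition_prob`).

```python
import math
import random


def softmax(prefs):
    """Convert actor preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def td_error(reward, value_next, value_now, gamma=0.9):
    """Standard TD prediction error: r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_now


def inhibited_signal(delta, strength):
    """Teaching signal after opposition by the inhibition signal.

    Assumption: the inhibition opposes the TD error proportionally,
    so strength=0 leaves delta unmodified and strength=1 cancels it.
    """
    return delta - strength * delta


def run_bandit(n_trials=2000, inhibition_strength=0.0,
               inhibition_prob=0.0, seed=0):
    """Actor-critic on a hypothetical single-state, two-choice task."""
    rng = random.Random(seed)
    reward_prob = [0.8, 0.2]      # assumed reward probabilities per action
    value = 0.0                   # critic: value of the single state
    prefs = [0.0, 0.0]            # actor: action preferences
    alpha_critic, alpha_actor = 0.1, 0.1

    for _ in range(n_trials):
        probs = softmax(prefs)
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if rng.random() < reward_prob[action] else 0.0

        # Each trial ends after one step, so the next-state value is 0.
        delta = td_error(reward, 0.0, value)

        # Phasic inhibition: active only on a random subset of trials.
        active = rng.random() < inhibition_prob
        strength = inhibition_strength if active else 0.0
        teach = inhibited_signal(delta, strength)

        # The same (possibly inhibited) signal teaches critic and actor.
        value += alpha_critic * teach
        prefs[action] += alpha_actor * teach

    return value, softmax(prefs)
```

Calling `run_bandit()` with the defaults gives the unmodified TD-only learner, while raising `inhibition_strength` and `inhibition_prob` weakens the teaching signal on a fraction of trials. Under the assumptions above, comparing the two settings gives a rough feel for how varying the inhibition level changes learning speed.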