Open Research Lines

Transfer and Meta RL

The core idea behind transfer and meta-learning techniques is that experience gathered while learning a given task might help in learning, or improving performance on, related tasks. Sharing knowledge by successfully transferring information can lead to drastic improvements over a broad spectrum of real-world problems.
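As a minimal sketch of this idea, assuming a hypothetical pair of tabular tasks exposed through `env_reset`/`env_step` functions (not part of any specific method from this line), the Q-table learned on a source task can be reused to warm-start learning on a related target task instead of starting from scratch.

```python
import random
from collections import defaultdict

def q_learning(env_step, env_reset, n_actions, episodes, q=None,
               alpha=0.1, gamma=0.99, eps=0.1):
    # Tabular Q-learning; `q` may be pre-filled with values learned on another task.
    if q is None:
        q = defaultdict(float)
    for _ in range(episodes):
        state, done = env_reset(), False
        while not done:
            if random.random() < eps:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[(state, a)])
            next_state, reward, done = env_step(state, action)  # assumed interface
            best_next = max(q[(next_state, a)] for a in range(n_actions))
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

# Transfer sketch (hypothetical tasks): warm-start the target task with the
# Q-table learned on the source task.
# q_source = q_learning(source_step, source_reset, n_actions=4, episodes=500)
# q_target = q_learning(target_step, target_reset, n_actions=4, episodes=100, q=q_source)
```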

Learning from Demonstrations

Learning from Demonstrations is a crucial part of RL, both for its wide applicability and for its distinctive theory and algorithmic results. The field divides into two sub-fields: in imitation learning, we start from an expert that we assume to be very close to the optimal policy and we seek algorithms that mimic its behaviour; in inverse RL, we only assume that the demonstrations come from someone who is optimizing a reward function, and we have to recover this reward function, potentially surpassing the performance of the demonstrator.
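To make the imitation-learning side concrete, here is a minimal behavioural-cloning sketch over a hypothetical discrete toy domain (the demonstration data and state/action names are invented for illustration): the expert's demonstrated state-action pairs are turned into a policy by majority vote per state.

```python
from collections import Counter, defaultdict

def behavioural_cloning(demonstrations):
    """Fit a tabular policy to expert state-action pairs by majority vote.

    demonstrations: iterable of (state, action) pairs produced by the expert.
    """
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    # For each visited state, imitate the action the expert chose most often.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Hypothetical usage: three demonstrated transitions in a toy grid world.
expert_demos = [("s0", "right"), ("s0", "right"), ("s1", "up")]
policy = behavioural_cloning(expert_demos)
print(policy)  # {'s0': 'right', 's1': 'up'}
```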

Safe Reinforcement Learning

The purpose of safe RL is to prevent risks and hazards, that is, any source of potential damage that the agent may cause, directly or indirectly, during the learning process or after its deployment. Safe learning is paramount for the development of real-life applications in which agents are employed to take critical decisions.
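One common way to avoid hazards while learning is to filter the agent's actions through a safety check (sometimes called a shield). The sketch below is a generic illustration under assumed names (`is_safe`, `fallback_action` are hypothetical, domain-specific callables), not the specific approach pursued in this research line.

```python
def shielded_action(state, proposed_action, is_safe, fallback_action):
    """Return the proposed action only if it is judged safe in this state.

    is_safe(state, action) -> bool is an assumed, domain-specific safety check;
    fallback_action(state) -> action is an assumed safe default (e.g. braking).
    """
    if is_safe(state, proposed_action):
        return proposed_action
    return fallback_action(state)

# Hypothetical usage: never allow "accelerate" when the state signals an obstacle.
is_safe = lambda s, a: not (s == "obstacle_ahead" and a == "accelerate")
fallback = lambda s: "brake"
print(shielded_action("obstacle_ahead", "accelerate", is_safe, fallback))  # brake
```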

Non-stationarity, Delays, and Lifelong RL

Unlike the common RL assumptions of stationary environments and immediate rewards, many real-world problems exhibit changes of different scales and patterns in the environment, or even delays in the generated rewards. Non-stationary, delayed, and lifelong learning aim to devise new methods to cope with these changes and get closer to more challenging and realistic scenarios, even over (life)longer time horizons.
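As a toy example of what a reward delay means in practice, the wrapper below (assuming a hypothetical environment whose `step` returns `(state, reward, done)`) withholds each reward for a fixed number of steps before the agent observes it.

```python
from collections import deque

class DelayedRewardWrapper:
    """Delay every reward produced by `env` by `delay` time steps."""

    def __init__(self, env, delay):
        self.env = env
        self.delay = delay
        self.pending = deque()

    def step(self, action):
        state, reward, done = self.env.step(action)  # assumed env interface
        self.pending.append(reward)
        # The agent only observes a reward once it is `delay` steps old
        # (rewards still pending at episode end are dropped in this toy version).
        observed = self.pending.popleft() if len(self.pending) > self.delay else 0.0
        return state, observed, done
```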

Sample Efficiency in RL

Sample efficiency refers to the number of interactions with the environment that are required to train an RL agent towards optimality. Improving sample efficiency is paramount for applying RL algorithms in the real world, where collecting interactions is usually costly and time-consuming.
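A simple lever for sample efficiency is to reuse past interactions rather than discard them, for instance through an experience replay buffer. The sketch below is a generic illustration of that reuse, not a method specific to this line.

```python
import random
from collections import deque

class ReplayBuffer:
    """Store past transitions so that each costly environment interaction
    can be reused in many learning updates."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniformly re-sample old transitions for an off-policy update.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```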

Exploration

Exploration refers to the problem of deciding when to exploit the current information to maximize the agent's performance and when to explore the environment to gather additional information. It is a core challenge of reinforcement learning that impacts most of its application domains.
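A minimal instance of this trade-off is epsilon-greedy action selection: with probability epsilon the agent explores a random action, otherwise it exploits its current value estimates. The value estimates in the usage line are hypothetical toy numbers.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Explore a random action with probability epsilon, otherwise exploit.

    q_values: dict mapping each action to its current estimated value.
    """
    if random.random() < epsilon:
        return random.choice(list(q_values))   # explore
    return max(q_values, key=q_values.get)     # exploit

# Hypothetical usage with toy value estimates for three actions.
print(epsilon_greedy({"left": 0.2, "right": 0.7, "stay": 0.1}, epsilon=0.1))
```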

Configurable Environments

Environment configurability concerns the possibility of altering some parameters of the environment in order to guide the learning experience of an artificial agent. This line of research includes scenarios in which the configuration is aimed at easing the agent's learning, as well as scenarios in which it is performed by an external configurator with diverging interests.
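As a toy illustration of configuring an environment parameter to ease learning (names such as `difficulty` and `success_rate` are hypothetical and not tied to any specific work in this line), a configurator can raise the task difficulty only once the agent performs well enough at the current setting.

```python
def update_configuration(difficulty, success_rate,
                         target=0.8, step=0.1, max_difficulty=1.0):
    """Increase the environment's difficulty parameter once the agent
    succeeds often enough at the current configuration."""
    if success_rate >= target:
        return min(difficulty + step, max_difficulty)
    return difficulty

# Hypothetical usage: the agent solves 85% of episodes, so the task gets harder.
print(update_configuration(difficulty=0.3, success_rate=0.85))  # 0.4
```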

Multi-Agent, Distributed RL

In Distributed Reinforcement Learning (DRL), multiple, possibly virtual, agents share their experiences to learn from each other and improve their policies. The goal is to divide the learning process across multiple agents so as to improve learning efficiency, scalability, and stability. This can be achieved by dividing the environment into smaller parts, assigning each part to an individual agent, and allowing the agents to learn locally and communicate their experiences with each other.
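A minimal sketch of the experience-sharing idea, under an assumed worker/learner split (the `collect_transition` callable and the toy transitions are hypothetical): several workers gather transitions on their own portion of the environment and push them into a shared queue consumed by a single learner.

```python
import queue
import threading

shared_experience = queue.Queue()

def worker(worker_id, n_steps, collect_transition):
    # Each worker interacts with its own part of the environment
    # and pushes the transitions it observes into the shared queue.
    for t in range(n_steps):
        shared_experience.put(collect_transition(worker_id, t))

def learner(n_updates):
    # A single learner consumes experience gathered by any of the workers.
    for _ in range(n_updates):
        transition = shared_experience.get()
        # ...a real system would update the shared policy from `transition` here...

# Hypothetical usage: two workers feeding one learner with toy transitions.
collect = lambda wid, t: (wid, t, "state", "action", 0.0)
threads = [threading.Thread(target=worker, args=(i, 5, collect)) for i in range(2)]
for th in threads:
    th.start()
for th in threads:
    th.join()
learner(n_updates=10)
```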