A Multi-Objective Approach to Mitigate Negative Side Effects
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 354-361. https://doi.org/10.24963/ijcai.2020/50
Agents operating in unstructured environments often create negative side effects (NSE) that may not be easy to identify at design time. We examine how various forms of human feedback or autonomous exploration can be used to learn a penalty function associated with NSE during system deployment. We formulate the problem of mitigating the impact of NSE as a multi-objective Markov decision process with lexicographic reward preferences and slack. The slack denotes the maximum deviation from an optimal policy with respect to the agent's primary objective allowed in order to mitigate NSE as a secondary objective. Empirical evaluation of our approach shows that the proposed framework can successfully mitigate NSE and that different feedback mechanisms introduce different biases, which influence the identification of NSE.
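The lexicographic formulation with slack can be illustrated on a toy problem. The following is a minimal sketch, not the paper's algorithm: it assumes a hypothetical five-state deterministic cost-minimization problem, a made-up NSE penalty table, and a simplification in which slack is applied per state to the primary Q-values rather than to the overall policy value. All names (`TRANSITIONS`, `NSE_PENALTY`, `solve`) are illustrative.

```python
from typing import Callable, Dict, List, Tuple

State = str
Action = str

# Hypothetical deterministic toy problem: from S the agent can reach the goal G
# via a short route through A (which causes a side effect) or a longer route
# through B and C (which does not).
TRANSITIONS: Dict[State, Dict[Action, State]] = {
    "S": {"short": "A", "long": "B"},
    "A": {"go": "G"},
    "B": {"go": "C"},
    "C": {"go": "G"},
    "G": {},  # terminal state, no actions
}
PRIMARY_COST = 1.0  # assumed unit cost per action for the primary objective
NSE_PENALTY: Dict[Tuple[State, Action], float] = {("A", "go"): 5.0}  # assumed penalty


def evaluate(cost: Callable[[State, Action], float],
             allowed: Dict[State, List[Action]],
             sweeps: int = 50):
    """Undiscounted value iteration restricted to the allowed actions."""
    v = {s: 0.0 for s in TRANSITIONS}
    for _ in range(sweeps):
        for s, actions in allowed.items():
            if actions:
                v[s] = min(cost(s, a) + v[TRANSITIONS[s][a]] for a in actions)
    q = {s: {a: cost(s, a) + v[TRANSITIONS[s][a]] for a in actions}
         for s, actions in allowed.items()}
    return v, q


def solve(slack: float):
    """Optimize the primary cost first, then the NSE penalty within the slack."""
    all_actions = {s: list(acts) for s, acts in TRANSITIONS.items()}
    v1, q1 = evaluate(lambda s, a: PRIMARY_COST, all_actions)
    # Keep only actions whose primary Q-value is within `slack` of the optimum.
    allowed = {s: [a for a, qsa in q1[s].items() if qsa <= v1[s] + slack + 1e-9]
               for s in TRANSITIONS}
    v2, q2 = evaluate(lambda s, a: NSE_PENALTY.get((s, a), 0.0), allowed)
    policy = {s: min(q2[s], key=q2[s].get) for s in TRANSITIONS if q2[s]}
    return policy, v2
```

In this toy problem, `solve(0.0)` keeps the agent on the short route because no deviation from the primary optimum is permitted, while `solve(1.0)` lets it accept one extra unit of primary cost and take the long route that avoids the penalty. Per-state slack can accumulate along a trajectory in general, so a faithful implementation would need to bound the total deviation as the abstract describes.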
Agent-based and Multi-agent Systems: Human-Agent Interaction
Humans and AI: Human-AI Collaboration
Planning and Scheduling: Markov Decision Processes