Combining MORL with Restraining Bolts to Learn Normative Behaviour
Emery A. Neufeld, Agata Ciabattoni, Radu Florin Tulcan
Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence
Main Track. Pages 4615-4623.
https://doi.org/10.24963/ijcai.2025/514
Normative Restraining Bolts (NRBs) adapt the restraining bolt technique (originally developed for safe reinforcement learning) to ensure compliance with social, legal, and ethical norms. While effective, NRBs rely on trial-and-error weight tuning, which hinders their ability to enforce hierarchical norms; moreover, norm updates require retraining. In this paper, we reformulate learning with NRBs as a multi-objective reinforcement learning (MORL) problem, where each norm is treated as a distinct objective. This enables the introduction of Ordered Normative Restraining Bolts (ONRBs), which support algorithmic weight selection, prioritized norms, and norm updates, and provide formal guarantees on minimizing norm violations. Case studies show that ONRBs offer a robust and principled foundation for RL agents to comply with a wide range of norms while achieving their goals.
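The idea of treating each norm as a separate objective with a strict priority ordering can be sketched as follows. This is an illustrative toy example, not the paper's algorithm: the helper names, the dominating-weight scheme, and the `margin` factor are assumptions made for exposition.

```python
# Sketch: scalarizing a multi-objective reward in which each norm is a
# distinct objective and higher-priority norms receive dominating weights,
# so no amount of low-priority compliance outweighs a high-priority violation.

def lexicographic_weights(num_norms, margin=100.0):
    """Return one weight per norm, ordered from highest to lowest priority.

    `margin` (an assumed separation factor) makes each weight dominate
    the combined weight of all lower-priority norms.
    """
    return [margin ** (num_norms - 1 - i) for i in range(num_norms)]

def shaped_reward(task_reward, violations, weights):
    """Task reward minus a weighted penalty for each violated norm.

    `violations` is a list of 0/1 flags, one per norm, in priority order.
    """
    penalty = sum(w * v for w, v in zip(weights, violations))
    return task_reward - penalty

weights = lexicographic_weights(3)          # [10000.0, 100.0, 1.0]
r = shaped_reward(5.0, [0, 1, 0], weights)  # only the mid-priority norm violated
```

Under such a weighting, an agent maximizing the scalarized return prefers any policy that avoids a higher-priority violation over one that avoids arbitrarily many lower-priority violations; the paper's contribution includes selecting such weights algorithmically rather than by trial and error.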
Keywords:
Knowledge Representation and Reasoning: KRR: Learning and reasoning
Agent-based and Multi-agent Systems: MAS: Normative systems
AI Ethics, Trust, Fairness: ETF: Values
Machine Learning: ML: Reinforcement learning
