New Method of Machine Learning with Interpretability Constraints

© 2022 EPFL

Dr. Michael Mark and Prof. Thomas Weber’s most recent paper, “Optimal Recovery of Unsecured Debt via Interpretable Machine Learning,” published in Machine Learning with Applications, develops a method to incorporate domain knowledge directly into a reinforcement-learning agent. The method is tested on the problem of credit collections, for which an optimal solution is known, so that the performance of the augmented learner can be compared with an established benchmark. A key observation is that interpretability, in the form of monotonicity constraints, can be imposed without significant loss in performance. The paper was co-authored with Prof. Naveed Chehrazi (Olin Business School) and Huanxi Liu, who participated in an Excellence Research Internship at EPFL, hosted by the Chair of Operations, Economics and Strategy, while pursuing his bachelor’s degree at UC San Diego.

Abstract:

This paper addresses the issue of interpretability and auditability of reinforcement-learning agents employed in the recovery of unsecured consumer debt. To this end, we develop a deterministic policy-gradient method that allows for a natural integration of domain expertise into the learning procedure so as to encourage learning of consistent, and thus interpretable, policies. Domain knowledge can often be expressed in terms of policy monotonicity and/or convexity with respect to relevant state inputs. We augment the standard actor–critic policy approximator using a monotonically regularized loss function which integrates domain expertise into the learning. Our formulation overcomes the challenge of learning interpretable policies by constraining the search to policies satisfying structural-consistency properties. The resulting state-feedback control laws can be readily understood and implemented by human decision makers. This new domain-knowledge enhanced learning approach is applied to the problem of optimal debt recovery, which features a controlled Hawkes process and an asynchronous action–feedback relationship.
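
To make the idea of a monotonicity-regularized loss concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes that the policy is a function mapping a batch of states to scalar actions and that domain knowledge says the action should be non-decreasing (or non-increasing) in one state input. The names `monotonicity_penalty`, `regularized_actor_loss`, `policy`, `q_value`, `dim`, `direction`, `eps` and `weight` are all hypothetical, and the finite-difference squared-hinge penalty is only one plausible way to encode such a constraint.

```python
import numpy as np

def monotonicity_penalty(policy, states, dim, direction=+1, eps=1e-2):
    """Squared-hinge penalty on monotonicity violations along one state input.

    Assumption (not from the paper): `policy` maps an (n, d) array of states
    to an (n,) array of actions. `direction=+1` asks for a non-decreasing
    policy in input `dim`; `direction=-1` asks for a non-increasing one.
    """
    states = np.asarray(states, dtype=float)
    shifted = states.copy()
    shifted[:, dim] += eps
    # Forward finite-difference estimate of d(policy)/d(state[dim]).
    slope = (policy(shifted) - policy(states)) / eps
    # Only slopes with the wrong sign contribute to the penalty.
    violation = np.maximum(0.0, -direction * slope)
    return float(np.mean(violation ** 2))

def regularized_actor_loss(policy, q_value, states, dim, direction=+1, weight=1.0):
    """Illustrative actor loss: negative critic value plus weighted penalty.

    Assumption: `q_value(states, actions)` returns an (n,) array of critic
    estimates; `weight` trades off performance against structural consistency.
    """
    states = np.asarray(states, dtype=float)
    base = -np.mean(q_value(states, policy(states)))
    return float(base + weight * monotonicity_penalty(policy, states, dim, direction))
```

In this sketch, a policy that already respects the required monotonicity incurs zero penalty, so the regularizer only affects learning where the candidate policy contradicts the stated domain knowledge. A gradient-based learner would need a differentiable version of the same penalty, for example one written in an automatic-differentiation framework.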

Acknowledgement: This research was funded by the Swiss National Science Foundation (grant no. 105218-179175).

References

Mark, M., Chehrazi, N., Liu, H., Weber, T.A. (2022) “Optimal Recovery of Unsecured Debt via Interpretable Machine Learning,” Machine Learning with Applications, Vol. 8, Article 100280. [DOI: 10.1016/j.mlwa.2022.100280; open access]