Develop personalized apps using a combination of Reinforcement Learning and NLP/Chatbots

**Abstract.** We present a Reinforcement Learning (RL) based approach to implement Recommender Systems. The results are based on a real-life Wellness app that is able to provide personalized health / activity related content to users in an interactive fashion. Unfortunately, current recommender systems are unable to adapt to continuously evolving features, e.g. user sentiment, and to scenarios where the RL reward needs to be computed based on multiple and unreliable feedback channels (e.g., sensors, wearables). To overcome this, we propose three constructs: (i) weighted feedback channels, (ii) delayed rewards, and (iii) reward boosting, which we believe are essential for RL to be used in Recommender Systems.

This paper appears in the proceedings of AAI4H — Advances in Artificial Intelligence for Healthcare Workshop, co-located with the 24th European Conference on Artificial Intelligence (ECAI 2020), Sep 2020 (paper pdf) (ppt)

1 Introduction

Health / Wellness apps have historically suffered from low adoption rates. Personalized recommendations have the potential to improve adoption by making increasingly relevant and timely recommendations to users. While recommendation engines (and consequently, the apps based on them) have grown in maturity, they still suffer from the ‘cold start’ problem and the fact that they are essentially push-based mechanisms lacking the level of interactivity needed to make such apps appealing to millennials.

We present a Wellness app case study where we applied a combination of Reinforcement Learning (RL) and Natural Language Processing (NLP) / Chatbots to provide a highly personalized and interactive experience to users. We focus on the interactive aspect of the app, where the app is able to profile and converse with users in real time, providing relevant content adapted to the current sentiment and past preferences of the user.

The core of such chatbots is an intent-recognition Natural Language Understanding (NLU) engine, which is trained with hard-coded examples of question variations. When no intent is matched with a confidence level above 30%, the chatbot returns a fallback answer. The user sentiment is computed based on both the (explicit) user response and (implicit) environmental aspects, e.g. location (home, office, market, …), temperature, lighting, time of day, weather, other family members present in the vicinity, and so on, and is used to further adapt the chatbot response.
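A minimal sketch of the confidence-threshold fallback is shown below; a string-similarity scorer stands in for the trained NLU model, and the intents, example utterances, and helper names are illustrative rather than taken from the actual engine.

```python
# Sketch of intent matching with a confidence threshold and a fallback answer.
# The scoring function and intent examples are illustrative stand-ins for the
# trained NLU engine described above, not its actual implementation.
from difflib import SequenceMatcher

INTENT_EXAMPLES = {
    "log_activity": ["I went for a run", "record my workout"],
    "get_tip":      ["give me a health tip", "any wellness advice"],
}
FALLBACK = "Sorry, I didn't get that. Could you rephrase?"
CONFIDENCE_THRESHOLD = 0.30  # intents below 30% confidence trigger the fallback

def score_intent(utterance: str) -> tuple[str, float]:
    """Return the best-matching intent and a [0, 1] confidence score."""
    best_intent, best_score = "", 0.0
    for intent, examples in INTENT_EXAMPLES.items():
        for example in examples:
            score = SequenceMatcher(None, utterance.lower(), example.lower()).ratio()
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent, best_score

def respond(utterance: str) -> str:
    intent, confidence = score_intent(utterance)
    if confidence < CONFIDENCE_THRESHOLD:
        return FALLBACK
    return f"(handling intent '{intent}' with confidence {confidence:.2f})"

print(respond("I just finished a run"))
print(respond("asdf qwerty"))
```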

RL refers to a branch of Artificial Intelligence (AI) that is able to achieve complex goals by maximizing a reward function in real-time. The reward function works similarly to incentivizing a child with candy and spankings: the algorithm is penalized when it takes a wrong decision and rewarded when it takes a right one; this is the reinforcement. The reinforcement aspect also allows the algorithm to adapt faster to real-time changes in user sentiment. For a detailed introduction to RL frameworks, the interested reader is referred to [1].
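As a toy illustration of reward-driven learning (not the algorithm proposed in this paper), the sketch below shows an epsilon-greedy agent that receives +1 or -1 feedback and gradually shifts toward the action with the higher estimated reward; the action names and the simulated feedback are assumptions made for illustration.

```python
# Toy example of reward-driven (reinforcement) learning: the agent is rewarded
# (+1) or penalized (-1) and shifts toward the action that earns more reward.
import random

actions = ["suggest_walk", "suggest_meditation"]
value_estimates = {a: 0.0 for a in actions}  # running estimate of each action's reward
counts = {a: 0 for a in actions}
epsilon = 0.1                                # exploration rate

def user_feedback(action: str) -> int:
    # Stand-in for real user feedback: pretend this user mostly prefers walks.
    return 1 if (action == "suggest_walk" and random.random() < 0.8) else -1

for _ in range(500):
    if random.random() < epsilon:
        action = random.choice(actions)                  # explore
    else:
        action = max(actions, key=value_estimates.get)   # exploit
    reward = user_feedback(action)
    counts[action] += 1
    # incremental mean update of the action-value estimate
    value_estimates[action] += (reward - value_estimates[action]) / counts[action]

print(value_estimates)
```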

Previous works have explored RL in the context of Recommender Systems [2, 3, 4, 5], and enterprise adoption also seems to be gaining momentum with the recent availability of Cloud APIs (e.g. Azure Personalizer [6, 7]) and Google’s RecSim [8]. However, these systems still behave like typical Recommender Systems: given a user profile and categorized recommendations, the system makes a recommendation based on popularity, interests, demographics, frequency and other features. Their main novelty is the ability to identify the features (or combinations of features) of recommendations that earn higher rewards for a specific user, so that future recommendations can be tailored to that user accordingly [9].
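A minimal sketch of this feature-level personalization idea is given below: it tracks the average reward each recommendation feature earns for a given user and ranks new candidates accordingly. The data structures and feature names are illustrative stand-ins, not the implementation of the cited systems.

```python
# Sketch of feature-level personalization: learn which recommendation features
# earn higher rewards for a user, then rank candidates by those learned scores.
from collections import defaultdict

# running reward statistics per (user, feature)
feature_reward = defaultdict(lambda: {"total": 0.0, "count": 0})

def record_reward(user: str, features: list[str], reward: float) -> None:
    for f in features:
        stats = feature_reward[(user, f)]
        stats["total"] += reward
        stats["count"] += 1

def feature_score(user: str, feature: str) -> float:
    stats = feature_reward[(user, feature)]
    return stats["total"] / stats["count"] if stats["count"] else 0.0

def rank(user: str, candidates: dict[str, list[str]]) -> list[str]:
    """Order candidate recommendations by the user's learned feature scores."""
    return sorted(candidates,
                  key=lambda c: sum(feature_score(user, f) for f in candidates[c]),
                  reverse=True)

record_reward("alice", ["yoga", "morning"], +1.0)
record_reward("alice", ["running", "evening"], -0.5)
print(rank("alice", {"sunrise yoga": ["yoga", "morning"],
                     "evening jog": ["running", "evening"]}))
```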

Unfortunately, this is still inefficient for real-life systems that need to adapt to continuously evolving features, e.g. user sentiment, and where the reward needs to be computed based on multiple and unreliable feedback channels (e.g., sensors, wearables).

The rest of the paper is organized as follows: Section 2 outlines the problem scenario and formulates it as an RL problem. In Section 3, we propose three RL constructs needed to overcome the above limitations: (i) weighted feedback channels, (ii) delayed rewards, and (iii) reward boosting, which we believe are essential constructs for RL to be used in Recommender Systems.
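These constructs are detailed in Section 3; as a preview of the first one, the sketch below combines feedback from multiple, possibly unreliable channels into a single reward using per-channel reliability weights. The channel names and weight values are assumptions made for illustration, not taken from the app.

```python
# Illustrative sketch of weighted feedback channels: each channel carries a
# reliability weight, and the reward is the weighted average of whichever
# channels actually reported feedback.
def combined_reward(channel_feedback: dict[str, float],
                    channel_weights: dict[str, float]) -> float:
    """Combine per-channel reward signals in [-1, 1] into one reward."""
    available = [c for c in channel_feedback if c in channel_weights]
    if not available:
        return 0.0  # no trusted feedback, so no reward is applied
    total_weight = sum(channel_weights[c] for c in available)
    return sum(channel_weights[c] * channel_feedback[c] for c in available) / total_weight

# assumed reliability weights for three hypothetical channels
weights = {"explicit_response": 0.6, "wearable": 0.3, "ambient_sensor": 0.1}
print(combined_reward({"explicit_response": 1.0, "wearable": -0.5}, weights))  # 0.5
```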

Our notion of ‘Delayed Rewards’ is different from Delayed RL [10], where rewards in the distant future are considered less valuable than immediate rewards. In our setting, a received reward is only applied after its consistency has been validated by a subsequent user action.
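As a hedged illustration of this validation step, the sketch below buffers a received reward and credits it only once the user's next action has been observed; the specific consistency rule (a positive reward is trusted only if the follow-up content is not dismissed) is an assumption made for exposition, not the paper's actual criterion.

```python
# Sketch of 'Delayed Rewards': hold a reward in a pending buffer and apply it
# only after a subsequent action confirms it is consistent. The consistency
# rule below is assumed for illustration, not taken from the paper.
pending = []          # rewards awaiting validation: (action, reward)
applied_rewards = []  # rewards actually credited to the RL agent

def receive_reward(action: str, reward: float) -> None:
    pending.append((action, reward))

def observe_next_action(next_action: str) -> None:
    """Validate pending rewards against the user's subsequent action."""
    global pending
    for action, reward in pending:
        # assumed rule: a positive reward is trusted only if the user does not
        # immediately dismiss the follow-up content
        if (reward > 0) == (next_action != "dismiss"):
            applied_rewards.append((action, reward))
        # inconsistent rewards are discarded, never applied
    pending = []

receive_reward("suggest_walk", +1.0)
observe_next_action("open_activity")   # consistent, so the reward is applied
print(applied_rewards)
```

Section 4 concludes the paper and provides directions for future research.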

