Facebook piloted a text-based fantasy role-playing game to improve the conversational models powering things like its chatbots and smart speakers. In a preprint paper, researchers at the company describe a game that alternates between collecting data and retraining models on the collected data, with a metric that evaluates and compares models using players’ continuation rates (i.e., how often they choose to keep playing). The coauthors claim that in experiments, they obtained data at one-fifth the price per utterance of crowdsourcing and that their game provided evidence that lifelong dialogue learning is viable.
People learn to use language over the course of their lives from interactions they have with the world and others, yet natural language processing (NLP) research often involves fixed data sets and frozen models. In this paradigm, models are prevented from interacting with humans at training time, a constraint that precludes performance improvements. An alternative is continually retraining the models, but this can be costly; many corpora are collected via crowdsourcing, where researchers pay crowdworkers through platforms like Amazon Mechanical Turk to perform tasks. Because the crowdworkers are motivated by pay rather than interest, budget overruns and poor-quality data can result.
The Facebook researchers’ game, then, aims to iteratively learn from conversations with “intrinsically motivated” players. The core piece involves two “agents” (one human player and one AI) in one of 587 locations with descriptions, where each agent is assigned a character out of a pool of 630 with names and backstories. Agents have to role-play their character’s dialogue in the scenario while an automated dungeon master assesses the quality of the player’s role-playing, rating how likely the player’s dialogue is in the given context on a scale of 1 to 5 stars. These sub-scores are added up, and the total is posted to a leaderboard that ranks the player against all others. Players also earn badges representing characters in the game if they collect a certain number of points for a dialogue.
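The scoring described above can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the `Player` class, the function names, the character names, and the `BADGE_THRESHOLD` value are all hypothetical, and it assumes one star rating per player turn.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical threshold: the article says only "a certain number of points".
BADGE_THRESHOLD = 24


@dataclass
class Player:
    name: str
    total_score: int = 0
    badges: List[str] = field(default_factory=list)


def score_dialogue(star_ratings: List[int]) -> int:
    """Add up the star ratings (each 1 to 5) for one dialogue."""
    for stars in star_ratings:
        if not 1 <= stars <= 5:
            raise ValueError(f"star rating out of range: {stars}")
    return sum(star_ratings)


def record_dialogue(player: Player, star_ratings: List[int], character: str) -> int:
    """Update a player's running total and award a character badge if earned."""
    points = score_dialogue(star_ratings)
    player.total_score += points
    if points >= BADGE_THRESHOLD and character not in player.badges:
        player.badges.append(character)
    return points


def leaderboard(players: List[Player]) -> List[Player]:
    """Rank players by total score, highest first."""
    return sorted(players, key=lambda p: p.total_score, reverse=True)


# Made-up ratings for two made-up players.
alice, bob = Player("alice"), Player("bob")
record_dialogue(alice, [5, 4, 5, 4, 5, 3], character="wizard")   # 26 points
record_dialogue(bob, [1, 2, 3, 2, 1, 2], character="innkeeper")  # 11 points
ranking = [p.name for p in leaderboard([alice, bob])]
```

With these invented ratings, the first player clears the assumed badge threshold and tops the ranking, while the second does neither.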
Dialogues in the game are vetted for offensive and gendered language and consist of six turns per agent, or 12 in total. At the end of each, players are presented with three choices:
- Choose to move to a new location, where they will continue to play this character, but meet a new character to converse with.
- Stay in the same room but wait for a new character to arrive to converse with.
- Change to role-play a completely new pair of characters in a new setting.
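One plausible way to turn these end-of-dialogue choices into the continuation-rate metric is sketched below. The choice labels, the `quit` label for a player who leaves instead of picking an option, the function names, and the sample logs are all assumptions made for illustration; the article does not describe how the researchers log or compute this.

```python
from typing import Dict, List

# Labels for the three end-of-dialogue options; "quit" is an assumed label
# for a player who leaves instead of picking one of them.
CONTINUE_CHOICES = {"new_location", "same_room", "new_characters"}


def continue_rate(choices: List[str]) -> float:
    """Fraction of finished dialogues after which the player kept playing."""
    if not choices:
        raise ValueError("no end-of-dialogue choices logged")
    continued = sum(1 for choice in choices if choice in CONTINUE_CHOICES)
    return continued / len(choices)


def best_model(logs: Dict[str, List[str]]) -> str:
    """Return the model whose players continued most often."""
    return max(logs, key=lambda model: continue_rate(logs[model]))


# Made-up logs for two hypothetical model versions.
logs = {
    "model_a": ["new_location", "quit", "same_room", "new_characters"],
    "model_b": ["quit", "quit", "same_room", "new_location"],
}
rates = {model: continue_rate(choices) for model, choices in logs.items()}
```

Here `rates` maps each hypothetical model to the share of dialogues after which its players continued, and `best_model(logs)` picks the model with the higher share.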
The Facebook researchers ran advertisements to recruit 13,188 users, who played 41,131 rounds of the game altogether, and they evaluated the quality of those players’ exchanges by training models on each individual utterance. The results suggest it was over 8 times cheaper to attain model accuracy of 80.63% with the game compared with crowdsourcing, in part because of the high level of engagement: users chose to continue playing 68% to 75% of the time.
Players generally sought “exciting” conversations involving emotional, action-packed interactions like seeking quests, whereas crowdworkers tended to be more even-keeled and willing to discuss dry topics at length, according to the researchers. Players used more aggressive words during dialogues, like “stab” and “kills,” but also more overtly friendly actions (“smiles,” “hug”), slang (“ur,” “yo,” “dude”), and emojis. It’s these more “natural” exchanges that lead to models more accurately reflecting human interaction, the researchers assert, because even the lowest-quality data provides a useful signal.
“We find this exciting because this approach shows it is possible to build continually improving models that learn from interacting with humans in the wild (as opposed to experiments with paid crowdworkers),” the coauthors wrote. “This represents a paradigm shift away from the limited static dataset setup that is prevalent in much of the work of the community.”
The researchers plan to make the training code, models, and data sets publicly available in the future.
Notably, the work builds on LIGHT, a research environment in the form of a text-based game within which AI and humans interact as player characters. In November, data scientists at Facebook, the University of Lorraine, and University College London investigated an approach to creating game worlds similar to those described in this latest preprint paper. Using content from LIGHT, they designed models that could compositionally arrange locations and characters and generate new content on the fly, showing how machine learning algorithms can learn to creatively assemble different elements.