Procedurally generating an interesting video game environment isn’t just challenging — it’s incredibly time-consuming. Tools like Promethean AI, which tap machine learning to generate scenes, promise to ease the design burden somewhat. But barriers remain.
That’s why researchers at Facebook, the University of Lorraine, and University College London investigated an AI approach to creating game worlds in a preprint research paper. Using content from LIGHT, a fantasy text-based multiplayer adventure, they designed models that could compositionally arrange locations and characters and generate new content on the fly.
“We show how [machine learning] algorithms can learn to assemble … different elements, arranging locations and populating them with characters and objects,” wrote the study’s coauthors. “[Furthermore, we] demonstrate that these … tools can aid humans interactively in designing new game environments.”
By way of refresher, LIGHT — which was proposed in a March paper published by the same team of scientists — is a research environment in the form of a text-based game within which AI and humans interact as player characters. All told, it comprises crowdsourced natural language descriptions of 663 locations based on a set of regions and biomes, along with 3,462 objects and 1,755 characters.
In this latest study, the team built a model to generate game worlds, whose locations each have a name and a description that includes background information. They trained it on examples of neighboring locations, partitioned into test and validation sets such that the locations in each set were distinct. The team considered two ranking models: one with access only to the location name, and a second with access to the location’s description information. Both were designed so that, when a new world was constructed at test time, the location placed was the highest-scoring of several candidates.
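The learned ranking models aren’t detailed here, so the minimal sketch below substitutes a hypothetical word-overlap (Jaccard) score for them. It illustrates only the selection rule: score each candidate against an existing location, using names alone or names plus descriptions, and place the highest scorer. The `Location` class, `score`, and `best_neighbor` are illustrative assumptions, not the authors’ code.

```python
from dataclasses import dataclass


@dataclass
class Location:
    name: str
    description: str = ""


def tokens(text: str) -> set:
    # Lowercased word set: a crude, hypothetical stand-in for learned text encodings.
    return set(text.lower().split())


def score(anchor: Location, candidate: Location, use_description: bool) -> float:
    """Hypothetical compatibility score: Jaccard overlap of the two locations' words."""
    a, c = tokens(anchor.name), tokens(candidate.name)
    if use_description:
        # The second model variant also sees the description text.
        a |= tokens(anchor.description)
        c |= tokens(candidate.description)
    union = a | c
    return len(a & c) / len(union) if union else 0.0


def best_neighbor(anchor: Location, candidates: list, use_description: bool = False) -> Location:
    # Place whichever candidate scores highest against the existing location.
    return max(candidates, key=lambda cand: score(anchor, cand, use_description))


gate = Location("castle gate", "a stone gate guarded by soldiers")
candidates = [
    Location("fishing dock", "wooden planks over the river"),
    Location("castle courtyard", "an open yard inside the castle walls"),
]
print(best_neighbor(gate, candidates).name)  # castle courtyard
print(best_neighbor(gate, candidates, use_description=True).name)  # castle courtyard
```

Swapping in a trained scorer would leave `best_neighbor` unchanged; only `score` encodes what the model has learned.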
To create a map for a new game, the models predicted the neighboring locations of each existing location, and for each location added, they filled in the surroundings. A location could connect to up to four neighboring locations (though not all connections needed to be filled), and locations couldn’t appear multiple times in one map.
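A grid is one natural way to enforce the two constraints in this paragraph. In the sketch below, the `GameMap` class and its method names are hypothetical: each position gets at most four neighbors (north, south, east, and west), neighboring slots may stay empty, and a location name already on the map is rejected.

```python
# One offset per compass direction, so a position has at most four neighbors.
OFFSETS = ((0, -1), (0, 1), (1, 0), (-1, 0))


class GameMap:
    """Hypothetical grid representation of a generated map."""

    def __init__(self, width: int, height: int):
        self.width = width
        self.height = height
        self.cells = {}  # (x, y) -> location name

    def neighbors(self, pos):
        # Adjacent positions, clipped at the edge of the grid.
        x, y = pos
        return [
            (x + dx, y + dy)
            for dx, dy in OFFSETS
            if 0 <= x + dx < self.width and 0 <= y + dy < self.height
        ]

    def open_slots(self, pos):
        # Neighbors that could still receive a location; leaving them empty is allowed.
        return [p for p in self.neighbors(pos) if p not in self.cells]

    def place(self, pos, name: str):
        if pos in self.cells:
            raise ValueError(f"position {pos} is already filled")
        if name in self.cells.values():
            # A location may appear only once per map.
            raise ValueError(f"{name!r} is already on the map")
        self.cells[pos] = name


m = GameMap(3, 3)
m.place((1, 1), "tavern")
m.place((1, 0), "stable")
print(len(m.open_slots((1, 1))))  # 3
```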
A separate set of models produced objects, or items with which characters could interact. (Each object has a name, description, and a set of affordances that represent object properties, such as “gettable” and “drinkable.”) Using characters and objects associated with locations from LIGHT, the researchers created data sets to train algorithms that placed both objects and characters in locations, as well as objects within objects (e.g., coins inside a wallet).
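The elements described here map naturally onto nested records. The sketch below uses hypothetical class and field names rather than LIGHT’s actual schema: each object carries a name, a description, and an affordance set, and may contain other objects (so a wallet can hold coins). A small traversal then finds everything in a location, including nested items.

```python
from dataclasses import dataclass, field


# Hypothetical schema; class and field names are illustrative only.
@dataclass
class GameObject:
    name: str
    description: str
    affordances: set = field(default_factory=set)  # e.g. {"gettable", "drinkable"}
    contents: list = field(default_factory=list)  # objects placed inside this one


@dataclass
class Character:
    name: str
    persona: str


@dataclass
class Location:
    name: str
    characters: list = field(default_factory=list)
    objects: list = field(default_factory=list)


def all_objects(loc: Location) -> list:
    # Depth-first walk so items inside containers (coins in a wallet) are included.
    stack, found = list(loc.objects), []
    while stack:
        obj = stack.pop()
        found.append(obj)
        stack.extend(obj.contents)
    return found


def with_affordance(loc: Location, affordance: str) -> list:
    # Names of every object in the location, nested or not, that supports an action.
    return sorted(o.name for o in all_objects(loc) if affordance in o.affordances)


coins = GameObject("coins", "a few copper coins", {"gettable"})
wallet = GameObject("wallet", "a worn leather wallet", {"gettable"}, [coins])
table = GameObject("table", "a heavy oak table")
tavern = Location("tavern", [Character("barkeep", "I pour the ale.")], [table, wallet])
print(with_affordance(tavern, "gettable"))  # ['coins', 'wallet']
```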
Yet another family of models, trained on the corpora from the world construction task, created new game elements (a location, character, or object) using a Transformer architecture pretrained on 2 billion Reddit comments, which were chosen because of their “closeness to natural human conversation” and because they exhibit “elements of creativity and storytelling.” Given a location name, these models predicted a background and description; given a character name, a persona and description; and given an object name, a description and affordances.
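The three generation tasks share one pattern: condition on an element’s name and produce its remaining fields. The sketch below shows only how crowdsourced records might be turned into (source, target) text pairs for a sequence-to-sequence model. The tag format and `make_example` are hypothetical, and the Reddit-pretrained Transformer itself is not reproduced.

```python
# Fields the generator writes for each element type, given only the element's name.
TARGET_FIELDS = {
    "location": ("background", "description"),
    "character": ("persona", "description"),
    "object": ("description", "affordances"),
}


def make_example(kind: str, record: dict) -> tuple:
    """Turn one record into a (source, target) pair using a hypothetical tag format."""
    if kind not in TARGET_FIELDS:
        raise ValueError(f"unknown element type: {kind!r}")
    source = f"<{kind}> {record['name']}"
    parts = []
    for key in TARGET_FIELDS[kind]:
        value = record[key]
        if isinstance(value, (set, list, tuple)):
            # Affordances are a set; serialise them in a stable order.
            value = ", ".join(sorted(value))
        parts.append(f"<{key}> {value}")
    return source, " ".join(parts)


src, tgt = make_example(
    "object",
    {"name": "wine bottle", "description": "a dusty green bottle",
     "affordances": {"gettable", "drinkable"}},
)
print(src)  # <object> wine bottle
print(tgt)  # <description> a dusty green bottle <affordances> drinkable, gettable
```

At generation time, a model trained on such pairs would receive only the source string (for instance, a designer-supplied name) and write the target fields.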
So how did it all work in concert? First, an empty map grid was initialized, its size setting the number of possible locations, and a portion of grid positions was marked inaccessible to make exploration more interesting. The central position was filled with a randomly chosen location, and the best-performing model iteratively filled in neighboring positions until the grid was populated. Then, for each placed location, one model predicted which characters and objects should populate it, and another predicted whether objects should be placed inside existing objects.
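The pipeline in this paragraph can be sketched end to end. In the version below, `score_pair` and `pick_contents` are hypothetical stand-ins for the learned placement models, the 20% blocked fraction and breadth-first expansion order are assumptions, and the final step that nests objects inside other objects is omitted for brevity.

```python
import random
from collections import deque


def build_world(size, locations, score_pair, pick_contents, blocked_frac=0.2, seed=0):
    """Fill a size x size grid outward from its centre.

    score_pair(existing, candidate) -> float and pick_contents(location) -> list
    are hypothetical stand-ins for the learned models.
    """
    rng = random.Random(seed)
    centre = (size // 2, size // 2)
    others = [(x, y) for x in range(size) for y in range(size) if (x, y) != centre]
    # Mark a portion of positions inaccessible to make exploration more interesting.
    blocked = set(rng.sample(others, int(blocked_frac * len(others))))

    # Seed the centre with a random location, then expand breadth-first.
    grid = {centre: rng.choice(locations)}
    unused = [loc for loc in locations if loc != grid[centre]]
    queue = deque([centre])
    while queue and unused:
        x, y = queue.popleft()
        for pos in ((x, y - 1), (x, y + 1), (x + 1, y), (x - 1, y)):
            in_bounds = 0 <= pos[0] < size and 0 <= pos[1] < size
            if not in_bounds or pos in blocked or pos in grid or not unused:
                continue
            # Place the unused candidate that best fits the location being expanded.
            best = max(unused, key=lambda cand: score_pair(grid[(x, y)], cand))
            grid[pos] = best
            unused.remove(best)  # each location appears at most once
            queue.append(pos)

    # Second pass: decide which characters and objects populate each placed location.
    contents = {pos: pick_contents(loc) for pos, loc in grid.items()}
    return grid, blocked, contents
```

Breadth-first order is one simple way to keep every new location adjacent to one already placed; if blocked cells wall off part of the grid, those unreachable positions simply stay empty in this sketch.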
The researchers also propose a human-aided design paradigm in which the models suggest which elements to place. If human designers enter names of game elements not present in the data set, the generative models write descriptions, personas, and affordances for them.
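The two interaction modes in this paragraph, suggesting known elements and generating details for new ones, might be wired together as below. The scoring callback, the `catalog` dictionary, and the `generate` function are hypothetical placeholders for the trained ranking and generative models.

```python
from typing import Callable, Dict, List

Details = Dict[str, str]


def suggest(location: str, catalog: Dict[str, Details],
            score: Callable[[str, str], float], k: int = 3) -> List[str]:
    # Rank every known element for this location and offer the top k to the designer.
    return sorted(catalog, key=lambda name: score(location, name), reverse=True)[:k]


def add_element(name: str, catalog: Dict[str, Details],
                generate: Callable[[str], Details]) -> Details:
    """Look up a designer-entered element, generating its details if the name is new."""
    if name in catalog:
        return catalog[name]
    details = generate(name)  # hypothetical call into a generative model
    catalog[name] = details
    return details


# Hypothetical affinities standing in for a learned placement score.
affinity = {("forge", "anvil"): 0.9, ("forge", "sword"): 0.7, ("forge", "fishing net"): 0.1}
catalog = {
    "sword": {"description": "a sharp blade"},
    "fishing net": {"description": "a knotted net"},
    "anvil": {"description": "a heavy iron block"},
}
print(suggest("forge", catalog, lambda loc, n: affinity.get((loc, n), 0.0), k=2))
# ['anvil', 'sword']
```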
In experiments, the team used their framework to generate 5,000 worlds, each with a maximum of 50 arranged locations. Across all 5,000 maps, around 65% of the characters and 60% of the objects in the data set appeared. The most commonly placed location was “the king’s quarters” (in 34% of the generated worlds), while the least commonly placed was “brim canal.” In all, 80% of the worlds had more than 30 locations.
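Statistics of this kind are straightforward to compute from a batch of generated worlds. The sketch below assumes a hypothetical representation (each world as a dict of location, character, and object name lists) and is exercised only on made-up toy worlds, not the paper’s data.

```python
from collections import Counter


def world_stats(worlds, all_characters, all_objects, min_size=30):
    """Summarise a non-empty list of generated worlds.

    Each world is assumed to be a dict with "locations", "characters",
    and "objects" lists of names (a hypothetical format).
    """
    used_chars, used_objs = set(), set()
    loc_counts = Counter()
    for world in worlds:
        used_chars.update(world["characters"])
        used_objs.update(world["objects"])
        loc_counts.update(set(world["locations"]))  # count a location once per world
    n = len(worlds)
    names = sorted(loc_counts)  # alphabetical order breaks ties deterministically
    most = max(names, key=loc_counts.get)
    least = min(names, key=loc_counts.get)
    return {
        "character_coverage": len(used_chars & set(all_characters)) / len(all_characters),
        "object_coverage": len(used_objs & set(all_objects)) / len(all_objects),
        "most_common_location": most,
        "most_common_share": loc_counts[most] / n,
        "least_common_location": least,
        "share_above_min_size": sum(len(w["locations"]) > min_size for w in worlds) / n,
    }
```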
Although the models didn’t tap the full range of entities available to them, the researchers say that the maps they produced were generally cohesive, interesting, and diverse. “These steps show a path to creating cohesive game worlds from crowd-sourced content, both with model-assisted human creation tooling and fully automated generation,” they wrote.