Last December, Pinterest announced the launch of Pinterest Trends, a feature that reveals the past year’s most popular search keywords. Much like Google Trends and Bing’s Keyword Research Tool, Trends spotlights terms that peaked over the past 12 months, using algorithmic data to sort by volume.
Trends became available globally this week in beta, and in the spirit of transparency, Pinterest detailed how the taxonomic system underpinning Trends canvasses the more than 200 billion ideas across 4 billion boards created by the social network’s more than 320 million users. “Because people come to Pinterest to plan, we have unique insight into emerging trends,” wrote Song Cui and Dhananjay Shrouty, software engineers on the Content Knowledge team. “We’re able to gather these insights because Pinterest is fundamentally a different kind of platform where … people from around the world come to save ideas and plan.”
Pinterest taps a taxonomic knowledge management system that enables content-level understanding, according to Cui and Shrouty. It classifies each entity and defines the relationships among them, with the goal of improving the accuracy of AI models on the platform involved in search and classification tasks.
The taxonomy, which supports 17 languages across 20 countries with more to come, organizes popular topics throughout the platform and curates interests and nodes (Pins) for ads and ongoing campaigns. Interests are grouped together in a hierarchical parent-child tree structure, where each child is a subclass of its single parent, and the top-level taxonomy nodes define broad verticals (e.g., “Women’s Fashion” and “DIY and Crafts”) that capture the general interests associated with Pins. (Child nodes, up to 11 levels deep, capture more granular topics.)
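The single-parent tree described above can be sketched in a few lines of Python. This is a minimal illustration, not Pinterest's implementation: the `Taxonomy` class, its method names, and the sample node names other than “Women’s Fashion” are hypothetical, and the sketch assumes that “up to 11 levels” caps the total depth of the tree.

```python
from typing import Dict, List, Optional

MAX_LEVELS = 11  # assumption: "up to 11 levels" read as a cap on total depth


class Taxonomy:
    """Tree of interest nodes in which every node has at most one parent."""

    def __init__(self) -> None:
        # Maps each node name to its parent name (None for top-level verticals).
        self._parent: Dict[str, Optional[str]] = {}

    def add(self, name: str, parent: Optional[str] = None) -> None:
        if name in self._parent:
            raise ValueError(f"duplicate node: {name}")
        if parent is not None and parent not in self._parent:
            raise KeyError(f"unknown parent: {parent}")
        level = 1 if parent is None else self.level(parent) + 1
        if level > MAX_LEVELS:
            raise ValueError(f"{name} would exceed {MAX_LEVELS} levels")
        self._parent[name] = parent

    def path(self, name: str) -> List[str]:
        """Return the chain of names from the top-level vertical down to `name`."""
        chain: List[str] = []
        current: Optional[str] = name
        while current is not None:
            chain.append(current)
            current = self._parent[current]
        return chain[::-1]

    def level(self, name: str) -> int:
        """Top-level verticals sit at level 1."""
        return len(self.path(name))

    def children(self, name: str) -> List[str]:
        return [node for node, parent in self._parent.items() if parent == name]
```

Calling `add("Shoes", "Women's Fashion")` after `add("Women's Fashion")` attaches a more granular topic to a broad vertical, and `path("Shoes")` then walks back up to that vertical.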
“Pinterest taxonomy aims to capture the most important and timely topics from Pinterest content,” explained Cui and Shrouty. “Active topics used in various products such as topic feed and shopping are all covered by our taxonomy … These terms are mined from popular annotations used in Pins, board names, and top search queries.”
In this respect, the system builds on Pinterest’s existing work with PinSage, a graph convolutional network containing over 3 billion nodes and 18 billion edges that can learn about things like nearby Pins in web-scale graphs. Pinterest began to use PinSage for ad recommendations in February 2018 and more broadly for things like shopping recommendations in June of that year. At the time, the company claimed PinSage spurred a 25% increase in impressions for Shop the Look (a feature that lets Pinterest users buy clothes seen in Pins) and a 46% performance gain over traditional random graph sampling methods.
A taxonomy wouldn’t be of much use if there weren’t a mechanism for mapping Pins to it. That’s why the Content Engineering team built Pin2Interest (P2I), a content-classifying system that ingests embeddings, text and visual inputs, and board names to create personalized recommendations and ranking features for other AI models. It’s currently being used in production to rank Pins on users’ home feeds and for advertisement targeting.
P2I taps natural language processing techniques like lexical expansion (the creation of new lexical units and patterns) and embedding similarities to map the inputs of images to a list of nodes as prediction candidates. Then it employs a search relevance model to predict and rank the matching score between the aforementioned images and nodes. Pinterest says that more than 99% of images can be mapped to at least one node.
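The two stages described here, candidate generation followed by ranking, can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the node embeddings, the `EXPANSIONS` table, and the function names are hypothetical, and the simple additive score merely stands in for the search relevance model, whose details the article does not give.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: taxonomy node -> embedding vector.
NODE_EMBEDDINGS: Dict[str, List[float]] = {
    "Ankle Boots": [0.9, 0.1, 0.0],
    "Sneakers": [0.7, 0.3, 0.1],
    "Knitting": [0.0, 0.2, 0.9],
}
# Hypothetical lexical expansion table: term -> related phrases.
EXPANSIONS: Dict[str, List[str]] = {"booties": ["ankle boots"]}


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexical_matches(text: str) -> Set[str]:
    """Nodes whose name appears in the text or in one of its expansions."""
    text = text.lower()
    phrases = {text}
    for term, alternatives in EXPANSIONS.items():
        if term in text:
            phrases.update(alternatives)
    return {n for n in NODE_EMBEDDINGS if any(n.lower() in p for p in phrases)}


def candidates(text: str, embedding: List[float], top_k: int = 2) -> Set[str]:
    """Stage 1: union of lexical matches and the top_k most similar nodes."""
    by_similarity = sorted(
        NODE_EMBEDDINGS,
        key=lambda n: cosine(embedding, NODE_EMBEDDINGS[n]),
        reverse=True,
    )
    return lexical_matches(text) | set(by_similarity[:top_k])


def rank(text: str, embedding: List[float], top_k: int = 2) -> List[Tuple[str, float]]:
    """Stage 2: score each candidate; a stand-in for the relevance model."""
    lexical = lexical_matches(text)
    scored = [
        (n, cosine(embedding, NODE_EMBEDDINGS[n]) + (0.5 if n in lexical else 0.0))
        for n in candidates(text, embedding, top_k)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For a Pin described as “suede booties for fall”, the expansion table lets “Ankle Boots” match lexically even though the phrase never appears in the text, which is the kind of gap lexical expansion is meant to close.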
Cui and Shrouty note that the taxonomy hierarchy information is also used as P2I ranking information. Paired with the taxonomy, P2I allows for monitoring the number of images per node and, by extension, trending topics across all of Pinterest. “The granularity and quality of the taxonomy is critical for the P2I accuracy,” they wrote. “If the content of the image belongs to a very particular topic and the taxonomy does not have a similar node to cover this topic, P2I will map this image to a node with a different context and prediction accuracy drops.”
Mapping users and queries
The taxonomy’s usefulness extends beyond trending topic tracking. In point of fact, a system dubbed User2Interest (U2I) uses it to map users to their interests. Pins with which people engage and those Pins’ corresponding interest labels, which are generated by P2I, serve as signals that inform U2I’s predictions in ads targeting, organic recommendations, and user-centric insights on the taxonomy. For instance, it can compute statistics like the number of users per taxonomy node to inform advertisers of shifts in overall interest.
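As a rough illustration of the aggregation described above, the following Python sketch rolls Pin-level interest labels up to users and then counts users per taxonomy node. The function names, the input shapes, and the "most frequent labels win" rule are assumptions for the example; the article does not say how U2I actually weighs its signals.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def user_interests(
    engagements: Iterable[Tuple[str, str]],
    pin_labels: Dict[str, List[str]],
    top_n: int = 3,
) -> Dict[str, List[str]]:
    """Map each user to the interest nodes seen most often on engaged Pins.

    engagements: (user_id, pin_id) pairs.
    pin_labels: pin_id -> interest nodes (the kind of labels P2I produces).
    """
    per_user: Dict[str, Counter] = defaultdict(Counter)
    for user_id, pin_id in engagements:
        per_user[user_id].update(pin_labels.get(pin_id, []))
    return {
        user_id: [node for node, _ in counts.most_common(top_n)]
        for user_id, counts in per_user.items()
    }


def users_per_node(interests: Dict[str, List[str]]) -> Counter:
    """Count how many users are mapped to each taxonomy node."""
    counts: Counter = Counter()
    for nodes in interests.values():
        counts.update(set(nodes))  # each user counts once per node
    return counts
```

Comparing `users_per_node` output across two time windows would give the kind of shift-in-interest statistic the article mentions for advertisers.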
Another system, Query2Interest (Q2I), is responsible for mapping short text queries to taxonomy nodes. Its signal is PinText, a multitask text embedding model that measures the similarity between the short text and taxonomy nodes, mapping queries with similar categories and meanings to the same nodes. Q2I is in production across various ads and organic surfaces, Pinterest says, chiefly to glean a better understanding of users’ intents.
Creating and maintaining the taxonomy
Clearly, the interest taxonomy plays a vital role in matching users with content they’re likely to enjoy. But how is it curated? According to Cui and Shrouty, it’s a multi-step process involving the Resource Description Framework (RDF), the open source ontology development environment WebProtégé, and an engineering workflow that facilitates updates.
RDF is used to create graphs (which comprise nodes and the edges that connect them) while WebProtégé creates visualizations, both of which aid the team of humans tasked with vetting the taxonomy. As for the engineering workflow, Pinterest scientists take the RDF graphs in XML format and produce relational database tables for downstream usage.
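To make the RDF-to-table step concrete, here is a minimal sketch using only the Python standard library. It assumes the taxonomy is serialized as RDF/XML with `owl:Class` elements, `rdfs:label` names, and `rdfs:subClassOf` links to the parent; the `example.com` URIs, the `taxonomy_node` table, and the function names are hypothetical, and Pinterest's actual schema is not described in the article.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical two-node taxonomy in RDF/XML.
SAMPLE_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.com/taxonomy#WomensFashion">
    <rdfs:label>Women's Fashion</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://example.com/taxonomy#Shoes">
    <rdfs:label>Shoes</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://example.com/taxonomy#WomensFashion"/>
  </owl:Class>
</rdf:RDF>
"""

Row = Tuple[str, str, Optional[str]]


def rdf_to_rows(xml_text: str) -> List[Row]:
    """Turn each owl:Class into a (uri, label, parent_uri) row."""
    root = ET.fromstring(xml_text.strip())
    rows: List[Row] = []
    for cls in root.iter(f"{{{OWL}}}Class"):
        uri = cls.get(f"{{{RDF}}}about")
        label_el = cls.find(f"{{{RDFS}}}label")
        parent_el = cls.find(f"{{{RDFS}}}subClassOf")
        label = label_el.text if label_el is not None else ""
        parent = parent_el.get(f"{{{RDF}}}resource") if parent_el is not None else None
        rows.append((uri, label, parent))
    return rows


def load_rows(conn: sqlite3.Connection, rows: List[Row]) -> None:
    """Write the rows into a relational table for downstream queries."""
    conn.execute(
        "CREATE TABLE taxonomy_node (uri TEXT PRIMARY KEY, label TEXT, parent_uri TEXT)"
    )
    conn.executemany("INSERT INTO taxonomy_node VALUES (?, ?, ?)", rows)
    conn.commit()
```

Once loaded, a plain SQL query such as `SELECT label FROM taxonomy_node WHERE parent_uri = ?` returns the children of a node without any RDF tooling, which is presumably the appeal of the relational form for downstream consumers.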
For every iteration of the taxonomy, Cui, Shrouty, and team extend the taxonomy produced by the previous iteration. When new versions are created, operations like adding a new node, renaming an existing node, deleting a node, and merging two or more nodes are performed with heuristic rules.
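The four operations named above can be sketched against a simple parent map. The article does not spell out the heuristic rules, so the behavior here is assumed for illustration: deleting a node re-attaches its children to the deleted node's parent, and merging moves the children of each merged node under the surviving one. All function names are hypothetical.

```python
from typing import Dict, Iterable, Optional

ParentMap = Dict[str, Optional[str]]  # node -> parent (None for top-level nodes)


def add_node(parents: ParentMap, name: str, parent: Optional[str] = None) -> None:
    if name in parents:
        raise ValueError(f"duplicate node: {name}")
    if parent is not None and parent not in parents:
        raise KeyError(f"unknown parent: {parent}")
    parents[name] = parent


def rename_node(parents: ParentMap, old: str, new: str) -> None:
    if new in parents:
        raise ValueError(f"duplicate node: {new}")
    parents[new] = parents.pop(old)
    for child, parent in list(parents.items()):
        if parent == old:
            parents[child] = new


def delete_node(parents: ParentMap, name: str) -> None:
    # Assumed rule: children move up to the deleted node's parent.
    grandparent = parents.pop(name)
    for child, parent in list(parents.items()):
        if parent == name:
            parents[child] = grandparent


def merge_nodes(parents: ParentMap, sources: Iterable[str], target: str) -> None:
    # Assumed rule: children of each source node are re-attached to the target.
    for source in sources:
        if source == target:
            continue
        for child, parent in list(parents.items()):
            if parent == source:
                # Avoid a self-loop if the target itself sat under the source.
                parents[child] = parents[source] if child == target else target
        del parents[source]
```

Keeping each operation as a small function over one shared map makes it straightforward to replay a list of edits against the previous iteration's taxonomy to produce the next version.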
Adding to the taxonomy
Before a new topic is added to the taxonomy, the Content Engineering team first sends out candidate terms to its content, legal, and other divisions for review. Then an AI system called Neural Taxonomy Expansion (NTE), which is used in production for taxonomy expansion projects within Pinterest, predicts how likely each existing node is to be the parent of a candidate term. The predicted parents are reviewed manually to ensure the taxonomy is of high quality, after which taxonomists add the new nodes to the current taxonomy in WebProtégé.
In future work, Cui, Shrouty, and colleagues intend to work toward automatically building new types of relationships among entities in the taxonomy and associating attributes with them. “Moving forward, we’re excited to keep evolving how we capture and understand trends in a more timely and systematic manner,” they wrote.
Pinterest employs machine learning across its business — not strictly for taxonomic purposes. Last October, the company revealed it leveraged AI that identifies and hides content displaying, rationalizing, or encouraging self-injury to achieve an 88% reduction in reports of such content. Lens, Pinterest’s AI online/offline visual search tool that identifies things captured from Pins or by a smartphone and suggests related themes and products, can now recognize 2.5 billion home and fashion objects. And as early as 2015, Pinterest began using AI to surface Related Pins, or Pins tangentially relevant to those visually above them on the web and mobile.