In a paper published on the preprint server Arxiv.org, scientists at the King’s College London Department of Informatics used natural language processing to show evidence of pervasive gender and religious bias in Reddit communities. This alone isn’t surprising, but the problem is that data from these communities are often used to train large language models like OpenAI’s GPT-3. That in turn is important because, as OpenAI itself notes, this sort of bias leads to placing words like “naughty” or “sucked” near female pronouns and “Islam” near words like “terrorism.”
The scientists’ approach uses representations of words called embeddings to discover and categorize language biases, which could enable data scientists to trace the severity of bias in different communities and take steps to counteract it. Given a language model and two sets of words representing the concepts to compare, the method identifies the words most biased toward each concept in a given community, spotlighting examples of potentially offensive content in Reddit subcommunities. It also uses an equation to rank the words from least to most biased, providing an ordered list and an overall view of the bias distribution in that community.
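The article does not reproduce the ranking equation, so the following is only a minimal sketch of one plausible way to score words under stated assumptions: each concept is represented by the average (centroid) of its word vectors, and a word's bias score is its cosine similarity to the first centroid minus its similarity to the second. The function names, the toy two-dimensional vectors, and the placeholder words are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_by_bias(embeddings, concept_a, concept_b):
    """Rank words from least to most biased toward concept_a.

    ASSUMED scoring: cos(word, centroid_a) - cos(word, centroid_b).
    Positive scores lean toward concept_a, negative toward concept_b.
    """
    center_a = centroid([embeddings[w] for w in concept_a])
    center_b = centroid([embeddings[w] for w in concept_b])
    excluded = set(concept_a) | set(concept_b)
    scores = {
        word: cosine(vec, center_a) - cosine(vec, center_b)
        for word, vec in embeddings.items()
        if word not in excluded
    }
    # Ascending sort: lowest score first, highest (most biased toward A) last.
    return sorted(scores.items(), key=lambda item: item[1])


# Toy, hand-made 2-D vectors (illustrative only, not real embeddings).
toy_embeddings = {
    "she": [1.0, 0.0],
    "he": [0.0, 1.0],
    "word_a": [0.9, 0.1],
    "word_b": [0.1, 0.9],
    "word_c": [0.5, 0.5],
}

ranking = rank_by_bias(toy_embeddings, ["she"], ["he"])
for word, score in ranking:
    print(f"{word}: {score:+.3f}")
```

In this toy setup, a word whose vector sits equally close to both concept centroids scores zero, while words closer to one centroid land at the ends of the ordered list. A real analysis would use embeddings trained on the community's text rather than hand-written vectors.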
Reddit has long been a popular source for machine learning model training data, but it’s an open secret that some groups within the network are unfixably toxic. In June, Reddit banned roughly 2,000 communities for consistently breaking its rules by allowing people to harass others with hate speech. But in accordance with the site’s policies on free speech, Reddit’s admins maintain they don’t ban communities solely for featuring controversial content, such as those advocating white supremacy, mocking perceived liberal bias, and promoting demeaning views on transgender women, sex workers, and feminists.
To further specify the biases they encountered, the researchers took the negativity and positivity (also called “sentiment polarity”) of biased words into account. And to facilitate analyses of biases, they combined semantically related terms under broad rubrics like “Relationship: Intimate/sexual” and “Power, organizing” that they modeled on the UCREL Semantic Analysis System (USAS) framework for automatic semantic and text tagging. (USAS has a multi-tier structure, with 21 major discourse fields subdivided into fine-grained categories like “People,” “Relationships,” or “Power.”)
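As a rough illustration of that aggregation step, the sketch below groups biased words under semantic labels and averages their sentiment polarity per label. It assumes the labels and polarity scores are already available as plain dictionaries; in the study they would come from USAS-style tagging and a sentiment analysis step, neither of which is reproduced here. The words, labels, and scores are hypothetical placeholders.

```python
from collections import defaultdict


def summarize_clusters(biased_words, label_of, polarity_of):
    """Group words by semantic label and average their polarity.

    label_of and polarity_of are ASSUMED lookup tables, standing in for
    a semantic tagger and a sentiment analyzer respectively.
    """
    grouped = defaultdict(list)
    for word in biased_words:
        grouped[label_of[word]].append(polarity_of[word])
    return {label: sum(vals) / len(vals) for label, vals in grouped.items()}


# Hypothetical inputs (illustrative values, not results from the paper).
words = ["word_a", "word_b", "word_c"]
label_of = {"word_a": "Label X", "word_b": "Label X", "word_c": "Label Y"}
polarity_of = {"word_a": -0.5, "word_b": -0.25, "word_c": 0.5}

summary = summarize_clusters(words, label_of, polarity_of)
for label, mean_polarity in summary.items():
    print(f"{label}: mean polarity {mean_polarity:+.3f}")
```

A negative average for a label suggests that the biased words grouped under it skew negative in sentiment, which is the kind of per-cluster reading the researchers report for each community.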
One of the communities the researchers examined, /r/TheRedPill (ostensibly a forum for the “discussion of sexual strategy in a culture increasingly lacking a positive identity for men”), had 45 clusters of biased words. (/r/TheRedPill is currently “quarantined” by Reddit’s admins, meaning users have to bypass a warning prompt to visit or join.) Sentiment scores indicated that the most biased clusters toward women (“Anatomy and Physiology,” “Intimate sexual relationships,” and “Judgement of appearance”) carried negative sentiments, whereas most of the clusters related to men contained neutral or positively connotated words. Perhaps unsurprisingly, labels such as “Egoism” and “Toughness; strong/weak” weren’t even present among the female-biased labels.
Another community, /r/Dating_Advice, exhibited negative bias toward men, according to the researchers. Biased clusters included the words “poor,” “irresponsible,” “erratic,” “unreliable,” “impulsive,” “pathetic,” and “stupid,” with words like “abusive” and “egotistical” among the most negative in terms of sentiment. Moreover, the category “Judgement of appearance” was more frequently biased toward men than women, and physical stereotyping of women was “significantly” less prevalent than in /r/TheRedPill.
The researchers chose the community /r/Atheism, which calls itself “the web’s largest atheism forum,” to evaluate religious biases. They note that all of the biased labels toward Islam had an average negative polarity, except for geographical names. Categories such as “Crime, law and order,” “Judgement of appearance,” and “Warfare, defense, and the army” aggregated words with evidently negative connotations like “uncivilized,” “misogynistic,” “terroristic,” “antisemitic,” “oppressive,” “offensive,” and “totalitarian.” By contrast, none of those labels were relevant in Christianity-biased clusters, and most of the words in Christianity-biased clusters (e.g., “Unitarian,” “Presbyterian,” “Episcopalian,” “unbaptized,” “eternal”) didn’t carry negative connotations.
The coauthors assert their approach could be applied by legislators, moderators, and data scientists to trace the severity of bias in different communities and to take steps to actively counteract this bias. “We view the main contribution of our work as introducing a modular, extensible approach for exploring language biases through the lens of word embeddings,” they wrote. “Being able to do so without having to construct a-priori definitions of these biases renders this process more applicable to the dynamic and unpredictable discourses that are proliferating online.”
There’s a real and present need for tools like these in AI research. Emily Bender, a professor in the University of Washington’s NLP group, recently told VentureBeat that even carefully crafted language data sets can carry forms of bias. A study published last August by researchers at the University of Washington found evidence of racial bias in hate speech detection algorithms developed by Google parent company Alphabet’s Jigsaw. And Facebook AI head Jerome Pesenti found that an AI created to generate humanlike tweets produced a rash of negative statements targeting Black people, Jewish people, and women.
“Algorithms are like convex mirrors that refract human biases, but do it in a pretty blunt way. They don’t permit polite fictions like those that we often sustain our society with,” Kathryn Hume, Borealis AI’s director of product, said at the Movethedial Global Summit in November. “These systems don’t permit polite fictions. … They’re actually a mirror that can enable us to directly observe what might be wrong in society so that we can fix it. But we need to be careful, because if we don’t design these systems well, all that they’re going to do is encode what’s in the data and potentially amplify the prejudices that exist in society today.”