College of Liberal Arts

Watch Your Mouth: Researching the Effects of Vulgarity in Social Media

Thu, Dec 20, 2018
Eric Holgate, Isabel Cachola and Junyi Jessy Li

As social media becomes increasingly present in our everyday lives, sordid words, once considered shocking to express, are now seen nearly everywhere we look.

Bad words make up approximately 0.5-0.7 percent of words used in conversational speech, with the most popular terms, such as ‘ass’ and the f-word, considered highly versatile in their use. To understand who was using these words and how, researchers at The University of Texas at Austin looked no further than Twitter. 

“Social media platforms are a medium where people share a wide array of thoughts, from intense opinions to casual status updates,” noted UT Austin researchers. “Additionally, the informal nature of social media results in more frequent usage of vulgar terms.” 

Beyond measuring how often these words appear, researchers also sought to understand the sentiment behind these expressions and what implications they might have for people reading such tweets.

“When we first started looking into this issue I was originally interested in sentiment analysis,” said mathematics senior Isabel Cachola. “As I was doing research, I found it pretty surprising that I wasn’t able to find any research utilizing vulgarity features to improve sentiment.”

Interested in learning more, Cachola put together a team of researchers, including UT Austin linguistics graduate student Eric Holgate, UT Austin linguistics assistant professor Junyi Jessy Li and Bloomberg researcher Daniel Preoţiuc-Pietro.

They found that vulgarity is used differently among distinct demographics. Those who are younger, non-religious, or politically liberal tend to use vulgarity much more frequently, as these groups typically use vulgar terms to express emotion, group identity or emphasis. Additionally, those who report higher incomes or education levels, as well as those who identify as female, were found to use vulgarity less often than their counterparts.

In a paper they presented at the Conference on Empirical Methods in Natural Language Processing, the researchers discuss how the same vulgar word can be used in many ways, such as “today is a good ass day” for emphasis, “now this is a group of ass kickers” to signal group identity, or “vesting in equipment is a pain in the ass” to express emotion.

To determine the perceived sentiment of posts containing vulgar language, the researchers asked a group of people to mark tweets as positive, negative, or neutral in sentiment and then describe the pragmatic reason for the use of vulgarity within each tweet.

Researchers found that removing a vulgar expression from a tweet resulted in different ratings of positivity or negativity than were obtained from the original, unedited tweet. In some cases, removing the vulgar terms caused tweets that were originally perceived as positive to be perceived as negative (and vice versa).

“Vulgarity is used in so many different contexts, across platforms, across demographics, across content. When we’re talking about how to use it in things like sentiment or hate speech detection we really need to find a way to capture those nuances,” Cachola said.

The most challenging part of this research, Li claims, “comes from the interdisciplinary nature of it.”

“Any time you’re working with machines, a lot is dependent upon the quality of data that you put in, so there’s a lot of painstaking effort that goes in to make sure that the annotators that you have are understanding the task right,” said Holgate. “But, it shouldn’t be contrived. We want people’s opinions on this, and to be reflective of how humans understand language. There’s a delicate balance.” 

By studying how people perceive and categorize vulgarity in different contexts, and by modeling vulgar content, the group emphasizes that it is possible to avoid over-censorship in public forums.

“A better understanding of the role of vulgarity can lead to better and fairer models aimed at real-world applications,” Li said. 

This research was presented at the 2018 Conference on Computational Linguistics and the 2018 Conference on Empirical Methods in Natural Language Processing.
