
AI learns when people are using hate speech, even when they use code words


Why it matters to you

Racist trolls frequently use code words in place of slurs to get around keyword filters. This smart algorithm is designed to work out what they're actually saying.

Anyone who has ever had a safe-for-work website blocked as not safe for work (NSFW) by their work internet filter (or has experienced the potentially embarrassing opposite) knows that programs designed to block out certain pieces of content can often run into problems.

One reason for this is that keyword searches can prove to be overly blunt tools for dealing with something as nuanced, complex, and constantly evolving as language.

This is particularly true when trying to catch hateful keywords on social media. For example, last year Alphabet released an algorithm designed to filter out racist language online, only for trolls to start substituting the names of tech products for racial slurs. For a while, the trick totally outsmarted the software.

A new algorithm developed by researchers at the University of Rochester may have cracked the problem, however. Analyzing Twitter feeds, it can distinguish between phrases like “gas the Skypes” (where “Skypes” stands in for “Jews”) and “I hate Skype” (which hopefully just refers to the app) with an impressive 80-percent accuracy.


“We have developed an intelligent data analytics algorithm to track the constantly evolving hate codes which are designed to evade detection,” Professor Jiebo Luo, co-author of the paper, told Digital Trends. “We start with a set of known hate codes, retrieve hate messages containing these codes, [and] build a language model using machine learning techniques to recognize hate messages. On the basis of that, we do two things: [firstly], using the language model to detect ongoing hate messages that may contain new hate codes, and [secondly] using detected hate messages to identify hate spreaders whose new messages are used to discover new hate codes.”
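To make that loop concrete, here is a minimal sketch of the bootstrapping idea Luo describes: seed codes retrieve messages, a learned model flags new messages, and the flagged messages surface candidate new codes. It is an illustration only, not the researchers' actual system; the function names, the TF-IDF-plus-logistic-regression stand-in for their language model, and the toy data are all assumptions made for the example.

```python
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

seed_codes = {"skypes", "googles", "yahoos"}  # known stand-in words (illustrative)

def contains_code(text):
    """True if a message contains any of the currently known hate codes."""
    return any(code in text.lower().split() for code in seed_codes)

# Toy stand-in for tweets retrieved with the seed codes (1 = hateful usage).
corpus = [
    ("gas the skypes", 1),
    ("round up the googles", 1),
    ("i hate skype the app keeps crashing", 0),
    ("searching on google for yahoos near me", 0),
]
texts = [t for t, _ in corpus]
labels = [y for _, y in corpus]

# "Language model" stand-in: TF-IDF features plus logistic regression.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
clf = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

def looks_hateful(message):
    """Score an unseen message with the learned model."""
    return clf.predict(vectorizer.transform([message]))[0] == 1

def candidate_new_codes(flagged_messages, known=seed_codes, top_n=5):
    """Frequent words in flagged messages that aren't yet known codes."""
    counts = Counter(
        w for m in flagged_messages for w in m.lower().split() if w not in known
    )
    return [w for w, _ in counts.most_common(top_n)]
```

In practice, the candidate codes surfaced this way would feed back into the seed list, which is what lets the system keep up as trolls invent new stand-ins.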

The smart insight is working out which words correlate with which others, so that the surrounding context reveals when a stand-in word is being used for something else. Sure, those surrounding words can be changed as well, but there are only so many words a troll can swap out before rendering the original statement totally unintelligible.
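Here is a deliberately simple illustration of that context idea, judging the same token by the words around it rather than by the token alone. The hard-coded context word lists are made up for the example; the actual model learns these correlations from data rather than relying on fixed lists.

```python
# Hypothetical context lists; the real model learns such correlations from data.
hate_context = {"gas", "exterminate", "deport", "round"}
product_context = {"call", "app", "video", "meeting", "install", "crashing"}

def classify_usage(message, target="skype"):
    """Judge a mention of `target` by the words around it, not the token alone."""
    words = set(message.lower().replace(",", " ").split())
    if target not in words and target + "s" not in words:
        return "no mention"
    hate_hits = len(words & hate_context)
    product_hits = len(words & product_context)
    return "likely code word" if hate_hits > product_hits else "likely the product"

print(classify_usage("gas the Skypes"))                        # likely code word
print(classify_usage("I hate Skype, the app keeps crashing"))  # likely the product
```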

In all, it’s a very smart use of machine learning. Yes, the boundaries of what it’s OK to say online are still being drawn, and they are best worked out by private individuals and companies, not algorithms. But when it comes to stopping people from being confronted with hateful rhetoric online, tools like this go way beyond simple keyword searches.

Next up for the project? “We hope to get more data to make our model more robust and accurate,” Luo continued. “Ultimately, we hope the leading social media platforms such as Twitter, Facebook and so on can adopt our technology, which is described in this paper, and likely will be further developed for higher accuracy to clean up social media. It is our ongoing effort to use data science for social good.”
