30th November 2021
By Cooper Gatewood, Shaun Ring and Julia Smirnova
The automated detection of hate speech is a growing field of research and practice, with advances in natural language processing (NLP) technology enabling progress over recent years. Despite ongoing constraints and challenges, NLP classification approaches allow us to find potentially harmful content in vast amounts of data that are almost impossible to analyse through manual approaches alone. These types of tools are already widely utilised, including by social media platforms, who use them to detect hateful content. Companies’ claims about the accuracy and efficacy of these classifiers, however, deserve scrutiny, as the latest controversy around Facebook’s efforts has shown.
The COVID-19 pandemic has led to a rise of antisemitic conspiracy narratives and hate speech online. An ISD analysis of German-language data from Facebook, Twitter and Telegram showed a thirteen-fold growth in antisemitic comments between January 2020 and March 2021.
ISD and the Centre for Analysis of Social Media (CASM) have developed an interdisciplinary approach to developing NLP algorithms, combining social scientific knowledge with machine learning expertise. These algorithms, which operate in English and French, were used to analyse antisemitic content in Germany-related threads on 4chan, online hateful speech in France, and harassment of politicians in the United States.
Video platforms like YouTube present a greater challenge to NLP approaches than primarily text- and image-based media such as Facebook and Twitter. As a result, there remains a real gap in quantitative evidence about antisemitism and related issues on the platform. This is problematic because extremists, conspiracy theorists and COVID sceptics continue to use YouTube, despite the platform’s efforts to enforce policies banning hate speech and abuse.
Our recent project classifying YouTube comments in German seeks to contribute to research and public debate in several ways:
- It allows us to analyse the volume of antisemitic expressions in the comments section of German-language YouTube videos
- It allows us to identify videos that draw high numbers of antisemitic comments, even when not every antisemitic comment is detected
- It identifies the most difficult linguistic and contextual challenges in automated classification of antisemitism, informing where manual vetting and oversight are most important
Challenges of automated approaches
ISD’s work has been informed by previous research on the limitations and opportunities of natural language processing. Hate speech is nuanced and notoriously difficult to label, even for human annotators. As Vidgen et al point out, the lack of clarity in definitions, as well as linguistic difficulties such as the use of humour or irony, spelling variations, polysemy (a single word form having several different meanings), and accounting for context, pose challenges for the detection of harmful content.
Moreover, language changes over time. Particularly in the case of antisemitic tropes, online users use coded language and allusions (as demonstrated by previous ISD research and reports from the ‘Decoding antisemitism’ project), which makes it difficult to label such content. Other challenges, such as classification biases and decisions on how to construct training data with a representative sample of abusive and non-abusive content, are currently being discussed by the research community. Finally, research ethics questions, including potential harm to researchers and annotators, must continually be accounted for in the design and application of these approaches.
Definition and labelling process
For this project we used the definition of antisemitism by the International Holocaust Remembrance Alliance (IHRA): “A certain perception of Jews, which may be expressed as hatred towards Jews. Rhetorical and physical manifestations of antisemitism are directed towards Jewish or non-Jewish individuals and/or their property, towards Jewish community institutions and religious facilities”. Further, we used 11 specific examples provided by the IHRA for different manifestations of antisemitism during our annotation. In particular, we followed the IHRA definitions to distinguish criticism of Israel from antisemitism (the latter manifesting in for example “denying the Jewish people their right to self-determination, e.g., by claiming that the existence of a State of Israel is a racist endeavour” or “holding Jews collectively responsible for actions of the state of Israel”). The content was labelled by two ISD analysts independently from each other. In cases of disagreement or uncertainty, we consulted colleagues with expertise in antisemitism. We made decisions based on joint discussion, which was particularly important for edge cases and increased the consistency of annotation.
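Agreement between two independent annotators, as in the process above, is commonly quantified with an inter-annotator agreement statistic such as Cohen’s kappa, which corrects raw agreement for chance. The sketch below is illustrative only; the labels are invented, not ISD’s actual annotations.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled the same
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label distribution
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[l] * counts_b[l] for l in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Illustrative labels only: 1 = antisemitic, 0 = not
a = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
b = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]
print(round(cohens_kappa(a, b), 3))  # → 0.583
```

Values above roughly 0.6 are conventionally read as substantial agreement; joint discussion of edge cases, as described above, is one way to push this figure up.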
To avoid using a dataset with too little abusive content to train the algorithm, we created a training dataset where a substantial proportion of content was antisemitic. We collated a list of German-language videos meeting this criterion from previous ISD research. Using the public YouTube API we gathered all comments from these videos, removing non-German comments to reduce noise.
The training dataset consisted of 46,215 comments posted on 1,753 videos from 2 January 2021 to 30 July 2021.
Relevance to discussions about Jews and Israel
To filter for relevance, we first searched for comments containing specific keywords that were likely to be associated with discussions of interest, without necessarily being antisemitic. This was a broad list of 284 keywords associated with discussions of Israel and Jews, including some antisemitic keywords. Following this keyword-based filter, we trained an algorithm to identify relevant and irrelevant comments. Relevance was defined as comments that related to Judaism, the Jewish people or the state of Israel. This included anything obviously antisemitic, but also discussions about politics, religion, and Jewish communities in different countries.
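A first-stage keyword filter of this kind can be sketched as a word-boundary regular-expression match. The keyword list below is a short illustrative stand-in for the project’s 284-term list, not the list itself.

```python
import re

# Illustrative stand-in for the project's 284-term keyword list
KEYWORDS = ["israel", "juden", "jüdisch", "zionist", "antisemitisch"]

# Leading word boundary so "Israel" matches; no trailing boundary, so
# German inflections such as "israelische" are also caught (stem matching)
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")", re.IGNORECASE)

def passes_keyword_filter(comment: str) -> bool:
    """First-stage filter: keep comments containing any topic keyword."""
    return bool(PATTERN.search(comment))

comments = [
    "Israel hat eine neue Regierung",   # topically relevant
    "Tolles Video, danke!",             # irrelevant
]
print([passes_keyword_filter(c) for c in comments])  # → [True, False]
```

A filter like this is deliberately over-inclusive: it only narrows the pool that the trained relevance classifier then has to judge.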
After coding over 600 individual comments, the algorithm demonstrated a 92% accuracy at identifying relevant content. This level of accuracy was corroborated by analysts in a manual review of a random sample of comments classified as relevant. This process classified 37,552 comments, 81% of the dataset, as relevant.
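How much confidence a manual spot-check of a random sample supports can be gauged with a standard confidence interval for a proportion, such as the Wilson score interval. The sample figures below are illustrative, not the actual review numbers.

```python
import math

def wilson_interval(correct, n, z=1.96):
    """95% Wilson score interval for a proportion: how precise is an
    accuracy estimate based on a manual review of n sampled comments?"""
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# E.g. if 92 of 100 manually reviewed comments were correctly classified:
lo, hi = wilson_interval(92, 100)
print(f"{lo:.2f}-{hi:.2f}")  # → 0.85-0.96
```

The width of the interval shrinks as the review sample grows, which is one way to decide how many comments a corroborating manual review needs to cover.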
Related to antisemitism
We applied one further layer of keyword filters to ensure only relevant content was fed into subsequent classifiers. We used a final list of 83 keywords informed by the work on the previous classifier to refine the sample.
Initially, the next step was envisioned to be a single algorithm that identified antisemitic content from among relevant comments. However, as analysts began coding the comments, we realised that this would need to be broken into two separate steps.
First, we needed to identify comments related to antisemitism. This is because our dataset contained a substantial amount of discussion about antisemitism, e.g. calling out antisemitism, or claims about what constitutes antisemitism including statements like “X is an antisemite because he thinks that Israel does not have a right to exist”.
Because the algorithm had difficulties in distinguishing statements like the one mentioned above from explicitly antisemitic ones, we trained it to first identify comments about antisemitism. The definition used for this subset was: any comments discussing what constitutes antisemitism, calling out others for antisemitism, or expressing antisemitism.
After coding nearly 500 individual comments, the algorithm demonstrated a 78% accuracy. Upon manual review of a random sample, analysts confirmed that this accuracy was likely closer to 80%. This algorithm classified 745 comments as pertinent to antisemitism, roughly 1.6% of the overall dataset.
Finally, analysts manually coded over 200 comments to train an algorithm to identify antisemitic content. This included incitement to or promotion of violence toward Jewish people, antisemitic conspiracy theories, and antisemitic attacks aimed at individuals. The algorithm demonstrated an 80% accuracy, and classified 530 comments as explicitly antisemitic, roughly 1.1% of the overall dataset.
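The resulting workflow is a cascade: a comment must pass each stage (relevant, then about antisemitism, then antisemitic) before the next classifier sees it. The sketch below illustrates the control flow only; the classifier functions are crude keyword stand-ins for the trained models, not the models themselves.

```python
def run_pipeline(comment, stages):
    """Apply classifiers in sequence; a comment must pass every earlier
    stage to reach the next one. Returns the last stage label it passed."""
    passed = None
    for name, classifier in stages:
        if not classifier(comment):
            break
        passed = name
    return passed

# Hypothetical keyword stand-ins for the three trained classifiers
def is_relevant(c):        return "israel" in c.lower() or "juden" in c.lower()
def about_antisemitism(c): return "antisemit" in c.lower()
def is_antisemitic(c):     return "alle juden" in c.lower()  # crude placeholder

stages = [
    ("relevant", is_relevant),
    ("about antisemitism", about_antisemitism),
    ("antisemitic", is_antisemitic),
]
print(run_pipeline("Der Kommentar über Israel ist antisemitisch", stages))
# → "about antisemitism": relevant and about antisemitism, but not itself antisemitic
```

Structuring the task as a cascade also concentrates manual oversight where it matters: errors in the final, hardest stage only affect the small fraction of comments that reach it.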
Refining classifiers for new data
Following this training, a new sample of data was collected and run through this classification pipeline. To ensure the accuracy of the NLP classifiers was maintained for new data, an additional 20 messages were coded for each classifier. Manual review of the final antisemitic sample indicated that the classifiers remained accurate with this new data. Analysis of this data will be featured in future Digital Dispatches.
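Such a spot-check on new data amounts to labelling a small fresh sample per classifier and confirming accuracy still clears a threshold. A minimal sketch, with a hypothetical classifier and sample (the threshold and figures are illustrative):

```python
def spot_check(classifier, labelled_sample, threshold=0.8):
    """Check a classifier still meets an accuracy threshold on a small
    freshly labelled sample of new data (e.g. 20 comments per classifier)."""
    correct = sum(classifier(text) == label for text, label in labelled_sample)
    accuracy = correct / len(labelled_sample)
    return accuracy, accuracy >= threshold

# Hypothetical classifier and freshly coded sample
classifier = lambda text: "israel" in text.lower()
sample = [("Israel im Fokus", True), ("Danke fürs Video", False),
          ("israelische Politik", True), ("Schönes Wetter", True)]
accuracy, ok = spot_check(classifier, sample)
print(accuracy, ok)  # 3 of 4 correct here, so the check fails the 0.8 threshold
```

A failed check would be the trigger to code more of the new data and retrain the affected classifier before trusting its output.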
This approach for identifying antisemitic content online has two primary limitations. The first is related to data collection. The YouTube API does not allow for mass collection of comments, whether a random sample of all comments on the platform or a collection on the basis of keywords. As such, any analysis of comments must first identify relevant channels and/or videos, whose comments are then collected. It is therefore nearly impossible to constitute a representative sample of YouTube comments, making generalisable findings difficult to produce. Any findings are limited to the sample in question, though they may be indicative of broader trends on the platform. In addition, some of the more obvious antisemitic expressions may have been removed by YouTube moderators, meaning we were looking at more subtle language and had a much smaller relevant dataset to use in the training.
The second limitation is related to the nature of natural language processing algorithms. No algorithm will be 100% accurate, especially in a field where the definitions can be contested or divisive, as with antisemitism. Indeed, as our analysts were coding individual comments, distinctions between what was antisemitic and what was not were at times difficult to make and agree on. However, this limitation is not exclusive to NLP-based approaches. Manual coding of large datasets can also introduce error and coders may not always agree on the interpretation of operational definitions. Each algorithm in this workflow was around or over 80% accurate, a relatively high standard for this type of work.
Despite all the constraints, the opportunities presented by automated systems of hate speech detection make them a valuable tool for researchers, practitioners and social media platforms alike. A combination of manual and automated approaches to the detection of harmful content appears to be the most effective, provided classifiers are used responsibly and continually refined to remain as precise, and as adaptable to changing language, as possible. For research, classifiers make it possible to identify patterns in the spread of harmful content, prolific users, the most affected groups and changes in volume over time better than keyword-based approaches alone.
The result of our project is a highly accurate classifier that can be used to analyse German-language antisemitic content on YouTube. Further Digital Dispatches in this series will present the results of initial analysis using the classifier.