As a consequence, the ability to automatically identify and moderate toxic content on the Internet, and thereby mitigate its negative consequences, is one of the necessary tasks for modern society. This article addresses the automatic detection of toxic comments in the Russian language. As a source of data, we utilized an anonymously published Kaggle dataset and additionally validated its annotation quality. To build a classification model, we fine-tuned two versions of the Multilingual Universal Sentence Encoder, Bidirectional Encoder Representations from Transformers, and ruBERT. Fine-tuned ruBERT achieved _F_1 = 92.20%, the best classification score among the evaluated models. We made the trained models and code samples publicly available to the research community.

1. Introduction

Nowadays, social network sites have become one of the key ways to express opinions online. The rapid growth of content means that the amount of unverified information increases every day. Freedom of expression of various points of view, including toxic, aggressive, and abusive comments, might have a long-term negative impact on people’s opinions and social cohesion. Thus, the ability to automatically identify toxic speech and inappropriate content on the Internet, and thereby mitigate its negative consequences, is one of the necessary tasks for modern society. A significant number of studies have already been conducted by large companies [23], [26], [39], [47]; however, since such systems limit the right to free speech, their social acceptance requires good understanding and publicly available research.

A growing number of evaluation tracks, such as [3], [21], [42], have been organized in recent years, and the best detection approaches have been evaluated. Currently, advanced deep learning techniques tend to be the superior method for this task [1], [35]. While some papers have directly examined the detection of toxic language, abusive speech, and hate speech for the Russian language [2], [8], [17], there is only one publicly available dataset of Russian-language toxic comments [5]. This dataset was published on Kaggle without any details about the annotation process, so it may be unreliable to use in academic and applied projects without deeper examination.

This paper focuses on the automatic detection of toxic comments in Russian-language texts. To this end, we validated the annotation of the Russian Language Toxic Comments Dataset [5]. Next, we built classification models by applying transfer learning with the pre-trained Multilingual Universal Sentence Encoder (M-USE) [48], the multilingual Bidirectional Encoder Representations from Transformers (M-BERT) [13], and ruBERT [22]. The top-performing model, ruBERT-Toxic, achieved _F_1 = 92.20% in the binary classification task. We made the sample code and the fine-tuned M-BERT and M-USE models publicly available on GitHub.
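For readers who want to reproduce a comparable baseline, the sketch below shows one way to fine-tune a ruBERT-style checkpoint for binary toxicity classification with the Hugging Face `transformers` library. This is not the authors' exact training code: the checkpoint name, the CSV path, and the column names are illustrative assumptions, so adjust them to your own copy of the Kaggle dataset.

```python
# Minimal fine-tuning sketch for binary toxic-comment classification.
# Assumptions: "DeepPavlov/rubert-base-cased" as a stand-in ruBERT checkpoint,
# and a local CSV with "comment" (text) and "toxic" (0/1 label) columns.
import pandas as pd
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL_NAME = "DeepPavlov/rubert-base-cased"  # assumed checkpoint name

class ToxicDataset(Dataset):
    """Wraps tokenized comments and binary toxicity labels."""
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.enc = tokenizer(list(texts), truncation=True,
                             padding="max_length", max_length=max_len)
        self.labels = list(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(int(self.labels[idx]))
        return item

# Hypothetical path and column names; replace with your dataset export.
df = pd.read_csv("labeled.csv")
train_df = df.sample(frac=0.8, random_state=42)
val_df = df.drop(train_df.index)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME,
                                                           num_labels=2)

train_ds = ToxicDataset(train_df["comment"], train_df["toxic"], tokenizer)
val_ds = ToxicDataset(val_df["comment"], val_df["toxic"], tokenizer)

args = TrainingArguments(output_dir="rubert-toxic",
                         num_train_epochs=3,
                         per_device_train_batch_size=16,
                         learning_rate=2e-5)

Trainer(model=model, args=args,
        train_dataset=train_ds, eval_dataset=val_ds).train()
```

A comparable M-BERT baseline can be obtained by swapping the checkpoint name for a multilingual BERT model; the rest of the pipeline stays the same.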

#russian-language #social-media #neural-networks #toxic-comment #machine-learning
