Offensive Language Detection: A Comparative Analysis (cs.CL)
- January 14, 2020
- Notes
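Offensive behaviour has become pervasive in online communities. Individuals exploit the anonymity of the cyber world to engage in offensive communication that they might not consider in real life. Governments, online communities, and companies are all investing in preventing offensive content on social media. One of the most effective ways to tackle this problem is to use computational techniques to identify offensive content and act on it. The present work focuses on detecting offensive language in English tweets. The dataset used in the experiments comes from SemEval-2019 Task 6, Identifying and Categorizing Offensive Language in Social Media (OffensEval), and contains 14,460 annotated English tweets. The paper presents a comparative analysis and a random kitchen sink (RKS) based approach to offensive language detection. It explores the effectiveness of the Google sentence encoder, fastText, dynamic mode decomposition (DMD) based features, and the RKS method for this task. In the experiments and evaluation, RKS with fastText achieved competitive results. The evaluation measures used are accuracy, precision, recall, and F1-score.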
Original title: Offensive Language Detection: A Comparative Analysis
Original abstract: Offensive behaviour has become pervasive in the Internet community. Individuals take the advantage of anonymity in the cyber world and indulge in offensive communications which they may not consider in the real life. Governments,online communities, companies etc are investing into prevention of offensive behaviour content in social media. One of the most effective solution for tacking this enigmatic problem is the use of computational techniques to identify offensive content and take action. The current work focuses on detecting offensive language in English tweets. The dataset used for the experiment is obtained from SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval). The dataset contains 14,460 annotated English tweets. The present paper provides a comparative analysis and Random kitchen sink (RKS) based approach for offensive language detection. We explore the effectiveness of Google sentence encoder, Fasttext, Dynamic mode decomposition (DMD) based features and Random kitchen sink (RKS) method for offensive language detection. From the experiments and evaluation we observed that RKS with fastetxt achieved competing results. The evaluation measures used are accuracy, precision, recall, f1-score.
Authors: Vyshnav M T, Sachin Kumar S, Soman K P
Original URL: https://arxiv.org/abs/2001.03131
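For readers unfamiliar with random kitchen sinks, the sketch below shows one common form of the idea: random Fourier features that approximate an RBF kernel, followed by a linear classifier, with the four evaluation measures from the abstract computed by hand. The abstract does not give the paper's exact formulation, so everything specific here is an assumption made for illustration: the RBF kernel, the `gamma` and `n_components` values, the ridge-regression classifier, the helper names, and the synthetic vectors that stand in for fastText tweet embeddings.

```python
import numpy as np


def draw_rks_params(n_dims, n_components, gamma, seed=0):
    """Draw random projections for an RBF kernel exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    # Frequencies follow the kernel's spectral density: N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_dims, n_components))
    # Random phase offsets, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return W, b


def rks_transform(X, W, b):
    """Map rows of X to random features z(x) with z(x) . z(y) ~ k(x, y)."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)


def fit_ridge(Z, y_signed, reg=1e-3):
    """Regularized least squares on labels in {-1, +1} (assumed classifier)."""
    A = Z.T @ Z + reg * np.eye(Z.shape[1])
    return np.linalg.solve(A, Z.T @ y_signed)


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = offensive)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Synthetic stand-in for sentence embeddings: two shifted Gaussian blobs.
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(-1.0, 1.0, size=(200, 10)),
                   rng.normal(+1.0, 1.0, size=(200, 10))])
    y = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])
    order = rng.permutation(len(y))
    train, test = order[:300], order[300:]

    W, b = draw_rks_params(n_dims=10, n_components=256, gamma=0.05)
    Z_train = rks_transform(X[train], W, b)
    Z_test = rks_transform(X[test], W, b)

    weights = fit_ridge(Z_train, 2.0 * y[train] - 1.0)
    y_pred = (Z_test @ weights >= 0).astype(int)
    print(binary_metrics(y[test], y_pred))
```

The same `W` and `b` must be reused for training and test data, which is why the random draw is separated from the transform. In a real experiment the synthetic blobs would be replaced by actual tweet embeddings and the OffensEval labels.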