Text Classification for Azerbaijani Language Using Machine Learning and Embedding (CS CompLang)

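Text classification systems can help solve the text clustering problem for the Azerbaijani language. Text classification applications exist for other languages, but we set out to build a new system that addresses this problem for Azerbaijani. We first identified potential areas of application, and the system should be useful in many of them. Its main use is news feed categorization: news websites can automatically sort articles into categories such as sports, business, education, and science. The system can also be used for sentiment analysis of product reviews. For example, when a company shares a photo of a new product on Facebook and receives a thousand comments on it, the system classifies each comment as positive or negative. Other applications include recommender systems and spam filtering. Various machine learning techniques, such as Naive Bayes, support vector machines (SVM), and decision trees, have been devised to solve the text classification problem in the Azerbaijani language.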
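To make the classification task concrete, here is a minimal sketch of one of the techniques the abstract names, Naive Bayes, applied to news-style categorization. It is not the authors' implementation: the class `NaiveBayesTextClassifier`, the whitespace tokenizer, and the English toy sentences are all hypothetical stand-ins chosen for illustration, and a real Azerbaijani system would need language-specific preprocessing and a proper labeled corpus.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain lowercase whitespace splitting. A real Azerbaijani
    # pipeline would likely need language-aware normalization.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.class_counts.items():
            # Log prior: fraction of training documents in this class.
            score = math.log(doc_count / self.total_docs)
            # Smoothed denominator: class token total plus vocabulary size.
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                # Unseen tokens have count 0 and get only the smoothing mass.
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data (English, for readability only).
train_texts = [
    "the team won the football match",
    "the striker scored two goals",
    "the bank raised interest rates",
    "the company reported higher profit",
]
train_labels = ["sports", "sports", "business", "business"]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
prediction_sports = clf.predict("the team scored in the match")
prediction_business = clf.predict("the bank reported profit")
print(prediction_sports, prediction_business)
```

The same `fit`/`predict` shape would apply to the sentiment use case by swapping the labels for positive and negative. SVMs and decision trees, the other techniques mentioned, would typically be trained on the same bag-of-words features through an existing library rather than written by hand.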

Original title: Text Classification for Azerbaijani Language Using Machine Learning and Embedding

Original abstract: Text classification systems will help to solve the text clustering problem in the Azerbaijani language. There are some text-classification applications for foreign languages, but we tried to build a newly developed system to solve this problem for the Azerbaijani language. Firstly, we tried to find out potential practice areas. The system will be useful in a lot of areas. It will be mostly used in news feed categorization. News websites can automatically categorize news into classes such as sports, business, education, science, etc. The system is also used in sentiment analysis for product reviews. For example, the company shares a photo of a new product on Facebook and the company receives a thousand comments for new products. The systems classify the comments into categories like positive or negative. The system can also be applied in recommended systems, spam filtering, etc. Various machine learning techniques such as Naive Bayes, SVM, Decision Trees have been devised to solve the text classification problem in Azerbaijani language.

Authors: Umid Suleymanov, Behnam Kiani Kalejahi, Elkhan Amrahov, Rashid Badirkhanli

Original URL: https://arxiv.org/abs/1912.13362