[Paper List] Long-tail Graph: Long-Tailed Graph Data

  • April 19, 2021
  • AI

This paper list was compiled by @白小鱼 on April 17, 2021 [1] and was last updated on April 17.
TODO: write up reading notes based on this paper list.

TL;DR

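Little work exists in this cross-disciplinary area. To date, formal journals and conferences [2] contain 7 related papers in total, of which 2 are class A, 3 are class B, and 1 is class C.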

Note: among the class B conference papers, the two CIKM papers fit the topic most closely.

Papers in Formal Conferences and Journals

CCF-A

[NAACL-HLT’19]Long-tail Relation Extraction via Knowledge Graph Embeddings and Graph Convolution Networks

We propose a distance supervised relation extraction approach for long-tailed, imbalanced data which is prevalent in real-world settings. Here, the challenge is to learn accurate “few-shot” models for classes existing at the tail of the class distribution, for which little data is available. Inspired by the rich semantic correlations between classes at the long tail and those at the head, we take advantage of the knowledge from data-rich classes at the head of the distribution to boost the performance of the data-poor classes at the tail. First, we propose to leverage implicit relational knowledge among class labels from knowledge graph embeddings and learn explicit relational knowledge using graph convolution networks. Second, we integrate that relational knowledge into relation extraction model by coarse-to-fine knowledge-aware attention mechanism. We demonstrate our results for a large-scale benchmark dataset which show that our approach significantly outperforms other baselines, especially for long-tail relations.

[IJCAI’20]Learning from the Scene and Borrowing from the Rich: Tackling the Long Tail in Scene Graph Generation

Despite the huge progress in scene graph generation in recent years, its long-tail distribution in object relationships remains a challenging and pestering issue. Existing methods largely rely on either external knowledge or statistical bias information to alleviate this problem. In this paper, we tackle this issue from another two aspects: (1) scene-object interaction aiming at learning specific knowledge from a scene via an additive attention mechanism; and (2) long-tail knowledge transfer which tries to transfer the rich knowledge learned from the head into the tail. Extensive experiments on the benchmark dataset Visual Genome on three tasks demonstrate that our method outperforms current state-of-the-art competitors. Our source code is available at github.com/htlsn/issg


CCF-B

[CIKM’20]Towards Locality-Aware Meta-Learning of Tail Node

Chinese-language commentary: "Meta和GNN的交易" by @嬉嬉皮

Network embedding is an active research area due to the prevalence of network-structured data. While the state of the art often learns high-quality embedding vectors for high-degree nodes with abundant structural connectivity, the quality of the embedding vectors for low-degree or tail nodes is often suboptimal due to their limited structural connectivity. While many real-world networks are long-tailed, to date little effort has been devoted to tail node embedding. In this paper, we formulate the goal of learning tail node embeddings as a few-shot regression problem, given the few links on each tail node. In particular, since each node resides in its own local context, we personalize the regression model for each tail node. To reduce overfitting in the personalization, we propose a locality-aware meta-learning framework, called meta-tail2vec, which learns to learn the regression model for the tail nodes at different localities. Finally, we conduct extensive experiments and demonstrate the promising results of meta-tail2vec. (Supplemental materials including code and data are available at github.com/smufang/meta
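To make the notion of a "tail node" concrete, here is a minimal sketch that splits the nodes of an edge list into head and tail sets by degree. It is not part of meta-tail2vec; the function name `split_head_tail` and the degree cutoff `max_tail_degree` are assumptions chosen for illustration only.

```python
from collections import Counter


def split_head_tail(edges, max_tail_degree=5):
    """Split nodes into (head, tail) sets by degree.

    edges: iterable of (u, v) pairs of an undirected graph.
    max_tail_degree: hypothetical cutoff; nodes with degree at or
        below it are treated as tail nodes.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    tail = {node for node, d in degree.items() if d <= max_tail_degree}
    head = set(degree) - tail
    return head, tail


# Toy graph: one hub connected to eight nodes, plus one extra edge.
edges = [("hub", f"n{i}") for i in range(8)] + [("n0", "n1")]
head, tail = split_head_tail(edges, max_tail_degree=2)
print(sorted(head), len(tail))
```

In a long-tailed network, the tail set typically contains most of the nodes, which is why embedding quality on those nodes matters.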


[CIKM’20]Graph Prototypical Networks for Few-shot Learning on Attributed Networks

Attributed networks nowadays are ubiquitous in a myriad of high-impact applications, such as social network analysis, financial fraud detection, and drug discovery. As a central analytical task on attributed networks, node classification has received much attention in the research community. In real-world attributed networks, a large portion of node classes only contains limited labeled instances, rendering a long-tail node class distribution. Existing node classification algorithms are unequipped to handle the few-shot node classes. As a remedy, few-shot learning has attracted a surge of attention in the research community. Yet, few-shot node classification remains a challenging problem as we need to address the following questions: (i) How to extract meta-knowledge from an attributed network for few-shot node classification? (ii) How to identify the informativeness of each labeled instance for building a robust and effective model? To answer these questions, in this paper, we propose a graph meta-learning framework – Graph Prototypical Networks (GPN). By constructing a pool of semi-supervised node classification tasks to mimic the real test environment, GPN is able to perform meta-learning on an attributed network and derive a highly generalizable model for handling the target classification task. Extensive experiments demonstrate the superior capability of GPN in few-shot node classification.
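As a rough illustration of the prototype idea that few-shot classification of this kind builds on, the sketch below assumes node embeddings are already available as plain vectors. It is not the GPN implementation: the functions `class_prototypes` and `classify`, the optional per-instance `weights` (standing in for an informativeness score), and the toy data are all hypothetical.

```python
import math


def class_prototypes(support, weights=None):
    """Compute one prototype per class as a (weighted) mean embedding.

    support: dict mapping label -> list of embedding vectors.
    weights: optional dict mapping label -> list of non-negative
        per-instance weights (a stand-in for informativeness).
    """
    prototypes = {}
    for label, vectors in support.items():
        w = weights[label] if weights else [1.0] * len(vectors)
        total = sum(w)
        dim = len(vectors[0])
        prototypes[label] = [
            sum(wi * vec[d] for wi, vec in zip(w, vectors)) / total
            for d in range(dim)
        ]
    return prototypes


def classify(query, prototypes):
    """Assign the query embedding to the nearest prototype (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Two labeled nodes per class: a 2-way, 2-shot toy task.
support = {
    "fraud": [[0.9, 0.1], [1.1, -0.1]],
    "normal": [[-1.0, 0.0], [-0.8, 0.2]],
}
protos = class_prototypes(support)
print(classify([0.7, 0.0], protos))
```

With uniform weights this reduces to the plain class mean; passing larger weights for some support nodes pulls the prototype toward them.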


[ICDM’20]CITIES: Contextual Inference of Tail-Item Embeddings for Sequential Recommendation

In the domain of the Dutch cultural heritage various data sets describe different aspects of life during the Dutch Golden Age. These data sets, in the form of RDF graphs, use different standards and contain noise in the values of literal nodes, such as misspelled names and uncertainty in dates. The Golden Agents project aims at answering queries about the Dutch Golden ages using these distributed and independently maintained data sets. A problem in this project, among many other problems, is the identification of persons who occur in multiple data sets but under different URI’s. This paper aims to solve this specific problem and generate a linkset, i.e. a set of pairs of URI’s which are judged to represent the same person. We use domain knowledge in the application of an existing node context generation algorithm to serve as input for GloVe, an algorithm originally designed for embedding words. This embedding is then used to train a classifier on pairs of URI’s which are known duplicates and non-duplicates. Using just the cosine similarity between URI-pairs in embedding space for prediction, we obtain a simple classifier with an F½-score of around 0.85, even when very few training examples are provided. On larger training sets, more complex classifiers are shown to reach an F½-score of up to 0.88.

Unranked

[ICCV WORKSHOP’19] Visual Relationships as Functions: Enabling Few-Shot Scene Graph Prediction

Scene graph prediction – classifying the set of objects and predicates in a visual scene – requires substantial training data. The long-tailed distribution of relationships can be an obstacle for such approaches, however, as they can only be trained on the small set of predicates that carry sufficient labels. We introduce the first scene graph prediction model that supports few-shot learning of predicates, enabling scene graph approaches to generalize to a set of new predicates. First, we introduce a new model of predicates as functions that operate on object features or image locations. Next, we define a scene graph model where these functions are trained as message passing protocols within a new graph convolution framework. We train the framework with a frequently occurring set of predicates and show that our approach outperforms those that use the same amount of supervision by 1.78 at recall@50 and performs on par with other scene graph models. Next, we extract object representations generated by the trained predicate functions to train few-shot predicate classifiers on rare predicates with as few as 1 labeled example. When compared to strong baselines like transfer learning from existing state-of-the-art representations, we show improved 5-shot performance by 4.16 recall@1. Finally, we show that our predicate functions generate interpretable visualizations, enabling the first interpretable scene graph model.

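The literature survey above may have omissions. If you know of other work on graph data with long-tailed distributions, please recommend it!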

Finally, special thanks to @嬉嬉皮 for recommending papers.

References

  1. ^//dblp.org/
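  2. ^China Computer Federation (CCF) Recommended List of International Academic Conferences and Journals (中国计算机学会推荐国际学术会议和期刊目录) //www.ccf.org.cn/Academic_Evaluation/By_category/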