Rumor Detection: "Data Fusion Oriented Graph Convolution Network Model for Rumor Detection"

October 9, 2022
Notes
Rumor Detection

Paper Information

Title: Data Fusion Oriented Graph Convolution Network Model for Rumor Detection
Authors: Erxue Min, Yu Rong, Yatao Bian, Tingyang Xu, Peilin Zhao, Junzhou Huang, Sophia Ananiadou
Source: 2020, IEEE Transactions on Network and Service Management
Paper link: download
Code: download

1 Introduction

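The paper considers not only static features, such as users' basic information and text content, but also dynamic features, such as rumor propagation relations. The authors also optimize the feature fusion module and the pooling module to improve model performance.

Contributions: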

- Using a real dataset from social media, we extract static features such as users' basic information and text content, as well as dynamic features such as rumor propagation relations, and propose a data fusion method.
- GCN is introduced into the rumor detection task to represent the rumor propagation pattern. We also select a suitable graph convolution operator to update the node vectors, and improve the feature fusion and pooling modules.
- Experiments on a Sina Weibo dataset validate the performance of the proposed GCN-based model for rumor detection.

2 Main

The overall framework consists of four main modules:

1. the feature extraction module
2. the feature fusion module
3. the graph convolution module
4. the pooling module

2.1 Feature Extraction Module

2.1.1 Features of User Basic Information

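The model uses common user basic information, such as gender and follower count.

Reason for including these features: for example, the probability of a rumor is reported to be higher when the user's gender is female.

Feature preprocessing:

Gender is encoded as a one-hot vector.

Follower counts would normally be scaled with Min-Max normalization, but for ordinary users (those with few followers) this pushes most values close to $0$, so the paper applies a $\log$ transform instead: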

　　　　$x^{*}=\left\{\begin{array}{ll}\frac{\log x-\log x_{\min }}{\log x_{\max }-\log x_{\min }} & x>0 \\0 & x=0\end{array}\right. \quad\quad\quad(2)$

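where $x$ is the follower count before normalization, $x^{*}$ is the normalized value, and $x_{\min }$ and $x_{\max }$ are the minimum and maximum follower counts in the dataset.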
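
The log-scaled normalization in Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `log_min_max` is hypothetical, and it assumes that $x_{\min}$ and $x_{\max}$ are taken over the positive counts only, since $\log 0$ is undefined.

```python
import math

def log_min_max(counts):
    """Log-scaled min-max normalization of follower counts (Eq. 2).

    Assumption: x_min and x_max are computed over positive counts only.
    """
    positive = [c for c in counts if c > 0]
    if not positive:
        return [0.0 for _ in counts]
    log_min = math.log(min(positive))
    log_max = math.log(max(positive))
    span = log_max - log_min
    result = []
    for c in counts:
        if c == 0 or span == 0:
            # x = 0 case of Eq. 2; span == 0 is a degenerate input guarded here
            result.append(0.0)
        else:
            result.append((math.log(c) - log_min) / span)
    return result

followers = [0, 1, 10, 100, 1_000_000]
normalized = log_min_max(followers)
print(normalized)
```

With plain Min-Max scaling, a user with 100 followers in this toy list would map to about 0.0001, whereas the log version places that user at one third of the range, so small accounts remain distinguishable.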

2.1.2 User Similarity Feature

To capture user similarity, first build a user-event matrix $M$ with $N_{1}$ users and $N_{2}$ events, so $M \in \mathbb{R}^{N_{1} \times N_{2}}$. Since $M$ is expected to be sparse, the paper applies singular value decomposition (SVD):

　　　　$A=U \Sigma V^{T}\quad\quad\quad(3)$

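where $A$ is the matrix to be decomposed, $U$ is the matrix of left singular vectors, $\Sigma$ is a diagonal matrix whose diagonal entries are the singular values, and $V$ is the matrix of right singular vectors. Following the way SVD is used in recommender systems, we keep the top $N$ singular values and multiply the corresponding columns of $U$ by $\Sigma$ to obtain the user vectors, which reduces the dimensionality. Each user ends up with an $N$-dimensional vector. The closer two user vectors are, the more events the two users have participated in together. By the same idea, a user-user matrix can be built whose entries count the events that both users took part in. The same method then yields a second set of user vectors, which is combined with the vectors from the user-event decomposition to form the user similarity feature.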
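
The SVD step above can be sketched with NumPy as follows. The toy matrix `M`, the helper name `svd_user_vectors`, and the choice of two components are all illustrative assumptions. The notes do not say how the two sets of vectors are combined, so concatenation is assumed here.

```python
import numpy as np

# Hypothetical toy user-event matrix M (4 users x 5 events); 1 = user took part.
M = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

def svd_user_vectors(matrix, n_components):
    """Return one n_components-dimensional vector per row via truncated SVD."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the largest singular values and scale the matching columns of U.
    return U[:, :n_components] * s[:n_components]

# View 1: vectors from the user-event matrix.
user_vecs = svd_user_vectors(M, n_components=2)

# View 2: user-user matrix whose entry (i, j) counts shared events.
co_participation = M @ M.T
user_vecs_2 = svd_user_vectors(co_participation, n_components=2)

# Assumed combination: concatenate both views per user.
user_similarity_feature = np.concatenate([user_vecs, user_vecs_2], axis=1)
print(user_similarity_feature.shape)  # (4, 4)
```

In this toy example users 0 and 1 share two events while users 0 and 2 share none, and the distance between the vectors of users 0 and 1 is accordingly smaller than the distance between those of users 0 and 2.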

2.1.3 Representation of Text Content

Text representations are extracted with the $BERT_{base}$ Chinese model.

2.1.4 Feature Fusion Module

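Directly concatenating the features makes training unstable. In Fig. 3, $x \in R^{N \times D_{1}}$ and $x^{\prime} \in R^{N \times D_{2}}$ denote two different feature matrices, where $N$ is the number of nodes and $D_{1}$ and $D_{2}$ are the feature dimensions. Each feature is first passed through a two-layer MLP and then through batch normalization (BN):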

　　　　$\begin{array}{l}\mu \leftarrow \frac{1}{m} \sum\limits_{i=1}^{m} h_{i} \\\sigma^{2} \leftarrow \frac{1}{m} \sum\limits_{i=1}^{m}\left(h_{i}-\mu\right)^{2} \\\hat{h}_{i} \leftarrow \frac{h_{i}-\mu}{\sqrt{\sigma^{2}+\varepsilon}} \\w_{i} \leftarrow \gamma \hat{h}_{i}+\beta\end{array}$

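where $m$ is the batch size, $\varepsilon$ is a small constant for numerical stability, and $\gamma$ and $\beta$ are learnable parameters.

Finally, the normalized features are concatenated.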
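
The fusion steps above (two-layer MLP, then BN, then concatenation) can be sketched with NumPy. This is only an illustration under stated assumptions: the weights are random rather than trained, the ReLU activation and the omission of biases are choices made here, and the sizes `N`, `D1`, `D2`, `H` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp2(x, w1, w2):
    """Two-layer MLP with ReLU in between (biases omitted for brevity)."""
    return np.maximum(x @ w1, 0.0) @ w2

def batch_norm(h, gamma, beta, eps=1e-5):
    """Batch normalization over the node (row) axis."""
    mu = h.mean(axis=0)
    var = h.var(axis=0)                      # population variance: divides by m
    h_hat = (h - mu) / np.sqrt(var + eps)
    return gamma * h_hat + beta

N, D1, D2, H = 6, 4, 8, 3                    # illustrative sizes only
x = rng.normal(size=(N, D1))                 # e.g. user features
x_prime = 100.0 * rng.normal(size=(N, D2))   # e.g. text features on a larger scale

branches = []
for feat in (x, x_prime):
    d = feat.shape[1]
    w1 = rng.normal(size=(d, H))
    w2 = rng.normal(size=(H, H))
    h = mlp2(feat, w1, w2)
    # gamma = 1 and beta = 0 are the usual initial values of the learnable parameters
    branches.append(batch_norm(h, gamma=np.ones(H), beta=np.zeros(H)))

fused = np.concatenate(branches, axis=1)
print(fused.shape)  # (6, 6)
```

Because each branch is normalized before concatenation, the two feature groups end up on a comparable scale even though `x_prime` started out about 100 times larger, which is one plausible reading of why this helps training stability.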

2.1.5 Graph Convolution Module

GCN can encode both local graph structure and node features. Its forward propagation rule is:

　　　　$H^{(l+1)}=\sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \quad\quad\quad(8)$
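
A single GCN layer of the form in Eq. (8) can be sketched with NumPy. The sketch follows the standard GCN definition, in which $\tilde{A}=A+I$ adds self-loops and $\tilde{D}$ is the degree matrix of $\tilde{A}$. The star-shaped graph, the fixed weights, and the choice of ReLU for $\sigma$ are illustrative assumptions.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    deg = A_tilde.sum(axis=1)                 # degrees of A_tilde
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: entry (i, j) is divided by sqrt(deg_i * deg_j).
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ H @ W, 0.0)     # sigma chosen as ReLU here

# Hypothetical propagation graph: a source post (node 0) with three reposts.
A = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
], dtype=float)
H0 = np.eye(4)            # one-hot node features, for illustration
W0 = np.ones((4, 2))      # fixed weights so the output is easy to inspect

H1 = gcn_layer(A, H0, W0)
print(H1)
```

With these inputs each output row equals the row sum of the normalized adjacency matrix, so the source post and the reposts receive different values purely because of their positions in the graph.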

Because GCN cannot distinguish certain multisets of neighbor features, the paper uses a GIN backbone instead:

　　　　$w_{v}^{k}=\mathrm{NN}^{k}\left(\left(1+\varepsilon^{k}\right) \cdot w_{v}^{k-1}+\sum\limits _{u \in N(v)} w_{u}^{k-1}\right)$
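
The GIN update can be sketched with NumPy, together with a small case that hints at the multiset issue. The example contrasts GIN's sum with a plain mean over neighbors, which is used here only as a stand-in for averaging-style aggregation. The function names and toy graphs are hypothetical, and an identity function replaces the learnable network $NN^{k}$.

```python
import numpy as np

def gin_layer(A, W_prev, nn, eps=0.0):
    """GIN update: nn((1 + eps) * w_v + sum of neighbour vectors)."""
    aggregated = (1.0 + eps) * W_prev + A @ W_prev   # A @ W_prev sums neighbours
    return nn(aggregated)

def mean_neighbours(A, W_prev):
    """Mean aggregation over neighbours, shown only for contrast with the sum."""
    return (A @ W_prev) / A.sum(axis=1, keepdims=True)

def identity_nn(z):
    """Stand-in for the learnable network NN^k."""
    return z

# Graph 1: node 0 has two neighbours. Graph 2: node 0 has one neighbour.
A1 = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
A2 = np.array([[0, 1], [1, 0]], dtype=float)
X1 = np.ones((3, 1))      # every node carries the same feature
X2 = np.ones((2, 1))

mean_1, mean_2 = mean_neighbours(A1, X1)[0], mean_neighbours(A2, X2)[0]
gin_1, gin_2 = gin_layer(A1, X1, identity_nn)[0], gin_layer(A2, X2, identity_nn)[0]
print(mean_1, mean_2)     # the mean cannot tell the two neighbourhoods apart
print(gin_1, gin_2)       # the sum-based GIN update can
```

The neighbor multisets {1, 1} and {1} have the same mean but different sums, so the sum-based update keeps information that an averaging aggregator discards.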

Finally, the node vectors produced by GIN are fed into a $3$-layer fully connected network with a residual connection:

　　　　$\widetilde{w}=w+F(w)$

2.1.6 Pooling Module

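Common pooling operations include average pooling and max pooling, shown in $\text{Eq.11}$ and $\text{Eq.12}$ respectively:

　　　　$\begin{array}{ll}h_{G}=\frac{1}{m} \sum\limits _{i=1}^{m} \widetilde{w_{i}} & (11) \\h_{G}=\max \left(\widetilde{w_{1}}, \widetilde{w_{2}}, \ldots, \widetilde{w_{m}}\right) & (12)\end{array}$

Average pooling takes the mean of all $m$ node vectors as the graph vector. Max pooling takes, for each dimension, the maximum value over all nodes as the output for that dimension.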
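
As a quick illustration of the two pooling operations, the NumPy snippet below applies both to a hypothetical graph with three nodes and two feature dimensions. The values are made up for the example.

```python
import numpy as np

# Hypothetical node vectors: 3 nodes (rows), 2 feature dimensions (columns).
W = np.array([
    [1.0, 4.0],
    [3.0, 0.0],
    [2.0, 2.0],
])

h_avg = W.mean(axis=0)   # average pooling: mean over nodes
h_max = W.max(axis=0)    # max pooling: per-dimension maximum over nodes

print(h_avg)  # [2. 2.]
print(h_max)  # [3. 4.]
```

Both results have one entry per feature dimension, regardless of how many nodes the graph contains.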

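Note: a newer pooling scheme first sorts the node representation vectors in descending order by value, keeps the top $k$ nodes, concatenates the $k$ node vectors, and then compresses the result with a one-dimensional convolution. The compressed vector is the final graph representation.

Pooling used in this paper: the node vectors $\widetilde{w_{v}^{k}}$ from every GIN layer $k=0,1,\ldots,K$ are concatenated, and the pooling strategy from the Note is then applied.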

　　　　$h_{G}=\operatorname{Pooling}\left(\text { Concat }\left(\left\{\widetilde{w_{v}^{k}} \mid k=0,1, \ldots, K\right\}\right) \mid v \in V\right)$
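
The layer-wise concatenation followed by sort-based pooling can be sketched with NumPy. Several details are assumptions made for this sketch rather than facts from the notes: nodes are ranked by their last feature channel, graphs with fewer than $k$ nodes are zero-padded, and the one-dimensional convolution uses a kernel size and stride equal to the feature dimension, so each kept node yields one output value. The function name `sort_pool` and all numbers are hypothetical.

```python
import numpy as np

def sort_pool(node_vecs, k, kernel):
    """Sort nodes, keep the top k, flatten, and compress with a 1-D convolution."""
    n, d = node_vecs.shape
    # Assumption: rank nodes by their last feature channel, in descending order.
    order = np.argsort(-node_vecs[:, -1], kind="stable")
    top = node_vecs[order[:k]]
    if n < k:
        top = np.vstack([top, np.zeros((k - n, d))])   # pad small graphs
    flat = top.reshape(-1)                              # concatenate k node vectors
    # 1-D convolution with kernel size d and stride d: one output per node.
    return np.array([flat[i * d:(i + 1) * d] @ kernel for i in range(k)])

# Hypothetical node representations from two layers (k = 0 and k = 1), 4 nodes.
layer_outputs = [
    np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]),
    np.array([[0.2], [0.9], [0.4], [0.7]]),
]
node_vecs = np.concatenate(layer_outputs, axis=1)       # shape (4, 3)
kernel = np.array([1.0, 1.0, 1.0])                      # fixed weights for illustration

h_G = sort_pool(node_vecs, k=3, kernel=kernel)
print(h_G)
```

The output length depends only on $k$, so graphs of different sizes map to graph vectors of the same size, which is what the classifier that follows needs.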

Finally, $h_{G}$ is used for classification:

　　　　$\hat{y}=\operatorname{softmax}\left(F C\left(h_{G}\right)\right)$

3 Experiment

Dataset

Results

Tags: Rumor Detection