Rumor Detection (ClaHi-GAT): Rumor Detection on Twitter with Claim-Guided Hierarchical Graph Attention Networks

Paper Information

Title: Rumor Detection on Twitter with Claim-Guided Hierarchical Graph Attention Networks
Authors: Erxue Min, Yu Rong, Yatao Bian, Tingyang Xu, Peilin Zhao, Junzhou Huang, Sophia Ananiadou
Venue: EMNLP 2021
Paper: download
Code: download

Background

  The propagation structure provides useful clues about the veracity of a rumor, but existing rumor detection methods are either limited to user response relationships or oversimplify the conversation structure.

  In this paper, "Claim" refers to the source post.

1 Introduction

  A simple example of a conversation thread:

  

  The key idea of this paper: also model the relationships between sibling nodes, shown as the dashed lines in the figure below.

  

2 Claim-guided Hierarchical Graph Attention Networks

  The overall framework is as follows:

   

  The model includes two attention modules:

    • A Graph Attention to capture the importance of different neighboring tweets
    • A claim-guided hierarchical attention to enhance post content understanding

2.1 Claim-guided Hierarchical Attention

  For each tweet $x_{i}$, a Bi-LSTM is first used to obtain the post feature matrix $X=\left[c, x_{1}, x_{2}, \cdots, x_{|\mathcal{V}|-1}\right]^{\top}$, where $c, x_{i} \in \mathbb{R}^{d}$.

  To strengthen the model's topical consistency and semantic reasoning:

Post-level Attention

  To prevent topic drift and loss of claim information, a gate module decides how much information each post should absorb from the claim, so as to better guide the assignment of importance over relevant posts. The claim-aware representation is computed as:

    $\begin{aligned}g_{c \rightarrow x_{i}}^{(l)} &=\operatorname{sigmoid}\left(W_{g}^{(l)} h_{x_{i}}^{(l)}+U_{g}^{(l)} h_{c}^{(l)}\right) \\ \tilde{h}_{x_{i}}^{(l)} &=g_{c \rightarrow x_{i}}^{(l)} \odot h_{x_{i}}^{(l)}+\left(1-g_{c \rightarrow x_{i}}^{(l)}\right) \odot h_{c}^{(l)}\end{aligned}$

  where $g_{c \rightarrow x_{i}}^{(l)}$ is a gate vector, and $W_{g}^{(l)}$ and $U_{g}^{(l)}$ are learnable parameters.

  The claim-aware representation is then concatenated with the original representation and fed into $\text{Eq.1}$ to compute the attention weights:
    $\begin{aligned}\hat{h}_{x_{i}}^{(l)}&=\left[\tilde{h}_{x_{i}}^{(l)} \| h_{x_{i}}^{(l)}\right] \\ \hat{\alpha}_{i, j}^{(l)}&=\operatorname{Atten}\left(\hat{h}_{x_{i}}^{(l)}, \hat{h}_{x_{j}}^{(l)}\right)\end{aligned}$
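  The gating step above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the parameters `W_g`, `U_g` and the hidden states `h_x`, `h_c` are random stand-ins for learned values.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # hidden size (illustrative)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned parameters W_g, U_g and hidden states.
W_g = rng.normal(size=(d, d))
U_g = rng.normal(size=(d, d))
h_x = rng.normal(size=d)                # post representation h_{x_i}^{(l)}
h_c = rng.normal(size=d)                # claim representation h_c^{(l)}

# Gate decides how much claim information the post absorbs.
g = sigmoid(W_g @ h_x + U_g @ h_c)
h_tilde = g * h_x + (1.0 - g) * h_c     # claim-aware representation
h_hat = np.concatenate([h_tilde, h_x])  # concatenated input for Eq. 1
```

  Because `g` lies in (0, 1), each coordinate of the claim-aware vector is a convex combination of the post and claim coordinates, so the post can never drift entirely away from the claim.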

2.2 Graph Attention Networks

  To encode structural information, a GAT encoder is used:
  Input: $H^{(l)}=\left[h_{c}^{(l)}, h_{x_{1}}^{(l)}, h_{x_{2}}^{(l)}, \ldots, h_{x_{|\mathcal{V}|-1}}^{(l)}\right]^{\top}$
  Propagation:
    ${\large \begin{aligned}\alpha_{i, j}^{(l)} &=\operatorname{Atten}\left(h_{x_{i}}^{(l)}, h_{x_{j}}^{(l)}\right) \\&=\frac{\exp \left(\phi\left(a^{\top}\left[W^{(l)} h_{x_{i}}^{(l)} \| W^{(l)} h_{x_{j}}^{(l)}\right]\right)\right)}{\sum_{k \in \mathcal{N}_{i}} \exp \left(\phi\left(a^{\top}\left[W^{(l)} h_{x_{i}}^{(l)} \| W^{(l)} h_{x_{k}}^{(l)}\right]\right)\right)}\end{aligned}} $

    $h_{x_{i}}^{(l+1)}=\operatorname{ReLU}\left(\sum\limits_{j \in \mathcal{N}_{i}} \alpha_{i, j}^{(l)} W^{(l)} h_{x_{j}}^{(l)}\right)$
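  A single-head GAT layer of this form can be sketched with numpy. This is a toy illustration under assumptions: the graph edges, features, and weights are random placeholders, and $\phi$ is taken to be LeakyReLU as in the standard GAT formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_in, d_out = 5, 6, 4                     # toy sizes (illustrative)

# Undirected interaction graph as an adjacency matrix with self-loops.
A = np.eye(n)
for i, j in [(0, 1), (0, 2), (1, 3), (1, 4), (3, 4)]:  # hypothetical edges
    A[i, j] = A[j, i] = 1

H = rng.normal(size=(n, d_in))               # node features H^{(l)}
W = rng.normal(size=(d_out, d_in))           # shared weight W^{(l)}
a = rng.normal(size=2 * d_out)               # attention vector a

Z = H @ W.T                                  # W^{(l)} h_j for every node
# Raw attention logits: phi(a^T [W h_i || W h_j]), phi = LeakyReLU.
e = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        s = a @ np.concatenate([Z[i], Z[j]])
        e[i, j] = s if s > 0 else 0.2 * s

# Masked softmax: normalize only over the neighbors N_i.
e = np.where(A > 0, e, -np.inf)
alpha = np.exp(e - e.max(axis=1, keepdims=True))
alpha /= alpha.sum(axis=1, keepdims=True)

# ReLU aggregation gives h^{(l+1)} for all nodes at once.
H_next = np.maximum(0, alpha @ Z)
```

  The masking step is what restricts attention to the graph structure: non-neighbors get weight exactly zero, and each row of `alpha` sums to one over $\mathcal{N}_i$.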

  With multi-head attention:

    $h_{x_{i}}^{(l+1)}=\|_{k=1}^{K} \operatorname{ReLU}\left(\sum\limits _{j \in \mathcal{N}_{i}} \alpha_{i, j}^{(l, k)} W_{k}^{(l)} h_{x_{j}}^{(l)}\right)$

  At the output layer, head-averaging replaces concatenation to produce the final representation:

    ${\large h_{x_{i}}^{(L)}=\operatorname{ReLU}\left(\frac{1}{K} \sum\limits _{k=1}^{K} \sum\limits_{j \in \mathcal{N}_{i}} \alpha_{i, j}^{\left(l^{\prime}, k\right)} W_{k}^{\left(l^{\prime}\right)} h_{x_{j}}^{\left(l^{\prime}\right)}\right)} $

  Output: the graph representation

    $\bar{s}=\text { mean-pooling }\left(H^{(L)}\right)$
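  The two ways of combining the $K$ heads, and the final mean-pooling, can be sketched as follows. A toy illustration, assuming each head's attention-weighted messages have already been computed (here they are random placeholders).

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, K = 5, 4, 3                       # nodes, per-head dim, heads (toy)

# Suppose head k has already produced the aggregated messages
# M[k, i] = sum_j alpha_ij^{(l,k)} W_k^{(l)} h_j^{(l)} for every node i.
M = rng.normal(size=(K, n, d))

# Hidden layers: apply ReLU per head, then concatenate the K heads.
h_hidden = np.concatenate([np.maximum(0, M[k]) for k in range(K)], axis=1)

# Output layer: average the heads first, then apply ReLU once.
h_out = np.maximum(0, M.mean(axis=0))   # H^{(L)}, shape (n, d)

# Graph representation: mean-pooling over all nodes.
s_bar = h_out.mean(axis=0)
```

  Note the dimensional consequence: concatenation grows the hidden size to $K \cdot d$ at intermediate layers, while averaging keeps the output layer at size $d$, which is what allows the later event-level attention to mix $h_{c}^{(L)}$ and $h_{x_i}^{(L)}$ directly.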

Event-level Attention

  Motivation: the mean-pooling used to obtain the graph representation is not necessarily meaningful, since some nodes may matter more than others for graph classification.

  Inspired by Natural Language Inference (NLI), the model processes the last GAT layer's $h_{c}^{(L)}$ and $h_{x_{i}}^{(L)}$ as follows:

    1)concatenation $\left[h_{c}^{(L)} \| h_{x_{i}}^{(L)}\right]$

    2)element-wise product $h_{\text {prod }}^{(L)}=h_{c}^{(L)} \odot h_{x_{i}}^{(L)}$

    3)absolute element-wise difference $h_{\text {diff }}^{(L)}=\left|h_{c}^{(L)}-h_{x_{i}}^{(L)}\right|$

  A joint representation is then obtained:

    $h_{x_{i}}^{c}=\tanh \left(F C\left(\left[h_{c}^{(L)}\left\|h_{x_{i}}^{(L)}\right\| h_{\text {prod }}^{(L)} \| h_{\text {diff }}^{(L)}\right]\right)\right)$

  This joint representation is used to compute the event-level attention:

    ${\large \begin{aligned}b_{i} &=\tanh \left(F C\left(h_{x_{i}}^{c}\right)\right) \\ \beta_{i} &=\frac{\exp \left(b_{i}\right)}{\sum_{i^{\prime}} \exp \left(b_{i^{\prime}}\right)} \\ \hat{s} &=\sum_{i} \beta_{i} h_{x_{i}}^{(L)}\end{aligned}} $

  Finally, $\hat{s}$ is concatenated with the mean-pooled graph representation $\bar{s}$ from the last GAT layer to form the final graph representation, which is used for classification:

    $\hat{y}=\operatorname{softmax}(F C([\hat{s} \| \bar{s}]))$
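  The whole event-level path, from the NLI-style features to the final prediction, can be sketched end to end. This is a minimal numpy illustration: the FC weights `W_j`, `w_b`, `W_cls` and the node states are random stand-ins for learned values, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, n_cls = 5, 4, 4                       # posts, hidden dim, classes (toy)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

H_L = rng.normal(size=(n, d))               # last-layer node states h_{x_i}^{(L)}
h_c = rng.normal(size=d)                    # claim state h_c^{(L)}

# NLI-style features: [h_c || h_xi || h_c * h_xi || |h_c - h_xi|].
W_j = rng.normal(size=(d, 4 * d))           # hypothetical FC weight
feats = np.concatenate(
    [np.tile(h_c, (n, 1)), H_L, h_c * H_L, np.abs(h_c - H_L)], axis=1)
h_joint = np.tanh(feats @ W_j.T)            # joint representation h_{x_i}^c

# Event-level attention: scalar score b_i, softmax weights beta_i.
w_b = rng.normal(size=d)                    # hypothetical FC weight
b = np.tanh(h_joint @ w_b)
beta = softmax(b)
s_hat = beta @ H_L                          # attended representation

# Classify from the concatenation [s_hat || s_bar].
s_bar = H_L.mean(axis=0)                    # mean-pooled graph representation
W_cls = rng.normal(size=(n_cls, 2 * d))     # hypothetical FC weight
y_hat = softmax(W_cls @ np.concatenate([s_hat, s_bar]))
```

  Posts whose joint representation aligns poorly with the claim receive small $\beta_i$, so they contribute little to $\hat{s}$; the mean-pooled $\bar{s}$ is kept alongside it as an unweighted summary of the whole graph.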

3 Experiments

3.1 Datasets

  

3.2 Rumor Classification Performance

TWITTER15 classification results:

   

PHEME classification results:

  

3.3 Ablation Study

  1) ClaHi-GAT/DT: Instead of the undirected interaction graph, we use the directed trees as the model input.

  2) GAT+EA+SC: We simply concatenate the features of the claim with the node features at each GAT layer, to replace the claim-aware representation.

  3) w/o EA: We discard the event-level (inference-based) attention as presented.

  4) w/o PA: We neglect the post-level (claim-aware) attention by leaving out the gating module introduced.

  5) GAT: The backbone model.

  6) GCN: The vanilla graph convolutional networks with no attention.

  

3.4 Evaluation of Undirected Interaction Graphs 

  1. ClaHi-GAT/DT: uses the directed trees adopted in past influential works instead of our proposed undirected interaction graph.
  2. ClaHi-GAT/DTS: based on a directed tree structure similar to ClaHi-GAT/DT, but with the explicit interactions between sibling nodes taken into account.
  3. ClaHi-GAT/UD: uses our undirected interaction topology, but without the explicit correlations between sibling nodes that reply to the same target.
  4. ClaHi-GAT: the full model, which represents the conversation thread as an undirected interaction graph for the claim-guided hierarchical graph attention networks.

  

3.5 Early Rumor Detection

   

  Key point: as a claim spreads, more semantic information, and more noise, accumulates, so exploiting the claim's information is crucial.

  An example: the attention-score map for a false claim is shown below:

  

  The implication: the incorrect post $x_{2}$ is assigned a small weight, which is why the model remains relatively stable for early rumor detection.