Paper Notes: Downstream Model Design of Pre-trained Language Model for Relation Extraction Task

Background

  • Preliminaries

    Overlapping relations

    • Normal: no two relations in the sample share any entity.
    • EPO (EntityPairOverlap): at least two relations in the sample share the same entity pair.
    • SEO (SingleEntityOverlap): at least two relations in the sample share exactly one entity.

    Multiple relations

    • Single: exactly one relation appears in the sample.
    • Double: exactly two relations appear in the sample.
    • Multiple: three or more relations appear in the sample.
  • Motivation

    1. Current PLMs have no design tailored to the relation extraction task.
    2. Existing methods do not handle the three major problems well: long-distance relations, single sentences containing multiple relations, and overlapping relations.
  • Contributions

    1. Replace the traditional encoder with a pre-trained language model.
    2. Use a parameterized asymmetric kernel inner product matrix over the head and tail embeddings of each token in the sequence; this matrix can be read as a tendency score indicating a certain relation.
    3. Replace the Softmax classifier with a Sigmoid classifier and take the average probability as the final probability, which makes it possible to predict multiple relations between the same entity pair.
    4. Two innovations overall: the network structure and the loss function.

Model

[Figure: overall model architecture]

Encoder: produces three kinds of embeddings for the given text. (==After fine-tuning, words with higher attention scores correspond, to some extent, to the predicates of certain relations.==)

Tail embedding: E_{p}=Transformer(E_{w})

E_{p} is the final output vector; E_{w} is the output vector of BERT's second-to-last layer.

Head embedding: E_{b}=E_{w}+E_{a}

E_{a}: BERT's CLS embedding, added to capture the overall context information.
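A minimal PyTorch sketch of this encoder step, assuming `bert-base-uncased` from Hugging Face; the single extra `nn.TransformerEncoderLayer` producing E_{p} and the layer supplying the CLS vector are illustrative assumptions, not the paper's verified configuration:

```python
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

# Extra Transformer layer mapping E_w to the tail embedding E_p (assumed: one layer)
tail_layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)

enc = tokenizer("John was born in New York .", return_tensors="pt")
hidden = bert(**enc).hidden_states    # tuple: input embeddings + 12 layer outputs

E_w = hidden[-2]                      # second-to-last layer output, shape (1, l, 768)
E_a = E_w[:, 0:1, :]                  # CLS vector (assumption: taken from the same layer)
E_b = E_w + E_a                       # head embedding E_b = E_w + E_a, broadcast over tokens
E_p = tail_layer(E_w)                 # tail embedding E_p = Transformer(E_w)
```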

Relation Computing Layer

Compute the similarity between E_{b} and E_{p}:

\boldsymbol{S}_{i}=F_{i}\left(\boldsymbol{E}_{b}, \boldsymbol{E}_{p}\right)

where F_{i}(\boldsymbol{X}, \boldsymbol{Y})=\boldsymbol{X} \boldsymbol{W}_{h i} \cdot\left(\boldsymbol{Y} \boldsymbol{W}_{t i}\right)^{T}

W_{hi} and W_{ti} are the transformation matrices of the head and tail entities under the i-th relation, respectively.

S_{i} is a square matrix that can be viewed as the unnormalized probability scores between all token pairs for the i-th relation; that is, ==S_{i,mn} indicates how likely the tokens at positions (m, n) are to hold relation i==.

Now normalize it, elementwise, with the sigmoid function (the Sigmoid classifier mentioned in the contributions):

P_{i}=\operatorname{sigmoid}\left(\boldsymbol{S}_{i}\right)
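A sketch of the relation computing layer under the same assumptions, with one learned pair (W_{hi}, W_{ti}) per relation; the kernel size and the Gaussian initialization are assumptions:

```python
import torch
import torch.nn as nn

class RelationComputingLayer(nn.Module):
    """Parameterized asymmetric kernel inner product, one kernel per relation."""
    def __init__(self, hidden_size: int, num_relations: int, kernel_size: int = 64):
        super().__init__()
        # W_hi / W_ti: per-relation transformations for head / tail embeddings
        self.W_h = nn.Parameter(torch.randn(num_relations, hidden_size, kernel_size) * 0.02)
        self.W_t = nn.Parameter(torch.randn(num_relations, hidden_size, kernel_size) * 0.02)

    def forward(self, E_b, E_p):
        # E_b, E_p: (batch, l, hidden) -> P: (batch, num_relations, l, l)
        H = torch.einsum("bld,rdk->brlk", E_b, self.W_h)   # E_b @ W_hi
        T = torch.einsum("bld,rdk->brlk", E_p, self.W_t)   # E_p @ W_ti
        S = torch.einsum("brmk,brnk->brmn", H, T)          # S_i = (E_b W_hi)(E_p W_ti)^T
        return torch.sigmoid(S)                            # P_i = sigmoid(S_i), elementwise
```

Each relation gets its own l × l probability matrix, which is what allows multiple relations to be predicted for the same token pair.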

Loss Calculation

NOTE: the P_{i} above describes relations between tokens, not entities. An entity-mask matrix is used to bridge this gap.

Let \mathbb{S}=\{(x, y)\} be the set of all entity pairs in the text T (all ordered combinations of its entities).

We construct a mask matrix \boldsymbol{M} \in \mathbb{R}^{l \times l}, where l is the text length and (B_{x}, E_{x}) are the start and end position indices of entity x. ==This mask matrix is used to retain, from \boldsymbol{P_{i}}, the predicted probabilities of every entity pair.==

M_{m n}=\begin{cases}1, & \exists(x, y) \in \mathbb{S}:\ B_{x} \leq m \leq E_{x} \text{ and } B_{y} \leq n \leq E_{y} \\ 0, & \text{otherwise}\end{cases}

where m, n are the indices of the matrix elements. We also construct a label matrix \boldsymbol{Y_{i}} (the ground truth), \boldsymbol{Y_{i}} \in \mathbb{R}^{l \times l}:

Y_{i, m n}=\begin{cases}1, & \exists(x, y) \in \mathbb{S} \text{ holding relation } i:\ B_{x} \leq m \leq E_{x} \text{ and } B_{y} \leq n \leq E_{y} \\ 0, & \text{otherwise}\end{cases}

\boldsymbol{Y_{i}} marks the entity pairs labeled with the i-th relation in the input text T.
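A sketch of how \boldsymbol{M} and \boldsymbol{Y_{i}} can be built from entity spans, following the inclusive (B_{x}, E_{x}) index convention above; `build_mask` and `build_label` are hypothetical helpers:

```python
import torch

def build_mask(entity_spans, l):
    """M[m, n] = 1 iff (m, n) lies in the span product of some entity pair (x, y)."""
    M = torch.zeros(l, l)
    for (bx, ex), (by, ey) in entity_spans:          # all ordered entity pairs (x, y)
        M[bx:ex + 1, by:ey + 1] = 1.0                # inclusive spans B_x..E_x, B_y..E_y
    return M

def build_label(relation_pairs, l):
    """Y_i[m, n] = 1 iff the token pair (m, n) belongs to a pair holding relation i."""
    Y = torch.zeros(l, l)
    for (bx, ex), (by, ey) in relation_pairs:        # pairs labeled with the i-th relation
        Y[bx:ex + 1, by:ey + 1] = 1.0
    return Y

# e.g. entities "John" (tokens 0-0) and "New York" (tokens 4-5), both orderings:
M = build_mask([((0, 0), (4, 5)), ((4, 5), (0, 0))], l=7)
```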

Then apply the average binary cross-entropy (BCE), where * denotes elementwise multiplication:

L_{i}=BCE_{avg}\left(\boldsymbol{P}_{i} * \boldsymbol{M}, \boldsymbol{Y}_{i}\right)

BCE_{avg}(\boldsymbol{P}, \boldsymbol{Y})=-\frac{1}{l^{2}} \sum_{m=1}^{l} \sum_{n=1}^{l}\left[Y_{m n} \log P_{m n}+\left(1-Y_{m n}\right) \log \left(1-P_{m n}\right)\right]

LOSS, summed over the relation index i:

L=\sum_{i} L_{i}
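Putting the pieces together, a sketch of the loss; `relation_loss` is a hypothetical name, and note that cells zeroed out by the mask contribute nothing to the loss (P = 0 and Y = 0 there), so the plain elementwise mean matches the BCE_{avg} reconstruction above:

```python
import torch
import torch.nn.functional as F

def relation_loss(P, M, Y):
    """
    P: (num_relations, l, l)  predicted probability matrices P_i
    M: (l, l)                 entity-pair mask matrix
    Y: (num_relations, l, l)  ground-truth label matrices Y_i
    Returns the total loss L = sum_i BCE_avg(P_i * M, Y_i).
    """
    # reduction="mean" averages over all l*l cells; masked-out cells give BCE(0, 0) = 0
    losses = [F.binary_cross_entropy(P_i * M, Y_i) for P_i, Y_i in zip(P, Y)]
    return torch.stack(losses).sum()
```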


Experiment

[Figures: experimental result tables]