Downstream Model Design of Pre-trained Language Model for Relation Extraction Task (Paper Notes)
Background
Preliminaries
Overlapping relations
- Normal: all relations in the sample are normal (no overlap).
- EPO (Entity Pair Overlap): at least two relations in the sample share the same entity pair.
- SEO (Single Entity Overlap): at least two relations share a single entity.
Multiple relations
- Single: only one relation appears in the sample.
- Double: exactly two relations appear.
- Multiple: at least three relations appear.
Motivation
- Current PLMs are not designed specifically for the relation extraction task.
- Existing methods do not handle three major problems well: long-distance relations, single sentences with multiple relations, and overlapping relations.
Contributions
- Replace the previous traditional encoders with a pre-trained language model.
- Use a parameterized asymmetric kernel inner product matrix computed from the head and tail embeddings of each token in a sequence; this matrix can be viewed as the tendency score indicating a certain relation.
- Replace the Softmax classifier with a Sigmoid classifier and take the average probability as the final probability; the benefit is that multiple relations between the same entity pair can be predicted.
- Two innovations: the network structure and the loss function.
Model
Encoder: produces three kinds of embeddings for the given text. (==After fine-tuning, words with high attention scores correspond, to some extent, to the predicate of a certain relation.==)
Tail embedding: E_{p}=Transformer(E_{w}),
where E_{p} is the last output vector and E_{w} is the output vector of BERT's second-to-last layer.
Head embedding: E_{b}=E_{w}+E_{a},
where E_{a} is BERT's [CLS] embedding, added in order to capture the overall context information.
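A minimal NumPy sketch of the two embedding computations above; the random tensors stand in for real BERT outputs, and `toy_self_attention` is a hypothetical stand-in for the extra Transformer block (both are assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
l, d = 6, 8  # toy text length and hidden size

# Stand-ins for BERT outputs (hypothetical random tensors):
E_w = rng.standard_normal((l, d))  # second-to-last layer token vectors
E_a = rng.standard_normal((1, d))  # [CLS] embedding, broadcast over tokens

# Head embeddings: E_b = E_w + E_a ([CLS] injects overall context)
E_b = E_w + E_a

# Tail embeddings: E_p = Transformer(E_w); a single self-attention
# layer stands in here for that extra Transformer block.
def toy_self_attention(X):
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X

E_p = toy_self_attention(E_w)
```

Both E_b and E_p keep the shape (l, d), so each token carries one head vector and one tail vector.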
Relation Computing Layer
Compute the similarity between E_{b} and E_{p}:
\boldsymbol{S}_{i}=F_{i}\left(\boldsymbol{E}_{b}, \boldsymbol{E}_{p}\right)
其中,F_{i}(\boldsymbol{X}, \boldsymbol{Y})=\boldsymbol{X} \boldsymbol{W}_{h i} \cdot\left(\boldsymbol{Y} \boldsymbol{W}_{t i}\right)^{T}
where W_{hi} and W_{ti} are the transformation matrices for the head and tail entities under the i-th relation.
S_{i} is a square matrix that can be viewed as unnormalized probability scores between all tokens for the i-th relation; that is, ==S_{i,mn} indicates the likelihood that the tokens at positions (m, n) hold relation i==.
Then normalize: apply an element-wise Sigmoid to S_{i} to obtain the probability matrix P_{i}.
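A sketch of the per-relation kernel inner product and its normalization, assuming toy dimensions and random weights (W_hi and W_ti would be learned parameters in the real model):

```python
import numpy as np

rng = np.random.default_rng(1)
l, d, k = 6, 8, 4  # toy text length, hidden size, kernel size

E_b = rng.standard_normal((l, d))  # head embeddings
E_p = rng.standard_normal((l, d))  # tail embeddings

# Asymmetric kernel: W_hi != W_ti, so head->tail scores differ
# from tail->head scores and S_i is not symmetric.
W_hi = rng.standard_normal((d, k))
W_ti = rng.standard_normal((d, k))

# S_i = F_i(E_b, E_p) = (E_b W_hi)(E_p W_ti)^T, an l x l score matrix
S_i = (E_b @ W_hi) @ (E_p @ W_ti).T

# Element-wise Sigmoid turns scores into token-pair probabilities
P_i = 1.0 / (1.0 + np.exp(-S_i))
```

S_i[m, n] scores token m as head and token n as tail for relation i, which is why the asymmetry of the kernel matters.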
Loss Calculation
NOTE: the P_{i} above describes relations between tokens, not entities. An entity-mask matrix is used to resolve this.
Let \mathbb{S} be the set of all entity pairs in text T, \mathbb{S}=\{(x, y)\}. #combinatorics#
Construct a mask matrix \boldsymbol{M} \in \mathbb{R}^{l \times l}, where l is the text length and (B_{x}, E_{x}) are the start and end position indices of entity x. ==Use this mask matrix to keep, from \boldsymbol{P_{i}}, the predicted probabilities of every entity pair.==
where m, n are the indices of the matrix elements (M_{mn}=1 when m lies in the span [B_{x}, E_{x}] of some head entity x and n in the span of its paired tail entity y, with (x, y) \in \mathbb{S}; 0 otherwise). We also construct a label matrix \boldsymbol{Y_{i}} (the ground truth), \boldsymbol{Y_{i}} \in \mathbb{R}^{l \times l},
where \boldsymbol{Y_{i}} marks the labeled entity pairs of the i-th relation in the input text T.
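How the mask and label matrices can be built, with hypothetical entity spans and a single entity pair (the span values and pair set are made up for illustration):

```python
import numpy as np

l = 10  # text length
# Hypothetical entity spans (B_x, E_x): inclusive start/end token indices.
spans = {"x": (1, 2), "y": (5, 7)}
pairs = [("x", "y")]  # entity-pair set S, ordered as (head, tail)

# Mask matrix M: 1 only at positions covered by some entity pair.
M = np.zeros((l, l))
for head, tail in pairs:
    bh, eh = spans[head]
    bt, et = spans[tail]
    M[bh:eh + 1, bt:et + 1] = 1.0

# Label matrix Y_i: marks the pairs that actually hold relation i;
# here we assume (x, y) is labeled with relation i.
Y_i = np.zeros((l, l))
Y_i[1:3, 5:8] = 1.0
```

Multiplying P_i by M zeroes out every token pair that does not belong to an entity pair, so the loss only sees entity-level predictions.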
Then use the average binary cross-entropy:
L_{i}=B C E_{a v g}\left(\boldsymbol{P}_{i} * \boldsymbol{M}, \boldsymbol{Y}_{i}\right)
Total loss: aggregate L_{i} over all relations, where i is the relation index.
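The masked loss above can be sketched as follows; `bce_avg` is a hypothetical helper implementing BCE_avg, and the probability/mask/label tensors reuse the toy shapes from earlier (all assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(2)
l = 10
# Token-pair probabilities P_i (random stand-in for the model output)
P_i = 1.0 / (1.0 + np.exp(-rng.standard_normal((l, l))))
M = np.zeros((l, l)); M[1:3, 5:8] = 1.0    # entity mask
Y_i = np.zeros((l, l)); Y_i[1:3, 5:8] = 1.0  # ground-truth labels

def bce_avg(p, y, eps=1e-9):
    """Average binary cross-entropy over all matrix elements."""
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# L_i = BCE_avg(P_i * M, Y_i): masked positions contribute ~0
L_i = bce_avg(P_i * M, Y_i)
```

Because masked-out positions have both probability 0 (after P_i * M) and label 0, they add essentially nothing to the loss, so training pressure falls only on entity-pair positions.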