Downstream Model Design of Pre-trained Language Model for Relation Extraction Task (Paper Notes)

Background

  • Preliminaries

    Overlapping relations

    • Normal: none of the relations in the sample overlap.
    • EPO (Entity Pair Overlap): at least two relations share the same entity pair.
    • SEO (Single Entity Overlap): at least two relations share a single entity.

    Multiple relations

    • Single: only one relation appears in the sample.
    • Double: exactly two relations appear.
    • Multiple: three or more relations appear.
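The taxonomy above can be illustrated with a small hypothetical helper that classifies a sample's triples (real datasets tag samples during preprocessing, and a sentence can in principle belong to both EPO and SEO; this sketch just checks EPO first):

```python
from collections import Counter

def overlap_type(triples):
    """Classify (head, relation, tail) triples of one sample as EPO/SEO/Normal."""
    pairs = [(h, t) for h, _, t in triples]
    if any(c >= 2 for c in Counter(pairs).values()):
        return "EPO"    # the same entity pair carries >= 2 relations
    ents = [e for h, _, t in triples for e in (h, t)]
    if len(triples) >= 2 and any(c >= 2 for c in Counter(ents).values()):
        return "SEO"    # a single entity is shared by >= 2 relations
    return "Normal"

print(overlap_type([("A", "r1", "B"), ("A", "r2", "B")]))  # EPO
print(overlap_type([("A", "r1", "B"), ("A", "r2", "C")]))  # SEO
print(overlap_type([("A", "r1", "B")]))                    # Normal
```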
  • Motivation

    1. Current PLMs have no downstream design tailored to the relation extraction task.
    2. Existing methods do not handle the three key problems well: long-distance relations, single sentences with multiple relations, and overlapped relations.
  • Contributions

    1. Replace the traditional encoders used previously with a pre-trained language model.
    2. Use a parameterized asymmetric kernel inner product matrix computed from the head and tail embeddings of each token in the sequence; this matrix can be viewed as the tendency score indicating a certain relation.
    3. Replace the softmax classifier with a sigmoid classifier and use the average probability as the final probability, which makes it possible to predict multiple relations between the same entity pair.
    4. Two main innovations: the network structure and the loss function.

Model

image-20200721111530162

Encoder: produces three kinds of embeddings for the given text. (==After fine-tuning, words with high attention scores roughly correspond to the predicates of certain relations.==)

Tail embedding: E_{p}=Transformer(E_{w})

E_{p} is the final output vector; E_{w} is the output vector of BERT's second-to-last layer.

Head embedding: E_{b}=E_{w}+E_{a}

E_{a}: BERT’s CLS embedding is added in order to capture the overall context information.
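The two computations above can be sketched in NumPy with toy shapes; the BERT outputs are assumed precomputed, and the Transformer layer producing E_p is stood in by a single linear map (an assumption for illustration, not the paper's exact layer):

```python
import numpy as np

rng = np.random.default_rng(0)
l, d = 6, 8  # toy text length and hidden size

# Assumed precomputed BERT outputs:
E_w = rng.standard_normal((l, d))   # second-to-last layer, one vector per token
E_a = rng.standard_normal((1, d))   # [CLS] embedding carrying global context

# Head embeddings: add the [CLS] vector to every token position
E_b = E_w + E_a                     # (1, d) broadcasts over (l, d)

# Tail embeddings: E_p = Transformer(E_w); a single linear map stands in here
W = rng.standard_normal((d, d))
E_p = E_w @ W

print(E_b.shape, E_p.shape)         # (6, 8) (6, 8)
```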

Relation Computing Layer

Compute the similarity between E_{b} and E_{p}:

\boldsymbol{S}_{i}=F_{i}\left(\boldsymbol{E}_{b}, \boldsymbol{E}_{p}\right)

where F_{i}(\boldsymbol{X}, \boldsymbol{Y})=\boldsymbol{X} \boldsymbol{W}_{h i} \cdot\left(\boldsymbol{Y} \boldsymbol{W}_{t i}\right)^{T}

W_{hi} and W_{ti} are the transformation matrices for the head and tail entities under the i-th relation, respectively.

S_{i} is a square matrix that can be viewed as unnormalized probability scores between all tokens under the i-th relation; that is, ==S_{i,mn} indicates how likely the tokens at positions (m, n) are to hold relation i==.

Then normalize it:
image-20200721165315257
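A minimal sketch of the relation computing layer for one relation i, assuming the normalization shown in the figure is an element-wise sigmoid (consistent with the sigmoid classifier listed in the contributions); the projection size is a toy value:

```python
import numpy as np

rng = np.random.default_rng(1)
l, d, k = 6, 8, 4  # text length, hidden size, projection size (toy values)

E_b = rng.standard_normal((l, d))  # head embeddings
E_p = rng.standard_normal((l, d))  # tail embeddings

# Per-relation transformation matrices W_hi, W_ti
W_hi = rng.standard_normal((d, k))
W_ti = rng.standard_normal((d, k))

# Asymmetric kernel inner product: S_i = (E_b W_hi)(E_p W_ti)^T
S_i = (E_b @ W_hi) @ (E_p @ W_ti).T  # (l, l) token-pair scores

# Element-wise sigmoid turns each score into an independent probability
P_i = 1.0 / (1.0 + np.exp(-S_i))

print(S_i.shape, P_i.shape)          # (6, 6) (6, 6)
```

The kernel is asymmetric because W_hi and W_ti differ, so S_i[m, n] and S_i[n, m] score the directed pairs (m, n) and (n, m) independently.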

Loss Calculation

NOTE: the P_{i} above describes relations between tokens, not entities. An entity-mask matrix is used to resolve this.

Let \mathbb{S}=\{(x, y)\} be the set of all entity pairs in text T (all ordered combinations of entities).

We construct a mask matrix \boldsymbol{M} \in \mathbb{R}^{l \times l}, where l is the text length and (B_{x}, E_{x}) are the start and end indices of entity x. ==This mask matrix keeps the predicted probabilities of every entity pair from \boldsymbol{P_{i}}.==

image-20200721170531920

where m, n are the indices of a matrix element. We also construct a label matrix \boldsymbol{Y_{i}} (ground truth), \boldsymbol{Y_{i}} \in \mathbb{R}^{l \times l}:

image-20200721170551484

\boldsymbol{Y_{i}} encodes the labeled entity pairs of the i-th relation from the input text T.
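The mask and label matrices can be built, for example, like this (hypothetical entity spans; whether the paper masks in self-pairs (x, x) is not reproduced in these notes, so excluding them is an assumption):

```python
import numpy as np

l = 6  # text length
# Hypothetical entity spans (B_x, E_x): start/end token indices, end exclusive
entities = [(0, 2), (4, 6)]

# M[m, n] = 1 iff m lies inside entity x and n inside entity y for some pair (x, y)
M = np.zeros((l, l))
for bx, ex in entities:
    for by, ey in entities:
        if (bx, ex) != (by, ey):    # assumption: skip self-pairs
            M[bx:ex, by:ey] = 1.0

# Y_i[m, n] = 1 iff (x, y) is a gold pair for relation i (toy gold pair here)
Y_i = np.zeros((l, l))
Y_i[0:2, 4:6] = 1.0                 # entity 0 -> entity 1 holds relation i

print(int(M.sum()), int(Y_i.sum()))  # 8 4
```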

Then apply the average binary cross-entropy (BCE):

L_{i}=B C E_{a v g}\left(\boldsymbol{P}_{i} * \boldsymbol{M}, \boldsymbol{Y}_{i}\right)

image-20200721173443156
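The masked, averaged BCE above can be sketched as follows; averaging over the masked-in positions (rather than all l x l positions) is an assumption of this sketch:

```python
import numpy as np

def bce_avg(P_i, M, Y_i, eps=1e-9):
    """Average binary cross-entropy over the token pairs kept by mask M."""
    bce = -(Y_i * np.log(P_i + eps) + (1.0 - Y_i) * np.log(1.0 - P_i + eps))
    return float((bce * M).sum() / max(M.sum(), 1.0))

# Toy check: confident correct predictions give a near-zero loss
P_i = np.full((2, 2), 0.99)
M = np.ones((2, 2))
Y_i = np.ones((2, 2))
print(round(bce_avg(P_i, M, Y_i), 4))  # 0.0101
```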

LOSS:

image-20200721173820784

where i is the relation index.

image-20210131203604605

Experiment

image-20200721214140689

image-20200721173918237

image-20200721173942255