Downstream Model Design of Pre-trained Language Model for Relation Extraction Task (Paper Notes)
Background
Preliminaries
Overlapping relations
- Normal: none of the relations in the sample share any entity.
- EPO (EntityPairOverlap): at least two relations in the sample share the same entity pair.
- SEO (SingleEntityOverlap): at least two relations in the sample share a single entity (but not the whole pair).
Multiple relations
- Single: exactly one relation appears in the sample.
- Double: exactly two relations appear.
- Multiple: three or more relations appear.
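The two taxonomies above can be checked mechanically. A minimal sketch (the helper names are mine; triples are assumed to be `(head, relation, tail)` tuples, and a sample may be both EPO and SEO):

```python
from collections import Counter

def overlap_classes(triples):
    """Classify a sample's (head, relation, tail) triples by overlap pattern.

    EPO: two relations share the same entity pair.
    SEO: two relations share one entity but not the whole pair.
    Normal: neither applies.
    """
    pair_counts = Counter((h, t) for h, _, t in triples)
    epo = any(c > 1 for c in pair_counts.values())
    seo = any(
        ({a[0], a[2]} & {b[0], b[2]}) and {a[0], a[2]} != {b[0], b[2]}
        for i, a in enumerate(triples)
        for b in triples[i + 1:]
    )
    labels = {name for name, flag in (("EPO", epo), ("SEO", seo)) if flag}
    return labels or {"Normal"}

def multiplicity(triples):
    """Single / Double / Multiple, by number of relations in the sample."""
    n = len(triples)
    return "Single" if n == 1 else ("Double" if n == 2 else "Multiple")
```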
Motivation
- Current PLMs are not designed specifically for the relation extraction task.
- Existing methods do not handle the three major problems well: long-distance relations, multiple relations in a single sentence, and overlapped relations.
Contributions
- Replace the previous traditional encoders with a pre-trained language model.
- Compute a parameterized asymmetric kernel inner product matrix over the head and tail embeddings of each token in the sequence; this matrix can be viewed as the tendency score indicating a certain relation.
- Replace the Softmax classifier with a Sigmoid classifier and take the average probability as the final probability, which makes it possible to predict multiple relations between the same entity pair.
- Two innovations overall: the network structure and the loss function.
Model
Encoder: produces three kinds of embeddings for the given text. (==After fine-tuning, tokens with high attention scores roughly correspond to the predicates of certain relations.==)
Tail embeddings: E_{p}=Transformer(E_{w}), where E_{p} is the last layer's output vector and E_{w} is the output vector of BERT's second-to-last layer.
Head embeddings: E_{b}=E_{w}+E_{a}, where E_{a} is BERT's [CLS] embedding, added to capture the overall context information.
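A toy numpy sketch of how the embeddings combine; random arrays stand in for the actual BERT layer outputs, and the shapes are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
l, d = 6, 16  # toy sequence length and hidden size

# Stand-ins for BERT outputs (in practice taken from a fine-tuned BERT):
E_w = rng.normal(size=(l, d))  # second-to-last layer token vectors
E_p = rng.normal(size=(l, d))  # last layer output, used as tail embeddings
E_a = rng.normal(size=(d,))    # [CLS] embedding (overall context)

# Head embeddings: the CLS vector is added to every token vector.
E_b = E_w + E_a  # broadcasting: (l, d) + (d,) -> (l, d)
```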
Relation Computing Layer
Compute the similarity between E_{b} and E_{p}:
\boldsymbol{S}_{i}=F_{i}\left(\boldsymbol{E}_{b}, \boldsymbol{E}_{p}\right)
where F_{i}(\boldsymbol{X}, \boldsymbol{Y})=\boldsymbol{X} \boldsymbol{W}_{h i} \cdot\left(\boldsymbol{Y} \boldsymbol{W}_{t i}\right)^{T}
W_{hi} and W_{ti} are the transformation matrices of the head and tail entities under the i-th relation, respectively.
S_{i} is a square matrix that can be viewed as the unnormalized probability scores between all tokens under the i-th relation; that is, ==S_{i,mn} indicates how likely the tokens at positions (m, n) are to hold relation i==.
Then normalize it element-wise with the Sigmoid (cf. the Sigmoid classifier in the contributions): \boldsymbol{P}_{i}=\operatorname{sigmoid}\left(\boldsymbol{S}_{i}\right)
Loss Calculation
NOTE: the relations described by P_{i} above hold between tokens, not entities. An entity-mask matrix is used to solve this.
Let the set of all entity pairs in text T be \mathbb{S}=\{(x, y)\} (all ordered combinations).
We construct a mask matrix \boldsymbol{M} \in \mathbb{R}^{l \times l}, where l is the text length and (B_{x}, E_{x}) are the start and end position indices of entity x:
\boldsymbol{M}_{m n}=\begin{cases}1, & \exists(x, y) \in \mathbb{S}: B_{x} \leq m \leq E_{x} \text { and } B_{y} \leq n \leq E_{y} \\ 0, & \text {otherwise}\end{cases}
==Use this mask matrix to keep, from \boldsymbol{P}_{i}, only the predicted probabilities of every entity pair.==
where m, n are the indices of the matrix elements. We also construct a label matrix \boldsymbol{Y}_{i} (ground truth), \boldsymbol{Y}_{i} \in \mathbb{R}^{l \times l}, which encodes the labeled entity pairs of the i-th relation in the input text T.
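The entity-mask construction can be sketched as follows; treating entity pairs as ordered and excluding self-pairs (x = y) is my reading of the notes:

```python
import numpy as np

def entity_mask(l, spans):
    """M in {0,1}^{l x l}: M[m, n] = 1 iff token m lies inside entity x and
    token n inside entity y, for some ordered entity pair (x, y) with x != y.

    l: text length; spans: one (B, E) inclusive index range per entity.
    """
    M = np.zeros((l, l))
    for xi, (bx, ex) in enumerate(spans):
        for yi, (by, ey) in enumerate(spans):
            if xi != yi:
                M[bx:ex + 1, by:ey + 1] = 1.0
    return M
```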
Then use the average Binary Cross-Entropy:
L_{i}=B C E_{a v g}\left(\boldsymbol{P}_{i} * \boldsymbol{M}, \boldsymbol{Y}_{i}\right)
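A sketch of the masked average BCE; averaging over only the cells kept by M (rather than all l x l cells) is an assumption about how BCE_{avg} is meant here:

```python
import numpy as np

def masked_avg_bce(P, M, Y, eps=1e-9):
    """Binary cross-entropy averaged over the entity-pair cells kept by M.

    P: predicted probabilities, M: 0/1 entity mask, Y: 0/1 labels,
    all of shape (l, l); eps guards against log(0).
    """
    bce = -(Y * np.log(P + eps) + (1.0 - Y) * np.log(1.0 - P + eps))
    return float((bce * M).sum() / max(M.sum(), 1.0))
```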
LOSS: L=\sum_{i} L_{i}
where i is the relation index.