Named Entities in Medical Case Reports: Corpus and Experiments

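We present a new corpus containing annotations of medical entities in case reports drawn from the PubMed Central open access library. In the case reports, we annotate cases, conditions, findings, factors, and negation modifiers. Where applicable, we also annotate the relations between these entities. As such, this is the first corpus of its kind in English made available to the scientific community. It enables initial investigation of automatic information extraction from case reports through tasks such as named entity recognition, relation extraction, and (sentence/paragraph) relevance detection. We also present four strong baseline systems for detecting medical entities, made available through the annotated dataset.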

Original title: Named Entities in Medical Case Reports: Corpus and Experiments

We present a new corpus comprising annotations of medical entities in case reports, originating from PubMed Central's open access library. In the case reports, we annotate cases, conditions, findings, factors and negation modifiers. Moreover, where applicable, we annotate relations between these entities. As such, this is the first corpus of this kind made available to the scientific community in English. It enables the initial investigation of automatic information extraction from case reports through tasks like Named Entity Recognition, Relation Extraction and (sentence/paragraph) relevance detection. Additionally, we present four strong baseline systems for the detection of medical entities made available through the annotated dataset.
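To make the annotation scheme more concrete, here is a minimal sketch of how entity spans of the five types named in the abstract could be turned into token-level BIO tags, a common input format for Named Entity Recognition models. Everything in it is an assumption for illustration: the label strings, the `Span` class, the `spans_to_bio` helper, the non-overlapping-span restriction, and the example sentence are hypothetical and are not taken from the corpus or the paper.

```python
from dataclasses import dataclass

# Entity types named in the abstract; the exact label strings are assumptions.
LABELS = {"case", "condition", "finding", "factor", "negation"}


@dataclass(frozen=True)
class Span:
    """A hypothetical token-level entity annotation."""
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str


def spans_to_bio(tokens, spans):
    """Convert entity spans to BIO tags.

    Assumes spans do not overlap; the real corpus may not satisfy this.
    """
    tags = ["O"] * len(tokens)
    for span in spans:
        if span.label not in LABELS:
            raise ValueError(f"unknown label: {span.label}")
        if not (0 <= span.start < span.end <= len(tokens)):
            raise ValueError(f"span out of range: {span}")
        if any(tag != "O" for tag in tags[span.start:span.end]):
            raise ValueError(f"overlapping span: {span}")
        # First token of the entity gets B-, the rest get I-.
        tags[span.start] = f"B-{span.label}"
        for i in range(span.start + 1, span.end):
            tags[i] = f"I-{span.label}"
    return tags


# Invented example sentence, not taken from the corpus.
tokens = ["The", "patient", "had", "no", "fever", "."]
spans = [Span(3, 4, "negation"), Span(4, 5, "finding")]
bio_tags = spans_to_bio(tokens, spans)
for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```

Relations between entities, which the abstract also mentions, are not covered by this sketch; they would need a separate structure linking pairs of spans.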

Original authors: Sarah Schulz, Jurica Ševa, Samuel Rodriguez, Malte Ostendorff, Georg Rehm

Original link: https://arxiv.org/abs/2003.13032