Named Entities in Medical Case Reports: Corpus and Experiments
- April 3, 2020
- Notes
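The authors present a new corpus of medical entity annotations in case reports drawn from the PubMed Central open access library. In the case reports they annotate cases, conditions, findings, factors, and negation modifiers and, where applicable, the relations between these entities. It is the first corpus of this kind in English to be made available to the scientific community. The corpus enables an initial investigation of automatic information extraction from case reports through tasks such as named entity recognition, relation extraction, and (sentence/paragraph) relevance detection. The authors also present four strong baseline systems for medical entity detection, made available through the annotated dataset.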
Original title: Named Entities in Medical Case Reports: Corpus and Experiments
We present a new corpus comprising annotations of medical entities in case reports, originating from PubMed Central's open access library. In the case reports, we annotate cases, conditions, findings, factors and negation modifiers. Moreover, where applicable, we annotate relations between these entities. As such, this is the first corpus of this kind made available to the scientific community in English. It enables the initial investigation of automatic information extraction from case reports through tasks like Named Entity Recognition, Relation Extraction and (sentence/paragraph) relevance detection. Additionally, we present four strong baseline systems for the detection of medical entities made available through the annotated dataset.
Original authors: Sarah Schulz, Jurica Ševa, Samuel Rodriguez, Malte Ostendorff, Georg Rehm
Original link: https://arxiv.org/abs/2003.13032