NeuralCoref: Using Coreference Resolution to Build a Question-Answering Chatbot That Handles Multi-Turn Dialogue

  • October 30, 2019
  • Notes

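Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.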

Article link: https://blog.csdn.net/blmoistawinde/article/details/81782992

Multi-turn dialogue is still a hard problem for today's chatbots. Take the following exchange with XiaoIce as an example:

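Clearly, when I asked XiaoIce the second question, it did not know that the "she" I mentioned referred to Yang Chaoyue. Judging from the answer, XiaoIce may even have taken "she" to mean Angelababy. The third question fared no better. I do not mean to belittle XiaoIce in any way, and I trust that by the time some future reader sees this, XiaoIce will no longer make this kind of mistake. At the time of writing, however, XiaoIce appears to answer using only my current question, so a multi-turn conversation can hardly proceed normally.

Coreference resolution is one technique that may help with this problem. Below we use NeuralCoref, which we have just covered in [NeuralCoref: a coreference resolution tool for Python, taking on the pronoun reference problem!], to write a demo of a question-answering chatbot that can handle multi-turn dialogue.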

On to the code

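import warnings
warnings.filterwarnings("ignore")   # silence library warnings in the demo output
import spacy
# Load the English spaCy model that includes the NeuralCoref component
nlp = spacy.load('en_coref_sm')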

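To show the use of coreference resolution as simply as possible, we reduce the question-answering part to its bare minimum: a dictionary that maps each question directly to its answer.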

QA = {"Who is Abraham Lincoln?": "An American statesman and lawyer who served as the 16th President of the United States.",
      "When was Abraham Lincoln born?": "February 12, 1809.",
      "Where is Abraham Lincoln's hometown?": "Hodgenville, Kentucky"}

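These question templates cannot cope with pronouns, yet using pronouns is the most natural thing for people to do in a conversation with context. Coreference resolution can close this gap. We record every question asked so far in a running context, then use coreference resolution to carry the entity mentioned earlier over to the pronouns that follow, so that a question containing a pronoun can still be matched against the original templates.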
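To make the limitation concrete, here is a minimal sketch of the exact-match lookup on its own. The `lookup` helper is a hypothetical name introduced only for illustration, and the dictionary repeats two entries from the QA table above so that the snippet runs by itself.

```python
# Two entries copied from the QA table above so the snippet is self-contained.
QA = {
    "Who is Abraham Lincoln?": "An American statesman and lawyer who served "
                               "as the 16th President of the United States.",
    "When was Abraham Lincoln born?": "February 12, 1809.",
}

def lookup(question):
    # Hypothetical helper: exact string match against the templates, nothing more.
    return QA.get(question, "I don't know.")

print(lookup("When was Abraham Lincoln born?"))  # the question matches a template
print(lookup("When was he born?"))               # the pronoun form has no template
```

The second call falls through to the default answer, because no key in the table contains "he".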

Let us first check whether this idea works.

para = "Who is Abraham Lincoln? When was he born? Where is his hometown?"
doc = nlp(para)
print(doc._.coref_clusters)
print(doc._.coref_resolved)

[Abraham Lincoln: [Abraham Lincoln, he, his]]
Who is Abraham Lincoln? When was Abraham Lincoln born? Where is Abraham Lincoln hometown?

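The result is good: both he and his are recognized. One problem remains, though. When the possessive pronoun his is substituted back, the 's that grammar requires is not restored, so the resolved text reads "Abraham Lincoln hometown". We also need to be able to pull the rewritten question back out of the context on its own. So we write our own function, using attributes such as mention.start_char, to do the replacement manually and handle a few special cases.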
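The offset arithmetic behind this can be tried without the model. The following is only a sketch under stated assumptions: `resolve_last_question` and `POSSESSIVES` are hypothetical names, and the mention spans are found with a regular expression as a stand-in for the clusters NeuralCoref would return.

```python
import re

POSSESSIVES = {"its", "his", "her", "my", "your", "our", "their"}

def resolve_last_question(context, question_start, antecedent, mentions):
    """Rewrite only the part of `context` that starts at `question_start`.

    `mentions` holds (start_char, end_char) spans in `context` that refer to
    `antecedent`. Here they are supplied by hand, not by a coreference model.
    """
    question = context[question_start:]
    pieces, left = [], 0
    for start, end in sorted(mentions):
        # Shift context offsets so they index into the question alone.
        beg, stop = start - question_start, end - question_start
        if beg < 0:
            continue  # this mention belongs to an earlier turn
        word = context[start:end]
        # Possessive pronouns need the 's restored on the antecedent.
        replacement = antecedent + "'s" if word.lower() in POSSESSIVES else antecedent
        pieces.append(question[left:beg] + replacement)
        left = stop
    pieces.append(question[left:])
    return "".join(pieces)

context = "Who is Abraham Lincoln? When was he born? Where is his hometown?"
# Stand-in for the model: locate the pronouns with a regular expression.
mentions = [m.span() for m in re.finditer(r"\b(?:he|his)\b", context)]
question_start = context.index("Where")
print(resolve_last_question(context, question_start, "Abraham Lincoln", mentions))
```

Subtracting `question_start` from each span is what lets the replacement run on the latest question alone, and skipping spans with a negative start leaves earlier turns untouched.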

Our final goal is an intuitive answer(question) function that returns the answer for the current question directly. The implementation is as follows:

context = ""   # running record of every question asked so far

def my_coref(orig_text, to_replace):
    # Rebuild orig_text, replacing each (beg, end) span with its mention text.
    left = 0
    processed_text = ""
    for beg, end, mention in to_replace:
        processed_text += orig_text[left:beg] + mention
        left = end
    processed_text += orig_text[left:]
    return processed_text

def answer(question):
    global context
    start_pos = len(context)            # offset of the current question within the context
    context += (question + " ")
    print("context:", context)
    if question in QA:
        return QA[question]
    else:
        doc = nlp(context)
        if doc._.has_coref:
            print(doc._.coref_clusters)
            to_replace = []
            for clust in doc._.coref_clusters:
                main_mention = clust.main
                for mention in clust.mentions:
                    # Convert context offsets into offsets within the current question
                    beg, end = mention.start_char - start_pos, mention.end_char - start_pos
                    if beg >= 0:        # the mention lies inside the current question
                        if mention.text.lower() in ["its", "his", "her", "my", "your", "our", "their"]:
                            to_replace.append((beg, end, main_mention.text + "'s"))
                        else:
                            to_replace.append((beg, end, main_mention.text))
            to_replace = sorted(to_replace)   # sort by start position, ready for one-by-one replacement
            question2 = my_coref(question, to_replace)
            print("new question:", question2)
            if question2 in QA:
                return QA[question2]
    return "I don't know."
answer("Who is Abraham Lincoln?")

context: Who is Abraham Lincoln? 
'An American statesman and lawyer who served as the 16th President of the United States.'

answer("When was he born?")

context: Who is Abraham Lincoln? When was he born? 
[Abraham Lincoln: [Abraham Lincoln, he]]
new question: When was Abraham Lincoln born?
'February 12, 1809.'

answer("Where is his hometown?")

context: Who is Abraham Lincoln? When was he born? Where is his hometown? 
[Abraham Lincoln: [Abraham Lincoln, he, his]]
new question: Where is Abraham Lincoln's hometown?
'Hodgenville, Kentucky'
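The demo keeps its history in a module-level context string, so there is no way to start a fresh conversation without reassigning it by hand. One possible restructuring, shown here only as a sketch, wraps the history in a small class and passes the resolution step in as a function. `Dialogue`, `resolve` and `toy_resolve` are hypothetical names, and `toy_resolve` is a hard-coded stand-in for the NeuralCoref step so that the snippet runs without the model.

```python
class Dialogue:
    """Keeps the conversation history and answers from a fixed QA table."""

    def __init__(self, qa, resolve):
        self.qa = qa
        self.resolve = resolve   # callable(context, question) -> rewritten question
        self.context = ""

    def answer(self, question):
        self.context += question + " "
        if question in self.qa:
            return self.qa[question]
        rewritten = self.resolve(self.context, question)
        return self.qa.get(rewritten, "I don't know.")

    def reset(self):
        # Forget everything said so far and start a new conversation.
        self.context = ""


def toy_resolve(context, question):
    # Stand-in only: treats "he"/"his" as Abraham Lincoln, provided that
    # name already appears in the context. A real resolver would use the model.
    if "Abraham Lincoln" not in context:
        return question
    swaps = {"he": "Abraham Lincoln", "his": "Abraham Lincoln's"}
    return " ".join(swaps.get(word, word) for word in question.split(" "))


QA = {"Who is Abraham Lincoln?": "An American statesman and lawyer who served as the 16th President of the United States.",
      "When was Abraham Lincoln born?": "February 12, 1809.",
      "Where is Abraham Lincoln's hometown?": "Hodgenville, Kentucky"}

bot = Dialogue(QA, toy_resolve)
bot.answer("Who is Abraham Lincoln?")
print(bot.answer("Where is his hometown?"))
```

Because the resolver is injected, the same class could be tested with the toy function and then run with a NeuralCoref-based one.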