NeuralCoref: Using Coreference Resolution to Build a Question-Answering Chatbot That Handles Multi-Turn Dialogue

  • October 30, 2019
  • Notes

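Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.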

Article link: https://blog.csdn.net/blmoistawinde/article/details/81782992

Multi-turn dialogue is still a hard problem for today's chatbots. Take the following exchange, for example:

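Clearly, when I asked Xiaoice the second question, it did not know that "she" referred to Yang Chaoyue. Judging from the reply, Xiaoice may even have taken "she" to mean Angela Baby. The third question fared even worse.

I do not mean to belittle Xiaoice in the slightest, and I trust that by the time some future reader sees this, Xiaoice will no longer make this kind of mistake. At the time of writing, though, Xiaoice probably answers using only the current question, which makes multi-turn dialogue almost unworkable.

Coreference resolution is one technique that could help with this problem. Below, we use NeuralCoref, which we just covered in the earlier post "NeuralCoref: a coreference resolution tool for Python, taking on the pronoun reference problem!", to write a demo of a question-answering chatbot that can handle multi-turn dialogue.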

On to the code.

import warnings
warnings.filterwarnings("ignore")

import spacy
# load the NeuralCoref English model through spaCy
nlp = spacy.load('en_coref_sm')

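To show the use of coreference resolution as simply as possible, the question-answering part is kept trivial: questions map directly to answers through a dictionary.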

QA = {
    "Who is Abraham Lincoln?": "An American statesman and lawyer who served as the 16th President of the United States.",
    "When was Abraham Lincoln born?": "February 12, 1809.",
    "Where is Abraham Lincoln's hometown?": "Hodgenville, Kentucky",
}

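These question templates cannot cope with pronouns, yet nothing is more natural than using pronouns in a conversation that has context. Coreference resolution can address this. We record the conversation so far in a running context, so that coreference resolution can carry a previously mentioned entity forward into the later pronouns, and a question containing a pronoun can then match the original template.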
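As a minimal sketch of the problem, the snippet below shows an exact-match lookup failing on a pronoun question and succeeding once the pronoun is replaced by its antecedent. `QA_DEMO` is a hypothetical copy of two of the entries above, redefined so the snippet runs on its own, and the substitution is done by hand here purely for illustration; it stands in for what coreference resolution is meant to do.

```python
# Hypothetical stand-alone copy of two templates from the QA dictionary.
QA_DEMO = {
    "Who is Abraham Lincoln?": "An American statesman and lawyer who served as the 16th President of the United States.",
    "When was Abraham Lincoln born?": "February 12, 1809.",
}

pronoun_question = "When was he born?"

# Exact matching finds no template for the pronoun version.
direct_hit = QA_DEMO.get(pronoun_question)

# Hand-made substitution standing in for coreference resolution.
# The surrounding spaces matter: a bare replace("he", ...) would also
# hit the "he" inside "When".
resolved_question = pronoun_question.replace(" he ", " Abraham Lincoln ", 1)
resolved_hit = QA_DEMO.get(resolved_question)

print(direct_hit)
print(resolved_question)
print(resolved_hit)
```

The "When" pitfall in the comment is one reason a naive string replace is not enough, and why working with the character positions of each mention is the safer route.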

Let's first test whether this idea works.

para = "Who is Abraham Lincoln? When was he born? Where is his hometown?"
doc = nlp(para)
print(doc._.coref_clusters)
print(doc._.coref_resolved)

[Abraham Lincoln: [Abraham Lincoln, he, his]]
Who is Abraham Lincoln? When was Abraham Lincoln born? Where is Abraham Lincoln hometown?

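It works well: both "he" and "his" are recognized. One problem remains, though: when the possessive pronoun "his" is substituted, the result does not restore the 's that grammar requires. We also need to be able to "pull" the rewritten question back out of the context on its own. So we write our own function, using attributes such as mention.start_char, to do the replacement by hand and handle a few special cases.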
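The offset arithmetic can be sketched without loading any model. In the snippet below, the mention spans are written by hand as (start_char, end_char) pairs measured in the whole context, standing in for what NeuralCoref would report; `rewrite_last_question` and `POSSESSIVES` are hypothetical names used only for this illustration, and the possessive list is an assumption about which pronouns should receive 's.

```python
# Assumed set of possessive pronouns that should become "<antecedent>'s".
POSSESSIVES = {"its", "his", "her", "my", "your", "our", "their"}

def rewrite_last_question(context, start_pos, antecedent, mentions):
    """Rewrite the question that starts at start_pos inside context.

    mentions holds (start_char, end_char) pairs measured in context.
    """
    question = context[start_pos:]
    pieces, left = [], 0
    for start_char, end_char in sorted(mentions):
        # Shift context offsets so they index into the question itself.
        beg, end = start_char - start_pos, end_char - start_pos
        if beg < 0:
            continue  # the mention belongs to an earlier sentence
        surface = question[beg:end]
        if surface.lower() in POSSESSIVES:
            replacement = antecedent + "'s"
        else:
            replacement = antecedent
        pieces.append(question[left:beg] + replacement)
        left = end
    pieces.append(question[left:])
    return "".join(pieces)

context = "Who is Abraham Lincoln? Where is his hometown?"
start_pos = context.index("Where")
name_start = context.index("Abraham Lincoln")
his_start = context.index("his", start_pos)
# Hand-written spans: the antecedent itself and the pronoun "his".
mentions = [
    (name_start, name_start + len("Abraham Lincoln")),
    (his_start, his_start + len("his")),
]

rewritten = rewrite_last_question(context, start_pos, "Abraham Lincoln", mentions)
print(rewritten)
```

Treating every "her" as possessive is only a heuristic, since "her" can also be an object pronoun (as in "ask her"), so a rule like this may need refining for other questions.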

Our end goal is an intuitive answer(question) function that returns an answer directly from the current question. The implementation is as follows:

context = ""

def my_coref(orig_text, to_replace):
    # to_replace: (beg, end, replacement) tuples sorted by start position
    left = 0
    processed_text = ""
    for beg, end, mention in to_replace:
        processed_text += orig_text[left:beg] + mention
        left = end
    processed_text += orig_text[left:]
    return processed_text

def answer(question):
    global context
    start_pos = len(context)            # offset of the current question within the context
    context += (question + " ")
    print("context:", context)
    if question in QA:
        return QA[question]
    else:
        doc = nlp(context)
        if doc._.has_coref:
            print(doc._.coref_clusters)
            to_replace = []
            for clust in doc._.coref_clusters:
                main_mention = clust.main
                for mention in clust.mentions:
                    # convert context offsets into offsets within the current question
                    beg, end = mention.start_char - start_pos, mention.end_char - start_pos
                    if beg >= 0:        # the mention lies inside the current question
                        if mention.text in ["its", "his", "her", "my", "your", "our", "their"]:
                            # possessive pronoun: restore the 's
                            to_replace.append((beg, end, main_mention.text + "'s"))
                        else:
                            to_replace.append((beg, end, main_mention.text))
            to_replace = sorted(to_replace)   # ascending by start position, ready for one-by-one replacement
            question2 = my_coref(question, to_replace)
            print("new question:", question2)
            if question2 in QA:
                return QA[question2]
    return "I don't know."
answer("Who is Abraham Lincoln?")

context: Who is Abraham Lincoln?
'An American statesman and lawyer who served as the 16th President of the United States.'

answer("When was he born?")

context: Who is Abraham Lincoln? When was he born?
[Abraham Lincoln: [Abraham Lincoln, he]]
new question: When was Abraham Lincoln born?
'February 12, 1809.'

answer("Where is his hometown?")

context: Who is Abraham Lincoln? When was he born? Where is his hometown?
[Abraham Lincoln: [Abraham Lincoln, he, his]]
new question: Where is Abraham Lincoln's hometown?
'Hodgenville, Kentucky'