Open Domain Question Answering Using Web Tables (Information retrieval)

  • January 13, 2020
  • Notes

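Tables extracted from web documents can directly answer many web search queries. Prior work on question answering (QA) with web tables has focused mainly on factoid queries, that is, queries answerable with a short string such as a person's name or a number. However, many queries that tables can answer are non-factoid in nature. This paper develops an open-domain QA approach using web tables that handles both factoid and non-factoid queries. The key insight is to combine deep neural network-based semantic similarity between the query and the table with features that quantify the dominance of the table in its document and the quality of the information in the table. Experiments on real-life web search queries show that the approach significantly outperforms state-of-the-art baselines. The solution runs in production in a major commercial web search engine, where it serves direct answers for tens of millions of real user queries per month.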
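To make the key insight concrete, here is a minimal Python sketch of how a query-table similarity score might be combined with table dominance and table quality features to rank candidate tables. Everything in it is an assumption for illustration: the abstract does not give the feature definitions or the combination model. In particular, a bag-of-words cosine stands in for the deep neural similarity model, the dominance and quality proxies are invented here, and `TableCandidate`, `score`, `rank`, and the weights are hypothetical names and values.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TableCandidate:
    """Hypothetical container: one table plus the size of its source document."""
    caption: str
    cells: List[List[str]]  # rows of cell strings
    doc_text_len: int       # number of text characters in the whole document


def _tokens(text: str) -> List[str]:
    # Whitespace tokenization; a real system would use a proper tokenizer.
    return text.lower().split()


def semantic_similarity(query: str, table: TableCandidate) -> float:
    """Stand-in for the DNN similarity: bag-of-words cosine in [0, 1]."""
    q = Counter(_tokens(query))
    table_text = table.caption + " " + " ".join(
        cell for row in table.cells for cell in row
    )
    t = Counter(_tokens(table_text))
    dot = sum(q[w] * t[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in t.values())
    )
    return dot / norm if norm else 0.0


def table_dominance(table: TableCandidate) -> float:
    """Assumed proxy: share of the document's text that sits inside the table."""
    table_len = sum(len(cell) for row in table.cells for cell in row)
    if table.doc_text_len <= 0:
        return 0.0
    return min(1.0, table_len / table.doc_text_len)


def table_quality(table: TableCandidate) -> float:
    """Assumed proxy: fraction of cells that are non-empty."""
    cells = [cell for row in table.cells for cell in row]
    if not cells:
        return 0.0
    return sum(1 for cell in cells if cell.strip()) / len(cells)


def score(
    query: str,
    table: TableCandidate,
    weights: Tuple[float, float, float] = (0.6, 0.2, 0.2),  # illustrative only
) -> float:
    """Weighted sum of the three signals; the linear form is an assumption."""
    w_sim, w_dom, w_qual = weights
    return (
        w_sim * semantic_similarity(query, table)
        + w_dom * table_dominance(table)
        + w_qual * table_quality(table)
    )


def rank(query: str, tables: List[TableCandidate]) -> List[TableCandidate]:
    """Return candidates sorted from best to worst score."""
    return sorted(tables, key=lambda t: score(query, t), reverse=True)
```

With this sketch, a table that matches the query, fills much of its page, and has few empty cells ranks above a sparse navigation table buried in a long page. A learned ranker over the same kind of features would be a natural replacement for the fixed weights.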

Original title: Information retrieval: Open Domain Question Answering Using Web Tables

Tables extracted from web documents can be used to directly answer many web search queries. Previous works on question answering (QA) using web tables have focused on factoid queries, i.e., those answerable with a short string like person name or a number. However, many queries answerable using tables are non-factoid in nature. In this paper, we develop an open-domain QA approach using web tables that works for both factoid and non-factoid queries. Our key insight is to combine deep neural network-based semantic similarity between the query and the table with features that quantify the dominance of the table in the document as well as the quality of the information in the table. Our experiments on real-life web search queries show that our approach significantly outperforms state-of-the-art baseline approaches. Our solution is used in production in a major commercial web search engine and serves direct answers for tens of millions of real user queries per month.

Original authors: Kaushik Chakrabarti, Zhimin Chen, Siamak Shakeri, Guihong Cao

Original link: https://arxiv.org/abs/2001.03272