Open-Domain Question Answering Using Web Tables (Information Retrieval)

  • January 13, 2020
  • Notes

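Tables extracted from web documents can directly answer many web search queries. Prior work on question answering (QA) over web tables has focused mainly on factoid queries, that is, queries answerable with a short string such as a person's name or a number. Yet many queries that tables can answer are non-factoid in nature. The paper develops an open-domain QA approach over web tables that handles both factoid and non-factoid queries. Its key idea is to combine deep-neural-network-based semantic similarity between the query and the table with features that quantify how dominant the table is within its document and how good the information in the table is. Experiments on real-life web search queries show that the approach significantly outperforms state-of-the-art baselines. The solution runs in production in a major commercial web search engine, where it serves direct answers for tens of millions of real user queries per month.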
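To make the key idea concrete, here is a minimal Python sketch of scoring a candidate table by combining a query-table similarity with a dominance feature and a quality feature. The abstract does not say how the features are computed or combined, so everything here is an assumption for illustration: the bag-of-words cosine stands in for the paper's deep neural similarity model, `table_dominance` and `table_quality` are hypothetical proxies, and the linear combination with fixed `weights` is not the authors' ranking model.

```python
import math
from collections import Counter


def cosine_similarity(query: str, text: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for a learned semantic model)."""
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    dot = sum(q[w] * t[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in t.values())
    )
    return dot / norm if norm else 0.0


def table_dominance(table_chars: int, document_chars: int) -> float:
    """Hypothetical proxy: share of the document's text that lies in the table."""
    return min(table_chars / document_chars, 1.0) if document_chars else 0.0


def table_quality(rows: list[list[str]]) -> float:
    """Hypothetical proxy: fraction of table cells that are non-empty."""
    cells = [cell for row in rows for cell in row]
    return sum(1 for cell in cells if cell.strip()) / len(cells) if cells else 0.0


def score_table(query, caption, rows, table_chars, document_chars,
                weights=(0.6, 0.2, 0.2)):
    """Weighted sum of the three features; the weights are illustrative only."""
    table_text = caption + " " + " ".join(cell for row in rows for cell in row)
    features = (
        cosine_similarity(query, table_text),
        table_dominance(table_chars, document_chars),
        table_quality(rows),
    )
    return sum(w * f for w, f in zip(weights, features))
```

A candidate whose text matches the query, that fills most of its page, and that has few empty cells scores higher than a sparse, off-topic table tucked into a long document. In a real system the weights would be learned from labeled query-table pairs rather than fixed by hand.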

Original title: Information retrieval: Open Domain Question Answering Using Web Tables

Tables extracted from web documents can be used to directly answer many web search queries. Previous works on question answering (QA) using web tables have focused on factoid queries, i.e., those answerable with a short string like person name or a number. However, many queries answerable using tables are non-factoid in nature. In this paper, we develop an open-domain QA approach using web tables that works for both factoid and non-factoid queries. Our key insight is to combine deep neural network-based semantic similarity between the query and the table with features that quantify the dominance of the table in the document as well as the quality of the information in the table. Our experiments on real-life web search queries show that our approach significantly outperforms state-of-the-art baseline approaches. Our solution is used in production in a major commercial web search engine and serves direct answers for tens of millions of real user queries per month.

Authors: Kaushik Chakrabarti, Zhimin Chen, Siamak Shakeri, Guihong Cao

Original link: https://arxiv.org/abs/2001.03272