论文笔记 – PRISM: A Rich Class of Parameterized Submodular Information Measures for Guided Subset Selection

2022 年 11 月 6 日
筆記
ML & NLP

Motivation

与 Active Learning 类似，Target Learning 致力于挑选外卖更“感兴趣”的数据，即人为为更重要的数据添加 bias。例如我们当前的任务目标是增强自动驾驶算法的夜间行驶性能，我们就不能单纯从未标注数据集中抽取多样性大的数据，而是要满足黑夜条件的数据。

Guided Summarization 与此类似，在进行 Summarization 的同时，也只抽取用户“感兴趣”感兴趣的内容。例如在各种内容都有的新闻中做体育相关的摘要生成，就要给算法一个与体育相关的 bias。

Guided Summarization 包括两种目标：

query-focused：抽取的内容要和 query 相关；
privacy-preserving：抽取的内容要避免 privacy 相关的内容。

Analysis

提出三种指标：

次模条件增长（Submodular Conditional Gain, CG），越大说明差异越大：

$$f(\mathcal{A}|\mathcal{P})=f(\mathcal{A}\cup\mathcal{P})-f(\mathcal{P})$$

次模交互信息（Submodular Mutual Information, MI），越大说明相似性越大：

$$I_f(\mathcal{A};\;\mathcal{Q})=f(\mathcal{A})+f(\mathcal{Q})-f(\mathcal{A}\cup\mathcal{Q})$$

次模条件交互信息（Submodular Conditional Mutual Information, CMI），上面二者的结合：

$$I_f(\mathcal{A};\;\mathcal{Q}|\mathcal{P})=f(\mathcal{A}\cup\mathcal{P})+f(\mathcal{Q}\cup\mathcal{P})-f(\mathcal{A}\cup\mathcal{Q}\cup\mathcal{P})-f(\mathcal{P})$$

以上三种次模函数 CG、MI、CMI 均为单调（当其中一个作为参数的子集固定）非负，因此可以用贪心算法求解。

1. 三种实例化方案

（1） Log Determinant

（2） Facility Location

MI 有两种变体：FLVMI 和 FLQMI（见上图），FLQMI 的好处在于，假如你已经选择了一个 query-relevant 的数据，仍然会选择其他的 query-relevant 数据仍可以使 MI 有所增长。

（3） GrPaph Cut

Tags: ML & NLP

论文笔记 – PRISM: A Rich Class of Parameterized Submodular Information Measures for Guided Subset Selection

Motivation

Analysis

1. 三种实例化方案

（1） Log Determinant

（2） Facility Location

（3） GrPaph Cut

VirMach 便宜 VPS

QNews

论文笔记 – PRISM: A Rich Class of Parameterized Submodular Information Measures for Guided Subset Selection

Motivation

Analysis

1. 三种实例化方案

（1） Log Determinant

（2） Facility Location

（3） GrPaph Cut

分享此文：

Related Posts

手把手教你IDEA连接码云(Gitee)

【已开源】Flutter 腾讯优量汇广告插件，帮助开发者获利 – FlutterAds

Intel GPU Gen 9 架构

并发和并行

VirMach 便宜 VPS

QNews

熱門搜尋