Automatic Generation of Efficient Sparse Tensor Format Conversion Routines (cs.MS)

  • January 19, 2020
  • Notes

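This paper shows how to generate code that efficiently converts sparse tensors between different storage formats (data layouts) such as CSR, DIA, and ELL. We decompose sparse tensor conversion into three logical phases: coordinate remapping, analysis, and assembly. We then develop a language that precisely describes how different formats group a tensor's nonzeros together and order them in memory, which lets a compiler emit code that performs complex reorderings (remappings) of nonzeros when converting between formats. We also develop a query language that extracts complex statistics about sparse tensors, and we show how to emit efficient analysis code that computes such queries. Finally, we define an abstract interface that captures how the data structures storing a tensor can be assembled efficiently given specific statistics about that tensor. Very different formats can implement this common interface, so a compiler can emit optimized conversion code for arbitrary combinations of many formats without hard-coding any particular one. Our evaluation shows that the generated conversion routines run at between 0.99 and 2.2× the performance of hand-optimized implementations in two widely used sparse linear algebra libraries, SPARSKIT and Intel MKL. By emitting code that avoids materializing temporaries, our technique also outperforms both libraries by 1.4 to 3.4× for CSC/COO to DIA/ELL conversion.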
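The three phases can be illustrated with a small hand-written COO to DIA conversion. This is only a sketch and is not the code that the paper's compiler emits. The function name `coo_to_dia`, the helper `remap`, the row-indexed layout of each diagonal, and the choice to sum duplicate coordinates are all assumptions made for illustration.

```python
def coo_to_dia(n_rows, rows, cols, vals):
    """Illustrative COO -> DIA conversion split into three phases.

    Hypothetical helper, not taken from the paper. Returns (offsets, data),
    where data[d][i] holds the entry of row i on diagonal offsets[d].
    """

    # Phase 1: coordinate remapping.
    # A nonzero at (i, j) lies on diagonal k = j - i, so DIA groups
    # nonzeros by k first and by row i second.
    def remap(i, j):
        return j - i, i

    # Phase 2: analysis.
    # Query the input for the one statistic DIA needs before allocation:
    # the set of diagonals that contain at least one nonzero.
    offsets = sorted({remap(i, j)[0] for i, j in zip(rows, cols)})
    slot = {k: d for d, k in enumerate(offsets)}

    # Phase 3: assembly.
    # Allocate one dense strip per nonempty diagonal, then scatter each
    # nonzero straight from the COO arrays into its remapped position.
    data = [[0.0] * n_rows for _ in offsets]
    for i, j, v in zip(rows, cols, vals):
        k, r = remap(i, j)
        data[slot[k]][r] += v  # assumption: duplicate coordinates are summed
    return offsets, data


# Usage on the 3x3 matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]]:
offsets, data = coo_to_dia(
    3,
    rows=[0, 0, 1, 2, 2],
    cols=[0, 2, 1, 0, 2],
    vals=[1.0, 2.0, 3.0, 4.0, 5.0],
)
# offsets == [-2, 0, 2]
# data == [[0.0, 0.0, 4.0], [1.0, 3.0, 5.0], [2.0, 0.0, 0.0]]
```

The analysis pass runs before any allocation because the number of stored diagonals, and therefore the size of the output, depends on where the nonzeros are. The assembly loop reads the COO arrays directly and builds no intermediate format such as CSR, which is the kind of temporary the abstract refers to.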

Original title: Automatic Generation of Efficient Sparse Tensor Format Conversion Routines

Original abstract: This paper shows how to generate code that efficiently converts sparse tensors between disparate storage formats (data layouts) like CSR, DIA, ELL, and many others. We decompose sparse tensor conversion into three logical phases: coordinate remapping, analysis, and assembly. We then develop a language that precisely describes how different formats group together and order a tensor's nonzeros in memory. This enables a compiler to emit code that performs complex reorderings (remappings) of nonzeros when converting between formats. We additionally develop a query language that can extract complex statistics about sparse tensors, and we show how to emit efficient analysis code that computes such queries. Finally, we define an abstract interface that captures how data structures for storing a tensor can be efficiently assembled given specific statistics about the tensor. Disparate formats can implement this common interface, thus letting a compiler emit optimized sparse tensor conversion code for arbitrary combinations of a wide range of formats without hard-coding for any specific one. Our evaluation shows that our technique generates sparse tensor conversion routines with performance between 0.99 and 2.2× that of hand-optimized implementations in two widely used sparse linear algebra libraries, SPARSKIT and Intel MKL. By emitting code that avoids materializing temporaries, our technique also outperforms both libraries by between 1.4 and 3.4× for CSC/COO to DIA/ELL conversion.

Original authors: Stephen Chou, Fredrik Kjolstad, Saman Amarasinghe

Original URL: https://arxiv.org/abs/2001.02609