Automatic Generation of Efficient Sparse Tensor Format Conversion Routines (cs.MS)

  • January 19, 2020
  • Notes

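The paper shows how to generate code that efficiently converts sparse tensors between different storage formats (data layouts) such as CSR, DIA, and ELL. It decomposes sparse tensor conversion into three logical phases: coordinate remapping, analysis, and assembly. It then develops a language that precisely describes how different formats group together and order a tensor's nonzeros in memory, which lets a compiler emit code that performs complex reorderings (remappings) of nonzeros when converting between formats. The authors also develop a query language that can extract complex statistics about sparse tensors, and show how to emit efficient analysis code that computes such queries. Finally, they define an abstract interface that captures how the data structures storing a tensor can be efficiently assembled given specific statistics about that tensor. Very different formats can implement this common interface, so the compiler can emit optimized conversion code for arbitrary combinations of many formats without hard-coding any particular one. In the evaluation, the generated conversion routines reach 0.99 to 2.2× the performance of the hand-optimized implementations in two widely used sparse linear algebra libraries, SPARSKIT and Intel MKL. Because the emitted code avoids materializing temporaries, it also outperforms both libraries by 1.4 to 3.4× for CSC/COO to DIA/ELL conversion.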
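To make the three phases concrete, here is a small hand-written Python sketch of a CSR-to-DIA conversion. It is not code emitted by the paper's compiler, and the function name `csr_to_dia` and the DIA layout convention (`data[d][i]` holds `A[i][i + offsets[d]]`, padded with 0.0) are our own assumptions. Coordinate remapping maps each nonzero (i, j) to the diagonal offset k = j - i; the analysis phase answers the query "which diagonals contain a nonzero?"; the assembly phase uses that answer to size the DIA arrays before scattering values into them.

```python
def csr_to_dia(n_rows, n_cols, indptr, indices, values):
    """Convert a CSR matrix to DIA in three phases (illustrative sketch).

    Assumed DIA layout: data[d][i] holds A[i][i + offsets[d]], with 0.0
    wherever that position is zero or falls outside the matrix.
    """
    # Coordinate remapping (i, j) -> k = j - i, fused with the analysis
    # query "which diagonal offsets contain at least one nonzero?".
    # Offsets range from -(n_rows - 1) to n_cols - 1; offset k is stored
    # at index k + n_rows - 1.
    nonempty = [False] * (n_rows + n_cols - 1)
    for i in range(n_rows):
        for p in range(indptr[i], indptr[i + 1]):
            k = indices[p] - i
            nonempty[k + n_rows - 1] = True
    offsets = [k - (n_rows - 1) for k, used in enumerate(nonempty) if used]

    # Assembly: the analysis result fixes the array sizes up front, so we
    # allocate one dense row per nonempty diagonal, then scatter values.
    slot = {k: d for d, k in enumerate(offsets)}
    data = [[0.0] * n_rows for _ in offsets]
    for i in range(n_rows):
        for p in range(indptr[i], indptr[i + 1]):
            data[slot[indices[p] - i]][i] = values[p]
    return offsets, data
```

For the 3×3 matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]], `csr_to_dia(3, 3, [0, 2, 3, 5], [0, 2, 1, 0, 2], [1.0, 2.0, 3.0, 4.0, 5.0])` returns `([-2, 0, 2], [[0.0, 0.0, 4.0], [1.0, 3.0, 5.0], [2.0, 0.0, 0.0]])`. Running the analysis before assembly is what lets the DIA arrays be allocated once at their final size.
<test>
offsets, data = csr_to_dia(3, 3, [0, 2, 3, 5], [0, 2, 1, 0, 2], [1.0, 2.0, 3.0, 4.0, 5.0])
assert offsets == [-2, 0, 2]
assert data == [[0.0, 0.0, 4.0], [1.0, 3.0, 5.0], [2.0, 0.0, 0.0]]
dense = [[0.0] * 3 for _ in range(3)]
for d, k in enumerate(offsets):
    for i in range(3):
        if 0 <= i + k < 3:
            dense[i][i + k] = data[d][i]
assert dense == [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0], [4.0, 0.0, 5.0]]
offsets2, data2 = csr_to_dia(2, 3, [0, 1, 2], [2, 0], [7.0, 8.0])
assert offsets2 == [-1, 2]
assert data2 == [[0.0, 8.0], [7.0, 0.0]]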

Original title: Automatic Generation of Efficient Sparse Tensor Format Conversion Routines

Original abstract: This paper shows how to generate code that efficiently converts sparse tensors between disparate storage formats (data layouts) like CSR, DIA, ELL, and many others. We decompose sparse tensor conversion into three logical phases: coordinate remapping, analysis, and assembly. We then develop a language that precisely describes how different formats group together and order a tensor's nonzeros in memory. This enables a compiler to emit code that performs complex reorderings (remappings) of nonzeros when converting between formats. We additionally develop a query language that can extract complex statistics about sparse tensors, and we show how to emit efficient analysis code that computes such queries. Finally, we define an abstract interface that captures how data structures for storing a tensor can be efficiently assembled given specific statistics about the tensor. Disparate formats can implement this common interface, thus letting a compiler emit optimized sparse tensor conversion code for arbitrary combinations of a wide range of formats without hard-coding for any specific one. Our evaluation shows that our technique generates sparse tensor conversion routines with performance between 0.99 and 2.2× that of hand-optimized implementations in two widely used sparse linear algebra libraries, SPARSKIT and Intel MKL. By emitting code that avoids materializing temporaries, our technique also outperforms both libraries by between 1.4 and 3.4× for CSC/COO to DIA/ELL conversion.
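The point about avoiding materialized temporaries can be illustrated with a direct CSC-to-ELL conversion that never builds an intermediate CSR matrix. The Python sketch below is hand-written, not the paper's generated code; the name `csc_to_ell` and the ELL layout (width K = maximum nonzeros in any row, padding with column -1 and value 0.0) are assumptions for illustration. The analysis phase computes per-row nonzero counts and their maximum; the assembly phase then fills each row through a per-row insertion cursor.

```python
def csc_to_ell(n_rows, n_cols, colptr, rows, values):
    """Convert CSC to ELL directly, without a CSR temporary (sketch).

    Assumed ELL layout: width K = max nonzeros in any row; cols[i][s] and
    vals[i][s] hold the s-th nonzero of row i, padded with -1 and 0.0.
    """
    # Analysis: count nonzeros per row, then take the maximum (ELL width K).
    row_count = [0] * n_rows
    for r in rows[: colptr[n_cols]]:
        row_count[r] += 1
    width = max(row_count, default=0)

    # Assembly: allocate the padded arrays once, then insert each nonzero at
    # the next free slot of its row. Walking CSC column by column means each
    # ELL row is filled in increasing column order.
    cols = [[-1] * width for _ in range(n_rows)]
    vals = [[0.0] * width for _ in range(n_rows)]
    next_slot = [0] * n_rows
    for j in range(n_cols):
        for p in range(colptr[j], colptr[j + 1]):
            i = rows[p]
            s = next_slot[i]
            cols[i][s] = j
            vals[i][s] = values[p]
            next_slot[i] = s + 1
    return width, cols, vals
```

For the same matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] stored as CSC, `csc_to_ell(3, 3, [0, 2, 3, 5], [0, 2, 1, 0, 2], [1.0, 4.0, 3.0, 2.0, 5.0])` returns `(2, [[0, 2], [1, -1], [0, 2]], [[1.0, 2.0], [3.0, 0.0], [4.0, 5.0]])`. The only auxiliary storage is two length-`n_rows` counter arrays, rather than a full CSR copy of the matrix.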

Authors: Stephen Chou, Fredrik Kjolstad, Saman Amarasinghe

Original link: https://arxiv.org/abs/2001.02609