Position-Dependent Arrays and Their Application for High Performance Code Generation
Modern parallel hardware promises unprecedented performance, but only to the few experts who can program it correctly. Code generators from high-level languages provide an attractive alternative, promising to deliver high performance automatically. Existing projects such as Accelerate, Futhark, Halide, or Lift show that this approach is feasible. Unfortunately, existing efforts focus on computations over tensors: regularly shaped higher-dimensional arrays. This limits the expressiveness of these approaches and excludes many interesting data structures that are commonly encoded manually in memory, such as trees or triangular matrices.
This paper presents an extended array type that lifts this restriction. For multidimensional arrays, the size of a nested array might depend on its position in the surrounding arrays, which enables the expression of computations over less regularly shaped data structures. However, these position-dependent arrays bring new challenges for high-performance code generation, as determining the position of the elements in memory becomes more challenging.
This paper shows how these challenges are addressed by extending the existing Lift type system and compiler. The experimental results show that this approach enables efficient code generation for triangular matrix-vector multiplication, with performance improvements of up to 2 times over cuBLAS on an Nvidia GPU. Furthermore, we show a use case for a low-level optimization that avoids unnecessary out-of-bound checks in stencils, leading to improvements of up to 3 times over already optimized generated stencil code.
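To make the idea of a position-dependent array concrete, the following is a minimal Python sketch of a lower-triangular matrix stored as one flat buffer, where row i holds i + 1 elements. It is an illustration only, not Lift's type system or its generated code: the packed row-by-row layout and the names `row_offset`, `pack_lower`, and `tri_matvec` are assumptions made for this example.

```python
def row_offset(i):
    """Start index of row i in the flat buffer.

    Row k holds k + 1 elements, so rows 0..i-1 occupy
    1 + 2 + ... + i = i * (i + 1) / 2 slots.
    """
    return i * (i + 1) // 2


def pack_lower(dense):
    """Flatten the lower triangle of a square matrix, row by row."""
    return [dense[i][j] for i in range(len(dense)) for j in range(i + 1)]


def tri_matvec(packed, x):
    """Multiply a packed lower-triangular matrix by the vector x."""
    y = []
    for i in range(len(x)):
        base = row_offset(i)
        # The inner trip count depends on the row index i:
        # this is the position-dependent part.
        y.append(sum(packed[base + j] * x[j] for j in range(i + 1)))
    return y


dense = [
    [1, 0, 0],
    [2, 3, 0],
    [4, 5, 6],
]
packed = pack_lower(dense)
result = tri_matvec(packed, [1, 2, 3])
```

Because the length of each row depends on its index, the address of an element is no longer a simple product of fixed strides; under this assumed layout it needs the closed-form offset computed by `row_offset`. This is the kind of index calculation the abstract describes as more challenging for the code generator.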
Sun 18 Aug
|15:46 - 16:13|
Federico Pizzuti (University of Edinburgh), Michel Steuwer (University of Glasgow), Christophe Dubach (University of Edinburgh)