MLIR compilers

This article introduces the basics of the MLIR (Multi-Level Intermediate Representation) compiler framework, a set of languages and tools for building compilers. We aim to cover:

MLIR syntax and dialects
Transforming dialects: How to write a pass

Compilers are inherently difficult to learn, and MLIR adds hurdles of its own. We also discuss the difficulties in learning MLIR:

lack of systematic tutorials and documentation
C++ API documentation is essentially nonexistent apart from the source code
complex build system based on CMake and Ninja
TableGen, another language on top of C++
C++ is a complex language itself
lots of concepts: Operations, Regions, Blocks, Values, Types, Attributes, Dialects, Passes, Patterns, etc.
no clear separation between LLVM, MLIR, and CIRCT
in addition, there are many algorithms for AI optimizations, hardware synthesis, etc.

Dialects: languages in MLIR

General syntax of MLIR

MLIR is fundamentally based on a graph-like data structure of nodes, called Operations, and edges, called Values. Each Value is the result of exactly one Operation or Block Argument, and has a Value Type defined by the type system. Operations are contained in Blocks and Blocks are contained in Regions.

The IR is structured as a recursive nesting of these concepts:

Operation
    list of Regions
    Region
        list of Blocks
        Block
            list of Block Arguments
            list of Operations

Example MLIR code:

module {
  // the module's region starts here
  func.func @my_function(%arg0: i32, %arg1: i32) -> i32 {
    // the function's region starts here
    %0 = arith.addi %arg0, %arg1 : i32
    return %0 : i32
  }
}

Here module is an Operation (ModuleOp in the C++ API) that contains a Region. The Region contains a single Block, which holds one Operation: the function (FuncOp). The FuncOp contains another Region, whose Block holds two Operations: the addition (AddIOp) and the return (ReturnOp). %arg0 and %arg1 are Block Arguments of that Block, and %0 is a Value produced by the AddIOp.
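The nesting in this example can be mirrored with a few plain C++ structs. This is only a sketch for building intuition: Operation, Region, and Block below are toy stand-ins, not the real mlir:: classes, and buildExample() and countOperations() are hypothetical helpers introduced for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for the MLIR concepts above; NOT the real mlir:: classes.
struct Operation;

struct Block {
  std::vector<std::string> arguments;  // block arguments, e.g. "%arg0"
  std::vector<Operation> operations;   // ordered list of operations
};

struct Region {
  std::vector<Block> blocks;
};

struct Operation {
  std::string name;             // e.g. "arith.addi"
  std::vector<Region> regions;  // empty for leaf operations
};

// Rebuilds the example: module { func { addi; return } }.
Operation buildExample() {
  Block funcBody;
  funcBody.arguments = {"%arg0", "%arg1"};
  funcBody.operations.push_back(Operation{"arith.addi", {}});
  funcBody.operations.push_back(Operation{"func.return", {}});

  Region funcRegion;
  funcRegion.blocks.push_back(funcBody);
  Operation func{"func.func", {funcRegion}};

  Block moduleBody;  // the module's block has no arguments
  moduleBody.operations.push_back(func);
  Region moduleRegion;
  moduleRegion.blocks.push_back(moduleBody);
  return Operation{"builtin.module", {moduleRegion}};
}

// Counts an operation plus everything nested inside its regions.
std::size_t countOperations(const Operation &op) {
  std::size_t count = 1;
  for (const Region &region : op.regions)
    for (const Block &block : region.blocks)
      for (const Operation &nested : block.operations)
        count += countOperations(nested);
  return count;
}
```

Calling countOperations(buildExample()) visits the module, the function, the addition, and the return. The block arguments live on the function's Block rather than on the function Operation itself.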

MLIR defines two kinds of regions:

SSACFG region: blocks are connected by control flow, and the operations in a block execute sequentially
graph region: concurrent semantics, not sequential

Dialects in MLIR

The MLIR project predefines several dialects, which are different intermediate representation languages for different purposes (optimizations, target hardware, etc.).

Transforming dialects

Another crucial part of MLIR is letting users transform dialects easily.

C++ API

Within MLIR itself, passes that transform dialects are written against the C++ API. The main entry point of a pass is runOnOperation():

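void MyPass::runOnOperation() {
  // Get the operation this pass is running on.
  Operation *op = getOperation();

  // Traverse all operations nested under the current operation.
  op->walk([](Operation *nestedOp) {
    // perform transformations on nestedOp
  });

  // Or iterate manually: regions -> blocks -> operations.
  // This loop only reaches the direct children of op.
  for (Region &region : op->getRegions())
    for (Block &block : region)
      for (Operation &nestedOp : block) {
        // perform transformations on nestedOp
      }
}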

A Region does not hold anything other than a list of Blocks. A Block holds a list of arguments and a list of Operations.

Traversing with the walk() helper

MLIR exposes the walk() helper on Operation, Block, and Region. This helper takes a single argument: a callback method that will be invoked for every operation recursively:

// Inside runOnOperation(): getOperation() is the op the pass runs on.
getOperation()->walk([](Operation *op) {
  // perform transformations on op
});
// The callback can also match a specific operation type.
getOperation()->walk([](arith::AddIOp addOp) {
  // perform transformations on addOp
});
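To make the recursion behind walk() concrete, here is a minimal sketch on a toy IR. It assumes a post-order traversal (nested operations are visited before their parent), which is also the default order of the real helper. The structs, walkNamed(), makeOp(), and postOrderNames() are hypothetical names for this illustration and not part of the MLIR API; in particular, the real API filters by C++ operation class, while this sketch filters by name string.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy IR for illustration only; these are not the real mlir:: classes.
struct Operation;
struct Block { std::vector<Operation> operations; };
struct Region { std::vector<Block> blocks; };
struct Operation {
  std::string name;
  std::vector<Region> regions;
};

// Post-order walk: visit everything nested under `op`, then `op` itself.
void walk(Operation &op, const std::function<void(Operation &)> &callback) {
  for (Region &region : op.regions)
    for (Block &block : region.blocks)
      for (Operation &nested : block.operations)
        walk(nested, callback);
  callback(op);
}

// Filtered walk: the callback only sees operations with a matching name.
void walkNamed(Operation &op, const std::string &name,
               const std::function<void(Operation &)> &callback) {
  walk(op, [&](Operation &candidate) {
    if (candidate.name == name)
      callback(candidate);
  });
}

// Hypothetical helper: wraps `body` in a single region with a single block.
Operation makeOp(std::string name, std::vector<Operation> body = {}) {
  Operation op;
  op.name = std::move(name);
  if (!body.empty()) {
    Block block;
    block.operations = std::move(body);
    Region region;
    region.blocks.push_back(std::move(block));
    op.regions.push_back(std::move(region));
  }
  return op;
}

// Records the order in which walk() reaches each operation.
std::vector<std::string> postOrderNames(Operation &root) {
  std::vector<std::string> names;
  walk(root, [&](Operation &op) { names.push_back(op.name); });
  return names;
}
```

For a module that wraps a function containing an addition and a return, postOrderNames() yields the addition, the return, the function, and finally the module.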

The callback can also decide whether the walk stops or continues:

getOperation()->walk([](Operation *op) {
  // shouldStop is a placeholder for a user-defined condition
  if (shouldStop(op)) {
    return WalkResult::interrupt(); // stop walking
  }
  // perform transformations on op
  return WalkResult::advance(); // continue walking
});
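The early-exit behaviour can be sketched on the same kind of toy IR. The idea is that each recursive call reports whether the traversal should go on, and an interrupt is propagated up through every enclosing call. The WalkResult enum below models only the two outcomes used above; the structs, makeOp(), and visitUntil() are hypothetical and not part of MLIR.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy IR for illustration only; these are not the real mlir:: classes.
struct Operation;
struct Block { std::vector<Operation> operations; };
struct Region { std::vector<Block> blocks; };
struct Operation {
  std::string name;
  std::vector<Region> regions;
};

// Only the two outcomes used in the text are modeled here.
enum class WalkResult { Advance, Interrupt };

using WalkCallback = std::function<WalkResult(Operation &)>;

// Post-order walk that stops as soon as a callback returns Interrupt.
WalkResult walk(Operation &op, const WalkCallback &callback) {
  for (Region &region : op.regions)
    for (Block &block : region.blocks)
      for (Operation &nested : block.operations)
        if (walk(nested, callback) == WalkResult::Interrupt)
          return WalkResult::Interrupt;  // propagate the stop upwards
  return callback(op);
}

// Hypothetical helper: wraps `body` in a single region with a single block.
Operation makeOp(std::string name, std::vector<Operation> body = {}) {
  Operation op;
  op.name = std::move(name);
  if (!body.empty()) {
    Block block;
    block.operations = std::move(body);
    Region region;
    region.blocks.push_back(std::move(block));
    op.regions.push_back(std::move(region));
  }
  return op;
}

// Returns the names visited up to and including the first `target`.
std::vector<std::string> visitUntil(Operation &root, const std::string &target) {
  std::vector<std::string> visited;
  walk(root, [&](Operation &op) {
    visited.push_back(op.name);
    return op.name == target ? WalkResult::Interrupt : WalkResult::Advance;
  });
  return visited;
}
```

Because the interrupt is returned before the parent's callback runs, the enclosing function and module are never visited once the target has been found.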

Putting it all together

We use the CIRCT project as an example. Build CIRCT by following the project's setup instructions:

# Configure the build.
# CMAKE_BUILD_TYPE can be RelWithDebInfo, Debug, or Release.
cmake -G Ninja llvm/llvm -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_TARGETS_TO_BUILD=host \
    -DLLVM_ENABLE_PROJECTS=mlir \
    -DLLVM_EXTERNAL_PROJECTS=circt \
    -DLLVM_EXTERNAL_CIRCT_SOURCE_DIR=$PWD \
    -DLLVM_ENABLE_LLD=ON

# Build the circt-opt tool
ninja -C build bin/circt-opt

To summarize, adding a new pass takes these steps:

Define the pass in TableGen (.td file).
Generate headers using TableGen.
Implement the pass logic in a C++ (.cpp file).
Register the pass in the build system (CMakeLists.txt).
Build the project to compile the new pass.

Scala API

There is also an experimental Scala API for MLIR, Scair. We first introduce its basic usage.

Consider the following snippet of MLIR code:

%0 = "arith.constant"() <{value = 5}> : () -> i32
%1 = "arith.constant"() <{value = 5}> : () -> i32
%2 = "arith.addi"(%0, %1) : (i32, i32) -> i32
func.call @print(%2) : (i32) -> ()

We can write a pattern to fold the addition of two constant integers:

val AddIFold = pattern {
  case AddI(
        Owner(Constant(c0: IntegerAttr, _)),
        Owner(Constant(c1: IntegerAttr, _)),
        _
      ) =>
    Constant(c0 + c1, Result(c0.typ))
}
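The rewrite this pattern performs can be illustrated independently of any framework. The following C++ sketch models a block as a flat list of operations whose operands refer to earlier operations by index, and replaces every addition of two constants with a single constant. Op, isConstant(), and foldConstantAdds() are hypothetical names for this illustration; they are neither Scair nor MLIR APIs, and integer overflow of the fixed-width type is ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy SSA operation: a result is identified by the index of its defining
// operation in the block. Not an MLIR or Scair type.
struct Op {
  std::string name;           // "arith.constant", "arith.addi", ...
  std::vector<int> operands;  // indices of the defining operations
  std::int64_t value;         // meaningful only for "arith.constant"
};

// True when `index` refers to an arith.constant inside `ops`.
bool isConstant(const std::vector<Op> &ops, int index) {
  return index >= 0 && static_cast<std::size_t>(index) < ops.size() &&
         ops[static_cast<std::size_t>(index)].name == "arith.constant";
}

// Replaces each arith.addi of two constants with one constant holding the
// sum. Returns how many additions were folded.
int foldConstantAdds(std::vector<Op> &ops) {
  int folded = 0;
  for (Op &op : ops) {
    if (op.name != "arith.addi" || op.operands.size() != 2)
      continue;
    const int lhs = op.operands[0];
    const int rhs = op.operands[1];
    if (!isConstant(ops, lhs) || !isConstant(ops, rhs))
      continue;
    // Compute the sum before overwriting `op` in place.
    const std::int64_t sum = ops[static_cast<std::size_t>(lhs)].value +
                             ops[static_cast<std::size_t>(rhs)].value;
    op = Op{"arith.constant", {}, sum};
    ++folded;
  }
  return folded;
}
```

Applied to the four-operation snippet above, the addition at index 2 becomes a constant with value 10, while the call that uses it is left untouched. The two original constants remain in the list; removing unused operations would be the job of a separate clean-up step.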

References and further reading