MLIR compilers
This article introduces the basics of MLIR (Multi-Level Intermediate Representation), a compiler framework that provides a set of languages and tools for building compilers. We aim to cover:
- MLIR syntax and dialects
- Transforming dialects: How to write a pass
Compilers are inherently difficult to learn, and MLIR adds difficulties of its own, which we also discuss:
- lack of systematic tutorials and documentation
- sparse C++ API documentation: beyond the auto-generated Doxygen reference, the source code is often the only guide
- a complex build system based on CMake and Ninja
- TableGen, a separate declarative language that generates C++ code
- C++, which is a complex language in itself
- many concepts: Operations, Regions, Blocks, Values, Types, Attributes, Dialects, Passes, Patterns, etc.
- no clear separation between LLVM, MLIR, and CIRCT
- in addition, many domain algorithms to learn for AI optimizations, hardware synthesis, etc.
Dialects: languages in MLIR
General syntax of MLIR
MLIR is fundamentally based on a graph-like data structure whose nodes are Operations and whose edges are Values. Each Value is either the result of exactly one Operation or a Block Argument, and has a Value Type defined by the type system. Operations are contained in Blocks, and Blocks are contained in Regions.
The structure of the IR is defined recursively:
- an Operation holds a list of Regions
- a Region holds a list of Blocks
- a Block holds a list of Block Arguments and a list of Operations
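The nesting can be made concrete with a minimal C++ sketch of this hierarchy. This is a toy model with made-up struct layouts, not MLIR's actual Operation, Region, and Block classes. countOperations is a hypothetical helper that recurses through every level.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of MLIR's IR nesting (illustrative only, not the real MLIR classes).
struct Operation;  // defined below; Block stores Operations by value

struct Block {
  std::vector<std::string> arguments;  // block argument names, e.g. "%arg0"
  std::vector<Operation> operations;   // operations in order
};

struct Region {
  std::vector<Block> blocks;  // a region holds nothing but blocks
};

struct Operation {
  std::string name;             // e.g. "func.func"
  std::vector<Region> regions;  // zero or more nested regions
};

// Count `op` plus every operation nested inside it, at any depth.
int countOperations(const Operation &op) {
  int count = 1;  // the operation itself
  for (const Region &region : op.regions)
    for (const Block &block : region.blocks)
      for (const Operation &nested : block.operations)
        count += countOperations(nested);
  return count;
}
```

Consider the module/function example that follows: the module contains a function, and the function contains an addition and a return. Built in this toy form, countOperations returns 4 for it.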
Example MLIR code:
module {
  // the module's region begins here
  func.func @my_function(%arg0: i32, %arg1: i32) -> i32 {
    // the function body is another region
    %0 = arith.addi %arg0, %arg1 : i32
    return %0 : i32
  }
}
Here module is an Operation (ModuleOp, i.e. builtin.module) that holds one Region. That Region contains a single Block holding one Operation, the FuncOp (func.func). The FuncOp in turn holds its own Region, whose entry Block contains two Operations: AddIOp (arith.addi) and ReturnOp (func.return). %arg0 and %arg1 are the Block Arguments of that entry Block, and %0 is the Value produced by the AddIOp.
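The difference between a Value produced by an Operation and a Block Argument can be sketched with std::variant. This is a toy model. MLIR's real classes are mlir::OpResult and mlir::BlockArgument, and mlir::Value::getDefiningOp() returns null for a block argument. The definingOpName helper below only imitates that behavior.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Toy model (not the MLIR API) of the two ways an SSA value can be defined.
struct OpResult {
  std::string opName;  // name of the defining operation, e.g. "arith.addi"
  unsigned index;      // which result of that operation
};
struct BlockArgument {
  unsigned index;  // position in the block's argument list
};
using Value = std::variant<OpResult, BlockArgument>;

// Name of the operation that defines `v`, or "" for a block argument,
// which has no defining operation.
std::string definingOpName(const Value &v) {
  if (const auto *result = std::get_if<OpResult>(&v)) return result->opName;
  return "";
}

// True if `v` is a block argument rather than an operation result.
bool isBlockArgument(const Value &v) {
  return std::holds_alternative<BlockArgument>(v);
}
```

In the example, %arg0 and %arg1 correspond to BlockArgument{0} and BlockArgument{1}, and %0 corresponds to OpResult{"arith.addi", 0}.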
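There are two kinds of regions in MLIR:
- SSACFG region: operations in a block execute sequentially and control flow passes between blocks; a value must be defined before (dominate) each of its uses
- graph region: concurrent semantics, not sequential; operations may use values defined later in the region (graph regions are currently limited to a single block)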
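The practical difference between the two region kinds is where a value may be used. In an SSACFG region, a use must be dominated by its definition. Inside a single block, that simply means the definition appears earlier. In a graph region, operations may refer to values defined later.

The toy checker below encodes that rule. It handles a single block only, gives each operation one result, and uses hypothetical names. RegionKind merely borrows the spelling of MLIR's enum.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy single-block model (not the MLIR verifier): each op defines one value
// and uses a list of values.
struct ToyOp {
  std::string result;                 // value defined by this op, e.g. "%0"
  std::vector<std::string> operands;  // values used by this op
};

enum class RegionKind { SSACFG, Graph };

// SSACFG: every operand must be a block argument or the result of an EARLIER op.
// Graph: any value defined anywhere in the block may be used.
bool usesAreValid(const std::vector<ToyOp> &ops,
                  const std::vector<std::string> &blockArgs, RegionKind kind) {
  std::set<std::string> visible(blockArgs.begin(), blockArgs.end());
  if (kind == RegionKind::Graph)
    for (const ToyOp &op : ops) visible.insert(op.result);  // order is irrelevant
  for (const ToyOp &op : ops) {
    for (const std::string &operand : op.operands)
      if (visible.count(operand) == 0) return false;  // use before definition
    visible.insert(op.result);  // in SSACFG the result becomes visible after this op
  }
  return true;
}
```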
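Dialects in MLIR
The MLIR project predefines several dialects, which are intermediate representation languages for different purposes (optimizations, target hardware, etc.). Examples include arith (integer and floating-point arithmetic), func (functions and calls), scf (structured control flow), memref (memory buffers), and llvm (LLVM IR).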
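Every operation's full name carries its dialect as a prefix, as in arith.addi or func.func. Splitting at the first dot recovers the dialect namespace and the operation name within that dialect. The splitOpName function below is a standalone, hypothetical sketch of this naming convention, not an MLIR function.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Split a full operation name such as "arith.addi" into its dialect
// namespace ("arith") and the operation name within that dialect ("addi").
// Only the first '.' separates the two parts.
std::pair<std::string, std::string> splitOpName(const std::string &fullName) {
  const auto dot = fullName.find('.');
  if (dot == std::string::npos) return {"", fullName};  // no dialect prefix
  return {fullName.substr(0, dot), fullName.substr(dot + 1)};
}
```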
Transforming dialects
Another crucial part of MLIR is its infrastructure for transforming IR, both within a dialect and from one dialect to another (lowering).
C++ API
Transformations (passes) in MLIR are primarily written against its C++ API.
The entry point of a pass is its runOnOperation() method:
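void MyPass::runOnOperation() {
  // get the operation this pass runs on
  Operation *op = getOperation();

  // walk() visits op and every operation nested in it, recursively
  op->walk([](Operation *nestedOp) {
    // perform transformations on nestedOp
  });

  // or iterate manually: Region -> Block -> Operation (one nesting level only)
  for (Region &region : op->getRegions())
    for (Block &block : region)
      for (Operation &inner : block) {
        // inspect or transform inner
      }
}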
A Region does not hold anything other than a list of Blocks. A Block holds a list of arguments and a list of Operations.
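The manual loops over getRegions() in runOnOperation reach only the operations one level below op. Anything nested deeper, such as the body of a function inside a module, needs another round of loops or walk().

The sketch below shows this on a toy IR in which a Block is just a list of operations, with arguments omitted. directChildren is a hypothetical helper, not MLIR API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy IR (not the MLIR API): a Block is a list of ops, a Region a list of blocks.
struct Operation;
using Block = std::vector<Operation>;
using Region = std::vector<Block>;
struct Operation {
  std::string name;
  std::vector<Region> regions;
};

// Three nested loops, like iterating op->getRegions() by hand:
// they reach only the operations directly inside `op`.
std::vector<std::string> directChildren(const Operation &op) {
  std::vector<std::string> names;
  for (const Region &region : op.regions)
    for (const Block &block : region)
      for (const Operation &child : block)
        names.push_back(child.name);
  return names;
}
```

For a module containing one function, directChildren(module) returns only the function, never the operations in its body.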
Traversing with the walk() helper
MLIR exposes the walk() helper on Operation, Block, and Region. This helper takes a single argument: a callback method that will be invoked for every operation recursively:
// in a pass anchored on func::FuncOp, getOperation() returns that function
getOperation().walk([](Operation *op) {
  // perform transformations on op
});
// the callback's parameter type filters the walk: this runs only on arith::AddIOp
getOperation().walk([](arith::AddIOp addOp) {
  // perform transformations on addOp
});
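By default, walk() uses post-order: nested operations are visited before the operation that contains them. The operation walk() is called on is itself included.

A toy re-implementation shows both that order and the type-filtered form. It runs over a simplified IR, uses hypothetical names, and takes a std::function instead of MLIR's templated callback. Type filtering is approximated here by matching on the operation name.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy IR (not the MLIR API).
struct Operation;
using Block = std::vector<Operation>;
using Region = std::vector<Block>;
struct Operation {
  std::string name;
  std::vector<Region> regions;
};

// Post-order walk: children first, then the enclosing operation itself.
void walk(const Operation &op, const std::function<void(const Operation &)> &callback) {
  for (const Region &region : op.regions)
    for (const Block &block : region)
      for (const Operation &nested : block)
        walk(nested, callback);
  callback(op);
}

// Rough analogue of a typed callback such as [](arith::AddIOp op) {...}:
// the callback fires only for operations with the given name.
void walkNamed(const Operation &op, const std::string &name,
               const std::function<void(const Operation &)> &callback) {
  walk(op, [&](const Operation &o) {
    if (o.name == name) callback(o);
  });
}
```

On the module/function example, walk visits arith.addi, func.return, func.func, and finally builtin.module.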
The callback can also return a WalkResult to stop the walk early or let it continue:
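WalkResult result = getOperation().walk([](Operation *op) {
  if (shouldStop(op)) {               // shouldStop: a placeholder predicate
    return WalkResult::interrupt();   // stop walking
  }
  // perform transformations on op
  return WalkResult::advance();       // continue walking
});
if (result.wasInterrupted()) {
  // the walk stopped early
}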
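The interrupt logic amounts to propagating a stop signal out of the recursion. The sketch mirrors WalkResult::advance(), WalkResult::interrupt(), and wasInterrupted() with a hypothetical two-value enum (WalkStatus) and a bool return. It models the behavior on a toy IR and is not MLIR's implementation.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy IR (not the MLIR API).
struct Operation;
using Block = std::vector<Operation>;
using Region = std::vector<Block>;
struct Operation {
  std::string name;
  std::vector<Region> regions;
};

// Hypothetical stand-in for WalkResult::advance() / WalkResult::interrupt().
enum class WalkStatus { Advance, Interrupt };

// Post-order walk that stops as soon as the callback returns Interrupt.
// Returns true if the walk was interrupted (like wasInterrupted()).
bool walkUntil(const Operation &op,
               const std::function<WalkStatus(const Operation &)> &callback) {
  for (const Region &region : op.regions)
    for (const Block &block : region)
      for (const Operation &nested : block)
        if (walkUntil(nested, callback)) return true;  // propagate the interrupt
  return callback(op) == WalkStatus::Interrupt;
}
```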
Putting it all together
We use the CIRCT project as an example. First, build CIRCT by following the build instructions in its repository:
# Configure the build from the root of the CIRCT checkout
# (llvm/ is a git submodule: run git submodule update --init first).
# CMAKE_BUILD_TYPE can be Debug, RelWithDebInfo, or Release.
cmake -G Ninja llvm/llvm -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_TARGETS_TO_BUILD=host \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DLLVM_EXTERNAL_PROJECTS=circt \
  -DLLVM_EXTERNAL_CIRCT_SOURCE_DIR=$PWD \
  -DLLVM_ENABLE_LLD=ON  # needs lld installed; drop this flag otherwise

# Build the circt-opt tool
ninja -C build bin/circt-opt
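To add a new pass, the steps are:
- Declare the pass in TableGen (a .td file).
- Generate the pass declarations from it with TableGen (mlir-tblgen -gen-pass-decls; CMake normally runs this for you).
- Implement the pass logic in a C++ source (.cpp) file.
- Add the source file to the relevant CMakeLists.txt so it is compiled and linked.
- Rebuild the project (e.g. circt-opt) to compile the new pass.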
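Registration exists so that a driver such as circt-opt can look a pass up by its command-line name and run it. The following is a heavily simplified, hypothetical model of that idea: a name-to-factory registry plus a pipeline runner over a toy IR made of operation names.

In MLIR, generated registration functions do this job. None of the names below (Pass, PassRegistry, EraseDeadPass, "erase-dead", "test.dead") are MLIR API.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy model of pass registration (not the MLIR API).
using ToyIR = std::vector<std::string>;  // the "IR" is just a list of op names

struct Pass {
  virtual ~Pass() = default;
  virtual void runOnOperation(ToyIR &ir) = 0;  // named after MLIR's entry point
};

// Maps a command-line name (as in --erase-dead) to a factory creating the pass.
using PassRegistry = std::map<std::string, std::function<std::unique_ptr<Pass>()>>;

// Example pass: erase every operation named "test.dead" (a made-up op).
struct EraseDeadPass : Pass {
  void runOnOperation(ToyIR &ir) override {
    ir.erase(std::remove(ir.begin(), ir.end(), "test.dead"), ir.end());
  }
};

void registerEraseDeadPass(PassRegistry &registry) {
  registry["erase-dead"] = [] { return std::make_unique<EraseDeadPass>(); };
}

// Run the named passes in order, as an opt-style driver would.
// Returns false if a name is not registered.
bool runPipeline(const PassRegistry &registry,
                 const std::vector<std::string> &passNames, ToyIR &ir) {
  for (const std::string &name : passNames) {
    auto it = registry.find(name);
    if (it == registry.end()) return false;  // unknown pass name
    it->second()->runOnOperation(ir);        // create a fresh pass and run it
  }
  return true;
}
```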
Scala API
There is also an experimental Scala API for MLIR, Scair.
First we introduce the basic usage of Scair.
Consider the following MLIR snippet:
%0 = "arith.constant"() <{value = 5 : i32}> : () -> i32
%1 = "arith.constant"() <{value = 5 : i32}> : () -> i32
%2 = "arith.addi"(%0, %1) : (i32, i32) -> i32
func.call @print(%2) : (i32) -> ()
We can write a pattern to fold the addition of two constant integers:
val AddIFold = pattern {
  case AddI(
        Owner(Constant(c0: IntegerAttr, _)),
        Owner(Constant(c1: IntegerAttr, _)),
        _
      ) =>
    Constant(c0 + c1, Result(c0.typ))
}
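Stripped of Scair's pattern syntax, the rewrite checks that both operands of an addi are produced by constants. If so, it replaces the addi with a constant holding their sum.

Below is a minimal C++ sketch on a toy straight-line IR, not the Scair or MLIR API. Each op's result is referred to by the op's index. Integer bit width and overflow are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy straight-line IR: op i defines value %i; operands are indices of earlier ops.
struct Op {
  std::string name;           // "arith.constant" or "arith.addi"
  std::vector<int> operands;  // indices of the ops whose results are used
  int64_t value = 0;          // payload, meaningful only for arith.constant
};

// Replace each arith.addi whose two operands are both arith.constant with a
// single arith.constant holding the sum. Returns the number of folds.
// Assumes operand indices are valid.
int foldConstantAdds(std::vector<Op> &ops) {
  int folds = 0;
  for (Op &op : ops) {
    if (op.name != "arith.addi" || op.operands.size() != 2) continue;
    const Op &lhs = ops[op.operands[0]];
    const Op &rhs = ops[op.operands[1]];
    if (lhs.name == "arith.constant" && rhs.name == "arith.constant") {
      op = Op{"arith.constant", {}, lhs.value + rhs.value};
      ++folds;
    }
  }
  return folds;
}
```

Because the loop runs in program order, a chain such as (c0 + c1) + c2 folds completely in one pass. By the time the second addi is examined, the first one has already become a constant.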