Starting with LLVM and Encore
Published:
For my PhD, I need to work on typed optimisations for the Encore language. For fun and a bit for learning how to do this, I am going to describe the steps I take to perform alias analysis using different tools from the LLVM suite.
For instance, let’s say I would like to know if there could be any potential benefit in using the restrict
qualifier in the C code generated by Encore. To do this, I am going to use the PingPong benchmark, and explain how to generate C code from Encore, transform C to LLVM IR, link different files together, perform alias analysis, optimise the code (possibly) and generate an assembly file that can be compiled by the system’s compiler to produce an executable.
Steps:
- Step 1:
encorec Main.enc
transpiles Encore to C - Step 2: For each
*.c
file, compile the file to the LLVM IR (generating files with extension*.ll
):
clang-3.8 -S -emit-llvm -std=gnu11 -Wall -fms-extensions -Wno-format\
-Wno-microsoft -Wno-parentheses-equality -Wno-unused-variable\
-Wno-unused-value -Wno-attributes\
-I /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/swift-migrator/sdk/MacOSX.sdk/usr/include\
-I /Users/kikofernandezreyes/Code/encore/release/inc/ \
-I . -I /Users/kikofernandezreyes/Code/encore/release/inc/ -O3 shared.c
- Step 3: After this step, you need to link all the files LLVM IR together, for which you can use
llvm-link-3.8
:
llvm-link-3.8 -S -v -o runner.ll Main.encore.ll shared.ll PingActor.encore.ll \
PongActor.encore.ll String.encore.ll
With the -S
option, I specify that I would like to produce the output in LLVM IR.
- Step 4: Now that everything has been linked together, we can perform various analysis. In this particular case, we are going to perform aliasing analysis:
opt-3.8 -O3 -S -basicaa -aa -adce -scev-aa -tbaa -aa-eval runner.ll -stats -o runnerOpt.ll
Some of the flags perform different kinds of alias analysis. The output is written to stdout:
===== Alias Analysis Evaluator Report =====
21425 Total Alias Queries Performed
2033 no alias responses (9.4%)
18753 may alias responses (87.5%)
279 partial alias responses (1.3%)
360 must alias responses (1.6%)
Alias Analysis Evaluator Pointer Alias Summary: 9%/87%/1%/1%
126109 Total ModRef Queries Performed
99429 no mod/ref responses (78.8%)
0 mod responses (0.0%)
2345 ref responses (1.8%)
24335 mod & ref responses (19.2%)
Alias Analysis Evaluator Mod/Ref Summary: 78%/0%/1%/19%
===-------------------------------------------------------------------------===
... Statistics Collected ...
===-------------------------------------------------------------------------===
1854 basicaa - Number of times a GEP is decomposed
60 bdce - Number of instructions trivialized (dead bits)
1 cgscc-passmgr - Maximum CGSCCPassMgr iterations on one SCC
6 constmerge - Number of global constants merged
43 deadargelim - Number of unread args replaced with undef
12 early-cse - Number of instructions CSE'd
13 functionattrs - Number of arguments marked nocapture
13 functionattrs - Number of arguments marked readonly
10 globalsmodref-aa - Number of functions that do not access memory
16 globalsmodref-aa - Number of functions that only read memory
11 gvn - Number of instructions deleted
8 gvn - Number of instructions simplified
2 gvn - Number of loads deleted
1 indvars - Number of indvars widened
1 indvars - Number of loop exit tests replaced
1 inferattrs - Number of arguments inferred as readonly
39 inline - Number of functions inlined
93 inline-cost - Number of call sites analyzed
63 instcombine - Number of dead inst eliminated
1 instcombine - Number of instructions sunk
88 instcombine - Number of insts combined
6 instcombine - Number of library calls simplified
36 lcssa - Number of live out of a loop variables
140 loop-simplify - Number of pre-header or exit blocks inserted
45 loop-unswitch - Total number of instructions analyzed
17 loop-vectorize - Number of loops analyzed for vectorization
4 memdep - Number of block queries that were completely cached
114 memdep - Number of fully cached non-local ptr responses
289 memdep - Number of uncached non-local ptr responses
6 reassociate - Number of insts reassociated
36 scalar-evolution - Number of loops with predictable loop counts
11 scalar-evolution - Number of loops without predictable loop counts
14 sccp - Number of instructions removed by IPSCCP
112 simplifycfg - Number of blocks simplified
6 sroa - Number of allocas analyzed for replacement
Step 5: After this step, we know the aliasing percentage of the original version. We are going to modify the C code and add the
restrict
qualifier in the calls that satisfy therestrict
requirements.Step 6: After these changes, repeat steps 2 - 4 and check the output report:
===== Alias Analysis Evaluator Report =====
19228 Total Alias Queries Performed
5748 no alias responses (29.8%)
12841 may alias responses (66.7%)
279 partial alias responses (1.4%)
360 must alias responses (1.8%)
Alias Analysis Evaluator Pointer Alias Summary: 29%/66%/1%/1%
123674 Total ModRef Queries Performed
100143 no mod/ref responses (80.9%)
0 mod responses (0.0%)
2315 ref responses (1.8%)
21216 mod & ref responses (17.1%)
Alias Analysis Evaluator Mod/Ref Summary: 80%/0%/1%/17%
===-------------------------------------------------------------------------===
... Statistics Collected ...
===-------------------------------------------------------------------------===
1854 basicaa - Number of times a GEP is decomposed
60 bdce - Number of instructions trivialized (dead bits)
1 cgscc-passmgr - Maximum CGSCCPassMgr iterations on one SCC
6 constmerge - Number of global constants merged
43 deadargelim - Number of unread args replaced with undef
12 early-cse - Number of instructions CSE'd
13 functionattrs - Number of arguments marked nocapture
13 functionattrs - Number of arguments marked readonly
10 globalsmodref-aa - Number of functions that do not access memory
16 globalsmodref-aa - Number of functions that only read memory
11 gvn - Number of instructions deleted
8 gvn - Number of instructions simplified
2 gvn - Number of loads deleted
1 indvars - Number of indvars widened
1 indvars - Number of loop exit tests replaced
1 inferattrs - Number of arguments inferred as readonly
39 inline - Number of functions inlined
93 inline-cost - Number of call sites analyzed
63 instcombine - Number of dead inst eliminated
1 instcombine - Number of instructions sunk
88 instcombine - Number of insts combined
6 instcombine - Number of library calls simplified
36 lcssa - Number of live out of a loop variables
140 loop-simplify - Number of pre-header or exit blocks inserted
45 loop-unswitch - Total number of instructions analyzed
17 loop-vectorize - Number of loops analyzed for vectorization
4 memdep - Number of block queries that were completely cached
97 memdep - Number of fully cached non-local ptr responses
308 memdep - Number of uncached non-local ptr responses
4184 memory-builtins - Number of arguments with unsolved size and offset
6 reassociate - Number of insts reassociated
36 scalar-evolution - Number of loops with predictable loop counts
11 scalar-evolution - Number of loops without predictable loop counts
14 sccp - Number of instructions removed by IPSCCP
112 simplifycfg - Number of blocks simplified
6 sroa - Number of allocas analyzed for replacement
Step 7: If there was any significant improvement, you can use the generated LLVM IR. Let’s assume there were some optimisations performed.
Step 7.1. After this, we need to convert the LLVM IR generated from the optimiser’s tool (in this case there was no optimisation, just a report) into something that can be executable. For this step, we use
llc
, which is the LLVM static compiler:
llc-3.8 runner.ll
which produces the runner.s
file with the necessary assembly instructions.
- Step 8: Finally, the assembly can be translated into binary code (runnable):
clang -std=gnu11 -Wall -fms-extensions -Wno-format -Wno-microsoft \
-Wno-parentheses-equality -Wno-unused-variable -Wno-unused-value \
-lpthread -ldl -lm -Wno-attributes \
/Users/kikofernandezreyes/Code/encore/release/lib/*.a \
-I /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/swift-migrator/sdk/MacOSX.sdk/usr/include \
-I /Users/kikofernandezreyes/Code/encore/release/inc/ \
-I . -I /Users/kikofernandezreyes/Code/encore/release/inc/ \
-I .. /Users/kikofernandezreyes/Code/encore/release/lib/*.a \
/Users/kikofernandezreyes/Code/encore/release/lib/*.a -O3 runner.s
Don’t forget to import dependencies (inclusion and linking to static libraries and header files).
NOTE: It’s important to note that I am not using clang-3.8
in step 8, as I started to run into errors. I believe that clang-3.8
does the work if you specify the required platform and pass more information. I work on a Mac and using the default clang
for compilation and link
ing was easier.
- Step 9: You have now your optimised executable produced with LLVM tools!