Profiling Dune Builds
Co-authored with Louis Roché.
There are multiple ways to know what’s going on in Dune (build system for OCaml). The tool itself includes a few ways that are mentioned at the end of the guide, but they’re not sufficient as of the current date (Feb ‘24). So the most relevant information at this moment is obtained through external tools.
Note that the process explained below with perf
and memtrace
is applicable to any OCaml program, not just Dune itself.
Performance trace with perf
perf
is a tool coming with the Linux kernel. Essentially, it will run any application and take measurements every few milliseconds to get a snapshot of the callstack.
The main subcommand to record a run is perf record
.
perf
reports created with perf record
can be opened with perf report
. But if you prefer more visual ways, Firefox has a profiler that can be used as well.
perf record -F 1000 --call-graph dwarf dune build @all # or any other alias / target
perf script -F +pid > test.perf # converts perf.data to test.perf so it can be opened
Then upload the resulting test.perf
to the online tool in https://profiler.firefox.com/.
Custom version of dune
If you want to profile some custom version of Dune you’re working on locally, remember to replace the path to the dune
binary when recording.
First, build the custom version of dune (run the command from wherever the Dune fork is located in your machine):
dune build bin/main.exe
Then, update the path in perf record
(run the command from the folder where the project build you are trying to profile is located):
perf record -F 1000 --call-graph dwarf ../dune/_build/default/bin/main.exe build @all
Source lines
perf
has an option to include the source files linked to each function call:
perf report -F +srcline
Unfortunately, there does not seem to be a way to view the source lines in Firefox reports.
In case the visualization of source files is needed, we can use Speedscope instead of Firefox profiling tool.
After getting the perf
report using perf record
as explained above, process it with the following commands:
perf script -F +srcline > test.perf
./perf-addlines.pl test.perf > test-speedscope.perf
You can obtain perf-addlines.pl
here.
Now, open test-speedscope.perf
from https://www.speedscope.app/. Remember to select main.exe
thread in top selector:
Memory profiling with memtrace
Sometimes you might encounter a lot of work done in garbage collection functions in the traces, like caml_call_gc
. In these cases, you might want to know more about how memory is being used in Dune.
To do this we can use memtrace. To add it to your custom dune
build:
From the Dune folder
- Install memtrace in the opam switch of your local Dune copy:
opam install memtrace
- Add
memtrace
lib to themain
exe here - Add
Memtrace.trace_if_requested ();
in the main entry point here
From your project folder
- Run your custom version of
dune
with the flag to activate memtrace:MEMTRACE=dune-build-all.ctf ../dune/_build/default/bin/main.exe build @all
Then you can open the generated report using memtrace-viewer
:
opam install memtrace_viewer
memtrace-viewer js_of_ocaml-leaky.ctf
For a more complete guide to memtrace, refer to this link.
Garbage collection stats
As Dune is an OCaml program, we can request the runtime to print information about the garbage collector by setting the v
value in the OCAMLRUNPARAM
environment variable. To get a summary of the garbage collection work for an entire build, set v
to 1024
.
OCAMLRUNPARAM=v=1024 dune build @all
The output looks like this:
allocated_words: 5027692323
minor_words: 4856526503
promoted_words: 869694473
major_words: 1040860293
minor_collections: 18873
major_collections: 14
heap_words: 890589184
heap_chunks: 28
top_heap_words: 890589184
compactions: 0
forced_major_collections: 0
There are many different messages which can be displayed. All the options are described in the OCaml manual.
(verbose) What GC messages to print to stderr. This is a sum of values selected from the following:
1 (= 0x001)
Start and end of major GC cycle.
2 (= 0x002)
Minor collection and major GC slice.
4 (= 0x004)
Growing and shrinking of the heap.
8 (= 0x008)
Resizing of stacks and memory manager tables.
16 (= 0x010)
Heap compaction.
32 (= 0x020)
Change of GC parameters.
64 (= 0x040)
Computation of major GC slice size.
128 (= 0x080)
Calling of finalization functions
256 (= 0x100)
Startup messages (loading the bytecode executable file, resolving shared libraries).
512 (= 0x200)
Computation of compaction-triggering condition.
1024 (= 0x400)
Output GC statistics at program exit.
2048 (= 0x800)
GC debugging messages.
4096 (= 0x1000)
Address space reservation changes.
Dune internal tracing tools
Dune provides at least three ways to inspect what it’s doing behind the scenes.
--trace-file
This flag will generate a json file that can be loaded in Firefox profiling tool or Speedscope, for example:
dune build @all --trace-file dune-build-all.json
Unfortunately, these traces don’t include any of the internal work Dune is doing, only calls to external commands like ocamldep
, ocamlc
or melc
. So it’s only useful if you want to investigate slowness in those parts. If you want to keep up with any updates on that functionality, you can follow ocaml/dune#3862.
--print-metrics
This flag shows some internal measurements taken during the build, both for Memo metrics and also other code. At the moment the latter only includes a single function generic_digest
. Example usage:
$ dune build @all --print-metrics
Memo graph: 2/1/0 nodes/edges/blocked (restore), 531541/2985230/599695
nodes/edges/blocked (compute)
Memo cycle detection graph: 328685/898708/599695 nodes/edges/paths
(21.26s total, 271.9M heap words)
Timers:
generic_digest - time spent = 0.38s, count = 195711
--dump-gc-stats
This flag will print some GC metrics to a given file, e.g.
dune build @all --dump-gc-stats all-gc-stats.txt
The contents in the generated txt
file look like:
((minor_words 2753427060.)
(promoted_words 384483892.)
(major_words 398169012.)
(minor_collections 10718)
(major_collections 19)
(heap_words 271897088)
(heap_chunks 28)
(live_words 200276449)
(live_blocks 43138894)
(free_words 71620633)
(free_blocks 16)
(largest_free 8766464)
(fragments 6)
(compactions 1)
(top_heap_words 271897088)
(stack_size 255))