Thursday, July 3, 2014

GCC LTO call graph generation

Since my first post about gcc link-time optimization, I've been doing more investigation into how it works.  Since I've written some Arduino libraries as a way to help out newbies to embedded development, I follow the Arduino developers list.  At the request of a few developers on the list, LTO was included in the arduino nightly beta.  Due to LTO builds of the USB Host Shield library being extremely large, Christian announced on June 30th that LTO would be disabled until the issue was figured out.  I decided to investigate to see what was happening.

Arduino doesn't use makefiles for building, and instead has a build process which allows for configuration of some of the build flags in a platform.txt file which is read on startup of the IDE.  Testing build flags involves editing the file, starting the IDE, and doing a build.  To speed up the process and to have complete control over the compile, archive, and link process I worked with avr-gcc on the command line, using the temporary files compiled by Arduino.

My first discovery was that when a library's .o files are included on the command line (instead of an archive file), LTO didn't work.  All the code from each .o was included in the binary, even code that was never used.  That's when I started researching how gcc's callgraph generation works.  The gcc internals documentation is good, though it still leaves unanswered questions.

The big question is how does it pick the root of the call graph?  I thought the starting point for the call graph analysis would be the C Runtime Initialization object file, which references main.  From there the linker could look at all the .o's to find main, and build the graph.  After lots of experiments, I discovered LTO may generate multiple call graphs for the same executable - in other words there can be many root nodes.  Every .o file specified on the command line will be used for root node candidates, and if there are only .a files specified, every .o in the first .a will be used for root node candidates.

When all the .o files for an Arduino library are included on the build line, gcc's LTO includes many functions that never get called from main.  Creating an archive with the .o files solves that problem.  That still doesn't result in optimal performance of LTO.  Since the Arduino IDE compiles all of it's core .o files into core.a, which is the first argument in the build line, all the .o files in core.a get included in the build.  Specifying main.cpp.o solves this problem.

$ avr-gcc -mmcu=atmega328p -Wl,-relax -flto -Os main.cpp.o hjrp.a core.a USB_Host_Shield_2.0-master/UHS.a -o USBHIDJ.elf
$ avr-size USBHIDJ.elf
   text    data     bss     dec     hex filename
  12530     194     460   13184    3380 USBHIDJ.elf

Building the same files without main.cpp.o specified (listing core.a first) was almost 3KB larger:

$ avr-size USBHIDJ.elf
   text    data     bss     dec     hex filename
  18152     204     493   18849    49a1 USBHIDJ.elf

For some further details on LTO check out Honza Hubička's Blog:


  1. This is quite strange behavior. GCC basically uses linked plugin feedback to figure out what symbols are needed. Either in your setup you have no linker plugin enabled (try compiling with -fuse-linker-plugin) or something is wrong with the resolution data given by the plugin. You can see the data after copmiling with --save-temps. They end up in file called *.res (often soemting like -lm.res that is anoying) and you can see what symbols are roots of GCC's unreachable code removal.


    1. Thanks for the suggestions. I'm pretty sure the linker plugin is enabled, since I figured out that it *wasn't* enabled in the Atmel toolchain for linux:
      I'll compare with -fuse-linker-plugin just to make sure though. I'm not sure I'll get much deeper than that - I do most of my AVR development with command-line tools and only occasionally contribute to Arduino development since it's a popular tool for newbies. I also tend to get a bit frustrated whenever I delve deeper into gcc and see how some parts of it just seem fucked up, and other parts are cool but so complex my head hurts when I try to understand them.

    2. I tried adding -fuse-linker-plugin, and still get a bloated elf (~82KB) when I specify the .o's instead of an archive:
      $ avr-gcc -mmcu=atmega328p -fuse-linker-plugin -Wl,-relax -flto -Os main.cpp.o hjrp.a core.a USB_Host_Shield_2.0-master/*.o -o USBHIDJ.elf

      I did some more reading on lto and the plugin, and am wondering if the linker needs to support the plugin interface too? When I run avr-ld -plugin I get "unrecognized option '-plugin'".

    3. However the help shows -plugin as one of the options:
      $ avr-ld --help | egrep plugin
      -plugin PLUGIN Load named plugin
      -plugin-opt ARG Send arg to last-loaded plugin