Getting LLVM Bitcode with Clang from Android — Take II

Posted on November 9, 2015 by Marko Dimjašević

A couple of months ago I thought I had figured out how to get LLVM bitcode for a Bluetooth module from the Android Open Source Project (AOSP). As it turns out, the approach described in that blog post did not work. The output of that compilation process looked like a wasteland: Clang exited with lots of errors and only some bitcode files were generated, and those were not linked together, hence not usable.


Here is why and how to actually get a linked LLVM bitcode file for the Bluetooth module.

The reason you might want a bitcode version of a code base is that you want to do program analysis on it. Until recently I wasn’t aware of a general approach that would take an arbitrary program or library project written in C/C++ and compile it to LLVM bitcode. One would usually have to tweak the build system of the project to make it emit LLVM bitcode. Enter Whole Program LLVM. The Whole Program LLVM tool takes an arbitrary project’s build system and runs it, but in addition to producing native code it also produces bitcode in a smart way. In essence, it first invokes a compiler as usual, and then invokes it again on the same target file with the same options, this time with flags for outputting LLVM bitcode. A file system path to the generated bitcode file is stored in the object file generated in the first step, and in the end the Whole Program LLVM tool links together all such bitcode files. Woot!
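To make the two-pass idea concrete, here is a minimal Python sketch of how a second, bitcode-emitting command line could be derived from the original one. It is not wllvm's actual code: the function name `bitcode_command` and the `<object>.bc` naming are made up for illustration, and the sketch assumes a simple command line with an explicit `-o` argument. The only real compiler flag it relies on is Clang's `-emit-llvm`.

```python
import os


def bitcode_command(argv):
    """Derive a bitcode-emitting compiler command from a normal one.

    Hypothetical helper: assumes `argv` looks like
    `compiler [options] -c source -o object`.
    Returns the new command and the absolute path of the bitcode file,
    which is the path a wrapper would record in the object file.
    """
    cmd = list(argv)
    if "-o" not in cmd:
        raise ValueError("expected an explicit -o <object> argument")
    out_index = cmd.index("-o") + 1
    if out_index >= len(cmd):
        raise ValueError("-o is missing its argument")

    # Same options as the first pass, but write bitcode next to the object.
    bitcode_path = cmd[out_index] + ".bc"
    cmd[out_index] = bitcode_path
    # Ask Clang for LLVM bitcode instead of native code.
    cmd.insert(1, "-emit-llvm")
    return cmd, os.path.abspath(bitcode_path)


if __name__ == "__main__":
    second_pass, recorded_path = bitcode_command(
        ["clang", "-O2", "-c", "foo.c", "-o", "foo.o"]
    )
    print(second_pass)
    print(recorded_path)
```

A real wrapper would then run both commands and store `recorded_path` in a dedicated section of the object file, so that the paths can be collected again at link time.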

Unfortunately, following the directions from WLLVM’s readme didn’t work when I tried to get LLVM bitcode for the Bluetooth module in Android. The reason is that the GNU make-based build system in Android is quite complex, and simply changing the values of the CC and CXX environment variables on the command line didn’t do the trick. Here is what I had to do to get it working, i.e. to get bitcode for the Bluetooth module:

  1. Download the Whole Program LLVM tool (wllvm) from https://github.com/travitch/whole-program-llvm.

  2. Add wllvm’s directory to the PATH variable:

  3. Set the compiler to be used by wllvm to Clang:

  4. For wllvm, set the path to the LLVM tools to the local prebuilt version distributed with AOSP:

  5. The trick is not to follow the instructions from the wllvm website, but to dirty-hack AOSP’s build system. In particular, modify its build/core/clang/config.mk such that CLANG and CLANG_CXX have the following values:

  6. Set LOCAL_CLANG:=true in build/core/clear_vars.mk.

  7. Run the build process (replace aosp_x86-eng with your target). The wllvm tool will likely give a bunch of warnings about unknown parameters, but it’s safe to ignore them:

  8. Now all object files have a section called .llvm_bc created by wllvm. The section contains the path to the .bc file that corresponds to the native code object file. This is also true for the shared library of interest (i.e. the Bluetooth module with a JNI interface), namely libbluetooth_jni.so. You can make sure this is the case by executing:

  9. Link all of the bitcode into a single whole-library bitcode file. The extract-bc tool is part of the Whole Program LLVM project:

  10. The resulting file is:
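
The environment setup and the .llvm_bc check from the steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the commands I originally ran: the helper names (`wllvm_environment`, `has_llvm_bc_section`, `check_library`, `link_bitcode`) are made up, and every directory and library path you pass in is a placeholder for your own checkout. `LLVM_COMPILER` and `LLVM_COMPILER_PATH` are the environment variables wllvm reads, and `objdump -h` is the standard binutils way to list section headers.

```python
import os
import subprocess


def wllvm_environment(wllvm_dir, llvm_tools_dir, base_env=None):
    """Return a copy of the environment prepared for wllvm."""
    env = dict(os.environ if base_env is None else base_env)
    # Put wllvm's directory first on PATH so its scripts are found.
    parts = [wllvm_dir] + [p for p in [env.get("PATH", "")] if p]
    env["PATH"] = os.pathsep.join(parts)
    # Tell wllvm to use Clang as the underlying compiler.
    env["LLVM_COMPILER"] = "clang"
    # Point wllvm at the LLVM tools to use (e.g. AOSP's prebuilt ones).
    env["LLVM_COMPILER_PATH"] = llvm_tools_dir
    return env


def has_llvm_bc_section(objdump_listing):
    """Check the text printed by `objdump -h` for a .llvm_bc section."""
    for line in objdump_listing.splitlines():
        fields = line.split()
        # Section rows start with an index followed by the section name.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == ".llvm_bc":
            return True
    return False


def check_library(library_path, env):
    """Run `objdump -h` on a built library and look for .llvm_bc."""
    result = subprocess.run(
        ["objdump", "-h", library_path],
        env=env, capture_output=True, text=True, check=True,
    )
    return has_llvm_bc_section(result.stdout)


def link_bitcode(library_path, env):
    """Run wllvm's extract-bc on the library to link its bitcode."""
    subprocess.run(["extract-bc", library_path], env=env, check=True)
```

With a finished build, one would call `wllvm_environment` with the wllvm checkout and the LLVM tools directory, then pass the resulting environment and the path to libbluetooth_jni.so in the build output to `check_library` and `link_bitcode`.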

Out of curiosity, I tried to do the same for the whole AOSP code base (i.e. by running make instead of make Bluetooth), but that resulted in errors. I informed the authors of the Whole Program LLVM tool about it, so let’s see if they can fix that.