Writing Tests and LLVM-interpreting Hundreds of Programs

Posted on June 17, 2016 by Marko Dimjašević

After writing a library for Firehose data structures in KLEE last week, I thought I should write some unit tests before moving on to integrating the rest of the KLEE code base with the Firehose format. I ran into some problems there, and that’s when I switched to running KLEE on several hundred programs present in Debian GNU/Linux.

To keep its own code quality in check, the KLEE code base comes with two kinds of tests: unit tests and system tests. I found only a few unit test files in the code base, so at first I suspected KLEE’s documentation was outdated and checked with others on the developers’ mailing list. It turns out this is indeed the current state of affairs: KLEE has very few unit test files, and they cover little of its functionality.

KLEE’s unit tests are, unfortunately, written in the Google Test framework. I say unfortunately because, for reasons unknown to me, Google Test is supposed to be distributed only in source form, so there is no binary package for it in Debian GNU/Linux. In other words, one cannot install it in the following fashion:
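The command block is missing from this post; presumably the attempted installation looked something like the following (the package name is my guess — at the time, Debian’s libgtest-dev shipped only Google Test’s sources, not a compiled library):

```shell
sudo apt-get install libgtest-dev
```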

Therefore, I had to figure out how to build the framework manually and how to set up environment variables and paths so that KLEE knows where to find it. This was a recipe for a very unproductive day earlier this week. It was only then that I realized that in my efforts to create a Debian package for KLEE I hadn’t exercised KLEE’s unit-testing build target in its debian/rules Makefile at all; so far I had exercised only the system tests.

While waiting for help from other KLEE developers with the Google Test framework, I moved on to an older task — running lots of programs from Debian GNU/Linux with KLEE. With a local mirror of the main archive of Debian Stretch (the testing release at the time of writing) set up with the debmirror tool, and a tiny script of mine to fire off the analysis of one package after another, I set out to run KLEE on whatever programs I could extract from 5000 Debian source packages.
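The driver script itself isn’t reproduced in this post; a minimal sketch of the idea, with hypothetical helper names (build-llvm-ir.sh, run-klee.sh are stand-ins, not the actual scripts), might look like:

```shell
#!/bin/sh
# Sketch only: the package list file and the two helper scripts are
# hypothetical. For each Debian source package in the list, fetch its
# source, try to build LLVM IR from it, and run KLEE on whatever came out.
while read -r pkg; do
    apt-get source "$pkg" || continue   # fetch from the local Debian mirror
    ./build-llvm-ir.sh "$pkg"           # hypothetical: compile sources to LLVM bitcode
    ./run-klee.sh "$pkg"                # hypothetical: run KLEE on each resulting .bc file
done < "$1"
```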

I started 10 instances of the script in parallel — each supposed to process programs from 500 packages — on Thursday evening on a machine in the Emulab testbed and went to bed. I was hoping to see everything done by the next morning. However, I noticed something had gone wrong when it took unusually long to connect to the machine that morning. What happened was that the machine had run out of memory — all 128 GB of it! One or more instances obviously occupied way too much memory and brought all the other instances to a halt, so I had to restart the machine to see the status.

Nevertheless, the script processes managed to finish 1857 packages before the halt. However, not every Debian source package has C (or C++ or Objective-C) code, and only 872 of the 1857 packages had at least one successfully compiled and linked LLVM IR file. At the moment I’m not trying to distinguish between LLVM IR files resulting from C, C++, and Objective-C files, even though KLEE can handle only IR files with instructions coming from C code. So even if I stumbled upon an LLVM IR file compiled from a C++ source file, I would still attempt to run KLEE on it.

From the log files of the 872 packages, I believe 514 programs were executed successfully (based on reports of how many instructions each program executed). Once again, I ran these programs without any concrete or symbolic arguments, so KLEE acted merely as an LLVM interpreter, executing the programs in their LLVM bitcode form. I was interested in the errors that KLEE reported, so I generated the following summary table:
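Running a program under KLEE with no arguments at all reduces KLEE to an LLVM interpreter. The invocation is roughly as follows (the file names are illustrative, not taken from the experiment):

```shell
# Compile a C file to LLVM bitcode, then let KLEE interpret it.
clang -emit-llvm -c -g program.c -o program.bc
klee program.bc   # no concrete or symbolic arguments: pure interpretation
```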

Type of error                                                           Count   Ratio
-------------------------------------------------------------------------------------
Failed external call                                                      198   43.7%
Unable to load symbol(x) while initializing globals                       158   34.8%
Inline assembly is unsupported                                             39    8.6%
Memory error: out of bound pointer                                         35    7.7%
Cannot handle --posix-runtime when main() has less than two arguments      18    4.0%
Link with library /usr/lib/klee/runtime/klee-uclibc.bca failed              2    0.4%
Invalid function pointer                                                    1    0.2%
Illegal instruction                                                         1    0.2%
XXX vector instructions unhandled                                           1    0.2%
KLEE may have run out of file descriptors                                   1    0.2%
-------------------------------------------------------------------------------------
Total                                                                     454    100%

Earlier today I got feedback on these results from Cristian Cadar, a professor behind the KLEE tool, regarding the cause of faults #1 and #2, i.e. failed external calls and KLEE’s inability to load symbols: these programs were not LLVM bitcode-linked with all of the libraries they depend on. I’ll see what I can do about this, but right now it seems there’s not much — Debian packages are modular, i.e. a program’s dependencies are not crammed together into a monolithic binary; instead, they are split across different packages and different binaries in the Debian OS.

In earlier testing, some of the memory errors KLEE reported turned out to be a bug in KLEE itself (which was patched once I reported it), so this time around I will check whether the reported memory errors are real bugs in the analyzed programs.

I’ll do a similar experiment soon, but instead of providing no arguments to the program under analysis, I’ll provide one symbolic argument, which will make KLEE do its dynamic symbolic execution magic. To make sure I don’t run out of memory again, I’ll learn how to constrain these processes with cgroups.
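For that follow-up experiment, the KLEE invocation would gain a symbolic argument, and the whole process could be memory-capped. A sketch, using systemd-run as one way to apply a cgroup memory limit (the limit and the argument size are illustrative values; on a cgroup-v1 system the property is MemoryLimit rather than MemoryMax):

```shell
# Cap memory via a transient cgroup scope, then run KLEE with one
# symbolic command-line argument of up to 8 bytes.
systemd-run --scope -p MemoryMax=8G \
    klee --posix-runtime program.bc --sym-arg 8
```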

On a related note, I read a research paper titled “Using Symbolic Evaluation to Understand Behavior in Configurable Software Systems”. The paper is about providing the right command line arguments, and configuration files with appropriate content, to a program under analysis so that a symbolic execution tool can exercise much of the program’s code. The study presented in the paper is preliminary, as it looks at only three free software programs, but it is an interesting read: it suggests that with a smart selection of the parameters a program uses, we can symbolically analyze most of its code, which would be computationally infeasible if we instead tried to symbolically analyze the program with all possible combinations of its parameters. I took some notes while reading the paper, and I hope to build upon its idea in this work and in the Clover project.