We are in the last days of Google Summer of Code 2016. This post summarizes what I’ve done on my project on supporting KLEE in Debile, and where to find the code and other work I produced during the summer.
There are several code bases I contributed to. Some of the contributions have been merged upstream, and some have not.
Debile is a Debian package analysis infrastructure. This is the umbrella project I worked on during Google Summer of Code.
There are a few commits I made that got merged upstream:
One fix I made addresses Debile failing to build. Then there were a few documentation changes. The latest change is what I consider a contribution to the sbuild project, which I write more about in a separate section.
My main contribution to Debile is a plugin for KLEE, which lives in a separate code base:
It has not been merged upstream yet because it still needs testing.
What remains to be done is to test the plugin properly and fix any remaining bugs. I believe it will not work as it stands. I made quite a few modifications to the analysis environment, including environment variables that need to be passed to sbuild, but at the moment sbuild doesn’t inherit variables from the host. I’m looking for ways to work around this. Note that I did get KLEE analyzing Debian packages with pbuilder, another Debian package build tool. That is how I was able to run KLEE on over 500 programs from Debian.
sbuild is a Debian package build tool. It uses a chroot environment to build a source package in isolation from the host system. By default, it builds each package in a clean schroot session. Several existing Debile plugins, including the new one for KLEE, rely on setting up an schroot session first and then reusing it when building and analyzing a package. However, the upstream version of sbuild does not support reusing an existing schroot session. Therefore, a few years ago Léo Cavaillé extended sbuild with a command line option for building a package in an existing session. The extension, though, was a patch that never got applied upstream and stayed only in Debile’s code base.
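To make the session idea more concrete, here is a minimal sketch of the lifecycle those plugins rely on, written against schroot directly rather than sbuild (it does not use Léo’s patch). It assumes schroot’s `--begin-session`, `--run-session`, `--end-session`, `--chroot` and `--user` options. The helper function names and the `clang` package are hypothetical and only illustrate that a change made early in a session is still visible to later commands in the same session.

```python
import subprocess
from typing import List, Optional


def begin_session_cmd(chroot: str) -> List[str]:
    # Starts a persistent session; schroot prints the new session's name on stdout.
    return ["schroot", "--begin-session", "--chroot", chroot]


def run_in_session_cmd(session: str, argv: List[str], user: Optional[str] = None) -> List[str]:
    # Runs a command inside an existing session instead of a fresh one.
    cmd = ["schroot", "--run-session", "--chroot", session]
    if user is not None:
        cmd += ["--user", user]
    return cmd + ["--", *argv]


def end_session_cmd(session: str) -> List[str]:
    # Ends the session (for union/snapshot-backed chroots this discards its changes).
    return ["schroot", "--end-session", "--chroot", session]


def prepare_and_reuse(chroot: str) -> None:
    # Hypothetical workflow: set the session up once, then reuse it.
    session = subprocess.run(
        begin_session_cmd(chroot), check=True, capture_output=True, text=True
    ).stdout.strip()
    try:
        # Setup step: installing a tool persists for the rest of the session.
        subprocess.run(
            run_in_session_cmd(session, ["apt-get", "install", "-y", "clang"], user="root"),
            check=True,
        )
        # Later step: sees the tool installed above.
        subprocess.run(run_in_session_cmd(session, ["clang", "--version"]), check=True)
    finally:
        subprocess.run(end_session_cmd(session), check=True)
```

Calling `prepare_and_reuse` with a chroot name listed by `schroot --list` would run both commands in one session. Keeping the command builders separate from the `subprocess` calls makes them easy to inspect without a chroot.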
In the meantime, sbuild moved on to newer versions and Léo’s patch became outdated, i.e. it no longer matched the current version of sbuild. To use the feature Léo added, I had to rebase his patch onto the latest version of sbuild, which was 0.69.0 at the time. Ironically, yet another sbuild release, 0.70.0, has come out since. If we are unlucky, the patch will need rebasing again.
The patch is in Debile’s repository:
Hopefully after GSoC we will get to submit the updated patch upstream and have it merged.
WLLVM, or Whole Program LLVM, is a tool that makes it easy to compile a program or library written in C/C++ to the LLVM intermediate representation (IR). It has been an important part of the tool chain, as KLEE works on a program’s LLVM IR rather than on its source or binary code.
There was an unfortunate situation in which a fork of WLLVM had diverged from upstream. I’d like to believe I helped reunite them, as developers from both sides started communicating and working together.
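A minimal sketch of the usual WLLVM workflow, assuming clang as the underlying compiler and a make-based project: point `CC`/`CXX` at the `wllvm`/`wllvm++` wrappers, set `LLVM_COMPILER`, build normally, then run `extract-bc` on the resulting executable to get the whole-program bitcode KLEE consumes. The helper names are hypothetical, and variables like these are presumably the kind of environment settings that have to reach the build inside sbuild.

```python
import os
import subprocess
from typing import Dict, List


def wllvm_env(base: Dict[str, str]) -> Dict[str, str]:
    # Copy the caller's environment and route compilation through WLLVM.
    env = dict(base)
    env["LLVM_COMPILER"] = "clang"  # assumption: clang-based WLLVM setup
    env["CC"] = "wllvm"
    env["CXX"] = "wllvm++"
    return env


def extract_bc_cmd(binary: str) -> List[str]:
    # For an executable, extract-bc writes <binary>.bc next to it by default.
    return ["extract-bc", binary]


def build_and_extract(src_dir: str, binary: str) -> str:
    env = wllvm_env(dict(os.environ))
    # make picks CC/CXX up from the environment unless the Makefile hard-codes them.
    subprocess.run(["make"], cwd=src_dir, env=env, check=True)
    subprocess.run(extract_bc_cmd(binary), cwd=src_dir, env=env, check=True)
    return os.path.join(src_dir, binary + ".bc")
```

The resulting `.bc` file is what one would then hand to `klee`.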
My contributions to WLLVM have been merged upstream:
KLEE is a dynamic symbolic execution (or concolic execution) tool for C programs. My efforts to create a Debian package for KLEE led to this GSoC project. I’ve been in frequent contact with its developers and made a number of contributions, both issues reported (including bugs) and code that got merged upstream:
I haven’t pushed contributions specific to the GSoC project upstream yet, but they are in my fork of KLEE:
I implemented support for the Firehose XML output format in KLEE, enabled through a command line option. What is still missing is reporting the concrete input values that lead to an error in the analyzed program. This is usually done with the klee-replay tool, so the relevant part of its code would need to be refactored so that it can be reused elsewhere. Once I finish that, I will make a pull request upstream.
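For each generated test case, KLEE stores the concrete inputs in a `.ktest` file, which klee-replay and KLEE’s `ktest-tool` script read. As a minimal sketch of where those concrete values come from, here is a reader for the layout that, as far as I know, `ktest-tool` parses: a magic header, a version, the command-line arguments, two symbolic-argv counters (version 2 and later), then named objects with their bytes. Treat the layout as an assumption to check against your KLEE version; this is not the klee-replay refactoring described above, and the function names are hypothetical.

```python
import io
import struct
from typing import BinaryIO, List, Tuple


def _read_i32(f: BinaryIO) -> int:
    # Length and count fields are 32-bit big-endian integers.
    (value,) = struct.unpack(">i", f.read(4))
    return value


def _read_blob(f: BinaryIO) -> bytes:
    # A blob is a 32-bit length followed by that many bytes.
    return f.read(_read_i32(f))


def parse_ktest(data: bytes) -> Tuple[List[str], List[Tuple[str, bytes]]]:
    """Return (command-line args, [(object name, concrete bytes), ...])."""
    f = io.BytesIO(data)
    if f.read(5) not in (b"KTEST", b"BOUT\n"):
        raise ValueError("not a .ktest file")
    version = _read_i32(f)
    args = [_read_blob(f).decode("utf-8", "replace") for _ in range(_read_i32(f))]
    if version >= 2:
        _read_i32(f)  # number of symbolic argv entries (unused here)
        _read_i32(f)  # maximum symbolic argv length (unused here)
    objects = []
    for _ in range(_read_i32(f)):
        name = _read_blob(f).decode("utf-8", "replace")
        objects.append((name, _read_blob(f)))
    return args, objects


def read_ktest_file(path: str) -> Tuple[List[str], List[Tuple[str, bytes]]]:
    with open(path, "rb") as fh:
        return parse_ktest(fh.read())
```

Each `(name, bytes)` pair is a concrete value for one symbolic object, which is the kind of information a Firehose report could attach to an error.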
Bug found, confirmed, and fixed
Thanks to this work, I found a bug in the hostname package: with KLEE I produced a test case that made hostname read from uninitialized memory. The bug was confirmed for hostname versions 3.15 and 3.17, and its fix resulted in the hostname 3.18 release. My bug report is in the Debian Bug Tracking System.
Blog posts written during the summer
For every week of the Google Summer of Code, I would write a blog post reporting my progress. Sometimes I would also write short notes on what was on my mind. Here is a complete list sorted chronologically from oldest to newest:
- Accepted to GSoC 2016!
- Putting KLEE to Test
- KLEE: It Ain’t Gonna Do Much Without Libraries
- 2016 and C++
- Firehosing KLEE
- Writing Tests and LLVM-interpreting Hundreds of Programs
- Unit Testing Interleaved with Development
- C++ Taking Toll
- Unit Testing, cgroups, and Confirming Bugs
- Almost There with Firehosing KLEE, First Debian Bug Reported
- First Debian Bug Fixed!
- Started integrating KLEE into Debile
- Setting up sbuild Environment for KLEE
- Modifying sbuild
- Learning sbuild and Extracting LLVM IR Files
- First Debile Plugin for KLEE Done!
How this relates to my research project
This GSoC project is closely related to my research project at the University of Utah, called Clover. The project aims to analyze real-world programs with dynamic symbolic execution, or, to be more precise, with the KLEE tool in particular. We want to run large-scale experiments, and we decided to run them on Debian-packaged programs. That is why I decided to contribute to the Debile software analysis infrastructure. We still have to research how to do certain things in symbolic execution, but thanks to GSoC we have made a lot of progress.
It is true that I don’t have a working implementation out of the GSoC project, but I’ve learned a lot, written code for four different established software projects plus my research project, and identified the remaining problems to be solved. All of this was very useful for my PhD thesis proposal, which took place on August 17, 2016.
I plan to keep working on Debile, to finish the KLEE plugin for it, and to extend it with things I discover in my research project. I really hope this will have a non-marginal impact on Debian software written in C.