Almost There with Firehosing KLEE, First Debian Bug Reported

Posted on July 3, 2016 by Marko Dimjašević

If C++ weren’t such a beautiful programming language, I guess I would have been done with implementing support for Firehose in KLEE a long time ago. This week I learned another nicety of C++: the order in which static variables are initialized is not fully defined. In spite of that, I am mostly done with implementing Firehose in KLEE. Besides that, I reported a bug in the hostname tool, and I played with linking a program’s dependencies, in LLVM bitcode form, with the program under analysis in KLEE.

The segmentation fault problem I had last week when writing unit tests for the Firehose library was due to the infamous static variable initialization order in C++. It took me a while to pin down this unintuitive anti-feature of the language. The problem is that for static variables defined in different source files, the language does not define the order of initialization; it is left to the compiler and linker. I have a dozen or so static variables, almost all of them depending on some of the other static variables to initialize. With their order of initialization undefined, any of them could be initialized first, which leads to a segmentation fault when the variable it depends on has not been constructed yet. Fortunately, I managed to wrap the variables in function calls and now I am done with that. I put an example output file at the end of this post, and not here, because it is lengthy.
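A minimal sketch of that kind of wrapping, assuming the common construct-on-first-use idiom. The names toolName and banner are made up for illustration and are not the ones in the KLEE code:

```cpp
#include <string>

// Hypothetical names for illustration only; not identifiers from KLEE or Firehose.

// Instead of a namespace-scope `static const std::string toolName = "KLEE";`,
// the object lives inside a function. A function-local static is constructed
// the first time control passes through its declaration, so a caller can
// never observe it before it has been constructed.
const std::string& toolName() {
    static const std::string name = "KLEE";
    return name;
}

// A second wrapped variable that depends on the first. Calling toolName()
// inside the initializer forces that object to be constructed first, no
// matter which source file each function is defined in.
const std::string& banner() {
    static const std::string text = "generated by " + toolName();
    return text;
}
```

Every use of the old variable becomes a call such as banner(). Since C++11 the initialization of a function-local static is also thread-safe, which is one reason this idiom is a popular way around the ordering problem.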

There is still one thing missing in the implementation of Firehose: listing the concrete values that lead a program execution to an error. I made a few attempts at obtaining the values when generating a Firehose output file, but so far I haven’t gotten it right. The problem is that when a program is given to KLEE to analyze, the user can provide both concrete and symbolic arguments for the program, including a mix of the two kinds. I asked for help on the KLEE developers’ list.

Thanks to KLEE, with the help of Cristian Cadar I found my first bug in a Debian package. The bug is in the hostname tool (from a source and a binary package of the same name) when it is executed like this:

Because a buffer in its source code is not initialized before it is read in this specific case, a call to the strlen function on it can keep reading memory past the end of the buffer until it happens to hit a zero byte, which is when strlen stops. Obviously, this is not what the code should be doing, so I filed a bug report in the Debian Bug Tracking System. I wanted to check with the Debian security team first to see if this is a big deal, but they hadn’t replied in a week, so I figured it’s probably not.
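To illustrate this class of bug, here is a small sketch. It is not the code from hostname; the helper names bounded_length and length_of_unset_name are hypothetical, and the buffer size is only an assumption for the example. It shows two defensive habits: zero-initializing the buffer and bounding the length computation by the buffer’s capacity.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not taken from the hostname sources.
// Returns the length of the string stored in a fixed-size buffer without
// ever reading past the end of that buffer. If no zero byte is found within
// `capacity` bytes, `capacity` is returned.
std::size_t bounded_length(const char* buf, std::size_t capacity) {
    const void* nul = std::memchr(buf, '\0', capacity);
    if (nul == nullptr) {
        return capacity;
    }
    return static_cast<std::size_t>(static_cast<const char*>(nul) - buf);
}

// Models a code path where the buffer is never filled in before it is read.
std::size_t length_of_unset_name() {
    // A plain `char name[128];` would leave the contents indeterminate, and
    // strlen(name) could then run past the end of the array. Zero-initializing
    // makes the buffer a valid empty string from the start.
    char name[128] = {};
    return bounded_length(name, sizeof name);
}
```

With the buffer zero-initialized, the unfilled path sees an empty string instead of whatever happened to be in memory.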

KLEE reports another bug in hostname, but this is not a true bug. The bug gets reported because KLEE wasn’t provided with LLVM bitcode of all dependencies of hostname, so it falsely concluded there is a bug when it couldn’t analyze a missing external function. Out of 35 memory errors I found earlier when looking at hundreds of Debian programs, it is possible most of them are false bugs too, because I didn’t provide KLEE with LLVM bitcode of their dependencies.

To deal with false bugs reported by KLEE, but also with failed external calls and unknown symbols, I am looking at pre-compiling every Debian source package to LLVM bitcode, and then, when analyzing a program, linking all of the program’s dependencies (libraries) at the bitcode level within KLEE. KLEE has a command-line option, -link-llvm-lib, for this, i.e. for linking libraries before program execution/analysis. Besides that, I will have to make sure a program is compiled against the KLEE-uClibc library, and not the standard GNU C library. All of this is related to the libraries problem I wrote about before.

Reading a research paper that is related to this project has been a weekly task for me. This week I read Dynamic Test Generation to Find Integer Bugs in x86 Binary Linux Programs. The paper is about using symbolic execution at the binary code level of the x86 architecture to find a specific kind of bug in GNU/Linux programs. The authors found several bugs in a handful of analyzed file-processing programs, such as multimedia players and file compression programs. They looked for underflow and overflow bugs. The source code base behind their symbolic execution tool is unmaintained and the project’s main website is not available anymore. An interesting thing they did is that when they found a candidate bug, they validated it with Valgrind to be more sure it is a real bug.
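For readers unfamiliar with this kind of bug, here is a small sketch of an overflow check on a size computation. It is my own illustration and not code from the paper or its tool; the function name checked_total and the header/payload scenario are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper for illustration; not from the paper.
// Computes header_size + payload_size into `total`. Returns false, leaving
// `total` untouched, if the sum would wrap around in 32 bits.
bool checked_total(std::uint32_t header_size, std::uint32_t payload_size,
                   std::uint32_t& total) {
    const std::uint32_t max = std::numeric_limits<std::uint32_t>::max();
    // Rearranged so that the comparison itself cannot overflow.
    if (payload_size > max - header_size) {
        return false;
    }
    total = header_size + payload_size;
    return true;
}
```

Without such a check, a file that claims a huge payload size can make the sum wrap to a small number, and a buffer allocated with that size ends up too small for the data copied into it.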

<analysis>
<metadata>
<generator name="KLEE" version="1.2.0"/>
</metadata>
<results>
<info info-id="inline-asm">
<message>function "socket" has inline asm</message>
</info>
<info info-id="inline-asm">
<message>function "__libc_connect" has inline asm</message>
</info>
<info info-id="inline-asm">
<message>function "__libc_recvfrom" has inline asm</message>
</info>
<info info-id="inline-asm">
<message>function "__libc_sendto" has inline asm</message>
</info>
<info info-id="undefined-function-reference">
<message>undefined reference to function: freeifaddrs</message>
</info>
<info info-id="undefined-function-reference">
<message>undefined reference to function: getdomainname</message>
</info>
<info info-id="undefined-function-reference">
<message>undefined reference to function: getifaddrs</message>
</info>
<info info-id="undefined-function-reference">
<message>undefined reference to function: klee_posix_prefer_cex</message>
</info>
<info info-id="undefined-function-reference">
<message>undefined reference to function: rindex</message>
</info>
<info info-id="undefined-function-reference">
<message>undefined reference to function: setdomainname</message>
</info>
<info info-id="calling-external">
<message>calling external: syscall(16, 0, 21505, 53929472)</message>
</info>
<info info-id="calling-user-main">
<message>calling __user_main with extra arguments.</message>
</info>
<info info-id="calling-external">
<message>calling external: rindex(41845664, 47)</message>
</info>
<info info-id="calling-external">
<message>calling external: gethostname(44549648, 128)</message>
</info>
<info info-id="other">
<message>sethostname: ignoring (EPERM)</message>
</info>
<info info-id="other">
<message>sethostname: ignoring (EPERM)</message>
</info>
<info info-id="calling-external">
<message>calling external: getifaddrs(70424640)</message>
</info>
<info info-id="other">
<message>sethostname: ignoring (EPERM)</message>
</info>
<issue>
<message>Error: memory error: out of bound pointer.
The error occurs when hostname is executed with the following arguments: TODO</message>
<location>
<file given-path="/tmp/hostname.c"/>
<function name="show_name"/>
<point column="0" line="289"/>
</location>
<trace>
<state>
<location>
<file given-path="/tmp/hostname.c"/>
<function name="main"/>
<point column="0" line="547"/>
</location>
<notes>Call to function: main(argc=3, argv=41840176)</notes>
</state>
<state>
<location>
<file given-path="/tmp/hostname.c"/>
<function name="show_name"/>
<point column="0" line="289"/>
</location>
<notes>Call to function: show_name(type=8)</notes>
</state>
</trace>
</issue>
<info info-id="other">
<message>sethostname: ignoring (EPERM)</message>
</info>
<info info-id="other">
<message>sethostname: ignoring (EPERM)</message>
</info>
<info info-id="other">
<message>sethostname: ignoring (EPERM)</message>
</info>
<info info-id="other">
<message>sethostname: ignoring (EPERM)</message>
</info>
<info info-id="other">
<message>sethostname: ignoring (EPERM)</message>
</info>
<info info-id="other">
<message>sethostname: ignoring (EPERM)</message>
</info>
<info info-id="other">
<message>sethostname: ignoring (EPERM)</message>
</info>
<info info-id="other">
<message>sethostname: ignoring (EPERM)</message>
</info>
<info info-id="other">
<message>sethostname: ignoring (EPERM)</message>
</info>
<info info-id="calling-external">
<message>calling external: getdomainname(47306048, 1025)</message>
</info>
<issue>
<message>Error: memory error: out of bound pointer.
The error occurs when hostname is executed with the following arguments: TODO</message>
<location>
<file given-path="/home/marko/research/klee-uclibc/libc/string/strlen.c"/>
<function name="strlen"/>
<point column="0" line="22"/>
</location>
<trace>
<state>
<location>
<file given-path="/tmp/hostname.c"/>
<function name="main"/>
<point column="0" line="544"/>
</location>
<notes>Call to function: main(argc=3, argv=41840176)</notes>
</state>
<state>
<location>
<file given-path="/tmp/hostname.c"/>
<function name="set_name"/>
<point column="0" line="217"/>
</location>
<notes>Call to function: set_name(type=0, name=44445648)</notes>
</state>
<state>
<location>
<file given-path="/home/marko/research/klee-uclibc/libc/string/strlen.c"/>
<function name="strlen"/>
<point column="0" line="22"/>
</location>
<notes>Call to function: strlen(s=44445648)</notes>
</state>
</trace>
</issue>
</results>
</analysis>