Software testing has been an important, if not prevalent way of checking software correctness. In this article I will tell how have my doctoral dissertation on testing and verification of imperative software as well as my work experience after the studies led me to typed functional programming, which consequently gave me a different perspective on automatic software testing. Furthermore, I’ll explain why functional programming and static type systems are important for software correctness.
Functional software testing is a way of checking if a program yields an expected output for given inputs. It is an important part of software development. I would argue that before a programmer starts implementing a program, they should ask themselves: “Will I be able to test what I am about to implement? Do I have means of testing the program to make sure it does what it is expected to do?” If the only feasible way to test the program is to run it on one or two inputs manually, then it’s clear that the programmer has failed: their program is not testable in a programmatic way. This means the program is poorly written and that the programmer won’t be able to catch at least some bugs in a programmatic way. It also means no likely properties can be inferred. Of course, much stronger guarantees about software correctness come from a significantly harder goal we should nevertheless strive to: to be able to prove the program correct, i.e., for the programmer to be able to apply a formal system to show that the program has certain properties.
Throughout my doctoral studies and experience after it, I came to realise that software development is a complex process. One important part of it is the challenge of developing a correct program. However, it is very rare that programmers think deeply about their programs and what results they should be computing, oftentimes due to rather unrealistic deadlines they are expected to meet. A question to ask then is: what can be done about a programmer’s limited time to write correct software? While there are multiple plausible answers to this question, I will argue that a very effective solution is to use typed functional programming.
For the studies I was researching techniques for testing and verifying imperative software. I would take an existing program or a library and test or verify it in an automatic way, with various goals such as maximising branch coverage, most of the time without going into what the program is supposed to do. One might wonder what can be tested in such a case and I will explain this later. However, sometimes I had to look at the code and understand it. This turned out to be a frustrating experience. I started to develop a distaste for imperative programming and lacking type systems. At the beginning I was not aware this distaste is due to imperative programming and unsatisfactory type systems! On contrary, I perceived Java and JVM to be amazing tools in a programmer’s toolbox. Instead, I developed this understanding and distaste over time. The imperative software I was looking at was convoluted, complicated, hard to read, and most of all, it embodied leaky abstractions (this experience was attested after the studies when I worked professionally on imperative software). For example, imperative software assumes a state: you can’t call a procedure twice in a row with exactly the same arguments and expect the same output because the procedure depends on and updates the state. Yet I was supposed to develop and implement automatic software testing and verification techniques, pretty much treating the imperative software under test as a black box.
One could argue that the software for which its source code is available can’t be given the “black box” label. However, the source code I was looking at was written in Java and C. These languages have rather basic type systems compared to some other programming languages I will discuss later in this article. As a consequence, the kind of automatic functional testing that can be performed for software written in C, Java and similar languages is very low-level with properties along the lines of “does it crash or throw an exception on this and this input”. While it is important that software does not crash or throw an exception to an end-user, gradually I came to realise I am interested in testing and proving higher-level properties of software.
Type systems in programming languages are of great help when working towards writing correct software. For software to be correct, it has to implement a specification, i.e., it has to have certain properties. In a programming language with a static type system, a type is assigned to a construct such as an expression, variable, function and program. What connects type systems and software correctness is the Curry-Howard correspondence. The correspondence is a result in computer science that establishes a connection between computer programs and mathematical proofs. Types can be seen as propositions, i.e., types are properties of expressions, variables, functions and whole programs. Values of these types, e.g., expressions, functions and programs are proofs of corresponding propositions. In other words, if a program has a type and the program type-checks (compiles), the program is proved to have all properties that correspond to the type. For a given input, a program computes an output of its output type and this output is a proof object of the propositions corresponding to the program’s type. This is why you might have noticed that those working in typed functional programming languages are most of the time interested in the type (or signature) of a function and not so much in its implementation.
A very simple and canonical example of the connection between logic and programming is a function that takes a polymorphic argument of type a
and returns a value of that same type a
. In a setting where a
is completely unconstrained, i.e., where we know nothing in advance about this type and operations we can do on this type, the only possible implementation of this function such that it is total and referentially transparent is the identity function, i.e., the only thing we can do in the function is to return the input argument unchanged for the result of the function. Here the type a
restricts what the function can do, thanks to which we have no option of providing an incorrect implementation. There is only one possible implementation for this function. Therefore, once we implement the function, we know it is correct and that it is the identity function. In general using polymorphic types does not always lead to only one possible implementation, but it nevertheless significantly reduces the number of possible incorrect implementations.
For a while I perceived Scala as a good language of choice for writing correct software. After all, it has a relatively advanced type system compared to Java and especially compared to C, meaning more higher-level properties can be expressed. Unfortunately, Scala is an imperative programming language (though it does provide some functional programming capabilities too), which in my view takes away all benefits its type system might provide. Let’s assume a total referentially transparent function in Scala that takes as input an integer and returns an integer. The function might have lots of other undesired and unstated properties other than the desired property of computing an integer from an integer, such as printing to the console. In other words, a Scala compiler is not helping a Scala programmer much: the compiler says their program has properties given by its signature, but it does not say (guarantee) these are the only properties. This boils down to the inability to reason about a function by its signature, which forces the programmer to peek into the implementation of the function and consequently of functions this function calls, often having to go down the rabbit hole. As a result, the ability to abstract over lower-level aspects of a program is significantly impaired. As a side note, the OCaml programming language is in that regard the same as Scala: it has a decent type system, but one can’t conclude what a program is and is not doing by looking at the type (i.e., the signature) of a program due to the language’s imperative nature. This impairment is in direct correspondence to theorems and proofs in mathematics: if a mathematician wouldn’t be able to rely on a smaller theorem (a lemma) in proving a bigger theorem and would instead have to look at a proof of that smaller theorem, what would be the point in having a smaller theorem proved separately?
If Scala is not a good tool for writing correct software, and Scala is a better tool than C and Java, are dynamically typed (uni-typed) programming languages such as Lisp, Python, Ruby and JavaScript a better tool than Scala? Definitely not, because these languages don’t even have a type-checker (compiler) to check at least some properties! Everything in these languages is of the same type, i.e., everything has the same trivial property. Because there is no type-checker, programmers developing software in a dynamically typed programming language are forced to write tests for things that a compiler would have checked if they were developing in a statically typed language. Even with these tests in place, the tests merely check properties on some inputs, compared to having the properties proved by a compiler. I’ve been amazed with what kind of properties can be expressed and proved relatively easily in languages with advanced static type systems; I’ll mention some examples below.
As mentioned earlier, I find it quite difficult to test and reason about imperative software. In functional programming, equational reasoning can be applied towards reasoning about the correctness of software. This reasoning is embedded in a typed functional programming language and it is automatically done by a compiler. In functional programming there is no need for a separate specification language and an external verification tool. In imperative programming, there is no such holistic equivalent. Equational reasoning is related to the Curry-Howard correspondence: proof simplification in logic corresponds to program evaluation in functional programming, and program evaluation is done by substituting equals for equals, i.e., by using equational reasoning. Thanks to totality and referential transparency, in functional programming we can treat functions in the same way we treat functions in mathematics. Overall, the Curry-Howard correspondence gives us a powerful connection between typed functional programming and logic, which we can rely on in writing correct software: types are propositions, programs are proofs and program evaluation (execution) is proof simplification. This also implies a compiler doesn’t only translate a program from one language into another, it also checks program proofs along the way. If a program does not compile, there is an incorrect or unfinished theorem or proof. Edsger Dijkstra had an idea that systematic development of programs has to be done together with their correctness proofs. By developing programs in a typed functional programming language and relying on the Curry-Howard correspondence, a programmer provides correctness proofs by first specifying precise enough types and then giving an implementation for the types.
Expressing properties in a data type itself is called internal verification. For example, a set type guarantees that all values in it are unique. This property is enforced by the type itself; we don’t have to write any proofs for it. However, it is not possible to express all properties via internal verification. For example, how would one state that multiplication for natural numbers is an associative operation within a type for natural numbers? Nevertheless, this property can be expressed and proved via external verification. In external verification one expresses a property of a type via another type, e.g., via an equality type or a dependent pair type. Both verification approaches are helpful when developing software and which one is used depends on a particular situation and style. Both of the approaches will give a compilation error in case a property or its implementation is incorrect or incomplete, which is important feedback to a programmer.
In my professional usage of programming languages, I got from Java, Python and C via Scala to Haskell. Haskell is described as a pure typed functional programming language. To me, Haskell is a big improvement over Scala. For example, Haskell makes a programmer be explicit about things like input-output and changing state: they have to mention such effects in their program’s type. Furthermore, the lack of a “feature” in a language can be an advantage. For instance, one way in which Haskell is better than Scala is not being an object-oriented language, in particular, not being based on a subtyping polymorphism. In brief, this allows for much simpler type-checking (compared to Scala’s type-checking that has to incorporate both subtyping polymorphism and parametric polymorphism), better type inference and it avoids absurdities that arise with subtyping. Still, I am not too satisfied with Haskell: it lacks totality checking, it lazily evaluates by default (which many programmers find hard to work with, e.g., memory leaks happen easily), its module system is simplistic, exceptions are a mess and the language has functions like error
and undefined
that go against the idea of propositions as types.
While Haskell’s type system is powerful, it is missing a whole dimension from the lambda cube: a dimension where types depend on values. There are Haskell libraries and sub-optimal attempts at hiding this defect and there is also an effort to bring dependent types to the language itself. This lack of dependent types means there are so many properties we can’t express easily in Haskell! That’s why I’ve been looking at two functional programming languages that have full support for types depending on values: Idris and Agda. I ran into Idris while still interested in Scala. Agda has had my attention for the last few months because it compiles to Haskell. Both of these languages are pure dependently typed functional programming languages. From my experience, most programmers, including me two years ago, find the idea of types depending on values hard to comprehend. Main stream programming languages don’t have dependent types (if any types at all) and consequently many programmers are not aware of such types. Yet these types can be rather useful: in writing an interpreter for a language, in making sure database accesses follow a schema, in having a compile-time guarantee that array accesses are always within an array’s bounds, in avoiding deadlocks and in ensuring communicating systems follow a protocol, just to name a few. In Agda and Idris, a programmer can use and create types that will guarantee no errors in mentioned examples ever occur while their program is running.
If types give a programmer an amazing ability to express properties and have them automatically checked by the compiler, is there any point in writing tests? The answer is yes! For instance, tests can serve as scaffolding while working on formulating types (theorems) and providing implementations (proofs). Furthermore, sometimes it might be too big an effort to provide advanced types and their implementations. Sometimes a programmer won’t have skills to formulate a specification via types or to provide an implementation. In these situations tests are better than no correctness checking at all. With that said, a programmer should keep one thing in mind: tests are no replacement for proofs. Tests check program correctness on a finite and limited number of inputs out of possibly infinitely many, while types in a total program provide a guarantee that the program works for all well-formed inputs. Edsger Dijkstra has put this idea of software testing as: “Testing shows the presence, not the absence of bugs.”
I’ve been moving away from testing other people’s software and from providing techniques to do so. If someone else in their program did not provide theorems and proofs to state intended properties and to eliminate implementations that do not have these properties (i.e., erroneous implementations), why should I be cleaning up after them by writing tests (and generators in property-based testing) for their program? We should strive towards providing programmers with and encouraging them to use tools that enable them to make fewer mistakes in the first place, unlike hoping to find some mistakes once an unknown number of them got out. I want to be writing types (theorems), i.e., specifications and provide corresponding implementations (proofs) that other programmers can build on top of. In case a programmer wouldn’t be using these types as intended, they would get a compilation error, which would be a clear sign they got something wrong. When pushed far enough, this approach of propositions as types turns a famous programming joke into a serious statement: “If it type-checks, ship it!”