2002 January

Software Engineering Considered Harmful

[Appeared in: CACM, vol 45, no 11, pp 102-4, 2002 November.]

Software Engineering (SWE) was introduced over thirty years ago - surprisingly at a NATO conference instead of, say, by an academic institution - to attack the well known software crisis. In spite of vigorous attempts and developments since then (OO, SW development processes, development process metrics, IDEs, UML, etc.) the crisis is alife and well. For example Denning observes:

   [Users, students and professionals] find poorly designed software,
   complex and and confusing systems, software without warranties,
   begrudging technical support, surly customer service, inter-vendor
   finger-pointing, disregard for privacy, and even poorly-managed,
   investment squandering dot-com companies.

As a consultant during the last years, I can add to his observation: I earned absurd fees by finding obscure bugs, documenting incomprehensible systems, writing large log analyzers for out of control multi-threaded systems, in general: cleaning up a growing mess.

I volunteer, with understandable trepidation, the conjecture that the concept of Software Engineering has not been helpful, and plausibly has deeply exacerbated the software crisis.

Computer Science is the traditional discipline (in the US) under which SWE resides. Here we have an immediate problem. The "science" designation is appropriate indeed for the hardware side of computer systems. It alllows to investigate linkages between the core of Physics and the materialization of data processing components and candidate assemblies. Hypothesis can be validated by concrete testing in the tradition of the Sciences. This supports in turn Computer Engineering as a solid enterprise for the hardware side (which may, as other sciences, use software modeling to advance its aims).

The term "Software Engineering" suggests that there is similarly a "Software Science" discipline that backs up SWE. There is none. Testing, as a key corollary and observed by many, does not give the certainty of the corresponding activity in Science. The results have been staggering as witnessed by the quotation from Denning. The bad news is that things are getting worse. Distributed computation with asynchronous communications yield non-reproducible timing errors, which can take "for ever" to isolate, if ever.

Certainly, I agree that the standard tricks of the trade (OO, abstraction, reviews, rigorous procedures, regression testing, metrics, etc) have been helpful, but we are still dealing with an intrinsically unsatisfactory situation. Especially because there is, I believe, a serious alternative.

Consider a mathematician who after working for many years produces a proof for a conjecture that has a length of several hundred thousand pages of semi-formalized notations. Is the proof correct? Testing the proof by looking at a page here or there would allow to find errors but would not yield a watertight conclusion.

The situation would be similar if the mathematician would have written out the proof in a formal language, say predicate calculus with set theory notations or any other formalism. Just looking at snippets here and there would still not settle the correctness (and one has to worry that the rigorous formalization actually corresponds with the conjecture to be proven, but lets ignore this for now).

The formalized situation is, however, different. The formal proof can be fed into a proof validation program (yes this program must be correct) that encodes logic axioms, inference rules and axioms of set theory and any other theory employed. This time we would have a rigorous confirmation (or not) that the mega proof is correct.

This scenario works because we know, among others, the "ins-and-outs" of logic inference rules. Could this scenario or a similar scenario be applied to software? I see two bottlenecks, which I believe can be addressed by the academic community.

Consider first a scenario whereby a large program, which includes process creation/ destruction, asynchronous communication, etc. is created manually. This program is automaticly correct in the absence of a formal specification of its intended behavior. Hence assume that we have available also a formal description of intended behaviors along the lines of generalized use cases. The key question is now whether we can construct a validator similar to the validator for a mathematians proof. Obviously, such a program must at least know the syntax of the language in which the program has been constructed. In addition the semantics of that language must be known. Hoare logic is, unfortunately, not rich enough for the task at hand. It can deal with simple assignments, conditional expressions, and recursion, but at this point we do not have formal, declarative, off-the-shelf semantics for pointer manipulations, process creation and destruction and asynchronous communication. Based on the work I have done in this area I admit that this is hard stuff. Still I believe that it is doable and given the economic importance of correct software it seems to me that the academic community should see this as a top priority.

This scenario would accomodate software written in low level languages like C and C++. It cannot be denied that writing a system in these languages together with the associated formal specification is a tedious affair. Memory management is especially tricky to formalize. The situation improves when a target system is not "coded" in a programming language but in a design language like UML, lets call it UML2.

A similar scenario would entail a system to be "coded" in UML2 while an UML2-based validator would again ascertain the satisfaction of this design against the generalized use cases. Meaning preserving transformations/ compilations would map such a design into traditional languages. Optimizers would ensure efficient code.

This second scenario also needs attention from the academic community: UML does not yet have watertight formal semantics for this scenario to be currently achievable.

This state of affairs is quite surprising. I would have bet my left hand in 1990 that by now we would not have to "tinker" anymore at the programming language level. I then met people at an OOPSLA workshop that had a prototype implementation of an evaluator for OOAD "code". It seems to that the ruling Engineering approach to software development has precluded progress towards a rigorous Applied Logic approach.

Do we have to wait until two B747s collide in midair as a result of a software glitch, waking up the lawyers, before we become serious about quality?

PS I have been asked who I am addressing. OK, let me be more direct. Software academics have played only a secondary role in the last decades in the development of OOA, UML, Java, etc. They have also not extended Hoare logic (or done any other theoretical development) so that the validation tools that I have described can now not be build. It seems to me that the software theoreticians need to get their act together so that effective validation tools for large systems can be constructed. Here the challenge: I am making available $100 for the first person (or team) that:
- generalizes Hoare logic so that the declarative semantics of the following operations is captured:
-- assignments in nodes of graphs,
-- synchronous and asynchronous inter process/thread communication,
-- process and thread creation and deletion
- these extensions should be compositional; i.e. the semantics of a code can be composed from the components (with the exception of deadlock freeness of multi-threaded systems)
- construct a symbolic code evaluator based on this extended Hoare logic that generates the appropriate proof obligations in a logic for which theorem provers are available
- this symbolic evaluator generates the proper proof obligations for annotated versions of:
-- Robson's graph copying algorithm
-- the Patterson-Wegman linear unification algorithm
-- the Olympic torch algorithm (a token is passed on from left to right through the leaf nodes of a tree where the nodes are processes emanating from a root process node)

The last item on this list should ascertain not only theoretical correctness of the declarative semantics, but also pragmatic adequacy. No ad hoc correctness proof is acceptable!