Authors: Ben Moseley & Peter Mark
- Comment blocks are used to add personal comments. These are usually clarifications, extra info, or guidance.
- This summary includes all quotes and key points only. You can read the full paper here.
- Unless otherwise stated, all claims are made by the authors.
Complexity is the single major difficulty in the successful development of large-scale software systems. Following Brooks we distinguish accidental from essential complexity, but disagree with his premise that most complexity remaining in contemporary systems is essential. We identify common causes of complexity and discuss general approaches which can be taken to eliminate them where they are accidental. To make things more concrete we then give an outline for a potential complexity-minimizing approach based on functional programming and Codd’s relational model of data.
Comment: The authors didn't rigorously define complexity but implied that it's the degree of difficulty in understanding/maintaining software systems. Alas, "difficulty" is subjective.
Complexity hinders understanding — Being able to understand a system (like the effects of changes) is a prerequisite for avoiding problems.
“...we have to keep it crisp, disentangled, and simple if we refuse to be crushed by the complexities of our own making...” - Dijkstra
“The general problem with ambitious systems is complexity.”, “...it's important to emphasize the value of simplicity and elegance, for complexity has a way of compounding difficulties” — Corbato, 1990
“there is a desperate need for a powerful methodology to help us think about programs. ... conventional languages create unnecessary confusion in the way we think about programs” - Backus, 1977
“...there is one quality that cannot be purchased... — and that is reliability. The price of reliability is the pursuit of the utmost simplicity” — Hoare, 1980
“I conclude that there are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.” - Hoare, 1980
“Those who want really reliable software will discover that they must find means of avoiding the majority of bugs to start with.” — Dijkstra
“Our response to mistakes should be to look for ways that we can avoid making them, not to blame the nature of things.” — O'Keefe
have you performed the right tests?. The only certain answer you will ever get to this question is an answer in the negative — when the system breaks.
“testing is hopelessly inadequate....(it) can be used very effectively to show the presence of bugs but never to show their absence.” - Dijkstra
“From the complexity comes the difficulty of enumerating, much less understanding, all the possible states of the program, and from that comes the unreliability” — Brook
“computers. . . have very large numbers of states. This makes conceiving, describing, and testing them hard. Software systems have orders-of-magnitude more states than computers do.” — Brook
“when you let the nose of the camel into the tent, the rest of him tends to follow”
regardless of its hidden internal state— every time the test is run with those inputs.
Control: the order in which things happen.
Most traditional programming languages force a concern with control — control is based on statements' order.
The difficulty is that when control is an implicit part of the language, then every single piece of the program must be understood in that context — even when the programmer may wish to say nothing about this.
When a programmer is forced (through the use of a language with implicit control flow) to specify the control, they are being forced to specify an aspect of
how the system should work rather than simply
what is desired. Effectively they are being forced to
over-specify the problem.
Consider the pseudo-code example:
1a := b + 32c := d + 23e := f * 4
The programmer is only interested in specifying a
relationship between certain values but has been forced to say more than this by choosing an arbitrary control flow.
This forced order complicates informal reasoning — the person reading the code above must effectively duplicate the work of a compiler — they must (under the definition of the language semantics) start with the assumption that the ordering specified is significant, and then by further inspection determine that it's not.
The problem is not in the text of the program above — after all that does have to be written down in some order — it's solely in the semantics of the imperative language.
The second control-related problem is concurrency which affects:
Shared-state concurrencyis the common concurrency model — where specification for explicit synchronization is provided. This further adds to the number of scenarios that must mentally be considered as the program is read.
“Many of the classic problems of developing software products derive from this essential complexity and its nonlinear increase with size” — Brooks
“It has been suggested that there is some kind of law of nature telling us that the amount of intellectual effort needed grows with the square of program length. But, thank goodness, no one has been able to prove this law. And this is because it need not be true. . . . I tend to the assumption — up till now not disproved by experience — that by suitable application of our
powers of abstraction, the intellectual effort needed to conceive or to understand a program need not grow more than proportional to program length.” — Dijkstra
intensional identity(in contrast with
extensional identityin which things are considered the same if their attributes are the same).
"In a sense,
object identitycan be considered to be a rejection of the “relational algebra” view of the world in which two objects can only be distinguished through differing attributes." — Baker
Value Objectsare used in cases where mutability is not required. They attempt to de-emphasize the original concept of object identity and re-introduce extensional identity. This is usually done using access procedures to check equivalence in a domain-specific sense.
referential transparency— which implies that when supplied with a given set of arguments a function will always return the same result.
map) rather than explicit looping.
1procedure int getNextCounter()2 // ’counter’ is declared and initialized elsewhere in the code3 counter := counter + 14 return counter
1function (int, int) getNextCounter(int oldCounter)2 let int result = oldCounter + 13 let int newCounter = oldCounter + 14 return (newCounter, result)
... if there is any possible way that the team could produce a system that the users will consider correct without having to be concerned with a given type of complexity then that complexity is not essential.
Informal requirementsgets converted directly to
|Data essentiality||Data type||Data mutability||Classification|
7.1and see where they break down in the real world.
“Algorithm = Logic + Control” — Kowalski
Comment: Because the authors claim that state is the prime cause of complexity, they use it interchangeably with complexity.
|Accidental useful complexity||State / Control||Separate|
|Accidental useless complexity||State / Control||Avoid|
“The logic component determines the meaning . . . whereas the control component only affects its efficiency” — Kowalski
Changes to a component depended upon might require changes to the dependent component, but not vice versa. 8. Essential State: This can be seen as the foundation of the system. The specification of the required state is completely self-contained. 9. Essential Logic: aka “business” logic. It does not say anything about how, when, or why the state might change dynamically.
relvars, leading to the terms base
derived relvar, and we shall use this terminology later.
separateas recommended in section 7 and hence gains all the benefits we derive from that.
The example can be found in the paper.