I did my first program on a plug board in '57. That was an interesting era where Assembly language was the Silver Bullet that would solve the Software Crisis. Between changing vacuum tubes and programming plug boards I had such a character-building experience that I didn't go back to software development until the late '60s when 3GLs were commonly available and computers were solid state.
The '60s and '70s were known as the Hacker Era. Back then 'hacker' was a complimentary term that described someone who could produce prodigious amounts of code in a short time and who had an almost supernatural ability to get it back up and running when it broke. I had a couple of good years in the era when I hit 100 KLOC. Fortunately that code is long gone and no one will ever see it again.
Alas, by the late '70s people figured out that if the code didn't break in the first place there would be no need for supernatural powers to fix it. Perhaps more important, if the code were written better there would be no need for those indispensable hackers who were the only ones who could decipher the code. That was when the term 'hacker' became a pejorative.
The first systematic attempt to eliminate hackers appeared in the form of Structured Programming that provided a collection of good practices for writing 3GL code. That was quickly followed by Structured Design and Structured Analysis, both of which introduced more abstract graphical representations of programs. The dominant design technique became top-down functional decomposition where one started with a very simple and general statement of the problem solution and then successively decomposed that solution into more detailed levels. Each statement of functionality was collected as a node in an inverted "tree" whose lowest leaves were logically indivisible.
The impact of SA/SD/SP was enormous. Defect rates dropped from 150/KLOC to 5/KLOC. In addition, productivity for large projects where multiple programmers had to coordinate efforts improved greatly. Instead of 1000 programmers working for 10 years to produce 1 MLOC, 200 programmers could do the same job in 2-5 years.
Alas, there was still a problem. Writing new code was one thing, but maintaining old code was quite another. Depending on whose data one examined it took 5-20 times more effort to modify existing code than to write it originally. As a result 60-80% of all developer effort was expended in maintaining existing software.
There were a lot of problems that led to the Maintainability Gap but they could be broadly categorized as having two root causes: uncontrolled access to state variables and hierarchical implementation dependencies. State variable access was primarily a defect problem as data was modified in unexpected ways at unexpected times during execution. That resulted in additional test and repair cycle time when one modified existing code because it was difficult to predict how changes would affect untouched code that happened to access the same data.
Hierarchical implementation dependencies resulted in the legendary "spaghetti code". That was because the leaf nodes in the functional decomposition tree were at a very fine level of abstraction -- essentially arithmetic or logical operators in the 3GL. It was simply too tedious to cobble together lengthy sequences of such atomic operations to do complex tasks. However, the higher-level nodes in the functional decomposition tree quite conveniently captured such sequences as descendants. Since these nodes were systematically derived they had defined functional semantics. That allowed them to be reused (i.e., accessed by "clients" in different parts of the application that happened to need the same sequence of leaf operations).
That sort of reuse through accessing higher-level functions was a boon to developers and led to the notion of "procedural development" because it made excellent use of the core characteristic of 3GLs, block structuring around procedures. The problem, though, was that the functional decomposition "tree" now became a lattice where each node potentially had multiple ancestors (clients) as well as multiple descendants. It was that fanout of dependency that led to spaghetti code.
The dependencies existed because in top-down functional decomposition the lower-level functions are extensions of their parent higher-level function. That is, the specification of the higher-level function included the specification of the lower-level functions. Thus any contract between the client and the higher-level function depended upon the specification of the entire descendant tree of functions. So if one changed the specification of a lower-level function, the specification of all of its higher-level ancestors was also changed.
That was no problem so long as the access structure was a pure tree. That's because the change was probably triggered by a need to change the specification of a higher-level function and implementing the fix in the lower-level function was simply the easiest place to do it. However, when one has a lattice, the higher-level functions have multiple clients. If only one client wants the change, the other clients may be broken by the change. Worse, there can be a client at any level of ancestry in the tree, so the change may break clients that are not even direct clients of the original higher-level function. The result was a disaster for maintainability because every change for one client could potentially break a host of other clients. Fixing things to keep all clients happy often resulted in major surgery to the tree or very complex parameterization that complicated the functions.
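A minimal sketch of that lattice problem, in Python for brevity. Every name here is hypothetical and the "tree" is only three levels deep; the point is simply that a leaf changed on behalf of one client silently changes the result seen by another client that never asked for anything.

```python
# Hypothetical sketch: every name here is invented for illustration.
# Amounts are integer cents so the arithmetic is exact.

def build_tree(leaf):
    """Wire a small decomposition tree on top of a given leaf function."""

    def compute_total(prices):
        # Higher-level node: its specification silently includes the leaf's.
        return leaf(sum(prices))

    def invoice_total(prices):
        # Client 1 of compute_total.
        return compute_total(prices)

    def sales_tax(prices, percent):
        # Client 2 of compute_total; it never asked for any change.
        return compute_total(prices) * percent // 100

    return invoice_total, sales_tax


def exact(cents):
    """Original leaf: pass the amount through unchanged."""
    return cents


def whole_dollars(cents):
    """Leaf as changed for client 1, who now wants whole-dollar invoices."""
    return (cents + 50) // 100 * 100


prices = [1049, 2030]  # sums to 3079 cents

invoice_before, tax_before = build_tree(exact)
invoice_after, tax_after = build_tree(whole_dollars)

# Client 1 gets the change it wanted: 3079 becomes 3100.
print(invoice_before(prices), invoice_after(prices))
# Client 2 gets a change it did not want: 246 becomes 248.
print(tax_before(prices, 8), tax_after(prices, 8))
```

Keeping both clients happy here would mean either splitting compute_total into two functions or adding a rounding parameter, which is the surgery-or-parameterization choice described above.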
In the '80s and '90s two very different approaches to software construction evolved to address these problems. One was functional programming, which grew up in the scientific programming arena. In functional programming persistent state variables are completely eliminated; all state is passed as function arguments and results. However, the hierarchical structure remained. That was because in scientific arenas algorithms are primarily defined mathematically so they tend to be quite stable. Therefore hierarchical dependencies were not very relevant because there were no client changes to accommodate.
In addition functional programming introduced a number of features (e.g., sophisticated parametric polymorphism, mixins, etc.) that allowed the construction of very compact and elegant programs in a computational environment. Typically functional programs are very intuitive to construct and they are often integer factors more compact than programs employing other construction techniques. So in the rare event that the program does have to be changed, it is no big deal if it is rewritten rather than simply modified. At the same time the problems of global data access are completely eliminated.
The second new construction approach was OO development. It grew up in IT and R-T/E where requirements are highly volatile. In addition, the problem spaces are not defined with mathematical precision so there is a gap between the customer view of the problem and the computing space view. So OO development sought to address improved mapping between the customer space and the computing space in addition to managing global data and minimizing hierarchical dependencies.
The priorities OO development placed on these goals were quite different from those placed by functional programming. Basically the priorities (1 is highest) were:
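To make "all state is passed as function arguments and results" concrete, here is a small sketch in Python rather than in a functional language proper. The Counter type and the record and mean functions are invented for the example; nothing in it is modified in place, so there is no shared variable for anyone to corrupt.

```python
from functools import reduce
from typing import NamedTuple


class Counter(NamedTuple):
    """Immutable state value (hypothetical example type)."""
    count: int
    total: int


def record(state: Counter, sample: int) -> Counter:
    """Return a new state; the input state is never modified."""
    return Counter(count=state.count + 1, total=state.total + sample)


def mean(state: Counter) -> float:
    """Derive a result purely from the state handed in."""
    return state.total / state.count


empty = Counter(count=0, total=0)

# The state is threaded through the calls as argument and result.
final = reduce(record, [4, 8, 6], empty)

print(final)        # a new Counter holding count=3, total=18
print(mean(final))  # 6.0
print(empty)        # the starting state is untouched
```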
                           |   functional    |      OO      |
---------------------------+-----------------+--------------+
global data management     |        1        |      3       |
---------------------------+-----------------+--------------+
hierarchical dependencies  |        2        |      1       |
---------------------------+-----------------+--------------+
customer space mapping     |      none       |      2       |
---------------------------+-----------------+--------------+
These priorities are no surprise, given the quite different problem domains in which they evolved. However, they had profound effects upon the construction paradigms -- to the point that the two approaches are fundamentally incompatible. Thus any attempt to mix & match features across the approaches is doomed to defeat the benefits of either approach.
The OO approach addresses hierarchical dependencies by completely eliminating the tree. One still does functional decomposition but only to identify the leaf nodes. Once the leaf nodes are identified, the tree essentially disappears. This works because several OO features play together...
Logical indivisibility. In OO development we have a very flexible view of logical indivisibility. There are basically three levels at which it applies: subsystem, class, and responsibility. A subsystem represents a large scale encapsulation of a single subject matter. A class represents the encapsulation of a single problem space entity. A responsibility represents the encapsulation of an atomic element of knowledge or behavior.
In all cases the notion of indivisibility depends upon the level of abstraction one needs to solve the problem in hand. For example, it is not uncommon for a single responsibility in one subsystem to expand into an entire class or even an entire subsystem outside the context of the given subsystem's subject matter. This flexibility avoids the box one gets into when the notion of 'indivisible' is tied to something like 3GL arithmetic operators.
Peer-to-peer collaboration. In the OO approach objects collaborate directly with one another rather than through higher-level controllers. IOW, at a given level of abstraction, all entities are peers and communicate directly with one another. Such collaboration is supported by the notion of relationships between entities. Such relationships are very important in OO development because they provide a static structure on which message addressing is based. That structure is independent of the semantics of individual classes.
Separation of message and method. In OOA/D the message that one object sends is a quite different thing than the method with which the receiving object responds. That allows messages to be generated independently of external context; they simply announce that the sender has done something. It is up to the developer to determine who cares about what happened enough to provide a response. In UML that can be done at the level of Interaction Diagrams, which is a higher level of abstraction than individual object implementations.
[Alas, the OOPLs don't provide a similar separation. That's because they are 3GLs and they have to make compromises with the computational model at that level of abstraction. So the message identifier is also the responding method's identifier because the 3GLs all employ procedural message passing. However, if the OOA/D has been done properly the methods and collaborations will have been defined so that this is benign.]
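One way to approximate the separation of message and method in a 3GL is sketched below in Python. The Thermostat, Heater, and Router classes are all hypothetical, and the Router is only a stand-in for the decision the developer makes at the Interaction Diagram level; it is not a claim about how any particular OOA/D tool generates code. The thing to notice is that the sender's message name and the receiver's method name have nothing to do with each other.

```python
class Thermostat:
    """Sender: announces what happened and knows nothing about receivers."""

    def __init__(self, setpoint, announce):
        self.setpoint = setpoint
        self._announce = announce  # any callable that takes a message name

    def sample(self, temperature):
        if temperature < self.setpoint:
            self._announce("temperature_fell_below_setpoint")


class Heater:
    """Receiver: its method name is unrelated to the message name."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True


class Router:
    """Stand-in for the collaboration level where messages meet methods."""

    def __init__(self):
        self._routes = {}

    def connect(self, message, method):
        self._routes.setdefault(message, []).append(method)

    def announce(self, message):
        for method in self._routes.get(message, []):
            method()


router = Router()
heater = Heater()
thermostat = Thermostat(setpoint=20, announce=router.announce)

# Who cares about the announcement is decided here, outside both classes.
router.connect("temperature_fell_below_setpoint", heater.start)

thermostat.sample(18)
print(heater.running)  # True
```

If a fan or an alarm later needs to respond as well, that is one more connect call; neither Thermostat nor Heater is touched.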
Encapsulation and Implementation Hiding. These work to ensure that the implementations are properly decoupled. That is, the specification of a responsibility only needs to be defined in terms of the intrinsic rules and policies implicit in the responsibility. In other words, one should be able to exhaustively unit test an object method without implementing any other behaviors. More important, one can be confident that the state of the application will be the same after executing that method in situ in the application as it would be after executing that method in a unit test.
Asynchronous behavior model. In OOA/D behavior is assumed to be asynchronous. That is, one assumes there is an arbitrary delay between the time a message is issued and when it is consumed (i.e., a behavior responds). (Because message and method are not separated in the OOPLs, the OOP model is synchronous.) This makes it somewhat more difficult to construct correct OOA/D models, but it yields huge dividends in maintainability and robustness. If one can't count on something happening immediately after issuing a message, one can't very well make the sender count on something specific having happened as it continues to execute.
When combined properly these features all ensure a very high degree of decoupling of implementations that completely eliminates the hierarchical dependencies of spaghetti code. Ironically, if one looks at the method call graph of a typical OO application it looks even worse than the rats' nests from procedural applications. That's because logical indivisibility, separation of concerns, cohesion, and other OO practices tend to produce a lot of small abstractions with limited individual responsibilities that are highly interconnected due to peer-to-peer messaging. Don't worry about it. Those call graphs represent message traffic, not dependencies. One of the prices one pays for eliminating implementation dependencies is a lot more messages between a lot more entities.
On the global data front the OO approach essentially still allows it. Any public knowledge attribute of any object is available for access by any other object connected to it over some relationship path. So, in effect, all public attributes are global. However, the OO approach does address the issue by providing support for much better management of global data. Again, several features play together...
Encapsulation and implementation hiding. The data has one owner so anyone who wants to modify the data must talk to that owner. This raises the level of abstraction of access to that of collaboration. While that may not seem important, it enables other techniques for controlling access, such as the Observer design pattern. The real value, though, is forcing the developer to think about accessing data in terms of entities and their collaborations.
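The asynchronous assumption can be imitated with a simple event queue. The sketch below is Python with invented names (MessageQueue, Pump, Valve) and a single-threaded dispatcher, which is only one of many ways to get the delay; the essential point is that send returns before any receiver has responded.

```python
from collections import deque


class MessageQueue:
    """Holds messages until the dispatcher gets around to them."""

    def __init__(self):
        self._pending = deque()
        self._routes = {}

    def connect(self, message, method):
        self._routes.setdefault(message, []).append(method)

    def send(self, message):
        # Returns immediately; nothing has responded yet.
        self._pending.append(message)

    def dispatch_all(self):
        while self._pending:
            message = self._pending.popleft()
            for method in self._routes.get(message, []):
                method()


class Valve:
    def __init__(self):
        self.is_open = False

    def open(self):
        self.is_open = True


class Pump:
    def __init__(self, queue):
        self._queue = queue
        self.primed = False

    def finish_priming(self):
        self.primed = True
        self._queue.send("pump_primed")
        # Nothing after this line may assume the valve is already open.


queue = MessageQueue()
valve = Valve()
pump = Pump(queue)
queue.connect("pump_primed", valve.open)

pump.finish_priming()
open_right_after_send = valve.is_open  # False: the message is still queued
queue.dispatch_all()
open_after_dispatch = valve.is_open    # True: the valve has now responded

print(open_right_after_send, open_after_dispatch)
```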
The biggest advantage lies in encapsulation of the rules and policies for modifying data in particular objects. Just as some object "owns" the data, so does some specific object "own" the rules for changing it. Very often this naturally leads to localizing those rules and policies in a single object rather than being littered all over the application. This also separates the issue of who owns the rules from the issue of when they should be executed.
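A minimal sketch of a data owner that also owns the rule for changing its data, with Observer-style notification thrown in. It is Python, and StockItem with its remove rule is a made-up example, not a recommended inventory design.

```python
class StockItem:
    """Owns the quantity and the rule for changing it (hypothetical example)."""

    def __init__(self, quantity):
        self._quantity = quantity
        self._observers = []

    @property
    def quantity(self):
        # Anyone may read the value; there is deliberately no public setter.
        return self._quantity

    def attach(self, callback):
        """Register an observer to be told about every change."""
        self._observers.append(callback)

    def remove(self, count):
        # The rule for modifying the data lives here and nowhere else.
        if count <= 0 or count > self._quantity:
            raise ValueError(f"cannot remove {count} from {self._quantity}")
        self._quantity -= count
        for callback in self._observers:
            callback(self._quantity)


seen = []
item = StockItem(quantity=10)
item.attach(seen.append)

item.remove(3)

try:
    item.remove(50)  # violates the owner's rule
    rejected = False
except ValueError:
    rejected = True

print(item.quantity, seen, rejected)  # 7 [7] True
```

Note that the class says nothing about when remove should be called; that question is left to whoever generates the message.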
Design by contract. Once the rules for modifying data are encapsulated one must address the issue of when they need to be executed. In the OO approach this comes down to design-by-contract (DbC). Before a behavior can execute some set of preconditions must prevail in the application. One of those is usually that some other behavior had to be executed immediately before the one in hand. This is the classical procedural view where the solution is a sequence of operations.
What OO adds to the pot is the notion that the DbC preconditions include conditions concerning state variables. That is, a precondition of execution is that all of the data that the behavior needs has been properly updated and is consistent. So one generates the message to execute the behavior only where one is sure that all the DbC conditions have been fulfilled. That segues to...
Peer-to-peer messaging and logical indivisibility. Because behavior responsibilities are logically indivisible they can be daisy-chained with messages to form sequences. Since that daisy-chaining depends on DbC to ensure the preconditions are satisfied and because OO includes data integrity and consistency in the preconditions, one can formally validate that the sequence one constructs is correct. Or, as a practical matter, one can construct the generation of peer-to-peer messages at the UML Interaction Diagram level in a manner that ensures DbC is satisfied.
Synchronous knowledge model. In the OO approach one accesses data on an as-needed basis. This works well with DbC because it simplifies the specification of execution preconditions. It also allows lower-level implementations at the OOD/P level to ensure data integrity when things start to get squirrelly (i.e., when one introduces stuff like concurrent threads). The scope of integrity and consistency is limited to the scope of the executing method.
The assumption of synchronous access is necessary so the developer can maintain sanity. If there were arbitrary delays between when data was requested and when it was delivered, trying to deal with data integrity issues would be mind boggling. So in the OOA/D one assumes a synchronous access view. Then if one must deal with actual delays, such as distributed data or paused threads, life is much simpler because it is relatively easy to enforce integrity over method scope as if there were no delay.
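The daisy-chaining of behaviors under DbC might look roughly like the following Python sketch. Order, require, and PreconditionError are all invented for the example, and run-time checks like these are only a poor man's substitute for the formal validation mentioned above. Each behavior states what must already be true, including what must be true of the data, and each step of the chain establishes the precondition of the next.

```python
class PreconditionError(Exception):
    """Raised when a behavior is invoked before its contract is satisfied."""


def require(condition, description):
    """Check one precondition; the description names what should hold."""
    if not condition:
        raise PreconditionError(description)


class Order:
    def __init__(self, item_prices):
        self.item_prices = list(item_prices)
        self.total = None
        self.paid = False
        self.shipped = False

    def price(self):
        require(self.item_prices, "order has at least one item")
        self.total = sum(self.item_prices)

    def pay(self, amount):
        # Data precondition: the total must already be up to date.
        require(self.total is not None, "order has been priced")
        require(amount == self.total, "payment matches total")
        self.paid = True

    def ship(self):
        require(self.paid, "order has been paid")
        self.shipped = True


# A correct chain: each step satisfies the next step's precondition.
order = Order([1200, 800])
order.price()
order.pay(order.total)
order.ship()

# An incorrect chain is caught at the point where the contract is broken.
premature = Order([500])
try:
    premature.ship()
    violation = None
except PreconditionError as error:
    violation = str(error)

print(order.shipped, violation)  # True order has been paid
```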
[Note that this is a pure methodological constraint. There are situations -- such as snapshots of data streams being collected in parallel -- where one must collect the data first in a consistent manner before invoking the method to process it. So passing knowledge as message data packets (e.g., method arguments) is sometimes necessary. But in a well-formed OO application, the method will always navigate directly to the data owner and extract the data on an as-needed basis unless there is an explicit constraint to the contrary.]
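The difference between navigating to the data owner on an as-needed basis and carrying the data in the message can be sketched as follows. This is Python, and Customer, Invoice, and the discount rule are hypothetical. The packet version is not wrong in itself, as the note above says; it just answers a different question, namely what the value was when the message was generated.

```python
class Customer:
    """Owner of the discount knowledge."""

    def __init__(self, discount_percent):
        self.discount_percent = discount_percent


class Invoice:
    def __init__(self, customer, amount_cents):
        self.customer = customer  # relationship path to the data owner
        self.amount_cents = amount_cents

    def total_as_needed(self):
        # Navigate to the owner and read the value at the moment of use.
        percent = self.customer.discount_percent
        return self.amount_cents * (100 - percent) // 100

    def total_from_packet(self, discount_percent):
        # Use whatever value was packed into the message by the sender.
        return self.amount_cents * (100 - discount_percent) // 100


customer = Customer(discount_percent=10)
invoice = Invoice(customer, amount_cents=20000)

snapshot = customer.discount_percent  # captured when the message is generated
customer.discount_percent = 25        # owner's data changes before the method runs

total_from_snapshot = invoice.total_from_packet(snapshot)
total_current = invoice.total_as_needed()

print(total_from_snapshot, total_current)  # 18000 15000
```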
So while the OO approach certainly doesn't eliminate all of the problems of persistent state, it goes a long way towards making them more manageable. Given elimination of spaghetti code and a much better mapping between customer and computing spaces due to problem space abstraction, it is probably a reasonable trade-off.