Before getting into a particular strategy for piecemeal legacy replacement, I have to talk about some of the techniques currently employed for dealing with legacy code. Probably the most common is reverse engineering. In my opinion it is the most overused and misused tool in today's development environment. Reverse engineering should be a tool of last resort, reserved for an undocumented system when no requirements specification exists and no domain expert is available. That's because in most situations reverse engineering causes more problems than it solves.
One problem with reverse engineering is that the path OOA -> OOD -> OOP is one-way in practice. That's because each artifact at the higher level of abstraction usually maps to several artifacts at the lower level. In addition, there are often multiple alternatives at the lower level, and the choice among them is a design decision driven by nonfunctional requirements. Worse yet, a single kind of low-level artifact can often be used to implement several different higher-level artifacts. When one looks at the OOP code there is no way to determine why a particular artifact was chosen during OOP. For example, it is common to embed one object in another during OOP:
class Customer {
private:
    String address;  // the address is an embedded String instance
    ...
};
One might logically infer that Customer and String are related through a UML composition relationship:
* [Customer] <*>--------------- [String]
because the life cycle of the address String is tied to that of the Customer object. However, that would not be correct in all cases (e.g., the same address might be assigned to different tenants over the application execution). The developer might have chosen embedding simply as a low level tactical optimization to avoid the indirection of a pointer reference when the address is accessed many times. In that case there was no notion of composition constraints in the OOD. This creates an inherent ambiguity for any reverse engineering tool attempting to create an OOD model from OOPL code.
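To make the ambiguity concrete, here is a minimal C++ sketch. All three class names are hypothetical and exist only for illustration, and std::string stands in for the String type used above. Designs A and B come from different OOD intents yet produce the same declaration; only design C leaves the weaker association visible in the code.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical design A: the OOD says a Customer owns its address
// (composition), so the address is embedded by value.
class OwningCustomer {
public:
    explicit OwningCustomer(std::string address) : address_(std::move(address)) {}
    const std::string& address() const { return address_; }
private:
    std::string address_;  // embedded because of a composition constraint
};

// Hypothetical design B: the OOD has only a plain association (the same
// address may go to different tenants), but the developer copied the value
// into the object to avoid pointer indirection on frequent reads.
class CachingCustomer {
public:
    explicit CachingCustomer(std::string address) : address_(std::move(address)) {}
    const std::string& address() const { return address_; }
private:
    std::string address_;  // embedded purely as a tactical optimization
};

// Hypothetical design C: the same association left visible as a shared reference.
class SharingCustomer {
public:
    explicit SharingCustomer(std::shared_ptr<const std::string> address)
        : address_(std::move(address)) {}
    const std::string& address() const { return *address_; }
private:
    std::shared_ptr<const std::string> address_;  // lifetime not tied to the customer
};

// A and B differ only in their names and comments, so nothing in the
// declarations records which intent applied.
static_assert(sizeof(OwningCustomer) == sizeof(CachingCustomer),
              "designs A and B have the same layout");
```

A tool that sees only the declarations of A and B has nothing to distinguish them by; the difference lives in the comments, which is to say in the developer's head.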
Another problem with reverse engineering is model clutter. In the example above String is simply a knowledge attribute ADT. That is, it is a computing space data holder rather than a customer problem space entity. If it is expanded as a full String class with a relationship to Customer in the OOD model, the Class Diagram is going to get very cluttered very quickly. It is very difficult for a reverse engineering tool to make that sort of decision because the OOPLs provide no convenient syntax for the distinction. In other words, the distinction lies in the mind of the developer. The clutter problem is so bad that most reverse engineering tools cop out and refuse to expand any embedded classes in the OOD. However, that presents problems when the embedded class actually abstracts a customer problem space entity.
The corollary is that all reverse engineering can produce is a model at the lowest possible level of OOD. In fact, that OOD model is grossly overspecified because it shows artifacts that were only introduced during OOP. Any OOA or high-level OOD content will be indistinguishable from them in the model. This is especially troublesome for the OOA, since the OOA model isolates the solution for functional requirements. Thus in a reverse-engineered OOD one cannot determine whether an artifact is there to resolve functional or nonfunctional requirements, which introduces serious ambiguities when trying to extract requirements.
The big problem with reverse engineering, though, is that the resulting OOD model is usually a very bad OOD model. That's because one wouldn't need to be doing reverse engineering if the application were well-formed and well-documented. Requirements would be directly traceable and their resolution would be properly encapsulated. When the application is neither well-formed nor well-documented, one has unmaintainable software, and all the reverse engineering model does is represent that bad software in a different way. In other words, when reverse engineering is necessary one is already in a situation of Garbage In, Garbage Out. Therefore Lahman's three basic Laws of Reverse Engineering are:
(1) Only use reverse engineering to extract requirements if the system is undocumented and if there is no domain expert available to provide requirements.
(2) Record the extracted requirements in a formal requirements specification and then immediately throw away the reverse engineering models.
(3) Never, ever use a reverse engineering model as a guide for writing new software.
When faced with identifying functionality in a system to be replaced, there are two commonly used techniques. The first is to identify the functionality based upon existing program unit boundaries in the application. One simply rewrites those program units within their existing interfaces. This is almost always a bad idea when the goal is to improve maintainability (as opposed to simply modifying the functionality for changed or new requirements). The reason is that if the program units were not already hemorrhaging cohesion, one probably wouldn't have a maintainability problem to fix. [An exception might be an isolated subsystem that was encoded by a less than competent team.] Once an application has reached the point where maintainability has degraded to Average or worse, there will almost always be serious modularity problems. In the extreme, one could "fix" all the modules in this manner and still have an unmaintainable system.
Once an application as a whole (or a subsystem within an application) is deemed unmaintainable, do not attempt to preserve any program unit boundaries within that scope when doing piecemeal replacement.
The second way that is commonly employed to isolate functionality for piecemeal replacement is a technique from Graph Theory: defining the replacement boundary as a minimum cut set of the existing application call graph. The idea is that if one selects along such a boundary, it will be easier to excise the offending code and insert the replacement code. While that is true, this is still a Really Bad Idea. The question to ask is: What is the minimum cut set based upon? The answer is: a spaghetti code call graph. If the call graph reflected a well-formed OO application, then the code wouldn't be unmaintainable, would it? One can also ask: where did the call graph come from? Answer: from reverse engineering. So if it is used to determine partitioning for new code one is violating the Third Law above.
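For readers who have not met the technique, here is a minimal sketch of what it computes, assuming an undirected call graph whose edge weights are call-site counts between program units. The names CallGraph, Cut, and minimumCut are hypothetical, and the exhaustive search is a stand-in chosen for readability; a real tool would more likely use something like the Stoer-Wagner algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// calls[i][j] = number of call sites between program units i and j.
// The matrix is assumed symmetric; the data would come from the legacy code.
using CallGraph = std::vector<std::vector<int>>;

struct Cut {
    std::uint32_t side = 0;                        // bit i set => unit i is on the "replace" side
    int weight = std::numeric_limits<int>::max();  // calls crossing the boundary
};

// Exhaustively tries every two-way partition of the units and keeps the one
// with the fewest crossing calls. Exponential in the number of units, so it
// is only suitable for a handful of them.
Cut minimumCut(const CallGraph& calls) {
    const std::size_t n = calls.size();
    assert(n >= 2 && n <= 20);
    Cut best;
    // Stepping by 2 keeps bit 0 clear, so unit 0 always stays on the "keep"
    // side and each partition is examined once rather than twice.
    for (std::uint32_t side = 2; side < (1u << n); side += 2) {
        int weight = 0;
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = i + 1; j < n; ++j) {
                const bool iReplaced = ((side >> i) & 1u) != 0;
                const bool jReplaced = ((side >> j) & 1u) != 0;
                if (iReplaced != jReplaced) weight += calls[i][j];
            }
        }
        if (weight < best.weight) {
            best.side = side;
            best.weight = weight;
        }
    }
    return best;
}
```

Note that the only input is the existing call graph. The boundary that comes out can be no better than the structure that went in, which is exactly the objection raised above.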
[There is actually a more theoretical reason why using the minimum cut set is not useful for partitioning OO applications. In OO development collaboration is always peer-to-peer within a subsystem so the subsystem call graph always looks like a rat's nest. In addition, objects and responsibilities are abstracted separately in OOA/D from the sequencing of messages between those responsibilities to provide that overall problem solution. Typically any maintainability problems will be related to the nature of those abstractions rather than the messaging. Therefore any OO call graph minimum cut set is more likely due to serendipity than any fundamental structural issues.]
The bottom line here is that the conventional techniques for doing piecemeal legacy replacement are pretty much ka-ka. Worse, they can actually leave the replacement system as bad as, if not worse than, the original system in terms of maintainability. The next post will address a proper approach to piecemeal replacement.