
October 16, 2004

4.0 OO vs. Relational Approaches


There are several ways that the OO and relational paradigms differ and those differences profoundly affect the way applications are constructed. The main differences lie in subclassing, identity, and the role of relationships, particularly navigating relationships.

The most obvious difference lies in the way OO subclassing is instantiated. In traditional Data Modeling using Entity Relationship Diagrams each class in a subclassing tree can be instantiated independently (e.g., as a separate table in an RDB). These tables are tied together via special "parent/child" relationships that have particular rules about how the tuples contained in the tables are identified. In contrast, in the OO paradigm the superclasses cannot be instantiated separately. Like Highlanders*, there can be only one. Each object (tuple) created instantiates the union of properties in all classes in a direct line of ascent from leaf subclass to root superclass. In other words, an OO object is really a single point on a Venn Diagram representing a member of the root superclass whose properties are resolved by determining the relevant nested subclasses to which it belongs.
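As a minimal sketch of the "only one object" point, assume two hypothetical classes, Account and SavingsAccount (these names are illustrative and not from the original discussion). A single object instantiates the union of the properties along the line of ascent; no separate superclass instance exists.

```python
class Account:
    """Root superclass: properties common to every account (hypothetical)."""
    def __init__(self, number, balance):
        self.number = number
        self.balance = balance


class SavingsAccount(Account):
    """Leaf subclass: adds one property specific to savings accounts."""
    def __init__(self, number, balance, rate):
        super().__init__(number, balance)
        self.rate = rate


acct = SavingsAccount("A-1", 100.0, 0.02)

# One object carries the union of the properties of every class
# from the leaf subclass up to the root superclass.
props = sorted(vars(acct))

# The same object is a member of both sets on the Venn diagram.
is_member_of_both = isinstance(acct, Account) and isinstance(acct, SavingsAccount)
```

In a relational schema the same data would typically be split across an account table and a savings-account table tied together by a shared key.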

This has a more subtle implication in that one does not think of superclass and subclass as parent and child when doing OO development. Instead one thinks in terms of generalization (superclass) and specialization (subclass) of properties. That is, the focus is very much on set membership and property resolution. (OO inheritance is simply a set of rules for systematically resolving the properties of the entity in hand.) Though both OO and relational approaches end up at the same place -- a convenient organization of groups of properties -- the way one thinks about using a subclassing tree is quite different. That's because in OO there is an interface associated with each class and subclasses can provide different implementations of the same superclass behavior. That allows polymorphic substitution when accessing a superclass property. In OO development one introduces subclassing quite commonly to support inclusion polymorphism. For example, most of the design patterns in Design Patterns by Gamma et al. use subclassing in this manner.
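Inclusion polymorphism can be sketched as follows, assuming a hypothetical Shape superclass with two subclasses (all names here are illustrative). The client is written against the superclass interface only, and each subclass substitutes its own implementation of the same behavior.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Superclass interface: every shape can report its area."""
    @abstractmethod
    def area(self):
        ...


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


def total_area(shapes):
    # The client sees only the Shape interface; which area()
    # implementation runs is resolved per object.
    return sum(shape.area() for shape in shapes)
```

For instance, total_area([Square(2), Rectangle(2, 3)]) adds 4 and 6 without the client knowing which subclass it is talking to.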

The treatment of identity is also very different. In the relational paradigm tuple identity is always explicit in the form of embedded attribute values. This approach is ideally suited to providing generic access to data when one cannot predict how the data will actually be used. However, OO applications are abstracted from the problem space in a manner that is tailored to a particular problem in hand. Thus there is always a specific context in mind when creating OO abstractions. In addition, software solutions execute in a unique environment where data already has a unique identity based upon its location in memory. In fact, using memory addresses for identity is the most efficient mechanism for processing data in most computing situations, while searches based upon embedded attribute values tend to be comparatively inefficient.
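The contrast between the two kinds of identity can be sketched like this, assuming a hypothetical Customer class and a hypothetical customer_id key column. Python's `is` operator compares object identity, which stands in for identity by memory location.

```python
class Customer:
    def __init__(self, name):
        self.name = name


# Relational style: identity is an embedded attribute value, and
# finding a tuple means searching on that value.
rows = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Ann"},
]


def find_by_key(table, customer_id):
    for row in table:
        if row["customer_id"] == customer_id:
            return row
    return None


# OO style: identity is implicit. Two objects with identical
# attribute values are still distinct objects.
a = Customer("Ann")
b = Customer("Ann")
same_ref = a  # a second reference to the same object, no search needed
```

Here a and b have equal attribute values yet `a is b` is false, while `same_ref is a` is true without any lookup on attribute values.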

The OO paradigm extends this notion of implicit identity based on location to include things like array indices and relative position in a data aggregate. This has a major impact on the way software is constructed compared to relational development. There is obvious impact at the 3GL level where one employs things like pointers, arrays, and buffer offsets explicitly. But this view also extends back into OOA/D. In particular, it is manifested in the way one deals with relationships in OOA/D.

In relational development relationships are tied to ordering of entire tables by identity attributes through an index. That's because, when embedded identity keys are involved, it is much more efficient and generic to update an ordered index once when a tuple is added to a table than to provide ordering as the tuple is accessed. However, that means that all access to tuples is done through an index defined for the entire n-ary relation so all accessing operations, however efficient the index may be, involve a search of all tuples in the relation.

In OO development relationships are instantiated at the level of individual participating objects. Thus if one has:

         1                     *
[ClassA] ----------------------- [ClassB]

each object in [ClassA] will have its own collection of [ClassB] objects and that collection will include only those members of [ClassB] that are related to the specific [ClassA] object. Now there is no need to search the entire set of [ClassB] objects; every [ClassA] object will be related to exactly the right subset of [ClassB] objects. This difference is also seen in the handling of *:* relationships. In the relational paradigm one has a monolithic, two-way index as the association class. However, in OO development one always reifies *:* relationships into two 1:* relationships that relate to the association class, which is a first class object rather than an index.
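Reifying a *:* relationship into two 1:* relationships can be sketched as below, assuming hypothetical Student, Course, and Enrollment classes (none of these names come from the original discussion). Each participant holds only its own collection, and the association class is an ordinary object that can carry its own properties.

```python
class Student:
    def __init__(self, name):
        self.name = name
        self.enrollments = []  # 1:* from this Student to its Enrollments


class Course:
    def __init__(self, title):
        self.title = title
        self.enrollments = []  # 1:* from this Course to its Enrollments


class Enrollment:
    """Association class reifying the *:* Student/Course relationship."""
    def __init__(self, student, course, grade=None):
        self.student = student
        self.course = course
        self.grade = grade
        # Instantiate both 1:* relationships at the object level.
        student.enrollments.append(self)
        course.enrollments.append(self)


ann, bob = Student("Ann"), Student("Bob")
algebra, art = Course("Algebra"), Course("Art")
Enrollment(ann, algebra)
Enrollment(ann, art)
Enrollment(bob, algebra)

# No search over all enrollments: each object already holds its own subset.
ann_courses = [e.course.title for e in ann.enrollments]
algebra_students = [e.student.name for e in algebra.enrollments]
```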

[Caveat. If one has a 1:* relationship and seeks some subset of the participants based upon individual property values, one will still have to do a search for that match. However, the searched set is <usually> much smaller than the total members of the class and the developer has much finer control over optimization. (One of the prerogatives of OOP is to optimize that level of searching by choosing the optimal collection class for the relationship and that choice is made for the context of the individual relationship.) As I indicated in the category on relationships, business rules and policies for the specific problem in hand are being enforced during relationship instantiation and that localization for specific context is what reduces the search size in practice.]

The result is that the need for explicitly ordered relationships in OO applications tends to be rather rare. That's because the implied FIFO ordering for 1:* relationships as *-side participants are added is very often exactly the ordering one needs for access -- so long as the collection only includes the subset of entities that are relevant to the entity on the 1-side. Obviously this is not always true a priori, but it is certainly true in the vast majority of cases, and the developer will often take pains to ensure that whoever instantiates the relationship adds the participants in the desired order whenever possible, if ordering is necessary at all. Thus the way the developer instantiates relationships within the problem context is characteristic of OO software construction.
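The implied FIFO ordering can be sketched with a hypothetical Printer that holds its own Job objects (illustrative names only). The collection class is chosen for this one relationship; here a deque, assuming jobs are consumed from the front.

```python
from collections import deque


class Job:
    def __init__(self, name):
        self.name = name


class Printer:
    def __init__(self):
        # Collection chosen for this relationship only: a deque keeps
        # insertion order and removes from the front cheaply.
        self.jobs = deque()

    def submit(self, job):
        # Whoever instantiates the relationship adds participants
        # in the order in which they should later be accessed.
        self.jobs.append(job)

    def next_job(self):
        return self.jobs.popleft() if self.jobs else None


printer = Printer()
for name in ("report", "invoice", "label"):
    printer.submit(Job(name))

served = []
job = printer.next_job()
while job is not None:
    served.append(job.name)
    job = printer.next_job()
```

No sort key or index is involved; the order of access is simply the order in which the relationship was instantiated.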

As I mentioned in the category on relationships, the notion of relationship navigation is central to OO software construction. Since all collaborations are peer-to-peer, it is quite common that one must traverse a path over multiple individual binary associations to get to the peer object. Superficially that navigation is quite similar to a relational join. (One can argue that it is a relational join between two peer tables.) However, there are two important differences in the way OO relationship navigation works compared to relational joins. The first is that there are usually no implied searches for the reasons above since the relationships are instantiated at the object level and enforce particular business rules and policies. This tends to make navigation much more efficient than for a brute force relational approach.

More important, though, is that OO relationship navigation is done one peer at a time through aspect-like infrastructures supporting relationship navigation. For example, suppose a method requires attribute values from four objects that are members of different classes. In the relational paradigm one would form a single join that would produce a dataset with all of the desired values in a single operation. Since the join is likely to be repeated, one will probably save the join definition and, possibly, the join implementation itself. That is, the join itself is essentially a monolithic artifact that incorporates all of the rules and policies for accessing all of the data in all of the objects over all of the relevant relationships. In effect, there is no equivalent concept in OO development; in fact, one studiously avoids such structures by accessing peers one at a time.
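The four-object case can be sketched as follows, assuming a hypothetical path Order -> Customer -> Address -> Region (the classes and attributes are invented for illustration). Each hop follows one binary association to one peer, where a relational solution would gather the same values with a single join.

```python
class Region:
    def __init__(self, name):
        self.name = name


class Address:
    def __init__(self, city, region):
        self.city = city
        self.region = region      # Address -> Region


class Customer:
    def __init__(self, name, address):
        self.name = name
        self.address = address    # Customer -> Address


class Order:
    def __init__(self, total, customer):
        self.total = total
        self.customer = customer  # Order -> Customer


def shipping_summary(order):
    # Navigate one relationship at a time; every hop has the same shape.
    customer = order.customer
    address = customer.address
    region = address.region
    return f"{customer.name}, {address.city}, {region.name}: {order.total}"


order = Order(40, Customer("Ann", Address("Leeds", Region("Yorkshire"))))
summary = shipping_summary(order)
```

If the rule for reaching the region changes, only the one line that makes that hop needs to be touched.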

So why does the OO paradigm insist upon the one-at-a-time approach? One reason is that it is more efficient on a context-by-context basis because the developer has more control over the way navigation is done. For example, the developer may know that one path will prune the number of instances earlier because of the nature of the participation in different relationships. A very important reason, though, is maintainability. The individual navigations may be more verbose (one can argue even that is not the case compared to the code needed to construct a complex join), but the granularity allows a high degree of focus on each individual access. So when changes are required they are quite well isolated in the code. Perhaps more important, the code required to navigate individual relationship paths is virtually identical because of the aspect-like nature of the way relationships are instantiated. If good naming conventions are employed the developer doesn't even need to look at the class definitions for the classes along the path when writing the navigation code.

The real reason, though, is that monolithic joins are not consistent with the OO paradigm because object collaborations are supposed to be on a one-on-one basis. In the world of RDBs there is no behavior execution and everything is about static data access. (Note that my example above of an OO "join" was limited to accessing knowledge attributes in different objects, not behavior.) However, OO software is primarily concerned with behavior collaboration. The developer has to ensure correct sequencing of messages that invoke behaviors within the overall solution context. Therefore, one-message-at-a-time is an essential part of dealing with that sequencing in a systematic way. [This is especially true when design-by-contract is employed to match receiver preconditions to sender postconditions to determine where to generate messages.] It is not at all desirable to clump together issuing multiple behavior messages; one wants to think about those messages one at a time. Because accessing knowledge is also <conceptually> done with messages, the OO paradigm seeks consistency between knowledge access and behavior access. Just as one invokes getter/setter methods to access knowledge properties in syntactically exactly the same way as one accesses behavior properties at the OOP level, so does one treat navigation of relationships exactly the same way for both knowledge and behavior at the OOA/D level.


*Duncan MacLeod was the sword & sorcery protagonist in a fantasy TV series and a set of movies entitled Highlander from ca. 1990. The premise was that the Highlander was one of several immortals who needed to kill each other off except for one, who would then rule the world. The catch was they could only be killed via beheading, which was an excuse for a lot of sword fights. Just before lopping off the head of an antagonist the Immortal would shout, "There can be only One!"


Posted by HS in Persistence in OO applications