October 03, 2004

Association Classes

In UML one can qualify a relationship with a special class, the Association Class. Association classes are required for associations with *:* multiplicity, and they can be used for any association, even those that are 1:1. They are required for *:* relationships because in the relational data model there is no way to hold multiple referential keys in a single attribute of one participant. For one thing, the number of participants in each instance of the relationship may vary. In the example below, each Class may have a different number of Students and each Student may register for a different number of Classes. That gets messy to define statically in terms of attributes.

Thus the association class is a placeholder for a more complex entity that keeps track of the participants. In an RDB that "entity" would be implemented as one or more table indices. However, in OO development we usually implement association classes from the OOA as multiple relationships in OOD and/or OOP (a process known as reification). This has significant implications for the way one constructs software in an OO fashion compared to a procedural/relational fashion that I will get to in a moment. But first some basics.

The most common reification of a *:* relationship is from:

            *           |            *
    [Class] -------------------------- [Student]

to:

            *             1                1                *
    [Class] --------------- [Registration] ------------------ [Student]

In this case the *:* association has been replaced with two 1:* relationships. There is an instance of [Registration] for each [Class] member and each [Student] member. Each such instance of [Registration] is essentially a collection of handles to the other participants in the association. There are other alternatives for reification, such as:

            1             *                *                1
    [Class] --------------- [Registration] ------------------ [Student]

In this case there is an instance of [Registration] for each combination of Class and Student. This alternative, though, is uncommon because it is relatively expensive in terms of space and performance (heap operations). In addition, the first alternative can usually be implemented using a vanilla template library collection class for [Registration].
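The two alternatives can be sketched side by side. This is only an illustration, not a prescribed implementation: SchoolClass stands in for [Class] (to stay clear of the Python keyword), RegistrationLink is a made-up name for the per-combination object, and register() is a hypothetical helper.

```python
class Registration:
    """First alternative: a collection of handles to the other participants."""

    def __init__(self):
        self.members = []

    def add(self, participant):
        if participant not in self.members:
            self.members.append(participant)


class SchoolClass:
    def __init__(self, title):
        self.title = title
        self.registration = Registration()  # the Students in this Class


class Student:
    def __init__(self, name):
        self.name = name
        self.registration = Registration()  # the Classes for this Student


def register(student, school_class):
    # First alternative: no new objects, just one entry on each side.
    school_class.registration.add(student)
    student.registration.add(school_class)


class RegistrationLink:
    """Second alternative: one object per (Class, Student) combination."""

    def __init__(self, school_class, student):
        self.school_class = school_class
        self.student = student


algebra, history = SchoolClass("Algebra"), SchoolClass("History")
ann, bob = Student("Ann"), Student("Bob")
pairs = [(ann, algebra), (bob, algebra), (ann, history)]

for student, school_class in pairs:
    register(student, school_class)

# Second alternative: a fresh heap object for every registration event.
links = [RegistrationLink(school_class, student) for student, school_class in pairs]

algebra_roster = [s.name for s in algebra.registration.members]
ann_schedule = [c.title for c in ann.registration.members]
bob_classes = [link.school_class.title for link in links if link.student is bob]
```

In the first alternative the number of Registration objects is fixed by the number of Classes and Students, and Registration is little more than a list. In the second, the number of link objects grows with every registration, and finding Bob's Classes means filtering the links.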

Note that in the example [Registration] has some problem space connotation. One should always seek that out when naming Association classes. It should be very rare that one has to resort to something artificial like ClassStudentAssociation. That's because *:* relationships are very rare in most customer problem spaces. The reason is that customers don't want to deal with that level of monolithic complexity so they will have very likely already reified the *:* relationship in some fashion to simplify it into some form that is easier to manage. So all you usually have to do is look at the problem space a bit more carefully to see how they did that. Quite often one uncovers a problem space entity that has additional semantics relevant to the problem besides simply keeping track of participants.

Association classes qualify the association itself when there are complex rules and policies that govern participation. Whenever the participation in an association is complicated, regardless of actual multiplicity, one should look for an Association class in the problem space. This is just an extension of the point above. Customers try to keep things simple, so they may well have already provided some mechanism for dealing with the complexity of the participation. Quite often that simplification can be represented in terms of an Association Class even when the association is not *:*. For example,

                                 1   done with    *
                 [MRPAllocation] ------------------ [Delivery]
                 +quantity                          +quantity
                        |
                        |
                1       |   used in            *
    [FrameBolt] -------------------------------- [Assembly]
    +quantity

In this case the 1:* association captures the notion that FrameBolts are used in lots of Assemblies on an assembly line. However, things may not be quite that simple. Typically things like frame bolts are not identified individually. In addition, the frame bolts may be delivered to the assembly line several times on an as-needed basis instead of all at once (e.g., as each subassembly is incorporated). The example above neatly deals with these issues through abstracting FrameBolt as a type of inventory item and employing the concept of an MRP allocation as an association class. (Note that quantity has different semantics in each class: quantity in inventory, quantity needed in a single Assembly, and quantity in a particular delivery.)
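A rough sketch of how the three quantities might play out in code. The class names come from the diagram, but the deliver() and outstanding() methods and the rule that a delivery may not exceed what is still needed or what is in inventory are my assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class FrameBolt:
    quantity: int  # quantity on hand in inventory


@dataclass
class Assembly:
    name: str


@dataclass
class Delivery:
    quantity: int  # quantity in this particular delivery


@dataclass
class MRPAllocation:
    frame_bolt: FrameBolt
    assembly: Assembly
    quantity: int  # quantity needed in a single Assembly
    deliveries: list = field(default_factory=list)

    def outstanding(self):
        # What the Assembly still needs after the deliveries so far.
        return self.quantity - sum(d.quantity for d in self.deliveries)

    def deliver(self, quantity):
        # Hypothetical participation rule, owned by the association class.
        if quantity > self.outstanding():
            raise ValueError("more than the Assembly still needs")
        if quantity > self.frame_bolt.quantity:
            raise ValueError("not enough frame bolts in inventory")
        self.frame_bolt.quantity -= quantity
        self.deliveries.append(Delivery(quantity))


bolts = FrameBolt(quantity=100)
allocation = MRPAllocation(bolts, Assembly("chassis"), quantity=12)
allocation.deliver(4)  # first subassembly
allocation.deliver(8)  # second subassembly

try:
    allocation.deliver(1)
    over_delivery_rejected = False
except ValueError:
    over_delivery_rejected = True
```

The point is only that the rules governing participation live in MRPAllocation rather than in FrameBolt or Assembly.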

The first difference between the OO view and the RDB view of the relational data model is that in the OO view the association class is a bona fide first class object that is a peer to other objects abstracted from the problem space. In the RDB view the associated index (or embedded join) is a quite different thing than the tables that it coordinates. That indirectly leads to far more important differences in the way relationships are implemented, instantiated, and navigated.

Note that in the OO view (using the first reification) when one navigates from a particular Class to its registered Students, there is exactly one Registration object and it collects only the Students relevant to that class. Therefore if some client of Class is interested in the Students registered to that Class, they are available in the collection without any selection. Contrast that with the RDB approach where the index is against the entire [Student] table. To get the Students for a particular Class the RDB index infrastructure has to do a search of some type to select the correct Students from the entire table. Thus the OO view is more efficient for collaboration (navigating relationships for message sending).
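The contrast can be illustrated roughly as follows. The relational side is deliberately naive: a plain list of rows and a linear filter stand in for "a search of some type", and a real RDB engine would use its index rather than a scan. All names and data are made up.

```python
# Relational flavor: one shared Student table plus a table of key pairs.
student_table = {1: "Ann", 2: "Bob", 3: "Cy"}
registration_rows = [("ALG", 1), ("HIS", 1), ("ALG", 2), ("HIS", 3)]


def students_by_selection(class_id):
    # The Students for one Class must be selected out of all the rows.
    return [student_table[sid] for cid, sid in registration_rows if cid == class_id]


# OO flavor: each Class already holds only its own Students.
class SchoolClass:
    def __init__(self, title):
        self.title = title
        self.registration = []  # filled when the relationship is instantiated


algebra = SchoolClass("ALG")
algebra.registration.extend(["Ann", "Bob"])


def students_by_navigation(school_class):
    # No selection: the collection is the answer.
    return list(school_class.registration)


selected = students_by_selection("ALG")
navigated = students_by_navigation(algebra)
```

Both calls yield the same Students; the difference is where the work of narrowing down the participants was done.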

The price one pays for the OO efficiency is a more complex process for instantiation. When a Student registers for a Class, one must update two [Registration] collection instances: the Class must be added to the Student's Registration and the Student must be added to the Class's Registration. Basically the same thing happens for the RDB index; the difference is that the RDB engine hides the synchronization, whereas in the OO case it is an explicit developer concern. That's because we made the [Registration] a first class object with two explicit 1:* relationships.

This, in turn, has more profound implications for construction that apply to all associations. (I waited until this topic to bring it up because the basis is most clear when talking about association classes.) In OOA/D development we emphasize separation of concerns and encapsulation. That has particular relevance to association classes because we tend to separate the concerns of instantiation from those of collaboration. Thus we encapsulate the rules and policies for instantiating both objects and relationships, often in the form of a dedicated "factory" class as in the GoF design patterns*. One reason is that this provides better control over referential integrity because object and relationship instantiation are often inextricably linked. Thus those concerns are separated from the dynamic solution concerns of collaboration (what messages are sent, when they are sent, who sends them, and who consumes them).
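As a sketch of that separation, the factory below is the only place that knows how a registration is instantiated, while the client code on the collaboration side just navigates. RegistrationFactory, the capacity attribute, and the two rules it enforces are hypothetical; they are not taken from the GoF book.

```python
class SchoolClass:
    def __init__(self, title, capacity):
        self.title = title
        self.capacity = capacity
        self.registration = []  # Students in this Class


class Student:
    def __init__(self, name):
        self.name = name
        self.registration = []  # Classes for this Student


class RegistrationFactory:
    """Encapsulates the rules for instantiating the relationship."""

    @staticmethod
    def register(student, school_class):
        if student in school_class.registration:
            raise ValueError("already registered")
        if len(school_class.registration) >= school_class.capacity:
            raise ValueError("class is full")
        # Referential integrity: both sides change together or not at all.
        school_class.registration.append(student)
        student.registration.append(school_class)


def roster(school_class):
    # Collaboration side: navigates the relationship, never instantiates it.
    return [student.name for student in school_class.registration]


algebra = SchoolClass("Algebra", capacity=2)
ann, bob, cy = Student("Ann"), Student("Bob"), Student("Cy")
RegistrationFactory.register(ann, algebra)
RegistrationFactory.register(bob, algebra)

try:
    RegistrationFactory.register(cy, algebra)
    cy_rejected = False
except ValueError:
    cy_rejected = True

algebra_roster = roster(algebra)
```

Because the checks run before either collection is touched, a rejected registration leaves both sides unchanged.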

One way this is manifested lies in the role of ordered sets. Sorting is rather common in the RDB world, either directly or through adding indices to a table, but it is relatively rare in OO applications. That's because in OO applications the collections are constructed to contain only the participants we are interested in. A corollary is that very often the order in which we add participants to the collection is sufficient and that comes "for free" because the underlying data structures we implement in OOP preserve that order.
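A small sketch of order coming "for free", assuming a Python list as the underlying collection. The seats and waitlist policy is invented purely to give the order something to do.

```python
class SchoolClass:
    def __init__(self, title, seats):
        self.title = title
        self.seats = seats
        self.registration = []  # a list keeps the order of insertion

    def register(self, student_name):
        self.registration.append(student_name)

    def seated(self):
        # First come, first served: no sort, no timestamp column.
        return self.registration[: self.seats]

    def waitlisted(self):
        return self.registration[self.seats :]


algebra = SchoolClass("Algebra", seats=2)
for name in ["Cy", "Ann", "Bob"]:
    algebra.register(name)

seated = algebra.seated()
waitlisted = algebra.waitlisted()
```

The names are not in alphabetical order and never need to be; the order of registration is the order that matters here.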

Another way this is manifested lies in the way collaborations are constructed. In the RDB world, which deals only with static data, the notion of 'join' is ubiquitous. The closest one comes to that idea in OOA/D lies in the set operations commonly supported by abstract action languages, but those OO set operations are done on one set (table) at a time. One reason is the separation of concerns for instantiation from navigation above. Another is that OO applications are primarily concerned with behavioral collaborations. Dynamic collaboration is more conveniently expressed in terms of ad hoc, peer-to-peer, as-needed message passing rather than static (predetermined) data relationships.

Since the OO paradigm is primarily concerned with behavior collaboration, that mode is used for both behavior and knowledge collaboration. The justification is that employing the same peer-to-peer paradigm for collaboration for both behavior and knowledge fosters consistency and simplifies the approach. So if one needs knowledge from an object, one sends it a message, navigating relationships in exactly the same way that one would for a behavior collaboration. If one needs data from two different objects, one navigates to and sends them a message individually rather than invoking a join between them. So, for all practical purposes, the concept of RDB 'join' does not exist in OOA/D.
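To make that concrete, here is a sketch of a client that needs knowledge from two different objects. It asks each one individually while navigating the relationship; nothing resembling a join appears. The accessor names, the room attribute, and schedule_lines() are assumptions for the example.

```python
class SchoolClass:
    def __init__(self, title, room):
        self._title = title
        self._room = room

    def title(self):
        return self._title

    def room(self):
        return self._room


class Student:
    def __init__(self, name):
        self._name = name
        self.registration = []  # Classes for this Student

    def name(self):
        return self._name


def schedule_lines(student):
    # One message to the Student, then one pair of messages per Class
    # reached by navigating the Student's Registration.
    name = student.name()
    return [
        f"{name}: {school_class.title()} in room {school_class.room()}"
        for school_class in student.registration
    ]


ann = Student("Ann")
ann.registration.append(SchoolClass("Algebra", "101"))
ann.registration.append(SchoolClass("History", "204"))

lines = schedule_lines(ann)
```

Asking for knowledge here looks exactly like asking for behavior: navigate, then send a message.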

* GoF = Gang of Four; Gamma, Helm, Johnson, and Vlissides who wrote the book "Design Patterns".

Posted by HS in Relationships | Permalink