Using the phrase ‘subject matter’ to describe cohesion is convenient, but it is one of those terms that seems meaningful until one drills down into the details and finds that it means very different things to different people. Unfortunately I do not have a precise definition, so all I can do is provide some guidelines that have been useful in the past.
The simplest guideline would be to regard subsystems as analysis classes with rather broad scope. The subject matter of an analysis class is a unique, self-contained, readily identifiable, indivisible entity or concept in the problem domain. The same view applies to subsystems except that they have broader scope and can be further subdivided.
I believe that the most important characteristic of a subject matter is that it represents something in the problem space. When one shows a Package Diagram to a domain expert that expert should have an immediate, high level, intuitive understanding of what each subsystem represents within the context of the problem being solved by the application. (Note that for a complex application there may be multiple domain experts, each providing expertise for only a portion of the application.)
I believe this is important because the gross structure of an application should parallel structure in the problem domain. The reason is that change is inevitable and customers don’t like it any more than software developers. When changes occur in the problem space customers will instinctively adapt to those changes along a path that represents minimum disruption to their existing structures. If the overall application structure parallels the customer’s problem space structures, then the software disruption should be minimized as well.
The second most important characteristic of subject matter is cohesiveness. A subsystem encapsulates a particular collection of knowledge and behavior. The elements of that collection should be intimately related within the context of the problem being solved. In general I would prefer to err on the side of identifying too many subsystems rather than too few when trying to identify the boundaries of cohesiveness. The more narrowly that subject matter is defined, the easier it becomes to provide generic interfaces that will be invariant as the details of the implementation change. As a fairly general rule it is easier to combine trivial subsystems than it is to subdivide complex ones once there is substantial design investment.
To enable good encapsulation the boundaries of the subject matter should be well defined. When you ask the domain expert if a particular responsibility belongs to the subject matter the ideal response should be a quizzical look and, "Are you nuts? No way, Huck! That clearly goes over in XYZ." Of course real life is rarely that clear cut but, like zero defects, it is the direction in which one should be running. In the introductory post I mentioned that half a day for identifying subsystems was nowhere near enough. The time one spends is not so much around identifying subsystems as deciding exactly what the subject matter includes. That definition is essential to defining precise interfaces.
The notions of indivisibility, cohesion, and levels of abstraction are closely related and they depend a great deal on context. For example, at some level of abstraction the concept of a ‘software test’ is indivisible and cohesive. At another, lower level of abstraction concepts like ‘unit test’ and ‘system test’ might each be indivisible and cohesive though they are both software tests. In some contexts ‘unit test’ and ‘system test’ might be specializations of ‘software test’, where certain knowledge or behavior is shared between ‘unit test’ and ‘system test’. In others ‘unit test’ and ‘system test’ might be quite independent but it is convenient to gather them into a ‘software test’ subsystem that presents a common interface to the rest of the application. Whether there is even a need to model ‘software test’ or ‘unit test’ and ‘system test’ as subsystems will depend upon the context of the problem being solved.
Indivisibility, cohesion and levels of abstraction clearly interact in complex ways within a given problem context. Understanding such interactions is important to the overall structure of the application because they provide a basis for partitioning along modular lines. The Package Diagram and related descriptions are an ideal means for documenting that understanding.
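To make the last variation of the ‘software test’ example concrete, here is a minimal sketch in Python of two independent subject matters gathered behind one subsystem interface. Every name in it (UnitTestRunner, SystemTestRunner, SoftwareTestSubsystem, run_tests) is hypothetical and invented for illustration; it is one way the idea might look, not a prescription.

```python
class UnitTestRunner:
    """Hypothetical 'unit test' subject matter with its own vocabulary."""

    def run(self, target: str) -> str:
        return f"unit tests run against {target}"


class SystemTestRunner:
    """Hypothetical 'system test' subject matter, independent of the one above."""

    def execute_scenarios(self, target: str) -> str:
        return f"system scenarios executed against {target}"


class SoftwareTestSubsystem:
    """Common interface the rest of the application sees.

    The two runners share no knowledge or behavior; they are only gathered
    here so that clients deal with a single 'software test' subsystem.
    """

    def __init__(self) -> None:
        self._runners = {
            "unit": UnitTestRunner().run,
            "system": SystemTestRunner().execute_scenarios,
        }

    def run_tests(self, kind: str, target: str) -> str:
        # Clients name the kind of test; they never touch a runner directly.
        return self._runners[kind](target)
```

A client would call something like SoftwareTestSubsystem().run_tests("unit", "parser") and never learn that two unrelated runners sit behind the interface. Had the two kinds of test genuinely shared knowledge or behavior, a specialization relationship would be the more natural model.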
As an example, I have seen ATM controller software models in textbooks that had Account classes in them. This has some serious practical implications for the implementation. If both the ATM software and the bank accounting software share knowledge of an Account class, then instances of Account will have to be shared across the network port. This necessarily introduces substantial overhead and complexity to the network connection because one needs something like CORBA or DCOM to handle the object references rather than a simple data transaction interface.
Perhaps more importantly, having the ATM software updating Account instances introduces a whole new suite of data and referential integrity issues. Some of those issues will probably have to be addressed in the ATM software itself. In effect the ATM software will have to understand something about the persistence mechanism of the bank’s accounting software. All these potential maintenance headaches and complexity problems are directly attributable to a high degree of coupling through sharing an implementation abstraction.
The problem here is that at least three different subject matters are evident when one looks closely. The bank itself deals with banking concerns where entities like Account and Customer are paramount. When it processes a message from the ATM Controller, it infers very specific semantics for the information.
The communications are handled by networking software whose subject matter is simply passing messages without the slightest interest in their content. The networking software is deeply concerned with the relevant low-level communication protocols but really doesn't care about the semantics of the data transported. Similarly, the ATM Controller software’s subject matter is communicating with a customer through a hardware interface (the ATM machine). Those communications are limited to moving data between the ATM hardware and a network port. Unlike the network software it does care slightly about content (e.g., message types and numbers vs. strings) but it needs none of the specific semantic knowledge of what the data represents that the bank’s account software requires.
So basically we have (at least!) three distinct subject matters that participate in a customer transaction: accounting, networking, and hardware control. Each has a unique view of the world, a unique mission in life, and -- most important of all -- a very limited set of concerns. By separating those views, missions, and concerns we are able to provide better focus in the software that makes it simpler to implement each subject matter. [That focus also helps to identify invariants, which has a whole other suite of benefits. But that's a story for another blog category...]
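The separation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the JSON message format, the field names, and the LoopbackChannel standing in for a real network link are all invented here, and a real system would look nothing like this in detail. The point is that Account appears in exactly one of the three pieces, and only plain data crosses the boundaries.

```python
import json
from dataclasses import dataclass


# --- Hardware control: knows message types and field kinds, not banking ---
def build_withdrawal_request(card_number: str, amount_cents: int) -> dict:
    """Package what the ATM hardware collected into a plain data message."""
    return {"type": "WITHDRAW", "card": card_number, "amount_cents": amount_cents}


# --- Networking: moves opaque bytes and never inspects the content ---
class LoopbackChannel:
    """Hypothetical stand-in for the network link; hands bytes to a receiver."""

    def __init__(self, receiver):
        self._receiver = receiver

    def send(self, payload: bytes) -> bytes:
        return self._receiver(payload)


# --- Accounting: the only place that knows what an Account is ---
@dataclass
class Account:
    balance_cents: int


class Bank:
    def __init__(self, accounts_by_card: dict):
        self._accounts = accounts_by_card

    def handle(self, payload: bytes) -> bytes:
        """Attach banking semantics to the data and reply with plain data."""
        message = json.loads(payload)
        account = self._accounts.get(message["card"])
        approved = (
            message["type"] == "WITHDRAW"
            and account is not None
            and account.balance_cents >= message["amount_cents"]
        )
        if approved:
            account.balance_cents -= message["amount_cents"]
        return json.dumps({"type": "WITHDRAW_RESULT", "approved": approved}).encode()


def atm_withdraw(channel, card_number: str, amount_cents: int) -> bool:
    """ATM controller side: send data, read a data reply, dispense or refuse."""
    request = json.dumps(build_withdrawal_request(card_number, amount_cents)).encode()
    reply = json.loads(channel.send(request))
    return reply["approved"]
```

Notice that atm_withdraw never sees a balance, and the channel never parses a message. Swapping the loopback for a real transport, or changing how the bank stores accounts, touches one piece only.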
Another interesting example of subject matters is related to GUI based applications. An unfortunately common design mistake is to treat the message loop for GUI messages as the central driver for the application (i.e., as a ‘spider web’ subsystem with dependencies radiating out to most of the other subsystems in the application). This causes the software structure to be built up around the GUI paradigm rather than the solution to the user’s problem. It also tends to result in things like window handles being blithely passed throughout the application, making it a nightmare to port.
Recognizing that providing communications with the user is simply a distinct subject matter easily averts this sort of mistake. The GUI has a limited mission, talking to the user, and a very well defined paradigm, Window/Control. Identifying the GUI as a separate subject matter allows it to be encapsulated. The information and requests the user provides and the responses of the application merely define a generic interface to a subsystem that happens to contain the message loop. In fact, to solve the user’s problem the application should not care whether the user communicates via a GUI, a command line, or by optical recognition of smoke signals. An obvious example of the benefit of this encapsulation is that porting can be achieved by simply replacing a single subsystem.
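As a rough sketch of what such a generic interface might look like, consider the Python below. The names (UserInterface, ConsoleUI, ScriptedUI, run_loan_quote) and the toy interest calculation are hypothetical, chosen only to show the shape of the idea: the problem-solving code talks to an abstract user-communication subsystem and has no idea what is behind it.

```python
from abc import ABC, abstractmethod


class UserInterface(ABC):
    """Generic interface: what the application needs from 'talking to the user'."""

    @abstractmethod
    def request_amount(self, prompt: str) -> float:
        """Ask the user for a number."""

    @abstractmethod
    def show_result(self, text: str) -> None:
        """Present a response to the user."""


class ConsoleUI(UserInterface):
    """Command-line implementation; a GUI version would own the message loop."""

    def request_amount(self, prompt: str) -> float:
        return float(input(prompt + " "))

    def show_result(self, text: str) -> None:
        print(text)


class ScriptedUI(UserInterface):
    """Canned answers and recorded output, e.g. for automated testing."""

    def __init__(self, answers):
        self._answers = list(answers)
        self.shown = []

    def request_amount(self, prompt: str) -> float:
        return self._answers.pop(0)

    def show_result(self, text: str) -> None:
        self.shown.append(text)


def run_loan_quote(ui: UserInterface) -> None:
    """Problem-space logic: no windows, handles, or message loops in sight."""
    principal = ui.request_amount("Principal?")
    rate = ui.request_amount("Annual rate (percent)?")
    interest = principal * rate / 100
    ui.show_result(f"First-year interest: {interest:.2f}")
```

Calling run_loan_quote(ConsoleUI()) and run_loan_quote(ScriptedUI([1000.0, 5.0])) exercises exactly the same application code. A windowed implementation would be a third subclass, with the message loop and every window handle kept inside it.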
More importantly, identifying the GUI as a separate subject matter allows it to be viewed properly as a service to the rest of the application whose requirements are defined by the application’s needs for solving the user’s problem. This allows the application’s main structure to reflect that of the problem in hand rather than the computing space (i.e., display mechanisms).