As I indicated in the previous post, the "mission" of a DB Access subsystem is to map the data needs of a specific problem solution into the mechanisms provided by a particular persistence environment. In this post I will discuss the nature of the DB Access subsystem in more detail. In other words, I will describe what sorts of abstractions populate the DB Access subsystem.
The first situation I will deal with is the trivial one where the mapping between the problem solution view and the database view is very close to 1:1. This is fairly common in relatively simple applications that use an RDB as the persistence vehicle. That's because the schemas in the database and the Class Diagram on the solution side both model the same problem space and both explicitly employ the relational data model for consistency.
In such situations there may be no special implementation for the DB Access subsystem. That is, there may be nothing to implement beyond the interface itself. The reason is that if one communicates with the database engine at a high level of abstraction, such as SQL strings, the mapping becomes very simple -- essentially just the formation of individual SQL strings to form queries and the parsing of returned datasets. That sort of thing can be handled in the implementation of the Facade design pattern class that acts as the subsystem interface to the rest of the application. Typically that just requires some look-up tables to relate SQL name strings to elements of the message data packet (forming queries) or to relate dataset values to message data packet elements (decoding datasets). However, when the number of tables accessed and the variety of queries increase, this sort of approach tends to get complicated (e.g., lots of look-up tables) and the Facade class grows very quickly. In the long term this is not very desirable from a maintainability viewpoint because the Facade class itself essentially becomes a "god" class.
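The look-up table idea can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the message IDs, the table and column names, and the `DbAccessFacade` class are all hypothetical, and a real Facade would also execute the query rather than just build it.

```python
# Hypothetical look-up tables. Each message ID selects a table plus a mapping
# between message data packet keys and SQL column names.
STORE_MAP = {
    "STORE_CUSTOMER": ("customer", {"name": "cust_name", "city": "cust_city"}),
}
FETCH_MAP = {
    "FETCH_CUSTOMER": ("customer", {"cust_name": "name", "cust_city": "city"}),
}


class DbAccessFacade:
    """Subsystem interface; all of the mapping lives in the look-up tables."""

    def build_insert(self, message_id, packet):
        # Forming a query: relate packet elements to SQL name strings.
        table, columns = STORE_MAP[message_id]
        keys = list(columns)
        col_list = ", ".join(columns[k] for k in keys)
        placeholders = ", ".join("?" for _ in keys)
        sql = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
        return sql, [packet[k] for k in keys]

    def decode_row(self, message_id, row):
        # Decoding a dataset: relate column values back to packet elements.
        _, fields = FETCH_MAP[message_id]
        return {fields[col]: value for col, value in row.items() if col in fields}
```

Every new table or query adds another entry (or another table of entries) here, which is exactly how the Facade drifts toward a "god" class.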
When the mapping is close to 1:1 but using the Facade approach would get messy, the next approach is to take advantage of "canned" layered model infrastructures. One then implements the DB Access subsystem formally in terms of such infrastructures (e.g., DAO objects). The semantics of the abstractions that implement the DB Access subsystem are the "data" entities associated with the lower layers of CRUD/USER layered models. Now the DB Access interface simply dispatches to the relevant "data" entities that, in turn, incorporate the layered model Data Layer infrastructure. One way to think of this is in terms of the traditional CRUD/USER layered model:
---------------------
 Presentation Layer
---------------------
 Business Layer
---------------------
 DB Access
- - - - - - - - - - -
 Data Layer
---------------------
 Database Layer
One implements the DB Access subsystem itself as if it were a traditional Data Layer, using whatever "canned" infrastructures one has available. However, the Business Layer objects do not talk directly to the Data Layer. Instead they talk to the more generic ("Here's a pile of data I call X. Store it.") interface provided by DB Access. That interface, in turn, dispatches to the traditional Data Layer interface.
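A minimal sketch of that dispatch, assuming a made-up `CustomerDao` as the stand-in for whatever "canned" Data Layer object is available. The class names, the `store` method, and the `"CUSTOMER"` message ID are hypothetical, not part of any particular framework.

```python
class CustomerDao:
    """Stand-in for a 'canned' Data Layer object (hypothetical)."""

    def __init__(self):
        self.rows = []  # pretend this is the database

    def insert(self, name, city):
        self.rows.append((name, city))


class DbAccess:
    """Generic interface seen by the Business Layer: 'here is a pile of
    data I call X; store it'. Only this class knows about the DAO."""

    def __init__(self, customer_dao):
        self._customer_dao = customer_dao
        # Message ID -> handler that talks to the Data Layer.
        self._handlers = {"CUSTOMER": self._store_customer}

    def store(self, message_id, packet):
        self._handlers[message_id](packet)

    def _store_customer(self, packet):
        self._customer_dao.insert(packet["name"], packet["city"])
```

The Business Layer would call something like `db_access.store("CUSTOMER", {"name": "Ann", "city": "Oslo"})` and never see the DAO.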
Why bother with this level of indirection? Why not just let the Business Layer objects talk to the Data Layer as they would in CRUD/USER processing? One reason is that in an application solving a problem that is more complicated than CRUD/USER processing, there may be multiple subsystems in the Business Layer that need to be reusable themselves -- perhaps in contexts where a different database is accessed. Another reason is consistency. The Business Layer objects may also need to talk to the user or hardware and it would be nice if the interfaces for doing that were very similar. Yet another reason is that in complex applications the UI is conceptually not a Presentation Layer; it is a low level service subsystem that is a peer of the DB Access subsystem. So the "pass-through" mechanisms that the "canned" infrastructures provide for the Business Layer are not relevant.
But the most important reason lies in decoupling. No matter how abstract the Data Layer interface may be (e.g., SQL), it still reflects a particular implementation of persistence. (As an obvious example, if one switches to an OODB, all that SQL will be irrelevant.) That isn't a big issue for CRUD/USER processing because the Business Layer is just a pipeline that doesn't have a unique behavioral semantics. However, in a large, complex application it is an invitation to disaster for the Business Layer to know anything at all about the details of persistence. While the mapping may be straightforward today, there is no guarantee that it will be in the future when the business behavior requirements change. One does not want to be in the position of having the DB view drive the way changes are implemented in the Business Layer. In addition, the sheer size of the Business Layer logic makes the risk of changing persistence paradigms significant even though such a change is unlikely. Finally, one may have to introduce access optimization for larger applications and one doesn't want the Business logic to be affected by that (i.e., one doesn't want those optimizations intertwined with the Business logic). So once one has more to do than CRUD/USER processing, one can use the "canned" infrastructures to enhance productivity for the Data Layer but one needs to keep those infrastructures encapsulated away from the Business Layer processing.
The last situation I will discuss is the most likely for large, complex applications: the problem solution view differs substantially from the database view. As I indicated in the previous post, this occurs because the problem solution is organized around behavior collaboration while the database view is organized around static data relationships. In addition, the entire problem solution may be at a different level of abstraction than the database. As a trivial example, consider a telephone number. In most databases a telephone number is treated as a simple domain (i.e., a single-valued attribute). However, in some problem contexts one is interested in the unique semantic elements like country code, area code, and exchange. So these would appear as separate attributes of some class like Customer. In an RDB that would be a violation of Third Normal Form because those elements all depend upon Telephone Number rather than the Customer identity (i.e., the same number could be reassigned by the telephone company to a different Customer). However, in a particular application the assumption that the number was associated with a particular Customer identity might be quite valid in the execution context. As a practical matter the object view in a large, complex application will invariably be different than the database view and it will usually vary in ways far more significant than telephone numbers.
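To make the telephone number example concrete, here is a small sketch of the two views and the conversion between them. The `+country-area-exchange-line` storage format, the `line_number` attribute, and the function names are assumptions made purely for illustration; real numbering plans are messier.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Problem-solution view: the elements are separate attributes."""
    name: str
    country_code: str
    area_code: str
    exchange: str
    line_number: str  # assumed remainder of the number


def to_db_phone(customer):
    # Database view: one single-valued attribute.
    return (f"+{customer.country_code}-{customer.area_code}"
            f"-{customer.exchange}-{customer.line_number}")


def from_db_phone(name, phone):
    # Split the stored value back into the elements the solution cares about.
    country, area, exchange, line = phone.lstrip("+").split("-")
    return Customer(name, country, area, exchange, line)
```

Conversions of this kind belong inside the DB Access subsystem, so neither side needs to know how the other represents the number.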
In this situation one has the general problem of mapping knowledge contained in a suite of object abstractions on the application side into tables and tuples on the database side (assuming an RDB persistence implementation). One can further generalize the problem by noting that the interfaces we use for subsystems are pure message interfaces that consist of a tuple of {message ID, data packet} where the "message ID" determines a predefined mapping of the "data packet" for both sender and receiver. Now the problem for the DB Access subsystem is to map a
To solve that problem the obvious abstractions are those of the database itself: Schema, Table, Tuple, Field, etc. Now the semantics of the problem (Customer, Address, Account, Balance, etc.) are largely irrelevant. Customer and Account simply identify specific instances of Table while Address and Balance identify specific instances of Field that are associated with a particular Table instance. And the message data packet values are simply values of Field that are related to a particular Tuple instance. Thus the Class Diagram for the DB Access looks something like:
          1  contains  *
[Table] --------------- [Field]
   | 1                     | 1
   |                       |
   | defined by            | contains
   |                       |
   | 0..*                  | 0..*
[Tuple] --------------- [Value]
          1   stores   *
This model is obviously an oversimplification but it conveys the general idea pretty well and it is surprisingly close to the mark. (The abstractions needed to support instantiation -- factory and specification classes -- may well introduce more logical complexity than modeling the relational data model itself.) The key idea here is that the classes needed to abstract the RDB paradigm are quite abstract and quite independent of the problem space semantics.
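For readers who prefer code to diagrams, the four classes and their relationships might be rendered roughly as follows. This is a sketch of my reading of the model, not a finished design; the method names `add_field` and `set` are hypothetical.

```python
from dataclasses import dataclass, field as dc_field


@dataclass
class Field:
    name: str


@dataclass
class Table:
    name: str
    fields: list = dc_field(default_factory=list)  # Table 1 contains * Field

    def add_field(self, name):
        new_field = Field(name)
        self.fields.append(new_field)
        return new_field


@dataclass
class Value:
    field: Field   # Field 1 contains 0..* Value
    data: object


@dataclass
class Tuple:
    table: Table   # Table 1 defined by 0..* Tuple
    values: list = dc_field(default_factory=list)  # Tuple 1 stores * Value

    def set(self, field, data):
        # A Tuple may only hold Values for Fields of its defining Table.
        if field not in self.table.fields:
            raise ValueError(f"{field.name} is not a field of {self.table.name}")
        self.values.append(Value(field, data))
```

Note that nothing here mentions Customer or Account; those appear only as the `name` of a Table instance, e.g. `Table("Customer")`.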
The actual mapping of the problem space to these abstractions will be the responsibility of other objects that instantiate and initialize these objects with appropriate relationships when an interface message is processed. [Hint: Table and Field are related to the schema and their instances are usually fixed throughout the execution. They provide a skeleton of definition for the actual data. Meanwhile Tuple and Value represent actual data so they will come into existence on an as-needed basis as messages are processed.] Object and relationship instantiation is easily accomplished through parametric polymorphism, as discussed under that post category. Thus most of the mapping intelligence will lie in that infrastructure of factories and specification objects.
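One possible shape for that factory-and-specification infrastructure is sketched below. It is self-contained, so the four abstractions are reduced to bare named tuples; the `SPECS` layout, the message ID, and the `TupleFactory` class are hypothetical. The point is only that the mapping is driven by specification data rather than by problem-specific code.

```python
from collections import namedtuple

Field = namedtuple("Field", "name")
Table = namedtuple("Table", "name fields")   # fields: dict of name -> Field
Value = namedtuple("Value", "field data")
Tuple = namedtuple("Tuple", "table values")

# Hypothetical specification data: message ID -> target table and the
# mapping from data packet keys to field names.
SPECS = {
    "STORE_CUSTOMER": {
        "table": "Customer",
        "fields": {"name": "Name", "addr": "Address"},
    },
}


class TupleFactory:
    def __init__(self, specs):
        self._specs = specs
        # Table and Field form the fixed schema skeleton, built once.
        self._tables = {}
        for spec in specs.values():
            table = self._tables.setdefault(spec["table"], Table(spec["table"], {}))
            for field_name in spec["fields"].values():
                table.fields.setdefault(field_name, Field(field_name))

    def create(self, message_id, packet):
        # Tuple and Value come into existence as each message is processed.
        spec = self._specs[message_id]
        table = self._tables[spec["table"]]
        values = [Value(table.fields[field_name], packet[key])
                  for key, field_name in spec["fields"].items()]
        return Tuple(table, values)
```

Supporting a new message then means adding an entry to the specification data, not writing a new class.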