The implementation of the DB Access subsystem abstracts the database view of the data and converts back and forth between that view and the view in the application solution. The interface to that subsystem is defined in terms of the problem solution's needs for persisting knowledge. However, for the DB Access subsystem to be reusable across applications, the interface also has to be syntactically generic. That means that the application's semantics must be represented in an abstract manner in the subsystem interface.
SQL is an example of such a semantic abstraction providing a generic syntax. One can express any valid read or update query against any arbitrary RDB using SQL because the identity of tables, tuples, and fields has been abstracted through name strings and other conventions. All of the local data semantics is expressed in terms of names, which allows the SQL syntax to be quite generic. Unfortunately, SQL abstracts the RDB view of the data, so it is not appropriate for most applications outside the province of CRUD/USER processing. In designing an interface for the DB Access subsystem we need to provide basically the same sort of abstraction as SQL, but at a higher level that is more suitable for problem solution views.
We start by employing a pure message-based data transfer interface where each message consists of the tuple {message ID, data packet}. The message ID is used for high-level dispatch in the receiving subsystem. (Typically the message ID corresponds to an interface method.) The data packet is a collection of by-value data elements. The data packet may have embedded identity elements that can be used to decode it. Commonly the data packet is, itself, an object that may have associated methods for extracting the data. For example, if the data is embedded in an XML string, then the data packet object may have methods for parsing the string. (Note the similarity with the Data Access Objects (DAOs) commonly used in CRUD/USER layered model interfaces.)
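To make the {message ID, data packet} structure concrete, here is a minimal Python sketch. Everything in it is hypothetical: the MessageId values, the DataPacket class with its XML-extraction methods, and the DbAccessInterface dispatcher are illustrative names, not part of any particular framework. The receiver dispatches on the message ID alone; only the selected handler looks inside the packet, using the element tags as embedded identity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, List


class MessageId(Enum):
    # Hypothetical message IDs; each one corresponds to an interface method.
    SAVE_ACCOUNT = auto()
    FETCH_ACCOUNT = auto()


class DataPacket:
    """By-value data carried in an XML string, plus methods to extract it."""

    def __init__(self, xml_text: str) -> None:
        self._root = ET.fromstring(xml_text)

    def value(self, name: str) -> str:
        # Embedded identity: each element's tag names the value it carries.
        element = self._root.find(name)
        if element is None or element.text is None:
            raise KeyError(name)
        return element.text

    def names(self) -> List[str]:
        return [child.tag for child in self._root]


@dataclass(frozen=True)
class Message:
    message_id: MessageId
    packet: DataPacket


class DbAccessInterface:
    """Receiving side: high-level dispatch on the message ID only."""

    def __init__(self) -> None:
        self._handlers: Dict[MessageId, Callable[[DataPacket], str]] = {
            MessageId.SAVE_ACCOUNT: self._save_account,
            MessageId.FETCH_ACCOUNT: self._fetch_account,
        }

    def receive(self, message: Message) -> str:
        return self._handlers[message.message_id](message.packet)

    # Handlers return a description instead of touching a real database.
    def _save_account(self, packet: DataPacket) -> str:
        return f"save {packet.value('number')} balance={packet.value('balance')}"

    def _fetch_account(self, packet: DataPacket) -> str:
        return f"fetch {packet.value('number')}"


msg = Message(
    MessageId.SAVE_ACCOUNT,
    DataPacket("<packet><number>42-17</number><balance>100.00</balance></packet>"),
)
print(DbAccessInterface().receive(msg))  # save 42-17 balance=100.00
```

The dictionary of handlers plays the role of the interface methods; adding a message ID means adding one entry and one handler.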
Generally the granularity of the interface (i.e., the number of different message IDs) will be determined by the application (or group of applications) using the persistence mechanism. This can be fairly arbitrary, but there is one overriding constraint: whatever granularity is used must be unambiguous in terms of the relational data model. Basically, all this means is that one must be able to construct valid queries for whatever persistence mechanism is employed. While this is usually trivial to ensure, it does in principle require some negotiation between the implementors of the two domains.
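One way to make that negotiation concrete, assuming the DB Access side knows its relational schema, is a startup check that every message can be turned into an unambiguous query. The SCHEMA dictionary, the MessageSpec record, and the rule that a message's key must contain the table's primary key are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical schema known to the DB Access side:
# table name -> (primary key columns, all columns)
SCHEMA: Dict[str, Tuple[Tuple[str, ...], Tuple[str, ...]]] = {
    "account": (("acct_no",), ("acct_no", "owner_id", "balance")),
    "customer": (("cust_id",), ("cust_id", "name")),
}


@dataclass(frozen=True)
class MessageSpec:
    message_id: str
    table: str
    key: Tuple[str, ...]     # columns used to select the target tuple
    fields: Tuple[str, ...]  # columns carried in the data packet


def ambiguities(specs: List[MessageSpec]) -> List[str]:
    """Return the problems found; an empty list means every message maps
    to a well-defined query against SCHEMA."""
    problems: List[str] = []
    for spec in specs:
        if spec.table not in SCHEMA:
            problems.append(f"{spec.message_id}: unknown table {spec.table}")
            continue
        primary_key, columns = SCHEMA[spec.table]
        for col in spec.key + spec.fields:
            if col not in columns:
                problems.append(f"{spec.message_id}: unknown column {spec.table}.{col}")
        # A key that does not contain the primary key may match many tuples,
        # so a message using it would not be unambiguous.
        if not set(primary_key) <= set(spec.key):
            problems.append(f"{spec.message_id}: key {spec.key} does not identify one tuple")
    return problems


specs = [
    MessageSpec("SAVE_BALANCE", "account", ("acct_no",), ("balance",)),
    MessageSpec("RENAME_OWNER", "customer", ("name",), ("name",)),
]
for problem in ambiguities(specs):
    print(problem)  # RENAME_OWNER: key ('name',) does not identify one tuple
```

Running such a check at startup turns a mismatch between the two sides into an immediate, readable failure rather than a bad query later.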
[For example, it is fairly common for the same data attributes to show up in objects in different subsystems of a large application. That's because quite different behaviors, partitioned into different subsystems, can operate on the same data. That redundancy is usually a no-no in an RDB, so a 1:* relationship exists between the RDB value and the application values. Two different messages from different subsystems could then access the same RDB value, which invites synchronization problems. As it happens, the problem solution developers face the same problems in this situation that the DBA faces when an RDB is deliberately denormalized. In solving those problems (e.g., modifying the data in only one subsystem and synchronizing that update with the other subsystems), the solution developer will usually also remove any ambiguities in defining the DB Access interface.]
Fortunately, we have more latitude in how we describe identity in a DB Access subsystem interface than one has in SQL. Identity can be expressed as name strings, index numbers, enumeration values, or even positionally in the data packet. However, when using the more exotic forms of identity, one needs some mechanism for synchronizing the semantics of that identity on both sides of the interface. Since the subsystems on each side of the interface each have their own view of the message data packet, each needs its own mapping function for identity.
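The following sketch, assuming a hypothetical AccountField enumeration that both subsystems share, shows one packet carried with each of the four forms of identity. Index and positional identity carry no meaning of their own; they decode correctly only because both sides agree on the enumeration, which is the synchronizing mechanism in this example.

```python
from enum import IntEnum
from typing import Any, Dict, Sequence


class AccountField(IntEnum):
    """Hypothetical shared identity for the elements of one data packet.
    Both subsystems depend on it, so its integer values can double as
    indices and positions."""
    NUMBER = 0
    OWNER = 1
    BALANCE = 2


# The same packet expressed with four different forms of identity.
by_name: Dict[str, Any] = {"number": "42-17", "owner": "Smith", "balance": 100.0}
by_index: Dict[int, Any] = {0: "42-17", 1: "Smith", 2: 100.0}
by_enum: Dict[AccountField, Any] = {
    AccountField.NUMBER: "42-17",
    AccountField.OWNER: "Smith",
    AccountField.BALANCE: 100.0,
}
by_position: Sequence[Any] = ("42-17", "Smith", 100.0)


def from_names(packet: Dict[str, Any]) -> Dict[AccountField, Any]:
    return {AccountField[name.upper()]: value for name, value in packet.items()}


def from_indices(packet: Dict[int, Any]) -> Dict[AccountField, Any]:
    return {AccountField(index): value for index, value in packet.items()}


def from_positions(packet: Sequence[Any]) -> Dict[AccountField, Any]:
    # Position carries no identity of its own; this works only because both
    # sides agree that position i holds AccountField(i).
    return {AccountField(i): value for i, value in enumerate(packet)}


assert from_names(by_name) == from_indices(by_index) == from_positions(by_position) == by_enum
```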
The data packet definition (at the level of value descriptions) is shared by both subsystems. But on each side of the interface the data packet must be encoded and/or decoded according to a mapping function that is unique to the semantic context of that subsystem. On the problem solution side, data values must be mapped into a set of objects' attributes, while on the DB Access side they must be mapped into datasets (tables' tuples). Those mapping functions are coordinated through a common identity for the data packet elements. Thus we actually need two different mappings of the same identity and data packet values. We also need infrastructure in the interface to provide those mappings.

              +--------------------- Identity ---------------------+
              |                                                    |
              V                                                    V
       problem solution       ||     Interface    ||           DB Access
       mapping function       ||                  ||       mapping function
              |               ||                  ||               |
              V               ||                  ||               V
          [Object] <-------------> data packet <-------------> [Tuple]
              decode   encode ||                  || encode   decode
                              ||                  ||
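Here is a minimal sketch of the two mapping functions described above, assuming a hypothetical Field enumeration as the common identity, an Account class on the problem solution side, and a hypothetical column order for an account table on the DB Access side. The attribute names and the column order deliberately differ; only the shared identity ties the two views together.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Tuple


class Field(Enum):
    """Hypothetical common identity for the data packet elements."""
    NUMBER = "number"
    OWNER = "owner"
    BALANCE = "balance"


Packet = Dict[Field, Any]


# ---- Problem solution side: [Object] <-> data packet ----
@dataclass
class Account:
    acct_id: str
    holder: str
    funds: float


def encode_object(account: Account) -> Packet:
    return {Field.NUMBER: account.acct_id,
            Field.OWNER: account.holder,
            Field.BALANCE: account.funds}


def decode_to_object(packet: Packet) -> Account:
    return Account(packet[Field.NUMBER], packet[Field.OWNER], packet[Field.BALANCE])


# ---- DB Access side: data packet <-> [Tuple] ----
# Hypothetical column order of an account table.
COLUMNS: Tuple[Field, ...] = (Field.OWNER, Field.NUMBER, Field.BALANCE)


def encode_tuple(row: Tuple[Any, ...]) -> Packet:
    return dict(zip(COLUMNS, row))


def decode_to_tuple(packet: Packet) -> Tuple[Any, ...]:
    return tuple(packet[field] for field in COLUMNS)


acct = Account("42-17", "Smith", 100.0)
row = decode_to_tuple(encode_object(acct))
print(row)  # ('Smith', '42-17', 100.0)
assert decode_to_object(encode_tuple(row)) == acct
```

Neither side imports the other's mapping; each depends only on Field and the packet shape.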
Enter stage left, smiling: parametric polymorphism. We can define those mappings in terms of configuration data the same way the window-builder tools use a resource file to define a GUI. In effect we can express the mappings as a table look-up where the table is initialized from external configuration data (usually at startup). [This is where the ability to express identity via indices or position comes in handy.]
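A minimal sketch of the table look-up idea, assuming a hypothetical JSON configuration format with table, columns, and key entries per message ID. The MappingTable class contains no table or column names of its own; the same code handles any message the configuration describes, so the configuration data supplies the variation. The packet uses positional identity, so element i maps to the i-th configured column.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical external configuration, normally read from a file at startup.
CONFIG_TEXT = """
{
  "SAVE_ACCOUNT": {
    "table": "account",
    "columns": ["acct_no", "owner", "balance"],
    "key": "acct_no"
  }
}
"""


class MappingTable:
    """Generic DB Access mapping: all message-specific knowledge comes from
    configuration data, so this code never names a table or column."""

    def __init__(self, config: Dict[str, Dict[str, Any]]) -> None:
        self._config = config

    @classmethod
    def from_json(cls, text: str) -> "MappingTable":
        return cls(json.loads(text))

    def update_statement(self, message_id: str,
                         packet: List[Any]) -> Tuple[str, List[Any]]:
        """Build a parameterized UPDATE from a positional data packet."""
        entry = self._config[message_id]
        columns: List[str] = entry["columns"]
        key: str = entry["key"]
        values = dict(zip(columns, packet))  # position i -> columns[i]
        non_key = [c for c in columns if c != key]
        # Table and column names come from trusted configuration; the data
        # values are passed separately as bound parameters.
        assignments = ", ".join(f"{c} = ?" for c in non_key)
        sql = f"UPDATE {entry['table']} SET {assignments} WHERE {key} = ?"
        return sql, [values[c] for c in non_key] + [values[key]]


table = MappingTable.from_json(CONFIG_TEXT)
sql, params = table.update_statement("SAVE_ACCOUNT", ["42-17", "Smith", 100.0])
print(sql)     # UPDATE account SET owner = ?, balance = ? WHERE acct_no = ?
print(params)  # ['Smith', 100.0, '42-17']
```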
Note that this is actually very similar to the way the "canned" layered model infrastructures work for CRUD/USER pipeline applications. The mechanisms for bridging the boundary are somewhat different (e.g., parametric polymorphism vs. inheritance, messages vs. DAO Facades, etc.), but both do fundamentally the same identity-based mappings. So why go to this trouble? The answer lies in the nature of the mapping. This approach is much more versatile because it can easily accommodate mappings that are not 1:1 across the boundary. More important, it provides much better decoupling of the persistence mechanisms from the problem solution. Suppose one decides to completely redo the database and ends up with a different set of names in the new version, even though the tables remain nearly the same. If the access had been hard-coded in embedded SQL statements, every application containing those statements would have to be revised. Here one can accommodate the change simply by providing a different configuration file for the identity mappings on the DB Access side of the interface; the code does not have to be touched on either side.
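To illustrate the schema-rename scenario, the sketch below uses Python's built-in sqlite3 module with two in-memory databases that hold the same data under different table and column names. The OLD_CONFIG and NEW_CONFIG dictionaries and the fetch_value handler are hypothetical; the point is that the handler and the message it receives are identical for both versions, and only the configuration differs.

```python
import sqlite3
from typing import Any, Dict

# Hypothetical identity mappings for two versions of the same database.
OLD_CONFIG = {"FETCH_BALANCE": {"table": "acct", "key": "acct_no", "value": "bal"}}
NEW_CONFIG = {"FETCH_BALANCE": {"table": "accounts", "key": "account_number",
                                "value": "current_balance"}}


def fetch_value(conn: sqlite3.Connection, config: Dict[str, Dict[str, str]],
                message_id: str, key_value: Any) -> Any:
    """Generic DB Access handler; the same code serves both schema versions."""
    entry = config[message_id]
    sql = f"SELECT {entry['value']} FROM {entry['table']} WHERE {entry['key']} = ?"
    row = conn.execute(sql, (key_value,)).fetchone()
    return None if row is None else row[0]


def make_db(ddl: str, insert: str) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.execute(insert, ("42-17", 100.0))
    return conn


old_db = make_db("CREATE TABLE acct (acct_no TEXT PRIMARY KEY, bal REAL)",
                 "INSERT INTO acct VALUES (?, ?)")
new_db = make_db("CREATE TABLE accounts (account_number TEXT PRIMARY KEY, "
                 "current_balance REAL)",
                 "INSERT INTO accounts VALUES (?, ?)")

# The problem solution sends the identical message to either version.
print(fetch_value(old_db, OLD_CONFIG, "FETCH_BALANCE", "42-17"))  # 100.0
print(fetch_value(new_db, NEW_CONFIG, "FETCH_BALANCE", "42-17"))  # 100.0
```

In practice the two dictionaries would live in separate configuration files, and only the file loaded at startup would change.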
As a practical matter, one rarely needs a formal mapping on the problem solution side. Once the message IDs are defined, all one really needs is the data packet definitions, which will already be defined in the DB Access subsystem interface. One can then encode the messages directly on the problem solution side, much the same way one might encode embedded SQL statements. One can do similar direct decoding in the problem solution's own input interface (remember, each subsystem has its own interface). So the external configuration data approach is used primarily on the DB Access side. That requires some extra infrastructure code in the DB Access subsystem to do the look-ups. But since the DB Access subsystem itself is reusable, that is a one-time investment. All things considered, once one is out of the CRUD/USER realm, the benefits of decoupling and the ability to deal with 1:* or *:* mappings between the RDB and problem solution views far outweigh the extra work.
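A sketch of that division of labor, again using sqlite3 with hypothetical names throughout. The problem solution side encodes the SAVE_CUSTOMER message directly with no configuration, while the DB Access side uses a configuration entry that spreads one packet across two tables, a mapping that is not 1:1 between packet and tuple.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

# Hypothetical DB Access configuration: one message's packet is spread over
# two tables, so the packet-to-tuple mapping is 1:* rather than 1:1.
CONFIG: Dict[str, List[Dict[str, Any]]] = {
    "SAVE_CUSTOMER": [
        {"table": "customer", "key_column": "cust_id", "key_field": "id",
         "fields": {"name": "full_name"}},
        {"table": "address", "key_column": "cust_id", "key_field": "id",
         "fields": {"city": "city"}},
    ]
}


def save_customer_message(cust_id: int, name: str,
                          city: str) -> Tuple[str, Dict[str, Any]]:
    # Problem solution side: no configuration, just encode the agreed packet.
    return "SAVE_CUSTOMER", {"id": cust_id, "name": name, "city": city}


def handle(conn: sqlite3.Connection, message_id: str,
           packet: Dict[str, Any]) -> None:
    # DB Access side: configuration look-ups drive the writes.
    for target in CONFIG[message_id]:
        fields: Dict[str, str] = target["fields"]  # packet element -> column
        columns = [target["key_column"], *fields.values()]
        params = [packet[target["key_field"]], *(packet[f] for f in fields)]
        placeholders = ", ".join("?" for _ in columns)
        # INSERT OR REPLACE (SQLite-specific) keeps the sketch short.
        conn.execute(f"INSERT OR REPLACE INTO {target['table']} "
                     f"({', '.join(columns)}) VALUES ({placeholders})", params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("CREATE TABLE address (cust_id INTEGER PRIMARY KEY, city TEXT)")
handle(conn, *save_customer_message(7, "Smith", "Boston"))
print(conn.execute("SELECT full_name FROM customer WHERE cust_id = 7").fetchone())  # ('Smith',)
print(conn.execute("SELECT city FROM address WHERE cust_id = 7").fetchone())  # ('Boston',)
```

Only the DB Access side knows that the packet lands in two tables; the problem solution code would not change if the configuration later merged them into one.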