VIEW Based Contracts for your RDBMS

We are building our application on a pretty old school standard stack. It consists of a webapplication on top of a RDBMS.

Yesterday I set out to fix a bug and after chasing the bug in the application layer for a long time, I ended up fixing it in the database, which was fairly easy, did not require any deployment appart from the changes to the database, the benefits of this are; it works right away and it is fairly easy to test.

This lead me to think it was about time I wrote a post about this architectural pattern, with which we have had much success.

Our RDBMS consists of your standard RDBMS data model on 3rd. normal form. On top of this we have encapsulated the database in views.

And that is pretty much it, but let me demonstrate with a basic example.

model

Here we have a basic database model consisting of two tables, one containing address data and the other other table with zipcode data.

CREATE TABLE address (
    street TEXT NOT NULL,
    no TEXT NOT NULL,
    floor TEXT NOTNULL,
    door TEXT NOT NULL,
    zipcode TEXT NOT NULL,
    country TEXT NOT NULL,
    FOREIGN KEY (zipcode) REFERENCES zipcode (zipcode)

);

CREATE TABLE zipcode (
    zipcode TEXT PRIMARY KEY,
    city TEXT NOT NULL
);

The example was implemented on a SQLite database for availability if you want try out the provided examples.

The basic concept is to use the data model directly, but only accessing using views. So we add our first view.

CREATE VIEW zipcode_v1 AS SELECT * FROM zipcode;

Do note the naming convention of keeping the name of the encapsulated table, appending a: “_v1”, more on this later.

zipcode_v1

This is one of the easy ones and currently it just seems silly and like a lot of work, and one of the concerns I have heard to this approach was:

“but our database will be full of objects”

Yes this will at least double the amount of objects in your database, but the benefits of this approach outweigh the maintenance. and the objects serve a purpose.

What could the use case be for this view. Well it could be used to populate a dropdown with zipcodes in web application or similar.

The next natural step to the one-to-one view implementation is the more complex datatypes and hence queries. So of you want to query complete addresses, including the city you would have to do something like:

SELECT
    a.street AS street,
    a.no AS no,
    a.floor AS floor,
    a.door AS door,
    a.country AS country,
    z.zipcode AS zipcode,
    z.city AS city
FROM zipcode z, address a
WHERE z.zipcode = a.zipcode;

What if we could just do:

SELECT
    street,
    no,
    floor,
    door,
    country,
    zipcode,
    city
FROM address_v1;

Well we can, just create a view to help us out.

CREATE VIEW fulladdress_v1 AS SELECT
    a.street AS street,
    a.no AS no,
    a.floor AS floor,
    a.door AS door,
    a.country AS country,
    z.zipcode AS zipcode,
    z.city AS city
FROM zipcode z, address a
WHERE z.zipcode = a.zipcode;

And our database now looks like this:

address_v1

And we are slowly assembling contracts with our RDBMS.

As you can read the naming of the views, the one-to-one implementations reuse the name of the encapsulated table, whereas the the views encapsulating more than one view are named by intent or purpose indicating what they aim to serve.

But is that fast enough?

In a modern RDBMS I would expect the performance hit to be insignificant, I do not however not have the numbers to back this up, perhaps another blog post should shed some light on this. Do note that this is an architectural approach, which hold other benefits and it is not aimed at high performance as such, but maintainability and simplicity.

So now we have full encapsulation of our model.

  1. We can change the model as long as the contract is kept intact

Next up is a somewhat awful and bad example, but imagine that somebody wants to change the model. We do not want to sound too american, so zipcode has to be exchanged for postal code.

PRAGMA foreign_keys=off;
BEGIN TRANSACTION;

DROP VIEW fulladdress_v1;

ALTER TABLE address RENAME TO _address_old;

CREATE TABLE address (
    street TEXT NOT NULL,
    no TEXT NOT NULL,
    floor TEXT NOTNULL,
    door TEXT NOT NULL,
    postal_code TEXT NOT NULL,
    country TEXT NOT NULL,
    FOREIGN KEY (postal_code) REFERENCES zipcode (zipcode)
);

INSERT INTO address (street, no, floor, door, postal_code, country)
    SELECT (street, no, floor, door, zipcode, country)
    FROM _address_old;

CREATE VIEW fulladdress_v1 AS SELECT
    a.street AS street,
    a.no AS no,
    a.floor AS floor,
    a.door AS door,
    a.country AS country,
    z.zipcode AS zipcode,
    z.city AS city
FROM zipcode z, address a
WHERE z.zipcode = a.postal_code;

COMMIT;

PRAGMA foreign_keys=on;

Renaming a column in SQLite is bit cumbersome, please bear with me, selecting another implementation than SQLite would have made the above example shorter.

  1. First we drop the view
  2. We rename the old table
  3. We create the new table with the required column name change
  4. We copy the data from the old table to the new table
  5. We re-create the view encapsulating the change and keeping our contract

address_v1-2

If we were to follow through with the renaming, the zipcode table involved via the foreign key would have to be renamed aswell I have not included this is the example, but the pattern is the same and the benefit is the same, it can be renamed, but the encapsulation keeps the contract intact and our applications using the database will not have to be changed.

This can be quite useful in another use case, considering you have an existing model, which you want to expose to some domain specific area. You then can keep your original model and expose the data in views, where data are presented following the naming of the domain specific area.

All in all everything looks honky-dory. But there are some pitfalls, with the whole View Based Contract approach, to name the most prominent ones:

  1. Naming
  2. Transparency
  3. Maintenance
  4. Information leak

So let us go over these.

Naming is hard, for the one-to-one views you are at the mercy of your model, which is okay, but if you already have bad naming in your model this will be reflected in the naming of your views, so one could decide for eliminating bad naming in the encapsulation layer, which brings us to transparency.

For transparency it is recommended to somewhat keep the names from the original model, since the data model, will often be embedded in the users of your database. Do note we implemented the views on top of an existing data model, so people often resorted to relating to the actual model and not the abstraction/encapsulation – it would be nice if we could stick to the abstractions instead of the implementation for some discussions 🙂

Naming for intention is harder, but resembles a proper abstraction more than the one-to-one mapping. We started using the views for our services to begin with, it did however propagate into our batch components, where it proved quite useful.

We would observe the batch components becomming slimmer, because the decision logic was moved into the database contracts. A script for deleting records would simple just work on a view, where all the records qualified for deletion would be available for processing and the records not qualified for deletion would never be presented to the executing application by the view.

When it comes to maintenance and life-cycle, we delete views, when these get obsolete. This is especially for intent based views and we can see that we have several revisions: “_v1”, “_v2” and “_v3”. When we can see that no applications use “_v1” anymore we simply delete it.

The other example of deletion is when a view implements business rules, which no longer apply and hence should not be available.

As described in the beginning of the article, we could do views like the following:

CREATE VIEW zipcode_v1 AS SELECT * FROM zipcode;

Do note that this approach opens for extensions to the model, being exposed via the contract, if you do not want to have this automatical exposure, your views should be restricted in their implementation only offering the fields you have agreed to in your contract.

The bug I mentioned in the beginning of the article was somewhat related to this sort of information leak, a field, was not propably handled by the encapsulation and hence exposed.

A brief example could be, if our database was extended with information on who inserted an actual record, a created by field so to speak, so instead of doing:

CREATE VIEW zipcode_v1 AS SELECT * FROM zipcode;

We should do:

CREATE VIEW public_zipcode_v1 AS SELECT zipcode, city FROM zipcode;

And you could have a similar view for your internal application defined as follows:

CREATE VIEW public_zipcode_v1 AS SELECT zipcode, city FROM zipcode;

So you now have two views named, with intention and not leaking information beyond our contract. There is of course still the issue of object referencing, since our views do as such not restrict access across object ownership/relations, that is a topic for another blog post, but the approach does expose only the data you are interested in serving via your application and not necessarily your full model and dataset.

But we are using an ORM?

Well point your ORM schema/code generator at the views instead of the actual model. Well this also has some pitfalls. Since not all RDBMS support writable views, so if you are a heavy ORM user with a database that does not support writable views, you might not have much luck with this contract approach. A combination with stored procedures or similar could be the way to go, which reminds me, that I have to mention Haktan Bulut my former manager who introduced me to this approach in an architectural design specification.

The concept is pretty simple, but it seems like a lot of work, I have however come to the conclusion that it is saving us quite a lot of work when it is established and as long as our contracts are sane our applications can be trimmed down:

  • It is easier to understand the interaction between the components using a contractual approach
  • We not expose unnecessary data to our applications

It requires time to encapsulate everything and it takes some effort to maintain, but putting changes to data exposure under control is a good thing in my book, since we always have to think about what we do when we extend or restrict the contracts and last but not least, we can optimize the model without breaking our applications.

Advertisements
VIEW Based Contracts for your RDBMS

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s