Sunday, December 30, 2007

Data Partitioning

Those who have been reading my newsletters and white papers know I believe that complexity in IT systems and business processes is best thought of as the amount of disorder in the system in question.

The best way to control this disorder is to partition the system into a small number of subsets. Mathematically, partitioning can be shown to have a huge impact on the amount of disorder in a system, and therefore, its complexity. For those interested in more information on this topic, see my briefing papers on SIP (Simple Iterative Partitions) at the ObjectWatch web site.
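To give a rough feel for why partitioning matters, here is a small sketch. It does not use the SIP measure from the briefing papers. As a stand-in, it assumes that disorder grows with the number of pairs of elements that are allowed to interact. The function names and the sizes (12 elements, three subsets of 4) are made up for illustration.

```python
from typing import Sequence


def pairwise_links(n: int) -> int:
    """Number of distinct pairs among n elements (n choose 2)."""
    return n * (n - 1) // 2


def partitioned_links(sizes: Sequence[int]) -> int:
    """Pairs that remain when elements may only interact inside their own subset."""
    return sum(pairwise_links(size) for size in sizes)


# Hypothetical system: 12 elements, first unpartitioned, then split into 3 subsets of 4.
unpartitioned = pairwise_links(12)
partitioned = partitioned_links([4, 4, 4])
print(unpartitioned, partitioned)
```

Under this simplified proxy, the unpartitioned system has 66 possible pairings and the partitioned one has 18. The real measure may behave differently, but the direction of the effect is the point.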

Recently, one of my readers sent me an email asking about my philosophy on data sharing. He asked:
"I understand the general philosophy [of partitioning] but am not sure how that would work at the implementation level for data (say database). Does your solution mean that HR database tables can never participate in queries with say planning module database tables?"
In general, data sharing can be represented by the following diagram:

It should be pretty obvious from this diagram that data sharing severely breaks down the boundaries that logically separate applications. If application A decides to change how it stores data, application B will be broken.

Anytime we destabilize the boundaries between two applications, we add tremendously to the overall disorder of the system. Just a little bit of destabilization adds a huge amount to the disorder.

Some organizations attempt to deal with this by creating a common data access layer, but this doesn't significantly change the problem. If the implementation of A changes so that it no longer uses the common data access layer, application B is broken.

In order to ensure strong application boundaries (and therefore minimal system complexity), we take all possible steps to minimize dependencies between our applications. The best way to accomplish this is to insist that all interactions between applications go through intermediaries.

If application A wants to ask application B for some information, it makes the request through an intermediary that wraps A. That intermediary makes the request to an intermediary that wraps B. The intermediary on the requesting side is called an "envoy" and the intermediary on the receiving side is called a "guard".

How the envoy and guard send and receive requests is not the problem of either application, but the most common way of doing so is through the use of messages that are defined by the general family of standards known as web services.

Notice how this protects the two applications from implementation changes on the other side. If the implementation of A changes, A's envoy will likely need to change as well. But nothing on B's side needs to change.
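Here is a minimal sketch of the envoy and guard idea, assuming JSON messages and an in-process function call standing in for a real web service transport. The class names, the `get_department` operation, and the sample employee record are all hypothetical, chosen only to show the shape of the interaction.

```python
import json
from typing import Callable


class ApplicationB:
    """Owns its data. The storage layout is private to B."""

    def __init__(self) -> None:
        self._employees = {"e42": {"name": "Ada", "dept": "Planning"}}

    def department_of(self, employee_id: str) -> str:
        return self._employees[employee_id]["dept"]


class GuardB:
    """Receiving-side intermediary: the only way into B."""

    def __init__(self, app: ApplicationB) -> None:
        self._app = app

    def handle(self, raw_message: str) -> str:
        request = json.loads(raw_message)
        if request.get("operation") != "get_department":
            return json.dumps({"status": "error", "reason": "unknown operation"})
        dept = self._app.department_of(request["employee_id"])
        return json.dumps({"status": "ok", "department": dept})


class EnvoyA:
    """Requesting-side intermediary: the only way out of A."""

    def __init__(self, send: Callable[[str], str]) -> None:
        # 'send' stands in for the transport (for example, a web service call).
        self._send = send

    def get_department(self, employee_id: str) -> str:
        message = json.dumps(
            {"operation": "get_department", "employee_id": employee_id}
        )
        reply = json.loads(self._send(message))
        if reply["status"] != "ok":
            raise RuntimeError(reply["reason"])
        return reply["department"]


# Application A only ever talks to its own envoy.
guard = GuardB(ApplicationB())
envoy = EnvoyA(send=guard.handle)
print(envoy.get_department("e42"))
```

If B later renames its internal `dept` field or moves to a different database, only `ApplicationB` and possibly `GuardB` change. The message format, and therefore everything on A's side, stays the same.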

In this architecture, we have strong boundaries between the two applications, and these boundaries serve to control the overall disorder of the system.

But still, sometimes application A needs access to application B's data. What do we do then?

There are two possibilities. The first is that application A can ask application B for the data it needs (using, of course, intermediaries). The second is that a third application can be created that manages the shared knowledge of the organization. Let's call this application the KnowledgeManager. When any application makes changes to data that is considered part of this shared knowledge, that application is responsible for letting the KnowledgeManager know about those changes (using intermediaries, of course).
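The second possibility might look something like the sketch below. It assumes a simple key/value view of shared knowledge, JSON messages, and an in-process call in place of a real transport. Apart from the KnowledgeManager name itself, every class, operation, and key here is hypothetical.

```python
import json
from typing import Callable, Dict


class KnowledgeManager:
    """Holds the organization's shared knowledge. Storage is private."""

    def __init__(self) -> None:
        self._facts: Dict[str, str] = {}

    def record(self, key: str, value: str) -> None:
        self._facts[key] = value

    def lookup(self, key: str) -> str:
        return self._facts[key]


class KnowledgeManagerGuard:
    """Receiving-side intermediary for the KnowledgeManager."""

    def __init__(self, manager: KnowledgeManager) -> None:
        self._manager = manager

    def handle(self, raw_message: str) -> str:
        msg = json.loads(raw_message)
        if msg["operation"] == "publish":
            self._manager.record(msg["key"], msg["value"])
            return json.dumps({"status": "ok"})
        if msg["operation"] == "lookup":
            try:
                value = self._manager.lookup(msg["key"])
            except KeyError:
                return json.dumps({"status": "error", "reason": "unknown key"})
            return json.dumps({"status": "ok", "value": value})
        return json.dumps({"status": "error", "reason": "unknown operation"})


class KnowledgeEnvoy:
    """Requesting-side intermediary that each application owns."""

    def __init__(self, send: Callable[[str], str]) -> None:
        self._send = send

    def publish(self, key: str, value: str) -> None:
        self._call({"operation": "publish", "key": key, "value": value})

    def lookup(self, key: str) -> str:
        return self._call({"operation": "lookup", "key": key})["value"]

    def _call(self, message: dict) -> dict:
        reply = json.loads(self._send(json.dumps(message)))
        if reply["status"] != "ok":
            raise RuntimeError(reply["reason"])
        return reply


km_guard = KnowledgeManagerGuard(KnowledgeManager())
hr_envoy = KnowledgeEnvoy(send=km_guard.handle)        # used by the HR application
planning_envoy = KnowledgeEnvoy(send=km_guard.handle)  # used by the planning application

# HR changes shared data and lets the KnowledgeManager know.
hr_envoy.publish("employee/e42/department", "Planning")
# Planning reads it without ever touching HR's tables.
print(planning_envoy.lookup("employee/e42/department"))
```

Neither application knows how the other stores anything. Each only knows the messages its own envoy sends.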

Some may argue that this architecture is inefficient. Why not just go directly to the database for what you want? The answer has to do with our ultimate goals. I believe that our most important goal is controlling IT complexity. Controlling IT complexity is far more important than a few milliseconds of performance.

Complexity is expensive. Disk access is cheap.

Friday, December 28, 2007

Cures for Complexity Lacking

A recent IT Business Canada article discussed four approaches used by CIOs to reduce architectural complexity. According to the article, these approaches are:
  • Business Process-Driven Architecture
  • Good Governance
  • Defaulting to Simplicity
  • Continuous Improvement

These ideas all sound good, and they might be. But we can't tell what, if anything, they have to do with reducing complexity, because the article never defines complexity. There is no model, either explicit or implicit, for what complexity looks like in an enterprise architecture, how we might measure it, or how we would know we had eliminated it. How can we eliminate something unless we know what that something is?

Imagine this dialogue:

Sam: My house has way too much wazine.

Dan: What's wazine?

Sam: Wazine is wazine. Everybody knows what wazine is.

Dan: How much wazine is in your house?

Sam: I have no idea.

Dan: How much is a good amount?

Sam: Beats me.

Dan: How do you measure wazine?

Sam: Don't know.

Dan: If you don't know what wazine is and you don't know how much of it you want and you don't know how to measure it, how are you going to get rid of it?

Sam: Simple. I'm going to exercise more, eat a better diet, call my mother twice a week, and balance my checkbook every month.

Dan: That all sounds great. What does that have to do with wazine?

Sam: What's wazine?

Silly, isn't it? And using a word that sounds impressive, like "complexity", doesn't make the conversation any less silly.

Now don't get me wrong. I am all for controlling complexity in enterprise architectures and IT systems. I just believe that you can't eliminate something until you understand what that something is.