Wednesday, April 15, 2009

Factors Driving System Complexity

IT systems are failing at an alarming rate, and the rate of failure is increasing. Approaches that we have used in the past to successfully deliver IT systems are no longer working.

Something has changed: the complexity of the systems. It should be no surprise to anybody in IT that the complexity of the systems we are being asked to build has dramatically increased. The only surprise is that we have not adapted the processes we use to build IT systems to account for this increase.

Processes that work well for developing simple IT systems do not scale up once the complexity of the systems increases beyond a certain point, a point that we have long passed.

I do not advocate throwing out what we know about building simple systems. Approaches such as Agile Development are great, as long as the complexity of what we are building is manageable. The approach that I advocate is learning how to break large, complex systems that we do not know how to build into smaller, simpler systems that we do know how to build.

I call this process partitioning. It is important to remember that the goal is not just to create smaller systems, but smaller systems that are as simple as possible. The specific process that I advocate for doing this is called SIP, for Simple Iterative Partitions. SIP is designed from the start to generate systems that are both small and simple.
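A deliberately crude way to see why smaller pieces can be simpler: assume (this is an illustrative assumption, not a complexity model from this post) that complexity tracks the number of potential pairwise interactions. Twelve functions in one system then have 66 possible pairings, while three subsystems of four functions have 6 pairings each, plus 3 possible connections between the subsystems themselves, for 21 in total. The function names are hypothetical.

```python
from math import comb


def pairwise_interactions(n: int) -> int:
    """Potential pairwise interactions among n elements (n choose 2)."""
    return comb(n, 2)


def partitioned_interactions(sizes: list[int]) -> int:
    """Crude model: interactions inside each subsystem, plus one potential
    connection between every pair of subsystems (each treated as a unit)."""
    internal = sum(pairwise_interactions(s) for s in sizes)
    external = pairwise_interactions(len(sizes))
    return internal + external


monolith = pairwise_interactions(12)              # one system of 12 functions
partitioned = partitioned_interactions([4, 4, 4])  # three subsystems of 4
print(monolith, partitioned)
```

The model ignores what the functions actually do; it only shows why the way functions are grouped matters so much more than the raw number of functions.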

While the goal of partitioning is to create smaller, simple systems, the process itself is neither small nor simple. Planning for simplicity paradoxically adds complexity to the planning process, but it pays huge dividends later on. The extra work involved in delivering complex systems is far greater than the extra work involved in planning for simplicity from the beginning.

Partitioning is a science, like medicine. It must be led by people who understand the nature of the disease and have been trained in how to manage it. The training is important because there are many things that can go wrong. When things do go wrong, complexity is not adequately removed or, in some cases, is made worse.

IT failures due to poor partitioning are epidemic in our industry. For example, there are many failed service-oriented architectures (SOAs). Behind almost all of these SOA failures is a partitioning failure.

I have been studying IT failures for a number of years now. I have concluded that partitioning failures are the primary cause of most IT failures. The bigger the failure, the more likely it is that incorrect partitioning is the root cause. The symptom of partitioning failure is always the same: the project seems swamped by complexity. If you have worked on a project whose complexity killed the project, then you have experienced firsthand the effects of partitioning failure.

I have seen so many partitioning failures that I have started to recognize underlying patterns or categories of partitioning failures. At this point, I have identified eleven categories, and the list is growing as I analyze more failures. The eleven categories are as follows:

  • Delayed partitioning: The partitioning did not begin early enough in the project life cycle. It should be mostly complete before the business architecture is begun.
  • Incomplete decomposition: The decompositional analysis of functions did not go far enough, resulting in too coarse a granularity of the functions being partitioned.
  • Excessive decomposition: The decompositional analysis of functions went too far, resulting in too fine a granularity of the functions being partitioned.
  • Bloated subsystems: Too many functions have been assigned to one or more subsystems.
  • Scant subsystems: Too few functions have been assigned to one or more subsystems.
  • Incorrect assignment: Functions that do not belong together have been assigned to the same subsystem, or functions that do belong together have been assigned to different subsystems.
  • Duplicated capabilities: Functions have been duplicated across different subsystems.
  • Unnecessary capabilities: Functions have been partitioned that should not have been included in the first place.
  • Technical partition mismatch: The technical partitioning does not match the business partitioning, a common problem in purchased systems and service-oriented architectures.
  • Inadequate depth: The technical partitioning does not extend far enough. In SOAs, this problem often manifests itself as multiple services sharing common data.
  • Boundary decay: The boundaries between subsystems in the partition are not strong enough, and functionality slips back and forth between subsystems.

As you can see, there are many opportunities for errors in complexity management. And complexity (and therefore IT failure) is highly sensitive to partitioning errors, much more so than to implementation errors. It is critical, therefore, to have a well-defined process to guide you and check your results along the way.
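Some of these checks can be partly automated. The following is a minimal sketch of a hypothetical helper, not part of SIP as published. It assumes a partition is recorded as a mapping from subsystem names to sets of business functions, that synergy judgments are recorded as pairs of functions, and that data usage per subsystem is known. It covers only a few of the categories listed above (duplicated capabilities, bloated and scant subsystems, synergistic functions split across subsystems, and data shared between subsystems), and its size thresholds are arbitrary placeholders.

```python
from collections import defaultdict


def check_partition(partition, synergies, data_usage=None, min_size=2, max_size=8):
    """Return warnings for a few partitioning-failure categories.

    partition:  dict of subsystem name -> set of business function names
    synergies:  iterable of (function_a, function_b) pairs judged mutually dependent
    data_usage: optional dict of subsystem name -> set of data stores it uses
    min_size, max_size: hypothetical thresholds, not values from the post
    """
    found = []

    # Record which subsystem(s) each function was assigned to.
    homes = defaultdict(set)
    for subsystem, functions in partition.items():
        for f in functions:
            homes[f].add(subsystem)

    # Duplicated capabilities: one function assigned to several subsystems.
    for f, subs in sorted(homes.items()):
        if len(subs) > 1:
            found.append(f"duplicated capability: {f} in {sorted(subs)}")

    # Bloated and scant subsystems, judged only by function count.
    for subsystem, functions in sorted(partition.items()):
        if len(functions) > max_size:
            found.append(f"bloated subsystem: {subsystem} has {len(functions)} functions")
        elif len(functions) < min_size:
            found.append(f"scant subsystem: {subsystem} has {len(functions)} function(s)")

    # Incorrect assignment (one half of it): synergistic functions that
    # never share a subsystem.
    pairs = sorted({tuple(sorted(p)) for p in synergies if p[0] != p[1]})
    for a, b in pairs:
        home_a, home_b = homes.get(a, set()), homes.get(b, set())
        if home_a and home_b and home_a.isdisjoint(home_b):
            found.append(f"incorrect assignment: {a} and {b} are in different subsystems")

    # Inadequate depth: one data store used by more than one subsystem.
    users = defaultdict(set)
    for subsystem, stores in (data_usage or {}).items():
        for store in stores:
            users[store].add(subsystem)
    for store, subs in sorted(users.items()):
        if len(subs) > 1:
            found.append(f"shared data: {store} used by {sorted(subs)}")

    return found


# Hypothetical example partition.
partition = {
    "Billing": {"create invoice", "apply payment", "send reminder"},
    "Customers": {"register customer", "update address", "create invoice"},
    "Shipping": {"ship order"},
}
synergies = [("create invoice", "apply payment"), ("ship order", "update address")]
data_usage = {"Billing": {"customer_db"}, "Customers": {"customer_db"}}

for warning in check_partition(partition, synergies, data_usage):
    print(warning)
```

With the example data it reports four warnings: the duplicated "create invoice", the one-function "Shipping" subsystem, the split synergy between "ship order" and "update address", and the shared "customer_db". Detecting functions that do not belong together would require the synergy groups themselves, so it is left out of this sketch.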

(Thanks to Chris Bird for suggesting this topic with his innocent tweeted question: “What are contributors to complexity?”)


Vann said...

Perhaps we live in two different worlds, as your "partition" theory mostly focuses on complex IT systems, while I am struggling with partitioning the IT organization, far from your level of partition. Do you have any insight or experience with the seemingly trivial problem of partitioning an IT organization? In short, in my company, a global consulting business of 7000, there is a simple "hardware" and "software" IT partition, with no recognition of any roles or of any kind of architect. In management's view, an architect is someone who takes on tasks that border hardware and software. Is this something you already addressed in your past life? My keyword has always been "integration" in my daily struggle. You seem to have moved past that, and you always say "partition". How interesting...

Roger Sessions said...

The principles of partitioning theory apply equally well to many types of complexity. Organizational complexity is an excellent example.

Now keep in mind that when I say a system is partitioned into subsystems, I don't mean that those subsystems live in isolation from each other. If all we cared about was isolation, then we could use any random process for generating the partition. But we also care about the communications (integration) between the subsystems.

In order to achieve the optimal balance between integration and isolation, we drive the partitioning with a synergy analysis. In a synergy analysis, we ask which "atoms" (i.e., business functions in an IT system, or business units in an organizational system) are mutually dependent on each other.

Atoms that are mutually dependent are placed together in subsystems. This ensures that we achieve not just one of the millions of possible partitions, but the particular partition which is the simplest partition of all, given the constraints of the problem being solved.
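The grouping step that follows a synergy analysis can be sketched in a few lines. This is a minimal illustration, assuming that synergy is recorded as pairs of atoms and is treated as an equivalence relation (so that if A depends on B and B depends on C, all three end up together); the subsystems are then the equivalence classes, found here with a union-find structure. The atom names are hypothetical, and the code does not perform the synergy judgments themselves, which remain the hard, human part of the analysis. The same code applies whether the atoms are business functions or business units.

```python
def synergy_partition(atoms, synergies):
    """Group atoms into subsystems: the equivalence classes induced by the
    synergy pairs. Every atom in a pair must also appear in `atoms`."""
    parent = {a: a for a in atoms}

    def find(a):
        # Follow parent links to the root, halving the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        # Merge the classes containing a and b.
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in synergies:
        union(a, b)

    groups = {}
    for a in atoms:
        groups.setdefault(find(a), set()).add(a)
    # Sort for a stable, readable result.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


atoms = ["take order", "price order", "check credit",
         "ship order", "track shipment", "hire staff"]
synergies = [("take order", "price order"),
             ("price order", "check credit"),
             ("ship order", "track shipment")]
for subsystem in synergy_partition(atoms, synergies):
    print(subsystem)
```

Here "take order", "price order", and "check credit" form one subsystem, "ship order" and "track shipment" another, and "hire staff", which has no synergy with anything listed, stands alone.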

MDM SOA said...

I agree with this idea of chaos in existing information systems (IS). We need a new, "smart" EA framework to succeed, and I am afraid of TOGAF and Zachman because they are not really agile. More information below:
We are convinced it is not the case and we advocate another approach.

The Sustainable IT Architecture community has defined an innovative "smart framework" to restructure IS in a progressive and sustainable way, with better agility and higher traceability of data and rules.

How is it possible? Quite simply, by using MDM and BRMS as the IS foundation, together with business governance features to really manage the IS.

Find out about our Sustainable IT Architecture Framework (SITAF), and do not hesitate to attend a half-day presentation given by Pierre Bonnet:

Hope this message is helpful.


Pierre Bonnet
Founder of Sustainable IT Architecture