Sunday, October 4, 2009

Attacking Architectural Complexity

When I advocate for reducing the complexity in a large IT system, I am recommending partitioning the system into subsystems such that the overall complexity of the union of subsystems is as low as possible while still solving the business problem.

To give an example, say we want to build a system with 10 business functions, F1, F2, F3, ... F10. Before we start building the system we want to subdivide it into subsystems. And we want the resulting collection of subsystems to be the least complex one possible.

There are a number of ways we could partition F1, F2, ... F10. We could, for example, put F1, F2, F3, F4, and F5 in S1 (for subsystem 1) and F6, F7, F8, F9, and F10 in S2 (for subsystem 2). Let's call this A1, for Architecture 1. So A1 has two subsystems, S1 with F1-F5 and S2 with F6-F10.

Or we could have five subsystems with F1, F2 in S1, F3, F4 in S2, and so on. Let's call this A2, for Architecture 2. So A2 has five subsystems, each with two business functions.

Which is simpler, A1 or A2? Or, to be more accurate, which is less complex, A1 or A2? Or, to be as accurate as possible, which has the least complexity, A1 or A2?

We can't answer this question without measuring the complexity of both A1 and A2. But once we have done so we know which of the two architectures has less complexity. Let's say, for example, that A1 weighs in at 1000 SCUs (Standard Complexity Units, a measure that I use for complexity) and A2 weighs in at 500 SCUs. Now we know which is less complex and by how much. We know that A2 has half the complexity of A1. All other things being equal, we can predict that A2 will cost half as much to build as A1, give twice the agility, and cost half as much to maintain.

But is A2 the best possible architecture? Perhaps there is another architecture, say A3, that is even less complex, say, 250 SCUs. Then A3 is better than either A1 or A2.

One way we can attack this problem is to generate a set of all possible architectures that solve the business problem. Let's call this set AR. Then AR = {A1, A2, A3, ... An}. Then measure the complexity of each element of AR. Now we can choose the element with the least complexity. This method is guaranteed to yield the least complex architectural solution.
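A minimal sketch of this brute-force search in Python, for illustration only. The `partitions` generator enumerates the set AR for a list of business functions. The scoring function `toy_complexity` is a hypothetical stand-in (it is not the SCU measure, which is not defined here), and all names are assumptions made for the example.

```python
from typing import Iterator, List, Sequence

Partition = List[List[str]]


def partitions(functions: Sequence[str]) -> Iterator[Partition]:
    """Yield every way to split `functions` into non-empty subsystems."""
    if not functions:
        yield []
        return
    first, rest = functions[0], functions[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing subsystem in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or give it a subsystem of its own.
        yield [[first]] + smaller


def toy_complexity(architecture: Partition) -> int:
    """Hypothetical score, NOT real SCUs: big subsystems and many
    subsystem-to-subsystem pairs both add to the total."""
    internal = sum(len(subsystem) ** 2 for subsystem in architecture)
    pairs = len(architecture) * (len(architecture) - 1) // 2
    return internal + pairs


def least_complex(functions: Sequence[str]) -> Partition:
    """Measure every element of AR and keep the one with the lowest score."""
    return min(partitions(functions), key=toy_complexity)


functions = ["F1", "F2", "F3", "F4", "F5", "F6"]
best = least_complex(functions)
print(best, toy_complexity(best))
```

With only six functions the search finishes instantly. The approach itself is sound; what limits it is how quickly the number of candidates grows.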

But there is a problem with this. The number of possible architectures for a non-trivial problem is very large. Exactly how large is given by Bell's number. I won't go through the equation for Bell's number, but I will give you the bottom line. For an architecture of 10 business functions, there are 115,975 possible solution architectures. By the time we increase the system to 20 business functions, the number of architectures in the set AR is more than 51 trillion.
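Bell numbers are easy to compute directly. The sketch below uses the Bell triangle, a standard construction; the function name `bell_number` is just an illustrative label.

```python
def bell_number(n: int) -> int:
    """Number of ways to partition a set of n items (Bell triangle method)."""
    row = [1]  # row 0 of the Bell triangle
    for _ in range(n):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the neighbour from the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


print(bell_number(10))  # 115975
print(bell_number(20))  # 51724158235372
```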

So it isn't practical to exhaustively look at each possible architecture.

Another possibility is to hire the best possible architects we can find on the assumption that their experience will guide them to the least complex architecture. But this is largely wishful thinking. Given a 20 business function system, the chances that even experienced architects will just happen to stumble on the least complex architecture out of more than 51 trillion possibilities are slim at best. You have a much better chance of winning the Texas lottery.

So how can we find the simplest possible architecture? We need to follow a process that leads us to the architecture of least complexity. This process is called SIP, for Simple Iterative Partitions. SIP promises to lead us directly to the least complex architecture that still solves the business problem. SIP is not a process for architecting a solution. It is a process for partitioning a system into smaller subsystems that collectively represent the least complex collection of subsystems that solve the business problem.

In a nutshell, SIP focuses exclusively on the problem of architectural complexity. More on SIP later. Stay tuned.


David Wright said...

Yes, do explain more. Right now it sounds like Structured Design: High Cohesion, Low Coupling... and how does one measure in SCUs?

Tom G said...

I would echo David here: I can't see any clear difference between this and Structured Design, other than that you have an (unexplained) metric for 'complicatedness' and some (unexplained) method for working with that metric.

What this still doesn't do is address true complexity. "All other things being equal", you say: but in a true complex context, not only are 'other things' not equal, even when they are they don't stay equal - complexity is dynamic, not static.

If, as I suspect, you use the term 'complexity' to indicate 'degree of complicatedness' in a constrained, stable, Aristotelian-logic system - typical of the internals of highly complicated IT-based systems-of-systems - then yes, your description makes sense.

But if you think this method will give you control over true complexity, you'd be falling for the same trap that many fell into with chaos-mathematics. A metric of unpredictability does not make the context more predictable: it makes the unpredictability predictable, but the context remains unpredictable because the unpredictability is an inherent characteristic of the context. (Example: radioactive decay. We can usually define the half-life to a high precision, by statistical means, but it still tells us nothing about when a single specific atom will split - and that split may be the critical factor in the system. In other words, a chaotic 'market-of-one' factor driving a complex system.)

The internals of IT systems can often usefully be described in terms of degree-of-complicatedness. Using reductionist tactics is often the most appropriate approach there. But the social/technical context within which that IT-system operates will invariably include some aspects of true complexity. Attempting to use reductionist tactics to 'solve' true complexity is a guaranteed cause of failure, especially in the longer term. So we need to be very clear about which kind of 'complexity' we're dealing with in each context, and adapt our tactics accordingly.

Richard Veryard said...

The principles of structured design were influenced by Christopher Alexander's first book Notes on the Synthesis of Form. How does SIP compare with Alexander's early thoughts on methodology?

Alexander has largely repudiated this early work, and Tom's thinking is possibly closer to Alexander's later work.

Jurgen Appelo said...

I agree with Tom G. This is about complicatedness. Not about complexity.

Richard Veryard said...

Sent TomG a couple of references via Twitter, but I thought I'd put them here as well.

Itay Maman said...

I agree that we need to be able to manage complexity and strive for solutions that minimize it. Given that we cannot measure complexity a priori, our best bet is to start building a solution and then gradually refine it. This is what Agile is all about (and presumably SIP, which I would like to hear more of).

However, the architectural complexity (of software) is much more complicated than what was presented here. In software, every composite component (that is, something that is made of smaller building blocks) is itself a building block.

Thus, when building a solution for 10 business functions, one should also consider solutions where an individual function is taken apart and implemented by several different subsystems.

For instance, if we decompose F1 into F1a and F1b we can then come up with the following solution:
S1: F1a, F2 - F8
S2: F1b, F9

Clearly there are many other viable combinations, which implies that the space of solutions is much bigger than what is implied in the post. This, of course, does not contradict the post's main point; it just strengthens it.

Roger Sessions said...

David Wright: I am just finishing off a 20 page white paper that I hope will fill in the missing details. It should be ready within a few days.

As for “High Cohesion, Low Coupling”: I am not proposing high cohesion and low coupling, I am assuming them. One way to look at SIP is that it drives an architecture which gives the ideal balance of high cohesion and low coupling.

SIP does not address the solution architecture (how a subset of functionality is designed, architected, or implemented). It focuses on making sure we have the right number of subsets and the right functionality in each. In other words, SIP is about the optimal way of organizing a large system into smaller, autonomous systems.

You could argue that the SIP focus is just a small part of the bigger problem of delivering an IT system. After all, once you have these subsets identified, you have the main part of the task still in front of you, and SIP offers little here.

I completely agree that SIP addresses only a small part of the puzzle of delivering a large IT system. But I believe it is a crucial part: the complexity management. Get this part right, and the rest flows smoothly. Get this part wrong, and the rest will likely fail.

So SIP differs from a methodology like TOGAF, which addresses every aspect of architecting an IT system. SIP only tries to add value in one small but crucial area: complexity management.

Tom G, Jurgen Appelo, Richard Veryard: The goals of SIP and structured design are similar, perhaps identical. However, structured design has no methodology to ensure that the eventual solution is the least complex solution. It is driven by the skill of the architect. And, as I said in the original post, this is highly unlikely to drive the simplest possible solution.

I assume that complicatedness has to do with the difficulty of understanding a system. I define complexity as the number of states in a system. Complexity and complicatedness are thus related, but still quite different. Consider a bowl containing a single die. Now add one more die. By adding one more die, you have increased the complicatedness of the system by a factor of two, but the complexity (the number of states) by a factor of six.
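A small sketch of the dice example in Python, assuming ordinary six-sided dice and treating each ordered combination of faces as one state. The helper name `states` is made up for the illustration.

```python
from itertools import product

FACES = range(1, 7)  # assumption: standard six-sided dice


def states(num_dice: int) -> list:
    """Every combination of faces that `num_dice` dice can show."""
    return list(product(FACES, repeat=num_dice))


one_die = len(states(1))
two_dice = len(states(2))
print(one_die, two_dice, two_dice // one_die)  # 6 36 6
```

Each extra die multiplies the number of states by six, so the state count grows as 6 to the power of the number of dice while the count of parts grows by one each time.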

Itay Maman: I think that we can measure system complexity a priori. What I claim for SIP is exactly this: to predict the overall complexity of a completed system before the first line of code is written and before the first architectural design is laid out for that system. Now obviously I can’t predict all aspects of the system complexity; in particular, I can’t predict how complex the implementation code will be (in terms of loops, branches, and organization). But I claim that I can predict the inherent complexity of a proposed architecture for a system, and thus the "goodness" of the proposed architecture, before the architecture work even begins.

SIP is just the first step on the long journey of delivering an IT system. But it is a crucial step, and SIP makes sure you are heading in the right direction.

Thanks, everybody, for commenting. This is helping me hone my thoughts on this.

As I said, I plan on a larger white paper that I hope will tie together the many pieces within a few days.

Tom G said...

@Roger: "I define complexity as the number of states in a system."

Okay, we have a fundamental clash in terminology here, because the Cynefin or systems-theory concept of 'complexity' is radically different.

In your model, the number of possible states is always knowable. It may be very large, but it is still possible for it to be predetermined in some way. (Presumably this is one of the key goals of the SIP methodology?)

In Cynefin and the like, the key point is that the number of states not only is not known, but by definition cannot be known - i.e. is 'unknowable'. In many cases (as in some aspects of my radioactive-decay example) it could well be infinite or near-infinite.

The mathematics for handling a finite series are significantly different from those for handling an infinite or near-infinite series. For a mathematical function involving only finite series, we can identify an explicit 'solution'; for any function involving an infinity, the best we can achieve is an approximation that points towards (implies) one or more context-specific answers, for which the optimal 'answer' is highly likely to change over time.

Your version of 'complexity' is in effect a special-case in which artificial boundaries have been applied in order to constrain the possible number of identifiable states for each factor. This is a useful tactic within the artificial constraint of a software-based system-of-systems. However, it can easily become problematic as soon as we touch the real world at either end of that scale, such as seen in signal-theory (in which true complexity applies well before we reach quantum-uncertainty levels), or in any form of human/machine interface.

So whilst I don't disagree with your definition of 'complexity', I would urge you to perhaps be more careful and explicit about its limitations.

Hope this helps.

Richard Veryard said...

Christopher Alexander's first book Notes on the Synthesis of Form was "discovered" by Ed Yourdon and Tom DeMarco and influenced their thinking on structured methods. You make a fair point about structured design methods as popularized by Yourdon, DeMarco and others (including James Martin). But Alexander's approach had a lot of mathematical detail that the software gurus didn't try to replicate. While this book no longer reflects Alexander's own views on the design process, it is still worth reading.

Tom G said...

@Roger: "I assume that complicatedness has to do with the difficulty to understand a system. I define complexity as the number of states in a system. Complexity and complicatedness are thus related, but still quite different. Consider a bowl containing a single dice. Now add one more dice. By adding one more dice, you have increased the complicatedness of the system by two, but the complexity (the number of states) by six."

Note that that "I assume" is only an assumption - a fairly arbitrary one at that. In the Cynefin sense, your 'complicatedness' and 'complexity' would both be forms of complicatedness, because both are 'knowable'.

To take your dice example, it's a good illustration of complicatedness - the way in which an additive relationship compounds the level of complication. (And yes, it is easy to use "adds to the complexity" as a shorthand for this - but the danger is that it then leaves us no means to describe true complexity.)

So note that you've assumed an additive relationship there: adding one more six-faced die adds six more possible states. In a true complexity, we cannot know beforehand what the number of faces will be, or what the relationship between the dice may be. We can probably identify those at the present time - but they may well change tomorrow. Even in this simple case, the effective number of states will vary according both to knowable and unknowable factors; and the outcome 'states' - the resultant outputs from the relationship - will be messier still.

For example, what happens when the relationship of the two dice A and B changes at random between A+B, A-B, A*B, A/B, B/A, A XOR B and so on? In many cases we should be able to identify the (nominal) causal-relationship retrospectively, but the key point is that we may not know it at the time. And if our 'complicated' system assumes only an additive relationship between the two dice, it's likely to give us seriously wrong answers when the relationship changes. In true complexity we can only derive the relationship from the answers we get - in other words, we're forced to use non-logical methods (e.g. analogy, metaphor, inductive reasoning) to derive the current underlying logic. If we don't respect the fact that those methods are non-logical, we're again likely to end up in serious trouble.
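A rough sketch of that point in Python, assuming two ordinary six-sided dice. The `RELATIONSHIPS` table and the `outcomes` helper are hypothetical names for illustration, and only some of the relationships listed above are included.

```python
import operator
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # assumption: standard six-sided dice

# Candidate relationships between die A and die B.
RELATIONSHIPS = {
    "A+B": operator.add,
    "A-B": operator.sub,
    "A*B": operator.mul,
    "A/B": lambda a, b: Fraction(a, b),  # exact division, no rounding
    "A XOR B": operator.xor,
}


def outcomes(relationship: str) -> set:
    """Distinct results the two dice can produce under one relationship."""
    combine = RELATIONSHIPS[relationship]
    return {combine(a, b) for a, b in product(FACES, repeat=2)}


for name in RELATIONSHIPS:
    print(name, len(outcomes(name)))

# The same roll gives a different answer under each relationship.
a, b = 6, 3
print({name: combine(a, b) for name, combine in RELATIONSHIPS.items()})
```

The 36 underlying rolls never change, but the set of possible outputs does, so a model built on A+B alone says nothing reliable once the relationship switches.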

In short, what you've described will work well within an artificially constrained system in which the factors and their relationships (however complicated they may be) do remain stable, do follow a true/false logic, and hence do remain predictable - as is typical for (most of) the internal logic of IT-based systems and systems-of-systems. But this approach will fail as soon as it hits up against any infinity or unpredictability - and the real world is riddled with those. To deal with the real world, we need to respect complexity as complexity - and not pretend that we can 'solve' it all by over-simplistic layers of complication.

Roger Sessions said...

Tom G: Keep in mind that SIP is intended only to deal with the world of IT systems. While I believe the ideas have more far reaching implications, I have not had the time to explore these implications. For now, I just claim it has value in IT.

And I do believe that my definition of complexity ("the number of states in a system") is in line with how many define complexity in related systems. For example, from Wikipedia: "In information processing, complexity is a measure of the total number of properties transmitted by an object and detected by an observer. Such a collection of properties is often referred to as a state."

I am not trying to put together an all encompassing theory of complexity (at least, not yet). I am dealing with the very real problem of reducing IT failures due to mismanaged complexity. If I can help solve that problem, I will be happy!

Tom G said...

Yeah, agreed: in essence what we're dealing with is a clash of terminology.

Your usage is appropriate (or rather, a common usage) for IT-systems.

For those of us who deal in systems in the broader sense, 'complexity' has a broader meaning, as described above.

In effect, your usage applies to quite a small subset of 'complexity' in the Cynefin etc sense, in which there's an assumption that the 'world' being worked on conforms solely to Aristotelian true/false logic, and in which the concept of 'state' and suchlike are still meaningful. In Cynefin, your version of 'complex' is 'knowable', and hence belongs in the 'complicated' domain, not the 'complex' domain. Hence the confusion here.

Which is fine - except that there is a real danger of a 'term-hijack', where this subset purports to be the whole, preventing necessary usage of the term in its true broader sense. Hence 'twould be useful to perhaps be a bit more explicit that you're only applying this narrow usage ("the number of states in a system"), and not purporting to present "an all-encompassing theory of complexity"? :-)

Best wishes, anyway.