Wednesday, September 1, 2010

SAPA Paradigm for IT Architecture

This blog is intended to give a basic introduction to the SAPA paradigm. SAPA is a new approach to designing, building, and delivering IT systems.

SAPA stands for Simple As Possible Architecture. The driving idea behind the SAPA paradigm is that simplicity reigns supreme. Complexity is the enemy. It drives up system cost, increases system failures, and greatly adds to the lifetime cost of an IT system.

There are a number of stages to building a SAPA. I think of the stages as follows:

Stage 1: Capability Mapping. Here we are trying to understand how the high-level business processes that will be supported by the IT system break down into the simplest possible set of capabilities (groups of business functionality).

Stage 2: Business Architecture. Here we are taking one of the capabilities identified in Stage 1 and specifying the simplest possible implementation of that capability in terms of business processes and dependencies.

Stage 3: IT Organization. Here we are identifying the simplest possible collection of IT systems necessary to support the business architecture identified in Stage 2 and documenting the dependencies between those systems.

Stage 4: IT Architecture. Here we are designing the simplest possible architecture for each of the IT systems identified in Stage 3.

Stage 5: IT Implementation. Here we are implementing the IT system as simply as possible following the architecture created in Stage 4.

Each of these stages follows the basic SAPA pattern:
  • Understand the immediate problem you are trying to solve.
  • Design the simplest possible solution to that problem.
  • Validate that there is no simpler solution.
This common SAPA pattern has a number of implications that cut across stages. First, there must be some way to compare two solutions to see which is the simpler. Second, there must be a methodology you can use to test a solution to see whether it can be simplified. And both of these imply that there must be some rational way of measuring complexity.
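
To make these implications concrete, here is a minimal sketch (Python, with a deliberately toy complexity measure and hypothetical candidate data) of what "compare two solutions for simplicity" looks like once a measure exists. The measure itself is whatever your simplicity framework defines; nothing here is prescribed by SAPA.

```python
# Hypothetical sketch: pick the simplest candidate(s) once a complexity
# measure has been chosen. The measure below is a toy placeholder; a real
# framework (for example SIP with its SCUs) would supply its own.

def complexity(solution):
    # Toy measure: more functions and more dependencies mean more complexity.
    return solution["functions"] ** 2 + 10 * solution["dependencies"]

def simplest(candidates):
    """Return the candidate(s) with the lowest measured complexity."""
    scored = [(complexity(c), c) for c in candidates]
    lowest = min(score for score, _ in scored)
    return [c for score, c in scored if score == lowest]

candidates = [
    {"name": "monolith",    "functions": 100, "dependencies": 3},
    {"name": "partitioned", "functions": 25,  "dependencies": 12},
]
print(simplest(candidates))  # under this toy measure, the partitioned design wins
```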

My recent work in the SAPA paradigm has focused on stages 1-3. For a good example of my work in these three stages, see my "anti-complexity patent." You can read about it here.

SAPA builds on the paradigms that came before it, especially Service-Oriented Architecture (SOA). But while an SOA is a natural way to realize a SAPA (stages 4 and 5), SOA is in no way required for SAPA.

SAPA is a key enabler for many desirable features of IT systems. SAPA systems are easier to test, implement, and make secure. They perform better and are more reliable. They are cheaper to build and easier to maintain.

SAPA has something for everybody. For those interested in business/IT alignment, SAPA systems are highly aligned with the business processes they support. For those interested in cloud deployment, SAPA systems are well suited to life in the cloud. For those interested in Agile development, SAPA systems are a prerequisite to effectively scaling up Agile approaches.

If you care about improving your organization's bottom line through more effective IT investments, you should care about SAPA. The reason is simple. As Simple As Possible!


Monday, May 3, 2010

Defining IT Complexity Terminology

My specialty is understanding how to measure, locate, and eliminate unnecessary IT complexity. One of the difficulties of this field is terminology. You would think that a word like simple would be simple to define. But there is no widespread agreement on what the words related to complexity mean.

So in this blog, I am going to define some of the terminology that is important in IT complexity reduction. I won't claim that these meanings are relevant to all fields, but when I use these words, this is what they will mean. This is my initial pass. I will give examples from my own work, since that is the domain with which I am most familiar.

I invite readers to comment. Soon I will consolidate the discussion into a terminology proposal for www.cuec.info (the Consortium for Untangling Enterprise Complexity). I assume that better definitions will come out of this discussion.

Simplicity Framework: A simplicity framework is an approach, model, or methodology that seeks to find, measure, and remove complexity from some domain. Any simplicity framework should specify the domain for which it is intended. Example: SIP (Simple Iterative Partitions) is a simplicity framework for IT architecture.

Complexity: Complexity is a measure of an attribute of a system that makes that system difficult to understand, manage, and/or implement. Complexity is measured in some units appropriate for the domain and defined by a simplicity framework. Example: In SIP, complexity is measured in Standard Complexity Units (SCUs).

Solution Set: For some problem P in a given domain, the solution set is the set of all solutions that solve P. Note that a solution set only includes solutions to P. If a proposed solution would not solve P, then it is by definition not a member of the solution set. Example: When considering architectures for an inter-library loan system, we examined dozens of potential candidates from the solution set.

Simple Solution: A simple solution is the element of a solution set that has the least complexity of all elements in the solution set. Note that it is theoretically possible for more than one solution in the solution set to share the lowest complexity; in this case, there are multiple simple solutions. Example: People were surprised that the simple solution to the inter-library loan system was not the one initially proposed.

Complex Solution: A complex solution is any of the elements of the solution set that are not the simple solution and whose complexity can be reduced through the simplicity framework. Example: In the inter-library loan system, all of the originally proposed SOA architectures were complex solutions.

Simplistic Solution: A simplistic solution is any proposed solution to a problem that lacks the minimum complexity necessary to solve that problem. By definition a simplistic solution is not a member of the solution set. Example: The catalog system we looked at turned out to be a simplistic solution to the inter-library loan system.

Chaotic Solution: A chaotic solution is a solution that is presumed to solve a problem, but whose complexity is so high that using it (or continuing to use it) is not feasible and reducing it is not practical given the simplicity framework. Note that it is not always possible to determine whether a chaotic solution is even a member of the solution set. Example: The present inter-library loan system is a chaotic solution.
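
If it helps to see the taxonomy operationally, here is a rough, hypothetical sketch in Python. The names (solves, complexity, reducible, chaos_threshold) are placeholders standing in for whatever the problem statement and simplicity framework actually provide; they are not part of the definitions above.

```python
# Hypothetical sketch of the taxonomy above. "problem" and "framework" stand
# in for whatever the problem statement and simplicity framework provide.

def classify(candidate, problem, framework):
    # Chaotic first: complexity so high that we may not even be able to
    # determine whether the candidate is in the solution set at all.
    if framework.complexity(candidate) >= framework.chaos_threshold:
        return "chaotic"
    if not problem.solves(candidate):
        return "simplistic"   # lacks the minimum complexity needed; not in the solution set
    if framework.reducible(candidate):
        return "complex"      # in the solution set, but can still be simplified
    return "simple"           # lowest-complexity member(s) of the solution set
```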

Do you have any ideas for more terms that should be defined? Do you have issues with these definitions?  Leave comments here or discuss with me on Twitter (@RSessions). And keep an eye on www.cuec.info for efforts to standardize some of these terms.

Planned Version 2 Changes:
@johanlindberg suggested using candidate to describe Simplistic and Chaotic. His argument: Simplistic and Chaotic are by definition not solutions. Good point! He also suggested adding comparison terms to show how similar terms are used in complexity theory and software development.

@HotFusionMan (Al Chou) suggested adding a note about layers of complexity. I think this is a detail of the simplification framework, but I need to make clear the generic character of these definitions.

Add definitions for simplicity and simplification. Add a description of which problem spaces these definitions are appropriate for (those that are striving for simplicity). Make the definitions less specific to IT and more general to any situation in which a move from complex to simple is desirable.

Incorporate Nikos's discussion on understandability.

And thanks to @JohnDCook for Twitter contributions!

Tuesday, April 6, 2010

The Failure of Success

I have written quite a bit about the cost of IT failure, most recently in a white paper. A number of people have criticized my analysis, saying that just because a large project "fails" in the sense of being late, over budget, and/or not delivering expected functionality doesn't mean that the project delivers no value.

This may be true. The real question is this: had the business known how the project would actually go, would they still have agreed to do it? If the answer is yes, then the project was a success. If the answer is no, then the project was a failure.

However, even when the project comes in on time, on budget, and delivering expected functionality, the project may still not be a success. These three metrics (budget, time, functionality) tell us little about the project itself and much more about our ability to make predictions about the project.

Take a simple example. The business tells IT they need a system that delivers 100 functions. IT calculates the cost at $100K per function. They tell the business the project will cost $10M and take three years to deliver. The business approves the project. IT delivers the project one month early and $1M below budget. The business is happy. IT looks good. The project is a success!

Or is it? All this tells us is that IT did what they said they would do. It doesn't tell us whether what IT said they would do was really reasonable.

In most projects of that size, complexity is a major cost factor. If IT and the business take appropriate steps to manage that complexity, the cost per function can be greatly reduced. If a project can be delivered for $9M without complexity management, then it is highly likely that it could have been delivered for $5M with complexity management.

So is this project a success or a failure? According to most pundits (such as Standish), this project is an unqualified success. In my book, it is a $4M failure. This is one more reason I say that looking at the percentage of IT successes is a meaningless statistic. What we need to look at is the percentage of IT budgets that are wasted.

Sunday, December 6, 2009

The Fundamental Flaw of Complexity Management

As systems add more functionality, they become more complex. As systems become more complex, the traditional processes we use to manage those systems become more strained. The typical response on the part of those building these more complex systems is to try to understand how to scale up the processes from those that can handle simple systems to those that can handle complex systems.

Consider a "simple" system A with only 100 functions. Say process X has been used to successfully manage A. Process X could be Agile Development, RUP, Earned Value Management, or any other of your favorite management processes.

Now we need to build a more complex system B with 1000 functions. Since B has 10X the functionality of A and we know X works for A, most assume that we can use X to manage B as well, although we assume that it will take 10X the effort.

The flaw in this reasoning is that the difficulty of applying X to B (regardless of what X is) is proportional to the complexity of B, not to the functionality of B. And when the functionality increases by 10X, the complexity, because of the exponential relationship between functionality and complexity, actually increases thousands of times. The exact number is highly dependent on the nature of the functions of B and how they are organized, but the number will typically be very large.

As long as we focus on how to better use X to manage B, we are doomed to failure. The complexity of B will quickly outpace our ability to apply X.

Instead, we need to focus on the real problem, the complexity of B. We need to understand how to architect B not as a single system with 1000 functions, but as a cooperating group of autonomous systems, each with some subset of the total functionality of B. So instead of B, we now have B1, B2, B3, and so on. Our ability to use X on each Bi will depend on how close the complexity of the most complex Bi is to the complexity of A (the most complex system on which X is known to be a viable process).

The bottom line: if we want to know how to use process X on increasingly complex systems, we must focus not on scaling up X, but on scaling down the complexity of the systems we apply it to.
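
As a rough illustration of why the numbers get so large, here is a small Python sketch. The specific doubling rule it assumes (complexity roughly doubles for every 25% increase in functionality, a rule of thumb often attributed to Robert Glass) is an assumption for illustration, not a SAPA or SIP formula; the point is the shape of the curve and the payoff of partitioning.

```python
# Illustrative sketch only: assumes complexity doubles for every 25% increase
# in functionality (a commonly cited rule of thumb). Real numbers will vary.
import math

def relative_complexity(functions, baseline=100):
    """Complexity relative to a baseline system with `baseline` functions."""
    return 2 ** math.log(functions / baseline, 1.25)

monolith = relative_complexity(1000)          # B built as one 1000-function system
partitioned = 10 * relative_complexity(100)   # B built as ten autonomous 100-function systems

print(f"monolithic B:  ~{monolith:,.0f}x the complexity of A")
print(f"partitioned B: ~{partitioned:,.0f}x the complexity of A")
```

Under that assumed rule, the monolithic B comes out over a thousand times more complex than A, while the partitioned B is only about ten times more complex, which is why process X has a fighting chance on each Bi.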

For more information on scaling down complexity in IT systems, see my white paper, "The IT Complexity Crisis" available at http://bit.ly/3O3GMp.

Sunday, November 8, 2009

The IT Complexity Crisis: Danger and Opportunity

Roger's new white paper, The IT Complexity Crisis: Danger and Opportunity, is now available.

Overview
The world economy is losing over six trillion USD per year to IT failures, and the problem is getting worse. This 22-page white paper analyzes the scope of the problem, diagnoses its cause, and describes a cure. And while the cost of ignoring this problem is frighteningly high, the opportunities that can be realized by addressing it are extremely compelling.

The benefits of understanding the causes of, and cures for, out-of-control complexity can have a transformative impact on every sector of our society, from government to private to not-for-profit.

Downloading the White Paper
You can download the white paper, download an accompanying spreadsheet for analyzing architectural complexity, and view various blogs that have discussed this white paper here.

Would you like to discuss the white paper? Add a comment to this blog!

Thursday, October 29, 2009

The Problem With Standish

In my recent white paper, The IT Complexity Crisis, I discussed how much IT failures are costing the world economy. I calculated the worldwide cost to be over $6 trillion per year. You can read the white paper here.

In this white paper I discuss the Standish Chaos numbers, but many readers have continued to question whether my conclusions are in agreement with Standish. I think my conclusions are in agreement, but I also think the Standish numbers are flawed. So I have mixed feelings about them. Let me explain.

The Standish Group has been publishing its annual study of IT failure, the "CHAOS Report," since 1994, and it is widely cited throughout the industry. According to the 2009 report, 24% of all IT projects failed outright, 44% were "challenged," and only 32% were delivered on time, on budget, and with required features and functions.

To be honest, I have never read the Standish Report. Given the $1000 price tag, not many people have. So, like most people, I am basing my analysis of it on the limited information that Standish has made public.

The problem with the Standish Report is not that it analyzes the numbers wrong. The problem is that Standish is looking at the wrong numbers. It analyzes the percentage of IT projects that are successful, challenged (late, over budget, etc.), or outright failures. This sounds like useful information. It isn't.

The information we really need is not what percentage of projects are successful, but what percentage of IT budgets are successful.

What is the difference between percentage of projects and percentage of budget? A lot. Let me give you an example.

Suppose you are an IT department with a $1M budget. Say you have six IT projects completed this year: four that cost $50K, one that cost $100K, and one that cost $700K.

Which of these projects is most likely to fail? All other things equal, the $700K project is most likely to fail. It is the largest and most complex. The less the project costs, the simpler the project is. The simpler the project is, the more likely it is to succeed.

So let's assume that three of the four $50K projects succeed, the $100K project succeeds, and the $700K project fails.

Standish would report this as a 4-out-of-6 success rate: 67% success, 33% failure. I look at these same numbers and see something quite different.

I look at the percentage of the IT budget that was successfully invested. I see $250K of the $1M budget invested in successful projects and $750K in failed projects. I report this as a 25% success rate and a 75% failure rate.
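
For the arithmetic-minded, here is a tiny Python sketch showing how the same six projects from the example above yield the two very different numbers (figures in thousands of dollars).

```python
# The example portfolio: (cost in $K, succeeded?)
projects = [
    (50, True), (50, True), (50, True), (50, False),  # four $50K projects, one fails
    (100, True),                                      # the $100K project succeeds
    (700, False),                                     # the $700K project fails
]

by_count = sum(1 for _, ok in projects if ok) / len(projects)
by_budget = sum(cost for cost, ok in projects if ok) / sum(cost for cost, _ in projects)

print(f"success by project count: {by_count:.0%}")  # ~67%, the Standish view
print(f"success by budget:        {by_budget:.0%}")  # 25%, the view I care about
```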

So both Standish and I are looking at the same numbers, yet we have almost exactly opposite conclusions. Whose interpretation is better?

I argue that, from the organizational perspective, my interpretation is much more reasonable. The CEO wants to know how much money is being spent and what return that money is delivering. The CEO doesn't care how well the IT department does one-off minor projects, which are the projects that dominate the Standish numbers.

So the bottom line is that I have major issues with the Standish Report. It isn't that the Standish analysis is wrong. It is just that it is irrelevant.

Thursday, October 15, 2009

Notes from ITARC NYC Open Meeting on IT Failures

At the recent ITARC conference in NYC, I facilitated an open meeting on IT failures. We only had one hour, but some interesting ideas were discussed. Thanks to Eric Weinstein for taking these notes.

Reasons people gave for IT failures they experienced:

- Lack of change management

- Requirements scope too high-level, incomplete, or not fleshed out, leading to bad outcomes

- Cost estimation analysis was wrong because the requirements were not fleshed out

- Cost estimation is an art; manpower and resource time are hard to estimate

- Lack of accurate communication and feedback, and of confirmation that the project is understood

- Final delivery had no bearing on value for the customer; all feedback from the developers back to the business stakeholders was totally ignored

- Functional requirements get a lot of attention, but non-functional requirements are invisible and get no credit; the cost avoidance they provide is hard to quantify

- Trade-off of quick versus correct; executive irresponsibility

- Business has unrealistic expectations of delivery dates, or technical people estimate time poorly and skimp on upfront analysis or testing

- Implementation side: developers failing; need tools to control the SDLC process, such as a source control system with full integration of code check-in to the requirements the code fulfills, reviewed and signed off

- A main cause of failure is managing the complexity of large systems; failure and complexity are strongly related. The more complex a system, the harder it is to scope. We must learn how to take big monolithic systems and break them down into smaller systems

Solutions

- "The Wrench in the System" book recommendation

- Ask the business to delineate the success criteria and prioritize them numerically

- Understand the timeframe and scope; rescope if necessary

- A US government white paper reports that 66% of the IT budget goes to high-risk projects, and half of those will fail