As systems add more functionality, they become more complex. As systems become more complex, the traditional processes we use to manage them become more strained. The typical response from those building these more complex systems is to try to scale up the processes that worked for simple systems so that they can handle complex systems.
Consider a "simple" system A with only 100 functions. Say process X has been used to successfully manage A. Process X could be Agile Development, RUP, Earned Value Management, or any other of your favorite management processes.
Now we need to build a more complex system B with 1000 functions. Since B has 10X the functionality of A and we know X works for A, most assume that we can use X to manage B as well, although we assume that it will take 10X the effort.
The flaw in this reasoning is that the difficulty of applying X to B (regardless of what X is) is proportional to the complexity of B, not to the functionality of B. And when the functionality increases by 10X, the complexity, because of the exponential relationship between functionality and complexity, actually increases thousands of times. The exact number is highly dependent on the nature of the functions of B and how they are organized, but the number will typically be very large.
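As a rough illustration (an assumption for illustration only, not the exact model behind the numbers above), take Robert Glass's often-quoted rule of thumb that every 25% increase in functionality roughly doubles complexity. A few lines of Python show what that assumption implies for a 10X jump in functionality:

# Illustrative sketch only. Assumes Glass's rule of thumb that a 25% increase
# in functionality roughly doubles complexity, i.e. complexity ~ functionality ** 3.1.
# This is a stand-in assumption, not the model used in this post.
import math

EXPONENT = math.log(2) / math.log(1.25)   # about 3.1

def relative_complexity(functionality_ratio: float) -> float:
    """Complexity multiplier implied by a given functionality multiplier."""
    return functionality_ratio ** EXPONENT

print(round(relative_complexity(10)))   # about 1,300 -- thousands of times, not 10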
As long as we focus on how to better use X to manage B, we are doomed to failure. The complexity of B will quickly outpace our ability to apply X.
Instead, we need to focus on the real problem: the complexity of B. We need to understand how to architect B not as a single system with 1000 functions, but as a cooperating group of autonomous systems, each with some subset of the total functionality of B. So instead of B, we now have B1, B2, B3, and so on. Our ability to use X on each Bi (i = 1, 2, ...) will depend on how close the complexity of the most complex Bi is to the complexity of A (the most complex system on which X is known to be a viable process).
The bottom line: if we want to know how to use process X on increasingly complex systems, we must focus not on scaling up X, but on scaling down the complexity of the systems.
For more information on scaling down complexity in IT systems, see my white paper, "The IT Complexity Crisis" available at http://bit.ly/3O3GMp.
Sunday, December 6, 2009
Sunday, November 8, 2009
The IT Complexity Crisis: Danger and Opportunity
Roger's new white paper, The IT Complexity Crisis: Danger and Opportunity, is now available.
Overview
The world economy is losing over six trillion USD per year to IT failures, and the problem is getting worse. This 22-page white paper analyzes the scope of the problem, diagnoses its cause, and describes a cure. And while the cost of ignoring this problem is frighteningly high, the opportunities that can be realized by addressing it are extremely compelling.
The benefits of understanding the causes of, and cures for, out-of-control complexity can have a transformative impact on every sector of our society, from government to private industry to not-for-profit.
Downloading the White Paper
You can download the white paper, download an accompanying spreadsheet for analyzing architectural complexity, and view various blogs that have discussed this white paper here.
Would you like to discuss the white paper? Add a comment to this blog!
Thursday, October 29, 2009
The Problem With Standish
In my recent white paper, The IT Complexity Crisis, I discussed how much IT failures are costing the world economy. I calculated the worldwide cost to be over $6 trillion per year. You can read the white paper here.
In this white paper I discuss the Standish Chaos numbers, but many readers have continued to question whether my conclusions are in agreement with Standish. I think my conclusions are in agreement, but I also think the Standish numbers are flawed. So I have mixed feelings about them. Let me explain.
The Standish Group has been publishing its annual study of IT failure, the "CHAOS Report," since 1994, and it is widely cited throughout the industry. According to the 2009 report, 24% of all IT projects failed outright, 44% were "challenged," and only 32% were delivered on time, on budget, and with the required features and functions.
To be honest, I have never read the Standish Report. Given the $1000 price tag, not many people have. So, like most people, I am basing my analysis of it on the limited information that Standish has made public.
The problem with the Standish Report is not that it analyzes the numbers wrong. The problem is that Standish is looking at the wrong numbers. It analyzes the percentage of IT projects that are successful, challenged (late, over budget, etc.), or outright failures. This sounds like useful information. It isn't.
The information we really need is not what percentage of projects are successful, but what percentage of IT budgets are successful.
What is the difference between percentage of projects and percentage of budget? A lot. Let me give you an example.
Suppose you are an IT department with a $1M budget. Say you have six IT projects completed this year, four that cost $50K, one that cost $100K, and one that cost $700K.
Which of these projects is most likely to fail? All other things equal, the $700K project is most likely to fail. It is the largest and most complex. The less the project costs, the simpler the project is. The simpler the project is, the more likely it is to succeed.
So let's assume that three of the four $50K projects succeed, the $100K project succeeds, and the $700K project fails.
Standish would report this as a 4/6 success rate: 67% success, 33% failure. I look at these same numbers and see something quite different.
I look at the percentage of the IT budget that was successfully invested. I see $250K of a $1M budget invested in successful projects and $750K in failed projects. I report this as a 25% success rate and a 75% failure rate.
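To make the difference concrete, here is a minimal sketch (in Python) that scores the example portfolio above both ways; the outcomes follow the scenario described in this post:

# Minimal sketch: the same six projects scored by project count and by budget.
projects = [                     # (cost in USD, succeeded?)
    (50_000, True), (50_000, True), (50_000, True), (50_000, False),
    (100_000, True),
    (700_000, False),
]

project_rate = sum(1 for _, ok in projects if ok) / len(projects)
budget_rate = (sum(cost for cost, ok in projects if ok)
               / sum(cost for cost, _ in projects))

print(f"Success rate by project count: {project_rate:.0%}")   # 67%
print(f"Success rate by budget: {budget_rate:.0%}")           # 25%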
So both Standish and I are looking at the same numbers, yet we have almost exactly opposite conclusions. Whose interpretation is better?
I argue that, from the organizational perspective, my interpretation is much more reasonable. The CEO wants to know how much money is being spent and what return that money is delivering. The CEO doesn't care how well the IT department does one-off minor projects, which are the projects that dominate the Standish numbers.
So the bottom line is that I have major issues with the Standish Report. It isn't that the Standish analysis is wrong. It is just that it is irrelevant.
Thursday, October 15, 2009
Notes from ITARC NYC Open Meeting on IT Failures
At the recent ITARC conference in NYC, I facilitated an open meeting on IT failures. We only had one hour, but some interesting ideas were discussed. Thanks to Eric Weinstein for taking these notes.
Reasons people gave for IT failures they experienced:
- Lack of change management
- Requirements scope too high-level, incomplete, or not fleshed out, leading to bad outcomes
- Cost estimation was wrong because the requirements were not fleshed out
- Cost estimation is an art; manpower and resource time are hard to estimate
- Lack of accurate communication and feedback on whether the project is understood
- Final delivery had no bearing on value for the customer; feedback from the developers to the business stakeholders was totally ignored
- Functional requirements get a lot of attention, but non-functional requirements are invisible and get no credit; it is hard to quantify the cost avoidance they provide
- Trade-off of quick vs. correct; executive irresponsibility
- Business has unrealistic expectations of delivery dates, or technical people estimate time poorly and skimp on upfront analysis or testing
- Implementation side: developers failing; need tools to control the SDLC process, such as a source control system with full integration of code check-in to the requirements the code fulfills, reviewed and signed off
- A main cause of failure is managing the complexity of large systems; failure correlates strongly with complexity. The more complex a system, the harder it is to scope. We must learn how to break big monolithic systems down into smaller systems
Solutions
- "The Wrench in the System" Book recomendation
- Ask the business to delineate the success criteria, prioritize in numbers
- Understand timeframe, scope - rescope
- White paper - US Gov't - 66% of IT budget is high risk projects and half of those will fail
Sunday, October 4, 2009
Attacking Architectural Complexity
When I advocate reducing the complexity of a large IT system, I am recommending partitioning the system into subsystems such that the overall complexity of the union of subsystems is as low as possible while still solving the business problem.
To give an example, say we want to build a system with 10 business functions, F1, F2, F3, ... F10. Before we start building the system we want to subdivide it into subsystems, and we want to do so with the least complex collection of subsystems possible.
There are a number of ways we could partition F1, F2, ... F10. We could, for example, put F1, F2, F3, F4, and F5 in S1 (for subsystem 1) and F6, F7, F8, F9, and F10 in S2 (for subsystem 2). Let's call this A1, for Architecture 1. So A1 has two subsystems, S1 with F1-F5 and S2 with F6-F10.
Or we could have five subsystems, with F1 and F2 in S1, F3 and F4 in S2, and so on. Let's call this A2, for Architecture 2. So A2 has five subsystems, each with two business functions.
Which is simpler, A1 or A2? Or, to be more accurate, which is less complex, A1 or A2? Or, to be as accurate as possible, which has the least complexity, A1 or A2?
We can't answer this question without measuring the complexity of both A1 and A2. But once we have done so, we know which of the two architectures has less complexity. Let's say, for example, that A1 weighs in at 1000 SCUs (Standard Complexity Units, a measure that I use for complexity) and A2 weighs in at 500 SCUs. Now we know which is less complex and by how much: A2 has half the complexity of A1. All other things being equal, we can predict that A2 will cost half as much to build as A1, give twice the agility, and cost half as much to maintain.
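The actual SCU computation is defined in the white paper and its accompanying spreadsheet; the sketch below deliberately uses a simplified stand-in (subsystem complexity growing as the cube of the number of functions it holds) just to show the comparison step, not to reproduce the SCU numbers above:

# Hedged sketch: compare two candidate partitions using a SIMPLIFIED stand-in
# complexity function. This is NOT the SCU formula from the white paper.
A1 = [["F1", "F2", "F3", "F4", "F5"], ["F6", "F7", "F8", "F9", "F10"]]
A2 = [["F1", "F2"], ["F3", "F4"], ["F5", "F6"], ["F7", "F8"], ["F9", "F10"]]

def architecture_complexity(architecture, exponent=3.0):
    """Sum an assumed per-subsystem complexity over all subsystems."""
    return sum(len(subsystem) ** exponent for subsystem in architecture)

print("A1:", architecture_complexity(A1))   # 250.0
print("A2:", architecture_complexity(A2))   # 40.0 -- the finer partition scores lower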
But is A2 the best possible architecture? Perhaps there is another architecture, say A3, that is even less complex, say, 250 SCUs. Then A3 is better than either A1 or A2.
One way we can attack this problem is to generate a set of all possible architectures that solve the business problem. Let's call this set AR. Then AR = {A1, A2, A3, ... An}. Then measure the complexity of each element of AR. Now we can choose the element with the least complexity. This method is guaranteed to yield the least complex architectural solution.
But there is a problem with this. The number of possible architectures for a non-trivial problem is very large. Exactly how large is given by the Bell number. I won't go through the equation for the Bell number, but I will give you the bottom line. For an architecture of 10 business functions, there are 115,975 possible solution architectures. By the time we increase the system to 20 business functions, the number of architectures in the set AR is more than 50 trillion.
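For the curious, the Bell number is easy to compute with the Bell triangle; this short sketch reproduces the counts above:

# Bell numbers via the Bell triangle: bell(n) is the number of ways to
# partition a set of n elements, i.e. the number of candidate architectures
# for n business functions.
def bell(n: int) -> int:
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]

print(bell(10))   # 115,975
print(bell(20))   # 51,724,158,235,372 -- more than 50 trillion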
So it isn't practical to exhaustively look at each possible architecture.
Another possibility is to hire the best architects we can find, on the assumption that their experience will guide them to the least complex architecture. But this is largely wishful thinking. Given a 20-business-function system, the chances that even experienced architects will just happen to stumble on the least complex architecture out of more than 50 trillion possibilities are slim at best. You have a much better chance of winning the Texas lottery.
So how can we find the simplest possible architecture? We need to follow a process that leads us to the architecture of least complexity. This process is called SIP, for Simple Iterative Partitions. SIP promises to lead us directly to the least complex architecture that still solves the business problem. SIP is not a process for architecting a solution. It is a process for partitioning a system into smaller subsystems that collectively represent the least complex collection of subsystems that solve the business problem.
In a nutshell, SIP focuses exclusively on the problem of architectural complexity. More on SIP later. Stay tuned.
Thursday, October 1, 2009
Why I Focus On Complexity
When it comes to IT failure, there is no lack of "the usual suspects". We can look to failures in project definition, project management, and needs assessment. We can point to flawed architectures, implementations, and testing procedures. We can focus on communications failures between the business and IT, between IT and the user community, and between different business units.
Yet given this extensive collection of failure factors, any of which can doom an IT project, why do I focus almost exclusively on the issue of complexity?
I see all of the failure factors as falling into one or more of three categories:
1. The factor is caused by complexity.
2. The factor is greatly exacerbated by complexity.
3. The factor is solved as a side effect of solving the complexity problem.
Examples of failure factors that are directly caused by complexity are the various technical failure factors, such as poor security or scalability. It is very difficult to make a complex system secure or scalable. Solve the problem of complexity, and these problems become much easier to solve.
Examples of failure factors that are greatly exacerbated by complexity include those related to communications. As a project increases in complexity, people tend to fall more and more into specialized jargon which makes communications more difficult and adds yet more complexity to the already complex project. Different groups tend to see each other as the enemy. As a side effect of learning to solve complexity, the groups learn to see not each other as the enemy, but complexity as their common enemy. Their common language becomes the language of simplification.
Examples of factors that are solved as a side effect of solving the complexity problem include those related to organization. For example, a well known failure factor is the lack of executive sponsorship. But it is difficult to find sponsors for large complex expensive projects. Once a project is broken down into small, simpler, less expensive projects, finding executive sponsors for those projects is much easier.
The other reason I focus so much on complexity is that of all these failure factors, complexity is the only one that is universally present in all failed projects. The fact is that we are quite good at doing simple projects. Our skills just don't scale up to complex projects. So we can either tackle the failure factors piecemeal and try to figure out how to scale each up to higher levels of complexity, or we can try to figure out how to scale down the complex projects into simple projects that we already know how to solve.
So while I pay close attention to all of these failure factors, I continue to believe that the one that deserves our undivided attention is the problem of complexity.
Monday, September 28, 2009
Cost of IT Failure
What does IT failure cost us annually? A lot.
According to the World Information Technology and Services Alliance (WITSA), countries spend, on average, 6.4% of Gross Domestic Product (GDP) on Information and Communications Technology, with 43% of this spent on hardware, software, and services. This means that, on average, 6.4% × 0.43 ≈ 2.75% of GDP is spent on hardware, software, and services. I will lump hardware, software, and services together under the banner of IT.
According to the 2009 U.S. Budget, 66% of all Federal IT dollars are invested in projects that are “at risk”. I assume this number is representative of the rest of the world.
A large number of these will eventually fail. I assume the failure rate of an “at risk” project is between 50% and 80%. For this analysis, I’ll take the average: 65%.
Every project failure incurs both direct costs (the cost of the IT investment itself) and indirect costs (the lost “opportunity” costs). I assume that the ratio of indirect to direct costs is between 5:1 and 10:1. For this analysis, I’ll take the average: 7.5:1.
To find the predicted cost of annual IT failure, we multiply these numbers together: 0.0275 (fraction of GDP spent on IT) × 0.66 (fraction of IT at risk) × 0.65 (failure rate of at-risk projects) × 7.5 (indirect cost ratio) ≈ 0.089. To predict the cost of IT failure for any country, multiply its GDP by 0.089.
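The arithmetic is simple enough to check in a few lines; all of the inputs below are the assumptions stated above, and the GDP figure comes from the table that follows:

# Back-of-the-envelope estimate built from the assumptions stated in this post.
ict_share_of_gdp    = 0.064   # ICT spending as a share of GDP (WITSA figure)
hw_sw_services      = 0.43    # portion of ICT spent on hardware/software/services
at_risk_fraction    = 0.66    # fraction of IT dollars in "at risk" projects
at_risk_fail_rate   = 0.65    # assumed failure rate of at-risk projects
indirect_multiplier = 7.5     # assumed indirect (lost opportunity) cost ratio

failure_factor = (ict_share_of_gdp * hw_sw_services
                  * at_risk_fraction * at_risk_fail_rate * indirect_multiplier)
print(f"Failure cost as a fraction of GDP: {failure_factor:.4f}")   # 0.0885

world_gdp_busd = 69_800   # world GDP in billions of USD, from the table below
print(f"World: {failure_factor * world_gdp_busd:,.0f} B USD")       # about 6,180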
Based on this, the following gives the annual cost of IT failure on various regions of the world in billions of USD:
REGION GDP (B USD) Cost of IT Failure (B USD)
World 69,800 6,180
USA 13,840 1,225
New Zealand 44 3.90
UK 2,260 200
Texas 1,250 110
Sunday, September 20, 2009
Sessions's Complexity Aphorisms
- Complexity is like heat. We need just enough to solve the problem. Any more kills us.
- The best IT solution is the simplest IT solution that solves the business problem.
- The simplest IT solution is the NULL solution, but that fails the effectiveness test.
- Complexity is the enemy.
- The cloud is a platform. Not a complexity solution.
- 60% of all IT budgets are invested in projects that are at risk for unnecessary complexity.
- Occam's Razor Applied to IT: When you have two competing architectures that solve exactly the same business problem, the simpler one is the better.
- Consulting organizations perpetuate complexity through fear: This project is too risky. You will fail and be blamed. Let us fail and be blamed.
- IT Complexity is a tax paid by everybody that benefits nobody.
Monday, July 6, 2009
Three Rules for IT Simplification: Small, Separate, and Synergistic.
The most important factor in building a successful IT project is simplification. Simplification means that the overall system should be as simple as it can possibly be while still delivering the goals of the project.
In general, there are three rules you must follow to build a simple IT system.
- Keep it small.
- Keep it separate.
- Keep it synergistic.
Let's take a look at each of these.
Rule 1. Keep it small.
Keeping it small means that a system should have no more functionality than it needs to deliver its defined goals. Every ounce of IT functionality must be clearly traceable to the business goals of the project. If a function is not traceable, it should be removed.
This, of course, can only be done if the business goals are clearly understood. Usually they are not. So make sure that at the beginning of the project, the business folks and the IT folks have clearly documented each business goal of the system, and that every bit of planned functionality can be mapped back to one of those goals.
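One lightweight way to enforce that traceability is to keep an explicit map from planned functions to documented goals and flag anything unmapped. The function and goal names in this sketch are hypothetical, purely for illustration:

# Hedged sketch: flag planned functionality that cannot be traced to a
# documented business goal. All names here are hypothetical.
business_goals = {"reduce claim turnaround", "meet audit requirements"}

planned_functions = {
    "claim intake form": "reduce claim turnaround",
    "audit trail export": "meet audit requirements",
    "social media feed": None,   # nobody can say which goal this serves
}

untraceable = [fn for fn, goal in planned_functions.items()
               if goal not in business_goals]
print("Candidates for removal:", untraceable)   # ['social media feed']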
This goal is often at odds with people's desire to make systems "reusable". In order to make systems more "reusable", IT likes to make systems more configurable and more functional. IT often convinces the business that reusability is an important goal, thereby justifying their efforts. Efforts to achieve reusability almost always add unnecessary complexity, and, in the end, rarely succeed. Think long and hard before you sacrifice simplicity at the altar of reusability.
Rule 2. Keep it separate.
An IT organization will inevitably need many systems and these systems will need to interoperate. Interoperability is at odds with simplicity. From a simplicity perspective, the less interoperability between systems, the better. From a corporate perspective, the more interoperability between systems, the better.
You can see the conflict. It is important to strike a balance between these two objectives. Design as much interoperability as necessary, but no more than necessary and design it with as much respect as possible for the boundaries between systems.
Rule 3. Keep it synergistic.
One of the most critical issues, from a simplicity perspective, is how functionality is placed. Many systems suffer huge complexity problems because functionality that should be co-located is not, or, just as serious, functionality that should not be co-located, is. It is critical that functions be placed based on natural (that is, business-defined) synergies.
So if simplicity is your game, remember the three S's that govern simplicity: small, separate, and synergistic. You won't be sorry!
Thursday, April 30, 2009
How Many Enterprise Architects Does It Take To Measure A Donkey?
I don't know how this got started, but we were tweeting a discussion about enterprise architecture, and somehow the question came up...
How Many Enterprise Architects Does It Take To Measure A Donkey?
A: depends on where the datum is and which part of the donkey they measure!
A: and should not speed of donkey be considered too? For relativity effects I mean.
A: ahh yes, but it is velocity and in the direction of measurement! A jumping donkey!
A: jumping donkeys - can I have some of what you two are on?
A: EAs measuring the donkey? One measures required height of TO-BE donkey, one argues length and height are basically the same, ...
A: .. one says the problem is that the front half of the donkey and the rear half are not in alignment.
A: ... one says that we can't measure the donkey until we have completed the business case.
A: ... one says that we need an industry consortium to define the best practices for donkey measuring.
A: ... one says we don't need to measure the donkey. It's not strategic. We need to outsource it.
A: ... one says that we can't measure the donkey until we first partition it into little pieces so that it isn't so complex.
With contributions from @RSessions, @richardveryard, @seabird20, @taotwit, and @j4ngis. (Did I miss anybody?)
Wednesday, April 15, 2009
Factors Driving System Complexity
IT Systems are failing at an alarming rate and the rate of failure is increasing. Approaches that we have used in the past to successfully deliver IT systems are no longer working.
Something has changed: the complexity of the systems. It should be no surprise to anybody in IT that the complexity of the systems we are being asked to build has dramatically increased. The only surprise is that we have not adapted the processes that we use to build IT systems to take into account this increase.
Processes that can drive simple IT systems development do not scale up as the complexity of the systems increases beyond a certain point, a point that we have long passed.
I do not advocate throwing out what we know about building simple systems. Approaches such as Agile Development are great, as long as the complexity of what we are building is manageable. The approach that I advocate is learning how to break large, complex systems that we do not know how to build into smaller, simpler systems that we do know how to build.
I call this process partitioning. It is important to remember that the goal is not just to create smaller systems, but smaller systems that are as simple as possible. The specific process that I advocate for doing this is called SIP, for Simple Iterative Partitions. SIP is designed from the start to generate systems that are both small and simple.
While the goal of partitioning is to create smaller simple systems, the process itself is neither small nor simple. Planning for simplicity paradoxically adds complexity to the planning process but pays huge dividends further on. The extra work involved in delivering complex systems is far greater than the extra work involved in planning for simplicity from the beginning.
Partitioning is a science, like medicine. It must be led by people who understand the nature of the disease and have been trained in how to manage it. The training is important because there are many things that can go wrong. When things do go wrong, complexity is inadequately removed, or, in some cases, made worse.
IT failures due to poor partitioning are epidemic in our industry. For example, there are many failed service-oriented architectures (SOAs). Behind almost all of these SOA failures is a partitioning failure.
I have been studying IT failures for a number of years now. I have concluded that partitioning failures are the primary cause of most IT failures. The bigger the failure, the more likely it is that incorrect partitioning is the root cause. The symptom of partitioning failure is always the same: the project seems swamped by complexity. If you have worked on a project whose complexity killed the project, then you have experienced firsthand the effects of partitioning failure.
I have seen so many partitioning failures that I have started to recognize underlying patterns or categories of partitioning failures. At this point, I have identified eleven categories, and the list is growing as I analyze more failures. The eleven categories are as follows:
- Delayed partitioning: The partitioning did not begin early enough in the project life cycle. It should be mostly complete before the business architecture is begun.
- Incomplete decomposition: The decompositional analysis of functions did not go far enough, resulting in too coarse a granularity of the functions being partitioned.
- Excessive decomposition: The decompositional analysis of functions went too far, resulting in too fine a granularity of the functions being partitioned.
- Bloated subsystems: Too many functions have been assigned to one or more subsystems.
- Scant subsystems: Too few functions have been assigned to one or more subsystems.
- Incorrect assignment: Functions that do not belong together have been assigned to the same subsystem, or functions that do belong together have been assigned to different subsystems.
- Duplicated capabilities: Functions have been duplicated across different subsystems.
- Unnecessary capabilities: Functions have been partitioned that should not have been included in the first place.
- Technical partition mismatch: The technical partitioning does not match the business partitioning, a common problem in purchased systems and service-oriented architectures.
- Inadequate depth: The technical partitioning does not extend far enough. In SOAs, this problem often manifests itself as multiple services sharing common data.
- Boundary decay: The boundaries between subsystems in the partition are not strong enough, and functionality slips back and forth between subsystems.
(Thanks to Chris Bird for suggesting this topic with his innocent tweeted question: “What are contributors to complexity?”)
Thursday, April 2, 2009
Adaptability of Large Systems
As I was tweeting the other day, I came upon a tweet from Noel Dickover about large systems being less adaptable than small systems. This began a tweet exchange that I thought brought up some important issues about adaptability.
Many people believe that large systems are harder to adapt than smaller systems. And, in general, this is true. But system adaptability is not a function of size; it is a function of architecture. When a large system is partitioned correctly, so that it is composed of a number of smaller, autonomous systems, then it MAY be more adaptable than a single smaller system.
I say "may" because its adaptability depends on how well it has been partitioned. The key is not whether the partitioning is good technically (say, mapping well to an SOA), but how well the technical partitions overlay the business partitions of the organization.
In other words, when a large system is built of autonomous smaller systems AND those smaller systems map well to the autonomous processes that occur naturally within the business, then, and only then, do you have a large system that is highly adaptable.
The reason so many systems fail to achieve this (even when they do manage a reasonable technical partitioning) is that the technical partitioning MUST BE driven by the business partitioning. This requires a partitioning analysis of the business that is completed before the technical architecture of the system is even begun.
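What might such an analysis check? At a minimum, that no natural business grouping of functions gets scattered across technical subsystems. The sketch below is only an illustration of that check; the groupings and names are hypothetical:

# Hedged sketch: flag business groupings whose functions end up scattered
# across technical subsystems. Groupings and names are hypothetical.
business_partition = {
    "claims":  {"submit claim", "adjudicate claim"},
    "billing": {"issue invoice", "post payment"},
}
technical_partition = {
    "service A": {"submit claim", "issue invoice"},    # mixes two business areas
    "service B": {"adjudicate claim", "post payment"},
}

for area, functions in business_partition.items():
    homes = {svc for svc, fns in technical_partition.items() if fns & functions}
    if len(homes) > 1:
        print(f"'{area}' is split across {sorted(homes)} -- poor alignment")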
This business partitioning analysis is best done by representatives of both the business and the IT organization. The business group has the best understanding of how functions relate to each other and the technical group has the best understanding of how this business partitioning analysis will eventually drive the technical partitioning architecture.
Since both business and technical experts are involved in this exercise, I place this work in the common ground between business and technology, the watering hole that we call enterprise architecture. But it is enterprise architecture with a very specific focus: driving technical partitioning from business partitioning analysis with the eventual goal of highly flexible systems that are pegged closely to the business need and mirror closely the business organization.
Thursday, March 12, 2009
The Cancer of Complexity
After many years in IT, I am convinced that code complexity is a relatively unimportant issue. This may sound strange, coming from someone who is always reminding people that Complexity is the Enemy. How can I not care about code complexity?
What is much more important than code complexity is how that code is architecturally organized. We can deal with complex code if that code is well sequestered. It is the effectiveness of the sequestering rather than any pockets of code complexity that will determine the complexity of the larger system.
Complex code that is well sequestered has what I describe as "benign complexity". An example of a system with benign complexity is a web service that is poorly written (complex) but well encapsulated (sequestered). IT systems that have benign complexity may have localized problems, but these problems are rarely lethal to the system as a whole, and, if they can't be solved, can at least be surgically excised.
Complex code that is poorly sequestered has what I describe as "malignant complexity". An example of a system with malignant complexity is a collection of services that all share common persistent data. These systems are also complex, but now the complexity is not localized and is almost impossible to address. These systems are usually headed for serious problems.
There are many similarities between malignant cancer (that is, cancer that is progressive and uncontrolled) and malignant complexity. Here are some that I have identified in recent twitter conversations:
- Both malignancies grow exponentially over time.
- It is difficult or impossible to control the rate of growth. (Thanks to A. Jangbrand for this one.)
- Both malignancies can be prevented much more easily than cured.
- Left to their own devices, both malignancies will destroy their host.
- When removal is attempted, it is easy to splinter the malignancy and form new malignancies that can themselves grow. (Thanks to Richard Veryard for this one.)
- By the time symptoms are noticed, the malignancy has often reached an advanced and sometimes incurable state.
- Both malignancies can spread to other locations that are only remotely connected with the original location.
I do believe that complexity is the enemy. Until we better understand complexity, our chances of building better IT systems are limited. The first thing we must understand about complexity is that not all complexity is equal. And the complexity on which most people focus is probably the least important complexity of all.
(Join the twitter conversation - @RSessions.)
Tuesday, January 27, 2009
Editorial: Obama's Information Technology Priority
Federal IT projects fail at an alarming rate. The total cost to the U.S. economy? According to my calculations, at least $200 billion per year. This editorial that I wrote is reprinted from the Perspectives of the International Association of Software Architects (January 2009). It is an in-depth analysis of why so many Federal IT systems are in trouble and what steps the Obama administration needs to take to control this problem.
You can pick up a PDF version of the editorial here. And if that doesn't work, go to www.objectwatch.com and follow the whitepapers tab.
What do you think? Leave a comment and let me know.