Thursday, August 21, 2008
Complexity is not a technology problem
In my view, if you are trying to understand what technologies to use to address complexity, you are probably too late. The solution to complexity is never found in technology. It is found in architecture. And architecture is mostly technology neutral. My perspective on complexity is that it can only be addressed at the enterprise architectural level, and then only if an organization specifically focuses on that as the overriding issue. In my book, Simple Architectures for Complex Enterprises, I address the issue of architectural complexity in depth. In summary, we must first understand complexity, then model it, then build processes that can be proven to address it. But the single most important factor is attitude. We need to view complexity as an enemy and simplicity as a core business asset. Every business analyst and IT architect needs to unite in the battle against complexity. Most IT systems fail, and the bigger they are, the harder they fall. When an IT system fails, the reason is almost always unmanaged complexity.
Friday, August 15, 2008
Simple Architectures - The Book (Discussion)
Would you like to discuss some of the issues raised in Simple Architectures? This is as good a place as any!
Thursday, May 29, 2008
The Five Causes of IT Complexity
Complexity is a big problem for IT. Complex systems cost too much, fail too often, and usually do not meet basic business and architectural requirements.
I believe that all IT complexity can be traced to five causes. Eliminate the five root causes of complexity, and you can eliminate unnecessary complexity.
These root causes are as follows:
- Partitioning Failures – that is, systems in which data and/or functionality has been divided into subsets that do not form true partitions.
- Decompositional Failures – that is, systems that have not been decomposed into small enough subsets.
- Recompositional Failures – that is, systems that have been decomposed into subsets that are too small and have not been recomposed appropriately.
- Boundary Failures – that is, systems in which the boundaries between subsets are weak.
- Synergistic Failures – that is, systems in which functionality and/or data placement does not pass the test for synergy.
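To make the idea of a true partition concrete, here is a minimal sketch (TypeScript, with hypothetical names) of the kind of check I have in mind: every function or data element must land in exactly one subset, with no overlaps and nothing left out.

```typescript
// Hypothetical sketch: verify that an assignment of items (functions or
// data entities) to subsets forms a true mathematical partition.

type Assignment = Map<string, string[]>; // subset name -> items placed in it

function isTruePartition(allItems: string[], assignment: Assignment): boolean {
  const seen = new Set<string>();
  for (const items of assignment.values()) {
    for (const item of items) {
      if (seen.has(item)) return false; // overlap: item appears in two subsets
      seen.add(item);
    }
  }
  // Completeness: every item must be placed somewhere, exactly once.
  return allItems.every((item) => seen.has(item));
}

// Example: "CustomerAddress" appears in two subsets, so this is not a partition.
const subsets: Assignment = new Map([
  ["Sales", ["ProcessOrder", "CustomerAddress"]],
  ["Shipping", ["ScheduleDelivery", "CustomerAddress"]],
]);
console.log(
  isTruePartition(["ProcessOrder", "CustomerAddress", "ScheduleDelivery"], subsets)
); // false
```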
I’m planning on exploring these five causes in my next ObjectWatch Newsletter, so if you are interested, stay tuned.
Can anybody think of a cause of IT complexity that is not covered above?
Tuesday, January 22, 2008
How Not To Make IRS Systems Secure
In a recent report, the GAO severely chastised the Internal Revenue Service for “pervasive weaknesses” in the security of the IRS IT systems. According to this report, these weaknesses “continue to threaten the confidentiality and availability of IRS’s financial processing systems and information, and limit assurance of the integrity and reliability of its financial and taxpayer information.”
Now you might wonder why the IRS would do such a poor job with IT security. Surely the IRS is aware of the need for IT security!
The reason for the IRS problems is simple. The IRS systems are highly complex. And highly complex systems are notoriously difficult to make secure. Among the problems noted by the GAO:
- The IRS does not limit user rights to only what is needed to perform specific job functions.
- The IRS does not encrypt sensitive data.
- The IRS does not effectively monitor changes on its mainframe.
This is ironic, given how important the IRS considers security. But the GAO finding is an excellent illustration of a point I have made many times: controlling complexity is more important than controlling security. A system whose complexity has been controlled can be made secure relatively easily. A system whose complexity has not been controlled cannot be made secure, regardless of how much effort is expended.
Those readers familiar with my approach to controlling complexity (SIP) know that I advocate a form of mathematical partitioning to greatly reduce an IT system’s complexity. This process results in a number of sets of synergistic business functionality. I call these sets ABCs for autonomous business capabilities.
These sets form a mathematical partition, and that partition extends through to data ownership. Because of this, it is relatively easy to fix all three of the problems the GAO noted in the IRS systems.
For example, it is relatively easy to ensure that a given user is granted no more access rights than are needed to complete a specific job, since rights are granted to business functionality, not to data.
It is relatively easy to encrypt data, since data moves between business functions only through well-defined messages, and these messages are easily encrypted.
It is relatively easy to effectively monitor all changes made to the mainframe and associate those changes with specific business events and specific users, since all data is owned by specific ABCs and is never visible outside the ABC except through messaging contracts.
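As a rough illustration only (the names below are hypothetical, not taken from the IRS systems or from SIP itself), an ABC that owns its data and exposes it solely through messages makes all three fixes fall out naturally: rights attach to business functions, messages are the one place to encrypt, and every change arrives as an auditable message.

```typescript
// Hypothetical sketch of an autonomous business capability (ABC) that owns
// its data outright and exposes it only through a message contract.

interface TaxpayerUpdateMessage {
  requestedBy: string;        // the user invoking the business function
  businessFunction: "recordPayment" | "correctAddress";
  payload: string;            // encrypted before it ever leaves the sender
}

class TaxpayerAccountABC {
  // Data is owned by the ABC and never visible outside it.
  private accounts = new Map<string, { balance: number; address: string }>();
  // Rights are granted to business functions, not to tables or files.
  private rights = new Map<string, Set<string>>([
    ["clerk-17", new Set(["recordPayment"])],
  ]);

  handle(msg: TaxpayerUpdateMessage): string {
    if (!this.rights.get(msg.requestedBy)?.has(msg.businessFunction)) {
      return "denied";                         // least privilege falls out naturally
    }
    const change = decrypt(msg.payload);       // one well-defined place to decrypt
    auditLog(msg.requestedBy, msg.businessFunction, change); // every change is monitored
    this.accounts.set(change, { balance: 0, address: "" });  // simplified "apply the change"
    return "accepted";
  }
}

// Placeholder helpers for the sketch.
function decrypt(ciphertext: string): string { return ciphertext; }
function auditLog(user: string, fn: string, change: string): void {
  console.log(`${new Date().toISOString()} ${user} ${fn} ${change}`);
}

const abc = new TaxpayerAccountABC();
console.log(abc.handle({
  requestedBy: "clerk-17",
  businessFunction: "recordPayment",
  payload: "payment-details",
})); // "accepted"
```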
The good news is that the IRS has responded to the GAO report by stating that it is addressing all of the issues raised. The bad news is that it is going about this in exactly the wrong way.
According to Linda E. Stiff, Acting Commissioner of the IRS, “…the IRS has obtained additional expert-level technical support to assist in the development of a comprehensive security analysis of the architecture, processes, and operations of the mainframe computing center complex in order to develop a roadmap and strategy to address several of the issues noted by the GAO in the report.”
In other words, the IRS is going to continue making the same mistakes that led to its current problems: worrying about security and ignoring the real problem, complexity.
This is unfortunate. It means that for the near term, we can expect the IRS systems to continue to be “unprotected from individuals and groups with malicious intent who can intrude and use their access to obtain sensitive information, commit fraud, disrupt operations, or launch attacks against other computer systems and networks,” as the GAO describes the IRS systems today.
Friday, January 4, 2008
Feedback on The Top Seven Mistakes CIOs Will Make in 2008
Mark Blomsma questions whether we can partition an enterprise architecture into subsets that do not influence each other. He points out that a small failure in one subset can lead to a massive failure in another subset.
Mark is correct that it is difficult to create subsets that do not influence each other. I call the interactions between subsets thin spots. Our job in enterprise architecture is to minimize these thin spots. Partitioning is based on the theoretical ideal that there is no interaction between subsets of enterprise functionality (no thin spots). Of course, we know that this ideal is not attainable. So we need to compromise, allowing interoperability between subsets while minimizing the degradation those thin spots cause.
In general, I believe our best hope of minimizing thin-spot degradation of an enterprise architecture comes from having interactions between subsets occur through software systems rather than through business processes. There are two reasons for this. First, the more we can automate workflow, the better off we are in general. Second, we have a better understanding of how to manage software thin spots than we do of how to manage business-process thin spots. As one such example, see the post I did on data partitioning.
One of the primary tasks of the Enterprise Architect is boundary management, that is, architecting the boundaries between subsets of the enterprise. The stronger the boundaries, the better the partition and the better we can manage overall complexity. In my upcoming book, Simple Architectures for Complex Enterprises, I have a whole chapter dedicated to managing software boundaries.
The problem the enterprise architect faces is that the boundaries between subsets are hard to define early in the architecture and are under never-ending attack from well-intentioned developers for the entire life cycle of the architecture. One of the best approaches I have found to boundary management is getting the enterprise as a whole to recognize the critical importance of complexity control. Once complexity control has been embraced, it is easier to get people to understand the role of strong boundaries in controlling complexity.
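One practical way to keep boundaries honest is simply to measure the thin spots: given a proposed grouping of systems into subsets and the known dependencies among them, count the dependencies that cross subset boundaries. The sketch below (TypeScript, hypothetical names) shows the kind of quick check an architecture group could run as the partition evolves.

```typescript
// Hypothetical sketch: find "thin spots", i.e. dependencies that cross
// subset boundaries in a proposed enterprise partition.

interface Dependency { from: string; to: string; }

function findThinSpots(
  subsetOf: Map<string, string>,      // system -> subset it is assigned to
  dependencies: Dependency[]
): Dependency[] {
  return dependencies.filter(
    (d) => subsetOf.get(d.from) !== subsetOf.get(d.to)
  );
}

const subsetOf = new Map([
  ["OrderEntry", "Sales"],
  ["Quoting", "Sales"],
  ["Invoicing", "Finance"],
]);
const deps: Dependency[] = [
  { from: "OrderEntry", to: "Invoicing" }, // crosses Sales/Finance: a thin spot
  { from: "OrderEntry", to: "Quoting" },   // stays inside Sales: not a thin spot
];
console.log(findThinSpots(subsetOf, deps).length); // 1
```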
KCassio Macedo notes the importance of tying EA to business value.
This is a great point. An enterprise architecture that does not deliver business value is not worth doing. One of the advantages of partitioning an enterprise architecture is that we end up with autonomous subsets, that is, subsets of functionality that can be iteratively delivered. I call these subsets ABCs, for autonomous business capabilities. These ABCs are very compatible with Agile implementation approaches.
Partitioning is our best approach to managing complexity and mapping the enterprise into autonomous bite-size pieces that can be delivered in an agile and iterative manner.
I believe business value needs to be measured not only in ROI (return on investment) but also in TTV (time to value). TTV is a measure of how quickly the enterprise sees value from having undertaken the enterprise architecture exercise. Most organizations are initially skeptical of enterprise architecture, and the shorter you can keep the TTV, the sooner you can transform skepticism into support. This builds momentum and further improves the chances of future success.
Mrinal Mitra notes the difficulty in partitioning the business process and points out that some partitioning might even require a business reorganization. This is an excellent point; partitioning does sometimes require business reorganization.
However, why are we doing an enterprise architecture in the first place? As KCassio Macedo points out in the last comment, enterprise architecture must be tied to business value.
There is rarely any business value in merely documenting what we have today. We are more often trying to find a better way of doing things. This is where the business value of an enterprise architecture comes in. A partitioned enterprise is less complex. A less complex enterprise is easier to understand. And the better we understand our enterprise, the more effectively we can use IT to meet real business needs.
When I am analyzing an existing enterprise for problems, I usually don't start by partitioning the enterprise, but by analyzing how well the enterprise is already partitioned. If it is already well partitioned, then there is little value in undertaking the partitioning exercise. If it isn't, then we can usually find the areas for improvement before we actually invest in the partitioning exercise.
Anonymous gives an analogy of barnacles as an accumulation of patches, reworks, and processes that gradually accumulate and, over time, reduce a system that may have been initially well designed into a morass of complexity.
This is an excellent point. It is not enough to create a simple architecture and then forget about it. Controlling complexity is a never-ending job. It is a state of mind rather than a state of completion.
In physics, we have the Law of Entropy. The Law of Entropy states that all systems tend toward a maximal state of disorder unless energy is continually put into the system to control the disorder. This is just as true for enterprise and IT architectures as for other systems.
We have all seen IT systems that were initially well designed, and then a few years later, were a mess of what Anonymous calls barnacles. Enterprise architecture is not a goal that is attained and then forgotten. It is a continuing process. This is why governance is such an important topic in enterprise architecture. In my view, the most important job of the enterprise architecture group is understanding the complexity of the enterprise and advocating for its continual control.
Anonymous also points out that complex IT solutions often reflect poorly conceived business processes. Quite true. This dovetails nicely with Mrinal Mitra's observation in the previous comment. And this is also why we need to see enterprise architecture as encompassing both business processes and IT systems.
Alracnirgen points out that the number of people involved in projects, each with their own perspective and interpretation, is a major source of complexity.
This is another good point. There are enterprise architectural methodologies that focus specifically on addressing the issues of perspective, interpretation, and communications. Perspective, for example, is the main issue addressed by Zachman’s framework. Communications is the main issue addressed by VPEC-T.
My own belief is that we must first partition the enterprise, focusing exclusively on the goal of reducing complexity. Interpretation will still be an issue, but we can limit the disagreements about interpretation to only those that directly impact partitioning. Once we have a reasonable partition of subsets, then we can, within those autonomous regions, focus on perspective, interpretation, and communications in the broader picture.
Terry Young asks if the relationship between complexity and cost is linear. Terry’s experience is that “doubling the complexity more than doubles the cost to build and maintain the system.” This relationship between complexity and cost that Terry brings up is very important.
I agree with Terry that the relationship between complexity and cost often appears to be non-linear; however, I believe that this is somewhat of an illusion. Let me explain.
I believe that the relationship between complexity and cost is, in fact, linear. However the relationship between functionality and complexity is non-linear. Adding a small amount of functionality to a system greatly increases the complexity of that system. And since complexity and cost are linear, adding a small amount of functionality to a system therefore greatly increases its cost. It is this exponential relationship between functionality and cost that I believe Terry is observing.
Just to give you one example, Cynthia Rettig wrote a paper in the MIT Sloan Management Review (The Trouble with Enterprise Software by Cynthia Rettig) in which she quoted studies showing that every 25% increase in the functionality of a software system increases the complexity of that system by 100%. In other words, adding 25% more functionality doubles the complexity of the software system.
Let’s assume that we start out with a system that has 100 different functions. Adding 25% more functionality to this would be equivalent to adding another 25 functions to the original 100. Rettig tells us that this 25% increase in functionality doubles the complexity, so the software with 125 functions is now twice as complex as the one with 100 functions. Another 25% increase in functionality gives us a total of 156 functions, with four times the original complexity. Another 25% increase in functionality gives us a total of 195 functions with eight times the original complexity.
This analysis shows us that by roughly doubling the functionality of the system, we increase its complexity (and cost) eightfold. If we continue this analysis, we discover that tripling the functionality of the original system (five successive 25% increases, taking us to about 305 functions) increases its complexity roughly 30-fold. In other words, system B with 3 times the functionality of system A is roughly 30 times more complex than system A. And, since complexity and cost are linear, system B is also roughly 30 times more expensive than system A.
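For readers who like to see the arithmetic spelled out, here is a small sketch of the compounding, taking Rettig's 25%-more-functionality-doubles-complexity observation at face value.

```typescript
// Sketch of the compounding implied by "every 25% increase in functionality
// doubles complexity", starting from a system with 100 functions.

let functions = 100;
let complexity = 1;                 // complexity relative to the original system

for (let step = 1; step <= 5; step++) {
  functions = Math.round(functions * 1.25);   // add 25% more functionality...
  complexity *= 2;                            // ...and double the complexity
  console.log(`step ${step}: ~${functions} functions, ${complexity}x complexity`);
}
// step 3: ~195 functions, 8x complexity   (roughly double the functionality)
// step 5: ~305 functions, 32x complexity  (roughly triple the functionality)
```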
But this is the pessimistic way of looking at things. The optimistic way is to note that by taking System B and partitioning it into three sub-systems, each roughly equal in functionality to System A, we reduce its complexity (and cost) to roughly one tenth of where it started.
If you are trying to explain the value of partitioning to your CEO, this argument is guaranteed to get his or her attention!
Wednesday, January 2, 2008
The Top Seven Mistakes CIOs Will Make in 2008
In my January ObjectWatch Newsletter, I discussed why so many CIOs are heading down a path of failure in the coming year. The article is available without cost or registration at the ObjectWatch Web Site. Feel free to comment on the article here.
Sunday, December 30, 2007
Data Partitioning
The best way to control the disorder (and therefore the complexity) of a large IT system is to partition the system into a small number of subsets. Mathematically, partitioning can be shown to have a huge impact on the amount of disorder in a system, and therefore on its complexity. For those interested in more information on this topic, see my briefing papers on SIP (Simple Iterative Partitions) at the ObjectWatch web site.
Recently, one of my readers sent me an email asking about my philosophy on data sharing. He asked:
"I understand the general philosophy [of partitioning] but am not sure how that would work at the implementation level for data (say, a database). Does your solution mean that HR database tables can never participate in queries with, say, planning module database tables?"
In general, data sharing can be represented by the following diagram:
[Diagram: two applications, A and B, sharing data directly at the database level.]
It should be pretty obvious from this diagram that data sharing severely breaks down the boundaries that logically separate applications. If application A decides to change how it stores data, application B will be broken.
Anytime we destabilize the boundaries between two applications, we add tremendously to the overall disorder of the system. Just a little bit of destabilization adds a huge amount to the disorder.
Some organizations attempt to deal with this by creating a common data access layer, but this doesn't significantly change the problem. If the implementation of A changes so that it no longer uses the common data access layer, application B is broken.
In order to ensure strong application boundaries (and therefore minimal system complexity), we take all possible steps to minimize dependencies between our applications. The best way to accomplish this is to insist that all interactions between applications go through intermediaries.
If application A wants to ask application B for some information, it makes the request through an intermediary that wraps A. That intermediary makes the request to an intermediary that wraps B. The intermediary on the requesting side is called an "envoy" and the intermediary on the receiving side is called a "guard".
How the envoy and guard send and receive requests is not the problem of either application, but the most common way of doing so is through the use of messages that are defined by the general family of standards known as web services.
Notice how this protects the two applications from implementation changes on the other side. If the implementation of A changes, it is likely that A's envoy might also need to change. But nothing on B's side needs to change.
In this architecture, we have strong boundaries between the two applications, and these boundaries serve to control the overall disorder of the system.
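Here is a minimal sketch of the envoy/guard idea (TypeScript, hypothetical names). In practice the intermediaries would exchange web-service messages rather than call each other in-process; the direct call below just keeps the sketch self-contained.

```typescript
// Hypothetical sketch: application A talks to application B only through
// intermediaries: A's envoy on the requesting side, B's guard on the
// receiving side. Neither application sees the other's implementation.

interface Request  { operation: string; body: string; }
interface Response { body: string; }

// B's guard: the only doorway into application B.
class ApplicationBGuard {
  handle(req: Request): Response {
    // Translate the message into whatever B's current implementation needs.
    // If B's storage changes, only B and this guard change.
    return { body: `B handled ${req.operation}` };
  }
}

// A's envoy: the only way requests leave application A.
class ApplicationAEnvoy {
  constructor(private guard: ApplicationBGuard) {}
  requestFromB(operation: string, body: string): Response {
    // In practice this would be a web-service message exchange.
    return this.guard.handle({ operation, body });
  }
}

const envoy = new ApplicationAEnvoy(new ApplicationBGuard());
console.log(envoy.requestFromB("getCustomerAddress", "customer-42").body);
```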
But still, sometimes application A needs access to application B's data. What do we do then?
There are two possibilities. The first is that application A can ask application B for the data it needs (using, of course, intermediaries). The second is that a third application can be created that manages the shared knowledge of the organization. Let's call this application the KnowledgeManager. When any application makes changes to data that is considered part of this shared knowledge, that application is responsible for letting the KnowledgeManager know about those changes (using intermediaries, of course).
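And a sketch of that second possibility, again with hypothetical names: an application that changes shared data notifies the KnowledgeManager through its intermediary, and other applications ask the KnowledgeManager rather than reaching into each other's databases.

```typescript
// Hypothetical sketch: a KnowledgeManager that owns the organization's
// shared knowledge. Applications notify it of changes (via their
// intermediaries) instead of letting other applications read their tables.

interface ChangeNotification {
  source: string;       // which application made the change
  entity: string;       // e.g. "CustomerAddress:42"
  newValue: string;
}

class KnowledgeManager {
  private shared = new Map<string, string>();

  notifyChange(change: ChangeNotification): void {
    this.shared.set(change.entity, change.newValue);
  }

  lookup(entity: string): string | undefined {
    return this.shared.get(entity);
  }
}

// Application A changes a customer address and (through its envoy) tells
// the KnowledgeManager; application B later asks the KnowledgeManager.
const km = new KnowledgeManager();
km.notifyChange({
  source: "ApplicationA",
  entity: "CustomerAddress:42",
  newValue: "221B Baker St",
});
console.log(km.lookup("CustomerAddress:42")); // "221B Baker St"
```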
Some may argue that this architecture is inefficient. Why not just go directly to the database for what you want? The answer has to do with our ultimate goals. I believe that our most important goal is controlling IT complexity. Controlling IT complexity is far more important than a few milliseconds of performance.
Complexity is expensive. Disk access is cheap.