Tuesday, January 22, 2008

How Not To Make IRS Systems Secure

The GAO recently came out with a study (GAO-08-211) that highlights a problem frequently associated with IT complexity: poor security. The GAO, for those of you who are not familiar with it, is the U.S. Government Accountability Office, the organization charged with making sure that the U.S. Government is spending our tax dollars wisely.

In this report, the GAO severely chastised the Internal Revenue Service for “pervasive weaknesses” in the security of the IRS IT systems. According to this report, these weaknesses “continue to threaten the confidentiality and availability of IRS’s financial processing systems and information, and limit assurance of the integrity and reliability of its financial and taxpayer information.”

Now you might wonder why the IRS would do such a poor job with IT security. Surely the IRS is aware of the need for IT security!

The reason for the IRS’s problems is simple. The IRS systems are highly complex, and highly complex systems are notoriously difficult to make secure. Among the problems noted by the GAO:
  • The IRS does not limit user rights to only what is needed to perform specific job functions
  • The IRS does not encrypt sensitive data
  • The IRS does not effectively monitor changes on its mainframe

This is ironic, given how important the IRS considers security. But the GAO finding is an excellent illustration of a point I have made many times: controlling complexity is more important than controlling security. A system whose complexity has been controlled can be made secure relatively easily. A system whose complexity has not been controlled cannot be made secure, regardless of how much effort is expended.

Those readers familiar with my approach to controlling complexity (SIP) know that I advocate a form of mathematical partitioning to greatly reduce an IT system’s complexity. This process results in a number of sets of synergistic business functionality. I call these sets ABCs, for autonomous business capabilities.

These sets form a mathematical partition, and that partition extends through to data ownership. Because of this, it is relatively easy to fix all three of the problems the GAO noted in the IRS systems.

For example, it is relatively easy to ensure that a given user is granted no more access rights than needed to complete a specific job, since rights are granted to business functionality, not to data.

It is relatively easy to encrypt data, since data moves between business functions only through well-defined messages, and these messages are easily encrypted.

It is relatively easy to effectively monitor all changes made to the mainframe and to associate those changes with specific business events and specific users, since all data is owned by a specific ABC and is never visible outside that ABC (except through its messaging contracts).
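To make this concrete, here is a minimal sketch of what an ABC boundary can look like in code. The names (TaxpayerAccounts, post_payment, and so on) are hypothetical illustrations, not part of SIP or of any actual IRS system, and the encryption relies on the third-party Python cryptography package:

```python
# A minimal sketch of an autonomous business capability (ABC).
# All names are hypothetical; the point is the boundary structure.
from cryptography.fernet import Fernet  # third-party: pip install cryptography

class TaxpayerAccounts:
    """An ABC owns its data outright; nothing outside sees it directly."""

    # Problem 1: rights are granted per business function, not per table.
    ALLOWED = {"clerk": {"post_payment"}, "auditor": {"read_audit_trail"}}

    def __init__(self, key: bytes):
        self._cipher = Fernet(key)
        self._balances = {}   # private data, never exposed outside the ABC
        self._audit = []      # problem 3: every change is recorded

    def handle(self, role: str, operation: str, ciphertext: bytes):
        """The only way in: a well-defined, encrypted message."""
        if operation not in self.ALLOWED.get(role, set()):
            raise PermissionError(f"{role} may not invoke {operation}")
        # Problem 2: payloads are encrypted in transit between ABCs.
        payload = self._cipher.decrypt(ciphertext).decode()
        self._audit.append((role, operation, payload))
        if operation == "post_payment":
            taxpayer, amount = payload.split(":")
            self._balances[taxpayer] = self._balances.get(taxpayer, 0) + int(amount)

key = Fernet.generate_key()
abc = TaxpayerAccounts(key)
abc.handle("clerk", "post_payment", Fernet(key).encrypt(b"tp-123:500"))
```

All three GAO findings are addressed by the same structural property: the only path to the data runs through the ABC’s message boundary.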

The good news is that the IRS has responded to the GAO report by stating that it is addressing all of the issues raised. The bad news is that it is going about this in exactly the wrong way.

According to Linda E. Stiff, Acting Commissioner of the IRS, “…the IRS has obtained additional expert-level technical support to assist in the development of a comprehensive security analysis of the architecture, processes, and operations of the mainframe computing center complex in order to develop a roadmap and strategy to address several of the issues noted by the GAO in the report.”

In other words, the IRS is going to continue making the same mistakes that led to its current problems: worrying about security and ignoring the real problem, complexity.

This is unfortunate. It means that for the near term, we can expect the IRS systems to continue to be “unprotected from individuals and groups with malicious intent who can intrude and use their access to obtain sensitive information, commit fraud, disrupt operations, or launch attacks against other computer systems and networks,” as the GAO describes the IRS systems today.

Friday, January 4, 2008

Feedback on The Top Seven Mistakes CIOs Will Make in 2008

My article on The Top Seven Mistakes CIOs Will Make in 2008 drew many comments with excellent observations. Let me respond to what I have seen so far.

Mark Blomsma questions whether we can partition an enterprise architecture into subsets that do not influence each other. He points out that a small failure in one subset can lead to a massive failure in another subset.

Mark is right that it is difficult to create subsets that do not influence each other. I call the interactions between subsets thin spots. Our job in enterprise architecture is to minimize these thin spots. Partitioning is based on the theoretical ideal that there is no interaction between subsets of enterprise functionality (no thin spots). Of course, we know this ideal is not attainable, so we need to compromise, allowing interoperability between subsets while minimizing the degradation those thin spots introduce.

In general, I believe our best hope of minimizing thin-spot degradation of an enterprise architecture comes from having interactions between subsets occur through software systems rather than through business processes. There are two reasons for this. First, the more we can automate workflow, the better off we are. Second, we understand how to manage software thin spots far better than we understand how to manage business-process thin spots. As one example, see the post I did on data partitioning.
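As a small illustration, here is a sketch (with hypothetical names) of a software thin spot: the entire interaction between two subsets is narrowed to one explicit contract.

```python
# A sketch of a software "thin spot": two subsets interact only through
# one explicit, well-defined contract, never through each other's internals.
from typing import Protocol

class OrderEvents(Protocol):
    """The only surface area Sales and Fulfillment share."""
    def order_placed(self, order_id: str, sku: str, qty: int) -> None: ...

class Fulfillment:
    def order_placed(self, order_id: str, sku: str, qty: int) -> None:
        print(f"picking {qty} x {sku} for order {order_id}")

class Sales:
    def __init__(self, events: OrderEvents):
        self._events = events  # Sales depends on the contract, not on Fulfillment

    def checkout(self, order_id: str) -> None:
        # ...internal Sales logic stays private to this subset...
        self._events.order_placed(order_id, "SKU-42", 2)

Sales(Fulfillment()).checkout("A-100")
```

Because the interaction is automated and explicit, the thin spot can be versioned, monitored, and tested, all of which are much harder when the coupling runs through a manual business process.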

One of the primary tasks of the enterprise architect is boundary management, that is, architecting the boundaries between subsets of the enterprise. The stronger the boundaries, the better the partition and the better we can manage overall complexity. In my upcoming book, Simple Architectures for Complex Enterprises, I dedicate a whole chapter to managing software boundaries.

The problem the enterprise architect faces is that the boundaries between subsets are hard to define early in the architecture and remain under never-ending attack from well-intentioned developers for the entire life cycle of the architecture. One of the best approaches I have found to boundary management is getting the enterprise as a whole to recognize the critical importance of complexity control. Once complexity control has been embraced, it is easier to get people to understand the role strong boundaries play in controlling it.

KCassio Macedo notes the importance of tying EA to business value.

This is a great point. An enterprise architecture that does not deliver business value is not worth doing. One of the advantages of partitioning an enterprise architecture is that we end up with autonomous subsets, that is, subsets of functionality that can be iteratively delivered. I call these subsets ABCs, for autonomous business capabilities. These ABCs are very compatible with Agile implementation approaches.

Partitioning is our best approach to managing complexity and mapping the enterprise into autonomous bite-size pieces that can be delivered in an agile and iterative manner.

I believe business value needs to be measured not only by ROI (return on investment) but also by TTV (time to value). TTV measures how quickly the enterprise sees value from having undertaken the enterprise architecture exercise. Most organizations are initially skeptical of enterprise architecture, and the shorter you can keep the TTV, the sooner you can transform that skepticism into support. This builds momentum and further improves the chances of future success.

Mrinal Mitra notes the difficulty of partitioning the business process and points out that some partitioning might even require a business reorganization. This is an excellent point; partitioning does sometimes require business reorganization.

However, why are we doing an enterprise architecture in the first place? As KCassio Macedo pointed out earlier, enterprise architecture must be tied to business value.

There is rarely any business value in merely documenting what we have today. More often, we are trying to find a better way of doing things. This is where the business value of an enterprise architecture comes in. A partitioned enterprise is less complex. A less complex enterprise is easier to understand. And the better we understand our enterprise, the more effectively we can use IT to meet real business needs.

When I am analyzing an existing enterprise for problems, I usually don’t start by partitioning the enterprise; I start by analyzing how well the enterprise is already partitioned. If it is already well partitioned, there is little value in undertaking the partitioning exercise. If it isn’t, we can usually identify areas for improvement before we invest in that exercise.

Anonymous offers an analogy of barnacles: patches, reworks, and processes that gradually accumulate and, over time, reduce a system that may have been well designed initially to a morass of complexity.

This is an excellent point. It is not enough to create a simple architecture and then forget about it. Controlling complexity is a never ending job. It is a state of mind rather than a state of completion.

In physics, the second law of thermodynamics tells us that systems tend toward a maximal state of disorder unless energy is continually put in to counteract that disorder. This is just as true for enterprise and IT architectures as it is for physical systems.

We have all seen IT systems that were initially well designed, and then a few years later, were a mess of what Anonymous calls barnacles. Enterprise architecture is not a goal that is attained and then forgotten. It is a continuing process. This is why governance is such an important topic in enterprise architecture. In my view, the most important job of the enterprise architecture group is understanding the complexity of the enterprise and advocating for its continual control.

Anonymous also points out that complex IT solutions often reflect poorly conceived business processes. Quite true. This dovetails nicely with Mrinal Mitra’s earlier observation. And it is also why we need to see enterprise architecture as encompassing both business processes and IT systems.

Alracnirgen points out that the many people involved in projects, each with their own perspective and interpretation, are a major source of complexity.

This is another good point. There are enterprise architecture methodologies that focus specifically on the issues of perspective, interpretation, and communication. Perspective, for example, is the main issue addressed by Zachman’s framework. Communication is the main issue addressed by VPEC-T.

My own belief is that we must first partition the enterprise, focusing exclusively on the goal of reducing complexity. Interpretation will still be an issue, but we can limit disagreements about interpretation to those that directly impact the partitioning. Once we have a reasonable partition, we can then, within those autonomous regions, turn to perspective, interpretation, and communications in the broader picture.

Terry Young asks whether the relationship between complexity and cost is linear. Terry’s experience is that “doubling the complexity more than doubles the cost to build and maintain the system.” The relationship between complexity and cost that Terry brings up is very important.

I agree with Terry that the relationship between complexity and cost often appears to be non-linear; however, I believe this is something of an illusion. Let me explain.

I believe that the relationship between complexity and cost is, in fact, linear. However, the relationship between functionality and complexity is non-linear: adding a small amount of functionality to a system greatly increases the complexity of that system. And since complexity and cost are linear, adding a small amount of functionality to a system therefore greatly increases its cost. It is this compounding relationship between functionality and cost that I believe Terry is observing.

Just to give you one example, Cynthia Rettig wrote a paper in the MIT Sloan Management Review (“The Trouble with Enterprise Software”) in which she quoted studies showing that every 25% increase in the functionality of a software system increases its complexity by 100%. In other words, adding 25% more functionality doubles the complexity of the software system.

Let’s assume that we start out with a system that has 100 different functions. Adding 25% more functionality to this would be equivalent to adding another 25 functions to the original 100. Rettig tells us that this 25% increase in functionality doubles the complexity, so the software with 125 functions is now twice as complex as the one with 100 functions. Another 25% increase in functionality gives us a total of 156 functions, with four times the original complexity. Another 25% increase in functionality gives us a total of 195 functions with eight times the original complexity.

This analysis shows that doubling the functionality of a system increases its complexity (and cost) roughly eightfold. If we continue the analysis, two more 25% increases bring us to 244 and then 305 functions, with sixteen and then thirty-two times the original complexity. In other words, system B with roughly three times the functionality of system A is roughly 30 times as complex as system A. And, since complexity and cost are linear, system B is also roughly 30 times as expensive as system A.

But this is the pessimistic way of looking at things. The optimistic way is to note that by taking System B and partitioning it into three subsystems, each about equal in functionality to System A, we reduce its complexity (and cost) to roughly a tenth of where it started.
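For readers who want to verify the arithmetic, here is the same calculation in a few lines of Python. It restates the quoted rule in continuous form, so the numbers differ slightly from the step-by-step version above:

```python
# Complexity growth under the rule that every 25% increase in functionality
# doubles complexity: complexity = 2 ** (number of 25% increments).
import math

def complexity(functionality: float) -> float:
    """Complexity multiplier, with functionality relative to the base system."""
    increments = math.log(functionality) / math.log(1.25)
    return 2 ** increments

print(complexity(2.0))        # doubling functionality -> ~8.6x complexity
print(complexity(3.0))        # tripling functionality -> ~30.4x complexity

# The partitioning payoff: three System-A-sized subsystems have a total
# complexity of 3.0, versus ~30.4 for the monolithic System B.
print(3.0 / complexity(3.0))  # -> ~0.10, about a tenth of the cost
```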

If you are trying to explain the value of partitioning to your CEO, this argument is guaranteed to get his or her attention!

Wednesday, January 2, 2008

The Top Seven Mistakes CIOs Will Make in 2008

CIO Magazine recently interviewed 250 CIOs from a variety of organizations and asked what their top ten goals are for 2008. As I read this article, I realized that of the top ten goals, seven have virtually no hope of being attained. Why is this?

In my January ObjectWatch Newsletter, I discussed why so many CIOs are heading down a path of failure in the coming year. The article is available without cost or registration at the ObjectWatch Web Site. Feel free to comment on the article here.