Tuesday, July 30, 2013

Gartner: Complexity Reduction One of Top Ten Technical Trends



Gartner recently released its Top Ten Strategic Technical Trends Through 2015, and one of those trends is IT Complexity. IT Complexity, Gartner says, is a major inhibitor “of an enterprise to get the most out of IT money spent.”

I have been writing for the last ten years about the need to reduce complexity and the bottom-line benefits of doing so. Gartner quotes one of my early papers, The Mathematics of IT Simplification. In this paper I introduced a modern, rational understanding of IT Complexity.

Until this white paper appeared, IT Complexity was largely ignored by Enterprise Architects. This was despite the fact that IT Complexity was responsible for numerous IT disasters and was costing the world economy trillions of dollars per year. IT Complexity was still treated in a random, ad hoc manner, if it was treated at all.

This White Paper laid a new foundation for the science of IT Complexity. This paper showed how to model IT complexity, how to measure it, and how to manage it using verifiable, reproducible methodologies that are based on solid mathematics.

If we want to take IT Complexity as seriously as Gartner says we should, we need to start by understanding it. This paper is where it all began. Gartner read it. You should too.

You can find the White Paper, The Mathematics of IT Simplification [here].
You can view the Gartner presentation, The Top 10 Strategic Technology Trends for 2013 [here].
You can view a 20-minute overview of The Snowman Architecture, my answer to IT Complexity, [here].

Photo by Alan in Belfast, via Flickr and Creative Commons.

Friday, July 5, 2013

The 3000 Year Old IT Problem


It was first described 3000 years ago by Sun Tzu in his timeless book, The Art of War.

We can form a single united body, while the enemy must split up into fractions. Hence there will be a whole pitted against separate parts of a whole, which means that we shall be many to the enemy's few. (1)

This is the first description of a problem-solving technique that would be named 600 years later by Julius Caesar as divide et impera. Or, as we know it, divide and conquer.

The Art of War is frequently named as one of the top ten must-reads in business (2). I have long been fascinated by The Art of War and especially the divide and conquer strategy. Back in the days when I was a researcher at the National Cancer Institute, divide and conquer was a common research strategy for testing drugs. Divide and conquer is used extensively in business, economics, law, politics, and the social sciences, to name a few.

Oddly enough, the only field I can think of in which divide and conquer has not been used successfully is IT. Whereas virtually every other field has been able to solve large complex problems using a divide and conquer strategy, IT is alone in its failure. Try to divide and conquer a large IT system, and you get an interconnected web of small systems with so many interdependencies that it is practically impossible to coordinate their implementation.

This is the reason so many large IT systems are swamped by IT complexity (3). The one problem-solving strategy that universally works against complexity seems to be inexplicably ineffective when it comes to IT.

How can this be? What is so special about IT that makes it unable to apply this strategy that has found such widespread applicability?

There are only two possible answers. The first is that IT is completely different from every other field of human endeavor. The second is that IT doesn't understand how divide and conquer works. I think the second explanation is more likely.

To be fair, it is not all IT's fault. Some of the blame must be shared by Julius Caesar. He is the one, remember, who came up with the misleading name divide and conquer (or, as he said it, divide et impera). Unfortunately, IT has taken this name all too literally.

What is the problem with the name divide and conquer? The name seems to imply that we conquer big problems by breaking them down into smaller problems. In fact, this is not really what we do. What we actually do is learn to recognize the natural divisions that exist within the larger problem. So we aren't really dividing as much as we are observing.

Let's take a simple example of divide and conquer: delivering mail.

The U.S. Postal Service delivers around 600 million letters each day. Any address can send a letter to any other address, so the number of possible paths for any given letter is astronomical. So how has the Postal Service simplified this problem?

They have observed the natural boundaries and population densities in different areas of the country and divided the entire country into about 43,000 geographic chunks of roughly equal postal load. Then they assigned a unique zip code to each chunk.

In the following picture, you can see a zip code map for part of New York City.
Partial Zip Code Map for New York City

The areas that are assigned to zip codes are not equal sized. They vary depending on the population density. The area that includes Rockefeller Center (10020) is very dense, so the area is small. The area where I grew up (10011) has medium density and the area assigned to the zip code is consequently average sized.

If we blow up my home zip code we can see some other features of the system. Here is 10011 enlarged:
Zip Code 10011
You can see that the zip code boundary makes some interesting zigs and zags. For example, the leftmost boundary winds around the piers of the Hudson River. Toward the bottom, about halfway along the southern boundary, the boundary takes a sharp turn to the south to follow Greenwich Avenue. On the right side, the boundary follows Fifth Avenue for quite a while until we hit the relatively chaotic Flatiron district at 20th Street.

So zip codes are not randomly assigned. They take into account population densities, street layouts, and natural boundaries. The main point here is that a lot of observation takes place before the first zip code is assigned.

Suppose we created the zip code map by simply overlaying a regular grid on top of New York City. We would end up with zip codes in the middle of the Hudson River, zip codes darting back and forth across Fifth Avenue, and some zip codes with huge population densities while others would consist of only a few bored pigeons.

You can see why the so-called divide and conquer algorithm is probably better named observe and conquer.

The general rule of thumb with observe and conquer is that your ability to solve a large complex problem is highly dependent on your ability to observe the natural boundaries that define the subproblems.
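To make this rule of thumb concrete, here is a small, purely illustrative Python sketch. It treats a problem as a set of pieces with dependencies between them (the pieces and dependencies are hypothetical) and counts how many dependencies cross subproblem boundaries under two candidate divisions: one drawn after observing where the pieces cluster, and one drawn like an arbitrary grid. Using the number of cut dependencies as the yardstick is an assumption made for illustration, not a metric taken from this post.

```python
# Hypothetical dependencies between pieces of a problem (undirected pairs).
DEPENDENCIES = [
    ("a", "b"), ("b", "c"), ("a", "c"),  # one tightly knit cluster
    ("d", "e"), ("e", "f"), ("d", "f"),  # another tightly knit cluster
    ("c", "d"),                          # a single link between the clusters
]

def cut_size(partition, dependencies):
    """Count dependencies whose two ends land in different subproblems."""
    group_of = {item: i for i, group in enumerate(partition) for item in group}
    return sum(1 for x, y in dependencies if group_of[x] != group_of[y])

# Boundaries drawn after observing the natural clusters.
observed = [{"a", "b", "c"}, {"d", "e", "f"}]
# Boundaries drawn like a grid: equal-sized chunks that ignore the clusters.
grid = [{"a", "d", "e"}, {"b", "c", "f"}]

print(cut_size(observed, DEPENDENCIES))  # 1
print(cut_size(grid, DEPENDENCIES))      # 5
```

Both divisions produce two equal-sized subproblems; only the observed one leaves the subproblems nearly independent.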

Let's consider one other example, this time from warfare.

In about 50 BCE, Julius Caesar and 60,000 troops completed the defeat of Gaul, a region defended by at least 300,000 troops. How did Caesar conquer 300,000 troops with one-fifth that number? He did it through divide and conquer. Although the Gallic strength was 300,000 in total, this number was divided into a number of tribes that had a history of belligerence among themselves. Caesar was able to form alliances with some tribes, pit other tribes against each other, and pick off remaining tribes one by one. But he was only able to do this through carefully observing and understanding the "natural" political tribal boundaries. Had Caesar laid an arbitrary grid over Gaul and attempted to conquer one square at a time, he could never have been successful.

This brings us to the reason that divide and conquer has been so dismally unsuccessful in IT: IT hasn't understood that observation must occur before division.

If you don't believe me, try this simple experiment. The next time an IT person, say John, suggests breaking up a large IT project into smaller pieces, ask John this question: what observations do we need to make to determine the natural boundaries of the smaller pieces? I can pretty much guarantee how John will respond: with a blank look. Not only will John not be able to answer the question, the chances are he will have no idea what you are talking about.

With such a result, is it any wonder that divide and conquer almost never works in IT? It is as if you assigned zip codes by throwing blobs of paint at a map of New York City and assigning all of the red blobs to one zip code and all of the green blobs to another.

For the first time in the history of IT, we now have a solution to this problem: a scientific, reproducible, and verifiable approach to gathering the observations necessary to make divide and conquer work. I call this approach synergistic partitioning. It is the basis for the IT architectural approach called The Snowman Practice. And if you are ever going to try to build a multimillion-dollar IT system, Snowmen are where you need to start. I can assure you, it will work a lot better than throwing blobs of paint at the wall.

You can start reading about The Snowman Practice [here]. Or contact me [here]. What do Snowmen have to do with defining divide and conquer boundaries? Ask me. I'll be glad to tell you.

Subscription Information

Your one-stop signup for information about any new white papers, blogs, webcasts, or speaking engagements by Roger Sessions and The Snowman Methodology is [here].

References

(1) The Art of War, 6:14. Translation from http://suntzusaid.com.
(2) See, for example, Inc.'s list [here].
(3) I have written about the relationship between complexity and failure rates in a number of places, for example, my Web Short The Relationship Between IT Project Size and Failure available [here].

Legal Notices and Acknowledgements

Photograph from brainrotting on Flickr via Creative Commons.

This blog and all of these blogs are Copyright 2013 by Roger Sessions. This blog may be copied and reproduced as long as it is not altered in any way and full attribution is given.

Tuesday, May 21, 2013

SIP and TOGAF: A Natural Partnership

If you have ever heard me talk, you know the standard three-tier architecture is a bad approach for large complex IT projects. It has poor security. It is difficult to modify. It is highly likely to fail.

Instead, I recommend the Snowman Architecture. This architecture is great for large complex IT projects. It has excellent security. It is easy to modify. And it is highly likely to be delivered on time, on budget, and with all of the functionality the business needs.

I often show pictures contrasting these two different architectures, such as this:


The classic three-tier architecture is generated by standard IT methodologies, such as TOGAF (1). The Snowman Architecture is generated by the methodology known as SIP (Snowman Identification Process).

From this discussion you might assume that traditional methodologies such as TOGAF are in conflict with SIP. But this is not the case. The ideal architecture comes not from SIP instead of TOGAF, but from an intimate dance between SIP and TOGAF (2).

To see how this dance works, let me take you through the steps needed to define a Snowman Architecture and point out where SIP takes the lead and where TOGAF takes the lead.

The starting point for a Snowman Architecture is identifying the business functions. These are the most granular functions that are still recognizable to the business analysts. SIP drives the identification of these business functions.

The second step is the partitioning of these business functions into closely related groupings. We call these groupings synergy groups. SIP includes very precise mathematically based algorithms for identifying these synergy groups and it is critical to the project that this partitioning be done correctly.
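SIP's own partitioning algorithms are not described in this post, and the sketch below should not be read as them. As a hedged illustration of what "partitioning into groupings" can mean mathematically, suppose synergy between two business functions is recorded as pairs and treated as an equivalence relation (reflexive, symmetric, transitive). The synergy groups are then the equivalence classes generated by those pairs, which a union-find structure computes directly. All function names here are hypothetical.

```python
def synergy_groups(functions, synergies):
    """Partition `functions` into the equivalence classes generated by the `synergies` pairs."""
    parent = {f: f for f in functions}

    def find(f):
        # Walk up to the group's representative, halving the path as we go.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for x, y in synergies:
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x  # merge the two groups

    groups = {}
    for f in functions:
        groups.setdefault(find(f), set()).add(f)
    return list(groups.values())

functions = ["take order", "price order", "ship order",
             "invoice", "collect payment", "hire staff"]
synergies = [("take order", "price order"), ("price order", "ship order"),
             ("invoice", "collect payment")]
groups = synergy_groups(functions, synergies)
print(len(groups))  # 3
```

Because the relation is closed under transitivity, "take order" and "ship order" end up together even though no pair links them directly.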

The third step is the definition of the strong vertical boundaries that will separate the snowmen from each other. This again is SIP functionality. And now SIP takes a break and TOGAF takes over.

To review, these first three SIP-led steps are shown here:


The fourth step is gathering the requirements for the business architecture. The business architecture is the head of the snowman. From the SIP analysis, we know which functions are in each head, but we don't know much about their requirements. TOGAF has good processes in place for requirements gathering.

The fifth step is defining the technical architecture. TOGAF has been doing this for years, so SIP has nothing to add here.

The sixth step is defining the data architecture. Again, TOGAF does a great job. We don't need SIP for this.

These three steps, then, are the heart of the TOGAF contributions and are shown here:



The final two steps (seven and eight) involve tying the Snowmen together. First we define the business dependencies between synergy groups. While TOGAF has some information about how to define business dependencies, the SIP-defined synergy groups are critical here because they greatly reduce the number of dependencies that will need to be implemented.
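Here is a hedged sketch of why the grouping reduces the dependency count. Given hypothetical function-level dependencies and a hypothetical mapping from functions to synergy groups, only dependencies that cross group boundaries have to be implemented between snowmen, and several function-level links can collapse into a single group-level dependency. The mapping and names are illustrative only.

```python
def group_dependencies(group_of, function_deps):
    """Collapse function-level dependencies into the distinct dependencies between groups."""
    return {
        (group_of[src], group_of[dst])
        for src, dst in function_deps
        if group_of[src] != group_of[dst]  # dependencies inside a group stay internal
    }

group_of = {
    "take order": "Orders", "price order": "Orders", "ship order": "Orders",
    "invoice": "Billing", "collect payment": "Billing",
}
function_deps = [
    ("take order", "price order"),   # internal to Orders
    ("ship order", "invoice"),       # Orders -> Billing
    ("price order", "invoice"),      # Orders -> Billing again (same group-level link)
    ("invoice", "collect payment"),  # internal to Billing
]
deps = group_dependencies(group_of, function_deps)
print(deps)  # {('Orders', 'Billing')}
```

Four function-level dependencies reduce to one dependency that has to be implemented between snowmen.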

Once the business dependencies have been identified, then we are back to TOGAF to translate these business dependencies into a service oriented architecture. These last two steps are shown here:


As you can see, SIP and TOGAF complement each other. SIP depends on TOGAF to fill in the SIP-defined Snowman structure. TOGAF depends on SIP to scale up to projects larger than one million dollars. Put the two together and we have a highly effective solution to building large, complex IT systems. The best of all worlds.

Footnotes

(1) TOGAF stands for The Open Group Architecture Framework. It is a trademark of The Open Group.

(2) While SIP works well with TOGAF, SIP is agnostic about which methodology is used to drive the non-SIP parts of the analysis. If your organization is using some other architectural methodology rather than TOGAF, you can replace TOGAF in these diagrams and discussions with your favorite methodology.

Tuesday, December 4, 2012

The Snowman Architecture Part Three: The Technical Benefits


This is the third part of a four-part blog series about the Snowman Architecture. The first part was The Snowman Architecture: An Overview. The second part was The Snowman Architecture: The Economic Benefits. In this part, I will be discussing the technical benefits of the Snowman Architecture. But don't read this until you have read the overview! And if you care about things like ROI, check out part two.

But whatever you do, don't miss this part! Over the next twelve pages or so (yes, I know it's a bit long) I'm going to take you through fourteen of the most compelling technical reasons why the Snowman Architecture is a huge improvement over today's approaches to large IT. You are now reading nothing less than my declaration of war on traditional IT methodologies. The first snowball has now been fired!

Review

I gave an overview of the Snowman Architecture in part one, but let's review briefly.

The Snowman Architecture breaks down a large IT system into small vertically partitioned subsystems called Snowmen. These snowmen interact with each other through asynchronous messages. Snowmen are designed to be as autonomous as possible using a design methodology known as Simple Iterative Partitions (SIP) (1).

Snowmen come in three layers. The head of the snowman consists of the business functions that make up a capability. The torso of the snowman consists of the technical systems that support those business functions. The bottom of the snowman consists of the data that is used by those technical systems. Each of these layers is strongly partitioned based on the business functions that make up the head.

Snowmen reach out to each other through their arms, the asynchronous messaging system. Often this is implemented as an SOA.
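The snowman-and-arms picture can be sketched in a few lines of Python with asyncio. This is a minimal illustration under stated assumptions, not a reference implementation: the "arms" are in-process asyncio.Queue objects rather than a real messaging system or SOA, a dict stands in for the snowman's private database, and the names Snowman, Billing, and order-42 are hypothetical.

```python
import asyncio

class Snowman:
    """Minimal stand-in for a snowman: a message handler (torso) over private data (bottom)."""

    def __init__(self, name):
        self.name = name
        self.inbox = asyncio.Queue()  # the only way in: asynchronous messages
        self._data = {}               # private to this snowman; no outside process touches it

    async def run(self):
        while True:
            message = await self.inbox.get()
            if message is None:       # shutdown signal
                break
            kind, key, value, reply_to = message
            if kind == "record":
                self._data[key] = value
                await reply_to.put(("ack", key, self.name))

async def main():
    billing = Snowman("Billing")
    replies = asyncio.Queue()
    worker = asyncio.create_task(billing.run())
    # A caller (say, an Orders snowman) sends a message instead of writing Billing's data.
    await billing.inbox.put(("record", "order-42", 99.0, replies))
    ack = await replies.get()
    await billing.inbox.put(None)
    await worker
    return ack, dict(billing._data)  # peeking at private data only for this demo

ack, data = asyncio.run(main())
print(ack)   # ('ack', 'order-42', 'Billing')
print(data)  # {'order-42': 99.0}
```

The design choice being illustrated is that the caller never holds a reference to Billing's data, only to its inbox.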



Contrast to Traditional Architectures

A traditional IT architecture is also implemented in three layers. These layers are the same as those of the Snowman Architecture: business architecture, technical architecture, and data architecture. What differentiates the Snowman Architecture from a traditional IT architecture is the strong vertical partitioning, as shown in Figure 1.

Figure 1. Traditional IT Architecture vs. Snowman Architecture

It turns out that this strong vertical partitioning has a major impact on the effectiveness of the architecture. Let's take a look at fourteen key non-functional attributes of a large IT system. As you will see, every single one of them is improved by the strong vertical partitioning that characterizes the Snowman Architecture.

In this analysis, I assume that the system we are evaluating is a large system (greater than ten million dollars). This is the point at which traditional IT architectural methodologies are no longer able to keep up with the exponential increases in system complexity (2). I also assume that SIP was used to assign business functions to the head of the snowman, an essential step to minimizing the overall complexity of the Snowman Architecture.
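To see why size matters so much, here is a hedged numeric sketch. It assumes Robert Glass's heuristic that every 25% increase in problem complexity produces a 100% increase in solution complexity, and it approximates problem complexity by the number of business functions F. Under that assumption complexity grows as F**K with 1.25**K = 2, so K is about 3.1. The sketch also ignores the cost of the connections between partitions, so it overstates the benefit of splitting; treat the numbers as an illustration of the shape of the curve, not as a measurement.

```python
import math

# Assumption: a 25% increase in size doubles complexity, so complexity(F) = F ** K
# where 1.25 ** K == 2.
K = math.log(2) / math.log(1.25)

def complexity(functions):
    """Relative complexity of one unpartitioned system with `functions` business functions."""
    return functions ** K

def partitioned_complexity(functions, parts):
    """Total complexity if the functions are split evenly into independent parts."""
    return parts * complexity(functions / parts)

monolith = complexity(100)
split = partitioned_complexity(100, 10)
print(round(K, 2))              # 3.11
print(round(monolith / split))  # 128
```

Under these assumptions, splitting 100 functions into ten independent groups of ten cuts total complexity by a factor of 10**(K - 1), roughly 128.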

Okay, given these two assumptions, let's see why the Snowman Architecture outperforms all traditional approaches to large IT system design. I'll start by listing the fourteen attributes and then go through them one by one. The attributes I will look at are these:
  • Business Alignment 
  • Regulatory Compliance 
  • Auditing 
  • Security
  • Agile Friendliness 
  • Maintainability 
  • Testability 
  • Reliability 
  • Recovery 
  • Throughput 
  • Scalability 
  • Flexibility 
  • Cloud Effectiveness 
  • Vendor Lock-in
Let's do a quick checkpoint. Are any of these attributes important to your IT systems? If not, you might as well stop reading now. If one or more of these are of interest, then keep reading.

Okay, now let's go through them one by one. Feel free to skip those you don't care about.

Business Alignment

A system is well aligned when it meets the needs of the business. You can think of business alignment as the Wow factor. When the system is delivered, does the business say, "Wow!" Or does it shake its collective head and reach for the nearest bottle of Tequila?

In any system design life cycle, there is a phase in which the business requirements are gathered. In the traditional approach (the left-hand side of Figure 1) the requirements are gathered more or less immediately after the project has been approved and before the technical architecture is designed.

The size of the requirements document(s) is always proportional to the size of the project. Massive projects require lots of requirements documentation, often tens of thousands of pages. The larger the stack of requirements, the lower the chances are that those requirements accurately reflect what the business actually needs.



In SIP (the guiding design methodology for the Snowman Architecture) an additional project phase is introduced: the Partitioning Phase. This is when the basic shape of the Snowman is identified.  

What is important from an alignment perspective is that this partitioning of the larger system into smaller autonomous snowmen takes place before the requirements have been gathered. Since a typical snowman rarely exceeds one million dollars in cost, its requirements are modest. And since the requirements are relatively modest, it is more likely that they accurately reflect the actual business need.

Since the Snowman Architecture has more accurate requirements than the traditional IT architecture, it is more likely to meet the business need.

Regulatory Compliance 

An IT system is considered compliant when it can be shown to operate within the constraints of applicable laws and regulations. Some industries, such as video gaming, have few if any such constraints. Others, such as financial organizations, face a complex web of laws and regulations.

Our ability to show that a given IT system operates within its regulatory constraints is dependent on the complexity of the system. The more complex the system, the more difficult it is to prove compliance.

Large traditional IT systems (the left side of Figure 1) have a highly complex web of relationships between business functions, technical processes, and data. Thus it is very difficult to prove compliance.

The Snowman Architecture (the right side of Figure 1) has a number of simple relationships between business functions, technical processes, and data. The architectural simplicity is guaranteed by the SIP-directed partitioning of the business functions into snowmen heads and the strong vertical partitioning that occurs once the technical and data architectures are created.

Because each snowman is relatively simple, it is correspondingly easy to prove that it operates in compliance with any relevant regulatory constraints.

Auditing 

An IT system is considered auditable when we can accurately trace data changes back to technical processes, from there back to business functions, and from there back to human beings. The more paths there are to the data, the more difficult it is to trace these paths. 

The traditional IT architecture has a large number of complex paths to the data. It is nearly impossible for an examiner to determine which of these paths resulted in a particular item of data being updated.

The Snowman Architecture has a small number of simple paths to the data. An examiner can easily get the whole picture and then determine which of these few candidate paths resulted in a particular item of data being updated. 
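The difference in audit effort can be sketched as a path count. In this hedged example the mappings from people to business functions, functions to technical processes, and processes to the data they write are all hypothetical, and a "path" is one user-function-process chain that could have updated a given data item.

```python
def write_paths(users, processes, writers, item):
    """List (user, business function, process) chains that could have written `item`."""
    return [
        (user, function, process)
        for function, procs in processes.items()
        for user in users.get(function, [])
        for process in procs
        if item in writers.get(process, set())
    ]

users = {"billing": ["ann"], "sales": ["bob"], "support": ["cy"]}

# Traditional: several functions reach shared processes that all write the balance.
traditional = write_paths(
    users,
    {"billing": ["p1", "p2"], "sales": ["p2", "p3"], "support": ["p3", "p4"]},
    {"p1": {"balance"}, "p2": {"balance"}, "p3": {"balance"}, "p4": {"balance"}},
    "balance",
)
# Snowman: only the billing torso process writes billing data.
snowman = write_paths(
    users,
    {"billing": ["p1"], "sales": ["p3"], "support": ["p4"]},
    {"p1": {"balance"}, "p3": {"orders"}, "p4": {"tickets"}},
    "balance",
)
print(len(traditional))  # 6
print(snowman)           # [('ann', 'billing', 'p1')]
```

An examiner looking at an unexpected balance has six candidate chains to investigate in the first case and one in the second.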


Security

When we talk about the security of a system, we are generally talking about our ability to protect data. Data, of course, resides in a database. So security comes down to our ability to configure the database so that unauthorized updates are not possible. In a traditional architecture, there are so many processes that need to update so many parts of the database under so many different circumstances that it is difficult to figure out a secure configuration. And even if one does manage a secure configuration, the next process that is added will change everything.

In the Snowman Architecture, database configuration is much easier. Because the partitioning of the head of the snowman (the business processes) dictates the partitioning of the technical layer and then the partitioning of the data layer, the only processes that will ever need to access the data in the snowman are the processes in the torso of the snowman. This makes configuration easy: allow the processes in the snowman torso to access the data in the snowman bottom, and don't allow any process outside the snowman to access any data in the snowman. Voilà. Done.
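The configuration rule just described amounts to a single ownership check. Here is a minimal sketch, assuming each torso process and each data store is tagged with the snowman it belongs to; the registries and names are hypothetical, and in a real deployment the same rule would be expressed as database grants rather than application code.

```python
# Hypothetical registries: which snowman each torso process and each data store belongs to.
PROCESS_OWNER = {"price_order": "Orders", "ship_order": "Orders", "post_invoice": "Billing"}
STORE_OWNER = {"orders_db": "Orders", "billing_db": "Billing"}

def may_access(process, store):
    """Allow access only when the process and the data live in the same snowman."""
    owner = PROCESS_OWNER.get(process)
    return owner is not None and owner == STORE_OWNER.get(store)

print(may_access("price_order", "orders_db"))   # True
print(may_access("post_invoice", "orders_db"))  # False
```

Unknown processes are denied by default, so adding a new process elsewhere in the system does not change what can reach a snowman's data.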

Agile Friendliness 

Many organizations are attracted to Agile Development Methodologies. I agree that Agile development has a lot of promise. However, I also think it doesn't scale.

A recent paper by Vikash Lalsing et al. (3) indicates that projects of 0.5 person-years or less are excellent candidates for Agile development. They predict such projects will be less than 10% over budget. By the time the project size reaches 3.6 person-years, the budget overrun increases to 18%. And by the time the project size reaches 8.2 person-years, the budget overrun increases to 66%.

The Snowman Architecture is ideally suited to projects of greater than $10M. This equates to an effort of close to 100 person-years. This is more than ten times the project size that yielded the 66% budget overrun.

In the Snowman Architecture, the larger project is broken down into relatively simple, autonomous chunks of project work. Each of these chunks becomes an individual snowman. The use of the SIP methodology ensures that not only is each snowman as simple as possible, but the relationships between snowmen are as simple as possible. 

The project size of any one snowman is unlikely to exceed $1M, and in many cases it will be much less. A $1M project is around 7 person-years. This is still large by Agile standards, but far closer to a workable Agile number than a project that does not have the benefit of the Snowman Architecture.

Maintainability 

A system is maintainable when it is easy to locate the source of bugs. The more complex the system, the more difficult it is to find the source of bugs.

The complexity of the traditional architecture (the left side of Figure 1) is much higher than the complexity of the Snowman Architecture. The maintainability of a traditional architecture is therefore much lower. The use of the SIP methodology guarantees not only that the overall complexity of the Snowman Architecture is low, but that it is as low as it can possibly be (4). Simplicity is important when it comes to snowmen. A simple snowman is simple to maintain.


Testability 

System bugs can manifest themselves at any point in the system life cycle. The later in the life cycle a bug manifests itself, the more problems it causes. Our goal in system testing is to find bugs in the system as early as possible and definitely before the system is delivered to customers. The most common strategy for system testing is to write code or scripts that exercise the system and ensure it is working correctly.

To be sure a system is working correctly you must write test code that exercises every possible logical path through the system. The more logical paths there are through the system, the more difficult it will be to create the test code and the more likely it will be that you will have missed an important path. This translates to a greater likelihood that you will ship buggy code.

There are two reasons the Snowman Architecture is more testable than the traditional architecture.  

The first reason the Snowman Architecture is more testable has to do with the number of paths. 

A traditional IT architecture has many possible paths. By the time the system reaches a few million dollars in size, it effectively has an infinite number of paths and there is no way they can all be tested.

In the Snowman Architecture, each snowman can be tested independently. Since each snowman is relatively small and simple, there are relatively few paths through the snowman. Once you have tested all of the snowmen and the connections between them, you have effectively tested the system as a whole. Thus your chances of shipping buggy code are greatly reduced if you are using the Snowman Architecture.
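A back-of-the-envelope sketch of the path argument, under a simplifying assumption that is not taken from the text: model the system as a set of independent two-way decision points. Testing everything as one interacting unit means covering every combination, 2**n paths. If the same decision points are split among snowmen that are tested separately, the burden becomes the sum of each snowman's paths plus one test per message connection (here assumed to form a simple chain).

```python
def monolith_paths(decisions, branches=2):
    """Paths through one system in which all decision points interact."""
    return branches ** decisions

def snowman_paths(decisions_per_snowman, branches=2):
    """Paths when each snowman is tested alone, plus one test per connection between them."""
    internal = sum(branches ** d for d in decisions_per_snowman)
    connections = len(decisions_per_snowman) - 1  # assume the snowmen form a simple chain
    return internal + connections

print(monolith_paths(40))                # 1099511627776
print(snowman_paths([10, 10, 10, 10]))   # 4099
```

The same forty decision points go from about a trillion combinations to a few thousand tests, which is the intuition behind "effectively infinite" versus "relatively few."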

The second reason the Snowman Architecture is more testable has to do with how pieces of the system are connected together. In a traditional IT architecture, segments of code are often connected by shared data in a database. In the Snowman Architecture, snowmen are almost always connected through asynchronous messages. 

These two approaches to connections are very different from a testability perspective. Shared data connections are almost impossible to test. There are just too many ways the data can be accessed. Asynchronous messages, in contrast, are very easy to test. One need only write a messaging harness, a common practice among service-oriented architectures, and the connection points become easily tested. 
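A message harness can be very small. The sketch below assumes, hypothetically, that a snowman exposes a handler that takes one incoming message (a dict) and returns the list of messages it sends out. The harness replays scripted inputs and reports any whose output differs from what was expected, without touching a database.

```python
def handle_order_placed(message):
    """Hypothetical Billing handler: turn an OrderPlaced message into an InvoiceIssued message."""
    if message.get("type") != "OrderPlaced":
        return []  # ignore messages this snowman does not understand
    return [{"type": "InvoiceIssued", "order": message["order"], "amount": message["amount"]}]

def run_harness(handler, cases):
    """Replay (input, expected_output) pairs and return the inputs whose output differed."""
    return [incoming for incoming, expected in cases if handler(incoming) != expected]

cases = [
    ({"type": "OrderPlaced", "order": "42", "amount": 99.0},
     [{"type": "InvoiceIssued", "order": "42", "amount": 99.0}]),
    ({"type": "Ping"}, []),
]
failures = run_harness(handle_order_placed, cases)
print(failures)  # []
```

Because the connection point is just "message in, messages out," every case is an ordinary data comparison.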

So we see two reasons the Snowman Architecture is so much easier to test than the traditional IT architecture. First, it has fewer code paths. Second, it uses asynchronous messages for its connection points. It is hard to test a traditional IT architecture. It is easy to test a Snowman Architecture.


Reliability 

Reliability is a measure of the typical amount of time a system will remain running before it unexpectedly drops dead. Reliability is often described as mean time between failures. 

Reliability is related to testability, the last attribute I discussed. The more testable a system is, the less likely it is to have post-delivery bugs. It is these post-delivery bugs that cause systems to fail. The fewer bugs, the less likely the system is to fail. Since the Snowman Architecture is easier to test than the traditional IT architecture, it is likely to have fewer bugs and thus will be less likely to fail.

But there is another factor that favors reliability of the Snowman Architecture. This has to do with how easy it is to quarantine a bug. Consider Figure 2, which is a blowup of the left-hand side of Figure 1 with some labels added for reference.

Figure 2. Blow-up of Traditional IT Architecture.

Assume that database D crashes. We have three processes dependent on D, namely, n, o, and p. So these three processes crash. Processes g and i are both dependent on n, so they both go down. Process i is also dependent on o, but since it has already crashed, we need not worry about it further. Processes i, d, and f are all dependent on p. Process i is already down, but now d and f join the fun. So now we have D, n, o, p, i, d, and f all down. This can corrupt any databases they are involved with which includes C and E. This brings down their dependent processes b and h. Which in turn... you get the picture. There is no quarantine, so when one part of a system catches a bug, that bug can rapidly propagate to the entire system.

Contrast this with Figure 3, which shows a close-up of the Snowman Architecture.

Figure 3. Closeup of Snowman Architecture

Assume in Figure 3 that database B crashes. It can bring down processes d, e, f, and g. But that's it. The boundaries of the snowman have effectively quarantined the bug from spreading further. The only connections between d, e, f, g and other processes are through asynchronous messages, and these channels can easily be protected. So while the bug in B may crash the entire snowman, there is no pathway for the bug to spread further.
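Both failure walk-throughs are reachability searches over dependency edges. In this hedged sketch the edges among D, n, o, p, g, i, d, and f follow the Figure 2 description; the edges from d and f to databases C and E, and the message from g to a process x in another snowman, are hypothetical because the text does not spell them out. The key assumption is that an asynchronous message channel does not carry a crash.

```python
from collections import deque

def blast_radius(start, edges, async_edges=frozenset()):
    """Return every component that goes down when `start` crashes."""
    down, queue = {start}, deque([start])
    while queue:
        failed = queue.popleft()
        for dependent in edges.get(failed, []):
            if (failed, dependent) in async_edges:
                continue  # a message channel does not propagate the crash
            if dependent not in down:
                down.add(dependent)
                queue.append(dependent)
    return down

# Figure 2 (traditional): "X": [Y, ...] means each Y fails when X fails.
traditional = {
    "D": ["n", "o", "p"], "n": ["g", "i"], "o": ["i"], "p": ["i", "d", "f"],
    "d": ["C"], "f": ["E"],  # hypothetical: d and f corrupt databases C and E
    "C": ["b"], "E": ["h"],
}
# Figure 3 (snowman): B feeds d, e, f, g; g messages x in another snowman (hypothetical).
snowman = {"B": ["d", "e", "f", "g"], "g": ["x"]}

print(sorted(blast_radius("D", traditional)))
# ['C', 'D', 'E', 'b', 'd', 'f', 'g', 'h', 'i', 'n', 'o', 'p']
print(sorted(blast_radius("B", snowman, async_edges={("g", "x")})))
# ['B', 'd', 'e', 'f', 'g']
```

Remove the async_edges argument and x goes down too, which is exactly the quarantine the snowman boundary provides.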

The bottom line is that bugs occur less frequently with the Snowman Architecture (because it is easier to test) and when they do occur they tend to have only a local impact. In the traditional architecture, bugs occur more frequently (because it is harder to test) and when they do occur, they tend to have a global impact.

Recovery 

Recovery is related to reliability (the last section). Whereas reliability measures how often the system fails, recovery measures how long the failure lasts. In an ideal system, we have high reliability and fast recovery, meaning that the system rarely crashes and when it does, the crash doesn't last long.

It is difficult to develop an effective recovery strategy for a traditional IT architecture. There are too many databases, too many processes, and too many ways everything can be related to each other. When this web of relationships goes down, what do you do? You try to protect the entire system, but this is difficult because the system is large, complex, and a moving target.

In contrast, it is easy to develop an effective recovery strategy for a Snowman Architecture. All you need to do is shadow any requests to the snowman to a backup snowman. Then if a failure occurs, reroute all new requests to the backup. This rerouting can occur as quickly as one can notice that the primary snowman has failed. This is shown in Figure 4.

Figure 4. Recovery Mechanism for Snowman
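The shadow-and-reroute scheme can be sketched as a small router. This is a minimal, single-threaded illustration under stated assumptions: a snowman is modeled as a callable that processes a request and raises an exception once it has failed; every request is also sent to the backup so its state stays current; and after the first primary failure, all requests are answered by the backup. The class names are hypothetical, and a real implementation would shadow through the messaging layer.

```python
class ShadowRouter:
    """Send each request to a primary and a backup snowman; fail over when the primary fails."""

    def __init__(self, primary, backup):
        self.primary, self.backup = primary, backup
        self.failed_over = False

    def send(self, request):
        backup_result = self.backup(request)  # the shadow copy keeps the backup current
        if self.failed_over:
            return backup_result
        try:
            return self.primary(request)
        except Exception:                      # any primary failure triggers failover
            self.failed_over = True            # reroute all later requests
            return backup_result

class Counter:
    """Hypothetical snowman that counts requests and can be told to crash after a limit."""

    def __init__(self, crash_after=None):
        self.count, self.crash_after = 0, crash_after

    def __call__(self, request):
        if self.crash_after is not None and self.count >= self.crash_after:
            raise RuntimeError("snowman down")
        self.count += 1
        return self.count

router = ShadowRouter(primary=Counter(crash_after=2), backup=Counter())
results = [router.send("req") for _ in range(4)]
print(results)  # [1, 2, 3, 4]
```

Because the backup has seen every request, the caller gets an unbroken sequence of answers across the failure.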
Taking this and the last two sections together, I can make the following claims about the Snowman Architecture relative to the traditional IT architecture:
  • The Snowman Architecture will have fewer bugs.
  • The bugs will have less impact.
  • Recovery from that impact will be faster.

Throughput 

Throughput refers to the amount of work a system can process in a unit of time. Often we measure throughput in transactions per minute. Throughput should not be confused with response time, which measures how long a single user waits for work to be completed.

Throughput is important because it directly influences cost. If a system has low throughput, then a lot of resources are needed to process a given workload. If a system has high throughput, the number of resources needed to process the same workload is much less. 

There are two architectural factors that strongly influence throughput: the number of synchronous connections and the amount of shared data. Synchronous connections slow down throughput by blocking processes until connected processes have completed their work. Shared data slows down throughput by forcing processes to wait on locks in the shared databases.

Both of these factors come together in the traditional IT architecture. These systems heavily favor synchronous connections and make extensive use of shared data. Between the two, throughput is substantially degraded.

In the Snowman Architecture, synchronous connections are only used within a snowman. All (or almost all) connections between snowmen occur through asynchronous messaging, which means no processes are blocked waiting on other snowmen.

In the Snowman Architecture, shared data is the equivalent of a multi-headed snowman. This is anathema to the Snowman Architecture. The only processes that are allowed to share data are those that live within a single snowman. Since only a few processes ever share data, database blocking is kept to a minimum.

Between the judicious use of asynchronous messaging and non-shared data, the Snowman Architecture performs at a much higher throughput than does the traditional IT architecture. This means lower costs per unit of work, which means lower IT costs.


Scalability 

Scalability refers to our ability to support larger and larger workloads. Say we have designed a system to support 100 concurrent customers and then our system becomes so popular that we must support 500 concurrent customers. Our ability to adapt to the higher customer load is dependent on our scalability.

In the past, scalability was seen as a hardware power problem. It was assumed that to allow a system to process larger and larger workloads, it had to run on more and more powerful hardware. When the current system could no longer support the workload, the hardware would be upgraded. This could involve faster processors, more memory, or larger disk drives. In the worst case, this involved replacing smaller, cheaper machines with larger, more expensive machines. This is the model that served computer companies such as IBM and Sun so well.

Today, scalability is seen as a hardware numbers problem rather than a hardware power problem. We now assume that to process a larger workload we don't replace cheap machines with expensive machines; instead, we add more cheap machines. This is the model that powers the most scalable systems in the world today, such as Google. Google runs its entire system on inexpensive, throw-away hardware and has done so for more than a decade (5).

Given this modern view of scalability, there are three factors that determine how scalable a system is.

The first scalability factor is the compactness of the system. Smaller, more compact systems are easier to scale. Larger, more dispersed systems are harder to scale.

The second scalability factor is the usage of asynchronous messages. The judicious use of asynchronous messages goes a long way toward making a system scalable. Think of an asynchronous message system as a mailbox. As mail comes in faster, one adds more receivers. As long as any of the receivers can process the mail, scalability is limited only by the number of receivers you can support.
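The mailbox analogy can be sketched with Python's standard `queue.Queue` and a pool of receiver threads. This is an in-process illustration only, assuming each message is an integer and that "processing" means doubling it; a real boundary between snowmen would use a message broker. The function name `run_mailbox` is hypothetical. Adding receivers is just a matter of starting more threads that read from the same queue.

```python
import queue
import threading


def run_mailbox(messages, num_receivers):
    """Deliver messages through one shared mailbox to num_receivers workers.

    Returns the processed results (in completion order) and a list with
    how many messages each receiver handled.
    """
    mailbox = queue.Queue()   # unbounded, so the sender never blocks
    results = queue.Queue()
    counts = [0] * num_receivers

    def receiver(idx):
        while True:
            msg = mailbox.get()
            if msg is None:           # sentinel: no more mail for this receiver
                mailbox.task_done()
                return
            results.put(msg * 2)      # stand-in for real processing
            counts[idx] += 1          # each receiver updates only its own slot
            mailbox.task_done()

    workers = [threading.Thread(target=receiver, args=(i,)) for i in range(num_receivers)]
    for w in workers:
        w.start()
    for m in messages:
        mailbox.put(m)                # drop the message and move on
    for _ in workers:
        mailbox.put(None)             # one sentinel per receiver
    for w in workers:
        w.join()
    return [results.get() for _ in range(results.qsize())], counts


processed, per_receiver = run_mailbox(range(100), num_receivers=4)
print(sorted(processed) == [2 * m for m in range(100)])  # True
print(sum(per_receiver))                                 # 100
```

Raising `num_receivers` changes nothing on the sending side, which is the property that makes scaling by numbers possible.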

The third scalability factor is the size of the database on which the system depends. Because databases have such specialized hardware requirements, they are the most difficult part of a system to scale up.

To compare the scalability of the Snowman Architecture versus a traditional IT architecture, we must start by defining the unit of scalability. In the Snowman Architecture, the unit of scalability is an individual snowman. In a traditional IT architecture, it is the entire system.

In comparing the two architectures, we see the Snowman Architecture outperforming the traditional large IT architecture in all three scalability factors. First, it is much more compact. Second, it uses asynchronous messaging in all the right places, at the boundaries to the snowmen. Third, it minimizes the size of the data pool that must be scaled by enforcing the concept of strict vertical partitioning.

As a result, the Snowman Architecture is much more amenable to scaling using the modern efficient approach to scalability, scaling by numbers. The traditional IT architecture is largely consigned to the much more expensive and inefficient approach to scalability, scaling by power. Today, scaling by power seems as quaint as vinyl records.

Flexibility 

Flexibility refers to our ability to modify the system as our business needs evolve. Say we have built our payment system to take credit cards and we now want to take debit cards. How easy is it to update our system to take debit cards as well as credit cards?

Our ability to modify our system depends on how complex the system is. The more complex the system, the more difficult the modifications will be to implement. Traditional large IT systems are very complex. They are therefore very difficult to modify. Frequently, changes in one part of the system cause unexpected problems in other parts of the system.

The Snowman Architecture is composed of a series of autonomous, self-contained, relatively simple snowmen. Because of the synergy algorithms used by SIP to partition business functionality across snowmen, it is highly likely that any modifications necessary for a specific business change will all be located within a single snowman. Since any given snowman is simple (certainly relative to a traditional IT system) we can expect that the modifications will be much more straightforward than they would be with a traditional architecture.

Cloud Effectiveness

The cloud is an attractive platform because of its "pay for what you eat when you eat it" pricing model. But to leverage this platform, it is important to structure your systems so that you eat the least amount possible to accomplish your work.

Traditional large IT systems are poorly organized to leverage this model. Because of their sprawling nature, all or most of the system must be running on the cloud to accomplish even the most trivial of tasks. This means that you are paying for all or most of the system even when you are using only a small part of it. Even worse, when you need to add new instances to handle a larger workload, you are adding sprawling new instances that quickly drive the cost out of sight.

The Snowman Architecture is a collection of smaller snowmen, each dedicated to a group of closely related ("synergistic") tasks. In most scenarios, a given workload will require only a single snowman. This means that you are paying only for the resources that that snowman requires. And when you add new instances, you add them in small, inexpensive, snowman sized amounts.

Figure 5 contrasts the traditional IT architecture and the Snowman Architecture running on the cloud.
Figure 5. The Cloud: Traditional IT Architecture versus The Snowman Architecture.
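To make the "pay for what you eat" argument concrete, here is a toy cost model. The snowman names and hourly prices are hypothetical, and the model assumes that a traditional system must run every component for any task, while a snowman-structured system runs only the snowmen a workload touches.

```python
# Hypothetical hourly cost of running each part of the system on the cloud.
HOURLY_COST = {"checking": 4.0, "savings": 3.0, "loans": 6.0, "reporting": 5.0}


def traditional_cost(hours: float) -> float:
    """Sprawling system: everything must be running to do any work."""
    return hours * sum(HOURLY_COST.values())


def snowman_cost(hours: float, snowmen_used: set) -> float:
    """Snowman system: pay only for the snowmen the workload needs."""
    return hours * sum(HOURLY_COST[s] for s in snowmen_used)


# A checking-only workload running for 10 hours.
print(traditional_cost(10))            # 180.0
print(snowman_cost(10, {"checking"}))  # 40.0
```

The same reasoning applies when adding instances: a new instance of one snowman costs a fraction of a new instance of the whole system.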

Vendor Lock-in

A system exhibits vendor lock-in when it is dependent on a single vendor for some aspect of its life support. Usually this vendor is the one providing the software platform.

Vendor lock-in is either good or bad, depending on your perspective. If you are the client, vendor lock-in is bad. It puts you in a weak bargaining position with your vendor. If you are the software platform provider, vendor lock-in is good. It puts you in a strong bargaining position with your customer.

The standard customer approach to avoiding vendor lock-in is through the use of standards. If the customer builds a system on a standard API, then the customer can easily port the system to another software platform that supports that same API. Or at least, that is the logic.

How do vendors achieve lock-in in the face of a plethora of standards covering everything from data storage to virtual systems? Vendors achieve lock-in through the tried and true process called embrace and extend. Embrace and extend is a two-part process. First, the vendor embraces a particular standard. Then the vendor extends the standard in vendor-specific ways. These extensions are the bait that draws in the customer. The goal is to make the extensions so powerful that they are irresistible. Once the customer has taken the bait, they are trapped. Lock-in is complete.

I have seen many customers try to resist the bait with corporate edicts forbidding the use of any vendor extension. In the end, resistance is futile. You will be assimilated.

The larger and the more complex the system, the more difficult it is to locate and remove the vendor extensions. This means it is more difficult to port the system to another vendor. If you can't take your code to another vendor, you are locked-in. And your future is now in the hands of a company whose main goal is wringing as much money as possible out of you in the next contract negotiation.

As I said, resisting vendor extensions is pointless. The best strategy for avoiding vendor lock-in is to make it as easy as possible to locate and rewrite those sections of a system that have used the vendor extensions. Your ability to locate and rewrite those sections is dependent on how small and simple the system is. We are dealing with the same issues I discussed in the section on Flexibility. Small, simple systems are easy to modify. Large, complex systems are not.

Thus small and simple is your best defense against vendor lock-in. And if you want small and simple, don't look to standards. Look to snowmen.

Summary

In part one of this blog, I introduced the Snowman Architecture. In part two, I discussed the non-technical advantages of this architecture. In this part, I have discussed the many technical advantages of this architectural approach.

If you are building a large IT system (say, over $10M) the Snowman Architecture offers a huge number of compelling advantages over traditional approaches. These advantages range from better security to improved reliability to lower cost to greater flexibility. In fact, there is not a single non-functional requirement that will not benefit from the Snowman Architecture.

If, at this point, you are preparing to build a large IT system and you aren't seriously considering the Snowman Architecture, then I don't know what else I can say. One of us is crazy.

Stay tuned for part four of this blog, in which I will discuss the arguments against the Snowman Architecture and why they are all flawed.

- Roger Sessions
Houston, Texas

Did you find any errors (even spelling) in this blog? Let me know. I'd love to correct them.

Would you like to subscribe to notifications about my blogs, white papers, and webshorts? Sign up here.

References

(1) See, for example, the Web Short SIP Methodology for Project Optimization by Roger Sessions. Available here.

(2) See, for example, the Web Short The Relationship Between IT Project Size and Failure Rates by Roger Sessions. Available here.

(3) People Factors in Agile Software Development and Project Management by Vikash Lalsing, Somveer Kishnah and Sameerchand Pudaruth, in International Journal of Software Engineering & Applications (IJSEA), Vol. 3, No. 1, January 2012. Available here.

(4) The Mathematics of IT Optimization by Roger Sessions. (White Paper). Available here.

(5) Web Search for a Planet: The Google Cluster Architecture by Luiz André Barroso, Jeffrey Dean, and Urs Hölzle, in IEEE Micro, March/April 2003. Available here.

Acknowledgements

The snowman photos are all from Flickr under Creative Commons license. The photographers are, in order of appearance: 


Legal Notices

This blog is copyright (c) 2012 by Roger Sessions. It may be copied, reposted, and printed as long as it is not modified in any way. Other than that, unauthorized usage prohibited. Ask, though. I'll probably agree.

SIP is a trademark (t) of ObjectWatch, Inc. ObjectWatch is a registered trademark of ObjectWatch, Inc. All other trademarks are owned by their respective companies.

Thursday, October 18, 2012

Snowman Architecture Part Two: Economic Benefits


This is the second part of a four-part blog about The Snowman Architecture. The first part was The Snowman Architecture: An Overview. In this blog, I will be discussing the economic benefits of the architecture. But don't read this until you have read the overview!

In the next installment (part three) I will discuss The Technical Benefits of the Snowman Architecture. The fourth part, by the way, will be The Criticisms, in which I will describe the many criticisms of the Snowman Architecture and why they are all wrong.

Originally I had planned to cover all of the benefits (economic and technical) in one blog. It turns out there are just too many benefits for one blog so I have had to separate them into those that are more economic in nature (this blog) and those that are more technical in nature (the next blog.)

Review

The Snowman Architecture breaks down a large IT system into small vertically partitioned subsystems called snowmen. These snowmen interact with each other through asynchronous messages. Snowmen are designed to be as autonomous as possible from each other using a design methodology known as Simple Iterative Partitions (SIP) (1). Figure 1 shows an IT system designed using the Snowman Architecture.


Figure 1. Snowman Architecture

The Snowman Architecture is in contrast to a traditional architecture that uses a methodology such as TOGAF (2) to create a horizontally partitioned system. Figure 2 shows an IT system designed using traditional methodologies.


Figure 2. Traditional Horizontally Partitioned Architecture

Points of Contrast

There are several contrasts that immediately jump out in comparing the Snowman Architecture to the traditional approach. 

The first contrast is in the orientation of the partitioning. The Snowman Architecture uses a strong vertical orientation to the partitioning. The traditional approach uses a weak horizontal orientation to the partitioning.

The second contrast is in the number of subsets in the partition. The Snowman Architecture supports an unlimited number of vertically oriented subsets (snowmen). The traditional approach has exactly three horizontally oriented subsets (business architecture, technical/SOA architecture, and data architecture).

The third contrast is in the strength of the partitioning. The strength of the partitioning refers to the porosity of the boundaries separating subsets. The more "stuff" that passes between subsets, the greater the porosity. Porosity weakens the partitions, so the greater the porosity, the weaker the partition. The Snowman Architecture partitioning is strong, indicated by the minimal number of connections between subsets. The traditional horizontal architecture partitioning is weak, indicated by the large number of almost random connections between subsets. 
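Porosity can be made concrete by counting the connections that cross subset boundaries. A minimal sketch, assuming a hypothetical list of function-to-function dependencies and a mapping from each function to the subset that owns it; the function names and both partitions are illustrative only.

```python
def porosity(dependencies, owner):
    """Count dependencies whose two ends live in different subsets.

    dependencies: iterable of (function_a, function_b) pairs.
    owner: dict mapping each function to the subset it belongs to.
    """
    return sum(1 for a, b in dependencies if owner[a] != owner[b])


deps = [("deposit", "withdraw"), ("withdraw", "balance"),
        ("apply", "approve"), ("approve", "balance")]

# Related functions share a subset (snowman).
vertical = {"deposit": "checking", "withdraw": "checking", "balance": "checking",
            "apply": "loans", "approve": "loans"}

# The same functions split without regard to how they are related.
scattered = {"deposit": "A", "withdraw": "B", "balance": "A",
             "apply": "B", "approve": "A"}

print(porosity(deps, vertical))   # 1
print(porosity(deps, scattered))  # 3
```

The lower the count, the stronger the partition: fewer things have to pass between subsets.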

Economic Benefits of Snowman Architecture

Okay, now that you remember the basic overview, let's look at the economic advantages of The Snowman Architecture.

Benefit 1: Linear Versus Exponential Complexity Curve

As an IT system gets larger, it gets more complex. This is because complexity is driven both by the amount of functionality in a system and by the number of connections in a system (3). Both the Snowman Architecture and the traditional architecture get more complex as the system increases in size, but they increase in complexity in quite different ways. The complexity of the traditional system increases exponentially. The complexity of the Snowman Architecture increases linearly.

For small IT systems, the difference between an exponential increase and a linear increase of complexity is not important. But as the size of the IT system exceeds $5M in cost, the difference becomes very important. 
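The shape of the two curves can be illustrated with a toy model. It assumes, purely for illustration, that the complexity of any single undivided system grows exponentially with its size, and that a Snowman system splits the same functionality into snowmen of bounded size, each adding a fixed cost for its messaging connection. `GROWTH`, `SNOWMAN_SIZE`, and `CONNECTION_COST` are hypothetical parameters, not figures from the white paper.

```python
import math

GROWTH = 0.5           # hypothetical exponential growth rate per unit of size
SNOWMAN_SIZE = 2.0     # hypothetical maximum size of a single snowman
CONNECTION_COST = 1.0  # hypothetical complexity added per connection between snowmen


def traditional_complexity(size: float) -> float:
    """One undivided system: complexity grows exponentially with size."""
    return math.exp(GROWTH * size)


def snowman_complexity(size: float) -> float:
    """Same functionality split into bounded snowmen: roughly linear in size."""
    n = max(1, math.ceil(size / SNOWMAN_SIZE))      # number of snowmen needed
    per_snowman = traditional_complexity(size / n)  # each snowman stays small
    return n * per_snowman + (n - 1) * CONNECTION_COST


for size in (1, 2, 10, 20, 40):
    print(size, round(traditional_complexity(size), 1), round(snowman_complexity(size), 1))
```

At small sizes the two functions return the same value; as size grows, the exponential term dominates the traditional curve while the snowman curve only adds one bounded snowman at a time.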

Figure 3 shows the relationship between complexity and project size for a traditional architecture versus a Snowman Architecture.

Figure 3. Complexity of Traditional Architecture versus Snowman Architecture

As shown in Figure 3, the complexity of a traditional IT architecture increases exponentially. It starts low and then enters the Risk Zone (the zone in which project failure is likely) when the size hits someplace around $8M. From there it rapidly ascends into the Failure Zone (the zone in which project failure is certain) (4).

In contrast, the complexity of the Snowman Architecture starts low (as does the traditional architecture) and then increases with a shallow linear slope. At small project sizes, there is little difference between a shallow linear curve and an exponential curve. In Figure 3, you can see that at project sizes under $1M, there is effectively no difference between the Snowman Architecture and the traditional approach.

However, this changes quickly as the project size increases. Traditional architectures are already in the Risk Zone by the time they hit $8M, and by the time they hit $10M they are in the Failure Zone. In contrast, the shallow linear complexity slope allows a Snowman Architecture project to remain comfortably in the Success Zone until well past $100M in project size. In fact, it isn't even clear that there is a size limitation with the Snowman Architecture.

The bottom line: a traditional architecture becomes likely to fail at around $5M whereas a Snowman Architecture has a high probability of success even at $100M.

Benefit 2: Return on Investment (ROI)

To compare the ROI of the Snowman Architecture versus the traditional horizontally partitioned architecture, let's take some reasonable project numbers for, say, a $20M project. 

Using a traditional architectural methodology (e.g., TOGAF), we can reasonably assume that the $20M project will end up costing at least 200% of its budget (a $20M overrun) and that, once lost opportunity costs are added, its total cost will reach 400% of the planned cost (5).

Using the Snowman Architecture, we won't be doing a single $20M project; we will be doing a number of smaller projects of at most a few $M each. Projects of this size are well within the Success Zone (as shown in Figure 3). Projects in this zone typically have no overruns and no lost opportunity costs.

The Snowman approach requires an additional phase in the project life cycle, a pre-planning phase. This is where most of the work is done to design and plan the snowmen. In the worst case, this phase could add 10% to the overall cost of the project.

Of course, these numbers are just best guesses based on what I have seen of industry data. Feel free to plug in actual numbers from your own projects. But based on these numbers, we can calculate the Snowman ROI.

Without using the Snowman architecture, we expect a total cost of

   $20M (planned cost)
+ $20M (overrun; cost reaches 200% of plan)
+ $40M (lost opportunity costs)
-----------
$80M (total cost, 400% of plan)
With the Snowman architecture we expect a total cost of 

  $20M (planned cost)
+ $2M (10% overhead for Snowman preplanning)
---------
$22M (total cost)

The difference between the two approaches is

  $80M (Cost of traditional approach)
- $22M (Cost of Snowman approach)
---------
  $58M (Difference between approaches)

The ROI of using the Snowman approach is thus

  $58M (Difference in Costs) 
/   $2M (Added cost of Snowman Approach) 
X 100
--------
2900% (Calculated ROI)

The bottom line: the Snowman approach returns a 2900% ROI. A 2900% ROI is excellent by any measure.
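The ROI arithmetic above can be reproduced in a few lines, so that you can plug in numbers from your own projects. The function and parameter names are illustrative; the defaults simply encode the assumptions of this example: an overrun equal to the planned cost, lost opportunity costs of twice the planned cost, and a 10% pre-planning overhead.

```python
def snowman_roi(planned: float,
                overrun_fraction: float = 1.0,
                opportunity_fraction: float = 2.0,
                preplanning_fraction: float = 0.10) -> float:
    """Return the ROI (in percent) of the Snowman pre-planning investment.

    All fractions are relative to the planned cost.
    """
    traditional = planned * (1 + overrun_fraction + opportunity_fraction)
    added_cost = planned * preplanning_fraction  # the Snowman pre-planning phase
    snowman = planned + added_cost
    return (traditional - snowman) / added_cost * 100


print(round(snowman_roi(20e6)))  # 2900
```

Because every term scales with the planned cost, the ROI depends only on the fractions, not on the size of the project.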

Benefit 3: Intangible Benefits

There are many benefits to delivering a project on time other than eliminating the lost opportunity costs. It is hard to measure these benefits, but they certainly include the following:

  • Predictability of IT deliverables.
  • Increased trust between Business and IT.
  • Better ability to use IT as a strategic asset.
As you can see, there are compelling reasons favoring the Snowman Architecture over traditional approaches. The reduction in complexity is huge and the ROI would make even the most seasoned CFO salivate. But the most compelling reasons favoring the Snowman Architecture may not be economic; they may be technical. For those benefits, you must wait for the next installment of this blog.

Footnotes

(1) SIP is a patented methodology for autonomy optimized partitioning. It is described in a number of places, including the web short SIP Methodology for Project Optimization.

(2) TOGAF® is a methodology owned by The Open Group. It is described in the TOGAF 9.1 On-Line Documentation.

(3) If you are interested in the mathematical relationship between size, connections, and complexity, see my white paper The Mathematics of IT Simplification.

(4) I have written about the relationship between traditional IT project size and failure rates in a number of places including the web short The Relationship Between IT Project Size and Failure Rates.

(5) Unfortunately, we do not have good data on what these numbers are world-wide. These particular numbers came from averaging a number of large projects discussed in the Victorian Ombudsman Investigation into ICT Enabled Projects (2011).

Acknowledgements

Snowman picture by CileSuns92

Saturday, September 1, 2012

Snowman Architecture Part One: Overview


Introduction

This is the first of a three-part blog. The parts will be laid out as follows:
  • Part One: Snowman Overview. The basics of the Snowman Architecture and why I claim it is critical for enterprise architects.
  • Part Two: Snowman Benefits. Validation for the claimed benefits of the Snowman Architecture over traditional architectural approaches.
  • Part Three: Snowman Apologetics. The arguments against the Snowman Architecture and why they are wrong.
As Enterprise Architects, there is no lack of problems deserving of our attention. We need to ensure our organizations are well positioned for the Cloud, can survive disasters, and have IT systems that can chassé in perfect time with the business. 

And then there is the whole area of IT failures. Too many of our systems go over budget, are delivered late, and end up depressing rather than supporting the business. If you have been reading any of my work, you know all about this.

But what if there was one approach to architecture that could meet most of our needs and solve the lion's share of our problems? I believe there is. I believe there is a single architectural style that is so important, I consider it a fundamental enterprise architectural pattern. I call this the Snowman Architecture.

In my last blog, I talked about Radical IT Transformation, a transformation that redefines the relationship between the business and IT. The Snowman Architecture is the IT side of this radical transformation.

Fundamentals

If Snowman Architecture sounds too informal to you, feel free to refer to it by its formal name: Vertically Aligned Synergistically Partitioned (VASP) Architecture. Figure 1 shows the four main segments of a VASP architecture.


Figure 1. Basic Vertically Aligned Synergistically Partitioned (VASP) Architecture

With a little imagination (or with the help of Figure 2) you can see why I refer to a VASP architecture as a Snowman Architecture. 


Figure 2. Snowman Architecture

Now your first reaction to the Snowman Architecture is probably, "Hey, that looks just like a service-oriented architecture (SOA)." A typical SOA is shown in Figure 3. And you can see that all of the components of the Snowman Architecture also appear in an SOA.


Figure 3. Typical SOA

Snowman: SOA with Constraints

The best way to think of the Snowman Architecture is that it is an SOA with some very tight constraints. It is these constraints that are critical to addressing all of the issues I mentioned earlier, so let's go through them.

Constraint 1: Vertical Alignment.

The contours of the business architecture (Snowman head) define the contours of the technical, services, and data architecture.  

In other words, there is a close relationship between the business, technical, services, and data architectures. Let's take these one by one.

At the technical level, there is a package of technical systems (Snowman torso) that implements the package of business systems (Snowman head). The technical package is complete with respect to the business package; that is, it fully implements the business package and doesn't implement anything other than the business package.

This vertical alignment is respected down to the data level (Snowman bottom.) In other words, there is a package of data that meets the needs of the package of technical systems (Snowman torso). This package of data fully meets the needs of the business package and doesn't meet the needs of any other package.

At the Service level, each messaging relationship supported at the services level implements one dependency at the business level. Further, all messaging relationships can be traced back to a business level dependency.
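This service-level rule lends itself to a mechanical check. A minimal sketch, assuming business dependencies and messaging relationships are each recorded as hypothetical (from_snowman, to_snowman) pairs: every message must trace back to a business dependency, and every business dependency must be implemented by a message.

```python
def check_service_alignment(business_deps, messages):
    """Compare service-level messages against business-level dependencies.

    Returns (untraceable, unimplemented):
      untraceable   - messages with no business dependency behind them
      unimplemented - business dependencies with no message implementing them
    Both sets are empty when the services level is vertically aligned.
    """
    business = set(business_deps)
    services = set(messages)
    return services - business, business - services


deps = {("checking", "fraud"), ("loans", "credit")}
msgs = {("checking", "fraud"), ("loans", "checking")}
untraceable, unimplemented = check_service_alignment(deps, msgs)
print(untraceable)    # {('loans', 'checking')}
print(unimplemented)  # {('loans', 'credit')}
```

A message that shows up in `untraceable` is a connection the business never asked for, which is exactly the kind of porosity the constraint is meant to prevent.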

Constraint 2: Synergistic Partitioning.

The functions in the business package (Snowman head) are synergistic with respect to each other. 

Since the contours of the business package (Snowman head) define the contours of the lower-level packages, it is important that the "right" functions be located together. The choice of which business functions should reside together should be directed toward minimizing the overall system complexity. Elsewhere [1] I have shown that the least complex overall system is attained when the choice of which functions reside together is based on a mathematical concept that I call synergy.

While the concept of synergy has a precise mathematical definition, it also has a pragmatic definition. For those who don't care about the mathematics, just think of synergy as "closely related." That is, two functions are synergistic if they are closely related to each other, like deposit and withdraw. For those who do care about the mathematics, see my White Paper [1].
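For readers who want something executable, here is one way to turn pairwise "closely related" judgments into groups of functions. It treats the judgments as an equivalence relation and computes its classes with a union-find structure. This is an illustrative sketch with hypothetical function names, not the SIP algorithm or the precise mathematical definition of synergy from the white paper.

```python
def group_synergistic(functions, related_pairs):
    """Partition functions so that related functions land in the same group.

    Relatedness is treated as transitive: if A~B and B~C, then A, B, and C
    share a group.
    """
    parent = {f: f for f in functions}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving keeps the trees shallow
            f = parent[f]
        return f

    for a, b in related_pairs:
        parent[find(a)] = find(b)          # merge the two groups

    groups = {}
    for f in functions:
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values())


funcs = ["deposit", "withdraw", "balance", "apply_loan", "approve_loan"]
pairs = [("deposit", "withdraw"), ("withdraw", "balance"), ("apply_loan", "approve_loan")]
print(group_synergistic(funcs, pairs))
# [['apply_loan', 'approve_loan'], ['balance', 'deposit', 'withdraw']]
```

Each resulting group is a candidate business package (Snowman head), and the groups never overlap, which is what a partition requires.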

Given these two constraints, you can see why I call this a Vertically Aligned Synergistically Partitioned Architecture. And given the complexity of that description, you can see why I prefer the term Snowman Architecture.

Terminology

I use the term capability to refer to the closely related packages of business, technical, service, and data architecture. This is somewhat similar to the way the term capability is used in various enterprise architecture methodologies, although most don't include anything other than the business architecture in the notion of capability. So if I am being precise, I will refer to one related grouping of the four package types as a capability. When I am being informal, I will refer to that same grouping as a Snowman. So I might say the Checking-Account capability or the Checking-Account Snowman. Either of these would mean the business processes that deal with checking accounts, the technical systems that support those processes, the data that feeds those technical systems, and the services that provide interoperability with the outside world.

When I want to be clear that I am talking about my understanding of a capability rather than somebody else's, I will use the term autonomous business capability (ABC). The word autonomous reflects the synergistic assignment of business functions and the word business refers to the central role of the business layer in defining the overall capability structure.

When I am discussing the business architecture of the ABC, I will refer to the business level of the ABC. Similarly, I will use the terms technical, services, and data level to refer to those respective architectures.

So the business level of the ABC contains some collection of business functions that are synergistic with respect to each other. The technical level of the ABC provides the technical support needed by those functions. The data level of the ABC provides the data that fuels the technical level. And the services level of the ABC implements dependencies between ABCs.

Relating this back to the Snowman Architecture, the business level of the ABC is the head, the technical level of the ABC is the torso, the data level of the ABC is the bottom, and the service level of the ABC is the arms. 

Scaling Up

Since the Snowman architecture is a subset of an SOA, creating larger and larger systems is easy. We just add more Snowmen (or ABCs, if you prefer) into the mix and make sure they are connected through messages as shown in Figure 4.


Figure 4. Scaling Up the Snowman Architecture

Benefits

Let's go back to my original claim, that the Snowman architecture solves many of the problems that plague the enterprise architect. Now I should inject a caution here. I consider the problem space of the enterprise architect the delivery of large (say, greater than $1M) systems [2]. If all we are building are small systems, then many of these claims don't apply. For that matter, there should be no need for an enterprise architect.

Given this caveat, I make the following claims about the Snowman architecture in comparison to a traditional SOA or any traditional architectural approach:
  1. The Snowman architecture is cheaper to build.
  2. The Snowman architecture is more likely to be delivered on time.
  3. The Snowman architecture is more likely to satisfy the business when delivered.
  4. The Snowman architecture is easier to adapt to the changing needs of the business.
  5. The Snowman architecture is more amenable to Agile Development.
  6. The Snowman architecture is easier to debug.
  7. The Snowman architecture is more secure.
  8. The Snowman architecture is more resilient to failure.
  9. The Snowman architecture is easier to recover when system failure occurs.
  10. The Snowman architecture makes more efficient use of the Cloud.
There are a number of other benefits I could claim, but this should be sufficient to make the point. And I think it is fairly obvious that if all of my claims are true, it will be a compelling argument in favor of the Snowman Architecture.

In Part Two of this blog, I will validate each of these claims. Then in Part Three, I will discuss all of the arguments against the Snowman Architecture and show why they are wrong.

If you would like to be notified when the next installments are ready, you have two choices. If you just want to know about new blog posts, you can use the email signup on the right. If you would also like to know about my white papers, webshorts, and seminars, then use the ObjectWatch sign-up system at http://www.objectwatch.com/subscriptions.html.

Either way, stay tuned!

-------------------------------
Workshop Announcement: 
Radical IT Transformation with Roger Sessions and Sarah Runge
For my New Zealand and Australia followers, I will soon be doing a workshop with Sarah Runge, author of Stop Blaming the Software. We will be spending two days discussing our work in Radical IT Transformation, a better way to do IT.
Auckland: October 11-12 2012
Sydney: October 15-16 2012
Cairns: October 18-19 2012

Check out our Agenda or Register!
------------------------------

Notes

[1] See for example my paper, The Mathematics of IT Simplification at http://www.objectwatch.com/white_papers.htm#Math.

[2] In passing, I also note that I consider the problem space of the Enterprise Architect the delivery of the maximum possible return on IT investment. Many enterprise architects disagree with this job description. See for example the extensive discussion in LinkedIn on the subject of What is EA?

Acknowledgements

The two Snowmen pictures are by (in order of appearance) jcarwash31 and chris.corwin on Flickr, both are licensed under Creative Commons.

A Note on Comments

I welcome your questions/comments on this blog and I will try to respond quickly. A word of caution: I am not interested in comments along the lines of "This is not EA, this is EA-IT" or "EA is not concerned with delivering more value from IT." If you would like to have that conversation, I suggest you contribute to one of the discussions on LinkedIn, such as What is EA? Comments here are reserved for the topic at hand, discussing the Snowman Architecture, its claims, and the arguments against it. Thank you!