Wednesday, August 21, 2013

Two Roads Converge?

by Roger Sessions and Richard Hubert


Introduction

We (Richard Hubert and Roger Sessions) have a lot in common. We are both Fellows of the International Association of Software Architects (IASA). We have both written books and articles. And we are both well known proponents of a particular approach to Enterprise and IT Architectures. For Richard, this approach is called Convergence Architecture. For Roger, this approach is called The Snowman Practice. But it may turn out that we have more in common than we thought. Our approaches may complement each other in some interesting ways. But first, let’s take a look at what each of us has been doing.

Introduction to Richard’s work

Since the mid 1990’s I (Richard) have been developing and optimizing an architectural style that addresses the complexity of both the IT systems and the business processes. I call this holistic perspective Convergent Architecture (CA). I wrote about this in 2001 in my book Convergent Architecture (John Wiley N.Y. ISBN: 0471105600.) CA includes properties that  I consider to be inherent in any architectural style. The metamodel that I use includes the project design, the system design, and the business design. At a high level, this model is shown in the following diagram:


Figure 1. Coverage of a holistic architectural style

As you can see in the above diagram, the partitioning between Organization, Process, and Resource plays a significant role in the quality of the design. Experience and rules-of-thumb are adequate to handle many designs, but as systems get larger, a more formal approach is preferable, especially if it can be assisted by tools. This is where Roger’s work is a perfect fit.

Introduction  to Roger’s work

I (Roger) have been looking at how to validate an architecture. To do this, I have developed a mathematical model for what an ideal architecture looks like and a methodology for delivering an architecture that is as close to that ideal as possible. The starting point for this is to define what we mean by “ideal.” My definition of an ideal architecture is the least complex architecture that solves the business problem. This means that we also need a metric for measuring complexity, which, fortunately, comes out of the mathematical model. You can read about this mathematical model in this white paper.

It turns out that when you discover the ideal architecture for a given problem, it almost always has a characteristic shape: A collection of business functionality sitting on top of a collection of services sitting on top of a collection of data. In addition, these three tiers are separated from other collections by strong vertical partitions. There is a strong connection between business functions in the top tier, services in the middle tier, and data in the bottom tier. Where connections are required between tiers, these occur through asynchronous messages at the service level. This architecture is shown in the following diagram:


Figure 2. The Snowman Architecture Created By SIP

As you can see in the above diagram, the idealized architecture looks a lot like a snowman. The head, torso, and bottom of the snowman contain business functions, services, and data, respectively.

The methodology I (Roger) have developed to drive this architecture is called SIP, for Snowman Identification Process. Some of you may know it under its old name, Simple Iterative Partitions. You can get a good overview of The Snowman Practice from this video.

Synergy discovery

When we compared the architecture that is driven by the CA’s architectural metamodel (Figure 1)  to the architecture that is driven by the highly theoretical SIP (Figure 2) it was clear to us that significant commonalities are at hand. 

Both approaches are based on the fundamental Enterprise Architecture principle of IT-Business alignment. Both approaches define best practices concerning how this alignment can be effectively achieved and measured. Additionally, both approaches are based on rules and patterns that apply to the simplification of any system, whether large or small, organizational or purely technical. The Convergent Architecture, for instance, has been used to design IT-organizations which then use the same approach to design and simplify IT systems (this is conceptual isomorphism).

Lastly, and most important of all, we recognized a SIP approach can be applied to mathematically support and objectively drive both architectural styles. SIP thus enhances the design approach and process as both a tool and substantive mathematical proof needed to ascertain the simplest (least complex) of all possible system and organizational structures. . 

In essence, we now have CA showing that the SIP theory really does deliver a partition that stands up to the most demanding examination. And at the same time we have the SIP mathematics defining the vertical boundaries of a CA architecture that are mathematically not only sound, but as simple as possible.

The Future

Where will this take us? To be honest, we are still discussing this. But the possibilities are intriguing. Imagine, two mature methodologies that have such strong synergy where both the theoretical and the model-driven approaches seem to come up with such complementary solutions. Stay tuned for more information.

Wednesday, August 14, 2013

Addressing Data Center Complexity


If you have been following my work, you know how I feel about complexity. Complexity is the enemy. I have written a lot about how complexity causes software cost overruns, late deliveries, and poor business alignment.

In this blog, I decided to look at complexity from another perspective: the data center. This is the perspective of those that must support all of those complex systems the software group has managed to deliver.

The problem with complexity is that it magnifies as you move down the food chain. This is bad news for those at the bottom of the food chain, the data center.

Straightforward business processes become complex software systems. Complex software systems require very complex data stores. Very complex data stores run on extremely complex data centers. These extremely complex data centers are expensive to manage, run, and secure.

The numerous problems that complexity creates for data centers were highlighted in a recent survey by Symantec called State of the Data Center; Global Results [1]. The results of this survey should cause any CIO to break out in a cold sweat.

According to this survey, complexity is a huge problem for data centers. For example, the typical company surveyed had 16 data center outages per year at an average cost of $319K per outage or over $5M per year. And this does not include indirect costs, such as loss of customer confidence. The number and magnitude of these outages was directly attributed to data center complexity according to those surveyed. Complex data centers fail often and they fail hard.

But outages aren’t even the biggest complexity related headache for data centers. The most cited complexity related problem is the high cost of keeping the data center running on those increasingly rare days when there is no outage. Other problems attributed to data center complexity were security breaches, compliance incidents, missed service level agreements, lost data, and litigation exposure. Clearly complexity is a big problem for data centers.

How are data centers addressing this escalating complexity? According to this survey, the approach 90% of companies are taking is information governance. What is information governance? According to Debra Logan, a Gartner Research VP,

Information governance is the specification of decision rights and an accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archival and deletion of information. It includes the processes, roles, standards and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its goals [2]. 


Two points should be clear from the above definition. First, information governance is a vague concept.  Second, whatever information governance is, it has nothing to do with the problem that is vexing data centers, namely complexity. This is unfortunate, given that so many of the surveyed companies say they are pinning their hopes on information governance to solve their complexity related problems. These companies are headed for a major disappointment.

If information governance won’t solve complexity related data center problems, what will? The problem, as I stated earlier, is the magnification of complexity as it rolls down the food chain from business process to data center. This problem can only be solved with complexity governance. Complexity is the problem, not information.

How do I define complexity governance?

Complexity governance is a set of policies, guidelines, and procedures that ensure every business process is implemented with the simplest possible IT solution supported with the simplest possible data organization running on the simplest possible hardware configuration. 


This sounds good but what would it take to implement this?

Gartner’s Managing VP and Chief of Research, David Cappuccio, is on the right track when he says it is particularly important for more data center staff to understand the “cascade effect” of making changes in a complex environment [3]. Unfortunately, few IT staff are trained in IT complexity, a prerequisite to understanding the cascade effect to which Cappuccio eludes. And it stands to reason that if one does not understand how complexity cascades, one is woefully unprepared to do anything about it.

Here is my recommended plan for putting in place effective complexity governance.

  1. Train everybody to understand the importance of controlling complexity. Every person on the IT food chain should be able to recite these words in their sleep: Complexity is the Enemy.
  2. Train a select group that includes representatives from the business, IT, and data center in IT Complexity Analytics, the science of complexity as it relates to IT systems.
  3. Give this group a mandate to put in place strong complexity governance.
  4. Give this group the authority to enforce complexity governance.
  5. Hold this group responsible for delivering simpler IT systems that run on simpler data centers.
  6. Document the benefits complexity governance delivers.


I don’t claim that complexity governance is simple. The reality is that complexity governance requires a significant and sustained effort. But it is an effort that delivers substantial business value. If you don’t believe me, ask somebody who is in the middle of their tenth three hundred thousand dollar data center outage this year. They will tell you: Complexity is the enemy.

Sign Up

I'll be glad to let you know when I have new blogs, white papers, or videos available. Sign up here.

References

[1] https://hp.symantec.com/system/files/b-state-of-data-center-survey-global-results-09_2012.en-us.pdf

[2] http://blogs.gartner.com/debra_logan/2010/01/11/what-is-information-governance-and-why-is-it-so-hard/

[3] ] http://www.datacenterknowledge.com/archives/2012/12/04/gartner-it-complexity-staffing/

Acknowledgements

The photo is by Route79 on Flickr, licensed through Creative Commons. (http://www.flickr.com/photos/route79/)

Notices

This blog is copyright by Roger Sessions. This blog may be copied and reproduced as long as no changes are made and his authorship is acknowledged. All other rights are reserved.

Tuesday, July 30, 2013

Gartner: Complexity Reduction One of Top Ten Technical Trends



Gartner recently came out with their Top Ten Strategic Technical Trends Through 2015. Gartner states that one of the Top Ten Trends is IT Complexity. IT Complexity, Gartner says, is a major inhibitor “of an enterprise to get the most out of IT money spent.”

I have been writing for the last ten years about the need to reduce complexity and the bottom-line benefits of doing so. Gartner quotes one of my early papers, The Mathematics of IT Simplification. In this paper I introduced a modern, rational understanding of IT Complexity.

Until this white paper appeared, IT Complexity was largely ignored by Enterprise Architects. This is despite the fact that IT Complexity was responsible for numerous IT disasters and was costing the world economy trillions of dollars per year. IT Complexity was still treated in a random, ad hoc manner if it was treated at all.

This White Paper laid a new foundation for the science of IT Complexity. This paper showed how to model IT complexity, how to measure it, and how to manage it using verifiable, reproducible methodologies that are based on solid mathematics.

If we want to take IT Complexity as seriously as Gartner says we should, we need to start by understanding it. This paper is where it all began. Gartner read it. You should too.

You can find the White Paper, The Mathematics of IT Simplification [here].
You can view the Gartner presentation, The Top 10 Strategic Technology Trends for 2013 [here].
You can view a 20 minute overview of The Snowman Architecture, my answer to IT Complexity, [here]

Photo by Alan in Belfast, via Flickr and Creative Commons.

Friday, July 5, 2013

The 3000 Year Old IT Problem


It was first described 3000 years ago by Sun Tzu in his timeless book, The Art of War.

We can form a single united body, while the enemy must split up into fractions. Hence there will be a whole pitted against separate parts of a whole, which means that we shall be many to the enemy's few.1

This is the first description of a problem solving technique that would be named 600 years later by Julius Caesar as divide et impera. Or, as we know it, divide and conquer.

The Art of War is frequently named as one of the top ten must reads in business2. I have long been fascinated by The Art of War and especially the divide and conquer strategy. Back in the days when I was a researcher at the National Cancer Institute, divide and conquer was a common research strategy for testing drugs. Divide and conquer is used extensively in business, economics, legal, politics, and social sciences, to name a few.

Oddly enough, the only field I can think of that divide and conquer has not been used successfully is IT. Whereas virtually every other field has been able to solve large complex problems using a divide and conquer strategy, IT is alone in its failure. Try to divide and conquer a large IT system, and you get an interconnected web of small systems that have so many interdependencies that is practically impossible to coordinate their implementation.

This is the reason so many large IT systems are swamped by IT complexity3. The one problem solving strategy that universally works against complexity seems to be inexplicably inept when it comes to IT.

How can this be? What is so special about IT that makes it unable to apply this strategy that has found such widespread applicability?

There are only two possible answers. The first is that IT is completely different from every other field of human endeavor. The second is that IT doesn't understand how divide and conquer works. I think the second explanation is more likely.

To be fair, it is not all IT's fault. Some of the blame must be shared by Julius Caesar. He is the one, remember, who came up with the misleading name divide and conquer (or, as he said it, divide et impera.) Unfortunately, IT has taken this name all too literally.

What is the problem with the name divide and conquer? The name seems to imply that we conquer big problems by breaking them down into smaller problems. In fact, this is not really what we do. What we do do is learn to recognize the natural divisions that exist within the larger problem. So we aren't really dividing as much as we are observing.

Let's take a simple example of divide and conquer: delivering mail.

The U.S. Postal service delivers around 600 million letters each day. Any address can send a letter to any other address. The number of possible permutations of paths for any given letter are astronomical. So how has the Postal Service simplified this problem?

They have observed the natural boundaries and population densities in different areas of the country and divided the entire country up into about 43,000 geographic chunks of roughly equal postal load. Then they have assigned a unique zip code to each chunk.

In the following picture, you can see a zip code map for part of New York City.
Partial Zip Code Map for New York City

The areas that are assigned to zip codes are not equal sized. They vary depending on the population density. The area that includes Rockefeller Center (10020) is very dense, so the area is small. The area where I grew up (10011) has medium density and the area assigned to the zip code is consequently average sized.

If we blow up my home zip code we can see some other features of the system. Here is 10011 enlarged:
Zip Code 10011
You can see that the zip code boundary makes some interesting zigs and zags. For example, the left most boundary winds around the piers of the Hudson River. Towards the bottom, the boundary takes a sharp turn to the South about half way through the southern boundary. This is to follow Greenwich Avenue. On the right side, the boundary follows Fifth Avenue for quite a while until we hit the relatively chaotic Flatiron district at 20th street.

So zip codes are not randomly assigned. They take into account population densities, street layouts, and natural boundaries. The main point here is that a lot of observation takes place before the first zip code is assigned.

Suppose we created the zip code map by simply overlaying a regular grid on top of New York City. We would end up with zip codes in the middle of the Hudson River, zip codes darting back and forth across Fifth Avenue, and some zip codes with huge population densities while others would consist of only a few bored pigeons.

You can see why the so-called divide and conquer algorithm is probably better named observe and conquer.

The general rule of thumb with observe and conquer is that your ability to solve a large complex problem is highly dependent on your ability to observe the natural boundaries that define the sub problems.

Let's consider one other example, this time from warfare.

In about 50 BCE, Julius Caesar and 60,000 troops completed the defeat of Gaul, a region consisting of at least 300,000 troops. How did Caesar conquer 300,000 troops with one fifth that number? He did it through divide and conquer. Although the Gallic strength was 300,000 in total, this number was divided into a number of tribes that had a history of belligerence among themselves. Caesar was able to form alliances with some tribes, pit other tribes against each other, and pick off remaining tribes one by one. But he was only able to do this through carefully observing and understanding the "natural" political tribal boundaries. Had Caesar laid an arbitrary grid over Gaul and attempted to conquer one square at a time, he could never have been successful.

This brings us to the reason that divide and conquer has been so dismally unsuccessful in IT. Because IT hasn't understood that observation must occur before division.

If you don't believe me, try this simple experiment. The next time an IT person, say John,  suggests breaking up a large IT project into smaller pieces, ask John this question: what observations do we need to make to determine the natural boundaries of the smaller pieces? I can pretty much guarantee how John will respond: with a blank look. Not only will John not be able to answer the question, the chances are he will have no idea what you are talking about.

With such a result, is it any wonder that divide and conquer almost never works in IT? It is as if you assigned zip codes by throwing blobs of paint at a map of New York City and assigning all of the red blobs to one zip code and all of the green blobs to another.

For the first time in the history of IT, we now have a solution to this problem; a scientific, reproducible, and verifiable approach to gathering the observations necessary to make divide and conquer work. I call this approach synergistic partitioning. It is the basis for the IT architectural approach called The Snowman Practice. And if you are ever going to try to build a multi-million dollar IT system, Snowmen is where you need to start. I can assure you, it will work a lot better than throwing blobs of paint at the wall.

You can start reading about The Snowman Practice [here]. Or contact me [here]. What do Snowmen have to do with defining divide and conquer boundaries? Ask me. I'll be glad to tell you.

Subscription Information

Your one stop signup for information any new white papers, blogs, webcasts, or speaking engagements by Roger Sessions and The Snowman Methodology is [here].

References

(1) 6:14, translation from http://suntzusaid.com
(2) See, for example, Inc.'s list [here].
(3) I have written about the relationship between complexity and failure rates in a number of places, for example, my Web Short The Relationship Between IT Project Size and Failure available [here].

Legal Notices and Acknowledgements

Photograph from brainrotting on Flickr via Creative Commons.

This blog and all of these blogs are Copyright 2013 by Roger Sessions. This blog may be copied and reproduced as long as it is not altered in any way and that full attribution is given. 

Tuesday, May 21, 2013

SIP and TOGAF: A Natural Partnership

If you have ever heard me talk, you know the standard three-tier architecture is a bad approach for large complex IT projects. It has poor security. It is difficult to modify. It is highly likely to fail.

Instead, I recommend the Snowman Architecture. This architecture is great for large complex IT projects. It has excellent security. It is easy to modify. And it is highly likely to be delivered on time, on budget, and with all of the functionality the business needs.

I often show pictures contrasting these two different architectures, such as this:


The classic three tier architecture is generated by standard IT methodologies, such as TOGAF1. The Snowman Architecture is generated by the methodology known as SIP (Snowman Identification Process.)

From this discussion you might assume that traditional methodologies such as TOGAF are in conflict with SIP. But this is not the case. The ideal architecture comes not from SIP instead of TOGAF, but from an intimate dance between SIP and TOGAF2.

To see how this dance works, let me take you through the steps needed to define a Snowman Architecture and point out where SIP takes the lead and where TOGAF takes the lead.

The starting point for a Snowman Architecture is identifying the business functions. These are the smallest granular functions that are still recognizable to the business analysts. SIP drives the identification of these business functions.

The second step is the partitioning of these business functions into closely related groupings. We call these groupings synergy groups. SIP includes very precise mathematically based algorithms for identifying these synergy groups and it is critical to the project that this partitioning be done correctly.

The third step is the definition of the strong vertical boundaries that will separate the snowmen from each other. This again is SIP functionality. And now SIP takes a break and TOGAF takes over.

To review, these first three SIP lead steps are shown here:


Next we gather the requirements for the business architecture. The business architecture is the head of the snowman. From the SIP analysis, we know which functions are in each head, but we don't know much about their requirements. TOGAF has good processes in place for requirements gathering.

The fifth step is defining the technical architecture. TOGAF has been doing this for years, so SIP has nothing to add here.

The sixth step is defining the data architecture. Again, TOGAF does a great job. We don't need SIP for this.

These three steps, then, are the heart of the TOGAF contributions and are shown here:



The final two steps (seven and eight) involve tying the Snowmen together. First we define the business dependencies between synergy groups. While TOGAF has some information about how to define business dependencies, the SIP defined synergy groups are critical here because they greatly reduce the number of dependencies that will need to be implemented.

Once the business dependencies have been identified, then we are back to TOGAF to translate these business dependencies into a service oriented architecture. These last two steps are shown here:


As you can see, SIP and TOGAF complement each other. SIP depends on TOGAF to fill in the SIP defined Snowman structure. TOGAF depends on SIP to scale up to projects larger than one million dollars. Put the two together and we have a highly effective solution to building large, complex IT systems. The best of all worlds.

Footnotes

(1) TOGAF stands for The Open Group Architectural Framework. It is a trademark of the OMG.

(2) While SIP works well with TOGAF, SIP is  agnostic about which methodology is used to drive the non-SIP parts of the analysis. If your  organization is using some other architectural methodology rather than TOGAF, you can replace TOGAF in these diagrams and discussions by your favorite methodology.

Tuesday, December 4, 2012

The Snowman Architecture Part Three: The Technical Benefits


This is the third part of a four part blog.about the Snowman Architecture. The first part was The Snowman Architecture: An Overview. The second part was The Snowman Architecture: The Economic Benefits. In this part, I will be discussing the technical benefits of the Snowman Architecture. But don't read this until you have read the overview! And if you care about things like ROI, check out part two.

But whatever you do, don't miss this part! Over the next twelve pages or so (yes, I know it's a bit long) I'm going to take you through fourteen of the most compelling technical reasons why the Snowman Architecture is a huge improvement over today's approaches to large IT. You are now reading nothing less than my declaration of war on traditional IT methodologies.The first snowball has now been fired!

Review

I gave an overview of the Snowman Architecture in part one, but let's review briefly.

The Snowman Architecture breaks down a large IT system into small vertically partitioned subsystems called Snowmen. These snowmen interact with each other through asynchronous messages. Snowmen are designed to be as autonomous as possible using a design methodology known as Simple Iterative Partitions1 (SIP).

Snowmen come in three layers. The head of the snowman consists of the business functions that make up a capability. The torso of the snowman consists of the technical systems that support those business functions. The bottom of the snowman consists of the data that is used by those technical systems. Each of these layers is strongly partitioned based on the business functions that make up the head.

Snowmen reach out to each other through their arms, the asynchronous messaging system. Often this is implemented as an SOA.


Snowmen reach out to each other through their arms.

Contrast to Traditional Architectures

A traditional IT architecture is also implemented in three layers. These layers are the same as those of the Snowman Architecture: business architecture, technical architecture, and data architecture. What differentiates the Snowman Architecture from a traditional IT architecture is the strong vertical partitioning, as shown in Figure 1.

Figure 1. Traditional IT Architecture vs. Snowman Architecture

It turns out that this strong vertical partitioning has a major impact on the effectiveness of the architecture. Let's take a look at fourteen key non-functional attributes of a large IT system. As you will see, every single one of them is improved by the strong vertical partitioning that characterizes the Snowman Architecture.

In this analysis, I assume that the system we are evaluating is a large (greater than ten million dollar) system. This is the point at which traditional IT architectural methodologies are no longer able to keep up with the exponential increases in system complexity2. I also assume that SIP was used to assign business functions to the head of the snowman, an essential step to minimizing the overall complexity of the Snowman Architecture.

Okay, given these two assumptions, let's see why the Snowman Architecture outperforms all traditional approaches to large IT system design. I'll start by listing the fourteen attributes and then go through them one by one. The attributes I will look at are these:
  • Business Alignment 
  • Regulatory Compliance 
  • Auditing 
  • Security
  • Agile Friendliness 
  • Maintainability 
  • Testability 
  • Reliability 
  • Recovery 
  • Throughput 
  • Scalability 
  • Flexibility 
  • Cloud Effectiveness 
  • Vendor Lock in
You might as well do a quick check-point. Are any of these attributes important to your IT systems? If not, you might as well stop reading now. If one or more of these are of interest then keep reading.

Okay, now let's go through them one by one. Feel free to skip those you don't care about.

Business Alignment

A system is well aligned when it meets the needs of the business. You can think of business alignment as the Wow factor. When the system is delivered, does the business say, "Wow!" Or does it shake its collective head and reach for the nearest bottle of Tequila?

In any system design life cycle, there is a phase in which the business requirements are gathered. In the traditional approach (the left hand side of Figure 1) the requirements are gathered more or less immediately after the project has been approved and before the technical architecture is designed.

The size of the requirements document(s) is always proportional to the size of the project. Massive projects require lots of requirements documentation, often tens of thousands of pages. The larger the stack of requirements, the lower the chances are that those requirements accurately reflect what the business actually needs.



In SIP (the guiding design methodology for the Snowman Architecture) an additional project phase is introduced: the Partitioning Phase. This is when the basic shape of the Snowman is identified.  

This is when the basic shape of the Snowman is identified.
What is important from an alignment perspective is that this partitioning of the larger system into smaller autonomous snowmen takes place before the requirements have been gathered. Since a typical snowman rarely exceeds one million dollars in cost, it's requirements are modest. Since the requirements are relatively modest, it is more likely that those requirements accurately reflect the actual business need.

Since the Snowman Architecture has more accurate requirements than the traditional IT architecture, it is likely to actually meet the business need.

Regulatory Compliance 

An IT system is considered compliant when it can be shown to operate within the constraints of regulatory laws and regulations. Some enterprises such as the video gaming industry have few if any restraints. Others, such as financial organizations, have a complex web of laws and regulations. 

Our ability to show that a given IT system operates within its regulatory constrains is dependent on the complexity of the system. The more complex the system, the more difficult it is to prove compliance.

Large traditional IT systems (the left side of Figure 1.)  have a highly complex web of relationships between business functions, technical processes, and data. Thus it is very difficult to prove compliance.

The Snowman Architecture (the right side of Figure 1.) has a number of simple relationships between business functions, technical processes, and data. The architectural simplicity is guaranteed by the SIP directed partitioning of the business functions into snowmen heads and the strong vertical partitioning that occurs once the technical and data architectures are created.

Because each snowman is relatively simple, it is correspondingly easy to prove that it operates in compliance with any relevant regulatory constrains.

Auditing 

An IT system is considered auditable when we can accurately trace data changes back to technical processes, from there back to business functions, and from there back to human beings. The more paths there are to the data, the more difficult it is to trace these paths. 

The traditional IT architecture has a large number of complex paths to the data. It is nearly impossible for an examiner to determine which of these paths resulted in a particular item of data being updated .

The Snowman Architecture has a small number of simple paths to the data. An examiner can easily get the whole picture and then determine which of these few candidate paths resulted in a particular item of data being updated. 

An examiner can easily get the whole picture.

Security

We when talk about the security of a system, we are generally talking about our ability to protect data. Data, of course, resides in a database. So security comes down to our ability to configure the database so that unauthorized updates are not possible. In a traditional architecture, there are so many processes that need to update so many parts of the database under so many different circumstances that it is difficult to figure out a secure configuration. And even if one does manage a secure configuration, the next process that is added will change everything.

In the Snowman Architecture, database configuration is much easier. Because the partitioning of the head of the snowman (the business processes) dictates the partitioning of the technical layer and then the partitioning of the data layer, the only processes that will ever need to access the data in the snowman are the processes in the torso of the snowman. This makes configuration easy: allow the processes in the snowman torso to access the data in the snowman bottom and don't allow any process outside the snowman to access any data in the snowman. Viola. Done.

Agile Friendliness 

Many organizations are attracted to Agile Development Methodologies. I agree that Agile development has a lot of promise. However I also think it doesn't scale. 

A recent paper by Vikash Lalsing et al.3 indicates that Agile projects of 0.5 person years or less are excellent candidates for Agile development. They predict such projects will be less than 10% over budget. By the time the project  size reaches 3.6 person years, the budget overrun increases to 18%. And by the time the project size reaches 8.2 person years, the budget overrun increases to 66%.

The Snowman Architecture is ideally suited to projects of greater than $10M. This equates to an effort of close to 100 person years. This is more than ten times the project size that yielded the 66% budget overrun. 

In the Snowman Architecture, the larger project is broken down into relatively simple, autonomous chunks of project work. Each of these chunks becomes an individual snowman. The use of the SIP methodology ensures that not only is each snowman as simple as possible, but the relationships between snowmen are as simple as possible. 

The project size of any one snowmen is unlikely to exceed $1M and in many cases will be much less. A $1M project is around 7 person years. This is still large by Agile standards, but far closer to a workable agile number than a project that does not have the benefit of the Snowman Architecture.

Maintainability 

A system is maintainable when it is easy to locate the source of bugs. The more complex the system, the more difficult it is to find the source of bugs.

The complexity of the traditional architecture (the left side of Figure 1.) is much higher than the complexity of the Snowman Architecture. The maintainability of a traditional architecture is therefore much lower. The use of the SIP methodology guarantees that not only is the overall complexity of the Snowman Architecture low, it is as low as it can possibly be4. Simplicity is important when it comes to snowmen. A simple snowman is simple to maintain.

A simple snowman is simple to maintain.

Testability 

System bugs can manifest themselves at any point in the system life cycle. The later in the life cycle the bug is manifest, the more problems it causes. Our goal in system testing is to find bugs in the system as early as possible and definitely before the system is delivered to customers. The most common strategy for system testing is to write code or scripts that exercise the system and ensure it is working correctly. 

To be sure a system is working correctly you must write test code that exercises every possible logical path through the system. The more logical paths there are through the system, the more difficult it will be to create the test code and the more likely it will be that you will have missed an important path. This translates to a greater likelihood that you will ship buggy code.

There are two reasons the Snowman Architecture is more testable than the traditional architecture.  

The first reason the Snowman Architecture is more testable has to do with the number of paths. 

A traditional IT architecture has many possible paths. By the time the system reaches a few million dollars in size, it effectively has an infinite number of paths and there is no way they can all be tested.

In the Snowman Architecture, each snowman can be tested independently. Since each snowman is relatively small and simple, there are relatively few paths through the snowman. Once you have tested all of the snowman and the connections between them, you have effectively tested the system as a whole. Thus your chances of shipping buggy code are greatly reduced if you are using the Snowman Architecture.

The second reason the Snowman Architecture is more testable has to do with how pieces of the system are connected together. In a traditional IT architecture, segments of code are often connected by shared data in a database. In the Snowman Architecture, snowmen are almost always connected through asynchronous messages. 

These two approaches to connections are very different from a testability perspective. Shared data connections are almost impossible to test. There are just too many ways the data can be accessed. Asynchronous messages, in contrast, are very easy to test. One need only write a messaging harness, a common practice among service-oriented architectures, and the connection points become easily tested. 

So we see two reasons the Snowman Architecture is so easier to test than the traditional IT architecture. First, it has fewer code paths. Second, it uses asynchronous messages for its connection points. It is hard to test a traditional IT architecture. It is easy to test a Snowman Architecture.

It is easy to test a Snowman Architecture.

Reliability 

Reliability is a measure of the typical amount of time a system will remain running before it unexpectedly drops dead. Reliability is often described as mean time between failures. 

Reliability is related to testability, the last attribute I discussed. The more testable a system is, the less likely it is to have post-delivery bugs. It is these post-delivery bugs that cause systems to fail. The fewer bugs, the less likely the system is to fail. Since the Snowman Architecture is easier to test than the traditional IT architecture, is is likely to have fewer bugs and thus will be less likely to fail.

But there is another factor that favors reliability of the Snowman Architecture. This has to do with how easy it is to quarantine a bug. Consider Figure 2, which is a blowup of the left hand side of Figure 1 with some labels added for reference.

Figure 2. Blow-up of Traditional IT Architecture.

Assume that database D crashes. We have three processes dependent on D, namely, n, o, and p. So these three processes crash. Processes g and i are both dependent on n, so they both go down. Process i is also dependent on o, but since it has already crashed, we need not worry about it further. Processes i, d, and f are all dependent on p. Process i is already down, but now d and f join the fun. So now we have D, n, o, p, i, d, and f all down. This can corrupt any databases they are involved with which includes C and E. This brings down their dependent processes b and h. Which in turn... you get the picture. There is no quarantine, so when one part of a system catches a bug, that bug can rapidly propagate to the entire system.

Contrast this to Figure 3., which shows a closeup of the Snowman Architecture.

Figure 3. Closeup of Snowman Architecture

Assume in Figure 3. that database B crashes. It can bring down processes d, e, f, and g. But that's it. The boundaries of the snowman have effectively quarantined the bug from spreading further. The only connections between d, e, f, g and other processes are through asynchronous messages, and these channels can easily be protected. So while the bug in B may crash the entire snowman, there is no pathway for the bug to spread further.

The bottom line is that bugs occur less frequently with the Snowman Architecture (because it is easier to test) and when they do occur they tend to have only a local impact. In the traditional architecture, bugs occur more frequently (because it is harder to test) and when they do occur, they tend to have a global impact.

Recovery 

Recovery is related to reliability (the last section.) Whereas reliability measures how often the system fails, recovery measures how long the failure lasts. In an ideal system, we have high reliability and fast recovery, meaning that the system rarely crashes and when it does, the crash doesn't last long.

It is difficult to develop an effective recovery strategy for a traditional IT architecture. There are too many databases, too many processes, and too many ways everything can be related to each other. When this web of relationships goes down, what do you do? You try to protect the entire system but this is difficult because the system is a large, it is complex, and it is a moving target. 

In contrast, it is easy to develop an effective recovery strategy for a Snowman architecture. All you need to do is shadow any requests to the snowman to a backup snowman. Then if a failure occurs, reroute all new requests to the backup. This rerouting can occur as quickly as one can notice that the primary snowman has failed. This is shown if Figure 4.

Figure 4. Recovery Mechanism for Snowman
Taking this and the last two sections together, I can make the following claims about the Snowman Architecture relative to the traditional IT architecture:
  • The Snowman Architecture will have fewer bugs.
  • The bugs will have less impact.
  • Recovery from that impact will be faster.

Throughput 

Throughput refers to the amount of work a system can process in a unit of time. Often we measure throughput in transactions per minute. Throughput should not be confused with response time which measures how long a single user waits for work to be completed.

Throughput is important because it directly influences cost. If a system has low throughput, then a lot of resources are needed to process a given workload. If a system has high throughput, the number of resources needed to process the same workload is much less. 

There are two architectural factors that strongly influence throughput: the number of synchronous connections and the amount of shared data. Synchronous connections slow down throughput by blocking processes until connected processes have completed their work. Shared data slows down throughput by blocking databases.

Both of these factors come together in the traditional IT architecture. These systems heavily favor synchronous connections and make extensive use of shared data. Between the two, throughput is substantially degraded.

In the Snowman Architecture, synchronous connections are only used within a snowman. All (or almost all) connections between snowman occur through asynchronous messaging. Which means no blocked processes.

In the Snowman Architecture, shared data is the equivalent of a multi-headed snowman. This is anathema to the Snowman Architecture. The only processes that are allowed to share data are those that live within a single snowman. Since only a few processes ever share data, database blocking is kept to a minimum.

Between the judicious use of asynchronous messaging and non-shared data, the Snowman Architecture performs at a much higher throughput that does the traditional IT architecture. This means lower costs per unit of work which means lower IT costs.

Shared data is the equivalent to a multi-headed snowman. This is anathema to the Snowman Architecture

Scalability 

Scalability refers to our ability to support larger and larger workloads. Say we have designed a system to support 100 concurrent customers and then our system become so popular that we must support 500 concurrent customers. Our ability to adapt to the higher customer load is dependent on our scalability.

In the past, scalability was seen as a hardware power problem. It was assumed that to allow a system to process larger and larger workloads, it had to run on more and more powerful hardware. When the current system could no longer support the workload, the hardware would be upgraded. This could involve faster processors, more memory, or larger disk drives. In the worst case, this involved replacing smaller cheaper machines by larger expensive machines.This is the model that served the power computer companies like IBM and Sun so well.

Today, scalability is seen as a hardware numbers problem rather than a hardware power problem. We now assume that to process a larger workload we don't replace cheap machines with expensive machines, instead we get more cheap machines. This is the model that powers the most scalable systems in the world today such as Google. Google runs its entire system on inexpensive throw-away hardware and has for more than a decade5.

Given this modern view of scalability, there are three factors that determine how scalable a system is.

The first scalability factor is the compactness of the system. Smaller, more compact systems are easier to scale. Larger, more disperse systems are harder to scale.

The second scalability factor is the usage of asynchronous messages. The judicious use of asynchronous messages goes a long way toward making a system scalable. Think of an asynchronous message system as like a mailbox. As mail comes in faster, one adds more receivers. As long as any of the receiver's can process the mail, scalability becomes limited only by the number of receivers you can support.

As mail comes in faster, one adds more receivers.
The third scalability factor is the size of the database on which the system depends. Because databases have such specialized hardware requirements, they are the most difficult part of a system to scale up.

To compare the scalability of the Snowman Architecture versus a traditional IT architecture, we must start by defining the unit of scalability. In the Snowman Architecture, the unit of scalability is an individual snowman. In a traditional IT architecture, it is the entire system.

In comparing the two architectures, we see the Snowman Architecture outperforming the traditional large IT architecture in all three scalability factors. First, it is much more compact. Second, it uses asynchronous messaging in all the right places, at the boundaries to the snowmen. Third, it minimizes the size of the data pool that must be scaled by enforcing the concept of strict vertical partitioning.

As a result, the Snowman Architecture is much more amenable to scaling using the modern efficient approach to scalability, scaling by numbers. The traditional IT architecture is largely consigned to the much more expensive and inefficient approach to scalability, scaling by power. Today, scaling by power seems as quaint as vinyl records.

Flexibility 

Flexibility refers to our ability to modify the system as our business needs evolve. Say we have build our payment system to take credit cards and we now want to take debit cards. How easy is it to update our system to take debit cards as well as credit cards?

Our ability to modify our system depends on how complex the system is. The more complex the system, the more difficult the modifications will be to implement  Traditional large IT systems are very complex. They are therefore very difficult to modify. Frequently changes in one part of the system causes unexpected problems in other parts of the system. 

The Snowman Architecture is composed of a series of autonomous, self-contained, relatively simple snowmen. Because of the synergy algorithms used by SIP to partition business functionality across snowmen, it is highly likely than any modifications necessary for a specific business change will all be located within a single snowman. Since any given snowman is simple (certainly relative to a traditional IT system) we can expect that the modifications will be much more straightforward than they would be with a traditional architecture.

Cloud Effectiveness

The cloud is an attractive platform because of its "pay for what you eat when you eat it" pricing model. But to leverage this platform, it is important to structure your systems so that you eat the least amount possible to accomplish your work.

Traditional large IT systems are poorly organized to leverage this model. Because of their sprawling nature, all or most of the system must be running on the cloud to accomplish even the most trivial of tasks. This means that you are paying for all or most of the system even when you are using only a small part of it. Even worse, when you need to add new instances to handle larger workload, you are adding sprawling new instances that quickly drive the cost out of sight.

The Snowman Architecture is a collection of smaller snowmen, each dedicated to a group of closely related ("synergistic") tasks. In most scenarios, a given workload will require only a single snowman. This means that you are paying only for the resources that that snowman requires. And when you add new instances, you add them in small, inexpensive, snowman sized amounts.

Figure 5 contrasts the traditional IT architecture and the Snowman Architecture running on the cloud.
Figure 5. The Cloud: Traditional IT Architecture versus
The Snowman Architecture.

Vendor Lock-in

A system exhibits vendor lock-in when it is dependent on a single vendor for some aspect of its life support. Usually this vendor is the one providing the software platform.

Vendor lock-in is either good or bad, depending on your perspective. If you are the client, vendor lock-in is bad. It puts you in a weak bargaining position with your vendor. If you are the software platform provider, vendor lock-in is good. It puts you in a strong bargaining position with your customer.

The standard customer approach to avoiding vendor lock-in is through the use of standards. If the customer builds a system on a standard API, then the customer can easily port the system to another software platform that supports that same API. Or at least, that is the logic.

How do vendors achieve lock-in in the face of a plethora of standards covering everything from data storage to virtual systems? Vendors achieve lock-in through the tried and true process called embrace and extend. Embrace and extend is a two part process. First, the vendors embrace a particular standard. Then the vendor extends the standard in vendor specific ways. These extensions are the bait that draws in the customer. The goal is to make the extensions so powerful that they are irresistible. Once the customer has taken the bait, they are trapped. Lock-in is complete.

I have seen many customers try to resist the bait with corporate edicts forbidding the use of any vendor extension. In the end, resistance is futile. You will be assimilated.

The larger and the more complex the system, the more difficult it is to locate and remove the vendor extensions. This mean it is more difficult to port the system to another vendor. If you can't take your code to another vendor, you are locked-in. And your future is now in the hands of a company whose main goal is wringing as much money as possible out of you in the next contract negotiation.

As I said, resisting vendor extensions is pointless. The best strategy for avoiding vendor lock-in is to make it as easy as possible to locate and rewrite those sections of a system that have used the vendor extensions. Your ability to locate and rewrite those sections is dependent on how small and simple the system is. We are dealing with the same issues I discussed in the section on Modifiability. Small, simple systems are easy to modify. Large, complex systems are not.

Thus small and simple is your best defense against vendor lock-in. And if you want small and simple, don't look to standards. Look to snowmen.

Summary

In part one of this blog, I introduced the Snowman Architecture. In part two, I discussed the non-technical advantages of this architecture. In this part, I have discussed the many technical advantages of this architectural approach.

If you are building a large IT system (say, over $10M) the Snowman Architecture offers a huge number of compelling advantages over traditional approaches. These advantages range from better security to improved reliability to lower cost to greater flexibility. In fact, there is not a single non-functional requirement that will not benefit from the Snowman Architecture.

If, at this point, you are preparing to build a large IT system and you aren't seriously considering the Snowman Architecture, then I don't know what else I can say. One of us is crazy.

One of us is crazy.
Stay tuned for part four of this blog, in which I will discuss the arguments against the Snowman Architecture and why they are all flawed.

- Roger Sessions
Houston, Texas

Did you find any errors (even spelling) in this blog? Let me know. I'd love to correct them.

Would you like to subscribe to notifications about my blogs, white papers, and webshorts? Sign up here.

References

(1) See, for example, the Web Short SIP Methodology for Project Optimization by Roger Sessions. Available here.

(2) See, for example, the Web Short The Relationship Between IT Project Size and Failure Rates by Roger Sessions. Available here.

(3) PEOPLE FACTORS IN AGILE SOFTWARE DEVELOPMENT AND PROJECT MANAGEMENT by Vikash Lalsing, Somveer Kishnah and Sameerchand Pudaruth in International Journal of Software Engineering & Applications (IJSEA), Vol.3, No.1, January 2012. Available here.

(4) The Mathematics of IT Optimization by Roger Sessions. (White Paper). Available here.

(5) WEB SEARCH FOR A PLANET: THE GOOGLE CLUSTER ARCHITECTURE by by Luiz André Barroso, Jeffrey Dean, and Urs Hölzle in IEEE Micro March/April 2003 Available here.

Acknowledgements

The snowman photos are all from Flickr under Creative Commons license. The photographers are, in order of appearance: 


Legal Notices

This blog is copyright (c) 2012 by Roger Sessions. It may be copied, reposted, and printed as long as it is not modified in any way. Other than that, unauthorized usage prohibited. Ask, though. I'll probably agree.

SIP is a trademark (t) of ObjectWatch, Inc. ObjectWatch is a registered trademark of ObjectWatch, Inc. All other trademarks are owned by their respective companies.

Thursday, October 18, 2012

Snowman Architecture Part Two: Economic Benefits


This is the second part of a four part blog about The Snowman Architecture. The first part was The Snowman Architecture: An Overview. In this blog, I will be discussing the economic benefits of the architecture. But don't read this until you have read the overview!

In the next installment (part three) I will discuss The Technical Benefits of the Snowman Architecture. The fourth part, by the way, will be The Criticisms, in which I will describe the many criticisms of the Snowman Architecture and why they are all wrong.

Originally I had planned to cover all of the benefits (economic and technical) in one blog. It turns out there are just too many benefits for one blog so I have had to separate them into those that are more economic in nature (this blog) and those that are more technical in nature (the next blog.)

Review

The Snowman Architecture breaks down a large IT system into small vertically partitioned subsystems called snowmen. These snowmen interact with each other through asynchronous messages. Snowmen are designed to be as autonomous as possible from each other using a design methodology known as Simple Iterative Partitions1 (SIP). Figure 1 shows an IT system designed using the Snowman Architecture.


Figure 1. Snowman Architecture

The Snowman Architecture is in contrast to a traditional architecture that uses a methodology such as TOGAF2 to create a horizontally partitioned system. Figure 2 shows an IT system designed using traditional methodologies.


Figure 2. Traditional Horizontally Partitioned Architecture

Points of Contrast

There are several contrasts that immediately jump out in comparing the Snowman Architecture to the traditional approach. 

The first contrast is in the orientation of the partitioning. The Snowman Architecture uses a strong vertical orientation to the partitioning. The traditional approach uses a weak horizontal orientation to the partitioning.

The second contrast is in the number of subsets in the partition. The Snowman Architecture supports an unlimited number of vertically oriented subsets (snowmen). The transitional approach has exactly three horizontally oriented subsets (business architecture, technical/SOA architecture, and data architecture.)

The third contrast is in the strength of the partitioning. The strength of the partitioning refers to the porosity of the boundaries separating subsets. The more "stuff" that passes between subsets, the greater the porosity. Porosity weakens the partitions, so the greater the porosity, the weaker the partition. The Snowman Architecture partitioning is strong, indicated by the minimal number of connections between subsets. The traditional horizontal architecture partitioning is weak, indicated by the large number of almost random connections between subsets. 

Economic Benefits of Snowman Architecture

Okay, now that you remember the basic overview, let's look at the economic advantages of The Snowman Architecture.

Benefit 1: Linear Versus Exponential Complexity Curve

As an IT system gets larger it gets more complex. This is because complexity is driven both by the amount of functionality in a system and the number of connections in a system3. Both the Snowman Architecture and the traditional architecture gets more complex as the system increases in size but how they increase in complexity is quite different. The complexity of the tranditional system increases exponentially. The complexity of the Snowman Architecture increases linearly

For small IT systems, the difference between an exponential increase and a linear increase of complexity is not important. But as the size of the IT system exceeds $5M in cost, the difference becomes very important. 

Figure 3 show the relationship between complexity and project size of a traditional versus a Snowman Architecture. 

Figure 3. Complexity of Traditional Architecture versus Snowman Architecture

As shown in Figure 3, the complexity of a traditional IT architecture increases exponentially. It starts low and then enters the Risk Zone (the zone in which project failure is likely) when the size hits someplace around $8M. From there it rapidly ascends into the Failure Zone (the zone in which project failure is certain)4.  

In contrast, the complexity of the Snowman Architecture starts low (as does the traditional architecture) and then increases with a shallow linear slope. There is little difference between a shallow linear line and an exponential slope at low numbers. In Figure 3, you can see that at project sizes under $1M, there is effectively no difference between the Snowman Architecture and the traditional approach.

However this changes quickly as the project size increases. Traditional architectues are already in the Danger Zone by the time they hit $8M and by the time they hit $10M they are in the Failure Zone. In contrast, the shallow linear complexity slope allows the size of the Snowman Architecture to remain comfortably  in the Success Zone until well past $100M in project size. In fact, it isn't even clear that there is a size limitation with the Snowman Architecture.

The bottom line: a traditional architecture becomes likely to fail at around $5M whereas a Snowman Architecture has a high probability of success even at $100M.

Benefit 2: Return on Investment (ROI)

To compare the ROI of the Snowman Architecture versus the traditional horizontally partitioned architecture, let's take some reasonable project numbers for, say, a $20M project. 

Using a traditional architectural methodology (e.g. TOGAF) we can reasonably assume the $20M project will go over budget by at least 200% and will cost an additional 400% in lost opportunity costs5

Using the Snowman Architecture we won't be doing a single $20M project, we will be doing some number of smaller project of at most a few $M each. Projects of this size are well within the Success Zone (as shown in Figure 3.) Projects in this zone typically have no overruns and no lost opportunity costs. 

The Snowman approach requires an additional phase in the project life cycle, a pre-planning phase. This is where most of the work is done to design and plan the snowmen. In the worst case, this phase could add 10% to the overall cost of the project.

Of course, these numbers are just best guesses based on what I have seen of industry data. Feel free to plug in actual numbers from your own projects.  But based on these numbers, we can calculate the Snowman ROI.

Without using the Snowman architecture, we expect a total cost of

   $20M (planned cost)
+ $20M (200% overrun)
+ $40M (lost opportunity costs
-----------
$80M (total cost)
With the Snowman architecture we expect a total cost of 

  $20M (planned cost)
+ $2M (10% overhead for Snowman preplanning)
---------
$22M (total cost)

The difference between the two approaches is

  $80M (Cost of traditional approach)
- $22M (Cost of Snowman approach)
---------
  $58M (Difference between approaches)

The ROI of using the Snowman approach is thus

  $58M (Difference in Costs) 
/   $2M (Added cost of Snowman Approach) 
X 100
--------
2900% (Calculated ROI)

The bottom line: the Snowman approach returns a 2900% ROI. A 2900% ROI is excellent by any measure.

Benefit 3: Non Tangible Benefits

There are many benefits to delivering a project on time other than eliminating the lost opportunity costs. It is hard to measure these benefits, but they certainly include the following:

  • Predictability of IT deliverables.
  • Increased trust between Business and IT.
  • Better ability to use IT as a strategic asset.
As you can see, there are compelling reasons favoring the Snowman Architecture over traditional approaches. The reduction in complexity is huge and the ROI would make even the most seasoned CFO salivate  But the most compelling reasons favoring the Snowman Architecture may not be economic, they may be technical. But for those benefits, you must wait for the next installment of this blog.

Footnotes

(1) SIP is a patented methodology for autonomy optimized partitioning. It is described in a number of places, including the web short SIP Methodology for Project Optimization.

(2) TOGAF® is a methodology owned by The Object Management Group. It is described on the TOGAF 9.1 On-Line Documentation.

(3) If you are interested in the mathematical relationship between size, connections, and complexity, see my white paper The Mathematics of IT Simplification.

(4) I have written about the relationship between traditional IT project size and failure rates in a number of places including the web short The Relationship Between IT Project Size and Failure Rates.

(5) Unfortunately, we do not have good data on what these number are world-wide. These particular numbers came from averaging a number of large projects discussed in the Victorian Ombudsman Investigation into ICT Enabled Projects (2011).

Acknowledgements

Snowman picture by CileSuns92