Wednesday, February 12, 2014

Wake Up Call for the Banking Industry

In a discussion thread on the LinkedIn group Simpler IT, Marc Lankhorst mentioned that De Nederlandsche Bank (the Dutch banking regulator) had recently come out with a new report on the stability of Dutch banks. The report is titled Naar een Dienstbaar en Stabiel Bankwezen, or Towards a Serviceable and Stable Banking System (as translated by Bing).

Appendix 7 of the report discusses the critical relationship between banking IT systems, Enterprise Architecture, and complexity management. The report is in Dutch, so I ran it through Google Translate. The translation was very rough, but even so it was obvious that the report is nothing short of a wake-up call to the banking industry.

One of the members of the Simpler IT group arranged for a group of humans to translate Appendix 7. This group did a great job and kindly gave me permission to reprint their translation here. The primary translator was Sasja McCann, with help from Andi McCann and Peter Mcelwaine-Johnn.

The original (in Dutch) is by the Dutch Ministry of Finance and is here.

If you would like to discuss this report, I recommend discussing this on the appropriate thread at Simpler IT.

Appendix 7: Resolution and Recovery of IT systems in Banking

Information Technology (IT) is critical for many businesses and non-profit organisations. Business processes are so dependent on automated information systems that, in many cases, the performance of those systems is largely responsible for the success of an organisation. Many business processes can no longer operate without information systems.

This is no different in the financial sector, and particularly the banking sector. Banks are sometimes referred to as glorified IT companies. IT plays a critical role in nearly all banking processes and has done so for a long time. As early as the 1960s, bulk processes such as the automated payment system were being automated on a large scale. Over the years, most banks have created a wide variety of information systems, most of them geared towards automating administrative processes, and in the meantime many other forms of information processing have been added.

Information systems in the Dutch banking sector, and the IT function that is responsible for those systems, have the following characteristics:

  • Information systems are an integral part of the business, more so than in other sectors;
  • The budgets for their development and maintenance, and for the IT function in general, are therefore correspondingly higher;
  • The proportion of older (legacy) systems is still relatively high, with correspondingly high maintenance costs;
  • The complexity of the systems is relatively high, partly because of their age and the dependency of banking processes on them: “everything is dependent on everything else”;
  • Both the diversity and the sheer number of systems are high;
  • IT functions are generally quite mature; substantial investments have been made in processes and personnel;
  • Personnel working in the IT function are well educated and highly experienced;
  • The management and maintenance of information systems is often outsourced to specialised IT companies (e.g. IBM, Accenture, Cognizant) and is often operated from India or Eastern Europe;
  • Information systems extend to the customers, both business and retail; much use is made of “electronic banking”;
  • The customer is, in effect, an extension of the bank’s information systems, and partly for this reason IT in the banking sector must comply with strict security regulations. This aspect is an important component of the trust customers place in a bank.

Information technology is, in general, characterised by rapid and far-reaching change. It is beyond the scope of this report to discuss social media, cloud computing and big data in detail, but banks will obviously continue to invest in these areas, not only to offer an attractive proposition to their customers and shareholders, but also to continue to comply with laws and regulations. At the same time, all these IT developments offer the chance to reduce the complexity of information systems and to enhance their effectiveness. However, these developments do all need to comply with the relevant IT Governance.

In the remainder of this text we discuss the resolution and recovery of IT systems in the context of M&A activity. We describe some principles (the preconditions which we discussed in the previous paragraph) with which information systems must comply in order to be capable of resolution and recovery (i.e. the splitting of IT systems). We start with a brief introduction to the discipline to which these principles belong, called Enterprise Architecture. Enterprise Architecture is a tool for the governance, including IT Governance, of an organisation.

Enterprise Architecture

An Enterprise Architecture (EA) is a coherent set of principles, models and patterns focused on the design, development and management of an organisation. An Enterprise Architecture is like a blueprint (or map) of an organisation and its information systems. Strictly speaking EA is not a specific IT tool – in practice, however, it is a key tool to assure IT Governance. It describes the business functions and processes, their relationships and their information needs, and it outlines the information systems that meet that need.

An Enterprise Architecture structures the IT landscape, makes it possible to describe the current and necessary information systems in an orderly and consistent manner, and to take decisions based on these descriptions. These decisions are aimed generally at the new development, modification or replacement of information systems.

The discipline that deals with EA has developed in recent years in response to the increasing complexity of existing information systems, the associated problems of large, unmanageable IT projects, and the dilemmas that many organisations face as a consequence of the fast pace of change in information technology.

Due to the structuring and complexity-reducing character of Enterprise Architecture, this instrument is the means to achieve resolution and recovery of information systems. 

Enterprise Architecture and Dutch banks

Because of the aforementioned characteristics of information systems in the banking sector, Enterprise Architecture is highly relevant to banks. This is the reason why most Dutch banks have invested in the development of Enterprise Architecture functions and departments.

In theory, the banks already have a tool that enables high-quality information systems. One aspect of high quality is that resolution and recovery of information systems can take place in a controlled manner. However, given the quality problems that banking systems face at the moment, the reality is often different: insufficient availability, high security risks and poor maintainability. This contributes to the high maintenance and adjustment costs of banking information systems, and also means that successful resolution and recovery is very difficult to achieve.

Why then, given all its promise, is Enterprise Architecture still underutilised? There are several reasons:

  1. Opportunism of the ‘business’: driven by circumstances, “quick and dirty” information systems are often developed that do not conform to the EA. These systems usually live longer than originally envisaged, and it is often these systems that cause the most problems.
  2. Backlog: we have already highlighted the legacy problems of banks. It takes a lot of time and effort to clean up and replace legacy systems.
  3. Unnecessary complexity: sometimes there is an atmosphere of mystique around Enterprise Architecture that makes it unnecessarily complicated, resulting in a lack of understanding among the people who need to understand it. Furthermore, the programmes that are implemented through Enterprise Architecture are often large and complex, which increases the risk of failure.
  4. Insufficient overview: partly because of the complexity and scale of the information systems, there is no clear overview from which to develop a clear ‘map’. The result is often a very complex diagram that no one understands any more.
  5. Mandate: the staff in the Enterprise Architecture discipline (“architects”) have insufficient mandate from the organisation to achieve effective “compliance” with the architecture. Sometimes architects are not sufficiently able to convey (the importance of) EA.
  6. Contracts and Service Level Agreements: vendors are sometimes unable or unwilling to comply with EA, e.g. when cost justifications are introduced. Until recently, there were no standards for suppliers or banks to adhere to.
  7. Each bank has in the past tried to re-invent the wheel at least once, under the assumption that banking processes differ greatly between banks. Obviously this is not the case. It has, however, led to costly programmes and projects that have resulted in a healthy apathy towards IT at senior levels within the banks.
The last two reasons led to the realisation that there is a need for an EA standard for banks. Such a standard has recently been developed by the Banking Industry Architecture Network (BIAN), founded by a number of major banks together with several large IT vendors[1]. In the Netherlands, ABN AMRO, ING and Rabobank are members of BIAN. Other members include several European, Asian and American banks, and its membership is expanding rapidly. The standard, the so-called BIAN model, describes all the services that a bank offers, including the IT support required for them. The advantage of such a standard is that banks do not have to reinvent the wheel themselves. This not only reduces costs but also increases the quality of the IT landscape, and facilitates M&A activity amongst banks. Figure 1 shows the complete model (at the highest level)[2].


Figure 1 BIAN Service Landscape 2.5 

Resolution and Recovery Principles

Regarding unbundling, an Enterprise Architecture should give priority to the following three principles. This means that all the information systems of a bank are structured and arranged such that they conform to these principles. Note that the principles can be further “unravelled” - in order to avoid complexity as much as possible, we describe them at an aggregate level in this report.

We have sought to minimise the number of principles. This does not mean that we discourage additions to, or refinement of, these three principles – in practice, banks often use more. A minimal set promotes clarity and eases the acceptance and implementation of the EA principles.

Principle 1: Compartmentalisation of information systems.

The background to this principle is that business functions/departments must be able to operate as independently of each other as possible, and that the information system of one function must not interfere with that of another. The bank defines its business functions in as much detail as possible, and defines the relationships between business functions as clearly as possible. The information systems of a business function should not support other business functions, but should communicate (via so-called “services”) and exchange data with the information systems of other business functions – they are compartmentalised. Compartmentalisation is achieved in practice by, among other things, the following (a brief illustrative sketch follows the list):

  • Virtualisation of information systems, which means that users share hardware and software in a controlled way. Special software (virtualisation software) ensures the compartmentalisation; authorisation and authentication play an important role in this.
  • Developing and analysing information systems from a “service-oriented” viewpoint. “Service-orientation” means that a system is developed with the end user and the purpose of the service in mind.
  • Developing information systems using components with well-defined functionality. Each component should provide a clearly defined service. Components should be standardised, documented and reusable.
  • Layering information systems, so that, for example, the presentation of data is separated from the processing of data.
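
To make the service-oriented boundary concrete, here is a minimal sketch in Python. It is purely illustrative and not part of the report; the business functions, service and data names are hypothetical. The point is that one business function never reaches into another function's data, only into its published service.

```python
# Illustrative sketch only (not from the report): two compartmentalised
# business functions that interact solely through a well-defined service,
# never by reaching into each other's internal data. All names are hypothetical.

from dataclasses import dataclass


@dataclass
class PaymentRequest:
    """The only data exchanged across the compartment boundary."""
    account_id: str
    amount_cents: int


class PaymentsService:
    """Service published by the 'Payments' business function."""

    def __init__(self) -> None:
        self._ledger: dict[str, int] = {}  # internal data, owned by Payments only

    def execute_payment(self, request: PaymentRequest) -> bool:
        """The published service call: the sole entry point for other functions."""
        balance = self._ledger.get(request.account_id, 0)
        self._ledger[request.account_id] = balance - request.amount_cents
        return True


class MortgagesFunction:
    """A different business function: it holds no payment data itself and
    talks to Payments only via the service interface."""

    def __init__(self, payments: PaymentsService) -> None:
        self._payments = payments

    def collect_monthly_installment(self, account_id: str, amount_cents: int) -> bool:
        return self._payments.execute_payment(
            PaymentRequest(account_id=account_id, amount_cents=amount_cents)
        )
```

Because Mortgages never touches the Payments ledger directly, the two compartments can in principle be split (resolved) or recovered independently; only the service contract between them has to be preserved.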

A beneficial side effect of compartmentalisation is the reduction of complexity, which in itself simplifies resolution and recovery. In addition, the number of links (interfaces) between systems is reduced, making maintenance easier. The success of compartmentalisation depends on a carefully thought-through and well-documented Enterprise Architecture.

Principle 2: Data has one owner who is responsible for the storage, description, sharing and destruction of the data.

This principle should ensure the quality of the data of a bank; for example, it should prevent inconsistencies and data unreliability caused by copying data and then editing the copied data. The data owner is responsible for the quality of the data. Data quality is crucial in any potential split of activity. In the case of a resolution of an information system as a result of M&A activity, two cases can arise:

  • The split entities are no longer part of the same holding. A predetermined copy of the data is made to be used by both entities. Each entity then applies principle 2 in their own entity. 
  • The split entities are part of the same holding. In that case, they can use the same data, and principle 2 still applies, i.e. one data owner.

Principle 3: An information system has one owner, who is responsible for both the quality of the information system and its components, as well as the quality of the services provided by the information system.

The application of Principle 3 ensures clarity of ownership of an information system. In any split this clarity is crucial. Even if there is no split, it is important that an information system has an owner, with a budget to develop the information system to a required level and to keep it there, to ensure business processes are supported optimally and that resolution and recovery is possible. Incidentally, this is also one of the guiding principles of Sarbanes-Oxley (SOX).

Preconditions

Earlier in this text we stated that many banks already use Enterprise Architecture, including resolution and recovery principles, and that specific roles, disciplines and processes have been defined. We argue that the Enterprise Architecture discipline needs to take a stronger position within the bank. This means that:

  • The staff in the discipline (architects) have excellent subject-matter and communication skills. They know the banking business, the information systems of the bank and the information technology relevant to the bank through and through, and can clearly convey that knowledge verbally and in writing. They are able to capture and define an Enterprise Architecture in understandable language and/or models, and can express the importance of Enterprise Architecture effectively.
  • The discipline reports to top management in the bank. Enterprise Architecture covers the entire bank and the information provision of the whole bank – it is therefore important that this broad scope is reflected in the weight of the discipline within the organisation. The discipline has a close relationship not only with IT in particular, but also with the operation of the bank in general, and with the risk management discipline. A close relationship with the COO and CRO, in addition to the relationship with the CIO, is therefore necessary.
  • The Enterprise Architecture discipline has the mandate to test current and future information systems against the Enterprise Architecture. The discipline also has the mandate to escalate to the highest level in the case of non-compliance, with the obligation to indicate what action should be taken to eliminate the non-compliance. This mandate also extends to suppliers and vendors – it should be contractually specified that suppliers and vendors are to conform to the Enterprise Architecture.
  • It is advisable to vest accountability for the Enterprise Architecture discipline in one person: the Chief Enterprise Architect.

Measures

Following the above, we propose the following steps to ensure successful resolution and recovery of banking information systems:

  1. An important means of ensuring resolution and recovery is to establish an Enterprise Architecture discipline. Establish a number of clear principles, with the three principles described in this document as a minimum. Become a member of an industry body or adhere to a standard in this area – BIAN is the obvious choice.
  2. Strengthen the Enterprise Architecture discipline in the bank by appointing a Chief Enterprise Architect with knowledge of the banking business and an overview of the IT landscape of the bank. 
  3. Let the Chief Enterprise Architect report to top management. 
  4. Make resolution and recovery the Chief Enterprise Architect’s responsibility, even if only with regard to the IT landscape.
  5. Give the Chief Enterprise Architect the mandate and the tools to assess changes and new developments in the IT landscape, to comment on them and, if necessary, to stop them.
  6. Give the Chief Enterprise Architect the mandate and the tools, including a number of enterprise architects with excellent communication skills and experience in the banking industry, to initiate activities that enable successful resolution and recovery of the IT landscape.
  7. Increase the EA knowledge and skills of supervisors/senior management. This applies to risk management, the Supervisory Board and DNB (the Dutch central bank). It has been observed that the latter has little to no ability to test an Enterprise Architecture. In addition, there is currently no reference model against which to benchmark such testing. The aforementioned BIAN model can fulfil this role.

Note that these measures not only support the ability to successfully resolve and recover, but also increase the quality and maintainability of information systems in general.

[1] For more information, see www.bian.org

[2] Updated to show v2.5 of the BIAN model. Original report showed v2.0.

Wednesday, November 13, 2013

The Math of Agile Development and Snowmen


I was recently asked about the relationship between Agile Development and Snowmen. If you aren't familiar with The Snowman Architecture, see this earlier blog.

I'm not sure The Snowman Practice has much to offer Agile on small projects (say, under $1M). These projects always seem to do reasonably well on their own. However, once a project goes much above $1M, the Agile approach can no longer keep up with the increasing project complexity.

This is where The Snowman Practice has much to offer Agile. The Snowman Practice offers a methodology for breaking a large project into smaller, highly targeted, autonomous projects that have minimal dependencies on each other and are closely aligned with the architectural organization of the business.

Smaller highly targeted autonomous projects 

There is actually a mathematical explanation as to why Agile over $1M needs The Snowman Practice. As projects increase in functionality, their complexity increases exponentially. Agile, however, is a linear solution: the amount of work an Agile team can produce is at best linearly related to the size of the team.

At small sizes, a linear solution (Agile) can contain an exponential problem (complexity). But at some point the exponential problem outgrows the linear solution's ability to provide containment. For Agile projects, this seems to happen somewhere around $1M.
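
To see roughly how such a crossover can arise, here is a small Python sketch. It is purely illustrative and not taken from my white papers: complexity is assumed to grow super-linearly with project size following Robert Glass's rule of thumb (roughly, a 25% increase in functionality doubles the complexity), team capacity is assumed to grow linearly with spend, and the constants are chosen only to make the idea concrete.

```python
# Purely illustrative: find the spend at which super-linearly growing
# complexity overtakes a linearly growing capacity to contain it.
# The exponent follows Glass's rule of thumb (25% more functionality ~ 2x the
# complexity); every other constant here is an assumption, not a measurement.

import math

GLASS_EXPONENT = math.log(2) / math.log(1.25)  # ~3.11


def complexity(spend_musd: float) -> float:
    """Assumed complexity, growing super-linearly with project size ($M)."""
    return spend_musd ** GLASS_EXPONENT


def team_capacity(spend_musd: float, capacity_per_musd: float = 1.0) -> float:
    """Assumed Agile team capacity, growing linearly with spend ($M)."""
    return capacity_per_musd * spend_musd


def crossover_point(capacity_per_musd: float = 1.0, step: float = 0.01) -> float:
    """Smallest spend (in $M) at which complexity exceeds capacity."""
    spend = step
    while complexity(spend) <= team_capacity(spend, capacity_per_musd):
        spend += step
    return spend


if __name__ == "__main__":
    # With these illustrative constants the crossover lands at about $1M;
    # the real-world point depends entirely on the team and the problem.
    print(f"crossover at about ${crossover_point():.2f}M")
```

Below the crossover, the complexity curve sits under the capacity line; above it, no linear increase in team size can catch up.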

At some point the exponential problem outgrows
the linear solution's ability to provide containment
The Snowman Practice solves this problem by keeping the size of each autonomous sub-project under the magic $1M crossover point. And that is a project size that is well suited to the Agile development approach, or to any other linearly constrained approach.

Acknowledgements

The Snowman photos are, in order of appearance, by Pat McGrath and smallape, and are made available via Flickr under Creative Commons licenses.

Wednesday, August 21, 2013

Two Roads Converge?

by Roger Sessions and Richard Hubert


Introduction

We (Richard Hubert and Roger Sessions) have a lot in common. We are both Fellows of the International Association of Software Architects (IASA). We have both written books and articles. And we are both well-known proponents of a particular approach to Enterprise and IT Architectures. For Richard, this approach is called Convergent Architecture. For Roger, this approach is called The Snowman Practice. But it may turn out that we have more in common than we thought. Our approaches may complement each other in some interesting ways. But first, let’s take a look at what each of us has been doing.

Introduction to Richard’s work

Since the mid-1990s I (Richard) have been developing and optimizing an architectural style that addresses the complexity of both IT systems and business processes. I call this holistic perspective Convergent Architecture (CA). I wrote about this in 2001 in my book Convergent Architecture (John Wiley & Sons, New York, ISBN 0471105600). CA includes properties that I consider to be inherent in any architectural style. The metamodel that I use includes the project design, the system design, and the business design. At a high level, this model is shown in the following diagram:


Figure 1. Coverage of a holistic architectural style

As you can see in the above diagram, the partitioning between Organization, Process, and Resource plays a significant role in the quality of the design. Experience and rules-of-thumb are adequate to handle many designs, but as systems get larger, a more formal approach is preferable, especially if it can be assisted by tools. This is where Roger’s work is a perfect fit.

Introduction to Roger’s work

I (Roger) have been looking at how to validate an architecture. To do this, I have developed a mathematical model for what an ideal architecture looks like and a methodology for delivering an architecture that is as close to that ideal as possible. The starting point for this is to define what we mean by “ideal.” My definition of an ideal architecture is the least complex architecture that solves the business problem. This means that we also need a metric for measuring complexity, which, fortunately, comes out of the mathematical model. You can read about this mathematical model in this white paper.

It turns out that when you discover the ideal architecture for a given problem, it almost always has a characteristic shape: a collection of business functionality sitting on top of a collection of services sitting on top of a collection of data. In addition, these three tiers are separated from other such collections by strong vertical partitions. There is a strong connection between the business functions in the top tier, the services in the middle tier, and the data in the bottom tier. Where connections are required between partitions, they occur through asynchronous messages at the service level. This architecture is shown in the following diagram:


Figure 2. The Snowman Architecture Created By SIP

As you can see in the above diagram, the idealized architecture looks a lot like a snowman. The head, torso, and bottom of the snowman contain business functions, services, and data, respectively.
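
For readers who think in code, here is a small Python sketch of the shape Roger describes. It is my own illustration, not Roger's model or tooling: each snowman carries business functions, services and data, and a simple check flags any cross-partition link that does not go from service to service. All of the names are hypothetical.

```python
# Illustrative sketch (not the actual SIP model or tooling): each "snowman"
# partition has business functions on top, services in the middle and data at
# the bottom; partitions talk to each other only via asynchronous messages
# exchanged between services. All names are hypothetical.

from dataclasses import dataclass, field


@dataclass
class Snowman:
    name: str
    business_functions: set[str] = field(default_factory=set)  # head
    services: set[str] = field(default_factory=set)            # torso
    data_stores: set[str] = field(default_factory=set)         # bottom


@dataclass
class Message:
    """An asynchronous message between services in different partitions."""
    from_service: str
    to_service: str


def partition_violations(snowmen: list[Snowman], messages: list[Message]) -> list[str]:
    """Report messages whose endpoints are not services; in this idealised
    architecture, services are the only allowed cross-partition channel."""
    all_services = {s for sm in snowmen for s in sm.services}
    return [
        f"{m.from_service} -> {m.to_service}"
        for m in messages
        if m.from_service not in all_services or m.to_service not in all_services
    ]


# Example: a 'Claims' snowman and a 'Billing' snowman, linked only via services.
claims = Snowman("Claims", {"Register claim"}, {"ClaimService"}, {"claims_db"})
billing = Snowman("Billing", {"Send invoice"}, {"BillingService"}, {"billing_db"})

print(partition_violations([claims, billing],
                           [Message("ClaimService", "BillingService")]))  # []
print(partition_violations([claims, billing],
                           [Message("Register claim", "billing_db")]))
# ['Register claim -> billing_db']
```

The check is deliberately naive, but it captures the rule of the picture: strong vertical coupling inside a snowman, and only service-level, asynchronous links between snowmen.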

The methodology I (Roger) have developed to drive this architecture is called SIP, for Snowman Identification Process. Some of you may know it under its old name, Simple Iterative Partitions. You can get a good overview of The Snowman Practice from this video.

Synergy discovery

When we compared the architecture that is driven by CA’s architectural metamodel (Figure 1) to the architecture that is driven by the highly theoretical SIP (Figure 2), it was clear to us that significant commonalities were at hand.

Both approaches are based on the fundamental Enterprise Architecture principle of IT-Business alignment. Both approaches define best practices concerning how this alignment can be effectively achieved and measured. Additionally, both approaches are based on rules and patterns that apply to the simplification of any system, whether large or small, organizational or purely technical. The Convergent Architecture, for instance, has been used to design IT-organizations which then use the same approach to design and simplify IT systems (this is conceptual isomorphism).

Lastly, and most important of all, we recognized that a SIP approach can be applied to mathematically support and objectively drive both architectural styles. SIP thus enhances the design approach and process, providing both a tool and the substantive mathematical underpinning needed to ascertain the simplest (least complex) of all possible system and organizational structures.

In essence, we now have CA showing that the SIP theory really does deliver a partition that stands up to the most demanding examination. And at the same time we have the SIP mathematics defining vertical boundaries for a CA architecture that are not only mathematically sound, but also as simple as possible.

The Future

Where will this take us? To be honest, we are still discussing this. But the possibilities are intriguing. Imagine two mature methodologies with such strong synergy that the theoretical and the model-driven approaches arrive at such complementary solutions. Stay tuned for more information.

Wednesday, August 14, 2013

Addressing Data Center Complexity


If you have been following my work, you know how I feel about complexity. Complexity is the enemy. I have written a lot about how complexity causes software cost overruns, late deliveries, and poor business alignment.

In this blog, I decided to look at complexity from another perspective: the data center. This is the perspective of those who must support all of those complex systems the software group has managed to deliver.

The problem with complexity is that it magnifies as you move down the food chain. This is bad news for those at the bottom of the food chain, the data center.

Straightforward business processes become complex software systems. Complex software systems require very complex data stores. Very complex data stores run on extremely complex data centers. These extremely complex data centers are expensive to manage, run, and secure.

The numerous problems that complexity creates for data centers were highlighted in a recent survey by Symantec called State of the Data Center: Global Results [1]. The results of this survey should cause any CIO to break out in a cold sweat.

According to this survey, complexity is a huge problem for data centers. For example, the typical company surveyed had 16 data center outages per year at an average cost of $319K per outage, or over $5M per year. And this does not include indirect costs, such as loss of customer confidence. The number and magnitude of these outages were directly attributed to data center complexity by those surveyed. Complex data centers fail often and they fail hard.

But outages aren’t even the biggest complexity related headache for data centers. The most cited complexity related problem is the high cost of keeping the data center running on those increasingly rare days when there is no outage. Other problems attributed to data center complexity were security breaches, compliance incidents, missed service level agreements, lost data, and litigation exposure. Clearly complexity is a big problem for data centers.

How are data centers addressing this escalating complexity? According to this survey, the approach 90% of companies are taking is information governance. What is information governance? According to Debra Logan, a Gartner Research VP,

Information governance is the specification of decision rights and an accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archival and deletion of information. It includes the processes, roles, standards and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its goals [2]. 


Two points should be clear from the above definition. First, information governance is a vague concept.  Second, whatever information governance is, it has nothing to do with the problem that is vexing data centers, namely complexity. This is unfortunate, given that so many of the surveyed companies say they are pinning their hopes on information governance to solve their complexity related problems. These companies are headed for a major disappointment.

If information governance won’t solve complexity related data center problems, what will? The problem, as I stated earlier, is the magnification of complexity as it rolls down the food chain from business process to data center. This problem can only be solved with complexity governance. Complexity is the problem, not information.

How do I define complexity governance?

Complexity governance is a set of policies, guidelines, and procedures that ensure every business process is implemented with the simplest possible IT solution supported with the simplest possible data organization running on the simplest possible hardware configuration. 


This sounds good but what would it take to implement this?

Gartner’s Managing VP and Chief of Research, David Cappuccio, is on the right track when he says it is particularly important for more data center staff to understand the “cascade effect” of making changes in a complex environment [3]. Unfortunately, few IT staff are trained in IT complexity, a prerequisite to understanding the cascade effect to which Cappuccio alludes. And it stands to reason that if one does not understand how complexity cascades, one is woefully unprepared to do anything about it.

Here is my recommended plan for putting in place effective complexity governance.

  1. Train everybody to understand the importance of controlling complexity. Every person on the IT food chain should be able to recite these words in their sleep: Complexity is the Enemy.
  2. Train a select group that includes representatives from the business, IT, and data center in IT Complexity Analytics, the science of complexity as it relates to IT systems.
  3. Give this group a mandate to put in place strong complexity governance.
  4. Give this group the authority to enforce complexity governance.
  5. Hold this group responsible for delivering simpler IT systems that run on simpler data centers.
  6. Document the benefits complexity governance delivers.


I don’t claim that complexity governance is simple. The reality is that complexity governance requires a significant and sustained effort. But it is an effort that delivers substantial business value. If you don’t believe me, ask somebody who is in the middle of their tenth three hundred thousand dollar data center outage this year. They will tell you: Complexity is the enemy.

Sign Up

I'll be glad to let you know when I have new blogs, white papers, or videos available. Sign up here.

References

[1] https://hp.symantec.com/system/files/b-state-of-data-center-survey-global-results-09_2012.en-us.pdf

[2] http://blogs.gartner.com/debra_logan/2010/01/11/what-is-information-governance-and-why-is-it-so-hard/

[3] http://www.datacenterknowledge.com/archives/2012/12/04/gartner-it-complexity-staffing/

Acknowledgements

The photo is by Route79 on Flickr, licensed through Creative Commons. (http://www.flickr.com/photos/route79/)

Notices

This blog is copyright by Roger Sessions. This blog may be copied and reproduced as long as no changes are made and his authorship is acknowledged. All other rights are reserved.

Tuesday, July 30, 2013

Gartner: Complexity Reduction One of Top Ten Technical Trends



Gartner recently came out with their Top Ten Strategic Technical Trends Through 2015. Gartner states that one of the Top Ten Trends is IT Complexity. IT Complexity, Gartner says, is a major inhibitor “of an enterprise to get the most out of IT money spent.”

I have been writing for the last ten years about the need to reduce complexity and the bottom-line benefits of doing so. Gartner quotes one of my early papers, The Mathematics of IT Simplification. In this paper I introduced a modern, rational understanding of IT Complexity.

Until this white paper appeared, IT Complexity was largely ignored by Enterprise Architects. This is despite the fact that IT Complexity was responsible for numerous IT disasters and was costing the world economy trillions of dollars per year. IT Complexity was still treated in a random, ad hoc manner if it was treated at all.

This White Paper laid a new foundation for the science of IT Complexity. This paper showed how to model IT complexity, how to measure it, and how to manage it using verifiable, reproducible methodologies that are based on solid mathematics.

If we want to take IT Complexity as seriously as Gartner says we should, we need to start by understanding it. This paper is where it all began. Gartner read it. You should too.

You can find the White Paper, The Mathematics of IT Simplification [here].
You can view the Gartner presentation, The Top 10 Strategic Technology Trends for 2013 [here].
You can view a 20-minute overview of The Snowman Architecture, my answer to IT Complexity, [here].

Photo by Alan in Belfast, via Flickr and Creative Commons.

Friday, July 5, 2013

The 3000 Year Old IT Problem


It was first described 3000 years ago by Sun Tzu in his timeless book, The Art of War.

We can form a single united body, while the enemy must split up into fractions. Hence there will be a whole pitted against separate parts of a whole, which means that we shall be many to the enemy's few.1

This is the first description of a problem solving technique that would be named 600 years later by Julius Caesar as divide et impera. Or, as we know it, divide and conquer.

The Art of War is frequently named as one of the top ten must-reads in business2. I have long been fascinated by The Art of War and especially the divide and conquer strategy. Back in the days when I was a researcher at the National Cancer Institute, divide and conquer was a common research strategy for testing drugs. Divide and conquer is used extensively in business, economics, law, politics, and the social sciences, to name a few fields.

Oddly enough, the only field I can think of in which divide and conquer has not been used successfully is IT. Whereas virtually every other field has been able to solve large complex problems using a divide and conquer strategy, IT is alone in its failure. Try to divide and conquer a large IT system, and you get an interconnected web of small systems with so many interdependencies that it is practically impossible to coordinate their implementation.

This is the reason so many large IT systems are swamped by IT complexity3. The one problem-solving strategy that universally works against complexity seems to be inexplicably ineffective when it comes to IT.

How can this be? What is so special about IT that makes it unable to apply this strategy that has found such widespread applicability?

There are only two possible answers. The first is that IT is completely different from every other field of human endeavor. The second is that IT doesn't understand how divide and conquer works. I think the second explanation is more likely.

To be fair, it is not all IT's fault. Some of the blame must be shared by Julius Caesar. He is the one, remember, who came up with the misleading name divide and conquer (or, as he said it, divide et impera). Unfortunately, IT has taken this name all too literally.

What is the problem with the name divide and conquer? The name seems to imply that we conquer big problems by breaking them down into smaller problems. In fact, this is not really what we do. What we actually do is learn to recognize the natural divisions that exist within the larger problem. So we aren't really dividing as much as we are observing.

Let's take a simple example of divide and conquer: delivering mail.

The U.S. Postal Service delivers around 600 million letters each day. Any address can send a letter to any other address. The number of possible permutations of paths for any given letter is astronomical. So how has the Postal Service simplified this problem?

They have observed the natural boundaries and population densities in different areas of the country and divided the entire country into about 43,000 geographic chunks of roughly equal postal load. Then they have assigned a unique zip code to each chunk.

In the following picture, you can see a zip code map for part of New York City.
Partial Zip Code Map for New York City

The areas that are assigned to zip codes are not equal in size. They vary depending on the population density. The area that includes Rockefeller Center (10020) is very dense, so the area is small. The area where I grew up (10011) has medium density, and the area assigned to the zip code is consequently average-sized.

If we blow up my home zip code we can see some other features of the system. Here is 10011 enlarged:
Zip Code 10011
You can see that the zip code boundary makes some interesting zigs and zags. For example, the leftmost boundary winds around the piers of the Hudson River. Towards the bottom, the boundary takes a sharp turn to the south about halfway along the southern edge. This is to follow Greenwich Avenue. On the right side, the boundary follows Fifth Avenue for quite a while until we hit the relatively chaotic Flatiron district at 20th Street.

So zip codes are not randomly assigned. They take into account population densities, street layouts, and natural boundaries. The main point here is that a lot of observation takes place before the first zip code is assigned.

Suppose we created the zip code map by simply overlaying a regular grid on top of New York City. We would end up with zip codes in the middle of the Hudson River, zip codes darting back and forth across Fifth Avenue, and some zip codes with huge population densities while others would consist of only a few bored pigeons.

You can see why the so-called divide and conquer algorithm is probably better named observe and conquer.

The general rule of thumb with observe and conquer is that your ability to solve a large complex problem is highly dependent on your ability to observe the natural boundaries that define the sub problems.

Let's consider one other example, this time from warfare.

In about 50 BCE, Julius Caesar and 60,000 troops completed the defeat of Gaul, a region whose tribes could field at least 300,000 troops. How did Caesar conquer 300,000 troops with one-fifth that number? He did it through divide and conquer. Although the Gallic strength was 300,000 in total, this number was divided among a number of tribes that had a history of belligerence among themselves. Caesar was able to form alliances with some tribes, pit other tribes against each other, and pick off the remaining tribes one by one. But he was only able to do this through carefully observing and understanding the "natural" political tribal boundaries. Had Caesar laid an arbitrary grid over Gaul and attempted to conquer one square at a time, he could never have been successful.

This brings us to the reason that divide and conquer has been so dismally unsuccessful in IT. Because IT hasn't understood that observation must occur before division.

If you don't believe me, try this simple experiment. The next time an IT person, say John, suggests breaking up a large IT project into smaller pieces, ask John this question: what observations do we need to make to determine the natural boundaries of the smaller pieces? I can pretty much guarantee how John will respond: with a blank look. Not only will John not be able to answer the question, the chances are he will have no idea what you are talking about.

With such a result, is it any wonder that divide and conquer almost never works in IT? It is as if you assigned zip codes by throwing blobs of paint at a map of New York City and assigning all of the red blobs to one zip code and all of the green blobs to another.

For the first time in the history of IT, we now have a solution to this problem: a scientific, reproducible, and verifiable approach to gathering the observations necessary to make divide and conquer work. I call this approach synergistic partitioning. It is the basis for the IT architectural approach called The Snowman Practice. And if you are ever going to try to build a multi-million-dollar IT system, Snowmen is where you need to start. I can assure you, it will work a lot better than throwing blobs of paint at the wall.
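
As a toy illustration of "observe before you divide", here is a short Python sketch. It is not the actual synergistic-partitioning algorithm, and the business functions and synergy judgments in it are hypothetical; the point is simply that the partition falls out of observations gathered beforehand, rather than being imposed as an arbitrary grid.

```python
# Toy illustration of "observe before you divide" (not the actual SIP /
# synergistic-partitioning algorithm): business functions judged synergistic
# are grouped into one partition by taking connected components of the
# observed relation. Function names and synergy judgments are hypothetical.

from collections import defaultdict


def partition_by_synergy(functions: list[str],
                         synergies: list[tuple[str, str]]) -> list[set[str]]:
    """Two functions share a partition exactly when a chain of observed
    synergies connects them."""
    neighbours = defaultdict(set)
    for a, b in synergies:
        neighbours[a].add(b)
        neighbours[b].add(a)

    partitions, seen = [], set()
    for f in functions:
        if f in seen:
            continue
        group, stack = set(), [f]  # depth-first walk of one component
        while stack:
            g = stack.pop()
            if g in group:
                continue
            group.add(g)
            stack.extend(neighbours[g] - group)
        seen |= group
        partitions.append(group)
    return partitions


# Hypothetical observations gathered from the business, not imposed by IT.
functions = ["Open account", "Close account", "Issue card",
             "Approve loan", "Collect repayment"]
synergies = [("Open account", "Close account"),
             ("Open account", "Issue card"),
             ("Approve loan", "Collect repayment")]

print(partition_by_synergy(functions, synergies))
# e.g. [{'Open account', 'Close account', 'Issue card'},
#       {'Approve loan', 'Collect repayment'}]
```

The quality of the result depends entirely on the quality of the observations, which is exactly the point of the zip code and Gaul examples above.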

You can start reading about The Snowman Practice [here]. Or contact me [here]. What do Snowmen have to do with defining divide and conquer boundaries? Ask me. I'll be glad to tell you.

Subscription Information

Your one stop signup for information any new white papers, blogs, webcasts, or speaking engagements by Roger Sessions and The Snowman Methodology is [here].

References

(1) 6:14, translation from http://suntzusaid.com
(2) See, for example, Inc.'s list [here].
(3) I have written about the relationship between complexity and failure rates in a number of places, for example, my Web Short The Relationship Between IT Project Size and Failure available [here].

Legal Notices and Acknowledgements

Photograph from brainrotting on Flickr via Creative Commons.

This blog and all of these blogs are Copyright 2013 by Roger Sessions. This blog may be copied and reproduced as long as it is not altered in any way and full attribution is given.