The Elusive High Availability in the Digital Age

Well, the summer is over, even if we have had great weather into September. My apologies for the delay in a new post, and I know I have several topic requests to fulfill 🙂 Given our own journey at Danske Bank on availability, I thought it was best to re-touch this topic and then come back around to other requests in my next posts. Enjoy and look forward to your comments!

It has been a tough few months for some US airlines and their IT systems availability. Hopefully, you were not caught up in the major delays and frustrations. Both Southwest and Delta suffered major outages in August and September. Add in the power outages that recently affected equipment and multiple airlines in Newark, and you have many customers fuming over delays and cancelled flights. The cost to the airlines was huge: Delta's outage alone is estimated at $100M to $150M, and that does not include the reputational impact. Nor are such outages limited to US airlines; British Airways also suffered a major outage in September. And Delta and Southwest are not unique in their problems: both United and American suffered major failures and widespread impacts in 2015. Even with large IT budgets, and hundreds of millions invested in upgrades over the past few years, airlines are struggling to maintain service in the digital age. The reasons are straightforward:

  • At their core, services are based on antiquated systems that have been partially refitted and upgraded over decades (the core reservation system is from the 1960s)
  • Airlines struggled earlier this decade to make a profit due to oil prices, and invested minimally in their IT systems to attack the technical debt. This was further complicated by multiple integrations that had to be executed due to mergers.
  • As airlines have digitalized their customer interfaces and check-in procedures, the previous manual procedures have become backup steps that are infrequently exercised and woefully undermanned when IT systems do fail, resulting in massive service outages.

With digitalization reaching even further into customer interfaces and operations, airlines, like many other industries, must invest in stabilizing their systems, address their technical debt, and get serious about availability. Some should start with the best practices in the previous post on Improving Availability, Where to Start. Others, like many IT shops, have decent availability but still have much to do to get to first quartile availability. If you have made good progress but realize that three 9's, or preferably four 9's, of availability on your key channels is critical for you to win in the digital age, this post covers what you should do.
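To make those targets concrete: each additional 9 cuts the allowed downtime by a factor of ten. A quick back-of-the-envelope sketch in Python (the function name and loop are just for illustration):

```python
# Allowed downtime per year for a given number of "nines" of availability.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # about 525,960 minutes in an average year

def downtime_minutes_per_year(nines: int) -> float:
    """Minutes of downtime permitted per year at, e.g., 3 nines (99.9%)."""
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability

for n in (2, 3, 4, 5):
    print(f"{n} nines -> {downtime_minutes_per_year(n):8.2f} minutes/year")
```

Three 9's allows roughly 8.8 hours of downtime per year on a channel; four 9's allows under an hour. That gap is the difference between an annoyed customer and a news story.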

Let’s start with the foundation. If you can deliver consistently good availability, then your team should already understand:

  • Availability is about quality. Poor availability is a quality issue. You must have a quality culture that emphasizes quality as a desired outcome and doing things right if you wish to achieve high availability.
  • Most defects — which then cause outages — are injected by change. Thus, strong change management processes that identify and eliminate defects are critical to further reduce outages.
  • Monitor and manage to minimize impact. A capable command center with proper monitoring feeds and strong incident management practices may not prevent the defect from occurring but it can greatly reduce the time to restore and the overall customer impact. This directly translates into higher availability.
  • You must learn and improve from the issues. Your incident management process must be coupled with a disciplined root cause analysis that ensures teams identify and correct underlying causes that will avoid future issues. This continuous learning and improvement is key to reaching high performance.

With this base understanding, and presumably with only smoldering problem areas left in your IT shop, there are excellent extensions that will enable your team to move to first quartile availability with moderate but persistent effort. For many enterprises, this is now a highly desirable business goal. Reliable systems translate to reliable customer interfaces, as customers now access the heart of most companies' systems through internet and mobile applications, typically on a 7×24 basis. Your production performance becomes very evident, very fast, to your customers. And if you are down, they cannot transact, you cannot service them, your company loses real revenue, and, more importantly, damages its reputation, often badly. It is far better to address these problems and gain a key edge in the market by consistently meeting or exceeding customer availability expectations.

First, even if you have moved up from regularly fighting fires, the fact that outages are no longer an everyday event does not mean IT leadership can stop emphasizing quality. Delivering high quality must be core to your culture and your engineering values. As IT leaders, you must continue to reiterate the importance of quality and demonstrate your commitment to these values through your actions. When there is enormous time pressure to deliver a release but it is not ready, you delay it until the quality is appropriate. Or you release a lower-quality pilot version, with properly set customer and business expectations, followed in a timely manner by a quality release. You ensure adequate investment in foundational quality by funding system upgrades and lifecycle efforts so technical debt does not increase. You reward teams for high-quality engineering, not for fire-fighting. You advocate inspections, or agile methods, that enable defects to be removed earlier in the lifecycle at lower cost. You invest in automated testing and verification that assures higher quality at much lower cost. You address redundancy and ensure resiliency in core infrastructure and systems. Single-power-cord servers still in your data center? Really?? Take care of these long-neglected issues. And if you are not sure, go look for these typical failure points (another being SPOF network connections). We used to call these 'easter eggs': like the easter egg that no one found in a preceding year's hunt, you discover the old, and quite rotten, egg on your watch. It's no fun, but it is far better to find them before they cause an outage.
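One practical way to hunt for those easter eggs is a periodic sweep of your asset inventory for classic single points of failure. A minimal Python sketch; the records and field names are hypothetical stand-ins for an export from your asset-management system:

```python
# Flag hosts with a single power feed or a single network uplink (SPOFs).
inventory = [
    {"host": "app01", "power_feeds": 2, "uplinks": 2},
    {"host": "db02",  "power_feeds": 1, "uplinks": 2},  # single power cord
    {"host": "web03", "power_feeds": 2, "uplinks": 1},  # SPOF network link
]

spofs = [asset["host"] for asset in inventory
         if asset["power_feeds"] < 2 or asset["uplinks"] < 2]
print("Single points of failure to remediate:", spofs)
```

Even a crude report like this, run regularly, surfaces the rotten eggs before they surface themselves.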

Remember that quality is not achieved by never making mistakes (a zero-defect goal is not the target); instead, quality is achieved by a continuous improvement approach where defects are analyzed and their causes eliminated, and where your team learns and applies best practices. Your target should be first-quartile quality for your industry; that will provide competitive advantage. When you update the goals, also revisit the rewards of your organization and ensure they align with these quality goals.

Second, you should build on your robust change management process. To get to median capability, you should have already established clear change review teams and proper change windows, and moved to deliveries through releases. Now, use the data to identify which groups are late in their preparation for changes, or where change defects cluster and why. These insights can improve and streamline the change processes (yes, some late changes could be due to too many required approvals, for example). Further clusters of issues may be due to specific steps being poorly performed or inadequate tools. For example, verification is often done as a cursory task and thus seldom catches critical change defects. The result is that the defect is only discovered in production, hours later, when your entire customer base is trying but unable to use the system. Such an outage was likely entirely avoidable with adequate verification, because you would have known at the time of the change that it had failed and could have taken action then to back out the change. The failed-change data is your gold mine of information to understand which groups need to improve and where. Importantly, be transparent with the data: publish the results by team and by root cause cluster. Transparency improves accountability. As an IT leader, you must then make the necessary investments and align efforts to correct the identified deficiencies and avoid future outages.
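Mining that gold mine can start very simply. The sketch below clusters failed changes by team and by cause so the results can be published transparently; the records are hypothetical stand-ins for an export from your change-management tool:

```python
from collections import Counter

# Hypothetical failed-change records exported from the change tool.
failed_changes = [
    {"team": "payments", "cause": "inadequate verification"},
    {"team": "payments", "cause": "inadequate verification"},
    {"team": "online",   "cause": "late preparation"},
    {"team": "payments", "cause": "environment mismatch"},
]

by_team = Counter(change["team"] for change in failed_changes)
by_cause = Counter(change["cause"] for change in failed_changes)

# Publish the clusters so every team sees where defects concentrate.
print("Failed changes by team: ", by_team.most_common())
print("Failed changes by cause:", by_cause.most_common())
```

Even this trivial grouping makes the conversation concrete: which team, which step, how often.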

Further, you can extend the change process by introducing 'production ready'. Production ready means a system or major update can be introduced into production because it is ready on all the key performance aspects: security, recoverability, reliability, maintainability, usability, and operability. In the typical rush to deliver key features or products, the sustainability of the system is often neglected or omitted. By establishing the Operations team as the final approval gate for a major change to go into production, and leveraging the production ready criteria, organizations can ensure that these often-neglected areas are attended to and properly delivered as part of the normal development process. These steps enable a much higher-performing system in production and avoid customer impacts. For a detailed definition of the production ready process, please see the reference page.
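In code form, the gate is just a checklist that must be fully signed off before Operations approves the change. A minimal illustrative sketch; the criteria mirror the aspects listed above, while the function and data shapes are assumptions:

```python
# Production ready gate: approve only when every criterion has sign-off.
PRODUCTION_READY_CRITERIA = (
    "security", "recoverability", "reliability",
    "maintainability", "usability", "operability",
)

def production_ready(signoffs: dict) -> tuple:
    """Return (approved, missing_criteria) for a proposed major change."""
    missing = [c for c in PRODUCTION_READY_CRITERIA if not signoffs.get(c)]
    return (not missing, missing)

approved, gaps = production_ready({"security": True, "reliability": True})
print("Approved:", approved, "| Missing sign-offs:", gaps)
```

The value is less in the code than in the discipline: a change with any gap on the list simply does not ship.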

Third, ensure you have consolidated your monitoring and that all significant customer-impacting problems are routed through an enterprise command center via an effective incident management process. An Enterprise Command Center (ECC) is basically an enterprise version of a Network Operations Center (NOC), where all of your systems and infrastructure are monitored (not just networks). This modern ECC also has the capability to facilitate and coordinate triage and resolution efforts for production issues. An effective ECC can bring together the right resources from across the enterprise and supporting vendors to diagnose and fix production issues while providing communication and updates to the rest of the enterprise. Delivering highly available systems requires investment in an ECC and the supporting diagnostic and monitoring systems. Many companies have partially constructed the diagnostics or have siloed war rooms for some applications or infrastructure components. To fully and properly handle production issues, these capabilities must be consolidated and integrated. Once you have an integrated ECC, you can extend it by moving from component monitoring to full channel monitoring. Full channel monitoring is where the entire stack for a critical customer channel (e.g., online banking for financial services or customer shopping for a retailer) has been instrumented so that a comprehensive view can be continuously monitored within the ECC. Not only are all the infrastructure components fully monitored, but the databases, middleware, and software components are instrumented as well. Further, proxy transactions can be, and are, run on a periodic basis to gauge performance and detect issues. This level of instrumentation requires considerable investment, and thus is normally done only for the most critical channels. It also requires sophisticated toolsets such as AppDynamics.
But full channel monitoring enables immediate detection of issues or service failures, and most importantly, enables very rapid correlation of where the fault lies. This rapid correlation can take incident impact from hours to minutes or even seconds. Automated recovery routines can be built to accelerate recovery from given scenarios and reduce impact to seconds. If your company’s revenue or service is highly dependent on such a channel, I would highly recommend the investment. A single severe outage that is avoided or greatly reduced can often pay for the entire instrumentation cost.
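A proxy (synthetic) transaction can be as simple as a timed request against a critical channel endpoint, recording success and latency for the ECC. A bare-bones sketch using only the Python standard library; a real implementation would run on a schedule, exercise a full business transaction, and feed a toolset such as AppDynamics (the URL below is a hypothetical example):

```python
import time
import urllib.request

def probe(url: str, timeout: float = 5.0) -> dict:
    """Run one synthetic transaction: fetch the URL, record success and latency."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            ok = 200 <= response.status < 300
    except OSError:  # connection refused, DNS failure, timeout, HTTP error
        ok = False
    return {"url": url, "ok": ok, "latency_s": time.monotonic() - start}

# A probe against an unreachable endpoint reports the failure immediately,
# so the ECC can alert before customers start calling.
result = probe("http://localhost:9/health", timeout=0.5)
print(result)
```

The point is detection in seconds rather than discovery via the phone queue; latency trends from the same probes also give early warning of degradation.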

Fourth, you cannot be complacent about learning and improving. Whether from failed changes, incident pattern analysis, or industry trends and practices, you and your team should always be seeking to identify improvements. High performance or here, high quality, is never reached in one step, but instead in a series of many steps and adjustments. And given our IT systems themselves are dynamic and changing over time, we must be alert to new trends, new issues, and adjust.

Often, where we execute strong root cause analysis and follow-up, we end up focused only at the individual issue or incident level. This is all well and good for correcting the one issue, but if we miss broader patterns we can substantially undershoot optimal performance. As IT leaders, we must consider both the trees and the forest: not just fixing the individual incident and getting to its root cause, but also looking for the overall trends and patterns in your issues. Do they cluster around one application or infrastructure component? Does a supplier contribute far too many issues? Is inadequate testing a common thread among incidents? Do some teams create far more defects than the norm? Are your designs too complex? Are you using products in a mainstream or a unique manner, especially if you are seeing many OS or product defects? Use these patterns and analyses to identify the systemic issues your organization must fix. They may be process issues (e.g., poor testing), application or infrastructure issues (e.g., obsolete hardware), or other issues (e.g., lack of documentation, incompetent staff). Discuss these issues and analyses with your management team and engineering leads. Tackle fixing them as a team, with your quality goals prioritizing the efforts. By correcting things both individually and systemically, you can achieve far greater progress. Again, the transparency of the discussions will increase accountability and open up your teams so everyone can focus on the real goals rather than hiding problems.

These four extensions to your initial efforts will set your team on a course to achieve top quartile availability. Of course, you must couple these efforts with diligent engagement by senior management, adequate investment, and disciplined execution. Unfortunately, even with all the right measures, providing robust availability for your customers is rarely a straight-line improvement. It is a complex endeavor that requires persistence and adjustment along the way. But by implementing these steps, you can enable sustainable and substantial progress and achieve top quartile performance to provide business advantage in today’s 7×24 digital world.

If your shop is struggling with high availability or major outages, look to apply these practices (or send your CIO the link to this page 🙂 ).

Best, Jim Ditmore

The Key Growth Constraints on Your Business

We have seen the accelerating impact of technology on a wide variety of industries in the past decade. We have witnessed the impact of internet shopping on retail industries, and seen the impact of digital content and downloads on the full spectrum of traditional media companies across books and newspapers as well as movies and games. Traditional enterprises are struggling to anticipate and stay abreast of the advances and changes. Even in those industries far away from the digital world, where they seem very physical, it is critical to leverage ‘smart’ technologies to improve output and products.

Let’s take logging and sawmills as an example. Certainly there have been physical advances, from improved hydraulic systems to better synthetic ropes, but playing an increasingly prominent role is the use of digital technology to assist operators driving complex integrated machines that optimize the entire logging and sawing process. The latest purpose-built forestry machines operate at the roadside or nearby off-road, cutting logs from whole trees, combining steps and eliminating manual labor. These integrated machines are guided by on-board computers and electronic controls, enabling the operator to optimize log products: the machines skillfully delimb and "buck" the whole trees into the best log lengths and load them onto trucks. Subsequently, the logs are taken to modern sawmills, where new scanning technologies and computers analyze each log and determine how to optimize the dimensional lumber cut from it. Not only does this dramatically reduce manual labor and waste, it improves safety and increases log product value by 20% to 30% over previous methods. And it is not just digital machinery leveraging computers to analyze and cut; it is also mobile apps with mapping and image analysis, so better decisions are made in the field about when and where to log. When digitalization is revolutionizing even 'physical' industries like logging and sawmills, it is clear that the pace and potential to disrupt industries by applying information technology has increased dramatically. Below is a chart that represents the pace of disruption or 'gain' possible by digitalization over the mid-term horizon (7 to 10 years).

[Chart: relative pace and potential of digital disruption ('gain') by industry over a 7- to 10-year horizon]

It is clear that digitalization has dramatically changed the travel and media industries already. Digital disruption has been moving down into other industries as either their components move from physical to digital (e.g., cameras, film) or industry leaders apply digital techniques to take advantage (e.g., Amazon, Ameritrade, Uber). Unfortunately, many companies do not have in place the key components necessary to apply and leverage technology to digitalize in rapid or disruptive ways. The two most important ingredients to successfully digitalize are software development capacity and business process engineering skill. Even for large companies with sizable IT budgets there are typically major constraints on both software development and business process engineering. And ample quantities of both are required for significant and rapid progress in digitalization.

Starting with software development, common constraints on this capability are:

  • a large proportion of legacy systems that consume an oversize portion of resources to maintain them
  • inadequate development toolsets and test environments
  • overworked teams with a focus on schedule delivery
  • problematic architectures that limit digital interfaces and delivery speed
  • software projects that are heavily oriented to incremental product improvement versus disruptive customer-focused efforts

And even if there are adequate resources, there must be a persistent corporate focus on the discipline, productivity and speed needed for breakout efforts.

Perhaps even more lacking in many companies is the necessary business process engineering skill. Here the issue is often not capacity or productivity but inadequate skill and improper focus. Most corporate investment agendas are controlled by 'product' teams whose primary focus is on incrementally improving their product's features and capabilities rather than the end-to-end service or process views that truly impact the customer. Further, process engineering skills are not a hallmark of service industry product teams. Most senior product leaders 'grew up' in a product-focused environment and, unless they have a manufacturing background, usually do not have process improvement experience or knowledge. Typically, product team expertise lies primarily in the current product and its previous generations, not in the end-to-end process supporting the actual product. Too often the focus is on a next-quarter product release with incremental features, as opposed to fully reworking the customer interface from the customer's point of view and reworking the supporting business process end to end to take full advantage of digitalization and new customer interfaces. There is far too much product tinkering versus customer experience design and business process engineering. Yet the end-to-end process, rather than the product features, is what actually drives the digital customer experience. Service firms that excel at the customer experience design the end-to-end process from the customer viewpoint while taking full advantage of the digital opportunities. This yields a far better customer experience that is relatively seamless and easy. Further, the design normally incorporates a comprehensive interface approach that empowers each customer interaction point with the required knowledge about the customer and their next step. The end result is a compelling digital platform that enables them to win in the market.

As an IT leader certainly you must identify and sponsor the key digitalization projects for your company, but you must also build and develop the two critical capabilities to sustain digitalization. It is paramount that you build a software development factory that leverages modern methods on top of discipline and maturity so you have predictable and high quality software deliverables. And ensure you are building on an architecture that is both flexible and scalable so precious effort is not wasted on arcane internal interfaces or siloed features that must be replicated across your estate.

Work with your business partners to establish continuous process improvement and process engineering as desired and highly valued skills in both IT and the business team. Establish customer experience and user experience design as important competencies for product managers. Then take the most critical processes serving customers and revisit them from an end-to-end process view and a customer view. Use the data and analysis to drive the better holistic process and customer experience decisions, and you will develop far more powerful digital products and services.

Where is your team or your company on the digital journey? Do you have an abundance of software development or business process engineering skills and resources? Please share your perspective and experience in these key areas in our digital age.

Best, Jim Ditmore

 

Overcoming the Inefficient Technology Marketplace

The typical IT shop spends 60% or more of its budget on external vendors, buying hardware, software, and services. Globally, the $2 trillion IT marketplace (2013 estimate by Forrester) is quite inefficient: prices and discounts vary widely between purchasers, and often not for reasons of volume or relationship. As a result, many IT organizations fail to optimize their spend effectively, often overpaying by 10%, 20%, or even much more.

Considering that IT budgets continue to be very tight, overspending your external vendor budget by 20% (a total budget overrun of 12%) means that you must reduce the remaining 40% of budget spend (which is primarily for staff) by almost one-third! What better way to get more productivity and results from your IT team than to spend only what is needed on external vendors and plow the savings back into IT staff and investments, or into the corporate bottom line?
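The arithmetic behind that one-third figure, as a quick worked example with a normalized budget (the 60% and 20% figures are the illustrative ones from the paragraph above):

```python
# Normalized IT budget arithmetic for vendor overspend.
budget = 100.0                       # total IT budget (normalized to 100)
vendor_spend = 0.60 * budget         # 60% goes to external vendors
overspend = 0.20 * vendor_spend      # paying 20% over market rates
staff_spend = budget - vendor_spend  # the remaining 40%, primarily staff

print(f"Overspend as share of total budget: {overspend / budget:.0%}")
print(f"Required cut to the staff budget:   {overspend / staff_spend:.0%}")
```

Twelve points of overspend absorbed by a forty-point staff budget is a 30% cut, which is why vendor discipline translates so directly into staff capacity.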

IT expenditures are easily one of the most inefficient areas of corporate spending due to opaque product prices and uneven vendor discounts. The inefficiency occurs across the entire spectrum of technology purchases, not just highly complex software purchases or service procurements. I learned from my experience in several large IT shops that there is rarely a clear rationale for the pricing achieved by different firms other than that they received what they competitively arranged and negotiated. To overcome this inefficient marketplace, the key prerequisite is to set up strong competitive playing fields for your purchases. With competitive tension, your negotiations will be much stronger, and your vendors will work to provide the best value. In several instances, when comparing prices and discounts between firms where I have worked that subsequently merged, it became clear that many IT vendors had no consistent pricing structures, and in too many cases, the firm with greater volume had a worse discount rate than the smaller-volume firm. The primary difference? The firm that robustly and competitively bid and negotiated always had the better discount. The firms that based their purchases on relationships, or that had embedded technologies limiting their choices, typically ended up with technology pricing well over optimum market rates.

As an IT leader, to recapture the 6 to 12% of your total budget due to vendor overspend, you need to address inadequate technology acquisition knowledge and processes in your firm — particularly with your senior managers and engineers who are participating or making the purchase decisions. To achieve best practice in this area, the basics of a strong technology acquisition approach are covered here, and I will post on the reference pages the relevant templates that IT leaders can use to seed their own best practice acquisition processes. The acquisition processes will only work if you are committed to creating and maintaining competitive playing fields and not making decisions based on relationships. As a leader, you will need to set the tone with a value culture and focus on your company’s return on value and objectives – not the vendors’.

Of course, the technology acquisition process outlined here is a subset of the procurement lifecycle applied to technology. The technology acquisition process provides additional details on how to apply the lifecycle to technology purchases, leveraging the teams, and accommodating the complexities of the technology world. As outlined in the lifecycle, technology acquisition should then be complemented by a vendor management approach that repairs or sustains vendor performance and quality levels – this I will cover in a later post.

Before we dive into the steps of the technology acquisition process, what are the fundamentals that must be in place for it to work well? First, a robust 'value' culture must be in place. A 'value' culture is where IT management (at all levels) is committed to optimizing its company's spending in order to make sure that the company gets the most for its money. It should be part of the core values of the group (and even better, a derivative of corporate values). The IT management and senior engineers should understand that delivering strong value requires constructing competitive playing fields for their primary areas of spending. If IT leadership instead allows relationships to drive acquisitions, this quickly robs the organization of negotiating leverage, and cost increases will quickly seep into acquisitions. IT vendors will rapidly adapt to how the IT team selects purchases: if it is relationship-oriented, they will have lots of marketing events, and they will try to monopolize the decision makers' time. If they must be competitive and deliver outstanding results, they will instead focus on getting things done, and they will try to demonstrate value. One barometer of how you conduct your purchases is the type of treatment you receive from your vendors. Commit to break out of the mold of most IT shops by replacing the cycle of relationship purchases and locked-in technologies with a 'value' culture and competitive playing fields.

Second, your procurement team should have thoughtful category strategies for each key area of IT spending (e.g., storage, networking equipment, telecommunications services). Generally, your best acquisition strategy for a category is to establish 2 or 3 strong competitors in a supply sector such as storage hardware. Because you will have leveled most of the technical hurdles that prevent substitution, your next significant acquisition could easily go to any of the vendors. In such a situation, you can drive all vendors to compete strongly and lower their pricing to win. Of course, such a strong negotiating position is not always possible due to your legacy systems, new investments, or a limited set of actual competitors. For these situations, the procurement team should seek to understand what the best pricing on the market is and what critical factors the vendor seeks (e.g., market share, long-term commitment, marketing publicity, end-of-quarter revenue), and then use these to trade for more value for their company (e.g., price reductions, better service, lower long-term cost). This work should be done upfront, well before a transaction initiates, so that the conditions favoring the customer in negotiations are in place.

Third, your technology decision makers and your procurement team should be on the same page with a technology acquisition process (TAP). Your technology leads who make purchase decisions should work arm in arm with the procurement team in each step of the TAP. Below is a diagram outlining the steps of the TAP. A team can do very well simply by executing each of the steps as outlined. Even better results are achieved by understanding the nuances of negotiations, maintaining competitive tension, and driving value.

 

Here are further details on each TAP step:

A. Identify Need – Your source for new purchasing can come from the business or from IT. Generally, you would start at this step only if it is a new product or significant upgrade or if you are looking to introduce a new vendor (or vendors) to a demand area. The need should be well documented in business terms and you should avoid specifying the need in terms of a product — otherwise, you have just directed the purchase to a specific product and vendor and you will very likely overpay.

B. Define Requirements – Specify your needs and ensure they mesh with the overall technology roadmap that the architects have defined. Look to bundle or gather up needs so that you can attain greater volume in one acquisition and possibly gain better pricing. Avoid specifying requirements in terms of products to prevent 'directing' the purchase to a particular vendor. Try to gather requirements in a rapid process (some ideas here) and avoid stretching this task out. If necessary, subsequent steps (including an RFI) can be used to refine requirements.

C. Analyze Options – Utilize industry research and high level alternatives analysis to down-select to the appropriate vendor/product pool. Ensure you maintain a strong competitive field. At the same time, do not waste time or resources for options that are unlikely.

D, E, F, G. Execute these four steps concurrently. First, ensure the options all meet critical governance requirements (risk, legal, security, architectural), then drive the procurement selection process as appropriate based on the category strategy. As you narrow or extend options, conduct appropriate financial analysis. If you wish to leverage proofs of concept or other trials, ensure you have pricing well established before the trial; otherwise, you will have far less leverage in vendor negotiations after a successful trial.

H. Create the Contract – Leverage robust terms and conditions via well-thought-out contract templates to minimize the work and ensure higher-quality contracts. At the same time, don't forgo the business objectives of price, quality, and capability by trading them away for some unlikely liability term. The contract should be robust and fair, with highly competitive pricing.

I. Acquire the Product – This is the final step of the procurement transaction, and it should be as accurate and automated as possible. Ensure proper receivables and sign-off as well as prompt payment. Often a further 1% discount can be achieved with prompt payment.

J & K. These steps move into lifecycle work to maintain good vendor performance and manage the assets. Vendor management, an important activity that corrects or sustains vendor performance to high levels, will be covered in a subsequent post.

By following this process and ensuring your key decision makers set a competitive landscape and hold your vendors to high standards, you should be able to achieve better quality, better services, and significant cost savings. You can then plow these savings back into either strategic investment including more staff or reduce IT cost for your company. And at these levels, that can make a big difference.

What are some of your experiences with technology acquisition and suppliers? How have you tackled or optimized the IT marketplace to get the best deals?

I look forward to hearing your views. Best, Jim Ditmore

Looking to Improve IT Production? How to Start

Production issues, as Microsoft and Google can tell you, impact even cloud email apps. A few weeks ago, Microsoft took an entire weekend to fully recover its cloud Outlook service. Perhaps you noted the issues earlier this year in financial services, where Bank of America experienced internet site availability issues. Unfortunately for Bank of America, that was their second outage in six months, though they are not alone in having problems, as Chase suffered a similar production outage on their internet services the following week. And these are regular production issues, not the unavailability of websites and services due to a series of DDoS attacks.

Perhaps 10 or certainly 15 years ago, such outages with production systems would have resulted in far less notice by customers, as front office personnel would have worked alternate systems and manual procedures until the systems were restored. But with customers now accessing the heart of most companies’ systems through internet and mobile applications, typically on a 7×24 basis, it is very difficult to avoid direct and widespread customer impact in the event of a system failure. Your production performance becomes very evident to your customers. And your customers’ expectations have continued to increase: they expect your company and your services to be available pretty much whenever they want to use them. And while availability is not the only attribute that customers value (usability, features, service, and pricing factor in importantly as well), companies that consistently meet or exceed consumer availability expectations gain a key edge in the market.

So how do you deliver to current and rising expectations around the availability of your online and mobile services? And if BofA and Chase, large organizations with massive IT departments that offer dozens of services online, have issues delivering consistently high availability, how can smaller organizations deliver compelling reliability?

And often, the demand for high availability must be achieved in an environment where ongoing efficiencies have eroded the production base and a tight IT labor market has further complicated obtaining adequate expertise. If your organization is struggling with availability or you are looking to achieve top quartile performance and competitive service advantage, here’s where to start:

First, understand that availability, at its root, is a quality issue. And quality issues can only be changed if you address all aspects. You must set quality and availability as a priority, as a critical and primary goal for the organization. And you will need to ensure that incentives and rewards are aligned to your team’s availability goal.

Second, you will need to address the IT change processes. You should look to implement an ITSM change process based on ITIL. But don’t wait for a fully defined process to be implemented. You can start by limiting changes to appropriate windows. Establish release dates for major systems and accompanying subsystems. Avoid changes during key business hours or just before the start of the day. I still remember the ‘night programmer’ at Ameritrade at the beginning of our transformation there. Staying late one night as CIO in my first month, I noticed two guys come in at 10:30 PM. When I asked what they did, they said ‘We are the night programmers. When something breaks with the nightly batch run, we go in and fix it.’ And this was done with no change records, minimal testing, and minimal documentation. Of course, my hair stood on end hearing this. We quickly discontinued that practice and instead made changes as a team, after they were fully engineered and tested. I would note that combining this action with a number of other measures mentioned here enabled us to quickly reach a stable platform that had the best track record for availability of all the online brokerages.

Importantly, you should ensure that adequate change review and documentation is being done by your teams for their changes. Ensure they take accountability for their work and their quality. Drive to an improved change process with templates for reviews, proper documentation, back-out plans, and validation. Most failed changes are due to issues with the basics: a lack of adequate review and planning, poor documentation of deployment steps, missing or ineffective validation, or one person doing an implementation in the middle of the night when there should be at least two people doing it together (one to do, and one to check).

Also, you should measure the proportion of incidents due to change. If you experience mediocre or poor availability and failed changes contribute to more than 30% of the incidents, you should recognize change quality is a major contributor to your issues. You will need to zero in on the areas with chronic change issues. Measure the change success rate (percentage of changes executed successfully without production incident) of your teams. Publish the results by team (this will help drive more rapid improvement). Often, you can quickly find which of your teams has inadequate quality because their change success rate ranges from a very poor mid-80s percentage to a mediocre mid-90s percentage. Good shops deliver above 98% and a first quartile shop consistently has a change success rate of 99% or better.
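To make the change success rate concrete, here is a minimal sketch of how such a per-team report might be computed; the team names and change counts are hypothetical illustration data, not from any real shop:

```python
def change_success_rate(successful, total):
    """Return the change success rate as a percentage."""
    if total == 0:
        return 100.0  # no changes executed, so no failed changes
    return 100.0 * successful / total

# Hypothetical per-team change counts: (successful changes, total changes)
teams = {
    "Payments": (430, 500),
    "Online": (941, 950),
    "Infrastructure": (198, 200),
}

# Rank teams worst-first so chronic problem areas surface at the top,
# and band them against the thresholds discussed above (98% good, 99%+
# first quartile).
for name, (ok, total) in sorted(teams.items(),
                                key=lambda kv: change_success_rate(*kv[1])):
    rate = change_success_rate(ok, total)
    band = ("first quartile" if rate >= 99
            else "good" if rate >= 98 else "needs attention")
    print(f"{name}: {rate:.1f}% ({band})")
```

Publishing a simple ranked report like this, by team, is often enough to drive the rapid improvement described above.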

Third, ensure all customer impacting problems are routed through an enterprise command center via an effective incident management process. An Enterprise Command Center (ECC) is basically an enterprise version of a Network Operations Center or NOC, where all of your systems and infrastructure are monitored (not just networks). And the ECC also has capability to facilitate and coordinate triage and resolution efforts for production issues. An effective ECC can bring together the right resources from across the enterprise and supporting vendors to diagnose and fix production issues while providing communication and updates to the rest of the enterprise. Delivering highly available systems requires an investment into an ECC and the supporting diagnostic and monitoring systems. Many companies have partially constructed the diagnostics or have siloed war rooms for some applications or infrastructure components. To fully and properly handle production issues requires consolidating these capabilities and extending their reach.  If you have an ECC in place, ensure that all customer impacting issues are fully reported and handled. Underreporting of issues that impact a segment of your customer base, or the siphoning off of a problem to be handled by a local team, is akin to trying to handle a house fire with a garden hose and not calling the fire department. Call the fire department first, and then get the garden hose out while the fire trucks are on their way.

Fourth, you must execute strong root cause analysis and follow-up. These efforts must be at the individual issue or incident level as well as at a summary or higher level. It is important not just to focus on fixing the individual incident and getting to root cause for that one incident, but also to look for the overall trends and patterns of your issues. Do they cluster around one application or infrastructure component? Are they caused primarily by change? Does a supplier contribute far too many issues? Is inadequate testing a common thread among incidents? Are your designs too complex? Are you using the products in a mainstream or unique manner – especially if you are seeing many OS or product defects? Use these patterns and analysis to identify the systemic issues your organization must fix. They may be process issues (e.g., poor testing), application or infrastructure issues (e.g., obsolete hardware), or other issues (e.g., lack of documentation, incompetent staff). Track both the fixes for individual issues as well as the efforts to address systemic issues. The systemic efforts will begin to yield improvements that eliminate future issues.

These four efforts will set you on a solid course to improved availability. If you couple these efforts with diligent engagement by senior management and disciplined execution, the improvements will come slowly at first, but then will yield substantial gains that can be sustained.

You can achieve further momentum with work in several areas:

  • Document configurations for all key systems. If you are doing discovery during incidents, it is a clear indicator that your documentation and knowledge base are highly inadequate.
  • Review how incidents are reported. Are they user reported or did your monitoring identify the issue first? At least 70% of the issues should be identified first by you, and eventually you will want to drive this to a 90% level. If you are lower, then you need to look to invest in improving your monitoring and diagnostic capabilities.
  • Do you report availability in technical measures or business measures? If you report via time-based systems availability measures or the number of incidents by severity, these are technical measures. You should look to implement business-oriented measures, such as customer impact availability, to drive greater transparency and more accurate metrics.
  • In addition to eliminating issues, reduce your customer impacts by reducing the time to restore service (Microsoft can certainly stand to consider this area given their latest outage was three days!). Mean time to restore (MTTR – note this is not mean time to repair but mean time to restore service) has three components: time to detect (MTTD), time to diagnose or correlate (MTTC), and time to fix, i.e. restore service (MTTF). An IT shop that is effective at resolution normally sees an MTTR of 2 hours or less for its priority issues, where the three components each take about 1/3 of the time. If your MTTD is high, again look to invest in better monitoring. If your MTTC is high, look to improve correlation tools, systems documentation, or engineering knowledge. And if your MTTF is high, again look to improve documentation or engineering knowledge, or automate recovery procedures.
  • Consider investing in greater resiliency for key systems. It may be that customer expectations of availability exceed current architecture capabilities. Thus, you may want to invest in greater resiliency and redundancy or build a more highly available platform.
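The detection-ratio and MTTR arithmetic in the bullets above is straightforward to compute from incident records; here is a minimal sketch, where the record layout and the numbers are hypothetical illustrations rather than output from any real monitoring tool:

```python
# Hypothetical incident records: (who detected it first, minutes to detect,
# minutes to diagnose/correlate, minutes to fix and restore service)
incidents = [
    ("monitoring", 10, 35, 40),
    ("user",       45, 50, 30),
    ("monitoring",  5, 20, 25),
]

n = len(incidents)
# Share of incidents caught by monitoring before users reported them.
detect_ratio = 100.0 * sum(1 for i in incidents if i[0] == "monitoring") / n
mttd = sum(i[1] for i in incidents) / n   # mean time to detect
mttc = sum(i[2] for i in incidents) / n   # mean time to diagnose/correlate
mttf = sum(i[3] for i in incidents) / n   # mean time to fix (restore service)
mttr = mttd + mttc + mttf                 # mean time to restore

print(f"Detected by monitoring first: {detect_ratio:.0f}% (target 70%, then 90%)")
print(f"MTTD {mttd:.0f}m + MTTC {mttc:.0f}m + MTTF {mttf:.0f}m = MTTR {mttr:.0f}m")
```

Tracking which of the three components dominates MTTR tells you where to invest: monitoring (MTTD), correlation tools and documentation (MTTC), or recovery automation (MTTF).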

As you can see, providing robust availability for your customers is a complex endeavor. By implementing these steps, you can enable sustainable and substantial progress to top quartile performance and achieve business advantage in today’s 7×24 world.

What would you add to these steps? What were the key factors in your shop’s journey to high availability?

Best, Jim Ditmore

IT Security in the Headlines – Again

Again. Headlines are splashed across front pages and business journals as banks, energy companies, and government web sites have been attacked. As I called out six months ago, the pace, scale, and intensity of attacks increased dramatically in the past year and were likely to continue to grow. Given that one of the most important responsibilities of a CIO and senior IT leaders is to protect the data and services of the firm or entity, security must be a bedrock capability and focus. And while I have seen a significant uptick in awareness of and investment in security over the past five years, there is much more to be done at many firms to reach proper protection. Further, as IT leaders, we must understand IT is in a deadly arms race that requires urgent and comprehensive action.

The latest set of incidents are DDoS attacks against US financial institutions. These have been conducted by Muslim hacker groups purportedly in retaliation for the Innocence of Muslims film. But this weekend’s Wall Street Journal outlined that the groups behind the attacks are sponsored by the Iranian government – ‘the attacks bore “signatures” that allowed U.S. investigators to trace them to the Iranian government’. This is another expansion of the ‘advanced persistent threats’ or APTs that now dominate hacker activity. APTs are well-organized, highly capable entities funded by either governments or broad fraud activities that enable them to carry out hacking at unprecedented scale and sophistication. As this wave of attacks migrates from large financial institutions like JP Morgan Chase and Wells Fargo to mid-sized firms, IT departments should be rechecking their defenses against DDoS as well as other hazards. If you do not already have explicit protection against DDoS, I recommend leveraging a carrier network-based DDoS service as well as having a third party validate your external defenses against penetration. While the stakes currently appear to be a loss of access to your websites, any weaknesses found by the attackers will invariably be subsequently exploited for fraud and potential data destruction. This is exactly the path of the attacks against energy companies, including Saudi Aramco, that recently preceded the financial institutions attack wave. And no less than Leon Panetta spoke about the most recent attacks and consequences. As CIO, your firm cannot be exposed as lagging in this arena without possible significant impact to reputation, profits, and competitiveness.

So, what are the measures you should take or ensure are in place? In addition to the network-based DDoS service mentioned above, you should implement these fundamental security measures first outlined in my April post and then consider the advanced measures to keep pace in the IT security arms race.

Fundamental Measures:

1. Establish a thoughtful password policy. Sure, this is pretty basic, but it’s worth revisiting and a key link in your security. Definitely require that users change their passwords regularly, but set a reasonable frequency: require changes more often than every three months and users will write their passwords down, compromising security. As for password complexity, require at least six characters, with one capital letter and one number or other special character.

2. Publicize best security and confidentiality practices. Do a bit of marketing to raise user awareness and improve security and confidentiality practices. No security tool can be everywhere. Remind your employees that security threats can follow them home from work or to work from home.

3. Install and update robust antivirus software on your network and client devices. Enough said, but keep it up-to-date and make it comprehensive (all devices).

4. Review access regularly. Also, ensure that all access is provided on a “need-to-know” or “need-to- do” basis. This is an integral part of any Sarbanes-Oxley review, and it’s a good security practice as well. Educate your users at the same time you ask them to do the review. This will reduce the possibility of a single employee being able to commit fraud resulting from retained access from a previous position.

5. Put in place laptop boot-up hard drive encryption. This encryption will make it very difficult to expose confidential company information via lost or stolen laptops, which is still a big problem. Meanwhile, educate employees to avoid leaving laptops in their vehicles or other insecure places.

6. Require secure access for “superuser” administrators. Given their system privileges, any compromise to their access can open up your systems completely. Ensure that they don’t use generic user IDs, that their generic passwords are changed to a robust strength, and that all their commands are logged (and subsequently reviewed by another engineering team and management). Implement two-factor authentication for any remote superuser ID access.

7. Maintain up-to-date patching. Enough said.

8. Encrypt critical data only. Any customer or other confidential information transmitted from your organization should be encrypted. The same precautions apply to any login transactions that transmit credentials across public networks.

9. Perform regular penetration testing. Have a reputable firm test your perimeter defenses regularly.

10. Implement a DDoS network-based service. Work with your carriers to implement the ability to shed false requests and enable you to thwart a DDoS attack.
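The password rules in measure 1 are easy to express as an automated check. Here is a minimal sketch; the function name and exact policy parameters are illustrative, mirroring the text above rather than any particular identity product:

```python
import re

def meets_policy(password: str) -> bool:
    """Check a password against the sketched policy: at least six
    characters, one capital letter, and one number or other special
    (non-alphanumeric) character."""
    if len(password) < 6:
        return False
    if not re.search(r"[A-Z]", password):
        return False
    return bool(re.search(r"[0-9]", password)
                or re.search(r"[^A-Za-z0-9]", password))

# Usage examples:
assert meets_policy("Winter7")
assert not meets_policy("short")          # too short, no capital
assert not meets_policy("alllowercase9")  # no capital letter
assert not meets_policy("NoDigitsHere")   # no number or special character
```

Enforcing the policy in code at password-change time, rather than relying on written guidance, keeps the rule consistent across systems.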

Advanced Practices: 

a. Provide two-factor authentication for customers. Some of your customers’ personal devices are likely to be compromised, so requiring two-factor authentication for access to accounts prevents easy exploitation. Also, notify customers when certain transactions have occurred on their accounts (for example, changes in payment destination, email address, physical address, etc.).

b. Secure all mobile devices. Equip all mobile devices with passcodes, encryption, and remote-wipe capability. Encrypt your USB flash memory devices. On secured internal networks, minimize encryption to enable detection of unauthorized activity as well as diagnosis and resolution of production and performance problems.

c. Further strengthen access controls. Permit certain commands or functions (e.g., superuser) to be executed only from specific network segments (not remotely). Permit contractor network access via a partitioned secure network or secured client device.

d. Secure your sites from inadvertent outside channels. Implement your own secured wireless network, one that can detect unauthorized access, at all corporate sites. Regularly scan for rogue network devices, such as DSL modems set up by employees, that let outgoing traffic bypass your controls.

e. Prevent data from leaving. Continuously monitor for transmission of customer and confidential corporate data, with the automated ability to shut down illicit flows using tools such as NetWitness. Establish permissions whereby sensitive data can be accessed only from certain IP ranges and sent only to another limited set. Continuously monitor traffic destinations in conjunction with a top-tier carrier in order to identify traffic going to fraudulent sites or unfriendly nations.

f. Keep your eyes and ears open. Continually monitor underground forums (“Dark Web”) for mentions of your company’s name and/or your customers’ data for sale. Help your marketing and PR teams by monitoring social networks and other media for corporate mentions, providing a twice-daily report to summarize activity.

g. Raise the bar on suppliers. Audit and assess how your company’s suppliers handle critical corporate data. Don’t hesitate to prune suppliers with inadequate security practices. Be careful about having a fully open door between their networks and yours.

h. Put in place critical transaction process checks. Ensure that crucial transactions (i.e., large transfers) require two personnel to execute, and that regular reporting and management review of such transactions occurs.

i. Establish 7×24 security monitoring. If your firm has a 7×24 production and operations center, you should supplement that team with security operations specialists and capability to monitor security events across your company and take immediate action. And if you are not big enough for a 7×24 capability, then enlist a reputable 3rd party to provide this service for you.
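As a sketch of the IP-range permissioning described in measure (e), a data-flow check might look like the following; the network ranges and function name are hypothetical illustrations, not taken from any real policy engine:

```python
import ipaddress

# Hypothetical allow-lists: sensitive data may only be accessed from,
# and sent to, these specific network ranges.
ALLOWED_SOURCES = [ipaddress.ip_network("10.20.0.0/16")]
ALLOWED_DESTINATIONS = [ipaddress.ip_network("10.30.5.0/24")]

def flow_permitted(src: str, dst: str) -> bool:
    """Return True only if both endpoints of a sensitive-data flow
    fall inside the allowed ranges."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    return (any(s in net for net in ALLOWED_SOURCES)
            and any(d in net for net in ALLOWED_DESTINATIONS))

# Usage examples:
assert flow_permitted("10.20.1.5", "10.30.5.9")
assert not flow_permitted("192.168.1.5", "10.30.5.9")   # source outside range
assert not flow_permitted("10.20.1.5", "203.0.113.10")  # destination not allowed
```

In practice such checks would run in your monitoring or data loss prevention layer, with flows that fail the check shut down automatically and flagged for review.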

I recommend that you communicate the seriousness of these threats to your senior business management and ensure that you have the investment budget and resources to implement these measures. Understand that the measures above will bring you current, but you will need to remain vigilant given the arms race underway. Ensure your 2013 budget allows further investment, even if only as a placeholder. For those security pros out there, what else would you recommend?

In the next week, I will outline recommendations on cloud which I think could be very helpful given the marketing hype and widely differing services and products now broadcast as ‘cloud’ solutions.

Best, Jim Ditmore

 

Both Sides of the Staffing Coin: Building a High Performance Team -and- Building a Great IT Career

I find it remarkable that, despite the slow recovery, the IT job market remains very tight. This poses significant hurdles for IT managers looking to add talent. In the post below, I cover how to build a great team and tap into good seams of talent. I think this will be a significant issue for IT managers for the next three to four years: finding and growing talent to enable them to build high performance teams.

And for IT staffers, I have mapped out seasoned advice on how to build your capabilities and experience to enable a great career in IT. Improving IT staff skills and capabilities is of keen interest not just to the staff, but also to IT management, since a more capable team is a much more productive team. And on a final note, I would suggest that anyone in the IT field should consider reaching out to high schoolers and college students and encouraging them to consider a career in IT. Currently, in the US, there are fewer IT graduates each year than IT jobs that open, and this gap is expected to widen in the coming years. So IT will continue to be a good field for employees, and IT leaders will need to encourage others to join so we can meet the expected staffing needs.

Please do check out both sides of the coin, and I look forward to your perspectives. Note that I did publish variants on these posts in InformationWeek over the past few months.

Best, Jim Ditmore

Building a High Performance Team Despite 4% IT Unemployment

Despite a national unemployment rate of more than 8%, the overall IT unemployment rate is at a much lower 4% or less. Further, the unemployment rates for IT specialties such as networking, IT security, or database are even lower, at 1% or less. This makes finding capable IT staff difficult, and the difficulty is compounded because IT professionals are less likely to take new opportunities (turnover rates have been much lower than average over the past 10 years). Unfortunately, these tough IT staffing conditions are likely to continue and perhaps be exacerbated if the recovery picks up pace. With such a tight IT job market, how do you build or sustain your high performance IT team?

I recommend several tactics to incorporate into your current staffing approach that should allow you to improve your current team and acquire the additional talent needed for your business to compete. Let’s focus first on acquiring talent. In a tight market you must always be present to enable you to acquire the talent when they first consider looking for a position. You must move to a ‘persistent’ recruiting mode. If your group is still only opening positions after someone leaves or after a clear funding approval is granted, you are late to the game. Given the extended recruiting times, you will likely not acquire the staff in time to meet your needs. Nor will you consistently be on the market when candidates are seeking employment. Look instead to do ‘pipeline recruiting’. That is, for those common positions that you know you will need over the next 12 months, set up an enduring position, and have your HR team persistently recruit for these ‘pipeline positions’. Good examples would be Java or mobile developers, project managers, network engineers, etc. Constantly recruit, interview and when you find an ‘A’ caliber candidate, hire them — whether you have the exact position open or not. You can be certain that you will need the talent, so hire them and put them on the next appropriate project to be worked on from your demand list. Not only will you now have talent sourced and available when you need it because you are always out in the market, you will develop a reputation as a place where talent is sought and you will have an edge when those ‘A’ players who seldom look for work in the market, decide to seek a new opportunity.

Another key tactic is to extend the pipeline recruiting to interns and graduates. Too many firms only look for experienced candidates and neglect this source. In many companies, graduates can be a key long term source of their best senior engineers. Moreover, they can often contribute much more than most managers give them credit for, especially if you have good onboarding programs and robust training and education offerings for your staff. I have seen uplifting results for legacy teams when they have brought on bright, enthusiastic talent and combined it with their experienced engineers: everyone’s performance often lifts. They will bring energy to your shop, and you will have the added dividend of increasing the pool of available, experienced talent. And while it will take 7 to 15 years for them to become the senior engineers and leaders of tomorrow, they will be at your company, not someone else’s (if you don’t start, you will never have them).

The investment in robust training and education for graduates should pay off also for your current staff and potential hires. Your current staff, by leveraging training, can improve their skills and productivity. And for potential hires, an attractive attribute of a new company is a strong training program and focus on staff development. These are wise investments as they will pay back in higher productivity and engagement, and greater retention and attraction of staff. You should couple the training program with clearly defined job positions and career paths. These should spell out for your team what the competencies and capabilities of both their current position as well as what is needed to move to the next step in their career. Their ability to progress with clarity will be a key advantage in your staff’s growth and retention as well as attracting new team members. And in a tight job market, this will let your company stand out in the crowd.

Another tactic to apply is to leverage additional locations to acquire talent. If you limit yourself to one or a few metropolitan areas, you are limiting the potential IT population you are drawing from. Often, you can use additional locations to tap entirely new sources of talent at potentially lower costs than your traditional locations. Given the lower mobility of today’s candidates, it may be effective to open a location in the Midwest, in Rust Belt cities with good universities, or in cities such as Charlotte or Richmond. Such 2nd tier cities can harbor surprisingly strong IT populations with lower costs and better retention than 1st tier locations like California, Boston, or New York. The same is true of Europe and India. Your costs are likely to be 20 to 40% less than in headline locations, with attrition rates perhaps one-third less.

And you can go farther afield as well. Nearshore and offshore locations from Ireland to Eastern Europe to India should be considered. Though again, it is worth avoiding the headline locations and going to places like Lithuania or Romania, or 2nd tier cities in India or Poland. You should look to tap the global IT workforce and gain advantage through diverse talent, ability to work longer through a ‘follow the sun’ approach, and optimized costs and capacity. Wherever you go though, you will need to enable an effective distributed workforce. This requires a minimum critical mass in each site, proper allocation of activities in a holistic manner, robust audio and video conferencing capabilities, and effective collaboration and configuration management tools. If done well, a global workforce can deliver more at lower costs and with better skills and time to market. For large companies, such a workforce is really a mandatory requirement to achieve competitive IT capabilities. And to some degree, you could say IT resources are like oil, you go wherever in the world you can to find and acquire them.

Don’t forget to review your recruiting approach as well. Maintain high standards and ensure you select the right candidates by using effective interviewing and evaluation techniques. Apply a metrics-based improvement approach to your recruiting process. What is the candidate yield on each recruiting method? Where are your best candidates coming from? Invest more in recruiting approaches that yield good numbers of strong candidates. One set of observations from many years of analyzing recruiting results: your best source of strong candidates is usually referrals, and weak returns typically come from search firms and broad-sweep advertising. Building a good reputation in the marketplace to attract strong candidates takes time, persistence, and most important, an engaging and rewarding work environment.

With those investments, you will be able to recruit, build and sustain a high performance team even in the tightest of markets. While I know this is a bit like revealing your favorite fishing spot, what other techniques have you been able to apply successfully?

Best, Jim Ditmore


Riding with the Technology Peloton

One of the most important decisions that technology leaders make is when to strike out and leverage new and unique technologies for competitive advantage, and when to stay on a common technology platform with the rest of the industry. Nearly every project and component contains a micro-decision between the custom and the common path. And while it is often easy to have great confidence in our ability and capacity to build and integrate new technologies, the path of striking out on new technologies ahead of the crowd is often much harder and has less payback than we realize. In fact, I would suggest that the payback is similar to what occurs during cycling’s Tour de France: many, many riders strike out in small groups to beat the main field of cyclists (the peloton), only to be subsequently caught by the peloton and, with enormous energy expended, fall further behind the pack.

In the peloton, everyone shares the work. The leaders of the peloton take on the most wind resistance but rotate with others in the pack so that the work is balanced. In this way the peloton can move as quickly as any individual cyclist but at 20 or 30% less energy due to much less wind resistance. Thus, with energy conserved, the peloton can move much faster than individual cyclists later in the race. Similarly, in developing a new technology or advancing an existing one, with enough industry mass and customers (a peloton), the technology can be advanced as quickly as, or more quickly than, by an individual firm or small group, and at much less individual cost. Striking out on your own to develop highly customized capabilities (or in concert with a vendor) could leave you with a high-cost capability that provides a brief competitive lead, only to be quickly passed by the technology mainstream or peloton.

If you have ever watched one of the stages of the Tour de France, what can be most thrilling is to see a small breakaway group of riders trying to build or preserve their lead over the peloton. As the race progresses closer to the finish, the peloton relentlessly (usually) reels in and then passes the early leaders because of its far greater efficiency. Of course, those riders who time it correctly and have the capacity and determination to maintain their lead can reap huge time gains to their advantage.

Similarly, I think, in technology and business, you need to choose your breakaways wisely. You must identify where you can reap gains commensurate with the potential costs. For example, breaking away on commodity infrastructure technology is typically not wise. Plowing ahead and being the first to incorporate the latest in infrastructure or cloud or data center technology where there is little competitive advantage is not where you should invest your energy (unless that is your business). Instead, your focus should be on those areas where an early lead can be driven to business advantage and then sustained. Getting closer to your customer, being able to better cross-sell to them, significantly improving cycle time or quality or usability or convenience, or being first to market with a new product — these are all things that will win in the marketplace and customers will value. That is where you should make your breakaway. And when you do look to customize or lead the pack, understand that it will require extra effort and investment and be prepared to make and sustain it.

And while I caution you to select your breakaway course carefully, particularly in this technology environment where industry change is already on an accelerated cycle, I also caution against riding at the back of the peloton. There, just as in the Tour de France, when you are lagging at the back it is too easy to be dropped by the group. And once you drop off the peloton, you must work even harder on your own just to get back in. Similarly, once an IT shop falls significantly behind the advance of technology and loses pace with its peers, further consequences follow. It becomes harder to recruit and retain talent because the technology is dated and the reputation stodgy. Extra engineering and repair work must be done to patch older systems that don’t work well with newer components. And extra investment must be justified with the business to ‘catch’ technology back up. So you must keep pace with the peloton, and better yet be a leader among your peers in technology areas of potential competitive advantage. That way, when you do see a breakaway opportunity for competitive advantage, you are positioned to make it.

The number of breakaways you can attempt depends, of course, on the size of your shop and the intensity of IT investment in your industry. The larger you are, and the greater the investment, the more breakaways you can afford. But make sure they are truly competitive investments with strong potential to yield benefits. Otherwise you are far better off staying at the front of the peloton, leveraging best-in-class practices and common but leading technology approaches. Or as an outstanding CEO I once worked for said, ‘There should be no hobbies’. Having a cool lab environment without rigorous business purpose and ongoing returns (plenty of failures are fine as long as there are successful projects as well) is a breakaway with no purpose.

I am sure there are some experienced cyclists among our readers — how does this resonate? What ‘breakaways’ worked for you or your company? Which ones got reeled in by the industry peloton?

I look forward to hearing from you.

Best, Jim Ditmore

Outsourcing and Out-tasking Best Practices

I recently published this post first at InformationWeek and it generated quite a few comments, both published and several sent directly via e-mail. I would note that a strong theme is the frustration of talented staff dealing with senior leadership that does not understand how IT works well or does not appear to be focused on the long-term interests of the company. It is a key responsibility of leadership to keep these interests at the core of their approach, especially when executing complex efforts like outsourcing or offshoring, so that they achieve benefits and do not harm their company. I think the national debate occurring at this time between Romney and Obama only serves to show how complex executing these efforts is. As part of a team, we were able to adjust and effectively resolve many different situations, and I have extracted much of that knowledge here. If you are looking to outsource or are dealing with an inherited situation, this post should assist you in improving your approach and execution.

While the general trend of more IT outsourcing but via smaller, more focused deals continues, it remains an area that is difficult for IT management to navigate successfully. In my experience, every large shop that I have turned around had significant problems caused or made worse by the outsourcing arrangement, particularly large deals. While understanding that these shops performed poorly primarily for other reasons (leadership, process failures, talent issues), achieving better performance in these situations required substantial revamp or reversal of the outsourcing arrangements. And various industries continue to be littered with examples of failed outsourcing, many with leading outsource firms (IBM, Accenture, etc.) and reputable clients. While formal statistics are hard to come by (in part because companies are loath to report failure publicly), my estimate is that at least 25% and possibly more than 50% fail or perform very poorly. Why do the failures occur? And what should you do when engaging in outsourcing to improve the probability of success?

Much of the success – or failure – depends on what you choose to outsource, followed by effectively managing the vendor and service. You should be highly selective on both the extent and the activities you choose for outsourcing. A frequent mistake is the assumption that any activity that is not ‘core’ to a company can and should be outsourced to enable focus on the ‘core’ competencies. I think this perspective originates from principles first proposed in The Discipline of Market Leaders by Michael Treacy and Fred Wiersema. In essence, Treacy and Wiersema state that companies that are market leaders do not try to be all things to all customers. Instead, market leaders recognize their competency in either product and innovation leadership, customer service and intimacy, or operational excellence. Good corporate examples of each would be 3M for product, Nordstrom for service, and FedEx for operational excellence. Thus business strategy should not attempt to excel at all three areas but instead leverage an area of strength and extend it further while maintaining acceptable performance elsewhere. And by focusing on corporate competency, the company can improve market position and success. But generally IT is absolutely critical to improving customer intimacy and thus customer service. Similarly, achieving outstanding operational competency requires highly reliable and effective IT systems backing your operational processes. And even in product innovation, IT plays a larger and larger role as products become more digital and smarter.

Because of this intrinsic linkage to company products and services, IT is not like a security guard force, nor like legal staff — two areas that are commonly fully or highly outsourced (and generally, quite successfully). By outsourcing intrinsic capabilities, companies put their core competency at risk. In a recent University of Utah business school article, the authors found significantly higher rates of failure among firms that had outsourced. They concluded that “companies need to retain adequate control over specialized components that differentiate their products or have unique interdependencies, or they are more likely to fail to survive.” My IT best practice rule is ‘You must control your critical IP (intellectual property)’. If you use an outsourcer to develop and deliver the key features or services that differentiate your products and define your company’s success, then you have someone doing the work with different goals and interests than yours, who can easily turn around and sell advances to your competitors. Why would you turn over your company’s fate to someone else? Be wary of approaches that recommend outsourcing because IT is not a ‘core’ competency when, with every year that passes, there is greater IT content in products in nearly every industry. Choose instead to outsource those activities where you do not have scale (or cost advantage), capacity or competence, but ensure that you either retain or build the key design, integration, and management capabilities in-house.

Another frequent reason for outsourcing is to achieve cost savings. And while most small and mid-sized companies do not have the scale to achieve cost parity with a large outsourcer, nearly all large companies, and many mid-sized ones, do have the scale. Further, nearly every outsourcing deal that I have reversed in the past 20 years yielded savings of at least 30% and often much more. An outsourcer can deliver cost savings for a large firm across a broad set of services only if the current shop is mediocre. If you have a well-run shop, your all-in costs will be similar to the better outsource firms’ costs. If you are world-class, you can beat the outsourcer by 20-40%.

Even more, the outsourcer’s cost advantage typically degrades over time. Note that the goals of the outsourcer are to increase revenue and margin (that is, to increase your costs and spend fewer resources doing your work). Invariably, the outsourcer will find ways to charge you more, usually for changes to services, and to minimize the work being done. Where previously you used your ‘run’ resources to complete minor fixes and upgrades, you can find yourself charged separately for those very same resources once outsourced. I have often seen ‘run’ functions hollowed out and minimized while the customer pays a premium for every change or increase in volume. And while the usual response to such a situation is that the customer can put terms in the contract to avoid this, I have yet to see terms that ensure the outsourcer works in your best interest to do the ‘right’ thing throughout the life of the contract. One interesting example that I reversed a few years back was an outsourced desktop provisioning and field support function for a major bank (a $55M/year contract). When an initial (surprise) review of the function was done, there were warehouses full of both obsolete equipment that should have been disposed of and new equipment that should have been deployed. Why? Because the outsourcer was paid to maintain all equipment, whether in use in the offices or in a warehouse, and they had full control of the logistics function (here, the critical IP). So, in effect, they had ordered up their own revenue. Further, the service had degraded over the years as the initial workforce was hollowed out and replaced with less qualified individuals. The solution? We immediately in-sourced the logistics function back to a rebuilt in-house team with cost and quality goals established. Then we split the field support geography and conducted a competitive auction to select two firms to handle the work.
Every six months each firm’s performance would be evaluated for quality, timeliness and cost, and the higher-performing firm would gain further territory. The lower-performing firm would lose territory or be at risk of replacement. And we maintained a small but important pool of field support experts to ensure training and capabilities were kept up to par, service routines were updated, and chronic issues were resolved. The end result was far better quality and service, and the cost of the services was slashed by over 40% (from $55M/year to less than $30M/year). And these results — better quality at lower cost — from effective management of the functions and keeping key IP and staff in-house are typical of what similar actions achieve across a wide range of services, organizations and locales.
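The rotation mechanics above can be sketched in a few lines. This is a hypothetical illustration only; the scoring weights, the 10% territory shift, and the 20% to 80% share bounds are my assumptions, not the actual contract terms:

```python
# Hypothetical sketch of the six-month dual-vendor scorecard described above.
# The weights, the shift size and the share bounds are illustrative assumptions.

def score_vendor(quality: float, timeliness: float, cost_index: float) -> float:
    """Blend quality (0-100), timeliness (0-100) and a cost index
    (100 = on budget; lower is cheaper) into one score."""
    return 0.4 * quality + 0.3 * timeliness + 0.3 * (200 - cost_index)

def rebalance_territory(score_a: float, score_b: float, share_a: float) -> float:
    """Shift 10% of the territory toward the higher-scoring vendor,
    keeping each vendor's share between 20% and 80%."""
    if score_a > score_b:
        share_a += 0.10
    elif score_b > score_a:
        share_a -= 0.10
    return min(0.80, max(0.20, share_a))

# Example review: vendor A outperforms, so its share grows from 50% to 60%.
a = score_vendor(quality=92, timeliness=88, cost_index=95)
b = score_vendor(quality=85, timeliness=90, cost_index=102)
print(round(rebalance_territory(a, b, share_a=0.50), 2))  # 0.6
```

The point of the mechanism is not the exact weights but that both vendors know the scoring in advance and that territory, not just a penalty clause, is at stake every review.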

When I was at BankOne, working under Jamie Dimon and his COO Austin Adams, they provided the support for us to tackle bringing back in what had been the largest outsourcing deal ever consummated at its time in 1998. Three years after the outsource had started, it had become a millstone around BankOne’s neck. Costs had been going up every year, and quality continued to erode to where systems availability and customer complaints became the worst in the industry. In sum, it was a burning platform. In 2001 we cut the deal short (it was scheduled to run another 4 years). In the next 18 months, after hiring 2200 infrastructure staff (via best practice talent acquisition) and revamping the processes and infrastructure, we reduced defects (and downtime) to 1/20th of the 2001 levels and reduced our ongoing expenses by over $200M per year. This significantly supported the bank’s turnaround and enabled the merger with JP Morgan a few years later. As for having in-house staff do critical work, Jamie Dimon said it best with ‘Who do you want doing your key work? Patriots or mercenaries?’

Delivering comparable cost to an outsourcer is not that difficult for mid to large IT shops. Note that the outsourcer must include a 20% margin in their long term costs (though they may opt to reduce profits in the first year or two of the contract) as well as an account team’s costs. And, if in Europe, they must add 15 to 20% VAT. Further, they will typically avoid making the small investments required for continuous improvement over time. Thus, three to five years out, nearly all outsourcing arrangements cost 25% to 50% more than a well-run in-house service (that will have the further benefit of higher quality). You should set the bar that your in-house services can deliver comparable or better value than typical out-sourced alternatives. But ensure you have the leadership in place and provide the support for them to reach such a capability.
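The arithmetic above can be sketched as a back-of-the-envelope model. This is illustrative only; the 20% margin and VAT figures come from the discussion above, while the 5% account team overhead and 4% annual drift are my assumptions:

```python
# Back-of-the-envelope comparison of in-house vs. outsourced run cost.
# The 20% margin and VAT rates are from the text above; the 5% account
# team overhead and 4%/year drift are illustrative assumptions.

def outsourced_cost(in_house_base: float, margin: float = 0.20,
                    account_team: float = 0.05, vat: float = 0.0,
                    drift_per_year: float = 0.04, years: int = 0) -> float:
    """Estimate outsourced cost: base plus margin and account team overhead,
    plus VAT where applicable, drifting upward each year as change charges
    accumulate and continuous-improvement investment is skipped."""
    cost = in_house_base * (1 + margin + account_team)
    cost *= (1 + vat)
    cost *= (1 + drift_per_year) ** years
    return cost

base = 100.0  # a well-run in-house shop's cost, indexed to 100
print(round(outsourced_cost(base), 1))           # 125.0: ~25% premium in year one
print(round(outsourced_cost(base, years=4), 1))  # 146.2: ~46% premium by year five
print(round(outsourced_cost(base, vat=0.19), 1)) # 148.8: year one with 19% VAT
```

Even with generous assumptions, the compounding of margin, overhead, and drift lands squarely in the 25% to 50% premium range cited above within a few years.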

But like any tool or management approach, used properly and in the right circumstances, outsourcing is a benefit to the company. As a leader you cannot focus on all company priorities at once, nor would you have the staff to deliver even if you could. And in some areas such as field support there are natural economies of scale that benefit a third party doing the same work for many companies. So consider outsourcing in these areas, but weigh the extent of the outsourcing carefully. Ensure that you still retain critical IP and control. Or use outsourcing to augment and increase your capacity, or where you can leverage best-in-class specialized services to your company’s benefit. Then, once the vendor is selected and the deal effectively negotiated, manage the outsourcing vendor actively. Since effective management of large deals is complex and rarely successful, it is far better to do small outsourcing deals or selective out-tasking. The management of the outsourcing should be handled like any significant in-house function: SLAs are established, proper operational metrics are gathered, performance is regularly reviewed with management, and actions are noted and tracked to address issues or improve service. Properly constructed contracts that accommodate potential failure are key if things do not go well. Senior management should jointly review the service every 3 to 6 months, and consequences must be in place for performance (good or bad).

Well-selected and well-managed outsourcing will then complement your in-house team, alongside more traditional approaches that leverage contractors for peak workloads or projects, or the modern alternative of using cloud services to out-task some functions and applications. With these best practices in place and with a selective hand, your IT shop and company can benefit from outsourcing and avoid the failures.

What experiences have you had with outsourcing? Do you see improvement in how companies leverage such services? I look forward to your comments.

Best, Jim Ditmore

Achieving Outstanding IT Strategy

Developing your IT strategy should be based on a thoughtful, ongoing process. Too often, strategy is developed as a one-time event (typically with consultants) or is a hurried episode following a corporate vision statement that has been handed down. A considered approach, where robust industry and technology trend analysis is coupled with a two-way dialogue on business strategy, can yield much better results. I have mapped out below a best practice strategy process that I have leveraged in previous organizations; it will ensure a strong connection with the business strategy, leverage of technology trends, and a clear cascade into effective goals and plans. With such a process in hand, the senior technology leader should be able to drive both a better IT strategy and, importantly, an improved business strategy.

The IT strategy process should start with two sets of research and analysis that interplay: a full review of the business strategy and a comprehensive survey of the key technology trends, opportunities and constraints. It is critical that the business strategy drive the technology strategy, but aspects of the business strategy can and should be driven by the technology. Utilize the technology trend analysis, as well as the understanding of the key strengths and weaknesses of the current technology platform, as a feedback loop into the business strategy.

When working with the business to help them hone their strategy, I recommend leveraging a corporate competency approach from The Discipline of Market Leaders by Michael Treacy and Fred Wiersema. In essence, Treacy and Wiersema state that companies that are market leaders do not try to be all things to all customers. Instead, market leaders recognize their competency in either product and innovation leadership, customer service and intimacy, or operational excellence. Good corporate examples of each would be 3M for product, Nordstrom for service, and FedEx for operational excellence. Thus your business strategy should not attempt to excel at all three areas but instead leverage your area of strength and extend it further while maintaining acceptable performance elsewhere. This focus is particularly valuable when working to prioritize an overly broad and ambitious business strategy.

Below is a diagram that maps out this strategy process or cascade:

The process anticipates that the corporate strategy will drive multiple business unit strategies that IT will then support. It is appropriate to develop business unit technology strategies that operate in concert with both the business unit strategy and the corporate technology strategy. Once the strategies are established, it is then critical to define the technology roadmap for each business unit. The roadmap can be viewed as a snapshot of the critical technology capabilities and systems every 3 or 6 months for the next two years, providing a definitive plan of how the business unit’s technology will evolve and be delivered to meet the business requirements. These roadmaps should be tied into and should support an overall technology reference architecture for the corporation. This ensures that the technology roadmaps will work in concert with each other and enable critical corporate capabilities such as understanding the entire relationship with a customer across products and business units.
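As a simple illustration, such a roadmap of quarterly snapshots over two years might be represented as follows; the systems and states shown are hypothetical examples, not a prescribed format:

```python
# A business-unit technology roadmap as quarterly snapshots over two years.
# The systems and their states are hypothetical examples only.

roadmap = {
    "2024-Q1": {"crm": "v2 pilot", "payments": "legacy", "data-platform": "design"},
    "2024-Q3": {"crm": "v2 rollout", "payments": "gateway live", "data-platform": "build"},
    "2025-Q1": {"crm": "v2 complete", "payments": "migrating", "data-platform": "pilot"},
    "2025-Q3": {"crm": "analytics add-on", "payments": "fully migrated", "data-platform": "live"},
}

def evolution(roadmap: dict, system: str) -> list:
    """Trace how one capability evolves across the roadmap snapshots."""
    return [(quarter, state[system]) for quarter, state in sorted(roadmap.items())]

print(evolution(roadmap, "payments"))
```

The value of the snapshot form is that each business unit can see, for any quarter, exactly which capabilities it will have in hand, and the reference architecture team can check the snapshots for consistency across units.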

I recommend executing the full process on an annual basis, synchronized with the corporate planning cycle, with quarterly updates to the roadmaps. It is also reasonable to update the technology trends and business unit strategies on a six-month basis with additional data and results.

What would you add to this strategic planning approach? Have you leveraged different approaches that worked well?

Best, Jim Ditmore

Getting Things Done: A Key Leadership Skill

It is a bit ironic that this post has taken me twice as long to do as my average post. But while it is an important topic, it is difficult to pinpoint, of all the practices you can leverage, which ones really help you or your team or organization get the right things done. So, just before the Memorial Day holiday, here is a post to help you execute better for the rest of the year and meet those goals.

Have a great holiday weekend.  Jim

Getting things done is a hallmark of effective teams. Unfortunately, the focus and flow of large business organizations, combined with the influences of the modern world, erode our ability to get the right things done. To raise your organization to a high-performance team, as a senior leader you should impart an ability to get the right things done at the divisional and team level within your organization. And while there are myriad reasons that conspire to reduce our focus or effectiveness, there are a number of techniques and practices that can greatly improve selection and capacity at all levels: the overall organization or division, the working team, and the individual.

Realize that the same positive forces that ensure a focus on business goals, drive consensus within an organization, or require risk and control to be addressed can also be mis- or over-applied, resulting in organizational imbalance or gridlock. Couple this with too many waterfall or ‘big bang’ approaches and you can get not just ineffectiveness but spectacular failures of large efforts. At the organizational level, you should set the right agenda and framework so the productivity and capacity of your IT shop can be improved at the same time you are delivering to the business agenda. To set the right agenda, look to the following practices:

  • provide a clear vision with robust goals that include clear delivery milestones and that are aligned to the business objectives. The vision should also be compelling — your team will only outperform for a worthwhile aspiration.
  • avoid too many big bets (an unbalanced portfolio) – your portfolio should be a mix of large, medium and small deliveries. This enables you to deliver a regular stream of benefits across a broader set of functions and constituents with less risk. Often a nice balancing investment area is to drive several small efforts in HR and Finance that streamline and automate common processes used by much of the corporation (thus a good, broad positive impact on corporate productivity).
  • aggregate your delivery – often IT efforts are so tightly tied to immediate delivery for the business that the IT processes are substantially penalized, including:
    • a continuous stream of applications and updates introduced into production without a release schedule (causing a large amount of duplicative or inadequate design, testing and implementation)
    • a highly siloed delivery approach where every minor business unit has its own set of business systems, resulting in redundant feature build and maintenance work.
  • address poor quality standards and ineffective build capability, including:
    • correct defects as early in the build process as possible. Defects corrected at their source (design or implementation) are far less costly to fix than those corrected once in production
    • lower build productivity due to a lack of investment in the underlying ‘build factory’ (tools, training and processes) or a failure to leverage modern incremental or agile methods
    • delivery by the internal team of the full stack, where packaged software is not leveraged (recently I have encountered shops trying to build their own software distribution tools or databases)

So, in sum, at the organizational level, provide clarity of vision, review your portfolio for balance, make room for investments in your factory and look to simplify and consolidate.

At the team level, employ clarity, accountability, and simplicity to get the right things done. Whether it is a project or an ongoing function:

  • are the goals or deliverables clear?
  • are the efforts broken into incremental tasks or steps?
  • are the roles clear?
  • are the tasks assigned?
  • are there due dates? or good operational metrics?
  • is the solution or approach straightforward?
  • is there follow up to ensure that the important work takes priority and the work is done?

And then, most important, are you recognizing and rewarding those who get things done with quality? There are many other factors that you may need to address or supplement to enable the team to achieve results, from providing specific direction to coaching to adding resources or removing poor performers. But frequently well-resourced teams can spin their wheels working on the wrong things, delivering with poor quality, or simply not focusing on getting results. This is where clarity, accountability and simplicity make the difference and enable your team to get the right things done.

Most importantly, getting the right things done as an individual is a critical skill that enables outperformance. Look to hone your abilities with some of following suggestions:

  • recognize we tend to do what is urgent rather than what is important. Shed the unimportant but urgent tasks and spend more time on important tasks. In particular, use the time to be prepared, improve your skills, or do the planning work that is often neglected.
  • hold yourself accountable and meet your commitments. As a leader you must demonstrate holding yourself to the same (or higher) standards as those for your team.
  • make clear, fact-based decisions and don’t over-analyze. But seek inputs where possible from your team and experts. And leverage a low power-distance (PDI) leadership style so that issues are raised early and you avoid major mistakes.
  • and finally, a positive approach can make a world of difference. Do your job with high purpose and in high spirit. Your team will see it and it will lighten their step as well.

So, those are the practices from my experience that have been enablers to getting things done. What would you add? or change? Do let me know.

Best, Jim Ditmore