Improving Vendor Performance

As we discussed in our previous post on the inefficient technology marketplace, the typical IT shop spends 60% or more of its budget on external vendors – buying hardware, software, and services. Often, once the contract has been negotiated, signed, and initial deliveries commence, attention drifts elsewhere. There are, of course, plenty of other fires to put out. But maintaining an ongoing, fact-based focus on your key vendors can result in significant service improvement and corresponding value to your firm. This ongoing, fact-based focus is proper vendor management.

Proper vendor management is the right complement to a robust, competitive technology acquisition process. For most IT shops, your top 20 or 30 vendors account for about 80% of your spend. And once you have achieved outstanding pricing and terms through a robust procurement process, you should ensure you have effective vendor management practices in place that result in sustained strong performance and value by your vendors.

Perhaps the best vendor management programs are those run by manufacturing firms. Firms such as GE, Ford, and Honda have large dedicated supplier teams that work closely with their suppliers on a continual basis on all aspects of service delivery. Not only do the supplier teams routinely review delivery timing, quality, and price, but they also work closely with their suppliers to help them improve their processes and capabilities as well as identify issues within their own firm that impact supplier price, quality, and delivery. The work is data-driven and draws heavily on process improvement methodologies like Lean. For the average IT shop in services or retail, a full-blown manufacturing program may be overkill, but by implementing a modest yet effective vendor management program you can spur 5 to 15% improvements in performance and value, which accumulate to considerable benefits over time.

The first step to implementing a vendor management program is to segment your vendor portfolio. You should focus on your most important suppliers (by spend or critical service). Focus on the top 10 to 30 suppliers and segment them into the appropriate categories. It is important to group like vendors together (e.g., telecommunications suppliers or server suppliers). Then, if not already in place, assign executive sponsors from your company’s management team to each vendor. They will be the key contact for the vendor (not the sole contact, but the escalation and coordination point for all spend with this vendor) and will pair up with the procurement team’s category lead to ensure appropriate and optimal spend and performance for this vendor. Ensure both sides (your management and the vendor) know the expectations for suppliers (and what they should expect of your firm). Now you are ready to implement a vendor management program for each of these vendors.
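The segmentation step above can be sketched in a few lines. The vendor names and spend figures below are purely hypothetical, and the 80% cutoff is simply the rule of thumb from the text:

```python
# Sketch: rank a vendor portfolio by annual spend and cut the "focus tier"
# at roughly 80% of cumulative spend (all names and figures hypothetical).
spend = {
    "Vendor A": 4_200_000,
    "Vendor B": 2_900_000,
    "Vendor C": 1_500_000,
    "Vendor D": 600_000,
    "Vendor E": 300_000,
    "Vendor F": 150_000,
}

total = sum(spend.values())
running, focus_tier = 0, []
for vendor, amount in sorted(spend.items(), key=lambda kv: kv[1], reverse=True):
    running += amount
    focus_tier.append(vendor)
    if running / total >= 0.80:
        break

# focus_tier now holds the vendors that get executive sponsors
# and the full vendor management program.
print(focus_tier)
```

In practice the cut would also pull in low-spend vendors delivering critical services, as the text notes; spend alone is only the starting filter.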

So what are the key elements of an effective vendor management program? First and foremost, there should be three levels of vendor management:

  • regular operational service management meetings
  • quarterly technical management sessions, and
  • executive sessions every six or twelve months.

The regular operational service management meetings – which occur at the line management level – ensure that regular service or product deliveries are occurring smoothly, issues are noted, and teams conduct joint working discussions and efforts to improve performance. At the quarterly management sessions, performance against contractual SLAs is reviewed, as well as progress against outstanding and jointly agreed actions. The actions should address issues noted at the operational level to improve performance. At the next level, the executive sessions will include a comprehensive performance review for the past 6 or 12 months as well as a survey completed by and for each firm. (The survey data to be collected will vary, of course, by the product or service being delivered.) Generally, you should measure along the following categories:

  • product or service delivery (on time, on quality)
  • service performance (on quality, identified issues)
  • support (time to resolve issues, effectiveness of support)
  • billing (accuracy, clarity of invoice, etc)
  • contractual (flexibility, rating of terms and conditions, ease of updates, extensions or modifications)
  • risk (access management, proper handling of data, etc)
  • partnership (willingness to identify and resolve issues, willingness to go above and beyond, how well the vendor understands your business and your goals)
  • innovation (track record of bringing ideas and opportunities for cost improvement or new revenues or product features)

Some of the data (e.g. service performance) will be summarized from operational data collected weekly or monthly as part of the ongoing operational service management activities. The operational data is supplemented by additional data and assessments captured from participants and stakeholders from both firms. It is important that the data collected be as objective as possible – so ratings that are high or low should be backed up with specific examples or issues. The data is then collated and filtered for presentation to a joint session of senior management representing their firms. The focus of the executive session is straightforward: to review how both teams are performing and to identify the actions that can enable the relationship to be more successful for both parties. The usual effect of a well-prepared assessment with data-driven findings is strong commitment and a re-doubling of effort to ensure improved performance.

Vendors rarely get clear, objective feedback from customers, and if your firm provides such valuable information, you will often be the first to reap the rewards. And by investing your time and effort into a constructive report, you will often gain an executive partner at your vendor willing to go the extra mile for your firm when needed. Lastly, the open dialogue will also identify areas and issues within your team and processes, such as poor specifications or cumbersome ordering processes that can easily be improved and yield efficiencies for both sides.

It is also worthwhile to use this supplier scorecard to rate the vendor against other similar suppliers. For example, you can show their total score in all categories against other vendors in an anonymized fashion (e.g., Vendor A, Vendor B, etc.), where each vendor can see its own score but also see other vendors doing better and worse. Such a position often brings out the competitive nature of any group, resulting in improved performance in the future.

Given the investment of time and energy by your team, the vendor management program should be focused on your top suppliers. Generally, this is the top 10 to 30 vendors, depending on your IT spend. The next tier of vendors (31 through 50 or 75) should get an annual or biannual review and risk assessment, but not the regular operational meetings or management assessments unless performance is below par. Such a vendor’s performance can often be turned around by applying the full program.

Another valuable practice, once your program is established and yielding constructive results, is to establish a vendor awards program. With the objective and thoughtful perspective you now have on your vendors, you can establish awards for your top suppliers – vendor of the year, vendor partner of the year, most improved vendor, most innovative, etc. Inviting the senior management of the vendors receiving awards to attend an awards dinner, with your firm’s senior management presenting the awards, will further spur both those who win and those who don’t. Those who win will pay attention to your every request; those who don’t will have their senior management focused on winning the award next year. The end result, from the weekly operational meetings to the regular management sessions to the annual gala, is that vendor management positively impacts your significant vendor relationships and enables you to drive greater value from your spend.

Of course, the vendor management process outlined here is a subset of the procurement lifecycle applied to technology. It complements the technology acquisition process and enables you to repair or improve and sustain vendor performance and quality levels for a significant and valuable gain for your company.

It would be great to hear about your experiences with leveraging vendor management.

Best, Jim Ditmore

 

Overcoming the Inefficient Technology Marketplace

The typical IT shop spends 60% or more of its budget on external vendors – buying hardware, software, and services. Globally, the $2 trillion IT marketplace (2013 estimate by Forrester) is quite inefficient: prices and discounts vary widely between purchasers, often not for reasons of volume or relationship. As a result, many IT organizations fail to effectively optimize their spend, often overpaying by 10%, 20%, or even much more.

Considering that IT budgets continue to be very tight, overspending your external vendor budget by 20% (a total budget overrun of 12%) means that you must reduce the remaining 40% of budget spend (which is primarily for staff) by almost one-third! What better way to get more productivity and results from your IT team than to spend only what is needed for external vendors and plow these savings back into IT staff and investments or to the corporate bottom line?
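The arithmetic behind that claim can be made concrete. The $100M budget below is purely illustrative; only the 60/40 split and the 20% overpay figure come from the text:

```python
# Worked example of the budget math above, using an illustrative $100M budget.
budget = 100.0                      # total IT budget, $M (hypothetical)
vendor_share, staff_share = 0.60, 0.40

vendor_overspend = 0.20 * vendor_share * budget   # 20% overpay on the vendor 60%
overrun_of_total = vendor_overspend / budget      # as a share of the whole budget
staff_cut_needed = vendor_overspend / (staff_share * budget)

print(f"Overrun: {overrun_of_total:.0%} of total budget")      # 12%
print(f"Offset needed: {staff_cut_needed:.0%} of staff budget") # 30%, nearly 1/3
```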

IT expenditures are easily one of the most inefficient areas of corporate spending due to opaque product prices and uneven vendor discounts. The inefficiency occurs across the entire spectrum of technology purchases – not just highly complex software purchases or service procurements. I learned from my experience in several large IT shops that there is rarely a clear rationale for the pricing achieved by different firms other than that they received what they competitively arranged and negotiated. To overcome this inefficient marketplace, the key prerequisite is to set up strong competitive playing fields for your purchases. With competitive tension, your negotiations will be much stronger, and your vendors will work to provide the best value. In several instances, when comparing prices and discounts between firms where I have worked that subsequently merged, it became clear that many IT vendors had no consistent pricing structures; in too many cases, the firm with the greater volume had a worse discount rate than the smaller-volume firm. The primary difference? The firm that arranged and negotiated robustly and competitively always had the better discount. The firms that based their purchases on relationships, or that had embedded technologies limiting their choices, typically ended up with technology pricing well over optimum market rates.

As an IT leader, to recapture the 6 to 12% of your total budget due to vendor overspend, you need to address inadequate technology acquisition knowledge and processes in your firm — particularly with your senior managers and engineers who are participating or making the purchase decisions. To achieve best practice in this area, the basics of a strong technology acquisition approach are covered here, and I will post on the reference pages the relevant templates that IT leaders can use to seed their own best practice acquisition processes. The acquisition processes will only work if you are committed to creating and maintaining competitive playing fields and not making decisions based on relationships. As a leader, you will need to set the tone with a value culture and focus on your company’s return on value and objectives – not the vendors’.

Of course, the technology acquisition process outlined here is a subset of the procurement lifecycle applied to technology. The technology acquisition process provides additional details on how to apply the lifecycle to technology purchases, leveraging the teams, and accommodating the complexities of the technology world. As outlined in the lifecycle, technology acquisition should then be complemented by a vendor management approach that repairs or sustains vendor performance and quality levels – this I will cover in a later post.

Before we dive into the steps of the technology acquisition process, what are the fundamentals that must be in place for it to work well? First, a robust ‘value’ culture must be in place. A ‘value’ culture is one where IT management (at all levels) is committed to optimizing its company’s spending in order to make sure that the company gets the most for its money. It should be part of the core values of the group (and even better — a derivative of corporate values). The IT management and senior engineers should understand that delivering strong value requires constructing competitive playing fields for their primary areas of spending. If IT leadership instead allows relationships to drive acquisitions, it quickly robs the organization of negotiating leverage, and cost increases will seep into acquisitions. IT vendors will rapidly adapt to how the IT team selects purchases — if the team is relationship-oriented, vendors will have lots of marketing events and will try to monopolize the decision makers’ time. If they must be competitive and deliver outstanding results, they will instead focus on getting things done and will try to demonstrate value. One barometer of how you conduct your purchases is the type of treatment you receive from your vendors. Commit to breaking out of the mold of most IT shops by replacing the cycle of relationship purchases and locked-in technologies with a ‘value’ culture and competitive playing fields.

Second, your procurement team should have thoughtful category strategies for each key area of IT spending (e.g., storage, networking equipment, telecommunications services). Generally, your best acquisition strategy for a category is to establish 2 or 3 strong competitors in a supply sector such as storage hardware. Because you will have leveled most of the technical hurdles that prevent substitution, your next significant acquisition could easily go to any of the vendors. In such a situation, you can drive all vendors to compete strongly and lower their pricing to win. Of course, such a strong negotiating position is not always possible due to your legacy systems, new investments, or limited actual competitors. For these situations, the procurement team should seek to understand the best pricing on the market and the critical factors the vendor seeks (e.g., market share, long-term commitment, marketing publicity, end-of-quarter revenue), and then use these to trade for more value for their company (e.g., price reductions, better service, long-term lower cost). This work should be done upfront, well before a transaction initiates, so that conditions favoring the customer are in place at negotiation time.

Third, your technology decision makers and your procurement team should be on the same page with a technology acquisition process (TAP). Your technology leads who make purchase decisions should work arm in arm with the procurement team in each step of the TAP. Below is a diagram outlining the steps of the process. A team can do very well simply by executing each of the steps as outlined. Even better results are achieved by understanding the nuances of negotiations, maintaining competitive tension, and driving value.

 

Here are further details on each TAP step:

A. Identify Need – Your source for new purchasing can come from the business or from IT. Generally, you would start at this step only if it is a new product or significant upgrade or if you are looking to introduce a new vendor (or vendors) to a demand area. The need should be well documented in business terms and you should avoid specifying the need in terms of a product — otherwise, you have just directed the purchase to a specific product and vendor and you will very likely overpay.

B. Define Requirements – Specify your needs and ensure they mesh with the overall technology roadmap that the architects have defined. Look to bundle or gather up needs so that you can attain greater volumes in one acquisition and possibly gain better pricing. Avoid specifying requirements in terms of products to prevent ‘directing’ the purchase to a particular vendor. Try to gather requirements in a rapid process (some ideas here) and avoid stretching this task out. If necessary, subsequent steps (including an RFI) can be used to refine requirements.

C. Analyze Options – Utilize industry research and high level alternatives analysis to down-select to the appropriate vendor/product pool. Ensure you maintain a strong competitive field. At the same time, do not waste time or resources for options that are unlikely.

D, E, F, G. Execute these four steps concurrently. First, ensure the options will all meet critical governance requirements (risk, legal, security, architectural), and then drive the procurement selection process as appropriate based on the category strategy. As you narrow or extend options, conduct appropriate financial analysis. If you do wish to leverage proofs of concept or other trials, ensure you have pricing well established before the trial. Otherwise, you will have far less leverage in vendor negotiations after a successful trial.

H. Create the Contract – Leverage robust terms and conditions via well-thought-out contract templates to minimize the work and ensure higher-quality contracts. At the same time, don’t forgo the business objectives of price, quality, and capability by trading them away for some unlikely liability term. The contract should be robust and fair, with highly competitive pricing.

I. Acquire the Product – This is the final step of the procurement transaction, and it should be as accurate and automated as possible. Ensure proper receipt and sign-off as well as prompt payment. Often a further 1% discount can be achieved with prompt payment.

J & K. These steps move into lifecycle work to maintain good vendor performance and manage the assets. Vendor management will be covered in a subsequent post; it is an important activity that corrects or sustains vendor performance at high levels.

By following this process and ensuring your key decision makers set a competitive landscape and hold your vendors to high standards, you should be able to achieve better quality, better services, and significant cost savings. You can then plow these savings back into strategic investments (including more staff) or reduce IT costs for your company. And at these levels, that can make a big difference.

What are some of your experiences with technology acquisition and suppliers? How have you tackled or optimized the IT marketplace to get the best deals?

I look forward to hearing your views. Best, Jim Ditmore

Moving from Offshoring to Global Service Centers II

As we covered in our first post on this topic, since the mid-90s, companies have used offshoring to achieve cost and capacity advantages in IT. Offshoring was a favored option to address Y2K issues and has continued to expand at a steady rate throughout the past twenty years. But many companies still approach offshoring as ‘out-tasking’ and fail to leverage the many advantages of a truly global, high-performance workforce.

With out-tasking, companies take a limited set of functions or ‘tasks’ and move these to the offshore team. They often achieve initial economic advantage through labor arbitrage, and perhaps some improvement in quality as the tasks are documented and standardized to make it easier to transition the work to the new location. This constitutes the first level of a global team: offshore service provider. But larger benefits are often lost, and only select organizations have matured the model to its highest performance level as ‘global service centers’.

So, how do you achieve high performance global service centers instead of suboptimal offshore service providers? As discussed previously, you must establish the right ‘global footprint’ for your organization. Here we will cover the second half of getting to global service centers:  implementing a ‘global team’ model. Combined with the right footprint, you will be able to achieve global service centers and enable competitive advantage.

Global team elements include:

  • consistent global goals and vision across global sites with commensurate rewards and recognition by site
  • a matrix team structure that enables both integrated processes and local and global leadership and controls
  • clarity on roles based on functional responsibility and strategic competence rather than geographic location
  • the opportunity for growth globally from a junior position to a senior leader
  • close partnership with local universities and key suppliers at each strategic location

To understand the variation in performance for the different structures, first consider the effectiveness of your entire team – across the globe – on several dimensions:

  • level of competence (skill, experience)
  • productivity, ability to improve current work
  • ownership and engagement
  • customization and innovation contributions
  • source of future leaders

For an offshore service provider, where work has been out-tasked to a particular site, the team can provide similar or, in some cases, better levels of competence. Because of the lower cost in the offshore location, if there is adequate skilled labor, the offshore service provider can more easily acquire such skill and experience within a given budget. A recognizable global brand helps with this talent acquisition. But since only tasks are sent to the center, productivity and continuous improvement can only be applied to the portions of the process within the center. Requirements, design, and other early-stage activities are often left primarily to the ‘home office’, with little ability for the offshore center to influence them. Further, process standards and ownership typically remain at the home office as well, even though most implementation may be done at the offshore service provider. This creates a further gap, where the implications of new standards or home office process ‘improvements’ must be borne by the offshore service provider even if the theory does not work well in actual practice. And since implementation and customer interfaces are often limited as well, the offshore service provider receives little real feedback, further constraining the improvement cycle.

For the offshore service provider, the ability to improve processes and productivity is limited to local optimization only, and capabilities are often at the whim of poor decisions from a distant home office. More comprehensive productivity and process improvements can be achieved by devolving competency authority to the primary team executing the work. So, if most testing is done in India, then testing process ownership and testing best-practices responsibility should reside in India. By shifting process ownership closer to the primary team, there will be a natural interchange and flow of ideas and feedback that will result in better improvements, better ownership of the process, and better results. The process can and should still be consistent globally; the primary competency ownership just resides at its primary practice location. This will result in a highly competent team striving to be among the best in the world. Even better, the best test administrators can now aspire to become test best-practice experts and see a longer career path at the offshore location. Their productivity and knowledge levels will improve significantly. These improvements will reduce attrition and increase employee engagement in the test team, not just in India but globally. In essence, by moving from proper task placement to proper competency placement, you enable both the offshore site and the home sites to perform better on team skill and experience as well as team productivity and process improvement.

Proper competency placement begins the movement of your sites from offshore service providers to global service excellence. Couple competency placement with transparent reporting on the key metrics for the selected competencies (e.g., all test teams, across the globe, should report based on best-in-class operational metrics) and drive improvement cycles (local and global) based on findings from the metrics. Full execution of these three adjustments will enable you to achieve sustained productivity improvements of 10 to 30% and lower attrition rates (of your best staff) by 20 to 40%.

It is important to understand that pairing competency leadership with primary execution is required in IT disciplines much more so than in other fields, due to the rapid fluidity and advance of technology practices, the frequent need to engage multiple levels of the same expertise to resource and complete projects, and the ambiguity and lack of clear industry standards in many IT engineering areas. In many other industries (manufacturing, chemicals, petroleum), stratification between engineering design and implementation is far more rigorous and feasible, given the standardization of roles and slower pace of change. Thus, those organizations can operate far closer to optimum even with task offshoring, in a way that is just not possible in the IT space over any sustained time frame.

To move beyond global competency excellence, the structures around functions (the entire processes, teams, and leadership that deliver a service) must be optimized and aligned. First and foremost, goals and agendas must be set consistently across the globe for all sites. There can be no sub-agendas where offshore sites focus only on meeting their SLAs or capturing a profit; instead, the goals must be the appropriate IT goals globally. (Obviously, for tax reasons, certain revenue and profit overheads will apply, but that is an administrative process, not an IT goal.)

Functional optimization is achieved by integrating the functional management across the globe where it becomes the primary management structure. Site and resource leadership is secondary to the functional management structure. It is important to maintain such site leadership to meet regulatory and corporate requirements as well as provide local guidance, but the goals, plans, initiatives, and even day-to-day activities flow through a natural functional leadership structure. There is of course a matrix management approach where often the direct line for reporting and legal purposes is the site management, but the core work is directed via the functional leadership. Most large international companies have mastered this matrix management approach and staff and management understand how to properly work within such a setup.

It is worth noting that within any large services corporation, ‘functional’ management will reign supreme over ‘site’ management. For example, in a debate deciding the critical projects to be tackled by the IT development team, it is the functional leaders, working closely with the global business units, who will define the priorities and make the decisions. And if the organization has a site-led offshore development shop, that shop will find out about the resources required long after the decisions are made (and be required to simply fulfill the task). Site management is simply viewed as not having the worthy knowledge or authority to participate in any major debate. Thus, if you have your offshore centers singularly aligned to site leadership all the way up the corporate chain, their ability to influence or participate in corporate decisions is minimal. However, if you have matrixed the structure to include a primary functional reporting mechanism, then the offshore team will have some level of representation. This increases particularly as managers and senior managers populate the offshore site and exercise functional control back into home offices or other sites. Thus the testing team discussed earlier, if primarily located in India, would have not just responsibility for the competency and process direction and goals, but would also have the global test senior leader at its site, with test teams reporting in from the home office and other sites. This structure enables functional guidance and leadership from a position of strength. Now priorities, goals, initiatives, and functional direction can flow smoothly from around the globe to best inform the function. Staff in offshore locations now feel committed to the function, resulting in far more energy and innovation arising from these sites. The corporation also benefits from having a much broader pool of strong candidates for leadership positions – and not just more diverse candidates, but candidates who understand a global operating model and are comfortable reaching across time zones and cultures: just what is needed to compete globally in the business. The chart below represents this transition from task to competency to function optimization.

Global Team Progression

If you combine functional optimization with a highly competitive site structure, you can typically organize each key function in 2 or 3 locations where its global functional leadership will reside. This then adds time-of-day and business continuity advantages. By having the same function at a minimum of two sites, even if one site is down, the other can operate. Or IT work can be started at one site and handed off at the end of the day to the next site that is just beginning its day (in fact, most world-class IT command centers operate this way). Thus no one ever works the night shift. And time to market can be greatly improved by leveraging such time-zone advantages.

While this is understandably complex, as you are optimizing across many variables (site location, contractor and skill mix, location cost, functional placement, competency placement, talent and skill availability), IT teams that achieve a global team model and put global service centers in place reap substantial benefits in cost, quality, innovation, and time to market.

To properly weigh these factors, I recommend a workforce plan approach where each function or sub-function maps out its staff and leaders across site, contractor/staff mix, and seniority mix. Lay out the target that optimizes across all key variables (cost, capability, quality, business continuity, and so on) and then construct a quarterly trajectory of the function’s composition from the current state until it achieves the target. Balance for critical mass, leadership, and likely talent sources. Now you have a draft plan of the moves and transactions that must be made to meet your target. Every staff transaction (hires, rotations, training, layoffs, etc.) going forward should be weighed against whether it meshes with the workforce plan trajectory. Substantial progress toward an optimized global team can then be made through a rising tide of accumulated transactions executed in a strategic manner. These plans must be accompanied, or even introduced, by an overall vision of the global team and reinforcement of the goals and principles required to enable such an operating model. But once the plan is laid, you and your organization can expect to achieve far better capabilities and results than by just dispersing tasks and activities around the world.
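As a rough sketch of the quarterly trajectory described above (all headcounts and site names are hypothetical), one simple approach is to interpolate linearly from the current composition to the target, then check each planned staff transaction against the snapshot for the quarter in which it lands:

```python
# Sketch: quarterly workforce-plan trajectory for one function, stepping
# linearly from current composition to target (figures are hypothetical).
current = {"home_office": 40, "offshore": 20, "contractors": 20}
target  = {"home_office": 25, "offshore": 45, "contractors": 10}
quarters = 6  # planning horizon

trajectory = []
for q in range(1, quarters + 1):
    frac = q / quarters
    snapshot = {site: round(current[site] + frac * (target[site] - current[site]))
                for site in current}
    trajectory.append(snapshot)

# Each hire, rotation, or exit is then weighed against the snapshot for
# its quarter; the final quarter matches the target composition exactly.
print(trajectory[-1])
```

A real plan would rarely be strictly linear (hiring pipelines, training lead times, and critical mass constraints all bend the curve), but even this simple version gives every staff transaction a trajectory to be checked against.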

In today’s global competition, this global team approach is absolutely key for competitive advantage, and essential for competitive parity if you are, or aspire to be, a top international company. It would be great to hear your perspectives and any feedback on how you or your company has been successful (or unsuccessful) at achieving a global team.

I will add a subsequent reference page with Workforce Plan templates that can be leveraged by teams wishing to start this journey.

Best, Jim Ditmore

Keeping Score and What’s In Store for 2014

Now that 2013 is done, it is time to review my predictions from January last year. For those keeping score, I had six January predictions for Technology in 2013:

6. 2013 is the year of the ‘connected house’ as standards and ‘hub’ products achieve critical mass. Score: Yes! – A half dozen hubs were introduced in 2013 including Lowe’s and AT&T’s as well as SmartThings and Nest. The sector is taking off but is not quite mainstream as there is a bit of administration and tinkering to get everything hooked in. Early market share could determine the standards and the winners here.

5. The IT job market will continue to tighten, requiring companies to invest in growing talent as well as higher IT compensation. Score: Nope! – Surprisingly, while the overall unemployment rate declined from 7.9% to 7.0% over 2013, the tech sector had a slight uptick from 3.3% to 3.9% in the 3rd quarter (4Q numbers not yet available). However, this uptick seems to be caused by more tech workers switching jobs (and thus quitting old jobs), perhaps due to more confidence and better pay elsewhere. Look for a continued tight supply of IT workers, as the Labor department predicts that another 1.4M IT workers will be needed by 2020, while only 400K IT graduates will be produced in that time!

4. Fragmentation will multiply in the mobile market, leaving significant advantage to Apple and Samsung as the only companies commanding premiums for their products. Score: Yes and no – Fragmentation did occur in the Android segment, but the overall market consolidated greatly. And Samsung and Apple continued in 2013 to capture the lion’s share of all profits from mobile and smart phones. Android picked up market share (and fragmented into more players), as did Windows Phone, notably in Europe. Apple dipped some, but the greatest drop was in ‘other’ devices (Symbian, Blackberry, etc.). So expect a 2014 market dominated by Android and iOS, with Windows Phone a distant third. And Apple will be hard pressed to come out with lower cost volume phones to encourage entry into their ecosystem. Windows Phone will need to grow well beyond current levels, especially in the US or China, in order to truly compete.

3. HP will suffer further distress in the PC market, both from tablet cannibalization and aggressive performance from Lenovo and Dell. Score: Yes! – Starting with the 2nd quarter of 2013, Lenovo overtook HP as the worldwide leader in PC shipments and then widened its lead in the 3rd quarter. Dell continued to outperform the overall market and finished a respectable second in the US and third in the world. Overall PC shipments continued to slide, with an 8% drop from 2012, in large part due to tablets. Windows 8 did not help shipments, and there does not look to be a major resurgence in the market in the near term. Interestingly, as with smart phones, there is a major consolidation occurring around the top 3 vendors in the market; again, ‘other’ is the biggest loser of market share.

2. The corporate server market will continue to experience minimal increases in volume and flat or downward pressure on revenue. Score: Yes! – Server revenues declined year over year from 2012 to 2013 in the first three quarters (declines of 5.0%, 3.8%, and 2.1% respectively). Units shipped treaded water with a decline in the first quarter of .7%, an uptick in the second quarter of  4%, and a slight increase in the third quarter of 2%. I think 2014 will show more robust growth with greater business investment.

1. Microsoft will do a Coke Classic on Windows 8. Score: Yes and no – Windows 8.1 did put back the Start button, but retained much of the ‘Metro’ interface. Perhaps best cast as the ‘Great Compromise’, Windows 8.1 was a half step back to the ‘old’ interface and a half step forward to a better integrated user experience. We will see how the ‘one’ user experience across all devices works for Microsoft in 2014.

So, final score was 3 came true, 2 mostly came true, and 1 did not – for a total score of 4. Not too bad though I expected a 5 or 6 🙂 . I will do one re-check of the score when the end of year IT unemployment figures come out to see if the strengthening job market made up for the 3rd quarter dip.

As an IT manager, it is important to have strong, robust competition – it was good to see both Microsoft and HP come out swinging in 2013. Maybe they did not land many punches, but it is good to have them back in the game.

Given it is the start of the year, I thought I would map out some of the topics I plan to cover this coming year in my posts. As you know, the focus of Recipe for IT is useful best practice techniques and advice that work in the real world and enable IT managers to be more successful. In 2013, we had a very successful year with over 43,000 views from over 150 countries (most are from the US, UK, India, and Canada). And I wish to thank the many who have contributed comments and feedback — it has really helped me craft a better product. So with that in mind, please provide your perspective on the upcoming topics, especially if there are areas you would like to see covered that are not.

For new readers, I have structured the site into two main areas: posts – which are short, timely essays on a particular topic – and reference pages – which often take a post and provide a more structured and possibly deeper view of the topic. The pages are intended to be an ongoing reference of best practice for you to leverage. You can reach the reference pages from the drop down links on the home page.

For posts, I will continue the discussion on cloud and data centers. I will also explore flash storage and the continuing impact of mobile. Security will invariably be a topic. Some of you may have noticed that some posts are placed first on InformationWeek and then subsequently here. This helps increase the exposure of Recipe for IT and also ensures good editing (!).

For the reference pages, I have recently refined and will continue to improve the production and quality areas. Look also for updates and improvements to leadership as well as the service desk.

What other topics would you like to see explored? Please comment and provide your feedback and input.

Best, and I wish you a great start to 2014,

Jim Ditmore

How Did Technology End Up on the Sunday Morning Talk Shows?

It has been two months since the Healthcare.gov launch, and by now nearly every American has heard of or witnessed the poor performance of the websites. Early on, only one of every five users was able to actually sign in to Healthcare.gov, while poor performance and unavailable systems continue to plague the federal and some state exchanges. Performance was still problematic several weeks into the launch, and even as of Friday, November 30, the site was down for 11 hours for maintenance. As of today, December 1, the promised ‘relaunch day’, it appears the site is ‘markedly improved’ but there are plenty more issues to fix.

What a sad state of affairs for IT. So, what do the Healthcare.gov issues teach us about large project management and execution? Or further, about quality engineering and defect removal?

Soon after the launch, former federal CTO Aneesh Chopra, in an Aspen Institute interview with The New York Times‘ Thomas Friedman, shrugged off the website problems, saying that “glitches happen.” Chopra compared the Healthcare.gov downtime to the frequent appearances of Twitter’s “fail whale” as heavy traffic overwhelmed that site during the 2010 soccer World Cup.

But given that the size of the signup audience was well known and that website technology is mature and well understood, how could the government create such an IT mess? Especially given how much lead time the government had (more than three years) and how much it spent on building the site (estimated between $300 million and $500 million).

Perhaps this is not quite so unusual. Industry research suggests that large IT projects are at far greater risk of failure than smaller efforts. A 2012 McKinsey study revealed that 17% of IT projects budgeted at $15 million or higher go so badly as to threaten the company’s existence, and more than 40% of them fail. As bad as the U.S. healthcare website debut is, there are dozens of examples, both government-run and private, of similar debacles.

In a landmark 1995 study, the Standish Group established that only about 17% of IT projects could be considered “fully successful,” another 52% were “challenged” (they didn’t meet budget, quality or time goals) and 30% were “impaired or failed.” In a recent update of that study conducted for ComputerWorld, Standish examined 3,555 IT projects between 2003 and 2012 that had labor costs of at least $10 million and found that only 6.4% of them were successful.

Combining the inherent problems associated with very large IT projects with outdated government practices greatly increases the risk factors. Enterprises of all types can track large IT project failures to several key reasons:

  • Poor or ambiguous sponsorship
  • Confusing or changing requirements
  • Inadequate skills or resources
  • Poor design or inappropriate use of new technology

Unfortunately, strong sponsorship and solid requirements are difficult to come by in a political environment (read: Obamacare), where too many individual and group stakeholders have reason to argue with one another and change the project. Applying the political process of lengthy debates, consensus-building and multiple agendas to defining project requirements is a recipe for disaster.

Furthermore, based on my experience, I suspect the contractors doing the government work encouraged changes, as they saw an opportunity to grow the scope of the project with much higher-margin work (change orders are always much more profitable than the original bid). Inadequate sponsorship and weak requirements were undoubtedly combined with a waterfall development methodology and an overall big bang approach, as usually specified by government procurement methods. In fact, early testimony by the contractors cited ‘a lack of testing on the full system and last-minute changes by the federal agency’.

Why didn’t the project use an iterative delivery approach to hone requirements and interfaces early? Why not start with healthcare site pilots and betas months or even years before the October 1 launch date? The project was underway for three years, yet nothing was made available until October 1. And why did the effort leverage only an already occupied pool of virtualized servers that had little spare capacity for a major new site? For less than 10% of the project cost, a massive dedicated server farm could have been built. Further, there was no backup site, nor were any monitoring tools implemented. And where was the horizontal scaling design within the application to enable easy addition of capacity for unexpected demand? It is disappointing to see such basic misses in non-functional requirements and design in a major program for a system that is not that difficult or unique.

These basic deliverables and approaches appear to have been entirely missed in the implementation of the website. Further, the website code appears to have been quite sloppy, not even using common caching techniques to improve performance. Thus, in addition to suffering from weak sponsorship and ambiguous requirements, this program failed to leverage well-known best practices for the technology and design.
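To illustrate how basic such caching is, here is a hypothetical sketch (not the site’s actual stack or code) of a simple time-based cache that would keep repeated requests for semi-static content, such as a state’s plan listing, from hitting the backend every time:

```python
import time
from functools import wraps

def ttl_cache(seconds):
    """Cache a function's results for `seconds` so repeated requests for the
    same semi-static content don't recompute it on every page view."""
    def decorator(fn):
        store = {}  # args -> (expiry_time, value)
        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]              # still fresh: serve from cache
            value = fn(*args)              # stale or missing: recompute
            store[args] = (now + seconds, value)
            return value
        return wrapper
    return decorator

backend_calls = []

@ttl_cache(seconds=300)
def plan_listing(state):
    backend_calls.append(state)            # stands in for an expensive backend query
    return f"plans-for-{state}"
```

Production sites typically get this for free from an HTTP cache or CDN in front of the application; the point is that the technique is decades old and cheap to apply.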

One would have thought that, given the scale and expenditure of the program, top technical resources would have been allocated to ensure these practices were used. The feds are now scrambling with a “surge” of tech resources for the site. And while the new resources and leadership have made improvements so far, the surge will bring its own problems. It is very difficult to effectively add resources to an already large program. New ideas introduced by the ‘surge’ resources may not be accepted or easily integrated. And if the issues are deeply embedded in the system, it will be difficult for the new team to fully fix the defects. For every 100 defects identified in the first few weeks, my experience with quality suggests there are 2 or 3 times more defects buried in the system. Furthermore, if the project couldn’t handle the “easy” technical work (sound website design and horizontal scalability), one wonders how the team will handle the more difficult challenges of data quality and security.

These issues will become more apparent in the coming months when the complex integration with backend systems from other agencies and insurance companies becomes stressed. And already the fraudsters are jumping into the fray.

So, what should be done and what are the takeaways for an IT leader? Clear sponsorship and proper governance are table stakes for any big IT project, but in this case more radical changes are in order. Why have all 36 states and the federal government roll out their healthcare exchanges in one waterfall or big bang approach? The sites that are working reasonably well (such as the District of Columbia’s) developed them independently. Divide the work up where possible, and move to an iterative or spiral methodology. Deliver early and often.

Perhaps even use competitive tension by having two contractors compete against each other for each such cycle. Pick the one that worked the best and then start over on the next cycle. But make them sprints, not marathons. Three- or six-month cycles should do it. The team that meets the requirements, on time, will have an opportunity to bid on the next cycle. Any contractor that doesn’t clear the bar gets barred from the next round. Now there’s no payoff for a contractor encouraging endless changes. And you have broken up the work into more doable components that can then be improved in the next implementation.

Finally, use only proven technologies. And why not ask the CIOs or chief technology architects of a few large-scale Web companies to spend a few days reviewing the program and designs at appropriate points. It’s the kind of industry-government partnership we would all like to see.

If you want to learn more about how to manage (and not to manage) large IT programs, I recommend “Software Runaways,” by Robert L. Glass, which documents some spectacular failures. Reading the book is like watching a traffic accident unfold: It’s awful but you can’t tear yourself away. Also, I expand on the root causes of and remedies for IT project failures in my post on project management best practices.

And how about some projects that went well? Here is a great link to the 10 best government IT projects in 2012!

What project management best practices would you add? Please weigh in with a comment below.

Best, Jim Ditmore

This post was first published in late October in InformationWeek and has been updated for this site.

Whither Virtual Desktops?

The enterprise popularity of tablets and smartphones at the expense of PCs and other desktop devices is also sinking desktop virtualization. Beyond clearly cannibalizing PC sales, tablets and smartphones, along with changing device economics, are also undermining corporate desktop virtualization (VDI).

The heyday of virtual desktop infrastructure came around 2008 to 2010, as companies sought to cut their desktop computing costs — VDI promised savings from 10% to as much as 40%. Those savings were possible despite the additional engineering and server investments required to implement the VDI stack. Some companies even anticipated replacing up to 90% of their PCs with VDI alternatives. Companies sought to reduce desktop costs and address specific issues not well-served by local PCs (e.g., smaller overseas sites with local software licensing and security complexities).

But something happened on the way to VDI dominance: the market changed faster than VDI matured. Employee demand for mobile devices, in line with the BYOD phenomenon, has refocused IT shops on delivering mobile device management capabilities so employees can securely use their smartphones for work, rather than on VDI. On-the-go employees are gravitating toward new lightweight laptops, a variety of tablets and other non-desktop innovations that aren’t VDI-friendly. Mobile employees want to use multiple devices; they don’t want to be tied down to a single VDI-based interface. Given that the VDI interface is at best cumbersome on a touch device running an OS other than Windows, there will be less and less demand for VDI as the way to interconnect. And as the dominance of these highly mobile smartphones and tablets only increases in the next few years, with the client device war between Apple, Android, and Microsoft (Nokia) heating up further (and producing better and cheaper products), VDI’s appeal will fall even farther.

Meantime, PC prices, both desktop and laptop, which have declined steadily by 30-40% over the past 4 years (other than Apple’s products, of course), will drop even faster. With the decline in shipments these past 18 months, the entire industry is over capacity, and the only way out of the situation is to spur demand and consumer interest in PCs through further cost reductions. (Note that the answer is not that Windows 8 will spur demand.) Already Dell and Lenovo are using lower prices to try to hold their volumes steady. And with other devices entering the market (e.g., smart TVs, smart game stations, etc.), it will become a very bloody marketplace. The end result for IT shops will be pretty slick $300 laptops that come fully loaded with Windows (perhaps even Office). At those prices, VDI will have minimal or no cost advantage, especially taking into account the backend VDI engineering costs. And if most employees prefer a fully equipped $300 laptop or tablet, IT shops will be hard pressed to pass that up and impose VDI. In fact, by late 2014, corporate IT shops could be faced with their VDI solutions costing more than traditional client devices (e.g., that $300 laptop). This is because the major components of VDI costs (servers, engineering work, and support) will not drop nearly as quickly as the distressed-market PC costs.

There is no escaping the additional engineering time and attention VDI requires. The complex stack (either Citrix or VMware) still requires more engineering than a traditional solution. And with this complexity, there will still be bugs between the various client, VDI, and server layers that impact user experience. Recent implementations still show far too many defects between the layers. At Allstate, we have had more than our share of defects in our recent rollout between the virtualization layer, Windows, and third-party products. And this is for what should, by now, be a mature technology.

Faced with greater costs, greater engineering resources (which are scarce) and employee demand for the latest mobile client devices, organizations will begin to throw in the towel on VDI. Some companies now deploying will reduce the scope of current VDI deployments. Some now looking at VDI will jump instead to mobile-only alternatives more focused on tablets and smartphones. And those with extensive deployments will allow significant erosion of their VDI footprint as internal teams opt for other solutions, employee demand moves to smartphones and tablets or lifecycle events occur. This is a long fall from the lofty goals of 90% deployment from a few years ago. IT shops do not want to be faced with both supporting VDI for an employee who also has a tablet, laptop or desktop solution because it essentially doubles the cost of the client technology environment. In an era of very tight IT budgets, excess VDI deployments will be shed.

One of the more interesting phenomena in the rapidly changing world of technology is when a technology wave gets overtaken well before it peaks. This has occurred many times before (think optical disk storage in the data center), but perhaps most recently with netbooks, whose primary advantages of cost and simplicity were overwhelmed by smartphones from below and ultrabooks from above. Carving out a sustainable market niche on cost alone in the technology world is a very difficult task, especially when you consider that you are reversing long term industry trends.

Over the past 50 years of computing history, the intelligence and capability has been drawn either to the center or to the very edge. In the 60s, mainframes were the ‘smart’ center and 3270 terminals were the ‘dumb’ edge device. In the 90s, client computing took hold and the ‘edge’ became much smarter with PCs but there was a bulging middle tier of the three tier client compute structure. This middle tier disappeared as hybrid data centers and cloud computing re-centralized computing. And the ‘smart’ edge moved out even farther with smartphones and tablets. While VDI has a ‘smart’ center, it assumes a ‘dumb’ edge, which goes against the grain of long term compute trends. Thus the VDI wave, a viable alternative for a time, will be dissipated in the next few years as the long term compute trends overtake it fully.

I am sure there will still be niche applications, like offshore centers (especially where VDI also enables better control of software licensing), and there will still be small segments of the user population that swear by the flexibility of accessing their desktop from anywhere they can log in without carrying anything, but these are long term niches. Long term, VDI solutions will hold a smaller and smaller portion of device share, perhaps 10%, maybe even 20%, but not more.

What is your company’s experience with VDI? Where do you see its future?

Best, Jim Ditmore

 This post was first published in InformationWeek on September 13, 2013 and has been slightly revised and updated.

Getting to Private Cloud: Key Steps to Build Your Cloud

Now that I am back from summer break, I want to continue to further the discussion on cloud and map out how medium and large enterprises can build their own private cloud. As we’ve discussed previously, software-as-a-service, engineered stacks and private cloud will be the biggest IT winners in the next five to ten years. Private clouds hold the most potential — in fact, early adopters such as JP Morgan Chase and Fidelity are seeing larger savings and greater benefits than initially anticipated.

While savings is a key driver for moving to private cloud, shorter development cycles and faster time to market are turning out to be more significant, and more valuable, to early adopter firms than initially estimated. And it is not just a speed improvement but a qualitative one: smaller projects and riskier pilots can be trialled with far greater speed and nominal cost. This allows a ‘fast fail’ approach to corporate innovation that greatly speeds the selection process, avoids extensive wasted investment in lengthier traditional pilots (that would have failed anyway), and greatly improves time to market for those ideas that are successful.

As for the larger savings, early implementations at scale are seeing savings well in excess of 50%. This is well beyond my estimate of 30% and is occurring in large part because of the vastly reduced labor requirements to build and administer a private cloud versus traditional infrastructure.

So with greater potential benefits, how should an IT department go about building a private cloud? The fundamental building block is a virtualized platform built on commodity servers and open systems. And of course you need the server engineering and administration expertise to support the platform. There’s also a strong early trend toward leveraging open source software for private clouds, from the Linux operating system to OpenNebula and Eucalyptus for infrastructure management. But just having a virtualized server platform does not make a private cloud; several additional elements are required.

First, establish a set of standardized images that constitute most of the stack. Preferably, that stack will go from the hardware layer to the operating system to the application server layer, and it will include systems management, security, middleware and database. Ideally, go with a dozen or fewer server images and certainly no more than 20. Consider everything else to be custom and treated separately and differently from the cloud.

Once you have established your target set of private cloud images, you should build a catalogue and ordering process that is easy, rapid, and transparent. The costs should be clear, and the server units should be processor-months or processor-weeks. You will need to couple the catalogue with highly automated provisioning and de-provisioning. Your objective should be to deliver servers quickly, certainly within hours, preferably within minutes (once the costs are authorized by the customer). And de-provisioning should be just as rapid and regular. In fact, you should offer automated ‘sunset’ servers in test and development environments (e.g., 90 days after the servers are allocated, they are automatically returned to the pool). I strongly recommend well-published and clear cost and allocation reporting to drive the right behaviors among your users. It will encourage quicker adoption, better and more efficient usage, and rapid turn-in when servers are no longer needed. With these four prerequisites in place (standard images; a catalogue and easy ordering process; clear costs and allocations; and automated provisioning and de-provisioning) you are ready to start your private cloud.
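To make the catalogue, provisioning, and sunset ideas concrete, here is a minimal sketch of a catalogue-driven provisioning service. The image names, costs, and 90-day window are illustrative assumptions for this example, not any specific product’s API:

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical standard-image catalogue (keep it to a dozen or fewer images).
CATALOGUE = {
    "linux-web-small": {"cost_per_month": 120},
    "linux-app-medium": {"cost_per_month": 260},
    "linux-db-large": {"cost_per_month": 540},
}

SUNSET_DAYS = 90  # test/dev servers auto-return to the pool after 90 days

@dataclass
class Allocation:
    image: str
    environment: str  # "dev", "test", or "prod"
    provisioned: date

    def is_expired(self, today: date) -> bool:
        # Only non-production allocations are sunset automatically.
        if self.environment == "prod":
            return False
        return today >= self.provisioned + timedelta(days=SUNSET_DAYS)

def provision(image: str, environment: str, today: date) -> Allocation:
    """Reject anything not in the standard catalogue; custom requests go
    through the (fully charged) traditional process instead."""
    if image not in CATALOGUE:
        raise ValueError(f"{image} is custom; route via the traditional process")
    return Allocation(image, environment, today)

def sweep(allocations: list[Allocation], today: date) -> list[Allocation]:
    """Automated de-provisioning: drop expired test/dev servers."""
    return [a for a in allocations if not a.is_expired(today)]
```

A nightly `sweep` over all allocations is what makes de-provisioning as rapid and regular as provisioning, rather than relying on users to remember to give servers back.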

Look to build your private cloud in parallel with your traditional data center platforms. There should be both a development and test private cloud as well as a production private cloud. Seed the cloud with an initial investment of servers of each standard type. Then transition demand into the private cloud as new projects initiate, and proceed to grow it project by project.

You could begin by routing small and medium-size projects to the private cloud environment and, as it builds scale and provisioning kinks are ironed out, migrate more and more server requests until nearly all are routed through your private cloud path. As you achieve scale and prove out your ordering, provisioning, and de-provisioning processes, you can begin to tighten the criteria for projects to proceed with traditional custom servers. Within 6 months, custom, traditional servers should be the rare exception and should be charged fully for the excess costs they generate.

Once the private cloud is established, you can verify the cost savings and advantages. And there will be additional advantages, such as improved time to market, because server deployment is no longer the long pole in the tent for your development efforts. Well-armed with this data, you can circle back and tackle existing environments and legacy custom servers. While a standalone platform transition is often not a good investment, a transition to private cloud during another event (e.g., a major application release or server end-of-life migration) should easily become a winning investment. A few early adopters (such as JPMC and Fidelity) are seeing outsized benefits and strong developer push into these private cloud environments. So, if you build it well, you should be able to reap the same advantages.

How is your cloud journey proceeding? Are there other key steps necessary to be successful? I look forward to hearing your perspective.

Best, Jim Ditmore

 

Looking to Improve IT Production? How to Start

Production issues, as Microsoft and Google can tell you, impact even cloud email apps. A few weeks ago, Microsoft took an entire weekend to fully recover its cloud Outlook service. Perhaps you noted the issues earlier this year in financial services, where Bank of America experienced internet site availability issues. Unfortunately for Bank of America, that was their second outage in 6 months, though they are not alone in having problems, as Chase suffered a similar production outage on their internet services the following week. And these are regular production issues, not the unavailability of websites and services due to a series of DDoS attacks.

Perhaps 10 or certainly 15 years ago, such outages with production systems would have resulted in far less notice by customers, as front office personnel would have worked alternate systems and manual procedures until the systems were restored. But with customers now accessing the heart of most companies’ systems through internet and mobile applications, typically on a 7×24 basis, it is very difficult to avoid direct and widespread impact to customers in the event of a system failure. Your production performance becomes very evident to your customers. And your customers’ expectations have continued to increase, such that they expect your company and your services to be available pretty much whenever they want to use them. And while being available is not the only attribute that customers value (usability, features, service, and pricing factor in importantly as well), companies that consistently meet or exceed consumer availability expectations gain a key edge in the market.

So how do you deliver to current and rising expectations for the availability of your online and mobile services? And if BofA and Chase, large organizations that offer dozens of services online and have massive IT departments, have issues delivering consistently high availability, how can smaller organizations deliver compelling reliability?

And often, the demand for high availability must be achieved in an environment where ongoing efficiencies have eroded the production base and a tight IT labor market has further complicated obtaining adequate expertise. If your organization is struggling with availability or you are looking to achieve top quartile performance and competitive service advantage, here’s where to start:

First, understand that availability, at its root, is a quality issue. And quality issues can only be changed if you address all aspects. You must set quality and availability as a priority, as a critical and primary goal for the organization. And you will need to ensure that incentives and rewards are aligned to your team’s availability goal.

Second, you will need to address the IT change processes. You should look to implement an ITSM change process based on ITIL. But don’t wait for a fully defined process to be implemented. You can start by limiting changes to appropriate windows. Establish release dates for major systems and accompanying subsystems. Avoid changes during key business hours or just before the start of the day. I still remember the ‘night programmers’ at Ameritrade at the beginning of our transformation there. Staying late one night as CIO in my first month, I noticed two guys come in at 10:30 PM. When I asked what they did, they said, ‘We are the night programmers. When something breaks with the nightly batch run, we go in and fix it.’ And this was done with no change records, minimal testing, and minimal documentation. Of course, my hair stood on end hearing this. We quickly discontinued that practice and instead made changes as a team, after they were fully engineered and tested. I would note that combining this action with a number of the other measures mentioned here enabled us to quickly reach a stable platform with the best track record for availability among all online brokerages.

Importantly, you should ensure that adequate change review and documentation is being done by your teams for their changes. Ensure they take accountability for their work and their quality. Drive to an improved change process with templates for reviews, proper documentation, back-out plans, and validation. Most failed changes are due to issues with the basics: a lack of adequate review and planning, poor documentation of deployment steps, missing or ineffective validation, or one person implementing a change alone in the middle of the night when at least two people should do it together (one to do, one to check).

Also, you should measure the proportion of incidents due to change. If you experience mediocre or poor availability and failed changes account for more than 30% of incidents, you should recognize that change quality is a major contributor to your issues. You will need to zero in on the areas with chronic change issues. Measure the change success rate (percentage of changes executed successfully without production incident) of your teams. Publish the results by team (this will help drive more rapid improvement). Often, you can quickly identify which of your teams have inadequate quality because their change success rate ranges from a very poor mid-80s percentage to a mediocre mid-90s percentage. Good shops deliver above 98%, and a first quartile shop consistently has a change success rate of 99% or better.
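
The change success rate metric above is simple to compute from a change log. Here is a minimal sketch; the team names and the shape of the `changes` records are illustrative assumptions, not any specific tool's data model, and the quality bands reflect the thresholds cited in the text.

```python
from collections import defaultdict

# Each record: (team, succeeded) where succeeded is False if the change
# caused a production incident.
changes = [
    ("network", True), ("network", True), ("network", False),
    ("storage", True), ("storage", True), ("storage", True), ("storage", True),
]

def change_success_rates(changes):
    """Return the fraction of successful changes per team."""
    totals, successes = defaultdict(int), defaultdict(int)
    for team, ok in changes:
        totals[team] += 1
        successes[team] += ok  # bool counts as 0/1
    return {team: successes[team] / totals[team] for team in totals}

rates = change_success_rates(changes)
for team, rate in sorted(rates.items()):
    # First-quartile shops sustain 99%+; below 98% warrants attention.
    band = ("first quartile" if rate >= 0.99
            else "good" if rate >= 0.98
            else "needs attention")
    print(f"{team}: {rate:.1%} ({band})")
```

Publishing this table by team, as suggested above, tends to drive improvement faster than aggregate numbers alone.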

Third, ensure all customer impacting problems are routed through an enterprise command center via an effective incident management process. An Enterprise Command Center (ECC) is basically an enterprise version of a Network Operations Center or NOC, where all of your systems and infrastructure are monitored (not just networks). The ECC also has the capability to facilitate and coordinate triage and resolution efforts for production issues. An effective ECC can bring together the right resources from across the enterprise and supporting vendors to diagnose and fix production issues while providing communication and updates to the rest of the enterprise. Delivering highly available systems requires an investment in an ECC and the supporting diagnostic and monitoring systems. Many companies have partially constructed the diagnostics or have siloed war rooms for some applications or infrastructure components. To fully and properly handle production issues requires consolidating these capabilities and extending their reach. If you have an ECC in place, ensure that all customer impacting issues are fully reported and handled. Underreporting of issues that impact a segment of your customer base, or the siphoning off of a problem to be handled by a local team, is akin to trying to handle a house fire with a garden hose and not calling the fire department. Call the fire department first, and then get the garden hose out while the fire trucks are on their way.

Fourth, you must execute strong root cause analysis and follow-up. These efforts must be at the individual issue or incident level as well as at a summary or higher level. It is important to not just focus on fixing the individual incident and getting to root cause for that one incident, but to also look for the overall trends and patterns of your issues. Do they cluster with one application or infrastructure component? Are they caused primarily by change? Does a supplier contribute far too many issues? Is inadequate testing a common thread among incidents? Are your designs too complex? Are you using the products in a mainstream or unique manner – especially if you are seeing many OS or product defects? Use these patterns and analysis to identify the systemic issues your organization must fix. They may be process issues (e.g., poor testing), application or infrastructure issues (e.g., obsolete hardware), or other issues (e.g., lack of documentation, inadequately skilled staff). Track both the fixes for individual issues as well as the efforts to address systemic issues. The systemic efforts will begin to yield improvements that eliminate future issues.
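
The pattern-spotting described above can start as a simple tally over your incident records. This is a hypothetical sketch; the cause categories, components, and counts are invented for illustration.

```python
from collections import Counter

# Illustrative incident log; in practice this comes from your ticketing system.
incidents = [
    {"id": 1, "cause": "failed change", "component": "payments"},
    {"id": 2, "cause": "failed change", "component": "payments"},
    {"id": 3, "cause": "hardware", "component": "storage"},
    {"id": 4, "cause": "inadequate testing", "component": "payments"},
]

by_cause = Counter(i["cause"] for i in incidents)
by_component = Counter(i["component"] for i in incidents)

# If one cause or component dominates, treat it as a systemic issue to fix,
# not a series of one-off incidents.
print(by_cause.most_common(1))      # dominant cause
print(by_component.most_common(1))  # dominant component
```

Even this crude roll-up makes clusters visible; from there you can weight incidents by customer impact or severity to prioritize the systemic fixes.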

These four efforts will set you on a solid course to improved availability. If you couple these efforts with diligent engagement by senior management and disciplined execution, the improvements will come slowly at first, but then will yield substantial gains that can be sustained.

You can achieve further momentum with work in several areas:

  • Document configurations for all key systems. If you are doing discovery during incidents, it is a clear indicator that your documentation and knowledge base are highly inadequate.
  • Review how incidents are reported. Are they user reported or did your monitoring identify the issue first? At least 70% of the issues should be identified first by you, and eventually you will want to drive this to a 90% level. If you are lower, then you need to look to invest in improving your monitoring and diagnostic capabilities.
  • Do you report availability in technical measures or business measures? If you report via time-based systems availability measures or number of incidents by severity, these are technical measures. You should look to implement business-oriented measures such as customer impact availability to drive greater transparency and more accurate metrics.
  • In addition to eliminating issues, reduce your customer impacts by reducing the time to restore service (Microsoft can certainly stand to consider this area given that its latest outage lasted three days!). Mean time to restore (MTTR – note this is not mean time to repair but mean time to restore service) has three components: time to detect (MTTD), time to diagnose or correlate (MTTC), and time to fix, i.e., restore service (MTTF). An IT shop that is effective at resolution will normally see MTTR at 2 hours or less for its priority issues, with the three components each taking about one third of the time. If your MTTD is high, again look to invest in better monitoring. If your MTTC is high, look to improve correlation tools, systems documentation, or engineering knowledge. And if your MTTF is high, again look to improve documentation or engineering knowledge, or automate recovery procedures.
  • Consider investing in greater resiliency for key systems. It may be that customer expectations of availability exceed current architecture capabilities. Thus, you may want to invest in greater resiliency and redundancy or build a more highly available platform.
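
The MTTR decomposition in the list above can be sketched as a small helper. The two-hour target and the roughly equal thirds come from the text; the function itself, its name, and its flagging logic are illustrative assumptions.

```python
def mttr_breakdown(detect_min, correlate_min, fix_min, target_min=120):
    """MTTR = MTTD + MTTC + MTTF; flag which component to invest in first."""
    total = detect_min + correlate_min + fix_min
    # The largest component points at the first area to improve:
    # monitoring (MTTD), correlation tools/knowledge (MTTC),
    # or documentation/automated recovery (MTTF).
    worst = max(
        ("monitoring (MTTD)", detect_min),
        ("correlation/diagnosis (MTTC)", correlate_min),
        ("fix/restore (MTTF)", fix_min),
        key=lambda kv: kv[1],
    )
    return {
        "mttr_minutes": total,
        "meets_target": total <= target_min,
        "invest_first": worst[0],
    }

# Example: detection is quick but diagnosis dominates, so correlation
# tooling and engineering knowledge are the first investment.
print(mttr_breakdown(detect_min=15, correlate_min=90, fix_min=30))
```

Tracking these three components separately over time shows whether your monitoring, diagnosis, or recovery investments are actually paying off.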

As you can see, providing robust availability for your customers is a complex endeavor. By implementing these steps, you can enable sustainable and substantial progress to top quartile performance and achieve business advantage in today’s 7×24 world.

What would you add to these steps? What were the key factors in your shop’s journey to high availability?

Best, Jim Ditmore

Turning the Corner on Data Centers

Recently I covered the ‘green shift’ of servers, where each new server generation not only drives major improvements in compute power but also requires about the same or even less environmentals (power, cooling, space) as the previous generation. Thus, compute efficiency, or compute performance per watt, is improving exponentially. And this trend in servers, which started in 2005 or so, is also being repeated in storage. We have seen a similar improvement in power per terabyte for the past 3 generations (since 2007). The current storage product pipeline suggests this efficiency trend will continue for the next several years. Below is a chart showing representative improvements in storage efficiency (power per terabyte) across storage product generations from a leading vendor.

Power (VA) per Terabyte

With current technology advances, a terabyte of storage on today’s devices requires approximately 1/5 of the amount of power as a device from 5 years ago. And these power requirements could drop even more precipitously with the advent of flash technology. By some estimates, there is a drop of 70% or more in power and space requirements with the switch to flash products. In addition to being far more power efficient, flash will offer huge performance advantages for applications with corresponding time reductions in completing workload. So expect flash storage to quickly convert the market once mainstream product introductions occur. IBM sees this as just around the corner, while other vendors see the flash conversion as 3 or more years out. In either scenario, there are continued major improvements in storage efficiency in the pipeline that deliver far lower power demands even with increasing storage requirements.
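
A quick back-of-envelope check of the figures above: a drop to roughly one fifth of the power per terabyte over about five years, compounded with a further 70% drop from a flash transition. The implied per-year rate is derived here for illustration, not stated in the text.

```python
# Five-fold drop in power per terabyte over roughly five generations/years
# implies this yearly multiplier (a ~27% reduction per year).
per_year = (1 / 5) ** (1 / 5)
print(f"implied yearly power-per-TB multiplier: {per_year:.2f}")

# A further ~70% drop from a flash transition compounds on top,
# leaving about 6% of the power per terabyte of five years ago.
after_flash = (1 / 5) * (1 - 0.70)
print(f"power per TB vs. five years ago, after flash: {after_flash:.2f}")
```

The compounding is the point: even modest per-generation gains, sustained, overwhelm typical storage growth rates.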

Ultimately, with the combined efficiency improvements of both storage and server environments over the next 3 to 5 years, most firms will see a net reduction in data center requirements. The typical corporate data center power requirements are approximately one half server, one third storage, and the rest being network and other devices. With the two biggest components experiencing ongoing dramatic power efficiency trends, the net power and space demand should decline in the coming years for all but the fastest growing firms. Add in the effects of virtualization, engineered stacks and SaaS and the data centers in place today should suffice for most firms if they maintain a healthy replacement pace of older technology and embrace virtualization.
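
To see why net demand declines, here is a rough model using the power mix described above (about one half servers, one third storage, the rest network and other). The per-refresh efficiency gains assumed below are hypothetical placeholders, not vendor figures.

```python
# Share of data center power by device class (from the text).
mix = {"servers": 0.50, "storage": 0.33, "network_other": 0.17}

# Assumed power-efficiency gain per refresh cycle for each class
# (illustrative assumptions only).
gain = {"servers": 0.40, "storage": 0.40, "network_other": 0.05}

new_demand = sum(share * (1 - gain[k]) for k, share in mix.items())
print(f"power demand after one refresh cycle: {new_demand:.0%} of today")
```

Under these assumptions one refresh cycle cuts total power demand by roughly a third, which a growing workload would have to outpace before new data center space is needed.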

Despite such improvements in efficiency, we still could see a major addition in total data center space because cloud and consumer firms like Facebook are investing major sums in new data centers. This consumer data center boom also shows the effects of growing consumerization in the technology marketplace. Consumerization, which started with PCs and PC software, and then moved to smartphones, has impacted the underlying technologies dramatically. The most advanced compute chips are now those developed for smartphones and video games. Storage technology demand and advances are driven heavily by smartphones and products like the MacBook Air, which already leverage only flash storage. The biggest and best data centers? No longer the domain of corporate demand; instead, consumer demand (e.g., Gmail, Facebook) drives bigger and more advanced centers. The proportion of data center space dedicated to direct consumer compute needs (a la Gmail or Facebook) versus enterprise compute needs (even for companies that provide direct consumer services) will see a major shift from enterprise to consumer over the next decade. This will follow the shifts in chips and storage, which at one time were driven by the enterprise space (and previously, the government) and are now driven by the consumer segment. And it is highly likely that there will be a surplus of enterprise class data centers (50K – 200K of raised floor space) in the next 5 years. These centers are too small and inefficient for a consumer data center (500K – 2M or larger), and with declining demand and consolidation effects, plenty of enterprise data center space will be on the market.

As an IT leader, you should ensure your firm is riding the effects of the compute and storage efficiency trends. Further multiply these demand reduction effects by leveraging virtualization, engineered stacks and SaaS (where appropriate). If you have a healthy buffer of data center space now, you could avoid major investments and costs in data centers in the next 5 to 10 years by taking these measures. Those monies can instead be spent on functional investments that drive more direct business value or drop to the bottom line of your firm. If you have excess data centers, I recommend consolidating quickly and disposing of the space as soon as possible. These assets will be worth far less in the coming years with the likely oversupply. Perhaps you can partner with a cloud firm looking for data center space if your asset is strategic enough for them. Conversely, if you have minimal buffer and see continued higher business growth, it may be possible to acquire good data center assets for far less unit cost than in the past.

For 40 years, technology has ridden Moore’s Law to yield ever-more-powerful processors at lower cost. Its compounding effects have been astounding — and we are now seeing nearly 10 years of similar compounding on the power efficiency side of the equation (below is a chart for processor compute power advances and compute power efficiency advances).

Trend Change for Power Efficiency

The chart above shows how the compute efficiency (performance per watt — green line) has shifted dramatically from its historical trend (blue lines). And it’s improving about as fast as compute performance is improving (red lines), perhaps even faster.

These server and storage advances have resulted in fundamental changes in data centers and their demand trends for corporations. Top IT leaders will take advantage of these trends and be able to direct more IT investment into business functionality and less into the supporting base utility costs of the data center, while still growing compute and storage capacities to meet business needs.

What trends are you seeing in your data center environment? Can you turn the corner on data center demand? Are you able to meet your current and future business needs and growth within your current data center footprint and avoid adding data center capacity?

Best, Jim Ditmore

Using Organizational Best Practices to Handle Cloud and New Technologies

I have extended and updated this post which was first published in InformationWeek in March, 2013. I think it is a very salient and pragmatic organizational method for IT success. I look forward to your feedback! Best, Jim

IT organizations are challenged to keep up with the latest wave of cloud, mobile and big data technologies, which are outside the traditional areas of staff expertise. Some industry pundits recommend bringing on more technology “generalists,” since cloud services in particular can call on multiple areas of expertise (storage, server, networking). Or they recommend employing IT “service managers” to bundle up infrastructure components and provide service offerings.

But such organizational changes can reduce your team’s expertise and accountability and make it more difficult to deliver services. So how do you grow your organization’s expertise to handle new technologies? At the same time, how do you organize to deliver business demands for more product innovation and faster delivery yet still ensure efficiency, high quality and security?

Rather than acquire generalists and add another layer of cost and decision making to your infrastructure team, consider the following:

Cloud computing. Assign architects or lead engineers to focus on software-as-a-service and infrastructure-as-a-service, ensuring that you have robust estimating and costing models and solid implementation and operational templates. Establish a cloud roadmap that leverages SaaS and IaaS, ensuring that you don’t overreach and end up balkanizing your data center.

For appliances and private cloud, given their multiple component technologies, let your best component engineers learn adjacent fields. Build multi-disciplinary teams to design and implement these offerings. Above all, though, don’t water down the engineering capacity of your team by selecting generalists who lack depth in a component field. For decades, IT has built complex systems with multiple components by leveraging multi-faceted teams of experts, and cloud is no different.

Where to use ‘service managers’. A frequent flaw in organizations is to employ ‘service managers’ who group multiple infrastructure components (e.g., storage, servers, data centers) into a ‘product’ (e.g., ‘hosting service’) and provide direction and interface for this product. This is an entirely artificial layer that removes accountability from the component teams and often makes poor ‘product’ decisions because of limited knowledge and depth. In the end, IT does not deliver ‘hosting services’; IT delivers systems that meet business functions (e.g., for banking, teller or branch functions, ATMs; or for insurance, claims reporting or policy quote or issue). These business functions are the true IT services and are where you should apply a service manager role. Here, a service manager can ensure end-to-end integration and quality, drive better overall transaction performance and reliability, and provide deep expertise on system connections, SLAs, and business needs back across the application and infrastructure component teams. And because the role is directly attached to the business functions to be done, it will yield high value. These service managers will be invaluable for new development and enhancement work as well as assisting during production issues.

Mobile. If mobile isn’t already the most critical interface for your company, it will be in three to five years. So don’t treat mobile as an afterthought, to be adapted from traditional interfaces. And don’t outsource this capability, as mobile will be pervasive in everything you build.

Build a mobile competency center that includes development, user experience and standards expertise. Then fan out that expertise to all of your development teams, while maintaining the core mobile group to assist with the most difficult efforts. And of course, continue with a central architecture and control of the overall user experience. A consistent mobile look, feel and flow is essentially your company’s brand, invaluable in interacting with customers.

Big data. There are two key aspects of this technology wave: the data (and traditional analytic uses) and real-time data “decisioning,” similar to IBM’s Watson. You can handle the data analytics as an extension of your traditional data warehousing (though on steroids). However, real-time decisioning has the potential to dramatically alter how your organization specifies and encodes business rules.

Consider the possibility that 30% to 50% of all business logic traditionally encoded in 3rd- or 4th-generation programming languages instead becomes decisioned in real time. This capability will require new development and business analyst skills. For now, cultivate a central team with these skills. As you pilot and determine how to more broadly leverage real-time data decisioning, decide how to seed your broader development teams with these capabilities. In the longer run, I believe it will be critical to have these skills as an inherent portion of each development team.
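
To make the contrast concrete, here is a hypothetical sketch of business logic expressed as data-driven rules evaluated at runtime, rather than hard-coded in a 3GL/4GL program. The rule names, fields, and outcomes are invented for illustration.

```python
# Rules live as data: they can be reviewed by business analysts and
# changed without redeploying application code.
rules = [
    {"name": "high_value",
     "when": lambda tx: tx["amount"] > 10_000,
     "then": "manual_review"},
    {"name": "new_customer",
     "when": lambda tx: tx["tenure_days"] < 30,
     "then": "extra_verification"},
]

def decide(tx):
    """First matching rule wins; default outcome if none match."""
    for rule in rules:
        if rule["when"](tx):
            return rule["then"]
    return "approve"

print(decide({"amount": 25_000, "tenure_days": 400}))  # manual_review
print(decide({"amount": 50, "tenure_days": 400}))      # approve
```

Real decisioning platforms add rule versioning, conflict resolution, and model-driven scoring on top, but the skill shift is the same: analysts author and test rules as data, and developers build the evaluation and governance machinery.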

Competing Demands. Overall, IT organizations must meet several competing demands: work with business partners to deliver competitive advantage; do so quickly in order to respond to (and anticipate) market demands; and provide efficient, consistent quality while protecting the company’s intellectual property, data, and customers. In essence, there are business and market drivers that value speed, business knowledge, and closeness at a reasonable cost, and risk drivers that value efficiency, quality, security, and consistency.

Therefore, we must design an IT organization and systems approach that meets both sets of drivers and accommodates business organizational change. As opposed to organizing around one set of drivers or the other, the best solution is to organize IT as a hybrid organization to deliver both sets of capabilities.

Typically, the functions that should be consolidated and organized centrally to deliver scale, efficiency and quality are infrastructure (especially networks, data centers, servers and storage), IT operations, information security, service desks and anything else that should be run as a utility for the company. The functions to be aligned and organized along business lines to promote agility and innovation are application development (including Web and mature mobile development), data marts and business intelligence.

Some functions, such as database, middleware, testing and project management, can be organized in either mode. But if they aren’t centralized, they’ll require a council to ensure consistent processes, tools, measures and templates.

For services becoming a commodity, or where there’s a critical advantage to having one solution (e.g., one view of the customer for the entire company), it’s best to have a single team or utility that’s responsible (along with a corresponding single senior business sponsor). Where you’re looking to improve speed to market or market knowledge, organize into smaller IT teams closer to the business. The diagram below gives a graphical view of the hybrid organization.

The IT Hybrid Model diagram
With this approach, your IT shop will be able to deliver the best of both worlds. And you can then weave in the new skills and teams required to deliver the latest technologies such as cloud and mobile. You can read more about this hybrid model in our best practice reference page.

Which IT organizational approaches or variations have you seen work best? How are you accommodating new technologies and skills within your teams? Please weigh in with a comment below.

Best, Jim Ditmore