Celebrate 2013 Technology or Look to 2014?

The year is quickly winding down, and 2013 will not be remembered as a stellar year for technology. Between the NSA leaks and Orwellian revelations, the Healthcare.gov mishaps, the cloud email outages (Yahoo’s is still lingering) and now the 40 million credit and debit card records stolen from Target, 2013 was a pretty tough year for the promise of technology to better society.

While the breakneck progress of technology continued, we witnessed many shortcomings in its implementation. Fundamental gaps in large project delivery and in availability design and implementation continue to plague large, widely used systems. It is as if the primary design lessons of ‘Galloping Gertie’ regarding resonance were never absorbed by bridge builders. The costs of such major flaws in these large systems are certainly similar to those of a failed bridge. And as it turns out, if there is a security flaw or loophole, either the bad guys or the NSA will exploit it. I particularly like the NSA’s use of ‘smiley faces’ on internal presentations when they find a major gap in someone else’s system.

So, given that 2013 has shown us the world we live in all too clearly, as IT leaders let’s look to 2014 and resolve to do things better. Let’s continue to increase the investment in security within our walls and be more demanding of our vendors to improve their security. Better security is the number two focus item (behind data analytics) for most firms and the US government, and security spend will increase at an outsized rate even as total spend goes up by 5%. This is good news, but let’s ensure the money is spent well and we make greater progress in 2014. Of course, one key step is to get Windows XP out of your environment by March, since it will soon no longer be patched by Microsoft. For a security checklist, my best practices security reference page is a good place to start.

As for availability, remember that quality provides the foundation for availability. Whether in design, implementation or change, quality must be woven throughout these processes to enable robust availability and meet the demands of today’s 7×24 mobile consumers. Resolve to move your shop from craft to science in 2014, and make a world of difference in your company’s interface to its customers. Again, if you are wondering how best to start this journey and make real progress, check out this primer on availability.

Now, what should you look for in 2014? As I did last January, when I made six predictions for 2013, I will make six technology predictions for 2014. Here we go!

6. There will be consolidation in the public cloud market as smaller companies fail to gather enough long-term revenue to survive and compete in a market with rapidly falling prices. Nirvanix was the first of many.

5. NSA will get real governance, though it will be secret governance. There is too much of a firestorm for this to continue in its current form.

4. Dual-SIM phones will become available in major markets. This is my personal favorite wish-list item, and it should come true in the Android space by 4Q.

3. Microsoft’s ‘messy’ OS versions will be reduced, but Microsoft will not deliver on the ‘one’ platform. Expect Microsoft to drop RT and continue to incrementally improve Pro and Enterprise to be more like Windows 7. As for Windows Phone OS, it is a question of sustained market share, and the jury is out. It should hang on for a few more years, though.

2. With a new CEO, a Microsoft breakup or spinoffs are in the cards. The activist shareholders are holding fire while waiting for the new CEO, but will be applying the flame once again. Effects? How about Office on the iPad? Everyone is giving away software and charging for hardware and services, forcing an eventual change in the Microsoft business model.

1. Flash revolution in the enterprise. What looked at the start of 2013 to be three or more years out now looks like this year. Flash storage at prices (with de-duplication) comparable to traditional storage, and with 90% reductions in environmentals (power, cooling and floor space), will turn adoption into a stampede, with the next generation of flash costing significantly less than disk storage.

What are your top predictions? Anything to change or add?

I look forward to your feedback and next week I will assess how my predictions from January 2013 did — we will keep score!

Best, and have a great holiday,

Jim Ditmore

How Did Technology End Up on the Sunday Morning Talk Shows?

It has been two months since the Healthcare.gov launch, and by now nearly every American has heard of or witnessed the poor performance of the websites. Early on, only one of every five users was able to actually sign in to Healthcare.gov, and poor performance and unavailable systems continued to plague the federal and some state exchanges. Performance was still problematic several weeks into the launch, and as recently as Friday, November 30, the site was down for 11 hours for maintenance. As of today, December 1, the promised ‘relaunch day’, the site appears ‘markedly improved’, but there are plenty more issues to fix.

What a sad state of affairs for IT. So, what do the Healthcare.gov website’s issues teach us about large project management and execution? Or further, about quality engineering and defect removal?

Soon after the launch, former federal CTO Aneesh Chopra, in an Aspen Institute interview with The New York Times’ Thomas Friedman, shrugged off the website problems, saying that “glitches happen.” Chopra compared the Healthcare.gov downtime to the frequent appearances of Twitter’s “fail whale” as heavy traffic overwhelmed that site during the 2010 soccer World Cup.

But given that the size of the signup audience was well known and that website technology is mature and well understood, how could the government create such an IT mess? Especially given how much lead time the government had (more than three years) and how much it spent on building the site (estimated between $300 million and $500 million).

Perhaps this is not quite so unusual. Industry research suggests that large IT projects are at far greater risk of failure than smaller efforts. A 2012 McKinsey study revealed that 17% of IT projects budgeted at $15 million or higher go so badly as to threaten the company’s existence, and more than 40% of them fail. As bad as the U.S. healthcare website debut is, there are dozens of examples, both government-run and private, of similar debacles.

In a landmark 1995 study, the Standish Group established that only about 17% of IT projects could be considered “fully successful,” another 52% were “challenged” (they didn’t meet budget, quality or time goals) and 30% were “impaired or failed.” In a recent update of that study conducted for ComputerWorld, Standish examined 3,555 IT projects between 2003 and 2012 that had labor costs of at least $10 million and found that only 6.4% of them were successful.

Combining the inherent problems associated with very large IT projects with outdated government practices greatly increases the risk factors. Enterprises of all types can track large IT project failures to several key reasons:

  • Poor or ambiguous sponsorship
  • Confusing or changing requirements
  • Inadequate skills or resources
  • Poor design or inappropriate use of new technology

Unfortunately, strong sponsorship and solid requirements are difficult to come by in a political environment (read: Obamacare), where too many individual and group stakeholders have reason to argue with one another and change the project. Applying the political process of lengthy debates, consensus-building and multiple agendas to defining project requirements is a recipe for disaster.

Furthermore, based on my experience, I suspect the contractors doing the government work encouraged changes, as they saw an opportunity to grow the scope of the project with much higher-margin work (change orders are always much more profitable than the original bid). Inadequate sponsorship and weak requirements were undoubtedly combined with a waterfall development methodology and overall big bang approach usually specified by government procurement methods. In fact, early testimony by the contractors ‘cited a lack of testing on the full system and last-minute changes by the federal agency’.

Why didn’t the project use an iterative delivery approach to hone requirements and interfaces early? Why not start with healthcare site pilots and betas months or even years before the October 1 launch date? The project was underway for three years, yet nothing was made available until October 1. And why did the effort leverage only an already occupied pool of virtualized servers that had little spare capacity for a major new site? For less than 10% of the project costs, a massive dedicated farm could have been built. Further, there was no backup site, and no monitoring tools were implemented. And where was the horizontal scaling design within the application to enable easy addition of capacity for unexpected demand (see the sketch below)? It is disappointing to see such basic misses in non-functional requirements and design in a major program for a system that is not that difficult or unique.
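To illustrate what that horizontal scaling design means in practice, here is a minimal sketch of a stateless web tier: session state is kept in a shared store rather than in a web server’s memory, so more servers can be added behind the load balancer at any time and any of them can serve a user’s next request. This is not the Healthcare.gov design; it assumes Flask and Redis, and the host names, routes and timeouts are purely illustrative.

```python
# A minimal sketch (not the actual Healthcare.gov code) of a stateless web tier.
# Session state lives in a shared Redis store instead of server memory, so any
# node added behind the load balancer can serve any user's next request.
# Assumed dependencies: flask and redis (pip install flask redis); host names
# and routes are hypothetical.
import json
import uuid

import redis
from flask import Flask, jsonify, request

app = Flask(__name__)
sessions = redis.Redis(host="session-store.internal", port=6379)  # shared store

SESSION_TTL_SECONDS = 1800  # 30 minutes

@app.route("/enroll/start", methods=["POST"])
def start_enrollment():
    # Write the new session to the shared store, not to local process memory.
    session_id = str(uuid.uuid4())
    sessions.setex(session_id, SESSION_TTL_SECONDS, json.dumps({"step": "identity"}))
    return jsonify({"session": session_id})

@app.route("/enroll/step", methods=["POST"])
def next_step():
    # Any web node can handle this request because no state is held locally.
    session_id = request.headers.get("X-Session-Id", "")
    raw = sessions.get(session_id)
    if raw is None:
        return jsonify({"error": "session expired"}), 410
    state = json.loads(raw)
    state["step"] = request.json.get("step", state["step"])
    sessions.setex(session_id, SESSION_TTL_SECONDS, json.dumps(state))
    return jsonify(state)

if __name__ == "__main__":
    app.run(port=8080)
```

With this pattern, adding capacity for unexpected demand is simply a matter of standing up more identical web nodes; none of them holds state that would be lost when traffic is rebalanced.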

These basic deliverables and approaches appear to have been missed entirely in the implementation of the website. Further, the website code appears to have been quite sloppy, not even using common caching techniques to improve performance; a simple example of what that means is sketched below. Thus, in addition to suffering from weak sponsorship and ambiguous requirements, this program failed to leverage well-known best practices for the technology and design.
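As a concrete illustration of the kind of basic caching that was reportedly missing, here is a minimal sketch of setting HTTP cache headers so that browsers and any CDN in front of the site stop re-fetching unchanged content on every page view. Flask is assumed again, and the routes, content and cache lifetimes are purely illustrative.

```python
# A minimal sketch of basic HTTP caching, not the actual site's code.
# Static assets and slowly changing reference data get Cache-Control headers
# so browsers and CDNs can serve them without hitting the web servers.
# Assumed dependency: flask; routes and cache lifetimes are hypothetical.
from flask import Flask, send_from_directory

app = Flask(__name__)

@app.route("/assets/<path:filename>")
def static_asset(filename):
    response = send_from_directory("static", filename)
    # JS, CSS and images rarely change; let clients cache them for a day.
    response.headers["Cache-Control"] = "public, max-age=86400"
    return response

@app.route("/plans")
def plan_catalog():
    # Placeholder for the plan catalog page; the real page would be rendered
    # from backend data that changes slowly.
    response = app.make_response("<ul><li>Plan A</li><li>Plan B</li></ul>")
    # A short shared-cache window takes enormous read load off the backend.
    response.headers["Cache-Control"] = "public, max-age=300"
    return response

if __name__ == "__main__":
    app.run(port=8080)
```

Even this level of caching, applied to static assets and reference pages, sharply reduces the request volume the application servers must absorb at peak.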

One would have thought that, given the scale and expenditure on the program, top technical resources would have been allocated and would have ensured these practices were used. The feds are now scrambling with a “surge” of tech resources for the site. And while the new resources and leadership have made improvements so far, the surge will bring its own problems. It is very difficult to effectively add resources to an already large program, and new ideas introduced by the “surge” resources may not be accepted or easily integrated. And if the issues are deeply embedded in the system, it will be difficult for the new team to fully fix the defects. For every 100 defects identified in the first few weeks, my experience with quality suggests there are two or three times more defects buried in the system. Furthermore, if the project couldn’t handle the “easy” technical work (sound website design and horizontal scalability), how will it handle the more difficult challenges of data quality and security?

These issues will become more apparent in the coming months when the complex integration with backend systems from other agencies and insurance companies becomes stressed. And already the fraudsters are jumping into the fray.

So, what should be done and what are the takeaways for an IT leader? Clear sponsorship and proper governance are table stakes for any big IT project, but in this case more radical changes are in order. Why have all 36 states and the federal government roll out their healthcare exchanges in one waterfall or big bang approach? The sites that are working reasonably well (such as the District of Columbia’s) developed them independently. Divide the work up where possible, and move to an iterative or spiral methodology. Deliver early and often.

Perhaps even use competitive tension by having two contractors compete against each other for each such cycle. Pick the one that worked the best and then start over on the next cycle. But make them sprints, not marathons. Three- or six-month cycles should do it. The team that meets the requirements, on time, will have an opportunity to bid on the next cycle. Any contractor that doesn’t clear the bar gets barred from the next round. Now there’s no payoff for a contractor encouraging endless changes. And you have broken up the work into more doable components that can then be improved in the next implementation.

Finally, use only proven technologies. And why not ask the CIOs or chief technology architects of a few large-scale Web companies to spend a few days reviewing the program and designs at appropriate points? It’s the kind of industry-government partnership we would all like to see.

If you want to learn more about how to manage (and not to manage) large IT programs, I recommend “Software Runaways” by Robert L. Glass, which documents some spectacular failures. Reading the book is like watching a traffic accident unfold: It’s awful, but you can’t tear yourself away. Also, I expand on the root causes of and remedies for IT project failures in my post on project management best practices.

And how about some projects that went well? Here is a great link to the 10 best government IT projects in 2012!

What project management best practices would you add? Please weigh in with a comment below.

Best, Jim Ditmore

This post was first published in late October in InformationWeek and has been updated for this site.