Is Your Enterprise Secure? 7 Questions You Should Ask

Happy New Year! As we embark on 2020, we unfortunately must recall that 2019 was a relatively dismal year for industry performance against hackers. Even as the world becomes more digital, data breaches and IT failures appear to be multiplying. In fact, 2019 has been called the ‘Year of the Massive Data Breaches’. Worse, many of the breaches were suffered by large and sophisticated corporations: Capital One, Facebook, and Twitter, among others. In our challenging digital world, business executives and IT leaders need to take stock in 2020 of the past year’s too-frequent lapses and inadequate protections and apply the lessons learned to protect their companies, their reputations, and most importantly, their customers. With GDPR in Europe and the new California privacy regulations, it is also a matter of meeting societal responsibilities and avoiding fines, as well as doing right by the customer.

I recently met with a former technology colleague, Peter Makohon, a foremost expert in IT security, and he outlined 7 questions you should ask (and try to answer) to understand how well your corporation is protected. Here are the 7 questions:

  1. What’s on your network?
    Knowing what’s on your corporate network is the most fundamental requirement for properly managing your information technology and keeping your enterprise secure. Because of the way most corporate networks are structured, a device or program that resides on your network can usually traverse it easily and reach critical data or assets. Knowing what is on your network, and confirming it should be there, is necessary to secure your critical data and assets. Your information security team should work closely with the network engineers to install the right monitors and automated agents so you can gather identifying information from traffic as it passes through your network. As devices and programs speak with other programs and devices, they leave “network communication trails” like digital footprints that help ensure you know what is on your network.
  2. What applications are running on your computers? The second question to ask is whether you know what programs are installed and running on each computer in the enterprise. The list of programs and applications should be compared to a list of expected, known-good applications and a list of known-bad or unwanted applications. Anything that is on neither list should be put into a malware forensics sandbox for further analysis. Once the application inventory is assembled, anything new that shows up each day should also be reviewed to determine whether it really belongs in the environment. Your engineers should be doing more than simple name comparisons; they should leverage hashing to verify the identity of each application (see the sketch just after this list). Of course, ensuring your environment is properly controlled and administered to minimize possible downloads of malware or fraudulent software is a key protection to put in place.
  3. Who are your computers talking to? Once the network devices and applications are known, analysis can focus on which destinations your computers are sending traffic to and receiving traffic from. It is paramount to understand whether the traffic stays within the enterprise or traverses the Internet to external destinations. Your supply chain should be analyzed to determine which of these outbound and inbound connections go to valid, approved supplier systems. Every external destination should be compared against known malware or fraudulent sites. And for all external transmissions, are they properly encrypted and protected?
  4. Where is your data stored and where has it been replicated to? Perhaps the most difficult question for most companies to answer involves understanding where your company’s and customers’ confidential and restricted data is stored and sent, both within your enterprise and across the extended ecosystem of suppliers. Production data can sometimes find its way onto user machines for ad hoc analysis. This dispersal of data makes protection (and subsequent required regulatory deletion) of customer confidential data much more difficult. Look for your IT team to minimize and eliminate such ‘non-production’ proliferation.
  5. What does the environment look like to an attacker? Every enterprise has a cyber profile that is visible from outside the organization. This includes technology and its characteristics on the Internet; information about jobs the organization is offering; information about employees, including social media; and even vendors who reveal that they have sold technology and products to your company. It is important to understand what an attacker can see and to take steps to reduce the information available to fill in the “digital puzzle” the attacker is trying to assemble. Your websites should ensure their code is obfuscated and not readable by visitors to the site. Most importantly, all interfaces should be properly patched and fully up to date to prevent attackers from simply using known attacks to penetrate your systems.
  6. Are your security controls providing adequate coverage? Are they effective? Just because your organization has spent considerable time and money purchasing and deploying security measures does not mean that the controls are deployed effectively and providing the necessary coverage. 2019 demonstrated that too many organizations – with sophisticated and well-funded IT teams – missed the boat on security basics. Too many breaches succeeded through elementary attacks that leveraged known security problems in common software. Your IT and security teams need to monitor the environment on a 7×24 basis (leveraging a Security Operations Center, or SOC) to identify unusual behavior and uncover attacks. Second, your teams should test your controls regularly. Automated control testing solutions should be utilized to understand whether the controls are appropriately deployed and configured. Further, external vendors should be engaged to attempt to breach your environment – so-called ‘red teams’ can expose vulnerabilities that you have overlooked and enable you to correct the gaps before real hackers find them.
  7. Are your employees educated and aware of the many threats and typical attack techniques of fraudsters? Often, helpful but unaware staff are the weakest link in a security defense framework, letting cybercriminals in by clicking on malware, helping unverified outsiders navigate your company, or inadequately verifying requested actions. Companies have been defrauded by simple email ‘spoofing’, where a fraudster makes an email appear to come from the CEO. The accounting department then asks no questions about an unusual request (e.g., to wire money to China for a ‘secret’ acquisition), and the company is defrauded of millions because of employee unawareness. It is important to keep employees educated on fraud techniques and hacker attacks, and to encourage them, when not sure what to do, to be cautious and call the information security department.
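
To make question 2 concrete, here is a minimal Python sketch of the hash-based comparison described above: compute a hash of each installed executable and classify it against known-good and known-bad lists, routing anything unrecognized to a forensics sandbox queue. The file names, directory, and hash lists are illustrative assumptions only; in practice the inventory and hash feeds would come from your endpoint management and threat intelligence tooling.

```python
# Sketch of an application inventory check: hash each executable and compare against
# known-good and known-bad hash lists; anything on neither list is a sandbox candidate.
# File names and the scanned directory are assumptions for illustration.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 hash of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def load_hashes(list_file: Path) -> set[str]:
    """Load one hash per line; a missing file simply means an empty list."""
    if not list_file.exists():
        return set()
    return {line.strip() for line in list_file.read_text().splitlines() if line.strip()}

def classify_applications(app_dir: Path, known_good: set[str], known_bad: set[str]) -> dict:
    """Classify each executable as good, bad, or unknown (unknowns go to the sandbox)."""
    results = {"good": [], "bad": [], "unknown": []}
    for exe in app_dir.rglob("*.exe"):
        digest = sha256_of(exe)
        if digest in known_good:
            results["good"].append(exe)
        elif digest in known_bad:
            results["bad"].append(exe)
        else:
            results["unknown"].append(exe)   # candidate for malware forensics sandbox
    return results

if __name__ == "__main__":
    good = load_hashes(Path("known_good_hashes.txt"))
    bad = load_hashes(Path("known_bad_hashes.txt"))
    report = classify_applications(Path("C:/Program Files"), good, bad)
    print(f"{len(report['unknown'])} applications need sandbox review")
```

A real deployment would of course run agent-side, cover all executable types, and feed results back into a central inventory, but the comparison logic is essentially this.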
     

How comfortable are you answering these questions? Anything you would change or add? While difficult, proper investment in and implementation of information security is critical to secure your company’s and customers’ assets in 2020. Let’s make this year better than 2019!

Best, Jim Ditmore

Building Advanced Technology Capabilities

Robotics, AI, Advanced Analytics, BPM, Agile, DevOps, Cloud… Technology and business leaders are inundated with marketing pitches and consulting proposals on the latest technologies and how, by applying them, they can win against the competition. Unfortunately, far too often the implementations of advanced technology in their organizations fall well short of the promises: they’re expensive, they require enormous organizational change that doesn’t happen, or they just don’t produce nearly the expected results. Often, applying advanced technology achieves modest success only in a few areas, for a slim portion of the enterprise, while broader impact and benefits never seem to materialize. Frequently, the highly hyped efforts claim success as a pilot only to peter out well before the promised returns are delivered. Organizations and teams then quietly go about their business as they have always done, now with added complexity and perhaps a bit more cynical and disengaged.

The far too frequent inability to broadly digitalize and leverage advanced technologies means most organizations remain anchored to legacy systems and processes, with only digital window dressing on interfaces and minimal true commercial digitalization success. Some of the lack of success is due to the shortcomings of the technologies and tools themselves – some advanced technologies are truly overhyped and not ready for prime time – but more often the issue lies in the organization’s adoption approach and in the leadership and discipline necessary to fully implement new technologies at scale.

Advanced technology implementations are not silver bullets that can be readily adopted with benefits delivered in a snap. As with any significant transformation in a large organization, advanced technology initiatives require senior sponsorship and change management. Further, because of the specialty skills, new tools and methods, new personnel, and, critically, new ways of getting the work done, there is additional complexity and more opportunity for issues and failures. Thus, the transformation program must plan and anticipate so these factors are properly addressed and the implementation can succeed. Having helped successfully implement a number of advanced technologies at large firms, I have outlined the key steps to successful advanced technology adoption as well as major pitfalls to avoid.

Foremost, leadership and sponsorship must be fully in place before embarking on a broad implementation of an advanced technology. Sponsorship is particularly crucial at integration points, where advanced technologies require different processes and cycle times than those of the legacy organization. For example, traditional waterfall financial planning processes and normal but typically undisciplined business decision processes can cause great friction when they are used to drive agile technology projects. The result of an unmanaged integration like this is failing or greatly underperforming agile projects, accompanied by frustration on the technology side and misunderstanding and missed expectations on the business side.

Success is also far more likely if the ventures into advanced technologies are sober and iterative. An iterative approach builds success by starting small and growing in scope, using a strong feedback loop at each step to improve the approach and address weaknesses. Further, robust change management should accompany the effort given the level of transformation. Such change management should encompass all of the ‘human’ and organizational aspects, from communications to adjusting incentives and goals, defining new roles properly, training and coaching, and ensuring the structures and responsibilities support the new ways of working.

Let’s start with Robotics and Business Process Management, two automation and workflow alternatives to traditional IT and software programming. Robotics, or better put, Robotic Process Automation (RPA), has been a rapidly growing technology over the past 5 years, and the forecasts are for even more rapid growth over the next 5 years. For those not familiar, here is a reference page for a quick primer on RPA. Briefly, RPA is the use of software robots to do the repetitive tasks a human typically does when interfacing with a software application. RPA tools allow organizations to quickly set up robots to handle basic tasks, freeing up staff time from repetitive typing work. At Danske Bank, since our initial implementation in 2014 (yes, 2014), we have implemented well over 300 robots leveraging the Blue Prism toolset. Each robot was typically completed in a 2 to 6 week cycle in which the automation suggestion was analyzed, reviewed for applicability and business return, and then prioritized. We set up multiple ‘robotic teams’ to handle the development and implementation. Once a robotic team freed up, it would go to work on the next best idea, taking the roughly drafted idea, analyzing it further, and then building and delivering it into production. Each robot implemented could save anywhere from a third of an FTE to 30 FTEs (or even more). Additionally, and usually of greater value, the automation typically increased process quality (no typos) and improved cycle time.

Because the cycle time and the robotic analyze, build, and implement process differ greatly from those of traditional IT projects, it was necessary to build a different discovery, review, and approval process as well. Traditional IT (unfortunately) often operates on an annual planning cycle with lengthy input and decision cycles, with plenty of debate and tradeoffs considered by management. The effort to approve a traditional mid-sized or large IT project would dwarf the total effort required to implement a robot (!), which would be impractical and wasteful. Thus, a very different review and approval process is required to match the advanced technology implementation. Here, a far more streamlined and ‘pipelined’ approach was used for robotic automation projects. Instead of funding each project separately, a ‘bucket’ of funding was set up annually for Robotics, with hurdle criteria each robotic project had to meet to be prioritized. A backlog of automation ideas was generated by business and operations teams, and based on a quick analysis of ease of implementation, FTE savings, and core functional capability, the ideas were prioritized. Typical hurdle rates were an ROI of 6 months or less (yes, the implementation would save more money within 6 months than its full cost) and at least 0.5 FTE of savings. Further, implementations that required completing critical utility functionality (e.g., interfacing with the email system or financial approval system) were prioritized early in our Robotics implementation to enable reuse of these capabilities by later automation efforts.
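
To illustrate the hurdle criteria above, here is a small Python sketch of how a robotics backlog might be screened and ranked. The field names, the assumed fully loaded FTE cost, and the scoring order are illustrative assumptions, not our actual tooling; the point is simply that the screen is lightweight enough to run continuously on a pipeline of ideas.

```python
# Sketch of screening a robotics backlog against the hurdle criteria described above:
# payback within 6 months and at least 0.5 FTE of annual savings. FTE cost and the
# ranking order are illustrative assumptions.
from dataclasses import dataclass

FTE_ANNUAL_COST = 80_000  # assumed fully loaded annual cost per FTE, for illustration

@dataclass
class AutomationIdea:
    name: str
    implementation_cost: float   # one-time cost to analyze, build, and deploy the robot
    fte_savings: float           # full-time equivalents saved per year
    ease: int                    # 1 (hard) .. 5 (easy), from the quick analysis
    core_utility: bool           # builds reusable capability (e.g., email or approval interface)

    def payback_months(self) -> float:
        monthly_savings = self.fte_savings * FTE_ANNUAL_COST / 12
        return self.implementation_cost / monthly_savings if monthly_savings else float("inf")

    def passes_hurdles(self) -> bool:
        return self.payback_months() <= 6 and self.fte_savings >= 0.5

def prioritize(backlog: list[AutomationIdea]) -> list[AutomationIdea]:
    """Keep ideas that clear the hurdles, then rank by reuse value, ease, and savings."""
    qualified = [idea for idea in backlog if idea.passes_hurdles()]
    return sorted(qualified, key=lambda i: (i.core_utility, i.ease, i.fte_savings), reverse=True)

if __name__ == "__main__":
    backlog = [
        AutomationIdea("Invoice rekeying", 30_000, 2.0, 4, False),
        AutomationIdea("Email interface utility", 25_000, 0.8, 3, True),
        AutomationIdea("Rare report cleanup", 25_000, 0.2, 5, False),  # fails the FTE hurdle
    ]
    for idea in prioritize(backlog):
        print(f"{idea.name}: payback {idea.payback_months():.1f} months")
```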

The end result was a strong pipeline of worthwhile ideas that could be easily prioritized and approved. This steady stream of ideas was then fed into multiple independent robotics development teams, each composed of business or operations analysts, process analysts, and technology developers (skilled in the RPA tool), that could take the next best idea out of the pipeline and work it as soon as the team was ready. This pipeline and independent development factory line approach greatly improved time to market and productivity. So, not only can you leverage the new capabilities and speed of the advanced technology, you also eliminate the stop-go and wait time inefficiencies of traditional projects and approval processes.

To effectively scale advanced technology in a large firm requires proper structure and sponsorship. Scaling approaches range from a broad, decentralized approach, where each business unit or IT team experiments and tries to implement the technology, to a fully centralized and controlled structure, where one program team is tasked with implementing and rolling it out across the enterprise. While constraints (scarce resources, desire for local or central control, lack of senior sponsorship) often play a role in dictating the structure, technology leaders should recognize that a Center of Excellence (COE) approach is far more likely to succeed at scale for advanced technology implementations. I strongly recommend the COE approach as it addresses fundamental weaknesses that hamper both the completely centralized and the decentralized approaches.

When rolling out an advanced technology, the first challenge to overcome is the difficulty of attracting and retaining advanced technology talent. It is worth noting that just because your firm has decided to adopt an advanced technology does not mean you will be able to easily attract the critical talent to design and implement it. Nearly every organization is looking for this talent, and thus you need a compelling proposition to attract the top engineers and leaders. In fact, few strong candidates would want to join a decentralized structure where it is not clear who decides on toolset and architecture, and where projects have only local sponsors without clear enterprise charters, mandates, or impact. Similarly, they will be turned off by top-heavy, centralized programs that will likely plod along for years and not necessarily complete the most important work. By leveraging a COE, where the most knowledgeable talent is concentrated and where demand for services is driven by prioritizing the areas with the highest needs, your firm will be able to attract talent as well as establish an effective utility and deliver the most commercial value. And the best experts and experienced engineers will want to work in a structure where they can both drive the most value and set the example for how to get things done. Even better, with a COE construct, each project leverages the knowledge of the prior projects, improving productivity and reuse with each implementation. As you scale and increase volume, you get better at doing the work. With a decentralized approach, you often end up with discord, where multiple toolsets, multiple groups of experts, and inexperienced users lead to teams in conflict with each other or duplicating work.

When the COE is initially set up, the senior analysts, process engineers, and development engineers in the COE should ensure proper RPA toolset selection and architecture. Further, they lead the definition of the analysis and prioritization methodology. Once projects begin to be implemented, they maintain the libraries of modules and encourage reuse, and they ensure the toolsets and systems are properly updated and supported for production, including backups, updates, and adequate capacity. Thus, as the use of RPA grows, your productivity improves with scale, ensuring more and broader commercial successes. By assigning responsibility for the service to your best advanced technology staff, they will plan for and avoid the pitfalls of immature, disparate implementations that often fail 12 or 18 months after initial pilots.

Importantly though, in a COE model, demand is not determined by the central team; rather, it is developed by the COE consulting with each business unit to determine the appetite and capability to tackle automation projects. This consulting results in a rough portfolio being drafted, which is then used as the basis to fund that level of advanced technology implementation for that business unit. Once the draft portfolio is developed and approved, it is jointly and tightly managed by the business unit with the COE to ensure the greatest return. With such an arrangement, the business unit feels in control, the planned work is in the areas the business feels are most appropriate, and the business unit can line up the necessary resources, direction, and adoption to ensure the automation succeeds commercially (since it is their ambition). Allowing the business unit to drive the demand avoids the typical flaws of a completely centralized model, where an organization separate from the unit where the implementation will occur makes the decisions on where and what to implement. Such centralized structures usually result in discord and dissatisfaction between the units doing ‘real business work’ and an ‘ivory tower’ central team that doesn’t listen well. By using a COE with a demand-driven portfolio, you get the advantages of a single high-performance team yet avoid the pitfalls of ‘central planning’, which often turns into ‘central diktat’.

As it turns out, the COE approach is also valuable for BPM rollouts. In fact, it can be synergistic to run both RPA and BPM from ‘sister’ COEs. Yes, BPM requires more setup and has a longer development cycle of 6 to 12 or even 18 weeks. Experts in BPM are not necessarily experts in RPA tools, but they share process engineering skills and documentation. Further, problems or automations that are too complex for RPA can be perfectly suited to BPM, enabling a broader level of automation of your processes. In fact, some automation or digitalization solutions may turn out to be best served by a mix of RPA and BPM. Treating them as siblings, each with its own COE structure, methodology, and business demand stream, but leveraging common process knowledge and working together on more complex solutions, will yield optimal results and progress.

A COE approach can also work well for advanced analytics. In my experience, it is very difficult for an individual business unit to attract and retain critical data analytics talent. But by establishing a COE you can more easily attract enough senior and mid-level talent for the entire enterprise. Next, you can establish a junior pipeline as part of the COE that works alongside the senior talent and is trained and coached to advance as you experience the inevitable attrition for these skills. Further, I recommend establishing an Analytics COE for each ‘data cluster’ so that models and approaches can be shared within a COE that is driven by the appropriate business units. In financial services, we found success with a ‘Customer and Product’ analytics team, a ‘Fraud and Security’ team, and a ‘Financing and Risk’ team. Of course, organize the COEs along the data clusters that make sense for your business. This allows greater focus by each COE and impressive development and improvement of their data models, business knowledge, and thus results. Again, the COE must be supplemented by full senior sponsorship and a comprehensive change management program.

The race is on to digitalize and take advantage of the latest advanced technologies. Leveraging these practices and approaches will enable your shop to move forward more quickly with advanced technology. What alternatives have you seen or implemented that were successful?

I look forward to hearing your comments and wish you the best on attaining outstanding advanced technology capabilities for your organization. Best, Jim

Hi ho! Hi ho! It’s Off to Cloud we go!

With the ongoing stampede to public cloud platforms, it is worth a closer look at some of the factors behind such rapid growth. Amazon, Azure, Google, IBM, and a host of other public cloud services saw continued strong growth in 2018 of 21% to $175B, extending a long run of rapid revenue growth for the industry, according to Gartner in a recent Forbes article. Public cloud services, under Gartner’s definition, include a broad range of services from traditional SaaS to infrastructure services (IaaS and PaaS) as well as business process services. IaaS, perhaps most closely associated with AWS, is forecast to grow 26% in 2019, with total revenues increasing from $31B in 2018 to $39.5B in 2019. AWS has the lion’s share of this market, with 80% of enterprises either experimenting with or using AWS as their preferred platform. Microsoft’s Azure continues to make inroads as well, with the share of enterprises using the Azure platform increasing from 43% to 58%. And Google proclaimed a recent upsurge in its cloud services in its quarterly earnings announcement. It is worth noting, though, that both traditional SaaS and private cloud implementations are expected to grow at near 30% rates for the next decade – essentially matching or even exceeding public cloud infrastructure growth rates over the same period. The industry with the highest adoption of both private and public cloud is financial services, where adoption (usage) rates above 50% are common and rates close to 100% are occurring, versus median rates of 19% for all industries.

At Danske Bank, we are close to completing a 4 year infrastructure transformation program that has migrated our entire application portfolio from proprietary, dedicated server farms in 5 obsolete data centers to a modern private cloud environment in 2 data centers. Of course, we migrated and updated our mainframe complex as well. Over that time, we have also acquired business software that is SaaS-provided and experimented with or leveraged smaller public cloud environments. With this migration, led by our CTO Jan Steen Olsen, we have eliminated nearly all of our infrastructure layer technical debt, reduced production incidents dramatically (by more than 95%), and correspondingly improved resiliency, security, access management, and performance. Below is a chart that shows the improved customer impact availability achieved through the migration, insourcing, and adoption of best practices.

These are truly remarkable results that enable Danske Bank to deliver superior service to our customers. Such reliability for online and mobile systems is critical in the digital age. Our IT infrastructure and applications teams worked closely together to accomplish the migration to our new, ‘pristine’ infrastructure. The data center design and migration were driven by our senior engineers with strong input from top industry experts, particularly CS Technology. A critical principle we followed was not to simply move old servers to the new centers but instead to set up a modern and secure ‘enclave’ private cloud and migrate from old to new. Of course, this is a great deal more work and requires extensive updates and testing of the applications. Working closely together, our architects and infrastructure engineers partnered to design our private cloud, which established templates and services up to our middleware, API, and database layers. There were plenty of bumps in the road, especially in our earliest migrations as we worked out the cloud designs, but our CIO Fredrik Lindstrom and the application teams dug in, partnered with the infrastructure team, made room for the updates and testing, and successfully converted our legacy distributed systems to the new private cloud environments. While certainly a lengthy and complex process, we were ultimately successful. We are now reaping the benefits of a fully modernized cloud environment with rapid server implementation times and lower long term costs (you can see further guidelines here on how to build a private cloud). In fact, we have benchmarked our private cloud environment and it is 20 to 70% less expensive than comparable commercial offerings (including AWS and Azure). A remarkable achievement indeed, and as the feather in the cap, the program, led by Magnus Jacobsen, was executed on a relatively flat budget, as we used savings generated from insourcing and consolidations to fund much of the needed investment.

Throughout the design and the migration, we stayed abreast of the cloud investments and results at peer institutions and elsewhere. We have always looked at our cloud transformation as an infrastructure quality solution that could provide secondary benefits in performance, cycle time, and cost. But our core objective was to achieve the availability and resiliency benefits and eliminate the massive risk posed by legacy data center environmentals. Yet much of the dialogue in the industry is focused on cloud as a time to market and cost solution for companies with complex legacy environments, enabling them to somehow significantly reduce systems costs and greatly improve development time to market.

Let’s consider how realistic this rationale is. First, how real is the promise of reduced development time to market due to public cloud? Perhaps, if you are comparing an AWS implementation to a traditional proprietary server shop with mediocre service and lengthy delivery times for even rudimentary servers, then, yes, you enable development teams to dial up their server capacity much more easily and quickly. But compared to a modern private cloud implementation, the time to implement a new server for an application is (or should be) comparable. So, on an apples to apples basis, public and private cloud are generally comparably quick. More importantly, for a business application or service being developed, the server implementation tasks should be done in parallel with the primary development work, with little to no impact on the overall development schedule or time to market. In fact, the largest tasks that take up time in application development are often the project initiation, approval, and definition phases (for traditional waterfall) and the project initiation, approval, and initial sprint phases (for Agile projects). In other words, management decisions and defining what the business wants the solution to do remain the biggest and longest tasks. If you are looking to improve your time to market, these are the areas where IT leadership should focus. Improving your time to get from ‘idea to project’ is typically a good investment in large organizations. Medium and large corporations are often constrained as much by the annual finance process and investment approval steps as by any other factor. We are all familiar with investment processes that require several different organizations to agree and many hurdles to be cleared before an idea can be approved. And the larger the organization, the more likely the investment process is the largest drag on time to market.

Even after the idea is approved and the project is funded, the next lengthy step is often ensuring that adequate business, design, and technology resources are allocated and there is enough priority to get the project off the ground. Most large IT organizations are overwhelmed with too many projects, too much work, and not enough time or resources. Proper prioritization, and ensuring that not too many projects are in flight at any one time, is crucial to enable projects to move at reasonable speed. Once the funding and resources are in place, adopting proper agile approaches (e.g., joint technology and business agile development methods) can greatly improve time to market.

Thus, most time to market issues have little to do with infrastructure and cloud options and almost everything to do with management and leadership challenges. And the larger the organization, the harder it is to focus and streamline. Perhaps the most important part of your investment process is deciding what not to do, so that you can focus your efforts on the most important development projects. To attain the prized time to market so important in today’s digital competition, drive instead for a smooth investment process coupled with properly allocated, flexible development teams and agile processes. Streamlining these processes and ensuring effective project startups (project manager assigned, dedicated resources, etc.) will yield material time to market improvements. And having a modern cloud environment will then nicely support your streamlined initiatives.

On the cost promise of public cloud, I find it surprising that many organizations look to public cloud as a silver bullet for improving their costs. For either legacy or modern applications, the largest costs are software development and software maintenance – anywhere from 50% to 70% of the full lifetime cost of a system. Next comes IT operations – running the systems and the production environment – along with IT security and networks, at around 15-20% of total cost. This leaves 15 to 30% of lifetime cost for infrastructure, including databases, middleware, and messaging as well as the servers and data centers. Thus, the servers and storage total perhaps 10-15% of the lifetime cost. Perhaps you can achieve a 10%, or even 20% or 30%, reduction in this cost area, for a total systems cost reduction of 2-5%. And if you have a modern environment, public cloud would actually be at a cost disadvantage (at Danske Bank, our new private cloud costs are 20% to 70% lower than AWS, Azure, and other public clouds). Further, focusing on a 2% or 5% server cost reduction will not transform your overall cost picture in IT. Major efficiency gains in IT will come from far better performance in your software development and maintenance – improving productivity, having a better and more skilled workforce with fewer contractors, or leveraging APIs and other techniques to reduce technical debt and improve software flexibility. It is disingenuous to suggest you are tackling primary systems costs and making a difference for your firm with public cloud. You can deliver 10x the total systems cost improvement by introducing and rolling out software development best practices, achieving an improved workforce mix, and simplifying your systems landscape compared to simply substituting public cloud for your current environment. And as I noted earlier, we have actually achieved lower costs with a private cloud solution versus commercial public cloud offerings. There are also hidden factors to consider with public cloud. For example, when testing a new app on your private cloud, you can run the scripts in off hours to your heart’s content at minimal to no cost, but you would need to watch your usage carefully on a public cloud, as all usage results in costs. The more variable your workload, the more likely public cloud could cost less; conversely, the more stable your total workload, the more likely you can achieve significant savings with private cloud.
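
To make the arithmetic above concrete, here is a tiny worked example in Python. The share and reduction percentages are simply taken from the ranges quoted above, not measured figures, and the exact split will differ for every portfolio.

```python
# Worked example of the lifetime-cost arithmetic above: if servers and storage are
# roughly 10-15% of a system's lifetime cost, even a large cut in that slice moves
# the total only a few percent. Percentages are illustrative, from the ranges quoted.
servers_and_storage_share = 0.15      # top of the 10-15% range

for infra_cut in (0.10, 0.20, 0.30):  # plausible public cloud savings on servers alone
    total_saving = servers_and_storage_share * infra_cut
    print(f"{infra_cut:.0%} server saving -> {total_saving:.1%} of total lifetime cost")
# Prints roughly 1.5%, 3.0%, and 4.5% -- the low single-digit (2-5%) impact described above.
```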

On a final note, with public cloud solutions comes lock-in, not unlike previous generations of proprietary hardware or wholesale outsourcing. I am certain a few of you recall the extensions made to proprietary Unix flavors like AIX and HP-UX that provided modest gains but then increased the lock-in of an application to that vendor’s platform. Of course, the cost increases from those vendors came later, as did the migration hurdles to new and better solutions. The same feature extension game occurs today in the public cloud setting with Azure, AWS, and others. Once you write your applications to take advantage of their proprietary features, you have become an annuity stream for that vendor, and any future migration off their cloud will be arduous and expensive. Your ability to move to another vendor will typically erode with each system upgrade you implement. Future license and support price increases will need to be accepted unless you are willing to take on a costly migration. And you have now committed your firm’s IT systems and data to be handled elsewhere with less control – potentially a long term problem in the digital age. Note that your application and upgrade schedules are now determined by the cloud vendor, not by you. If you have legacy applications (as we all do) that rely on an older version of infrastructure software or middleware, they must be upgraded to keep pace, otherwise they stop working. And don’t count on a rollback if problems are found after the cloud vendor’s upgrades.

Perhaps more concerning, in this age of ever bigger hacks, is that public cloud environments become the biggest targets for hackers, from criminal gangs to state-sponsored actors. And while the cloud providers have much larger security resources, there is still a rich target surface for hackers. The recent Capital One breach is a reminder that proper security remains a major task for the cloud customer.

In my judgement, larger corporations are certainly better off maintaining control of their digital capabilities with a private cloud environment than with a public cloud. This will likely be supplemented with a multi-cloud environment to enable key SaaS capabilities or to leverage public cloud scalability and variable expense for non-core applications. And with the improving economics of server technology and improved cloud automation tools, these environments can be effectively implemented by medium-sized corporations as well. Having the best digital capabilities – and controlling them for your firm – is key to outcompeting in most industries today. If you have the scale to retain control of your digital environment and data assets, then this is the best course to enabling future digital success.

What is your public or private cloud experience? Has your organization mastered private, public, or multi-cloud? Please share your thoughts and comments.

Best, Jim

P.S. Worth noting that public clouds are not immune to availability issues as well, as reported here.

Building a Technology Team in an Era of Talent Scarcity

The technology labor force in the US has been at full employment (4% unemployment or less) for 22 of the past 30 years and has been below 2% unemployment for the last 3 years. With US GDP growth now above 3%, the overall labor market has become very tight, and for technology professionals even more so. This tight technology labor market in the US is also reflected in much of Europe and parts of Asia. In Copenhagen, one of the key locations where we source talent for Danske Bank, it has been estimated there are as many as 30,000 unfilled tech jobs in the greater Copenhagen area alone. Without an adequate supply of new skilled workers – not just in technology, but of many types – we are seeing companies lower their forecasts for growth. This has also reduced productivity and wage gains in many countries with lower levels of available skilled workers. From aircraft workers in Canada to skilled manufacturing workers in Germany to construction workers in the US, the talent deficit has broadened and deepened across many sectors and geographies. And this talent shortage will be with us for a while. Even if the economy cools off in the next few years, the large number of retiring baby boomers will make it difficult to replace all of the skilled workers employed today with new graduates. And in IT, the shortage appears to be particularly acute.

So how should technology leaders approach building or maintaining their teams in such a tight labor market? Nearly every organization with a good reputation in the market will be recruited from heavily, so even if you have the talent now, it will be difficult to hold on to all of your staff. Thus, you must plan not only to fill your additions but also to counter the attrition that will likely occur.

The only sustainable way to achieve this is to build your team. If you try to buy your team, going out to the market repeatedly to recruit both senior and mid-level talent, it will be expensive. And even with the best recruiting process you will have misses. Recruiting senior talent, even with a topgrading assessment, can result in 1 in 4 hires being duds. And these are expensive misses, as senior mis-hires lead to further mis-hires or greater attrition in their division, or to poor subsequent technical decisions. It is far better to be highly selective about which positions to recruit externally, and where necessary, to target talent that has previously been successful in similar situations in their career, using a filtering process like topgrading. Thus, good leaders should minimize the buying of talent – use external recruiting only for critical positions or skill gaps – and focus their efforts on recruiting outstanding junior talent and then building and growing the team.

To build a high performance technology team takes time and disciplined effort by the entire leadership team. You must start from the ground up. Presumably, you have located your primary technology sites where there are good technical universities nearby. The local universities are where you start your pipeline of talent. Your senior managers and recruiting team should be fully engaged with the local universities: sponsoring programs or projects, providing seminars, and simply being active and providing a link to your firm. More importantly, invest in a student worker or intern program that is both wide and meaningful. Bring students onboard to do real and meaningful work that enables them to contribute and grow as well as to understand the technology work done at your firm. If you provide these opportunities and ensure active and effective interaction between the students and your senior engineers and managers, the students will gain valuable skills and insights and become positive supporters of your employer reputation. When it comes time to recruit the graduates, all of them will be much more familiar with your firm and what it offers, and those who were student workers will likely be highly inclined to join your team. And even better, you will know from experience which students are the best fit for your team.

In addition to building the pipeline of junior talent from the local universities, you must filter and develop the talent on your current team. To bring structure and clarity to staff development, I strongly recommend defining and implementing a career path program. A career path program maps the skills and competencies for each job position within a job family across your technology organization. From network engineering to project management to business analysis to software development, each job family is defined from its junior levels to its most senior levels, showing the additional skills, experience, and competencies required at each level. Further, certifications or demonstrated levels of excellence are noted for positions. Typical career paths are mapped through these job positions to show what an individual can expect and accomplish in their career. This clarity enables your staff to understand what their next steps are and emphasizes the importance of developing and acquiring new skills and capabilities. And while a career path program requires an upfront investment in its definition, it pays dividends in employee satisfaction and improved staff development. Once you have the competencies, skills, or certifications defined, you must work with local universities, training contractors, and your senior staff to provide the content and curriculum in accessible formats so your team can drive their own development. And you must provide the budget and encouragement for your staff to take the initiative and accelerate their professional development.

With the foundational elements of a pipeline and staff development in place, you will have plentiful rising and ready junior staff. But they must be able to step into the next spot up. To enable a free flow, mid-level positions must be vacated through growth and promotion. If you have not ensured that the vast majority of the positions on your team are growth positions – that is, positions requiring your staff to develop and have the potential for the next level – then those positions can become ‘blockers’. Having a large number of ‘blockers’ (staff who perform adequately but are not capable of moving to the next level) results in a stagnated and frustrated pool of junior talent underneath. This ready pool of talent will leave your organization if they do not see adequate opportunity once they acquire valuable additional skills that can get them to the next level in the market. This is often a major flaw in legacy organizations, which have retained too many personnel who hold key knowledge but lack strong skills or potential. Your leaders must identify such situations and either encourage the blockers to become higher performing or remove them. If you are not sure how best to carry out this ‘pruning’ or improvement, read the best practice page to learn the methods. I strongly recommend there should be ‘no escalators, only stairs’ in your organization. That is to say, one must work and gain skills and competencies to move up in the organization (i.e., take the stairs); no one is promoted simply for length of service (i.e., there are no escalators), but instead only on merit and capability. Rewarding staff development and performance with opportunity and promotion will further encourage your team to take their development responsibility seriously and build their capabilities.

This ‘build’ approach should be used at each of your strategic locations for your IT staff — and you should leverage the overall Global Team approach explained here on our reference page for the entire IT organization.

Given the current environment, it is likely that some of your best engineers or leaders will gain opportunities elsewhere. This is to be expected, especially if you have a reputation for being a strong technology shop. Of course, you should work where possible to minimize such losses, but you will still have them. So it is better to prepare for such attrition by developing a strong bench that is ready to step in, or step up, into newly vacant positions. Each of your leaders should understand that developing a bench for the key positions in their organization is a critical and ongoing responsibility. With a strong pipeline and development program in place, you will have a good flow of up and coming talent that can step into these roles and keep your shop humming. In sum, in a tight labor market, the best technology leaders build their teams with strong pipelines, enabling them to obtain enough talent to get the work done, to have the mid-level and senior expertise to do key technology work and apply best practices, and to have a bench ready to step up when there is attrition.

What experience have you had as a leader with building teams? What has worked best, build or buy?

Best, Jim Ditmore

Consumer Tech and the Small Business Boost or Digitalization and the Large Corporate Leap?

Occasionally, a few events cluster in daily life that make you sit back and realize: “Wow, things have really changed.” This spring, as a typical homeowner, I had the local heating and AC service company come out to inspect and tune up the A/C. I registered the appointment over the internet and did not think anything was really different (though the web appointment was much better than previous experiences of calling and being put on hold). Yet on the day of the appointment, I received a call from the service technician while he was en route, telling me he would be there in 30 minutes. That was nice, and even better when he arrived right on time (this is a feat in itself in America, not in Denmark 🙂 ). As the technician inspected the units and took care of the issues (a faulty thermostat), I noticed more changes. He checked what the proper replacement thermostat was on a modified iPhone and then pulled it from his truck. Throughout, he did everything on his phone: making notes, compiling the invoice, getting my signature on it, emailing it to me, taking a picture of my check (or he could have swiped my credit card). He topped it off by setting up the fall tuneup. Fully intrigued, I asked about the impact of the new device and capabilities on his service day. As it turns out, all of his appointments for the day were on the iPhone, as were the driving instructions. His company had transitioned everything the field service techs did to the iPhones and ‘pretty much eliminated all paper’. His view? Things were a lot easier, he spent more time doing real work, and there was no lost paperwork. My view? Wow!

To see a small enterprise make such a dramatic impact with IT on the everyday tasks and productivity of a small business like AC or furnace repair is remarkable. And the potential of impact by consumer technologies on small businesses was driven home when I went to the local barbecue restaurant for lunch. When the attendant took my order with an iPad and then explained they no longer had regular cash registers, I was wondering if I had been in a time warp and somehow missed these changes. She further explained that the iPads were also a huge convenience when they set up their booths at festivals and fairs.  Another win for consumer tech in small businesses.

I still remember being slightly surprised when I walked into one of the first Apple stores back before 2010 and, instead of walking back to a register line with my purchase, the Apple salesperson processed it right where I stood with a slightly modified iPhone. Pretty cool. But now the barbecue place also? And the furnace repair guy? And workflow, invoicing, and payment software to match? Consumer tech and the accompanying software are becoming serious business tools of choice for small businesses. They are not just being used to improve payments and act as cash registers (and of course, there are other good tools like Square or Stripe or many others); they also handle appointments, customer communications, inventory, workflow, delivery routing, ordering, invoicing, and accounting. These vertical apps on consumer devices allow small businesses to minimize administrative overhead and focus far more on their true services to their customers.

What is also compelling about the adoption of new mobile technologies by small businesses is the level of positive impact they are having on the businesses themselves. Eliminating paper? That has been a lofty goal of many large businesses for decades. It looks like small businesses are actually getting it done. Provide much better customer service through all-electronic customer interactions? Also being done. This enables the small business to compete much more effectively for that customer. Enable employees to be more productive from anywhere? Check. And all while leveraging consumer-based and cloud technologies at a fraction of the small business IT costs and complexity of just 5 or 10 years ago.

And yet, as compelling as these small business examples are, recent articles (here, the WSJ) suggest that the largest enterprises are grabbing the biggest gains from technology implementations. As noted in the article, “economists have discovered an unsettling phenomenon: While top companies are getting much more productive, gains are stalling for everyone else. And the gap between the two is widening, with globalization and new technology delivering outsize rewards to the titans of the global economy.” Thus, gains in productivity from applying technology appear to be extremely uneven across the enterprise landscape. The larger firms, or the ones most adept at applying technology, are reaping most of the rewards.

The gap becomes even larger when gains are achieved through proprietary solutions that allow outsized productivity gains. One example provided was PWC building lead analysis software that enabled 30x productivity gains in scanning contracts. PWC built the software itself, and even though there is commercial software now available for smaller firms, the cost of that software reduces the gains. Of course, if the software becomes not just a productivity gain but an industry or sector platform – like Amazon’s marketplace software – then the gains become enormous and extend far beyond productivity.

As the scope of digitalization expands and the possibilities of doing ever more functions and capabilities increase with technology’s advances, it appears that the leading companies who have scale can craft custom software solutions to greatly streamline and reduce costs and enable them to win the lion’s share of the gains – particularly in productivity. Or even win the lion’s share of the market with a compelling platform (like Amazon’s marketplace). And by having the scale, when you do hit the mark with your digitalization, your gains are much larger.

Of course, making the right investments and successfully optimizing the processes of a large and complex business requires enormous vision, skill, persistence, collaboration, and leadership. It’s not about buying the latest tech (e.g. AI engine), but instead it is about having a strong vision of your place in your market, an even stronger understanding of your customers and what they want, and the willingness to work long and hard together to deliver the new capabilities. Thus, instead of a ‘new’ way to success, digitalization and technology just increase the rewards for the largest companies that focus on their customers and deliver these solutions better.

And the small businesses that are truly gaining advantage from becoming digitalized? Maybe they will grow faster and emerge as large enterprise winners in the future.

What has the impact of consumer tech been on your enterprise? Are you seeing the same changes in your local small businesses? And for large enterprises, are you seeing productivity gains from digitalization? And if you are one of the biggest, should you be expecting more from your digitalization investments?

I look forward to your comments.

Best, Jim Ditmore

The Service Desk in the Age of Digitalization and AI

I published the original Service Desk posts more than a few years ago, and since then we have seen great progress in digitalization. Importantly, technologies including advanced analytics and AI have also been introduced into the business mainstream. While many of the best practices that Steve Wignall, Bob Barnes, and I detailed still hold true, there are important new advances that can and should be leveraged. These advances, coupled with strong implementation of the foundational practices, can substantially improve the quality and cost of your service desk.

In the era of digitalization, the service desk has actually increased in importance, as it is the human touch that remains easiest to reach in times of trouble for your users or customers. These advances in technology can be used to improve the accessibility of that interface. For example, the service desk is no longer just a phone interface. Now, especially with external customer desks, the interface includes chat and email. And this communication can also be ‘proactive’, where you reach out to the customer, versus ‘reactive’, where you wait for them to call or chat. A proactive chat message offered to the customer when they are hovering or waiting on an internet interface can be an excellent helping hand, allowing them to easily reach your service team and obtain the information or assistance that enables them to complete their transaction. The commercial results can be extremely beneficial as you reduce ‘dropout rates’ on important transactions. And overall, given such proactive chats are typically seen as unobtrusive and helpful, you can greatly improve customer experience and satisfaction.

Further, advances in data analytics and artificial intelligence yield new capabilities, from voice authentication to interactive, natural voice menus to AI actually answering customer questions. Below are the details of these techniques and suggestions on how best to leverage them. Importantly, remember that the service desk is a critical human interface during those service ‘moments of truth’ and must be a reliable and effective channel that works as a last resort. Thus, the technology is not a replacement for the human touch but an augmentation of knowledgeable personnel who are empowered to provide the services your customers need. If your staff lack such skills and authority, a technologically savvy service desk will only compound your customers’ frustration with your services and miss opportunities to win in the digital age.

As a start, and regardless of whether the desk is internal or external, the service desk model should be based on ITIL, and I recommend you start with this base of knowledge on service management. With that foundation in mind, below we cover the latest techniques and capabilities you should incorporate into your desk.

Proactive Chat: Proactive chat can greatly lift the performance of your customers’ interaction with your website, and not just by reducing abandons on a complex web page. You can use it for customers lingering over FAQs and help topics or to assist customers arriving from other sites with known campaign referrals. Good implementations avoid bothering the customer when they first land on your page or when they are speeding through a transaction. Instead, you approach the customer as a non-intrusive but helpful presence, armed with the likely question they are asking themselves. Such accessibility is key to enabling the best customer service and avoiding ‘dropouts’ on your website or mobile app. Easy-to-access chat from your web site or mobile app can substantially improve customer completion rates on services. Improved completion rates (and thus increased revenues) alone can often justify the cost of excellent chat support. There are several vendors offering good products to integrate with your website and service desk, and you can find a good quick reference on proactive chat practices here. And the rewards can be significant, with 30 to 50% fewer abandons and much higher customer satisfaction.
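
As a rough illustration of the ‘non-intrusive but helpful’ trigger logic described above, here is a small Python sketch of a rule for deciding when to offer a proactive chat. The thresholds, page names, and session fields are all assumptions; a real implementation would live in your chat vendor’s rules engine or your web analytics layer rather than in hand-rolled code.

```python
# Sketch of a proactive-chat trigger rule: offer help only when the visitor appears to
# be stuck (lingering on a help/FAQ or complex transaction page, or arriving from a
# known campaign referral), never when they are speeding through, and at most once
# per session. Thresholds and page categories are illustrative assumptions.
from dataclasses import dataclass

DWELL_THRESHOLD_SECONDS = 90
HELP_PAGES = {"faq", "help", "application-step-3"}   # pages where lingering suggests trouble
CAMPAIGN_REFERRERS = {"partner-site.example"}         # known campaign referrals

@dataclass
class Session:
    current_page: str
    seconds_on_page: float
    referrer: str = ""
    pages_per_minute: float = 0.0      # high value = moving quickly, leave the visitor alone
    chat_already_offered: bool = False

def should_offer_chat(session: Session) -> bool:
    if session.chat_already_offered:
        return False
    if session.pages_per_minute > 3:   # visitor is speeding through; don't interrupt
        return False
    lingering = (session.current_page in HELP_PAGES
                 and session.seconds_on_page >= DWELL_THRESHOLD_SECONDS)
    campaign_arrival = session.referrer in CAMPAIGN_REFERRERS
    return lingering or campaign_arrival

if __name__ == "__main__":
    print(should_offer_chat(Session("faq", 120)))       # True: lingering on a help page
    print(should_offer_chat(Session("homepage", 10)))   # False: just arrived
```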

Voice Authentication and Natural Language: Another technology that has advanced substantially in the past few years is voice authentication. Of course, biometrics in general has advanced broadly, from fingerprint to face recognition now being deployed in everyday devices — to varying degrees of effectiveness. Voice authentication is one of the more mature biometric areas and has been adopted by many institutions to authenticate users when they call in. Voice authentication can be done either in active mode (e.g. using a set passphrase) or in passive mode (the user speaks naturally to the call center representative and after a period of time is either authenticated or rejected). Some large financial services companies (e.g., Barclays) have deployed this for 2 years or more, with very high customer satisfaction results and reductions in impersonation or fraud. I recommend a passive implementation as it seems less likely to be ‘cracked’ (there is no set passphrase to record or impersonate with) and it results in a more natural human conversation. Importantly, it reduces the often lengthy time spent authenticating a customer, and the representative does not have to ask the sometimes inane security questions that only further annoy customers. Voice authentication, along with traditional ANI schemes (where you use the originating number to identify the customer so that their most recent information, requests or transactions are provided to the service agent), enables more certain authentication as well as the ability to immediately launch into the issue or service the customer is trying to achieve.
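To illustrate how ANI lookup and passive voice authentication can work together, here is a minimal sketch; the customer records, confidence threshold and match scores are hypothetical stand-ins for what your telephony platform and voice-biometrics engine would actually provide.

```python
from typing import Dict, List, Optional

# Hypothetical customer directory keyed by originating phone number (ANI).
CUSTOMERS_BY_ANI: Dict[str, Dict] = {
    "+4512345678": {"name": "A. Customer", "recent_request": "card replacement"},
}
AUTH_THRESHOLD = 0.9  # assumed confidence needed to treat the caller as authenticated

def lookup_by_ani(ani: str) -> Optional[Dict]:
    """Use the originating number to pre-fill the agent's view of the caller."""
    return CUSTOMERS_BY_ANI.get(ani)

def passive_voice_auth(voice_match_scores: List[float]) -> bool:
    """Accumulate per-utterance match scores from the voice-biometric engine.

    In passive mode the caller simply talks to the agent; once the average
    match confidence crosses the threshold, the caller is authenticated
    without a passphrase or security questions.
    """
    if not voice_match_scores:
        return False
    return sum(voice_match_scores) / len(voice_match_scores) >= AUTH_THRESHOLD

# Example: the ANI identifies the caller, and natural conversation authenticates them.
caller = lookup_by_ani("+4512345678")
authenticated = passive_voice_auth([0.88, 0.93, 0.95])
print(caller, authenticated)
```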

In addition, there is growing use of spoken or even ‘natural language’ input to replace traditional touchtone IVR menus (e.g. instead of ‘Push 1 for Billing’, ‘Tell us the nature of your call – is it related to billing, or to your order delivery, or another topic?’). Unfortunately, these can often result in an IVR maze (or even ‘hell’) for some customers when they use an unusual phrase or their words are not recognized. And given there is no easy way out (e.g. ‘push 0 for an agent’), you end up frustrating your customers even more. I would be very cautious about implementing such systems as they rarely contribute to the customer experience or to efficiency.

Improved analytics and AI:  Analytics is an area that has advanced dramatically over the past 2 years. The ability to combine your structured transaction data with additional big data, from web logs to social media information, means you can know much more about your customers when they call in. As advantageous as this can be, ensure first that you have a solid customer profile in place that allows your agents to know all of the basics about your customer. Next, layer in all recent activity in other channels – web, mobile, chat. Then supplement with suggestions, such as next-best product or service recommendations or upgrades based on customer characteristics or similar customer actions. You can substantially increase customer confidence by showing ‘Customers like you …’. Of course, you must leverage such data in accordance with regulatory requirements (e.g. GDPR) and in a transparent way that gives the customer the confidence that you are protecting their data and using it to provide better solutions and service for them. This is paramount, because if you lose the customer’s trust with their data, or appear ‘creepy’ with your knowledge, then you are ruining the customer experience you wish to provide.
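As a small illustration of this layering, here is a minimal sketch in Python; the data sources, field names and suggestion text are hypothetical, and a real implementation would pull from your CRM, channel logs and analytics models.

```python
from typing import Dict, List

def build_agent_context(profile: Dict, recent_events: List[Dict], suggestions: List[str]) -> Dict:
    """Assemble what the agent sees when the customer calls in.

    The layering follows the text above: a solid customer profile first,
    then recent activity from other channels (web, mobile, chat), then
    'customers like you' style suggestions last.
    """
    return {
        "profile": profile,  # the basics about the customer
        "recent_activity": sorted(recent_events,  # most recent cross-channel events first
                                  key=lambda e: e["timestamp"], reverse=True)[:5],
        "suggestions": suggestions,  # next-best product or service ideas
    }

# Example with hypothetical data.
context = build_agent_context(
    profile={"customer_id": "C-1001", "segment": "retail", "tenure_years": 7},
    recent_events=[
        {"channel": "web", "timestamp": "2019-12-30T10:02:00", "action": "viewed loan calculator"},
        {"channel": "mobile", "timestamp": "2019-12-31T08:15:00", "action": "started application"},
    ],
    suggestions=["Customers like you completed the application with an advisor call-back"],
)
print(context["recent_activity"][0]["action"])  # the most recent event is surfaced first
```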

Only after you have a robust customer data foundation and can demonstrate improved customer service utilizing analytics should you consider exploring AI bots. Without the customer information, AI bots are actually just ‘dumb’ bots that will likely annoy your customers. And the recent pilots of AI capabilities I have seen have only handled the easiest questions, and only after a huge amount of work to implement and train. Of course, I would expect this technology to improve rapidly in the coming years and its commercial proposition to become better.

Agent/Customer Matching: One other method to optimize service is agent/customer matching, where agents are matched to customers either with an automated tool or through active customer selection. The matching can be based on emotional, experience, or other dimensions. The result is a better experience for the customer and likely a better connection with your company.

Service optimization and demand reduction: While certainly a fundamental capability, service optimization (where you use data from the calls to proactively adjust your services and interfaces to eliminate the need for the call in the first place) becomes even more powerful when you combine it with additional data from all of your channels and the customer. You can identify root causes for calls and eliminate them better than ever. Using Pareto analysis, you can look into your most frequent calls and understand the defects, product problems, process gaps, or web page issues that your customers or internal users are experiencing — especially when bounced up against web logs that show how the customer navigates (or is unable to navigate) your pages. The service desk team should then run a crisp process with management sponsorship to ensure the original issues are corrected. This can reduce your incident or problem calls by 20, 30 or even 40%. Not only do you get the cost reduction from the reduced calls, but more importantly, you greatly reduce the problems and annoyances your customers or staff experience. You optimize the services you provide and ensure a smoother customer experience through the ongoing execution of such a feedback loop. We have used this to great effect within the Danske Bank IT service desk over the past two years, enabling us to offer far better service at lower cost. Attached is a diagram representing the process: Demand Reduction. Of course, credit goes fully to the team (Dan, Ona, and many others) for such successful development and execution of the practice.
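Here is a minimal sketch of the Pareto analysis in Python; the call-reason categories and volumes are hypothetical, and in practice the data would come straight from your service desk ticketing tool.

```python
from collections import Counter

# Hypothetical call records, each tagged with a reason code by the service desk.
calls = (
    ["password reset"] * 420 + ["billing page error"] * 310 +
    ["order status"] * 150 + ["how-to question"] * 80 + ["other"] * 40
)

def pareto(call_reasons, coverage=0.8):
    """Return the smallest set of reasons covering roughly 80% of call volume.

    These are the candidates for root-cause fixes (web page changes,
    self-service, process corrections) that remove the demand entirely.
    """
    counts = Counter(call_reasons).most_common()
    total = sum(n for _, n in counts)
    running, top_reasons = 0, []
    for reason, n in counts:
        top_reasons.append((reason, n, round(100 * n / total, 1)))
        running += n
        if running / total >= coverage:
            break
    return top_reasons

for reason, n, pct in pareto(calls):
    print(f"{reason}: {n} calls ({pct}%)")
```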

So, that is our quick survey of new technologies to support the service desk in the digital age. As I noted at the beginning, you should make sure you have a solid foundation in your service centers before moving to the advanced technology. There’s no substitute for doing the basics right, and the business return on investments in the latest technologies will often be minimal until the basics are in place. For a quick reference on all of the foundational practices please see the service desk summary page, and make sure you review the material on the key ingredient: service desk leadership.

Best, Jim Ditmore

The Elusive High Availability in the Digital Age

Well, the summer is over, even if we have had great weather into September. My apologies for the delay in a new post, and I know I have several topic requests to fulfill 🙂 Given our own journey at Danske Bank on availability, I thought it was best to re-touch this topic and then come back around to other requests in my next posts. Enjoy and look forward to your comments!

It has been a tough few months for some US airlines with their IT systems availability. Hopefully, you were not caught up in the major delays and frustrations. Both Southwest and Delta suffered major outages in August and September. Add in power outages affecting equipment and multiple airlines recently in Newark, and you have many customers fuming over delays and cancelled flights. And the cost to the airlines was huge — Delta’s outage alone is estimated at $100M to $150M, and that doesn’t include the reputation impact. Nor are such outages limited to US airlines, with British Airways also suffering a major outage in September. Delta and Southwest are not unique in their problems; both United and American suffered major failures and widespread impacts in 2015. Even with large IT budgets, and hundreds of millions invested in upgrades over the past few years, airlines are struggling to maintain service in the digital age. The reasons are straightforward:

  • At their core, services are based on antiquated systems that have been partially refitted and upgraded over decades (the core reservation system is from the 1960s)
  • Airlines struggled earlier this decade to make a profit due to oil prices, and invested minimally in their IT systems to attack the technical debt. This was further complicated by multiple integrations that had to be executed due to mergers.
  • As they have digitalized their customer interfaces and flight check-in procedures, the previous manual procedures are now backup steps that are infrequently exercised and woefully undermanned when IT systems do fail, resulting in massive service outages.

With digitalization reaching even further into customer interfaces and operations, airlines, like many other industries, must invest in stabilizing their systems, address their technical debt, and get serious about availability. Some should start with the best practices in the previous post on Improving Availability, Where to Start. Others, like many IT shops, have decent availability but still have much to do to get to first quartile availability. If you have made good progress but realize that three 9’s or preferably four 9’s of availability on your key channels is critical for you to win in the digital age, this post covers what you should do.

Let’s start with the foundation. If you can deliver consistently good availability, then your team should already understand:

  • Availability is about quality. Poor availability is a quality issue. You must have a quality culture that emphasizes quality as a desired outcome and doing things right if you wish to achieve high availability.
  • Most defects — which then cause outages — are injected by change. Thus, strong change management processes that identify and eliminate defects are critical to further reduce outages.
  • Monitor and manage to minimize impact. A capable command center with proper monitoring feeds and strong incident management practices may not prevent the defect from occurring but it can greatly reduce the time to restore and the overall customer impact. This directly translates into higher availability.
  • You must learn and improve from the issues. Your incident management process must be coupled with a disciplined root cause analysis that ensures teams identify and correct underlying causes that will avoid future issues. This continuous learning and improvement is key to reaching high performance.

With this base understanding, and presumably with only smoldering problem areas left in your IT shop, there are excellent extensions that will enable your team to move to first quartile availability with moderate but persistent effort. For many enterprises, this is now a highly desirable business goal. Reliable systems translate to reliable customer interfaces, as customers now access the heart of most companies’ systems through internet and mobile applications, typically on a 7×24 basis. Your production performance becomes very evident, very fast, to your customers. And if you are down, they cannot transact, you cannot service them, your company loses real revenue, and more importantly, damages its reputation, often badly. It is far better to address these problems and gain a key edge in the market by consistently meeting or exceeding customer availability expectations.

First, even if you have moved up from regularly fighting fires, the fact that outages are no longer an everyday occurrence does not mean that IT leadership no longer needs to emphasize quality. Delivering high quality must be core to your culture and your engineering values. As IT leaders, you must continue to reiterate the importance of quality and demonstrate your commitment to these values through your actions. When there is enormous time pressure to deliver a release but it is not ready, you delay it until the quality is appropriate. Or you release a lower quality pilot version, with properly set customer and business expectations, that is followed in a timely manner by a quality release. You ensure adequate investment in foundational quality by funding system upgrades and lifecycle efforts so technical debt does not increase. You reward teams for high quality engineering, and not for fire-fighting. You advocate inspections, or agile methods, that enable defects to be removed earlier in the lifecycle at lower cost. You invest in automated testing and verification that enables work to be assured of higher quality at much lower cost. You address redundancy and ensure resiliency in core infrastructure and systems. Single power cord servers still in your data center? Really?? Take care of these long-neglected issues. And if you are not sure, go look for these typical failure points (another being SPOF network connections). We used to call these ‘easter eggs’, as in the easter eggs that no one found in a preceding year’s easter egg hunt, and then you find the old, and quite rotten, easter egg on your watch. It’s no fun, but it is far better to find them before they cause an outage.

Remember that quality is not achieved by never making mistakes — a zero defect goal is not the target — instead, quality is achieved by a continuous improvement approach where defects are analyzed and causes eliminated, and where your team learns and applies best practices. Your target should be 1st quartile quality for your industry; that will provide competitive advantage. When you update the goals, also revisit and ensure you have aligned the rewards of your organization to match these quality goals.

Second, you should build on your robust change management process. To get to median capability, you should have already established clear change review teams and proper change windows, and moved to deliveries through releases. Now, use the data to identify which groups are late in their preparation for changes, or where change defects cluster and why. These insights can improve and streamline the change processes (yes, some of the late changes could be due to too many required approvals, for example). Further clusters of issues may be due to specific steps being poorly performed or to inadequate tools. For example, verification is often done as a cursory task and thus seldom catches critical change defects. The result is that the defect is then only discovered in production, hours later, when your entire customer base is trying but cannot use the system. Of course, such an outage was likely entirely avoidable with adequate verification, because you would have known at the time of the change that it had failed and could have taken action then to back out the change. The failed change data is your gold mine of information to understand which groups need to improve and where they should improve. Importantly, be transparent with the data; publish the results by team and by root cause clusters. Transparency improves accountability. As an IT leader, you must then make the necessary investments and align efforts to correct the identified deficiencies and avoid future outages.
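As a small illustration of mining the failed-change data, here is a minimal sketch in Python; the teams, root-cause labels and records are hypothetical stand-ins for an export from your change-management tool.

```python
from collections import Counter

# Hypothetical export of failed changes from the change-management tool.
failed_changes = [
    {"team": "payments", "root_cause": "inadequate verification"},
    {"team": "payments", "root_cause": "inadequate verification"},
    {"team": "network", "root_cause": "late preparation"},
    {"team": "payments", "root_cause": "poor testing"},
    {"team": "channels", "root_cause": "inadequate verification"},
]

# Cluster the failures by team and by root cause; publishing both views
# transparently is what drives accountability and targeted fixes.
by_team = Counter(change["team"] for change in failed_changes)
by_cause = Counter(change["root_cause"] for change in failed_changes)

print("Failed changes by team:", by_team.most_common())
print("Failed changes by root cause:", by_cause.most_common())
```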

Further, you can extend the change process by introducing ‘production ready’. Production ready means a system or major update can be introduced into production because it is ready on all the key performance aspects: security, recoverability, reliability, maintainability, usability, and operability. In the typical rush to deliver key features or products, the sustainability of the system is often neglected or omitted. By establishing the Operations team as the final approval gate for a major change to go into production, and leveraging the production ready criteria, organizations can ensure that these often neglected areas are attended to and properly delivered as part of the normal development process. These steps then enable a much higher performing system in production and avoid customer impacts. For a detailed definition of the production ready process, please see the reference page.
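To make the gate concrete, here is a minimal sketch assuming the six criteria above; the sign-off structure and function name are hypothetical, and in practice the gate would live in your change or release tooling with Operations holding the final approval.

```python
from typing import Dict, List, Tuple

# The key performance aspects named above; each needs an explicit sign-off
# before Operations approves the change into production.
PRODUCTION_READY_CRITERIA = (
    "security", "recoverability", "reliability",
    "maintainability", "usability", "operability",
)

def production_ready(signoffs: Dict[str, bool]) -> Tuple[bool, List[str]]:
    """Return whether the release may proceed and which criteria are still open."""
    missing = [c for c in PRODUCTION_READY_CRITERIA if not signoffs.get(c, False)]
    return (len(missing) == 0, missing)

# Example: recoverability has not been demonstrated, so Operations withholds approval.
ok, gaps = production_ready({
    "security": True, "recoverability": False, "reliability": True,
    "maintainability": True, "usability": True, "operability": True,
})
print(ok, gaps)
```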

Third, ensure you have consolidated your monitoring and that all significant customer-impacting problems are routed through an enterprise command center via an effective incident management process. An Enterprise Command Center (ECC) is basically an enterprise version of a Network Operations Center or NOC, where all of your systems and infrastructure are monitored (not just networks). This modern ECC also has the capability to facilitate and coordinate triage and resolution efforts for production issues. An effective ECC can bring together the right resources from across the enterprise and supporting vendors to diagnose and fix production issues while providing communication and updates to the rest of the enterprise. Delivering highly available systems requires an investment in an ECC and the supporting diagnostic and monitoring systems. Many companies have partially constructed the diagnostics or have siloed war rooms for some applications or infrastructure components. To fully and properly handle production issues, these capabilities must be consolidated and integrated. Once you have an integrated ECC, you can extend it by moving from component monitoring to full channel monitoring. Full channel monitoring is where the entire stack for a critical customer channel (e.g. online banking for financial services or customer shopping for a retailer) has been instrumented so that a comprehensive view can be continuously monitored within the ECC. The instrumentation is such that not only are all the infrastructure components fully monitored, but the databases, middleware, and software components are instrumented as well. Further, proxy transactions are run on a periodic basis to understand performance and detect any issues. This level of instrumentation requires considerable investment — and thus is normally done only for the most critical channels. It also requires sophisticated toolsets such as AppDynamics. But full channel monitoring enables immediate detection of issues or service failures, and most importantly, enables very rapid correlation of where the fault lies. This rapid correlation can take incident impact from hours to minutes or even seconds. Automated recovery routines can be built to accelerate recovery from given scenarios and reduce impact to seconds. If your company’s revenue or service is highly dependent on such a channel, I would highly recommend the investment. A single severe outage that is avoided or greatly reduced can often pay for the entire instrumentation cost.
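To illustrate the proxy-transaction idea, below is a minimal sketch of a periodic synthetic probe; the endpoint URL, latency threshold and alerting are hypothetical, and commercial tools such as AppDynamics provide far richer, correlated versions of this.

```python
import time
import urllib.request

ENDPOINT = "https://example.com/online-banking/health"  # hypothetical channel endpoint
LATENCY_THRESHOLD_SECONDS = 2.0                          # assumed acceptable response time

def run_synthetic_transaction() -> None:
    """Execute one proxy transaction and flag failures or slow responses.

    In a real ECC this result would feed the monitoring console and alerting,
    so a channel fault is detected in seconds rather than when customers call.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(ENDPOINT, timeout=10) as response:
            elapsed = time.monotonic() - start
            if response.status != 200:
                print(f"ALERT: unexpected status {response.status}")
            elif elapsed > LATENCY_THRESHOLD_SECONDS:
                print(f"WARN: slow response {elapsed:.2f}s")
            else:
                print(f"OK: {elapsed:.2f}s")
    except Exception as exc:  # network errors, timeouts, etc.
        print(f"ALERT: synthetic transaction failed: {exc}")

if __name__ == "__main__":
    # Run the probe on a fixed interval (every 60 seconds in this sketch).
    while True:
        run_synthetic_transaction()
        time.sleep(60)
```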

Fourth, you cannot be complacent about learning and improving. Whether from failed changes, incident pattern analysis, or industry trends and practices, you and your team should always be seeking to identify improvements. High performance or here, high quality, is never reached in one step, but instead in a series of many steps and adjustments. And given our IT systems themselves are dynamic and changing over time, we must be alert to new trends, new issues, and adjust.

Often, where we execute strong root cause analysis and follow-up, we end up focused only at the individual issue or incident level. This can be all well and good for correcting the one issue, but if we miss broader patterns we can substantially undershoot optimal performance. As IT leaders, we must always consider both the trees and the forest. It is important not just to focus on fixing the individual incident and getting to root cause for that one incident, but also to look for the overall trends and patterns in your issues. Do they cluster with one application or infrastructure component? Does a supplier contribute far too many issues? Is inadequate testing a common thread among incidents? Do you have some teams that create far more defects than the norm? Are your designs too complex? Are you using the products in a mainstream or unique manner – especially if you are seeing many OS or product defects? Use these patterns and analysis to identify the systemic issues your organization must fix. They may be process issues (e.g. poor testing), application or infrastructure issues (e.g., obsolete hardware), or other issues (e.g., lack of documentation, incompetent staff). Discuss these issues and analysis with your management team and engineering leads. Tackle fixing them as a team, with your quality goals prioritizing the efforts. By correcting things both individually and systemically you can achieve far greater progress. Again, the transparency of the discussions will increase accountability and open up your teams so everyone can focus on the real goals as opposed to hiding problems.

These four extensions to your initial efforts will set your team on a course to achieve top quartile availability. Of course, you must couple these efforts with diligent engagement by senior management, adequate investment, and disciplined execution. Unfortunately, even with all the right measures, providing robust availability for your customers is rarely a straight-line improvement. It is a complex endeavor that requires persistence and adjustment along the way. But by implementing these steps, you can enable sustainable and substantial progress and achieve top quartile performance to provide business advantage in today’s 7×24 digital world.

If your shop is struggling with high availability or major outages, look to apply these practices (or send your CIO the link to this page 🙂 ).

Best, Jim Ditmore

Infrastructure Engineering – Leveraging a Technology Plan

Our recent post discussed using the Infrastructure Engineering Lifecycle (IELC) to enable organizations to build a modern, efficient and robust technology infrastructure. One of the key artifacts that both leverages an IELC approach and helps an infrastructure team properly plan and navigate the cycles is the Technology Plan. Normally, the technology plan is constructed for each major infrastructure ‘component’ (e.g. network, servers, client environment, etc). A well-constructed technology plan creates both the pull – outlining how the platform will meet the key business requirements and technology objectives – and the push – reinforcing proper upkeep and use of the IELC practice.

Digitalization continues to sweep almost every industry, and the ability of firms to actually deliver digital interfaces and services requires a robust, modern and efficient infrastructure. To deliver an optimal technology infrastructure, one must utilize an ‘evergreen’ approach and maintain an appropriate technology pace matching the industry. Similar to a dolphin riding the bow wave of a ship, a company can optimize both the features and capabilities of its infrastructure and minimize its cost and risk by staying consistently just off the leading pace of the industry. Often companies make the mistake of either surging ahead and expending large resources to get fully leading technology or eking out and extending the life of technology assets to avoid investment and resource requirements. Neither strategy actually saves money ‘through the cycle’, and both add significant risk for little additional benefit.

Companies that choose to minimize their infrastructure investments and reduce costs by overextending asset lives typically incur greater costs through higher maintenance, more fix resources required, and lower system performance (and staff productivity). Obviously, extending your desktop PC refresh cycle from 2 years to 4 years is workable and reasonable, but extend the cycle much beyond this and you quickly run into:

  • Integration issues – both internal and external compatibility problems, as your clients and partners have newer versions of office tools that are incompatible with yours
  • Potentially higher maintenance costs, as much hardware has no maintenance cost for the first 2 or 3 years and increasing costs in subsequent years
  • Greater environmental costs, as power and cooling savings from newer-generation equipment are not realized
  • Longer security patch cycles for older software (though some benefit as it is also more stable)
  • Greater complexity and resulting cost within your environment, as you must integrate 3 or 4 generations of equipment and software versus 2 or 3
  • Longer incident times, as the usual first vendor response to an issue is ‘you need to upgrade to the latest version of the software before we can really fix this defect’

And if you press the envelope further and extend infrastructure life to the end of the vendor’s life cycle or beyond, expect significantly higher failure rates, unsupported or expensively supported software, and much higher repair costs. In my experience, across multiple efforts to modernize an overextended infrastructure, we were able to reduce total costs by 20 or 30%, and this included the costs of the modernization. In essence, you can run 4 servers from 3 generations ago on 1 current server, and having modern PCs and laptops means far fewer service issues, fewer service desk calls, far less breakage (people take care of newer stuff) and more productive staff.
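As a rough, hedged illustration of the ‘through the cycle’ arithmetic, here is a sketch comparing keeping an overextended server fleet against refreshing it; every figure (fleet size, unit costs, consolidation ratio, non-consolidating estate costs) is a hypothetical assumption rather than a benchmark, chosen only so the result lands near the 20 to 30% range noted above.

```python
# Hypothetical figures, purely to illustrate the through-the-cycle comparison.
OLD_SERVERS = 4000                  # aging fleet, roughly 3 generations old
CONSOLIDATION_RATIO = 4             # ~4 old servers fit onto 1 current server
OLD_ANNUAL_COST_PER_SERVER = 2_500  # maintenance, power, admin for an old server
NEW_ANNUAL_COST_PER_SERVER = 2_000  # the same run costs for a current server
NEW_SERVER_PRICE = 6_000            # one-time purchase cost per new server
OTHER_ANNUAL_COSTS = 15_000_000     # estate costs that do not consolidate (staff, network, software)
YEARS = 3                           # evaluation horizon

def total_cost(server_count: int, annual_per_server: int, refresh_outlay: int = 0) -> int:
    """Refresh outlay (if any) plus server run costs plus non-consolidating estate costs."""
    return refresh_outlay + (server_count * annual_per_server + OTHER_ANNUAL_COSTS) * YEARS

keep_old = total_cost(OLD_SERVERS, OLD_ANNUAL_COST_PER_SERVER)
new_count = OLD_SERVERS // CONSOLIDATION_RATIO
refresh = total_cost(new_count, NEW_ANNUAL_COST_PER_SERVER,
                     refresh_outlay=new_count * NEW_SERVER_PRICE)

print(f"Keep the old fleet: {keep_old:>12,}")
print(f"Refresh the fleet:  {refresh:>12,}")
print(f"Through-the-cycle savings: {100 * (keep_old - refresh) / keep_old:.0f}%")
```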

Companies that have surged to the leading edge on infrastructure are typically paying a premium for nominal benefit. For the privilege of being first, frontrunners encounter an array of issues including:

  • Experiencing more defects – trying out the latest server or cloud product or engineered appliance means you will find far more defects.
  • Paying a premium – being first with new technology typically means you will pay a premium because it is well before the volumes and competition kick in to drive better pricing.
  • Integration issues – having the latest software version often means third-party utilities or extensions have not yet released a version that works properly with it
  • Higher security flaws – not all the backdoors and gaps have been uncovered yet, as there are not enough users. Thus, hackers have a greater opportunity to find ‘zero day’ flaws and exploit them to attack you

Typically, the groups I have inherited that were on the leading edge were there because they either had an excess of resources or were focused solely on technology products (and not business needs). There was inadequate dialogue with the business to ensure the focus was on business priorities versus technology priorities. Thus, the company was often expending 10 to 30% more for little tangible business benefit other than to be able to state they were ‘leading edge’. In today’s software world, seldom does the latest infrastructure provide compelling business benefit over and above that of a well-run, modern utility infrastructure. Nearly all of the time, the business benefit is derived from compelling services and features enabled by the application software running on that utility. Thus, the shops that are tinkering with leading edge hardware or are always on the latest version first are typically shops pursuing hobbies disconnected from the business imperatives. Only where organizations are operating at massive scale, or are actually providing infrastructure services as a business, does leading edge positioning make business sense.

So, given our objective is to be in the sweet spot riding the industry bow wave, a good practice to ensure a consistent pace and connection to the business is a technology plan for each of the major infrastructure components that incorporates the infrastructure engineering lifecycle. A technology plan includes the infrastructure vision and strategy for a component area, defines the key services provided in business terms, and maps out an appropriate trajectory and performance for a 2 or 3 year cycle. The technology plan then becomes the roadmap for that particular component and enables management to both plan and track performance against key metrics as well as ensuring evolution of the component with the industry and business needs.

The key components of the technology plan are:

  1. Mission, Vision for that component area
  2. Key requirements/strategy
  3. Services (described in business terms)
  4. Key metrics (definition, explanation)
  5. Current starting point – explanation (SWOT) – as needed by service
  6. Current starting point – Configuration – as needed by service
  7. Target – explanation (of approach) and configuration — also defined by service
  8. Metrics trajectory and target (2 to 3 years)
  9. Gantt chart showing key initiatives, platform refresh or releases, milestones (can be by service)
  10. Configuration snapshots at 6 months (for 2 to 3 years, can be by service)
  11. Investment and resource description
  12. Summary
  13. Appendices
    1. Platform Schedule (2 -3 years as projected)
    2. Platform release schedule (next 1 -2 years, as projected)
    3. Patch cycle (next 6 – 12 months, as planned)

The mission and vision should be derived and cascaded from the overall technology vision and corporate strategy. They should emphasize key tenets of the corporate vision and their implications for the component area. For example, if the corporate strategy is to be ‘easy to do business with’, then the network and server components must support a highly reliable, secure and accessible internet interface. Such reliability and security aspirations then have direct implications on component requirements, objectives and plans.

The services portion of the plan should translate the overall component into the key services provided to the business. For example, network would be translated into data services, general voice services, call center services, internet and data connection services, local branch and office connectivity, wireless and mobility connectivity, and core network and data center connectivity. The service area should be described in business terms with key requirements specified. Further, each service area should then be able to describe the key metrics to be used to gauge its performance and effectiveness. The metrics could be quality, cost, performance, usability, productivity or other metrics.

For each service area of a component, the plan is then constructed. If we take the call center service as the example, the current technology configuration and specific services available would define the current starting point. A SWOT analysis should accompany the current configuration, explaining both strengths and where the service falls short of business needs. Then the target is constructed, where both the overall architecture and approach are described and the target configuration (high to medium level of definition) is provided (e.g. where the technology configuration for that area will be in 2 or 3 years).

Then, given the target, the key metrics are mapped from their current to their future levels and a trajectory established that will be the goals for the service over time. This is subsequently filled out with a more detailed plan (Gantt chart) that shows the key initiatives and changes that must be implemented to achieve the target. Snapshots, typically at 6 month intervals, of the service configuration are added to demonstrate detailed understanding of how the transformation is accomplished and enable effective planning and migration. Then the investment and resource needs and adjustments are described to accompany the technology plans.
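Here is a minimal sketch of a metrics trajectory for one service area; the service, metrics and target values are hypothetical examples of what the plan would track at 6-month snapshots.

```python
# Hypothetical metrics trajectory for the 'call center services' area of the
# network technology plan, tracked at 6-month snapshots over 2 years.
trajectory = {
    "call center services": {
        "availability_pct":        {"now": 99.5, "+6m": 99.7, "+12m": 99.9, "+18m": 99.9, "+24m": 99.95},
        "cost_per_call_eur":       {"now": 4.20, "+6m": 3.90, "+12m": 3.40, "+18m": 3.10, "+24m": 2.90},
        "avg_speed_to_answer_sec": {"now": 45,   "+6m": 35,   "+12m": 30,   "+18m": 25,   "+24m": 20},
    }
}

def report(service: str) -> None:
    """Print each metric's planned path from its current level to the 2-year target."""
    for metric, points in trajectory[service].items():
        path = " -> ".join(f"{snapshot}:{value}" for snapshot, value in points.items())
        print(f"{metric}: {path}")

report("call center services")
```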

If well done, the technology plan provides an effective roadmap for the entire technology component team: it helps them understand how what they do delivers for the business, where they need to be, and how they will get there. It can be an enormous assist to productivity and practicality.

I will post some good examples of technology plans in the coming months.

Have you leveraged plans like this previously? If so, did they help? Would love to hear from you.

All the best, Jim Ditmore

 

The Infrastructure Engineering Lifecycle – How to Build and Sustain a Top Quartile Infrastructure

There are over 5,000 books on application development methods on Amazon. There are dozens of industry and government standards that map out the methodologies for application development. And for IT operations and IT production processes like problem and change management, IT Service Management and ITIL standards provide excellent guidance and structure. Yet for the infrastructure systems on which the applications fully rely, there is scarcely a publication that outlines the approaches organizations should use to build and sustain a robust infrastructure. ITIL ventures slightly into this area but really just re-defines a waterfall application project cycle in infrastructure terms. During many years of building, re-building, and sustaining top quartile infrastructure, I have developed a life cycle methodology for infrastructure, the ‘Infrastructure Engineering Life Cycle’ (IELC).

The importance of infrastructure should not be overlooked in our digital age. Not only have customer expectations increased for services where they expect ‘always on’ web sites and transaction capabilities, but they also require quick response and seamless integration across offerings. Certainly the software is critical to provide the functionality, but none of these services can be reliably and securely provided without a well-built infrastructure underpinning all of the applications. A top quartile infrastructure delivers outstanding reliability (on the order of 99.9% or better availability), zippy performance, excellent unit costs, all with robust security and resiliency.

Often enterprises make the mistake of addressing infrastructure only when things break, and they only fix or invest enough to get things back running instead of correctly re-building a modern plant. This is unfortunate because not only will they likely experience further outages and service impacts, but their full infrastructure costs are likely to be higher for a dated, dysfunctional plant than for an updated, modern one. Unlike most assets, I have found that a modern, well-designed IT infrastructure is cheaper to run than a poorly maintained plant that has various obsolete or poorly configured elements. Remember that every new generation of equipment can basically do twice as much as the previous, so you have fewer components, less maintenance, less administration, and fewer things that can go wrong. In addition, a modern plant also boosts time to market for application projects and significantly reduces the portion of time both infrastructure and application engineers spend fixing things.

So, given the critical nature of well-run technology infrastructure in the world of digitalization, how do enterprises and CIOs build and maintain a modern plant with outstanding fit and finish? It is not just about buying lots of new equipment, or counting on a single vendor or cloud provider to take care of all the integration or services. Nearly all major enterprises have a legacy of systems that results in complexity and complicates the ability to deliver reliable services or keep pace with new capabilities. These complexities can rarely be handled by a systems integrator or single service provider. Further, a complete re-build of the infrastructure often requires major capital investment and can put availability even further at risk. The best course usually is not to go ‘all-in’, where you launch a complete re-build or hand over the keys to a sole outsourcer, but instead to take a ‘spiral optimization’ approach that addresses fundamentals and burning issues first, and then uses the newly acquired capabilities to advance and address more complex or less pressing remaining issues.

This repeated, closed-loop approach (‘spiral optimization’) is our management approach. It is coupled with an Infrastructure Engineering Lifecycle (IELC) methodology to build top quartile infrastructure. For the first cycle of the infrastructure rebuild, it is important to address the biggest issues. Front and center, the entire infrastructure team must focus on quality. Poorly designed or built infrastructure becomes a black hole of engineering time, as rework demands grow with each failure or each application built upon a teetering platform. And while it must also be understood that everything cannot be fixed at once, those things that are undertaken must be done with quality. This includes documenting the systems and getting them correctly into the asset management database. And it includes coming up with a standard design or service offering if none exists. Having 5000 servers must be viewed as a large expense requiring great care and feeding — and the only thing worse is having 5000 custom servers because your IT team did not take the time to define the standard, keep it up to date, and maintain and patch it consistently. 5000 custom servers are a massive expense that likely cannot be effectively and efficiently maintained or secured by any team. There is no cheaper time than the present moment to begin standardizing and fixing the mess by requiring that the next server built or significantly updated be done such that it becomes the new standard. Don’t start this effort though until you have the engineering capacity to do it. A standard design done by lousy engineers is not worth the investment. So, as an IT leader, while you are insisting on quality, ensure you have adequate talent to engineer your new standards. If you do not have it on board, leverage top practitioners in the industry to help your team create the new designs.

In addition to quality and starting to do things right, there are several fundamental practices that must be implemented. Your infrastructure engineering work should be guided by the infrastructure engineering lifecycle – which is a methodology and set of practices that ensure high quality platforms that are effective, efficient, and sustainable.

The IELC covers all phases of infrastructure platforms – from an emerging platform to a standard to declining and obsolete platforms. Importantly, the IELC comprises three cycles of activity. These recognize that infrastructure requires constant grooming and patching, with inputs typically coming from external parties, and that, all the while, technology advances occur regularly, such that over 3 to 10 years nearly every hardware platform becomes obsolete and must be replaced. The three cycles of activity are:

  • Platform – This is the foundational lifecycle activity where hardware and utility software are defined, designed and integrated into a platform to perform a particular service. Generally, for medium and large companies, this is a 3 to 5 year lifecycle. A few examples could be a server platform, storage platform or an email platform.
  • Release – Once a platform is initially designed and implemented, organizations should expect to refresh the platform on a regular basis to incorporate major underlying product or technology enhancements, address significant design flaws or gaps, and improve operational performance and reliability. Releases should be planned at 3 to 12 month intervals over the life of the platform (which is usually 3 to 5 years).
  • Patch – A patch cycle should also be employed where, on a regular and routine basis, minor upgrades (both fixes and enhancements) are applied. The patch cycle should synchronize with both the underlying patch cycle of the OEM (Original Equipment Manufacturer) for the product and the security and production requirements of the organization. Usually, patch cycles are used to incorporate security fixes and significant production defect fixes issued by the OEM. Typical patch cycles can be weekly to every 6 months. A simple scheduling sketch of these three cycles follows this list.
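To make the three cycles concrete, here is a minimal sketch of how one platform’s schedule might be represented; the platform name, dates and intervals are hypothetical, chosen only to fall within the ranges above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class PlatformLifecycle:
    """One infrastructure platform with its three IELC activity cycles."""
    name: str
    platform_start: date                                      # platform lifecycle: typically 3 to 5 years
    planned_end_of_life: date
    release_dates: List[date] = field(default_factory=list)   # releases every 3 to 12 months
    patch_interval_days: int = 30                             # patches weekly to every 6 months

# Hypothetical example: an x86 server platform on a 4-year lifecycle with
# semi-annual releases and a monthly patch cycle.
server_platform = PlatformLifecycle(
    name="x86 virtualization platform",
    platform_start=date(2017, 1, 1),
    planned_end_of_life=date(2021, 1, 1),
    release_dates=[date(2017, 7, 1), date(2018, 1, 1), date(2018, 7, 1), date(2019, 1, 1)],
    patch_interval_days=30,
)

print(server_platform.name,
      "EOL:", server_platform.planned_end_of_life,
      "releases so far:", len(server_platform.release_dates),
      "patch every", server_platform.patch_interval_days, "days")
```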

Below is a diagram that represents the three infrastructure engineering life cycles and the general parameters of the cycles.

Infrastructure Engineering Cycles

In subsequent posts, I will further detail key steps and practices within the cycles as well as provide templates that I have found to be effective for infrastructure teams.  As a preview, here is the diagram of the cycles with their activities and attributes.

IELC Preview

What key practices or techniques have you used with your infrastructure teams to enable them to achieve success? I look forward to your thoughts and comments.

Best, Jim Ditmore

 

The Key Growth Constraints on Your Business

We have seen the accelerating impact of technology on a wide variety of industries in the past decade. We have witnessed the impact of internet shopping on retail industries, and seen the impact of digital content and downloads on the full spectrum of traditional media companies, across books and newspapers as well as movies and games. Traditional enterprises are struggling to anticipate and stay abreast of the advances and changes. Even in those industries far away from the digital world, where things seem very physical, it is critical to leverage ‘smart’ technologies to improve output and products.

Let’s take logging and sawmills as an example. Certainly there have been physical advances, from improved hydraulic systems to better synthetic ropes, but playing an increasingly prominent role is the use of digital technology to assist operators driving complex integrated machines to optimize the entire logging and sawing process. The latest purpose-built forestry machines operate at the roadside or nearby off-road, cutting logs from whole trees, combining steps and eliminating manual labor. These integrated machines are guided by on-board computers and electronic controls. This enables the operator to optimize log products, which are machine cut by skillfully delimbing and “bucking” the whole trees into the best log lengths and loading them onto trucks. Subsequently, the logs are taken to modern sawmills, where new scanning technologies and computers analyze each log and determine how to optimize the dimensional lumber cut from each log. Not only does this dramatically reduce manual labor and waste, but it improves safety and increases log product value by 20 or 30% over previous methods. And it is not just digital machinery leveraging computers to analyze and cut; it is also mobile apps with mapping and image analysis so better decisions can be made about when and where to log in the field. When digitalization is revolutionizing even ‘physical’ industries like logging and sawmills, it is clear that the pace and potential to disrupt industries by applying information technology has increased dramatically. Below is a chart that represents the pace of disruption or ‘gain’ possible by digitalization over the mid-term horizon (7 to 10 years).

Chart: Pace of disruption or ‘gain’ possible by digitalization across industries (7 to 10 year horizon)

It is clear that digitalization has dramatically changed the travel and media industries already. Digital disruption has been moving down into other industries as either their components move from physical to digital (e.g., cameras, film) or industry leaders apply digital techniques to take advantage (e.g., Amazon, Ameritrade, Uber). Unfortunately, many companies do not have in place the key components necessary to apply and leverage technology to digitalize in rapid or disruptive ways. The two most important ingredients to successfully digitalize are software development capacity and business process engineering skill. Even for large companies with sizable IT budgets there are typically major constraints on both software development and business process engineering. And ample quantities of both are required for significant and rapid progress in digitalization.

Starting with software development, common constraints on this capability are:

  • a large proportion of legacy systems that consume an oversize portion of resources to maintain them
  • inadequate development toolsets and test environments
  • overworked teams with a focus on schedule delivery
  • problematic architectures that limit digital interfaces and delivery speed
  • software projects that are heavily oriented to incremental product improvement versus disruptive customer-focused efforts

And even if there are adequate resources, there must be a persistent corporate focus on the discipline, productivity and speed needed for breakout efforts.

Perhaps even more lacking is the necessary business process engineering capability in many companies. Here the issue is often not capacity or productivity but inadequate skill and improper focus. Most corporate investment agendas are controlled by ‘product’ teams whose primary focus is on incrementally improving their product’s features and capabilities rather than on the end-to-end service or process views that truly impact the customer. Further, process engineering skills are not a hallmark of service industry product teams. Most senior product leaders ‘grew up’ in a product-focused environment and, unless they have a manufacturing background, usually do not have process improvement experience or knowledge. Typically, product team expertise lies primarily in the current product and its previous generations, and not in the end-to-end process supporting the actual product. Too often the focus is on a next-quarter product release with incremental features, as opposed to fully reworking the customer interface from the customer’s point of view and reworking end-to-end the supporting business process to take full advantage of digitalization and new customer interfaces. There is far too much product tinkering versus customer experience design and business process engineering. Yet the end-to-end process is actually what drives the digital customer experience, more than the product features. Service firms that excel at the customer experience utilize end-to-end process design from the customer viewpoint while taking full advantage of the digital opportunities. This yields a far better customer experience that is relatively seamless and easy. Further, the design normally incorporates a comprehensive interface approach that empowers each of the customer interaction points with the required knowledge about the customer and their next step. The end result is a compelling digital platform that enables them to win in the market.

As an IT leader certainly you must identify and sponsor the key digitalization projects for your company, but you must also build and develop the two critical capabilities to sustain digitalization. It is paramount that you build a software development factory that leverages modern methods on top of discipline and maturity so you have predictable and high quality software deliverables. And ensure you are building on an architecture that is both flexible and scalable so precious effort is not wasted on arcane internal interfaces or siloed features that must be replicated across your estate.

Work with your business partners to establish continuous process improvement and process engineering as desired and highly valued skills in both IT and the business team. Establish customer experience and user experience design as important competencies for product managers. Then take the most critical processes serving customers and revisit them from an end-to-end process view and a customer view. Use the data and analysis to drive the better holistic process and customer experience decisions, and you will develop far more powerful digital products and services.

Where is your team or your company on the digital journey? Do you have an abundance of software development or business process engineering skills and resources? Please share your perspective and experience in these key areas in our digital age.

Best, Jim Ditmore