The Service Desk in the Age of Digitalization and AI

More than a few years have passed since I published the original Service Desk posts, and since then we have seen great progress in digitalization. Importantly, technologies including advanced analytics and AI have also been introduced into the business mainstream. While many of the best practices that Steve Wignall, Bob Barnes and I detailed still hold true, there are important new advances that can and should be leveraged. These new advances, coupled with strong implementation of foundational practices, can substantially improve the quality and cost of your service desk.

In the era of digitalization, the service desk has actually increased in importance, as it is the human touch that remains easiest for your users or customers to reach in time of trouble. These advances in technology, though, can be used to improve the accessibility of the interface. For example, the service desk is no longer just a phone interface. Now, especially with external customer desks, the interface includes chat and email. And this communication can also be ‘proactive’, where you reach out to the customer, versus ‘reactive’, where you wait for them to call or chat. A proactive chat message offered to customers when they are hovering or waiting on an internet interface can be an excellent helping hand, allowing them to easily reach out to your service team and obtain information or assistance that enables them to complete their transaction. Commercial results can be extremely beneficial as you reduce ‘dropout rates’ on important transactions. And overall, given such proactive chats are typically seen as unobtrusive and helpful, you can greatly improve the customer experience and satisfaction.

Further, advances in data analytics and artificial intelligence yield new capabilities from voice authentication to interactive, natural voice menus to AI actually answering the customer questions. Below are the details of these techniques and suggestions on how best to leverage these technologies.  Importantly, remember that the service desk is a critical human interface during those service ‘moments of truth’, and must be a reliable and effective channel that works as a last resort. Thus, the technology is not a replacement for the human touch, but an augment to knowledgeable personnel who are empowered to provide the services your customers need.  If your staff are without such skills and authority, a technologically savvy service desk will only compound your customers’ frustration with your services and miss opportunities to win in the digital age.

As a start, and regardless of whether an internal or external service desk, the service desk model should be based on ITIL and I recommend you start with this base of knowledge on service management. With that foundation in mind, below we cover the latest techniques and capabilities you should incorporate into your desk.

Proactive Chat: Proactive chat can greatly lift the performance of your customers’ interaction with your website and not just by reducing abandons on a complex web page.  You can use it for customers lingering over FAQs and help topics or assist customers arriving from other sites with known campaign referrals. Good implementations avoid bothering the customer when they first get on your page or when they are speeding through a transaction. Instead, you approach the customer as a non-intrusive but helpful presence armed with the likely question they are asking themselves. Such accessibility is key to enable the best customer service and avoid ‘dropouts’ on your website or mobile app. Easy to access chat from your web site or mobile app can substantially improve customer completion rates on services. Improved completion rates (and thus increased revenues) alone can often justify the cost of excellent chat support. There are several vendors offering good products to integrate with your website and service desk and you can find a good quick reference on proactive chat practices here.  And the rewards can be significant with 30 to 50% reduced abandons and much higher customer satisfaction.

Voice Authentication and Natural Language: Another technology that has advanced substantially in the past few years is voice authentication. Of course, biometrics in general has advanced broadly, with fingerprint and face recognition now being deployed in everyday devices to varying degrees of effectiveness. Voice authentication is one of the more mature biometric areas and has been adopted by many institutions to authenticate users when they call in. Voice authentication can be done either in active mode (e.g. using a set passphrase) or in passive mode (the user speaks naturally to the call center representative and after a period of time is either authenticated or rejected). Some large financial services companies (e.g., Barclays) have deployed this for 2 years or more, with very high customer satisfaction results and reductions in impersonation or fraud. I recommend a passive implementation as it seems less likely to be ‘cracked’ (there is no set passphrase to record or impersonate with) and it results in a more natural human conversation. Importantly, it reduces the often lengthy time spent authenticating a customer, and the representative does not have to ask the sometimes inane security questions that only further annoy the customer. Voice authentication along with traditional ANI schemes (where you use the originating number to identify the customer, and their most recent information, requests or transactions are provided to the service agent) enables more certain authentication as well as the ability to immediately launch into the issue or service the customer is trying to achieve.

In addition, there is growing use of spoken or even ‘natural language’ menus to replace traditional IVR menus using touchtones (e.g. instead of ‘Push 1 for Billing’, ‘tell us the nature of your call – is it related to billing, or to your order delivery, or another topic?’). Unfortunately, these can often result in an IVR maze (or even ‘hell’) for some customers when they use an unusual phrase or their words are not recognized. And given there is no easy way out (e.g. ‘push 0 for an agent’), you end up frustrating your customers even more. I would be very cautious about implementing such systems as they rarely contribute to the customer experience or to efficiency.

Improved analytics and AI:  Analytics is an area that has advanced dramatically over the past 2 years. The ability to combine both your structured transaction data with additional big data from web logs to social media information means you can know much more about your customers when they call in. As advantageous as this can be, ensure first you have a solid customer profile in place that allows your agents to know all of the basics about your customer. Next layer in all recent activity in other channels – web, mobile, chat. Then supplement with suggestions such as next best product or service recommendations or upgrades based on customer characteristics or similar customer actions. You can substantially increase customer confidence by showing ‘Customers like you …. ‘.  Of course, you must leverage such data in accordance with regulatory requirements (e.g. GDPR) and in a transparent way that gives the customer the confidence that you are protecting their data and using it to provide better solutions and service for them. This is paramount, because if you lose the customer trust with their data, or appear ‘creepy’ with your knowledge, then you are ruining the customer experience you wish to provide.

Only after you have a robust customer data foundation and can demonstrate improved customer services utilizing analytics should you consider exploring AI bots. Without the customer information, AI bots are actually just ‘dumb’ bots that will likely annoy your customer.  And the recent pilots I have seen of AI capabilities have only handled the easiest questions after a huge amount of work to implement and train. Of course, I would expect this technology to improve rapidly in the coming years and their commercial proposition to become better.

Agent/Customer Matching: One other method to optimize service is agent/customer matching, where agents are matched to customers either with an automated tool or through active customer selection. The matching can be based on emotional, experience, or other dimensions. The result is a better experience for the customer and likely a better connection with your company.

Service optimization and demand reduction: While certainly a fundamental capability, service optimization (where you use data from the calls to proactively adjust your services and interfaces to eliminate the need for the call in the first place) becomes even more powerful when you combine it with additional data from all of your channels and the customer. You can identify root causes for calls and eliminate them better than ever. Using Pareto analysis, you can look into your most frequent calls and understand the defects, product problems, process gaps, or web page issues that your customers or internal users are experiencing, especially when compared against web logs that show how the customer navigates (or is unable to navigate) your pages. The service desk team should then run a crisp process with management sponsorship to ensure the original issues are corrected. This can reduce your incident or problem calls by 20, 30 or even 40%. Not only do you get the cost reduction from the reduced calls, but more importantly, you greatly reduce the problems and annoyances your customers or staff experience. You optimize the services you provide and ensure a smoother customer experience through the ongoing execution of such a feedback loop. We have used this to great effect within the Danske Bank IT service desk in the past two years, enabling us to offer far better service at lower cost. Attached is a diagram representing the process: Demand Reduction. Of course, credit goes fully to the team (Dan, Ona, and many others) for such successful development and execution of the practice.
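As a rough illustration of the Pareto step described above, the sketch below counts calls by category and returns the few categories that account for most of the volume. The function name, the category labels and the 80% coverage threshold are assumptions made for the example and are not taken from any particular service desk tool.

```python
from collections import Counter


def pareto_categories(call_categories, coverage=0.8):
    """Return the most frequent call categories that together reach
    the requested share of total call volume.

    call_categories: one label per logged call (hypothetical labels).
    coverage: share of volume to explain (0.8 is an assumed default).
    """
    counts = Counter(call_categories)
    total = sum(counts.values())
    selected = []
    cumulative = 0
    # most_common() yields categories from most to least frequent
    for category, count in counts.most_common():
        if cumulative / total >= coverage:
            break
        cumulative += count
        selected.append((category, count, cumulative / total))
    return selected


# Hypothetical month of call records, one label per call
calls = (["password reset"] * 50 + ["vpn access"] * 25
         + ["printer"] * 15 + ["other"] * 10)

for category, count, share in pareto_categories(calls):
    print(f"{category}: {count} calls, cumulative share {share:.0%}")
```

The categories at the top of this list are the candidates for root cause work, since eliminating them removes the largest share of calls.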

So, that is our quick survey of new technologies to support the service desk in the digital age. As I noted at the beginning, you should make sure you have a solid foundation in your service centers before moving to the advanced technology. There’s no substitute for doing the basics right, and the business return on investments in the latest technologies will often be minimal until they are in place. For a quick reference on all of the foundational practices please see the service desk summary page, and  make sure you review the material on the key ingredient: service desk leadership.

Best, Jim Ditmore

Infrastructure Engineering – Leveraging a Technology Plan

Our recent post discussed using the Infrastructure Engineering Lifecycle (IELC) to enable organizations to build a modern, efficient and robust technology infrastructure. One of the key expressions that both leverages the IELC approach and helps an infrastructure team properly plan and navigate the cycles is the Technology Plan. Normally, the technology plan is constructed for each major infrastructure ‘component’ (e.g. network, servers, client environment, etc). A well-constructed technology plan creates both the pull – outlining how the platform will meet the key business requirements and technology objectives – and the push – reinforcing proper upkeep and use of the IELC practice.

Digitalization continues to sweep almost every industry, and the ability of firms to actually deliver the digital interfaces and services requires a robust, modern and efficient infrastructure. To deliver an optimal technology infrastructure, one must utilize an ‘evergreen’ approach and maintain an appropriate technology pace matching the industry. Similar to a dolphin riding the bow wave of a ship, a company can optimize both the feature and capability of its infrastructure and minimize its cost and risk by staying consistently just off the leading pace of the industry. Often companies make the mistake of either surging ahead and expending large resources to get fully leading technology or eking out and extending the life of technology assets to avoid investment and resource requirements. Neither strategy actually saves money ‘through the cycle’ and both strategies add significant risk for little additional benefit.

Companies that choose to minimize their infrastructure investments and reduce costs by overextending asset lives typically incur greater additional costs through higher maintenance, greater fix resources required, and lower system performance (and staff productivity). Obviously, extending your desktop PC refresh cycle from 2 years to 4 years is workable and reasonable, but extend the cycle much beyond this and you quickly run into:

  • Integration issues – both internal and external compatibility as your clients and partners have newer versions of office tools that are incompatible with yours
  • potentially higher maintenance costs as much hardware has no maintenance cost for the first 2 or 3 years, and increasing costs in subsequent years
  • greater environmental costs as power and cooling savings from newer generation equipment are not realized
  • longer security patch cycles for older software (though some benefit as it is also more stable)
  • greater complexity and resulting cost within your environment as you must integrate 3 or 4 generations of equipment and software versus 2 or 3 versions
  • longer incident times as the usual first vendor response to an issue is ‘you need to upgrade to the latest version of the software before we can really fix this defect’

And if you press the envelope further and extend infrastructure life to the end of the vendor’s life cycle or beyond, expect significantly higher failure rates, unsupported or expensively supported software, and much higher repair costs. In my experience, having modernized an overextended infrastructure multiple times, we were able to reduce total costs by 20 or 30%, and this included the costs of the modernization. In essence you can run the workload of 4 servers from 3 generations ago on 1 current server, and having modern PCs and laptops means far fewer service issues, fewer service desk calls, far less breakage (people take care of newer stuff) and more productive staff.

For those companies that have surged to the leading edge on infrastructure, they are typically paying a premium for nominal benefit. For the privilege of being first, frontrunners encounter an array of issues including:

  • Experiencing more defects – trying out the latest server or cloud product or engineered appliance means you will find far more defects.
  • Paying a premium – being first with new technology means typically you will pay a premium because it is well before the volumes and competition can kick in to drive better pricing.
  • Integration issues – having the latest software version often means third party utilities or extensions have not yet released their version that will properly work with the latest software
  • More security flaws – not all the backdoors and gaps have been uncovered yet as there are not enough users. Thus, hackers have a greater opportunity to find ‘zero day’ flaws and exploit them to attack you

Typically, those groups that I have inherited that were on the leading edge were there because they either had an excess of resources or were solely focused on technology products (and not business needs). There was inadequate dialogue with the business to ensure the focus was on business priorities versus technology priorities. Thus, the company was often expending 10 to 30% more for little tangible business benefit other than to be able to state they were ‘leading edge’. In today’s software world, seldom does the latest infrastructure provide compelling business benefit over and above that of a well-run modern utility infrastructure. Nearly all of the time the business benefit is derived from compelling services and features enabled by the application software running on the utility. Thus, the shops that are tinkering with leading edge hardware or are always on the latest version first are typically pursuing hobbies disconnected from the business imperatives. Only where organizations are operating at massive scale or actually providing infrastructure services as a business does leading edge positioning make business sense.

So, given our objective is to be in the sweet spot riding the industry bow wave, then a good practice to ensure proper consistent pace and connection to the business is a technology plan for each of the major infrastructure components that incorporates the infrastructure engineering lifecycle. A technology plan includes the infrastructure vision and strategy for a component area, defines key services provided in business terms, and maps out an appropriate trajectory and performance for a 2 or 3 year cycle. The technology plan then becomes the roadmap for that particular component and enables management to both plan and track performance against key metrics as well as ensuring evolution of the component with the industry and business needs.

The key components of the technology plan are:

  1. Mission, Vision for that component area
  2. Key requirements/strategy
  3. Services (described in business terms)
  4. Key metrics (definition, explanation)
  5. Current starting point – explanation (SWOT) – as needed by service
  6. Current starting point – Configuration – as needed by service
  7. Target – explanation (of approach) and configuration — also defined by service
  8. Metrics trajectory and target (2 to 3 years)
  9. Gantt chart showing key initiatives, platform refresh or releases, milestones (can be by service)
  10. Configuration snapshots at 6 months (for 2 to 3 years, can be by service)
  11. Investment and resource description
  12. Summary
  13. Appendices
    1. Platform Schedule (2 -3 years as projected)
    2. Platform release schedule (next 1 -2 years, as projected)
    3. Patch cycle (next 6 – 12 months, as planned)

The mission and vision should be derived and cascaded from the overall technology vision and corporate strategy. They should emphasize key tenets of the corporate vision and their implication for the component area. For example, if the corporate strategy is to be ‘easy to do business with’ then the network and server components must support a highly reliable, secure and accessible internet interface. Such reliability and security aspirations then have direct implications on component requirements, objectives and plans.

The services portion of the plan should translate the overall component into the key services provided to the business. For example, network would be translated into data services, general voice services, call center services, internet and data connection services, local branch and office connectivity, wireless and mobility connectivity, and core network and data center connectivity. The service area should be described in business terms with key requirements specified. Further, each service area should then be able to describe the key metrics to be used to gauge its performance and effectiveness. The metrics could be quality, cost, performance, usability, productivity or other metrics.

For each service area of a component, the plan is then constructed. If we take the call center service as the example, the current technology configuration and specific services available would define the current starting point. A SWOT analysis should accompany the current configuration, explaining both strengths and where the service falls short of business needs. Then the target is constructed, where both the overall architecture and approach are described and the target configuration is provided at a high to medium level of definition (e.g. where will the technology configuration for that area be in 2 or 3 years).

Then, given the target, the key metrics are mapped from their current to their future levels and a trajectory established that will be the goals for the service over time. This is subsequently filled out with a more detailed plan (Gantt chart) that shows the key initiatives and changes that must be implemented to achieve the target. Snapshots, typically at 6 month intervals, of the service configuration are added to demonstrate detailed understanding of how the transformation is accomplished and enable effective planning and migration. Then the investment and resource needs and adjustments are described to accompany the technology plans.
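To make the idea of a metrics trajectory with 6 month checkpoints concrete, here is a minimal sketch. It assumes a straight line path from the current level to the target, which is only one possible shape (real plans often improve in steps around platform releases), and the function name and sample availability figures are hypothetical.

```python
def metric_trajectory(current, target, years=3, months_per_snapshot=6):
    """Return (month, planned_value) pairs from the current level to the
    target level, one pair per snapshot.

    Assumes linear improvement between current and target; this is an
    illustrative simplification, not a prescribed planning method.
    """
    steps = years * 12 // months_per_snapshot
    return [
        (i * months_per_snapshot, current + (target - current) * i / steps)
        for i in range(steps + 1)
    ]


# Hypothetical example: call center service availability, in percent
for month, planned in metric_trajectory(90.0, 99.0, years=3):
    print(f"month {month:2d}: planned availability {planned:.1f}%")
```

Each pair can then be lined up against the configuration snapshot for the same date, so that the team can check whether the planned changes plausibly deliver the planned metric level.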

If well done, the technology plan then provides an effective roadmap for the entire technology component team to understand how what they do delivers to the business, where they need to be, and how they will get there. It can be an enormous assist for productivity and practicality.

I will post some good examples of technology plans in the coming months.

Have you leveraged plans like this previously? If so, did they help? I would love to hear from you.

All the best, Jim Ditmore

 

Improving Vendor Performance

As we discussed in our previous post on the inefficient technology marketplace, the typical IT shop spends 60% or more of its budget on external vendors – buying hardware, software, and services. Often, once the contract has been negotiated, signed, and initial deliveries commence, attentions drift elsewhere. There are, of course, plenty of other fires to put out. But maintaining an ongoing, fact-based focus on your key vendors can result in significant service improvement and corresponding value to your firm. This ongoing fact-based focus is proper vendor management.

Proper vendor management is the right complement to a robust, competitive technology acquisition process. For most IT shops, your top 20 or 30 vendors account for about 80% of your spend. And once you have achieved outstanding pricing and terms through a robust procurement process, you should ensure you have effective vendor management practices in place that result in sustained strong performance and value by your vendors.

Perhaps the best vendor management programs are those run by manufacturing firms. Firms such as GE, Ford, and Honda have large dedicated supplier teams that work closely with their suppliers on a continual basis on all aspects of service delivery. Not only do the supplier teams routinely review delivery timing,  quality, and price, but they also work closely with their suppliers to help them improve their processes and capabilities as well as identify issues within their own firm that impact supplier price, quality and delivery. The work is data-driven and leverages heavily process improvement methodologies like LEAN. For the average IT shop in services or retail, a full blown manufacturing program may be overkill, but by implementing a modest but effective vendor management program you can spur 5 to 15% improvements in performance and value which accumulate to considerable benefits over time.

The first step to implementing a vendor management program is to segment your vendor portfolio. You should focus on your most important suppliers (by spend or critical service). Focus on the top 10 to 30 suppliers and segment them into the appropriate categories. It is important to group like vendors together (e.g., telecommunications suppliers or server suppliers). Then, if not already in place, assign executive sponsors from your company’s management team to each vendor. They will be the key contact for the vendor (not the sole contact but instead the escalation and coordination point for all spend with this vendor) and will pair up with the procurement team’s category lead to ensure appropriate and optimal spend and performance for this vendor. Ensure both sides (your management and the vendor) know the expectations for suppliers (and what they should expect of your firm). Now you are ready to implement a vendor management program for each of these vendors.

So what are the key elements of an effective vendor management program? First and foremost, there should be three levels of vendor management:

  • regular operational service management meetings
  • quarterly technical management sessions, and
  • executive sessions every six or twelve months.

The regular operational service management meetings – which occur at the line management level – ensure that regular service or product deliveries are occurring smoothly, issues are noted, and teams conduct joint working discussions and efforts to improve performance. At the quarterly management sessions, performance against contractual SLAs is reviewed as well as progress against outstanding and jointly agreed actions. The actions should address issues that are noted at the operational level to improve performance. At the next level, the executive sessions will include a comprehensive performance review for the past 6 or 12 months as well as a survey completed by and for each firm. (The survey data to be collected will vary of course by the product or service being delivered.) Generally, you should measure along the following categories:

  • product or service delivery (on time, on quality)
  • service performance (on quality, identified issues)
  • support (time to resolve issues, effectiveness of support)
  • billing (accuracy, clarity of invoice, etc)
  • contractual (flexibility, rating of terms and conditions, ease of updates, extensions or modifications)
  • risk (access management, proper handling of data, etc)
  • partnership (willingness to identify and resolve issues, willingness to go above and beyond, how well the vendor understands your business and your goals)
  • innovation (track record of bringing ideas and opportunities for cost improvement or new revenues or product features )

Some of the data (e.g. service performance) will be  summarized from operational data collected weekly or monthly as part of the ongoing operational service management activities. The operational data is supplemented by additional data and assessments captured from participants and stakeholders from both firms. It is important that the data collected be as objective as possible – so ratings that are high or low should be backed up with specific examples or issues. The data is then collated and filtered for presentation to a joint session of senior management representing their firms. The focus of the executive session is straightforward: to review how both teams are performing and to identify the actions that can enable the relationship to be more successful for both parties. The usual effect of a well-prepared assessment with data-driven findings is strong commitment and a re-doubling of effort to ensure improved performance.

Vendors rarely get clear, objective feedback from customers, and if your firm provides such valuable information, you will often be the first to reap the rewards. And by investing your time and effort into a constructive report, you will often gain an executive partner at your vendor willing to go the extra mile for your firm when needed. Lastly, the open dialogue will also identify areas and issues within your team and processes, such as poor specifications or cumbersome ordering processes that can easily be improved and yield efficiencies for both sides.

It is also worthwhile to use this supplier scorecard to rate the vendor against other similar suppliers. For example, you can show their total score in all categories against other vendors in an anonymized fashion (e.g., Vendor A, Vendor B, etc) where they can see their score but can also see other vendors doing better and worse. Such a comparison often brings out the competitive nature of any group, also resulting in improved performance in the future.
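A small sketch of how such an anonymized comparison might be produced from scorecard data follows. The vendor names, the two categories, the 1 to 5 rating scale and the use of a simple average as the total score are all assumptions for illustration; a real scorecard would cover all of the categories listed earlier and may weight them differently.

```python
from statistics import mean


def anonymized_ranking(scores, own_vendor):
    """Rank vendors by average category score and hide every name
    except the vendor the report is prepared for.

    scores: {vendor_name: {category: rating}} (hypothetical structure).
    Letter labels assume no more than 26 vendors in the comparison.
    """
    totals = {vendor: mean(cats.values()) for vendor, cats in scores.items()}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    report = []
    for position, (vendor, total) in enumerate(ranked):
        if vendor == own_vendor:
            label = "You"
        else:
            label = f"Vendor {chr(ord('A') + position)}"
        report.append((label, round(total, 2)))
    return report


# Hypothetical ratings on a 1 to 5 scale
scores = {
    "Acme Telecom": {"delivery": 4, "support": 3},
    "Beta Networks": {"delivery": 5, "support": 5},
    "Gamma Carrier": {"delivery": 2, "support": 3},
}

for label, total in anonymized_ranking(scores, "Acme Telecom"):
    print(label, total)
```

Each vendor receives its own version of the report, so it sees where it stands without learning which competitor holds which position.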

Given the investment of time and energy by your team, the vendor management program should be focused on your top suppliers. Generally, this is the top 10 to 30 vendors depending on your IT spend. The next tier of vendors (31 through 50 or 75) should get an annual or biannual review and risk assessment but not the regular operational meetings or management assessments unless their performance is below par. A below-par vendor’s performance can often be turned around by applying such a program.

Another valuable practice, once your program is established and is yielding constructive results, is to establish a vendor awards program. With an objective and thoughtful perspective on your vendors’ performance, you can then establish awards for your top vendors – vendor of the year, vendor partner of the year, most improved vendor, most innovative, etc. Inviting the senior management of the vendors receiving awards to attend an awards dinner, along with your firm’s senior management to give the awards, will further spur both those who win the awards and those who don’t. Those who win will pay attention to your every request; those who don’t will have their senior management focused on winning the award next year. The end result, from the weekly operational meetings, to the regular management sessions, to the annual gala, is that vendor management positively impacts your significant vendor relationships and enables you to drive greater value from your spend.

Of course, the vendor management process outlined here is a subset of the procurement lifecycle applied to technology. It complements the technology acquisition process and enables you to repair or improve and sustain vendor performance and quality levels for a significant and valuable gain for your company.

It would be great to hear about your experience with leveraging vendor management.

Best, Jim Ditmore

 

IT Service Desk: Turning around a ‘helpless’ desk

In one of our earliest posts on service desks, I mentioned how an inherited service desk had delivered such poor service that it was referred to by users as the ‘Helpless Desk’ rather than the Help Desk. With that in mind, for those IT leaders who have a poor service situation on their hands with their most important customer interface, this post outlines how to stabilize and then turn around your service desk. For those new to this site, there is a service desk reference page and also posts to understand service desk elements and best practices.

Service Desks can underperform for a number of reasons, but ongoing poor performance is generally due to a confluence of causes. Typically, underlying issues thrust service desks into poor performance when combined with major changes to the supply (or agent service) or the demand (the calls and requests coming into the desk). It is important to recognize that service desks are in essence call centres. Call Centre performance is driven by supply and demand, with an effective service at an efficient cost representing equilibrium – the point at which the competing forces of supply and demand are in balance with each other. A supply side or demand side ‘shock’ can move the state of equilibrium to a point outside of management control and, if there are other fundamental issues, it will result in sustained underperformance by the Service Desk.

There is a ‘tipping point’ within Call Centre mechanics beyond which the rate of deterioration becomes exponential: the gentle gradient of deterioration does not last long before service falls over the cliff edge (wait times in seconds quickly become minutes and then tens of minutes, even hours). Calls are abandoned by customers, with call backs adding further volume. Agents become overworked and stressed due to the tone of the calls, their efficiency reduces and attrition goes up, exacerbating any supply shortage. These dynamics also work in reverse, and so what can seem to be an insurmountable problem can in fact be rapidly returned to stability if managed appropriately.
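The non-linear behaviour described above can be illustrated with the Erlang C formula, a standard queueing model for call centres. This is a textbook model brought in for illustration rather than a method described in this post, and it ignores abandonment and call backs (which make real deterioration worse). The call volume and handle time figures are hypothetical.

```python
from math import factorial


def erlang_c_wait_probability(agents, offered_load):
    """Probability that a caller has to wait (Erlang C).

    offered_load is in erlangs: arrival rate multiplied by average
    handle time. With load at or above agent capacity, every caller waits.
    """
    if offered_load >= agents:
        return 1.0
    top = (offered_load ** agents / factorial(agents)) * (
        agents / (agents - offered_load)
    )
    bottom = sum(offered_load ** k / factorial(k) for k in range(agents)) + top
    return top / bottom


def average_wait_seconds(agents, calls_per_hour, handle_time_seconds):
    """Average time a caller spends in the queue under Erlang C."""
    offered_load = calls_per_hour * handle_time_seconds / 3600.0
    if offered_load >= agents:
        return float("inf")  # the queue grows without bound
    p_wait = erlang_c_wait_probability(agents, offered_load)
    return p_wait * handle_time_seconds / (agents - offered_load)


# Hypothetical desk: 100 calls per hour, 6 minute average handle time
for agents in (14, 13, 12, 11, 10):
    wait = average_wait_seconds(agents, 100, 360)
    print(f"{agents} agents: average wait {wait:.0f} seconds")
```

Running the loop shows each lost agent adding more wait time than the previous one, until the desk can no longer keep up at all. The same curve read in the other direction is why adding a small amount of capacity near the tipping point can restore stability quickly.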

Common supply side issues include:

  • Organisations increasingly use headcount caps and quotas to control their cost base. As the quota filters through the organisation, there can be a tendency to retain ‘higher end’ roles, which means the Service Desk comes under particular scrutiny and challenge. A reduction in the supply of labour (without equivalent changes in demand) can very quickly lead to significant service deterioration.
  • Similarly, Service Desk tends to be a function in which organisations have an uplifted appetite to make organisational changes to outsource and offshore (and similarly insource and onshore as the cycle runs). The wholesale replacement of the Service Desk workforce is a fairly common scenario within the industry and is frequently the root cause of acute service issues in the run up to change (as attrition without replacement bites) and during and post change (as a new workforce struggles to support their new customer base).
  • Any issue or initiative that either reduces the availability of agents to handle live calls or significantly increases the transaction time to do so can very quickly have a catastrophic impact on service. For example: the implementation of a new Service Management toolset is likely to elongate call duration in the short to medium term; a call centre with a high attrition rate will constantly lose agents just as they start to perform, to be replaced by trainees performing at a suboptimal level; and a call centre operating at too high an occupancy level will quickly burn out staff and see an increasing level of absenteeism.

Demand side issues commonly include:

  • Growth of the user base, generating an uplifted volume of contacts to the Service Desk.
  • An increase in contacts per supported user, driven by increasing IT usage or deterioration of IT performance (this is frequently driven by Business or IT change activity delivered by Projects and Programmes – such as the deployment of a new application to the workforce).

Irrespective of the root cause of the failure, service remediation needs to be a concerted effort combining strong analysis with disciplined planning and focused execution. Identifying that there is an issue and responding appropriately in a timely manner should happen automatically if you are already operating with maturity from fact based metrics that have a healthy mix of lead and lag indicators. If the organisation is less mature in its use of metrics (and particularly lead indicators) then the ‘crisis’ is not likely to be noticed (or at least taken seriously by senior leadership) until after the Service Desk hits the tipping point and service is deteriorating at an alarming pace, generating severe customer dissatisfaction (i.e. until it is too late).

Remediating a failing Service Desk requires multiple and varied actions dependent upon the root cause of the issues. The approach to identifying and rectifying those root causes can be managed effectively by following a logical framework.

Step 1 – Stabilize

If service has tumbled over the tipping point and is deteriorating rapidly, there is going to be little sponsorship for an extended analysis and planning exercise. Results – or at least avoiding further deterioration of performance – will be expected immediately. Your first priority is to create the space to put together a meaningful recovery plan.

Do everything that you can do to boost the supply side in the short term (overtime, shift amendments, cessation of any non-time-critical, non-customer-contact work by your agents, diverting resources to customer service roles from other positions, bringing in temporary resources, etc.). This will not fix the issue and is not a sustainable containment strategy; it will however create the window of opportunity you require and give a much needed boost to stakeholder confidence that the ‘freefall’ may be over. By itself, it will reduce the cycle of abandons and call backs that create additional work for the team.

Similar attention should be paid to any demand side actions that can be deployed quickly. It is less likely that you can act immediately on the demand side, but there are steps that can be taken quickly. If there are recent system or application rollouts that are generating questions and calls, it may be worthwhile to send out a FAQ or Quick Tips to help employees understand what to do without calling the service desk. Any self-help initiatives already in the pipeline could also be accelerated to remove some calls. While these actions are more likely to form elements of your strategic recovery plan, they may provide some level of relief.

Step 2 – Undertake the Analysis

Your leadership group and analysts need to undertake the analysis to understand why service has deteriorated. What has gone wrong, when, where and why? If your desk has been performing well (or even adequately) for some time, remember that a recent ‘change’ in either the demand or supply side is likely to be the root cause.

If the desk has been underperforming for a significant period, there are likely to be more systemic causes of the failure and so a full strategic review of your operations is required. Reading the full set of Service Desk Best Practices published within Recipes for IT will provide guidance on the areas of focus required.

After understanding your call volumes and their trends (everything from call time to why customers are calling) you should be able to identify some of the root causes. Are there new issues that are now in the top 10 call reasons? Are your call times elongated? Have call volumes or peaks increased? For each shift in metrics, analyze for the following:

  • determine if the root cause for a customer call is due to:
      • system issues or changes
      • user training or understanding
      • lack of viable self-service channel
  • identify if increases in calls are due to:
      • underlying user volume increases or growth
      • new user technologies or systems
      • major rollouts or releases that are problematic
  • or if service is suffering due to:
      • lack of resources or mismatched resource levels and call volumes
      • inadequate training of service desk personnel
      • new or ineffective toolsets that elongate service time
      • inefficient procedures or poor engagement
      • high attrition or loss of personnel
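
As one way to start the "new issues in the top 10 call reasons" check, the sketch below compares call reasons between a baseline period and the current period. It assumes each call has already been tagged with a single reason string; the function name, field names and reason labels are hypothetical.

```python
from collections import Counter


def top_reason_shifts(baseline_calls, current_calls, top_n=10):
    """Compare the current top call reasons against a baseline period.

    Each argument is a list of reason strings, one entry per call.
    Returns one row per current top-N reason, flagging reasons that
    were not in the baseline top N.
    """
    baseline = Counter(baseline_calls)
    current = Counter(current_calls)
    baseline_top = {reason for reason, _ in baseline.most_common(top_n)}

    report = []
    for reason, count in current.most_common(top_n):
        report.append({
            "reason": reason,
            "current": count,
            "baseline": baseline[reason],  # 0 if never seen before
            "change": count - baseline[reason],
            "new_in_top": reason not in baseline_top,
        })
    return report


if __name__ == "__main__":
    # Hypothetical call logs for two comparable periods.
    last_quarter = ["password reset"] * 50 + ["vpn"] * 30 + ["printer"] * 10
    this_month = ["password reset"] * 48 + ["new crm app"] * 80 + ["vpn"] * 25
    for row in top_reason_shifts(last_quarter, this_month, top_n=3):
        print(row)
```

Reasons flagged as new in the top list point toward system changes or rollouts; reasons that grew without being new point toward volume growth, training or self-service gaps.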

If you do not have adequate metrics to support the analysis, then you will need to establish basic metrics collection as the first, fundamental step.

Step 3 – Construct the Recovery Plan

Constructing the recovery plan needs to be genuinely solution-oriented and outcome focused. The objective of the plan is not usually to resolve the source of the ‘shock’ to return us back to the old equilibrium (e.g. we aren’t likely to want to back out the new Service Management toolset that we have just implemented – we will want to build forward). The objective of the plan is to detail the actions required to resolve the issues identified as well as build a solid foundation to allow us to move back to a steady state operation, delivering with quality and consistency to our SLA.

A good recovery plan will be clear about what actions are to be undertaken, by whom, when, to achieve which specific deliverable and with specific measures and metrics tracking progress to achievement of the overall outcome.

The plan needs to focus on prioritising actions that can make a positive impact of scale and of pace commensurate to the scale of the service issues being experienced. Many and multiple actions on a service recovery plan create a false sense of comfort for those involved in the crisis and will almost certainly hinder genuine service improvement. Targeted action is required and this needs discipline and skill from the plan owner to ensure that benefits will be realised, will be relevant to the problem statement and that our actions in aggregate will move bottom line performance to where we need it to be.

We recommend a recovery plan that has the following elements:

a. Maintain an interim staffing boost to stabilize the service desk until other work is completed

b. If clear problem causes are identified (poorly rolled out systems, ongoing defective systems causing high volumes of calls) then ensure these areas are high priority for fixes on the next release or maintenance cycle.

c. Match resources to demand cycles based on current volumes and call handle times. Then forecast proper resource levels based on improvement initiatives and their completion dates.

d. If self service can address a significant volume of calls, these should also be a top priority for investment as this solution is also usually an overall cost save as well as service experience improvement (e.g. password resets).

e. Ensure your service desk staff can efficiently handle calls — proper training, tool adjustments, thoughtful goals, incentives and a productive environment.

f. Address staff recruiting as well as development, incentives, and training and career progression to ensure you will have an engaged and well-trained staff to deliver exceptional service

g. Review your IVR and call centre technology setup and look to optimize menus, self-service, and call back options. Specialize resources into pools as appropriate to improve efficiency.

h. Define strategic service goals and SLAs along with initiatives to achieve them (e.g., additional or different sites, knowledge management tools, revamp of problem systems, etc).
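
For element (c), matching resources to demand cycles, a rough first pass can be done with simple workload arithmetic. The sketch below is a simplification under stated assumptions: 30 minute intervals, a target occupancy of 80%, and hypothetical volumes and schedules. It ignores queueing effects, shrinkage and breaks, so treat the output as a floor rather than a final roster.

```python
import math


def agents_required(calls, handle_time_s, interval_s=1800, target_occupancy=0.80):
    """Minimum agents needed in one interval from pure workload.

    calls            forecast calls arriving in the interval
    handle_time_s    average handle time in seconds
    target_occupancy share of agent time spent on calls (assumed 80%)
    """
    workload_s = calls * handle_time_s
    return math.ceil(workload_s / (interval_s * target_occupancy))


def staffing_gaps(forecast_calls, scheduled_agents, handle_time_s):
    """Return (interval, required, scheduled, shortfall) for each interval."""
    rows = []
    for interval, calls in forecast_calls.items():
        required = agents_required(calls, handle_time_s)
        scheduled = scheduled_agents.get(interval, 0)
        rows.append((interval, required, scheduled, max(0, required - scheduled)))
    return rows


if __name__ == "__main__":
    # Hypothetical half-hour forecast and current roster.
    forecast = {"08:00": 40, "08:30": 90, "09:00": 130, "09:30": 110}
    roster = {"08:00": 12, "08:30": 20, "09:00": 25, "09:30": 30}
    for row in staffing_gaps(forecast, roster, handle_time_s=360):
        print(row)
```

Re-running the same calculation with the handle times and volumes you expect after each improvement initiative lands gives the forecast resource levels described in (c).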

Step 4 – Execute the Recovery Plan

Ensure that the plan is owned by an individual with the gravitas, influence, experience and focus to manage it through with real pace and drive. Ideally, the individual should not own actions within the plan itself (as this undermines their ability to hold everyone fully to account and removes their impartiality when inevitable conflicting priorities arise).

The plan can (and should) be meaningfully tuned as you progress with delivery. It should not however be a constant exercise in planning and re-planning and particular focus needs to be applied to ensure that commitments (delivery to time, cost and quality) are met by all action owners.

Communicate the issue, the plan, progress & improvement achieved to date and upcoming actions from your recovery plan to stakeholders. Ensure that stakeholder management is a specific activity within your plan and that you pay particular attention to each stakeholder group as a constituency. The role of senior leaders in recovery situations should be to protect the operations team to enable it to focus on delivery through the management of senior clients and customers and to ensure that the required resources to remediate the issues are provided.

Step 5 – Take a Look Back

Once service has been remediated and stabilised there are a number of residual actions to undertake.

  • As additional resources were utilised in the recovery effort (holidays restricted, time off in lieu accumulated, overtime paid etc…) there may well be negative service and / or financial implications of those decisions. It is important to quickly understand any such impacts and to manage them appropriately (e.g., review the holiday allocation process to ensure accumulated holidays can still be scheduled without a bottleneck, determine whether to grant time off in lieu for extra hours worked or to pay overtime, ensure that departments and functions who have been loaning staff to the front line receive support and resources to now clear their backlogs quickly etc.).
  • Review the control processes and responsiveness of your Service Desk in the identification of the issue / issues and how this could be improved upon in the future (in particular the use of lead and lag performance metrics). The ‘root causes’ identified should be eliminated or carefully tracked to ensure that future occurrences can be identified and dealt with before they manifest as service impact to your customers.
  • Ensure that the findings of your root cause analysis are communicated to and understood by your stakeholders. Be honest, be clear and be candid about what has happened, why and the measures that are now in place to prevent / mitigate any future such occurrences.
  • Say Thank You as the milestones are completed. A number of people will have participated in the recovery effort, some very explicitly and others in a very low key manner (for example by absorbing extra workload from colleagues seconded to the front line). Recognising their contribution and taking the time to say Thank You will ensure that your team feel rewarded for their efforts and motivated to stand shoulder to shoulder in tackling future adverse events that impact customer service.

And with these efforts, you will have turned the ‘helpless desk’ into a positive key interface for your customers.

Best, Steve Wignall and Jim Ditmore


Using Performance Metric Trajectories to Achieve 1st Quartile Performance

I hope you enjoyed the Easter weekend. I have teamed up today with Chris Collins, a senior IT Finance manager and former colleague. Our final post on metrics is on unit costing — on which Chris has been invaluable with his expertise. For those just joining our discussion on IT metrics, we have had 6 previous posts on various aspects of metrics. I recommend reading the Metrics Roundup and A Scientific Approach to Metrics to catch you up in our discussion.

As I outlined previously, unit costing is one of the critical performance metrics (as opposed to operational or verification metrics) that a mature IT shop should leverage particularly for its utility functions like infrastructure (please see the Hybrid model for more information on IT utilities). With proper leverage, you can use unit cost and the other performance metrics to map a trajectory that will enable your teams to drive to world-class performance as well as provide greater transparency to your users.

For those just starting the metrics journey, realize that in order to develop reliable sustainable unit cost metrics, significant foundational work must be done first including:

  • IT service definition should be completed and in place for those areas to be unit costed
  • an accurate and ongoing asset inventory must be in place
  • a clean and understandable set of financials must be available organized by account so that the business service cost can be easily derived

 If you have these foundation elements in place then you can quickly derive the unit costing for your function. I recommend partnering with your Finance team to accomplish unit costing. And this should be an effort that you and your infrastructure function leaders champion. You should look to apply a unit cost approach to the 20 to 30 functions within the utility space (from storage to mainframes to security to middleware, etc). It usually works best to start with one or two of the most mature component functions and develop the practices and templates. For the IT finance team, they should progress the effort as follows:

  • Ensure they can easily segregate cost based on service listing for that function
  • Refine and segregate costs further if needed (e.g., are there tiers of services that should be created because of substantial cost differences?)
  • Identify a volume driver to use as the basis of the unit cost (for example, for storage it could be terabytes of allocated storage)
  • Parallel to the service identification/cost segregation work, begin development of unit cost database that allows you to easily manipulate and report on unit cost.  Specifically, the database should contain:
    • Ability to accept RC and account level assignments
    • Ability to capture expense/plan from the general ledger
    • Ability to capture monthly volume feeds from source systems including detail volume data (like user name for an email account or application name tied to a server)
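
The core calculation such a database supports is simple: a segregated cost pool divided by its volume driver. Here is a minimal sketch; the record layout, service names and figures are hypothetical and only illustrate the shape of the data.

```python
from dataclasses import dataclass


@dataclass
class ServiceMonth:
    """One month of cost and volume for a single service (hypothetical layout)."""
    service: str         # e.g. a storage tier defined in the service listing
    month: str           # e.g. "2024-01"
    cost: float          # expense segregated to this service from the ledger
    volume: float        # volume driver, e.g. terabytes of allocated storage
    volume_unit: str     # label for the driver, e.g. "TB"


def unit_cost(record: ServiceMonth) -> float:
    """Cost per unit of the volume driver for one service and month."""
    if record.volume <= 0:
        raise ValueError(f"No volume reported for {record.service} in {record.month}")
    return record.cost / record.volume


if __name__ == "__main__":
    # Hypothetical figures for two storage tiers.
    records = [
        ServiceMonth("Tier 1 storage", "2024-01", 410_000, 100, "TB"),
        ServiceMonth("Tier 2 storage", "2024-01", 240_000, 200, "TB"),
    ]
    for r in records:
        print(f"{r.service} {r.month}: {unit_cost(r):,.0f} per {r.volume_unit}")
```

Keeping the tiers as separate records is what lets you see substantial cost differences between them rather than one blended figure.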

The function team should support the IT Finance team in ensuring the costs are properly segregated into the services they have defined. Reasonable precision of the cost segregation is required since later analysis will be for naught if the segregations are inaccurate. Once the initial unit costs are reported, the function technology team can begin their analysis and work. First and foremost should be an industry benchmark exercise. This will enable you to understand quickly how your performance ranks against competitors and similar firms. Please reference the Leveraging Benchmarks page for best practices in this step. In addition to this step, you should further leverage performance metrics like unit cost to develop a projected trajectory for your function’s performance. For example, if your unit cost for tier 1 storage is currently $4,100/TB, then the storage team should map out what their unit cost will be 12, 24, and even 36 months out given their current plans, initiatives and storage demand. And if your target is for them to achieve top quartile cost, or cost median, then they can now understand if their actions and efforts will enable them to deliver to that future target. And if they will not achieve it, they can add measures to address their gaps.

Further, you can now measure and hold them accountable on a regular basis to achieve the proper progress towards their projected target. This can be done not just for unit cost but for all of your critical performance measures (e.g., productivity, time to market, etc).  Setting goals and performance targets in this manner will achieve far better results because a clear mechanism for understanding cause and effect between their work and initiatives and the target metrics has been established.

A broader approach you can also use is to establish a unit cost progress chart for all of your utility functions. On this chart, where the y axis is cost as a percentage of current cost and the x axis is future years, you should establish a minimum improvement line of 5% per year. The rationale behind this is that improving hardware (e.g., servers, storage, etc) and improving productivity yield an improving unit cost tide of at least 5% a year. Thus, to truly progress and improve, your utility functions should well exceed a 5% per year improvement if they are below 1st quartile. This approach also conveys the necessity and urgency of not resting on our laurels in the technology space. Often, with this set of performance metrics practices employed along with CPI and other best practices, you can then achieve 1st quartile performance within 18 to 24 months for your utility function.

What has been your experience with unit cost or other performance measures? Were you able to achieve sustained advantage with these metrics?
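
The progress chart can be sketched in a few lines. This example assumes the 5% improvement compounds each year (a straight 5 points per year off the starting cost would be a slightly steeper line) and reuses the $4,100/TB storage figure from earlier; the team's projected trajectory is hypothetical.

```python
def minimum_improvement_line(current_unit_cost, years, annual_improvement=0.05):
    """Unit cost for year 0..years if cost falls by a fixed rate each year.

    Assumes compounding: each year is 5% below the previous year.
    """
    return [current_unit_cost * (1 - annual_improvement) ** y for y in range(years + 1)]


def years_behind_line(projected, baseline):
    """Years (by index) where the projected unit cost is above the minimum line."""
    return [year for year, (p, b) in enumerate(zip(projected, baseline)) if p > b]


if __name__ == "__main__":
    line = minimum_improvement_line(4100, years=3)
    # Hypothetical trajectory supplied by the storage team for years 0 to 3.
    team_projection = [4100, 3800, 3750, 3300]
    for year, (target, plan) in enumerate(zip(line, team_projection)):
        print(f"Year {year}: minimum line {target:,.0f}, team plan {plan:,.0f}")
    print("Years above the line:", years_behind_line(team_projection, line))
```

Any year that shows up as above the line is a prompt to ask which initiatives will close the gap, since the team would be doing no better than the background improvement in hardware and productivity.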

Best,

Jim Ditmore and Chris Collins

 

Tying Consumption to Cost: Allocation Best Practices

In 1968, Garrett Hardin wrote about the over-exploitation of common resources in an essay titled “The Tragedy of the Commons”. While Hardin wrote about common pastureland that individual herders overused and diminished, there can be a very similar effect with IT resources within a large corporation. If there is no cost associated with the usage of IT resources by different business units, then each unit will utilize the IT resources to maximize its potential benefit to the detriment of the corporation as a whole. Thus, to ensure effective use of the IT resources there must be some association of cost or allocation between the internal demand and consumption by each business unit. A best practice allocation approach enables business transparency of IT cost and business drivers of IT usage so that thoughtful business decisions for the company as a whole can be made with the minimum of allocation overhead and effort.

A well-designed allocations framework will ensure this effective association as well as:

  • provide transparency to IT costs and the particular business unit costs and profitability,
  • avoid wasteful demand and alter overconsumption behaviors
  • minimize pet projects and technology ‘hobbies’

To implement an effective allocations framework there are several foundation steps. First, you must ensure you have the corporate and business unit CFOs’ support and the finance team resources to implement and run the allocations process. Generally, CFOs look for greater clarity on what drives costs within the corporation.  Allocations allow significant clarity on IT costs which are usually a good-sized chunk of the corporation’s costs.  CFOs are usually highly supportive of a well-thought out allocations approach. So, first garner CFO support along with adequate finance resources.

Second, you must have a reasonably well-defined set of services and an adequately accurate IT asset inventory. If these are not in place, you must first set about defining your services (e.g. an end user laptop service that includes laptop, OS, productivity software, and remote access, or a storage service of high performance Tier 1 storage by Terabyte) and ensuring your inventory of IT assets is minimally accurate (70 to 80%). If there are some gaps, they can be addressed by leveraging a trial allocation period where numbers and assets are published, no monies are actually charged, but every business unit reviews its allocated assets with IT and ensures it is correctly aligned. Once you have the services defined and the assets inventoried, your finance team must then set about identifying which costs are associated with which services. They should work closely with your management team to identify a ‘cost pool’ for each service or asset component. Again, these cost pools should be at least reasonably accurate but do not need to be perfect to begin a successful allocation process.

The IT services defined should be as readily understandable as possible. The descriptions and missions should not be esoteric except where absolutely necessary. They should be easily associated with business drivers and volumes (such as number of employees, or branches, etc) wherever possible. In essence, all major categories of IT expenditure should have an associated service or set of services, and the services should be granular enough that each service or component can be easily understood and its drivers easily distinguished and identified. The target should be somewhere between 50 and 150 services for the typical large corporation. More than 150 services will likely lead to more effort being spent on very small services and result in too much overhead. Significantly fewer than 50 services could result in clumping of services that are hard to distinguish or control. Remember the goal is to provide adequate allocations data at the minimum effort for effectiveness.
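
Once a cost pool and a consumption measure exist for a service, the allocation itself is a proportional split. The sketch below assumes a single volume driver per service; the business unit names and figures are hypothetical.

```python
def allocate_cost_pool(cost_pool, usage_by_unit):
    """Split one service's cost pool across business units by consumption.

    usage_by_unit maps business unit name to its volume of the service's
    driver (e.g. laptops assigned, terabytes allocated).
    """
    total_usage = sum(usage_by_unit.values())
    if total_usage <= 0:
        raise ValueError("No usage recorded; check the asset inventory")
    rate = cost_pool / total_usage  # cost per unit of the driver
    # Rounded to cents; rounding can leave a small residual to reconcile.
    return {unit: round(rate * usage, 2) for unit, usage in usage_by_unit.items()}


if __name__ == "__main__":
    # Hypothetical monthly cost pool for an end user laptop service.
    laptops = {"Retail": 600, "Wholesale": 300, "Corporate": 100}
    for unit, charge in allocate_cost_pool(120_000, laptops).items():
        print(f"{unit}: {charge:,.2f}")
```

During a trial allocation period the same output can be published without charging anyone, so each business unit can check its assigned volumes against what it believes it consumes.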

The allocations framework must have an overall IT owner and a senior Finance sponsor (preferably the CFO). CFOs want to implement systems that encourage effective corporate use of resources so they are a natural advocate for a sensible allocation framework. There should also be a council to oversee the allocation effort and provide feedback and direction, with major users and the CFO or designate as members. This will ensure both adequate feedback as well as buy-in and support for successful implementation and appropriate methodology revisions as the program grows. As the allocations process and systems mature, ensure that any significant methodology changes are reviewed and approved by the allocation council with sufficient advance notice to the Business Unit CFOs. My experience has been that everyone agrees to a methodology change if it is in their favor and reduces their bill, but everyone is resistant if it adversely impacts their business unit’s finances, regardless of how logical the change may be. Further, the allocation process will bring out tensions between business units, especially between those that have an increase and those that have a decrease, if the process is not done with plenty of communication and clear rationale.

Once you start the allocations, even if during a pilot or trial period, make sure you are doing transparent reporting. You or your leads should have a monthly meeting with each business area with good clear reports. Include your finance lead and the business unit finance lead in the meeting to ensure everyone is on the same financial page. Remember, a key outcome is to enable your users to understand their overall costs, what the cost is for each service, and what business drivers impact which services and thus what costs they will bear. By establishing this linkage clearly, the business users will then look to modify business demand so as to optimize their costs. Further, most business leaders will also use this allocations data and newfound linkage to correct poor over-consumption behavior (such as users with two or three PCs or phones) within their organizations. But for them to do this you must provide usable reporting with accurate inventories. The best option is to enable managers to peruse their costs through an intranet interface for end-user services such as mobile phones, PCs, etc. There should be readily accessible usage and cost reports to enable them to understand their team’s demand and how much each unit costs. They should have the option right on the same screens to discontinue, update or start services. In my experience, it is always amazing that once leaders understand their costs, they will want to manage them down, and if they have the right tools and reports, managing down poor consumption happens faster than a snowman melting in July, which is exactly the effect you were seeking.

There are a few additional caveats and guides to keep in mind:

  • In your reporting, don’t just show this month’s costs, show the cost trend over time and provide a projection of future unit costs and business demand
  • Ensure you include budget overheads in the cost allocation, otherwise you will have a budget shortfall and neglect key investment in the infrastructure to maintain it.
  • Similarly, make sure you account for full lifecycle costs of a service in the allocation, and be conservative in your initial allocation pricing: later upward revisions due to missed costs will be painful
  • For ‘build’ or ‘project’ costs, do not use exact resource pricing. Instead use an average price to avoid the situation where every business unit demands only the lowest cost IT  resources for their project resulting in a race to the bottom for lowest cost resources and no ability to expand capacity to meet demand since these would be high cost resources on the margin.
  • Use allocations to also avoid First-In issues with new technologies (set the rate at the projected volume rate, not the initial low volume rate) and to encourage transition off of expensive legacy technologies (Last-Out increases)
  • And lastly, ensure your team knows and understands their services and their allocations and can articulate why each service costs what it costs
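
The average pricing point for ‘build’ or ‘project’ work can be illustrated with a blended rate. This is a minimal sketch assuming the blend is weighted by available hours; the resource pools, rates and hours are hypothetical.

```python
def blended_rate(resource_pools):
    """Hours-weighted average hourly rate across all resource pools.

    resource_pools is a list of (hourly_rate, available_hours) tuples.
    """
    total_hours = sum(hours for _, hours in resource_pools)
    if total_hours <= 0:
        raise ValueError("No resource hours available")
    total_cost = sum(rate * hours for rate, hours in resource_pools)
    return total_cost / total_hours


def project_charge(project_hours, resource_pools):
    """Charge a project at the blended rate, whichever resources staff it."""
    return project_hours * blended_rate(resource_pools)


if __name__ == "__main__":
    # Hypothetical pools: (hourly rate, hours available this year).
    pools = [(50, 1000), (100, 1000)]
    print("Blended rate:", blended_rate(pools))
    print("Charge for a 200 hour project:", project_charge(200, pools))
```

Because every business unit pays the same rate per hour, there is no incentive to compete for the lowest cost resources, and adding capacity at the margin shifts the blended rate for everyone rather than penalizing one project.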

With this framework and approach, you should be able to build and deliver an effective allocation mechanism that enables the corporation to avoid the overconsumption of free, common resources and properly direct the IT resources to where the best return for the corporation will be. Remember though that in the end this is an internal finance mechanism so the CFO should dictate the depth, level and allocation approach and you should ensure that the allocations mechanism does not become burdensome beyond its value.

What have been your experiences with allocations frameworks? What changes or additions to these best practices would you add?

Best, Jim Ditmore

 

Evolving Metrics to Match Your Team’s Maturity

We have covered quite a bit of ground with previous posts on IT metrics but we have some important additions to the topic. The first, that we will cover today, is how to evolve your metrics to match your team’s maturity. (Next week, we will cover unit costs and allocations).

To ground our discussion, let’s first cover quickly the maturity levels of the team. Basing them heavily on the CMM, there are 5 levels:

  1. Ad hoc: A chaotic state with no established processes. Few measures are in place or accurate.
  2. Repeatable: The process is documented sufficiently and frequently used. Some measures are in place.
  3. Defined: Processes are defined and standard and highly adhered to. Measures are routinely collected and analyzed.
  4. Managed: Processes are effectively controlled through the use of process metrics with some continuous process improvement (CPI).
  5. Optimized: Processes are optimized with statistical techniques and CPI prevalent across all work.

It is important to match your IT metrics to the maturity of your team for several reasons:

  • capturing metrics which are beyond the team’s maturity level will be difficult to gather and likely lack accuracy
  • once gathered, there is potential for unreliable analysis and conclusions
  • and it will be unlikely that actions taken can result in sustained changes by the team
  • the difficulty and likely lack of progress and results can cause the team to negatively view any metrics or process improvement approach

Thus, before you start your team or organization on a metrics journey, ensure you understand their maturity so you can start the journey at the right place. If we take the primary activities of IT (production, projects, and routine services), you can map out the evolution of metrics by maturity as follows:

Production metrics – In moving from a low maturity environment to a high maturity one, production metrics should evolve from typical inward-facing, component view measures of individual instances to customer view, service-oriented measures with both trend and pattern views as well as incident detail. Here is a detailed view:

Production Metrics Evolution

 

Project metrics – Measures in low maturity environments are project-centric and usually focus on dates and milestones with poor linkage to real work or quality. As the environment matures, more effective measures can be implemented that map actual work and quality as the work is being completed and provide accurate forecasts of project results. Further, portfolio and program views and trends are available and leveraged.

Project Metrics Evolution

 

Routine Services – Low maturity measures are typically component or product-oriented at best within a strict SLA definition and lack a service view and customer satisfaction perspective. Higher maturity environments address these gaps and leverage unit costs, productivity, and unit quality within a context of business impact.

Routine Services Metrics Evolution

The general pattern is that as you move from low to medium and then to high maturity: you introduce process and service definition and accompanying metrics; you move from task or single project views to portfolio views; quality and value metrics are introduced and then exploited; and a customer or business value perspective becomes the prominent measure as to delivery success. Note that you cannot just jump to a high maturity approach as the level of discipline and understanding must be built over time with accumulating experience for the organization. To a degree, it is just like getting fit, you must go to the gym regularly and work hard – there is nothing in a bottle that will do it for you instead.

By matching the right level of metrics and proper next steps to your organization’s maturity, you will be rewarded with better delivery and higher quality, and your team will be able to progress and learn and leverage the next set of metrics. You will avoid a ‘bridge too far’ issue that often occurs when new leaders come into an organization that is less mature than their previous one, yet they impose the exact same regimen as they were familiar with previously. And then they fail to see why there are resultant problems and the blame either falls on the metrics framework imposed or the organization, when it is neither… it is the mismatch between the two.

And you will know your team has successfully completed their journey when they go from:

  • Production incidents to customer impact to ability to accurately forecast service quality
  • Production incidents to test defects to forecasted test defects to forecasted defects introduced to production
  • Unit counts of devices to package offerings to customer satisfaction
  • Unit counts or tasks to unit cost and performance measures to unit cost trajectories and performance trajectories

What has been your experience applying a metrics framework to a new organization? How have you adjusted it to ensure success with the new team?

Best, Jim Ditmore

Metrics Roundup: Key Techniques and References

As an IT leader either of a function or of a large IT team with multiple functions, what is the emphasis you should place on metrics and how are you able to leverage them to attain improvement and achieve more predictable delivery? What other foundational elements must be in place for you to effectively leverage the metrics? Which metrics are key measures or leading indicators and which ones are lagging or less relevant?

For those of you just joining, this is the fourth post on metrics and in our previous posts we focused on key aspects of IT metrics (transparency, benchmarking, and a scientific approach). You can find these posts in the archives or, better yet, in the pages linked under the home page menu of Best Practices. The intent of the RecipeforIT blog is to provide IT leaders with useful, actionable practices, techniques and guidance to be more successful IT leaders and enable their shops to achieve much higher performance. I plan to cover most of the key practices for IT leaders in my posts, and as a topic is covered, I try to migrate the material into Best Practices pages. So, now back to metrics.

In essence there are three types of relevant metrics:

  • operational or primary metrics – those metrics used to monitor, track and make decisions on the daily or core work. Operational metrics are the base of effective management and are the fundamental measures of the activity being done. It is important that these metrics are collected inherently as part of the activity, and best if the operational team collects them, understands them and changes direction based on them.
  • verification or secondary metrics – those metrics used to verify that the work completed meets standards or is functioning as designed. Verification metrics should also be collected and reviewed by the same operational team, though potentially by different members of the team or as part of a broader activity (e.g. DR test). Verification metrics provide an additional measure of either overall quality or critical activity effectiveness.
  • performance or tertiary metrics – those metrics that provide insight as to the performance of the function or activity. Performance metrics enable insight as to the team’s efficiency, timeliness, and effectiveness.

Of course, your metrics for a particular function should consist of those measures needed to successfully execute and manage the function as well as those measures that demonstrate progress towards the goals of your organization. For example, let’s take an infrastructure function: server management. What operational metrics should be in place? What should be verified on a regular basis? And what performance metrics should we have? While this will vary based on the maturity, scale, and complexity of the server team and environment, here is a good subset:

Operational Metrics:

  • Server asset counts (by type, by OS, by age, location, business unit, etc) and server configurations by version (n, n-1, n-2, etc) or virtualized/non-virtual and if EOL or obsolete
  • Individual, grouped and overall server utilization, performance, etc by component (CPU, memory, etc)
  • Server incidents, availability, customer impacts by time period, trended with root cause and chronic or repeat issues areas identified
  • Server delivery time, server upgrade cycle time
  • Server cost overall and by type of server, by cost area (admin, maintenance, HW, etc) and cost by vendor
  • Server backup attempt and completion, server failover in place, etc

Verification metrics:

  • Monthly sample of the configuration management database server records for accuracy and completeness, ongoing scan of the network for servers not in the configuration management database, regular reporting of all obsolete server configs with callouts on those exceeding planned service or refresh dates
  • Customer transaction times, regular (every six months) capacity planning and performance reviews of critical business service stacks including servers
  • Root cause review of all significant impacting customer events, auto-detection of server issues versus manual or user detection ratios
  • DR Tests, server privileged access and log reviews, regular monthly server recovery or failover tests (for a sample)

Performance metrics:

  • Level of standardization or virtualization, level of currency/obsolescence
  • Level of customer impact availability, customer satisfaction with performance, amount of headroom to handle business growth
  • Administrators per server, Cost per server, Cost per business transaction
  • Server delivery time, man hours required to deliver a server

Obviously, if you are just setting out, you will collect only some of these metrics at first. As you incorporate their collection and automate the work and reporting associated with them, you can then tackle the additional metrics. And you will vary them according to the importance of different elements in your shop. If cost is critical, then reporting on cost and efficiency plays such as virtualization will naturally be more important. If time to market or availability are critical, then those elements should receive greater focus. Below is a diagram that reflects the construct of the three types of metrics and their relationship to the different metrics areas and score cards:

Metrics Framework

So, you have your metrics framework, what else is required to be successful leveraging the metrics?

First and foremost, the culture of your team must be open to alternate views and support healthy debate. Otherwise, no amount of data (metrics) or facts will enable the team to change directions from the party line. If you and your management team do not lead regular, fact-based discussions where course can be altered and different alternatives considered based on the facts and the results, you likely do not have the openness needed for this approach to be successful. Consider leading by example here and emphasize fact based discussions and decisions.

Also, you must have defined processes that are generally adhered to. If your group’s work is heavily ad hoc and different each time, measuring what happened the last time will not yield any benefits. If this is the case, you need to first focus on defining, even at a high level, the major IT processes and help your teams adopt them. Then you can proceed to metrics and the benefits they will bring.

Accountability, sponsorship and the willingness to invest in the improvement activities are also key factors in the speed and scope of the improvements that can occur. As a leader you need to maintain a personal engagement in the metrics reviews and score card results. They should feed into your team’s goals and you should monitor the progress in key areas. Your sponsorship, and senior business sponsorship where appropriate, will be major accelerators to progress. And hold teams accountable for their results and improvements within their domain.

How does this correlate with your experience with metrics? Any server managers out there that would have suggestions on the server metrics? I expect we will have two further posts on metrics:

  • a post on how to evolve the metrics you measure as you increase the maturity and capability of your team,
  • and one on unit costing and allocations

I look forward to your suggestions.

Best, Jim

 

A Scientific Approach to IT Metrics

In order to achieve a world class or first quartile performance, it is critical to take a ‘scientific’ approach to IT metrics. Many shops remain rooted in ‘craft’ approaches to IT where techniques and processes are applied in an ad hoc manner to the work at hand and little is measured. Or, a smattering of process improvement methodologies (such as Six Sigma or Lean) or development approaches (e.g., Agile) are applied indiscriminately across the organization. Frequently then, due to a lack of success, the process methods or metrics focus are then tarred as being ineffective by managers.

Most organizations that I have seen that were mediocre performers typically have such craft or ad hoc approaches to their metrics and processes. And this includes not just the approach at the senior management level but at each of the 20 to 35 distinct functions that make up an IT shop (e.g., networking, mainframes, servers, desktops, service desk, middleware, etc., and each business-focused area of development and integration). In fact, you must address the process and metrics at each distinct function level in order to then build strong CIO level process, governance and metrics. And if you want to achieve 1st quartile or world-class performance, a scientific approach to metrics will make a major contribution. So let’s map out how to get to such an approach.

1) Evaluate your current metrics: You can pick several of the current functions you are responsible for and evaluate them to see where you are in your metrics approach and how to adjust to apply best practices. Take the following steps:

  • For each distinct function, identify the current metrics that are routinely used by the team to execute their work or make decisions.
  • Categorize these metrics as either operational metrics or reporting numbers. If they are not used by the team to do their daily work or they are not used routinely to make decisions on the work being done by the team, then these are reporting numbers. For example, they may be summary numbers reported to middle management or reported for audit or risk requirements or even for a legacy report that no one remembers why it is being produced.
  • Is a scorecard being produced for the function? An effective scorecard would have quantitative measures for the deliverables of the function as well as objective scores for function goals that have been properly cascaded from the overall IT goals.

2) Identify gaps with the current metrics: For each of the IT functions there should be regular operational metrics for all key dimensions of delivery (quality, availability, cost, delivery against SLAs, schedule). Further, each area should have unit measures to enable an understanding of performance (e.g., unit cost, defects per unit, productivity, etc). As an example, the server team should have the following operational metrics:

    • all server asset inventory and demand volumes maintained and updated
    • operational metrics such as server availability, server configuration currency, server backups, server utilization should all be tracked
    • also time to deliver a server, total server costs, and delivery against performance and availability SLAs should be tracked
    • further secondary or verifying metrics such as server change success, server obsolescence, servers with multiple backup failures, chronic SLA or availability misses, etc should be tracked as well
    • function performance metrics such as cost per server (by type of server), administrators per server, administrator hours to build a server, percent virtualized servers, percent standardized servers, etc should also be derived
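
To illustrate how the function performance metrics in the last bullet are derived from base metrics rather than collected separately, here is a minimal Python sketch. The class, field names and figures are hypothetical placeholders, not benchmarks; substitute your own inventory, staffing and cost data.

```python
from dataclasses import dataclass


@dataclass
class ServerFunctionBase:
    """Base metrics gathered as part of the server team's daily work (illustrative)."""
    servers: int
    virtualized_servers: int
    standardized_servers: int
    administrators: int
    total_annual_cost: float


def derived_metrics(base):
    """Derive unit and performance metrics from the base counts and costs."""
    if base.servers <= 0:
        raise ValueError("server count must be positive")
    return {
        "cost_per_server": base.total_annual_cost / base.servers,
        "administrators_per_server": base.administrators / base.servers,
        "pct_virtualized": 100 * base.virtualized_servers / base.servers,
        "pct_standardized": 100 * base.standardized_servers / base.servers,
    }


# Made-up numbers purely for illustration.
example = ServerFunctionBase(
    servers=1200,
    virtualized_servers=780,
    standardized_servers=900,
    administrators=24,
    total_annual_cost=6_000_000.0,
)
metrics = derived_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:,.2f}")
```

Because every derived figure comes from the same base counts, the unit metrics stay consistent with the inventory and cost numbers the team already maintains.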

3) Establish full coverage: By comparing the existing metrics against the full set of delivery goals, you can quickly establish the appropriate operational metrics along with appropriate verifying metrics. Where there are metrics missing that should be gathered, work with the function to incorporate the additional metrics into their daily operational work and processes. Take care to work from the base metrics up to more advanced:

    • start with base metrics such as asset inventories and staff numbers and overall costs before you move to unit costs and productivity and other derived metrics
    • ensure the metrics are gathered in as automated a fashion as possible and as an inherent part of the overall work (they should not be gathered by a separate team or subsequent to the work being done)

Ensure that verifying metrics are established for critical performance areas for the function as well. An example of this for the server function could be for the key activity of backups for a server:

    • the operational metrics would be perhaps backups completed against backups scheduled
    • the verifying metric would be twofold:
      • any backups for a single server that fail twice in a row get an alert and an engineering review as to why they failed (typically, for a variety of reasons, 1% or fewer of your backups will fail, and this is reasonable operational performance. But if one server does not get a successful backup for many days, you are likely putting the firm at risk if there is a database or disk failure, thus the critical alert)
      • every month or quarter, 3 or more backups are selected at random, and the team ensures they can successfully recover from the backup files. This will verify everything associated with the backup is actually working.
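
The backup example above can be sketched in a few lines of Python. This is an illustration under assumptions, not a production tool: the history format (a list of success flags per server, oldest first), the server names and the function names are all hypothetical. The restore test itself remains a manual or separately automated exercise; the sketch only selects the sample.

```python
import random


def completion_rate(history):
    """Operational metric: backups completed against backups scheduled."""
    scheduled = sum(len(results) for results in history.values())
    completed = sum(sum(results) for results in history.values())
    return completed / scheduled if scheduled else None


def servers_needing_alert(history, consecutive_failures=2):
    """Verifying metric 1: servers whose most recent backups all failed."""
    alerts = []
    for server, results in history.items():
        recent = results[-consecutive_failures:]
        if len(recent) == consecutive_failures and not any(recent):
            alerts.append(server)
    return sorted(alerts)


def pick_restore_sample(history, sample_size=3, seed=None):
    """Verifying metric 2: random servers whose latest backup should be restorable."""
    rng = random.Random(seed)
    candidates = sorted(s for s, results in history.items() if results and results[-1])
    return rng.sample(candidates, min(sample_size, len(candidates)))


# True = backup succeeded, False = backup failed (illustrative data).
history = {
    "db01": [True, True, False, False],
    "web01": [True, False, True, True],
    "web02": [True, True, True, True],
    "app03": [False, True, True, True],
    "batch07": [True, True, True, False],
}
rate = completion_rate(history)
alerts = servers_needing_alert(history)
sample = pick_restore_sample(history, sample_size=3, seed=7)
print(rate, alerts, sample)
```

Note that batch07 has only one recent failure and so does not raise the alert, while db01 does, even though the overall completion rate on its own would not single out either server.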

4) Collect the metrics only once: Often, teams collect similar metrics for different audiences. The metrics that they use to monitor, for example, configuration currency or configuration to standards can be mostly duplicated by risk data collected against security parameter settings or executive management data on percent server virtualization. This is a waste of the operational team’s time and can lead to confusing reports where one view doesn’t match another view. I recommend that you establish an overall metrics framework that includes risk and quality metrics as well as management and operational metrics so that all groups agree to the proper metrics. The metrics are then collected once, distilled and analyzed once, and congruent decisions can then be made by all groups. Later this week I will post a recommended metrics framework for a typical IT shop.

5) Drop the non-value numbers activity: For all those numbers that were identified as being gathered for middle management reports or for legacy reports with an uncertain audience: if there is no tie to a corporate or group goal, and the numbers are not being used by the function for operational purposes, I recommend that you stop collecting the numbers and stop publishing any associated reports. It is non-value activity.

6) Use the metrics in regular review: At both the function team level and function management level the metrics should be trended, analyzed and discussed. These should be regular activities: monthly, weekly, and even daily depending on the metrics. The focus should be on how to improve and on whether, based on the trends, current actions, staffing, processes, etc are enabling the team to improve and be successful on all goals. A clear feedback loop should be in place to enable the team and management to identify actions to correct issues apparent through the metrics as quickly and as locally as possible. This gives control of the line to the team and the end result is better solutions, better work and better quality. This is what has been found in manufacturing time and again and is widely practiced by companies such as Toyota in their factories.

7) Summarize the metrics from across your functions into a scorecard: Ensure you identify the key metrics within each function and properly summarize and aggregate the metrics into an overall group score card. Obviously the score card should match your goals and the key services that you deliver. It may be appropriate to rotate in key metrics from a function based on visibility or significant change. For example, if you are looking to improve overall time to market (TTM) of your projects, it may be appropriate to report on server delivery time as a key subcomponent and hopefully leading indicator of your improving TTM. Including key metrics from the various functions on your score card, even at a summarized level, will result in greater attention and pride being taken in the work since there is a very visible and direct consequence. I also recommend that, on a quarterly basis, you provide an assessment of the progress and perhaps highlights of the team’s work as reflected in the score card.

8) Drive better results through proactive planning: The team and function management, once the metrics and feedback loop are in place, will be able to drive better performance through ongoing improvement as part of their regular activity. Greater increases in performance may require broader analysis and senior management support. Senior management should hold proactive planning sessions with the function team to enable greater improvement to occur. The assignment for the team should be to take key metrics and determine what would be required to set them on a trajectory to a first quartile level in a certain time frame. For example, you may have a cost reduction goal overall, and within the server function there is a subgoal to achieve greater productivity (at a first quartile level) and reduce the need for additional staff. By asking the team to map out what is required and by holding a proactive planning session on some of the key metrics (e.g. productivity) you will often identify the path to meet local objectives that also contribute to the global objectives. Here, in the server example, you may find that with a moderate investment in automation, productivity can be greatly improved and staff costs reduced substantially. Thus both objectives could be attained by the investment. By holding such proactive sessions, where you ask the team to identify what would need to be done to achieve a trajectory on their key metrics while also considering the key goals and focus at the corporate or group level, you can often identify such doubly beneficial actions.

By taking these steps, you will employ a scientific approach to your metrics. If you add a degree of process definition and maturity, you will make significant strides toward controlling and improving your environment in a sustainable way. This will build momentum and enable your team to enter a virtuous cycle of improvement and better performance. And then if you add to the mix process improvement techniques (in moderation and with the right technique for each process and group), you will accelerate your improvement and results.

But start with your metrics and take a scientific approach. In the next week, I will be providing metrics frameworks that have stood well in large, complex shops along with templates that should help the understanding and application of the approach.

What metrics approaches have worked well for you? What keys would you add to this approach? What would you change?

Best, Jim

IT Service Desk: Building a Responsive and Effective Desk

This is the 4th in a series of posts on best practices in the IT Service Desk arena. To catch the previous material, you can check out the first post or you can read through the best practice reference pages on the IT Recipes site. To help you best use this site, please know that as material is covered in the posts, we subsequently use it to properly build an ongoing reference area that can be used when you encounter a particular issue or problem area and are looking for how to solve it. There’s a good base of material in the reference area on efficiency and cost cutting, project management, recruiting talent, benchmarking, and now service desk. If you have any feedback on how to improve the reference area structure, don’t hesitate to let us know. We will be delivering one more post on service desk after this one and then I will be shifting to leadership techniques and building high performance teams.

One of the key challenges of the Service Desk is to respond to a customer transaction in a timely manner. Often, two situations occur: either efficiency or budget restrictions result in lengthened service times and poor perception of the service or the focus is purely on speed to answer and the initial interaction is positive but the end result is not highly effective. Meeting the customer expectations on timeliness, being cost effective, and delivering real value is the optimal performance that is our target.

Further, this optimal performance must be delivered in a complex environment. Timeliness must be handled differently for each activity (for example, the service for a significant production incident or service loss is handled as a ‘live’ telephone call, whereas an order for new equipment would be primarily submitted via a web form). The demand for the services is often 24 hours a day and global, with multiple languages, and interaction occurs over phone, web chat, and intranet (and soon, mobile app interfaces). This optimal performance should have both the cost and the effectiveness of the service desk measured holistically; that is, all the costs to deliver a service should be totaled, including the end user and customer cost (e.g., wait time, restoration time, lost revenue opportunity, etc) and engineering time (e.g., time required to go back and gather data to deliver a service, or time avoided if service is automated or handled by the service desk).

A great Service Desk not only delivers the operational numbers, it ensures that the workload flowing through the process is ‘value add’ activity that is required and necessary. The Service Desk must ensure that it measures performance as a cost / benefit to the whole organisation and not just in isolation. Doing the ‘right thing’ may actually move the narrower operational Service Desk metrics in the wrong direction; yet at the enterprise level it remains the right thing to do.

Optimize your service desk by managing demand and improving productivity

There are two primary factors that drive your service desk cost and delivery:

  • the number of calls and in particular the peak volume of calls, and,
  • the cost base of your service desk (mostly people costs: salary, benefits, taxes, etc)

The volume and pattern of transaction demand is in turn the primary driver of the number of people required and is the key determinant of the overall cost base of the Service Desk. More specifically, the peak load of the Service Desk (the time at which call volumes are highest) is the time that drives your peak staffing volume and is therefore the most important target of demand reduction (i.e. the point that reductions in call / transaction volume are most likely to be realised as a financial cost saving or improved responsiveness to customers).

There are three key opportunities:

  • Manage the transaction volume
  • Manage the transaction pattern
  • Manage the transaction time

And in each opportunity area, we will look to apply best practices such that we improve the effectiveness of the service desk and IT experience overall.

Managing the Transaction Volume

Reducing the overall volume of transactions presented to the department reduces total workload. And while reducing the number of transactions is a good thing, these reductions may not be realised as cost savings or reduced customer wait times if they simply increase idle time during your quieter periods and do not reduce the peak load. The peak load is the point at which resourcing levels are at their highest and yet you are likely to have negative capacity (i.e. your customers will queue as you cannot resource fully to meet the peak). Eradicating demand even within troughs is valuable; however the true value is to focus on the peak. So start by identifying your key volume drivers and your peak load key volume drivers through statistical analysis. The use of Pareto analysis will usually demonstrate that a significant share of your calls is driven by a fairly small number of call categories; sometimes the top 15 to 20 call types can account for as much as 80% of the total volume of calls. Then, for each call type impacting the peak, do the following analysis and actions:

  • Is it a chronic issue, meaning is it a repetitive problem that users experience (e.g. Citrix is down every Monday morning, or new users are issued the wrong configuration to properly access data, etc)? If it is, then rank by frequency, missed SLAs and total cost (e.g. 200 users a week with the issue, each losing 2 hours of time, is a $32,000/month problem). Allocate the investment based on SLA criticality and ROI and assign it to the appropriate engineering area to address, with signoff by the service desk required when completed.
  • Is it a navigation or training issue? Having significant numbers of users call the service desk as a last resort to find out how to do something is an indicator that your systems or your intranet is not user friendly, intuitive or well-designed. Use these calls to drive navigation and design improvements for your systems and your intranet in each release cycle. Look to make it a normal input to improve usability of your systems.
  • Is it that requests can only be handled manually? As I have mentioned in previous posts, often the internal systems for employees (e.g. HR, Finance and IT) are the least automated and have manual forms and workflow. Look to migrate as much as possible to a self-serve, highly automated workflow approach. This is particularly true for password administration. Unless you have full logical access automation, it is likely that User Administration and Password Management are key call drivers, particularly at peak as users arrive at work and attempt to log on to systems. Automation of password resets at an application / platform level is often achievable quickly and at a much lower cost than a fully integrated solution. Assign this to your engineering and development teams so you can make significant peak load demand reductions with little or no investment and a corresponding user experience improvement.
  • Can you automate or standardize? If you cannot automate then look to standardise wherever possible. For example, have the Service Desk work in partnership with your IT Security group and ensure that you adopt a corporate standard construction for passwords and log on credentials. This will result in users being less likely to lock themselves out of their accounts and reduce the peak load. And ensure the standards don’t go overboard on security. I once had a group where the passwords were changed for everyone every month. The result was 20,000 additional calls to the service desk because people forgot their passwords, and lax security because nearly everyone else was writing down their passwords. We changed it back to quarterly and saved $400,000 a quarter in reduced calls and made the users happy (and improved security).
  • Can you eliminate calls due to misdirection? Identify failure demand, that is, calls generated by weaknesses in the design or performance of support processes, including: wrong department / misdirected calls (or IVR choices), use of the Service Desk as a ‘Directory Enquiries’ function, repeat calls and chaser calls (i.e. where the customer hasn’t been able to provide all of the required details in one call or had to chase because their expectations have not been met). Failure demand should be eradicated through the re-design of support processes / services to eliminate multiple steps and common defects as well as improved customer communication and education.
  • Can you increase self service? Identify calls that could be resolved without the intervention of the Service Desk, i.e. through the use of alternative channels such as self service. Work with business lines and gain agreement to direct callers to the alternative (cheaper) channels. To encourage adoption, market the best channels and where necessary withdraw the services of the Service Desk to mandate the use of automation or self service solutions.
  • Is root cause addressed by the engineering teams? Undertake robust Problem Management processes and ensure that your engineering and application groups have clear accountabilities to resolve root cause issues and thus reduce the volume of calls into the Service Desk. A good way to secure buy in is to convert the call volumes into a financial figure and ensure the component management team has full awareness and responsibility to reduce these costs they are causing.
  • Can you streamline your online or intranet ticket creation and logging process? Organizations increasingly want to capture management information about technical faults that were fixed locally and it is not uncommon for business lines to request that a ticket is logged  just for management information purposes. Design your online ticket logging facility to be able to handle all of these transactions. Whilst such information is valuable, the Service Desk agent adds no value through their involvement in the transaction.
  • Do you have routing transactions that can have their entry easily automated? Consider reviewing operating procedures and identifying those transactions in which your agents undertake ‘check list’ style diagnosis before passing a ticket to resolver groups. In these instances, creating web forms (or templates within your Service Management toolset) enables the customer to answer the check list directly, raise their own ticket and then route directly to support.
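
The Pareto analysis and the chronic issue costing mentioned above can both be done with very little tooling. Below is a minimal Python sketch, with hypothetical function names and made-up call data. One assumption needs to be loud: the cost example reproduces the $32,000/month figure by assuming four weeks per month and roughly $20 per hour of lost user time, neither of which is stated in the text, so substitute your own rates.

```python
from collections import Counter


def pareto_call_types(call_types, target_share=0.80):
    """Return the most frequent call types that together reach target_share of volume."""
    counts = Counter(call_types)
    total = sum(counts.values())
    selected, running = [], 0
    for call_type, count in counts.most_common():
        if running / total >= target_share:
            break
        running += count
        selected.append((call_type, count, running / total))
    return selected


def monthly_cost(users_per_week, hours_lost_per_user, hourly_cost, weeks_per_month=4):
    """Monthly cost of a chronic issue. hourly_cost and weeks_per_month are assumptions."""
    return users_per_week * hours_lost_per_user * hourly_cost * weeks_per_month


# Illustrative peak-hour call log: one entry per call, labelled by call type.
peak_calls = (
    ["password reset"] * 50
    + ["citrix down"] * 20
    + ["how do I"] * 12
    + ["printer"] * 10
    + ["other"] * 8
)
top_drivers = pareto_call_types(peak_calls)
for call_type, count, cumulative in top_drivers:
    print(f"{call_type}: {count} calls, cumulative {cumulative:.0%}")

chronic_cost = monthly_cost(users_per_week=200, hours_lost_per_user=2, hourly_cost=20)
print(chronic_cost)
```

Running the Pareto step on calls received during the peak period, rather than on all calls, keeps the focus on the demand that actually drives staffing.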

Managing the Transaction Pattern

If workload cannot be eradicated (i.e. it is value-added work that must be done by the agent) then we next look to shift the work from peak to non-peak service times. Delivering service ‘at the point of peak customer demand’ is the most expensive way to deliver service as it increases the resource profile required and could build in high levels of latent non-productive time for your agents.

One technique to shift the work from peak to non-peak is customer choice. Leverage your IVR or your online self service ticketing systems to enable the customer to choose a call back option at non-peak times if they call in at peak. Many customers would prefer to select a specific time of service with certainty versus waiting an indeterminate amount of time for an agent. They can structure their workday productively around the issue. But you must ensure your team reliably calls back at the specified time.

Customer education around busy and quiet periods to call, messages on your IVR and even limiting some non-essential services to only being available during low demand hours will all help to smooth workflow and reduce the peak load. Further, providing a real-time systems production status toolbar on the intranet will minimize repeat call-ins for the same incident or status query calls.

You can also smooth or shift calls due to system releases and upgrades. Ensure that your releases and rollouts are scheduled with peak times in mind. A major rollout of a new system at the end of the month on a Monday morning, when the service desk experiences a peak and there are other capacity stresses, is just not well-planned. User IDs, passwords, and training should all be staged incrementally for a rollout to smooth demand and provide a better system introduction and resulting user experience. As a general rule, doing a pilot implementation to gauge true user interaction (and resulting likely calls) is a good approach to identifying potential hotspots and fixing them before wide introduction.

Managing the Transaction Time

Transaction time can be improved in two ways:

  • improve the productivity and skill of the agent (or reduce the work to be done)
  • increase the resources to meet the demand thus reducing the wait time for the transactions.

Start by ensuring your hiring practices are bringing onboard agents with the right skills (technical, customer interface, problem-solving, and languages). Encourage agents to improve their skills through education and certification programs. Have your engineering teams provide periodic seminars on the systems of your firm and how they work. Ensure the service desk team is trained as part of every major new system release or upgrade. Implement a knowledge management system that is initially written jointly by the engineering team and the service desk team. Enable your service desk agents to add comments, updates and hints. Ensure the taxonomy of the problem set is logical and easily navigated. And then ensure the knowledge base and operational documentation are updated for every major system release.

Another method to improve the productivity of your service desk is to capture the service and transaction data by the various service desk subteams (e.g., by region or shift). There will be variation across the subteams, and you can use these variations to pinpoint potential best practices. Identifying and implementing best practice across your desk should lead to a convergence over time of call duration to an optimal number. Measuring the mean and the standard deviation around the mean should demonstrate convergence over time if best practice is being embedded and used consistently across the workforce. Remember that just having the lowest service time per call may not be the optimal practice. Taking a bit longer on the call and delivering a higher rate of first call resolution is often the better path. Your agents with the longest call duration may be fixing more transactions; however, it could be that some of these are too time-consuming and should have been time-shifted, as other customers are now being left to queue.
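To illustrate the convergence check described above, here is a minimal sketch in Python. The figures are hypothetical average call durations (in minutes) for four subteams over three months, and the function names are ours for illustration only. It computes the mean and standard deviation across subteams for each period and checks that the spread is not growing.

```python
from statistics import mean, pstdev


def spread_by_period(durations_by_period):
    """Return (mean, standard deviation) across subteams for each period."""
    return [(mean(d), pstdev(d)) for d in durations_by_period]


def is_converging(durations_by_period):
    """True if the spread across subteams never grows from one period to the next."""
    spreads = [sd for _, sd in spread_by_period(durations_by_period)]
    return all(later <= earlier for earlier, later in zip(spreads, spreads[1:]))


# Hypothetical data: one list per month, one average call duration per subteam.
monthly = [
    [4.0, 6.0, 8.0, 10.0],
    [5.0, 6.0, 7.0, 8.0],
    [6.0, 6.5, 6.5, 7.0],
]

for month, (avg, sd) in enumerate(spread_by_period(monthly), start=1):
    print(f"month {month}: mean={avg:.2f} min, std dev={sd:.2f} min")
print("converging:", is_converging(monthly))
```

A shrinking standard deviation only shows that the subteams are behaving more alike. It should be read alongside first call resolution and quality measures, since converging on a short but low-quality call is not the goal.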

Managing transaction time has to be done very purposefully; otherwise quality is placed at risk. If agents believe they are under pressure to meet a certain transaction time, they will sacrifice quality to do so. This will result in re-work, reduced volumes of calls resolved at first point of contact and reduced customer satisfaction as they will receive inconsistent and reduced service. Transaction time has to be managed as a secondary measure to quality and resolution rates to prevent this from occurring. There should never be a stated deadline as to when a call has become too lengthy – each customer interaction has to be managed on its own merits and only in the aggregate (say a weekly or monthly average) can you fairly compare the delivery of your agents against each other.

Resource planning is the science of matching your supply of resources to meet the demand from the customer within a specified period of time (let’s say 20 seconds). Call Centres will usually manage their resource profile in 15 minute intervals. The mechanics of doing this are driven by probability: the probability of a call being presented within a 15 minute period (predicted using historical data gathered from your telephony) and the probability that an agent will become available to answer that call within 20 seconds of it being presented. The ‘magic number’ of agents required in a 15 minute period is met when these probabilities are aligned so that we will meet the required level of service (e.g. 90% of calls will be answered in 20 seconds).

The volume of calls presented is one half of this equation, the frequency with which an agent becomes available is the other. Agents become available when they join the pool of active agents (i.e. when they sign in for a shift) or when they complete a call and are ready to receive the next call. The average transaction time (call length plus any additional time placing the agent in an unavailable state) determines how frequently they will become available in any given 15 minute period. A Call Centre with an average call duration of 2 ½ minutes will have each agent becoming available 6 times in a 15 minute period, whereas a call duration of 6 minutes will only have each agent becoming available 2 ½ times. The probability of an agent becoming available within the 20 second window in the first call centre is significantly higher and their resource requirements will therefore be much lower than the second. The right number for your business will be for you to determine. Then apply the best practice staffing approaches mentioned in our earlier posts. Recruit a talented team in the right locations and look to leverage part-time staff to help fulfill peak demand.
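The relationship between call volume, transaction time and the answer target is commonly modelled with the Erlang C queueing formula. We have not named a specific model above, so treat the following as a minimal sketch under that assumption rather than a description of any particular workforce management tool. It assumes calls arrive at random, callers wait rather than abandon, and all agents are interchangeable. The function names and the volume of 60 calls per interval are hypothetical.

```python
import math


def erlang_c(agents, traffic):
    """Probability that a call has to wait (Erlang C), traffic in erlangs."""
    if agents <= traffic:
        return 1.0  # demand meets or exceeds supply, so every call queues
    term = 1.0   # running value of traffic**k / k!
    total = 1.0  # sum of those terms for k = 0 .. agents - 1
    for k in range(1, agents):
        term *= traffic / k
        total += term
    top = term * traffic / agents * agents / (agents - traffic)
    return top / (total + top)


def service_level(agents, calls, transaction_seconds,
                  target_seconds=20.0, interval_seconds=900.0):
    """Share of calls answered within target_seconds in one interval."""
    traffic = calls * transaction_seconds / interval_seconds
    if agents <= traffic:
        return 0.0
    wait_probability = erlang_c(agents, traffic)
    decay = math.exp(-(agents - traffic) * target_seconds / transaction_seconds)
    return 1.0 - wait_probability * decay


def agents_required(calls, transaction_seconds, target_level=0.90,
                    target_seconds=20.0, interval_seconds=900.0):
    """Smallest agent count meeting the target (target_level must be below 1)."""
    traffic = calls * transaction_seconds / interval_seconds
    agents = math.floor(traffic) + 1
    while service_level(agents, calls, transaction_seconds,
                        target_seconds, interval_seconds) < target_level:
        agents += 1
    return agents


# Hypothetical volume: 60 calls per 15 minute interval, 90% answered in 20 seconds.
short_calls = agents_required(calls=60, transaction_seconds=150)  # 2.5 minutes
long_calls = agents_required(calls=60, transaction_seconds=360)   # 6 minutes
print("agents needed at 2.5 minutes per call:", short_calls)
print("agents needed at 6 minutes per call:", long_calls)
```

Running the same volume through both transaction times shows the effect described above: the desk with the longer average transaction time needs noticeably more agents to hit the same answer target. Real planning tools also account for abandonment, shrinkage and breaks, which this sketch leaves out.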

Here are a few best practice techniques for you to consider in managing the Transaction Time:

  • When calculating transaction time, ensure that you include not only the length of active talk time but also any other factor that makes the agent unavailable for the next call to be presented (e.g. any rest time that you have built into the telephony, any ‘wrap up’ time that the agent can manually apply to block other calls being presented etc…).
  • Present calls direct to agent headsets (i.e. a beep in the ear) rather than have their phones ring and require manually answering.
  • Analyse what the difference is between your best and worst performing agents, determine what best practice is and roll it out across the team. This may include everyone having access to the right applications on their desktop, having the right shortcuts loaded in their browsers, keeping high volume applications open rather than logging in when a relevant call is presented and a thousand other nuances that all add up to valuable seconds on a call.
  • Do a key stroke by key stroke analysis of how your agents use the Service Management toolset. Manage the logical workflow of a ticket, automate fields where possible, build templates to assist quick information capture and ensure that there is no wasted effort (i.e. that every key stroke counts). Ensure that support groups are not inflating call duration by requesting fields that are re-keyed in tickets for their own convenience.
  • Invest in developing professional coaching skills for your Team Leaders (you may even want to consider dedicated performance coaches) and embed call duration as an important element in your quality management processes, focusing on the length of call being right and appropriate for the circumstances and not just short. Coach staff through direct observation and immediate feedback.
  • Ensure that your performance metrics and rewards are aligned so that you reward quality delivery and your people have a clear understanding of the behaviours that you are driving. Ensure that performance is reviewed against the suite of measures (resolution, duration, quality sampling etc…) and not in isolation.
  • Build your checks and measures to keep the process honest. Measure and manage each element of the process to ensure that the numbers are not being manipulated by differences in agent behaviour. Run and check reports against short calls, agent log in / log out, abandoned calls and terminated calls. How agents use these statuses can fundamentally change their individual performance metrics and so it is the role of leaders to ensure that the playing field is level and that the process is not being subverted through negative behaviors.
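
Two of the points above, counting all unavailable time in the transaction time and reporting on short calls, can be sketched as follows. This is an illustration only: the record fields, the 10 second short-call threshold and the sample figures are assumptions, and a real report would read from your telephony platform's own call records.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    agent: str
    talk_seconds: float
    wrap_seconds: float  # manual 'wrap up' time applied after the call
    rest_seconds: float  # rest time built into the telephony


def transaction_seconds(call):
    """Total time the agent is unavailable for the next call."""
    return call.talk_seconds + call.wrap_seconds + call.rest_seconds


def agent_report(calls, short_call_seconds=10.0):
    """Per-agent average transaction time and share of very short calls."""
    grouped = defaultdict(list)
    for call in calls:
        grouped[call.agent].append(call)
    report = {}
    for agent, records in grouped.items():
        short = sum(1 for c in records if c.talk_seconds < short_call_seconds)
        total = sum(transaction_seconds(c) for c in records)
        report[agent] = {
            "calls": len(records),
            "avg_transaction_seconds": total / len(records),
            "short_call_share": short / len(records),
        }
    return report


# Hypothetical sample records.
sample_calls = [
    CallRecord("agent_a", talk_seconds=180, wrap_seconds=30, rest_seconds=10),
    CallRecord("agent_a", talk_seconds=5, wrap_seconds=0, rest_seconds=10),
    CallRecord("agent_b", talk_seconds=300, wrap_seconds=60, rest_seconds=10),
]

for agent, row in sorted(agent_report(sample_calls).items()):
    print(agent, row)
```

An agent with a low average transaction time and a high share of very short calls is worth a closer look, since short calls can flatter the duration figures without resolving anything for the customer.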

On a final note on resource planning, if you have more than one central desk, look to consolidate your service desks. If you have different desks for different technologies or business areas, consolidating them will invariably lower cost and actually improve service. The economies of scale in call centers are very material. Further, size and scale make it easier to run a Call Center that consistently delivers quality, call response times and benchmarks that compare favorably to the external market. Don’t let technology or business divisions sub-optimize this enterprise utility.

Effectively resourcing a Service Desk is about the application and manipulation of the laws of supply and demand. The Service Desk is not a passive victim of these forces, and great Service Desks will be heavily focused on maximising the productivity, efficiency and effectiveness of the supply of labour. They will equally be managing their demand profile to ensure that all work is required and adds value, that workflow is managed to smooth demand away from peaks, and that customers’ needs are satisfied through the most effective and efficient channels to deliver exceptional customer service.

We look forward to your thoughts on or additions to these best practices for the service desk.

Best, Steve and Jim