Beyond Big Data

Today’s post on Big Data is authored by Anthony Watson, CIO of Europe, Middle East Retail & Business Banking at Barclays Bank. It is a thought-provoking take on ‘Big Data’ and how best to use it effectively. Please look past the atrocious British spelling :). We look forward to your comments and perspective. Best, Jim Ditmore

In March 2013, I read with great interest the results of the University of Cambridge analysis of some 58,000 Facebook profiles. The results predicted unpublished information like gender, sexual orientation, religious and political leanings of the profile owners. In one of the biggest studies of its kind, scientists from the university’s psychometrics team developed algorithms that were 88% accurate in predicting male sexual orientation, 95% for race and 80% for religion and political leanings. Personality types and emotional stability were also predicted with accuracy ranging from 62-75%. The experiment was conducted over the course of several years through their MyPersonality website and Facebook Application. You can sample a limited version of the method for yourself at http://www.YouAreWhatYouLike.com.
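To make the mechanics concrete, here is a minimal sketch of the kind of model such a study might build: a classifier trained on a sparse user-by-“Like” matrix to predict a binary trait. Everything below is synthetic and simplified – the data is randomly generated, and the Cambridge team’s actual methodology was considerably more sophisticated.

```python
# A minimal, hedged sketch: predict a binary trait from which pages a
# user has "Liked". Data here is synthetic; the real study used ~58,000
# actual profiles and more advanced feature engineering.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n_users, n_likes = 2000, 500

# Synthetic stand-in: each row marks which pages a user has "Liked"
X = (rng.random((n_users, n_likes)) < 0.05).astype(float)
# Synthetic trait correlated with a handful of "telltale" Likes
signal = X[:, :10].sum(axis=1)
y = (signal + rng.normal(0, 0.5, n_users) > 1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Held-out AUC: {auc:.2f}")
```

Even this naive setup recovers the planted signal on held-out users, which is the unsettling point of the study: the traces we leave behind are predictive whether or not we publish the traits themselves.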

Not surprisingly, Facebook declined to comment on the analysis, but I guarantee you none of this information is news to anyone at Facebook. In fact it’s just the tip of the iceberg. Without a doubt the good people of Facebook have far more complex algorithms trawling, interrogating and manipulating its vast and disparate data warehouses, striving to give its demanding user base ever richer, more unique and distinctly customised experiences.

As an IT leader, I’d have to be living under a rock to have missed the “Big Data” buzz. Vendors, analysts, well-intentioned executives and even my own staff – everyone seems to have an opinion lately, and most of those opinions imply that I should spend more money on Big Data.

It’s been clear to me for some time that we are no longer in the age of “what’s possible” when it comes to Big Data. Big Data is “big business” and the companies that can unlock, manipulate and utilise data and information to create compelling products and services for their consumers are going to win big in their respective industries.

Data flow around the world and through organisations is increasing exponentially and becoming highly complex; we’re dealing with greater and greater demands for storing, transmitting, and processing it. But in my opinion, all that is secondary. What’s exciting is what’s being done with it to enable better customer service and bespoke consumer interactions that significantly increase value along all our service lines in a way that was simply not possible just a few years ago. This is what’s truly compelling. Big Data is just a means to an end, and I question whether we’re losing sight of that in the midst of all the hype.

Why do we want bigger or better data? What is our goal? What does success look like? How will we know if we have attained it? These are the important questions and I sometimes get concerned that – like so often before in IT – we’re rushing (or being pushed by vendors, both consultants and solution providers alike) to solutions, tools and products before we really understand the broader value proposition. Let’s not be a solution in search of a problem. We’ve been down that supply-centric road too many times before.

For me it’s simple: innovation starts with demand. Demand is the force that drives innovation. However, this should not be confused with the axiom “necessity is the mother of invention”. When it comes to technology, we live in a world where invention and innovation are defining the necessity and the demand. It all starts with a value experience for our customers. Only through a deep understanding of what “value” means to the customer can we truly be effective in searching out solutions. This understanding requires an open mind and the innovative resolve to challenge the conventions of “how we’ve always done it.”

Candidly, I hate the term “Big Data”. It is marketing verbiage, coined by Gartner, that covers a broad ecosystem of problems, tools, techniques, products and solutions. If someone suggests you have a Big Data problem, that doesn’t say much, as arguably any company operating at scale, in any industry, will have some sort of challenge with data. But beyond tagging all these challenges with the term Big Data, you’ll find little in common across diverse industries, products or services.

Given this diversity across industries and within organisations, how do we construct anything resembling a Big Data strategy? We have to stop thinking about the “supply” of Big Data tools, techniques, and products peddled by armies of over-eager consultants and solution providers. For me, technology simply enables a business proposition. We need to look upstream, to the demand. Demand presents itself in business terms. For example, in Financial Services you might look at:

  • Who are our most profitable customers and, most importantly, why?
  • How do we increase customer satisfaction and drive brand loyalty?
  • How do we take excess and overbearing processes out of our supply chain and speed up time to market/service?
  • How do we reduce our losses to fraud without increasing compliance & control costs?

Importantly, asking these questions may or may not lead us down a Big Data road. But we have to start there. And the next set of questions is not about solutions but about framing the demand and the potential solutions:

  • How do we understand the problem today? How is it measured? What would improvement look like?
  • What works in our current approach, in terms of the business results? What doesn’t? Why? What needs to improve?
  • Finally, what are the technical limitations in our current platforms? Have new techniques and tools emerged that directly address our current shortcomings?
  • Can we develop a hypothesis and an experimental approach to test whether these new techniques truly deliver an improvement?
  • Having conducted the experiment, what did we learn? What should we abandon, and what should we move forward with?

There’s a system to this. Once we go through the above process, we start the cycle over. In a nutshell, it’s the process of continuous improvement. Some of you will recognise the well-known cycle of Plan, Do, Check, Act (“PDCA”) in the above.

Continuous improvement and PDCA are interesting in that they are essentially the scientific method applied to business. Fittingly, one of the notable components of the Big Data movement is the emerging role of the Data Scientist.

So, who can help you assess this? Who is qualified to walk you through the process of defining your business problems and solving them through innovative analytics? I think it is the Data Scientist.

What’s a Data Scientist? It’s not a well-defined position, but here would be an ideal candidate:

  • Hands-on experience with building and using large and complex databases, relational and non-relational, and in the fields of data architecture and information management more broadly.
  • Solid applied statistical training, grounded in a broader context of mathematical modeling.
  • Exposure to continuous improvement disciplines and industrial theory.
  • Most importantly: a functional understanding of whatever industry is paying their salary, i.e., real-world operational experience – theory is valuable; “scar tissue” is essential.

This person should be able to model data, translate that model into a physical schema, load that schema from sources, and write queries against it, but that’s just the start. One semester of introductory stats isn’t enough. They need to know what tools to use and when, and the limits and trade-offs of those tools. They need to be rigorous in their understanding and communication of confidence levels in their models and findings, and cautious of the inferences they draw.
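As a small illustration of that rigour: a finding should travel with an explicit confidence interval, never as a bare point estimate. Below is a minimal sketch using a normal-approximation interval for a model’s held-out accuracy; the counts are invented for illustration.

```python
# Report a finding with an explicit confidence interval rather than a
# bare point estimate. Uses the normal-approximation (Wald) interval.
import math

def proportion_ci(successes: int, trials: int, z: float = 1.96):
    """Return (estimate, low, high) for a 95% interval on a proportion."""
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# E.g., a model correct on 440 of 500 held-out cases:
p, lo, hi = proportion_ci(440, 500)
print(f"Accuracy {p:.1%}, 95% CI [{lo:.1%}, {hi:.1%}]")
```

The discipline is in the habit, not the formula: “88% accurate, plus or minus what, on what sample?” is the question a Data Scientist should ask reflexively.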

Some of the Data Scientist’s core skills are transferable, especially at the entry level. But at higher levels, they need to specialise. Vertical industry problems are rich, challenging, and deep. For example, an expert in call centre analytics would most certainly struggle to develop comparable skills in supply chain optimisation or workforce management.

And ultimately, they need to be experimentalists – true scientists engaged in a quest for knowledge on behalf of their company or organisation, with an insatiable curiosity and a continuous cycle of:

  • examining the current reality,
  • developing and testing hypotheses, and
  • delivering positive results for broad implementation so that the cycle can begin again.

There are many sectors we can apply Big Data techniques to: financial services, manufacturing, retail, energy, and so forth. There are also common functional domains across the sectors: human resources, customer service, corporate finance, and even IT itself.

IT is particularly interesting. It’s the largest consumer of capital in most enterprises. IT represents a set of complex concerns that are not well understood in many enterprises: projects, vendors, assets, skilled staff, and intricate computing environments. All these come together to (hopefully) deliver critical and continuous value in the form of agile, stable and available IT services for internal business stakeholders, and most importantly external customers.

Given the criticality of IT, it’s often surprising how poorly managed IT is in terms of data and measurement. Does IT represent a Big Data domain? Yes, absolutely. From the variety of IT deliverables, artefacts and inventories, to the velocity of IT events feeding management consoles, to the volume of archived IT logs, IT itself is challenged by Big Data. IT is a microcosm of many business models. We in IT don’t do ourselves any favours starting from a supply perspective here, either. IT’s legitimate business questions include:

  • Are we getting the IT we’re paying for? Do we have unintentional redundancy in what we’re buying? Are we paying for services not delivered?
  • Why did that high severity incident occur and can we begin to predict incidents?
  • How agile are our systems? How stable? How available?
  • Is there a trade-off between agility, stability and availability? How can we increase all three?

With the money spent on IT, and its operational criticality, Data Scientists can deliver value here as well. The method is the same: understand the current situation, develop and test new ideas, implement the ones that work, and watch results over time as input into the next round.

For example, the IT organisation might be challenged by a business problem of poor stakeholder trust, due to real or perceived inaccuracies in IT cost recovery. In turn, it is then determined that these inaccuracies stem from poor data quality for the IT assets on which cost recovery is based.

Data Scientists can explain that without an understanding of data quality, one does not know what confidence a model merits. If quality cannot be improved, the model remains more uncertain. But often, the quality can be improved. Asking “why” – perhaps repeatedly – may uncover key information that assists in turn with developing working and testable hypotheses for how to improve. Perhaps adopting master data management techniques pioneered for customer and product data will assist. Perhaps measuring IT asset data quality trends over time is essential to improvement – people tend to focus on what is being measured and called out in a consistent way. Ultimately, this line of inquiry might result in the acquisition of a toolset like Blazent, which provides IT analytics and data quality solutions enabling a true end-to-end view of the IT ecosystem. Blazent is a toolset we’ve deployed at Barclays to great effect.
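As a hedged sketch of what measuring asset data quality over time might look like: score each asset record for completeness of the fields that cost recovery depends on, then trend the aggregate score month by month. The field names below are illustrative assumptions, not Blazent’s actual data model.

```python
# Score IT asset records for completeness of the fields cost recovery
# depends on; trending this number monthly makes quality visible.
# Field names are illustrative, not any vendor's schema.
REQUIRED_FIELDS = ("asset_id", "owner", "cost_centre", "location", "status")

def completeness(record: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    filled = sum(1 for field in REQUIRED_FIELDS if record.get(field))
    return filled / len(REQUIRED_FIELDS)

assets = [
    {"asset_id": "SRV-001", "owner": "payments", "cost_centre": "CC12",
     "location": "DC-East", "status": "active"},
    {"asset_id": "SRV-002", "owner": None, "cost_centre": "CC12",
     "location": "", "status": "active"},  # two gaps drag the score down
]
score = sum(completeness(a) for a in assets) / len(assets)
print(f"Asset data completeness: {score:.0%}")  # trend this metric monthly
```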

Similarly, a Data Scientist schooled in data management techniques, and with an experimental, continuous improvement orientation, might look at an organisation’s recurring problems in diagnosing and fixing major incidents, and recommend that analytics be deployed against the terabytes of logs accumulating every day, both to improve root cause analysis and, ultimately, to proactively predict outage scenarios based on previous outage patterns. Vendors like Splunk and Prelert might be brought in to assist with this problem at the systems management level. SAS has worked with text analytics across incident reports in safety-critical industries to identify recurring patterns of issues.
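A drastically simplified sketch of the underlying idea – flag hours whose error counts deviate sharply from a recent baseline – follows; the commercial tools named above apply far richer models at vastly greater scale.

```python
# Crude log-analytics precursor: flag hours whose error counts exceed
# mean + threshold*stdev of the preceding window of hours.
import statistics

def flag_anomalies(hourly_error_counts, window=24, threshold=3.0):
    """Yield (index, count) for hours that spike above the baseline."""
    for i in range(window, len(hourly_error_counts)):
        baseline = hourly_error_counts[i - window:i]
        mu = statistics.mean(baseline)
        sigma = statistics.pstdev(baseline) or 1.0  # avoid divide-by-zero
        if hourly_error_counts[i] > mu + threshold * sigma:
            yield i, hourly_error_counts[i]

counts = [20, 22, 19, 25, 21] * 5 + [120]  # a sudden error spike at the end
for hour, count in flag_anomalies(counts, window=24):
    print(f"hour {hour}: {count} errors (anomalous)")
```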

It all starts with business benefit and value. The Big Data journey must begin with the end in mind, and not rush to purchase vehicles before the terrain and destination are known. A Data Scientist, or at least someone operating with a continuous improvement mind-set who will champion this cause, is an essential component. So, rather than just talking about “Big Data,” let’s talk about “demand-driven data science.” If we take that as our rallying cry and driving vision, we’ll go much further in delivering compelling, demonstrable and sustainable value in the end.

Best, Anthony Watson

Massive Mobile Shifts and Keeping Score

As the first quarter of 2013 has come to a close, we see a technology industry moving at an accelerated pace. Consumerization is driving a faster level of change, with consequent impacts on the technology ecosystem and the companies occupying its different perches. From rapidly growing BYOD demand to the projected demise of the PC, we are seeing consumers shift their computing choices much faster than corporations, and some suppliers struggling to keep up. These rapid shifts require corporate IT groups to follow more quickly with their services. From implementing MDM (mobile device management), to increasing the bandwidth of wireless networks, to adopting tablets and smartphones as the primary customer interfaces for future development, IT teams must adjust to ensure effective services and competitive parity or advantage.

Let’s start with mobile. Consumers today use their smartphones to conduct much of their everyday business. If they do not complete the entire transaction on the device, they often use it to research or initiate the transaction. The lifeblood of most retail commerce has heavily shifted to the mobile channel. Thus, companies must have a significant and effective mobile presence to achieve competitive advantage (or even survive). Mobile has become the first order of delivery for company services. Next in importance is the internet, and then internal systems for call centers and staff. And since the vast majority of mobile devices (smartphone or tablet) are not Windows-based (nor is the internet), application development shops need to build or augment current Windows-oriented skills to enable native mobile development. Back-end systems must be re-engineered to more easily support mobile apps. And given your company’s competitive edge may be determined by its mobile apps, you need to be cautious about fully outsourcing this critical work.

Internally, such devices are becoming a pervasive feature in the corporate landscape. It is important to be able to accommodate many of the choices of your company’s staff and yet still secure and manage the client device environment. Thus, implementations of MDM to manage these devices and enable corporate security on the portion of the device that contains company data are increasing at a rapid pace. Further, while relatively few companies currently have a corporate app store, this will become a prevalent feature within a few years, and companies will shift from a ‘push’ model of software deployment to a ‘pull’ model. Further consequences of the rapid adoption of mobile devices by staff include needing to implement wireless at your company sites, adding visitor wireless capabilities (like a Starbucks wifi), or just increasing the capacity to handle the additional load (a 50% increase in internal wifi demand in January is not unheard of as everyone returns to the office with their Christmas gifts).

A further consequence of the massive shift to smartphones and tablets is the diminishing reach and impact of Microsoft, based on Gartner’s latest analysis and projections. The shift away from PCs and towards tablets in the consumer markets reduces the largest revenue sources of Microsoft. It is stunning to realize that Microsoft, with its long consumer market history, could become ever more dependent on the enterprise versus the consumer market. Yet, because the consumer’s choices are rapidly making inroads into the corporate device market, even this will be a safe harbor for only a limited time. With Windows 8, Microsoft tried to address both markets with one OS platform, perhaps not succeeding well in either. A potential outcome for Microsoft is to introduce the reported ‘Blue’ OS version, which will be a complete touch interface (versus a hybrid touch and traditional). Yet, Microsoft has struggled to gain traction against Android and iOS tablets and smartphones, so it is hard to see how this will yield significant share improvement. And with new Chrome devices and a reputed cheap iPhone coming, perhaps even Gartner’s projections for Microsoft are optimistic. The last overwhelming consumer OS competitive success Microsoft had was against OS/2 and IBM — Apple iOS and Google Android are far different competitors! With the consumer space exceedingly difficult to make headway in, my top prediction for 2013 is that Microsoft will subsequently introduce a new Windows ‘classic’ to satisfy the millions of corporate desktops where touch interfaces are inadequate or applications have not been redesigned. Otherwise, enterprises may stand pat on their current versions for an extended period, depriving Microsoft of critical revenue streams. Subsequent to the first version of this post, there were reports of Microsoft introducing Windows 8 stripped of the ‘Metro’ or touch interface! Corporate IT shops need to monitor these outcomes because once a shift occurs, there could be a rapid transition not just in the OS, but in the productivity suites and email as well.

There is also upheaval in the PC supplier base as a result of the worldwide sales decline of 13.9% (year over year in Q1). As also predicted here in January, HP struggled the most among the top three of HP, Lenovo and Dell. HP was down almost 24%, barely retaining the title of top volume manufacturer. Lenovo was flat, delivering the best performance in a declining market with 11.7 million units in the quarter, just below HP’s 12 million units. Dell suffered a 10.9% drop, which, given the company is up for sale, is remarkable. Acer and other smaller firms saw major drops in sales as well (more than 31% for Acer). The ongoing decline of the market will have massive impact on the smaller market participants, with consolidation and fallout likely occurring late this year and early in 2014. The real question is whether HP can turn around its rapid decline. It will be a difficult task because the smartphone, tablet and Chromebook onslaught is occurring just as HP faces a rejuvenated Lenovo and a very aggressive Dell. Ultrabooks will provide some margin and volume improvement, but not enough to make up for the declines. The current course suggests that early 2014 will see a declining market where Lenovo is comfortably leading, followed by a lagging HP fighting tooth and nail with Dell for second place. HP must pull off a major product refresh, supply chain tightening, and aggressive sales to turn it around. It will be a tall order.

Perhaps the next consumerization influence will be the greater use of desktop video. Many of our employees have experienced the quite good video of Skype or FaceTime and will potentially expect similar experiences in their corporate conversations. Current internal networks often do not have the bandwidth for such casual and common video interactions, especially for smaller campuses or remote offices. It will be important for IT shops to manage the introduction of these capabilities so that more critical workload is not impacted.

How is your company’s progress on mobile? Do you have an app store? Have you implemented desktop video? I look forward to hearing from you.

Best, Jim Ditmore

Cloud Trends: Turning the Tide on Data Centers

A recent study by Intel shows that a compute load that required 184 single-core processors in 2005 can now be handled with just 21 processors – in effect, every nine servers replaced by one.
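The arithmetic behind that claim is worth making explicit:

```python
# Intel's consolidation claim: a load needing 184 single-core processors
# in 2005 handled by 21 modern processors.
old_count, new_count = 184, 21
print(f"Consolidation ratio: {old_count / new_count:.1f} to 1")
# -> about 8.8 to 1, i.e. roughly nine servers replaced by one
```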

For 40 years, technology rode Moore’s Law to yield ever-more-powerful processors at lower cost. Its compounding effect was astounding: one of the best analogies is that we now have more processing power in a smart phone than the Apollo astronauts had when they landed on the moon. At the same time, though, the electrical power requirements for those processors continued to increase at a similar rate as the increase in transistor count. While new technologies (CMOS, for example) provided a one-time step-down in power requirements, each turn-up in processor frequency and density resulted in similar power increases.

As a result, by the 2000-2005 timeframe there were industry concerns regarding the amount of power and cooling required for each rack in the data center. And with the enormous increase in servers spurred by Internet commerce, most IT shops have labored for the past decade to supply adequate data center power and cooling.

Meantime, most IT shops have experienced compute and storage growth rates of 20% to 50% a year, requiring either additional data centers or major increases in power and cooling capacity at existing centers. Since 2008, there has been some alleviation due to both slower business growth and the benefits of virtualization, which has let companies reduce their number of servers by as much as 10 to 1 for 30% to 70% of their footprint. But IT shops can deploy virtualization only once, suggesting that they’ll be staring at a data center build or major upgrade in the next few years.
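A back-of-the-envelope sketch shows why the virtualization dividend is one-time: even after a large consolidation, compound growth consumes the headroom within a few years. All figures below are illustrative assumptions, not any particular shop’s numbers.

```python
# Illustrative: virtualize half a 1,000-server estate at 10:1, then see
# how quickly 30% annual capacity growth returns you to square one.
servers = 1000
virtualized_fraction, consolidation = 0.5, 10   # 50% of estate at 10:1
servers_after = (servers * (1 - virtualized_fraction)
                 + servers * virtualized_fraction / consolidation)
print(f"After virtualization: {servers_after:.0f} servers")

growth = 0.30  # assumed annual capacity growth
n, years = servers_after, 0
while n < servers:  # years until you're back where you started
    n *= 1 + growth
    years += 1
print(f"Back to {servers} servers in about {years} years at {growth:.0%} growth")
```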

But an interesting thing has happened to server power efficiency. Before 2006, such efficiency improvements were nominal, represented by the solid blue line below. Even if your data center kept the number of servers steady but just migrated to the latest model, it would need significant increases in power and cooling. You’d experience greater compute performance, of course, but your power and cooling would increase in a corresponding fashion. Since 2006, however, compute efficiency (green line) has improved dramatically, even outpacing the improvement in processor performance (red lines).

Trend Change for Power Efficiency

The chart above shows how the compute efficiency (performance per watt — green line) has shifted dramatically from its historical trend (blue lines). And it’s improving about as fast as compute performance is improving (red lines), perhaps even faster. The chart above is for the HP DL 380 server line over the past decade, but most servers are showing a similar shift.

This stunning shift is likely to continue for several reasons. Power and cooling costs continue to be a significant proportion of overall server operating costs. Most companies now assess power efficiency when evaluating which server to buy. Server manufacturers can differentiate themselves by improving power efficiency. Furthermore, there’s a proliferation of appliances or “engineered stacks” that eke out significantly better performance from conventional technology within a given power footprint.

A key underlying reason for future increases in compute efficiency is the fact that chipset technologies are increasingly driven by the requirements for consumer mobile devices. One of the most important requirements of the consumer market is improved battery life, which also places a premium on energy-efficient processors. Chip (and power efficiency) advances and designs in the consumer market will flow back into the corporate (server) market. An excellent example is HP’s Moonshot program which leverages ARM chips (previously typically used in consumer devices only) for a purported 80%+ reduction in power consumption. Expect this power efficiency trend to continue for the next five and possibly the next 10 years.

So how does this propitious trend impact the typical IT shop? For one thing, it reduces the need to build another data center. If you have some buffer room now in your data center and you can move most of your server estate to a private cloud (virtualized, heavily standardized, automated), then you will deliver more compute power yet also see a leveling and then a reduction in the number of servers (blue line) and a similar trend in the power consumed (green line).

Traditional versus Optimized Server Trends

This analysis assumes 5% to 10% business growth (translating to a need for a 15% to 20% increase in server performance/capacity). You’ll have to employ best practices in capacity and performance management to get the most from your server and storage pools, but the long-term payoff is big. If you don’t leverage these technologies and approaches, your future is the red and purple lines on the chart: ever-rising compute and data center costs over the coming years.
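For illustration, here is a toy model of the two trajectories in the chart. The growth and efficiency rates are assumptions chosen to mirror the curves, not measured data:

```python
# Toy model: compute demand grows ~18%/year in both cases, but the
# optimized estate captures ~25%/year performance-per-watt gains through
# refresh, so its total power levels off and falls. Rates are assumptions.
demand_growth, ppw_gain = 0.18, 0.25

power_traditional = power_optimized = 100.0  # indexed, year 0 = 100
for year in range(1, 6):
    power_traditional *= 1 + demand_growth               # no efficiency capture
    power_optimized *= (1 + demand_growth) / (1 + ppw_gain)
    print(f"year {year}: traditional {power_traditional:.0f}, "
          f"optimized {power_optimized:.0f}")
```

The divergence compounds quickly: within five years the traditional estate needs roughly triple the power of the optimized one under these assumed rates.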

By applying these approaches, you can do more than stem the compute cost tide; you can turn it. Have you started this journey? Have you been able to reduce the total number of servers in your environment? Are you able to meet your current and future business needs and growth within your current data center footprint?

What changes or additions to this approach would you make? I look forward to your thoughts and perspective.

Best, Jim Ditmore

Note this post was first published on January 23rd in InformationWeek. It has been updated since then. 

A Cloudy Future: Hard Truths and How to Best Leverage Cloud

We are long into the marketing hype cycle on cloud. That means clear criteria to assess and evaluate the different cloud options are critical. While cloud computing is often presented as homogeneous, there are many different types, from infrastructure as a service (IaaS) to software as a service (SaaS) and many flavors in between. Perhaps the best examples are Amazon’s infrastructure services (IaaS), Google’s email and office productivity services (SaaS), and Salesforce.com’s customer relationship management or CRM services (SaaS). Given these complexities, what approach should the medium to large enterprise take to best leverage cloud and optimize its data center? What are the pitfalls? Typically, the cloud is envisioned as an accessible and low cost compute utility in the sky that is always available. Despite this lofty promise, companies will need to select and build their cloud environment carefully to avoid fracturing their computing capabilities, locking themselves into a single, higher cost environment, or impacting their ability to differentiate and gain competitive advantage – or all three.

The chart below provides an overview of the different types of cloud computing:

Cloud Computing and Variants

 

Note the positioning of the two dominant types of cloud computing:

  • there is the specialized Software-as-a-Service (SaaS), where the entire stack from server to application (even version) is provided — with minimal variation
  • there is the very generic IaaS or PaaS, where a set of server and OS version(s) is available with types of storage, and any compatible database, middleware, or application can be installed and run

Other types of cloud computing include private cloud – essentially IaaS that an enterprise builds for itself. The private cloud variant is the evolution of the current corporate virtualized server and storage farm to a more mature instance with clearly defined service configurations, offerings, billing as well as highly automated provisioning and management.
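To make “clearly defined service configurations, offerings and billing” concrete, a private cloud catalogue typically reduces to a small menu of standard shapes that automation can provision unattended. The sketch below is hypothetical; the names, sizes and chargeback figures are invented.

```python
# A hypothetical private-cloud service catalogue: a few standard shapes
# plus a stand-in for the automated provisioning call.
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceOffering:
    name: str
    vcpus: int
    memory_gb: int
    storage_gb: int
    monthly_charge: float  # internal chargeback, illustrative

CATALOGUE = [
    ServiceOffering("small",  2,  8,  100, 150.0),
    ServiceOffering("medium", 4, 16,  250, 300.0),
    ServiceOffering("large",  8, 32, 1000, 700.0),
]

def provision(offering_name: str, owner: str) -> dict:
    """Stand-in for the automated provisioning workflow."""
    offering = next(o for o in CATALOGUE if o.name == offering_name)
    return {"owner": owner, "offering": offering, "status": "provisioning"}

print(provision("medium", "payments-team"))
```

The discipline matters more than the tooling: the shorter the menu, the more of the estate automation can handle without human touch.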

Another technology impacting the data center is engineered stacks. These are a further evolution of the computer appliances that have been available for decades. Engineered stacks are tightly specified, designed and engineered components integrated to provide superior performance and cost. These devices have typically been in the network, security, database and specialized compute spaces. Firewalls and other security devices have long leveraged this approach, where generic technology (CPU, storage, OS) is closely integrated with additional special purpose software and sold and serviced as a packaged solution. There has been a steady increase in the number of appliance or engineered stack offerings moving further into data analytics, application servers, and middleware.

With the landscape set, it is important to understand the technology industry market forces and the customer economics that will drive the data center landscape over the next five years. First, the technology vendors will continue to invest in and increase their SaaS and engineered stack offerings because they offer significantly better margin and more certain long term revenue. A SaaS offering gets a far higher Wall Street multiple than traditional software licenses — and for good reason — it can be viewed as a consistent ongoing revenue stream where the customer is heavily locked in. Similarly for engineered stacks: traditional hardware vendors are racing to integrate as far up the stack as possible, both to create additional value and, more importantly, to enable locked-in advantage where upgrades, support and maintenance can be more assured and at higher margin than traditional commodity servers or storage. It is a higher hurdle to replace an engineered stack than commodity equipment.

The industry investment will be accelerated by customer spend. Both SaaS and engineered stacks provide appealing business value that will justify their selection. For SaaS, it is speed and ease of implementation as well as potentially variable cost. For engineered stacks, it is a performance uplift at potentially lower costs that often makes the sale. Both SaaS and engineered stacks should be selected where the business case makes sense but with the cautions of:

  • for SaaS:
    • be very careful if it involves core business functionality or processes; you could be locking away your differentiation and ultimate competitiveness
    • know how you will get your data back, before you sign, should you stop using the SaaS
    • ensure the integrity and security of your data in your vendor’s hands
  • for engineered stacks:
    • understand where the product is in its lifecycle before selecting
    • anticipate the eventual migration path as the product fades at the end of its cycle

For both, ensure you avoid integrating key business logic into the vendor’s product. Otherwise you will be faced with high migration costs at the end of the product’s life or when a better, more compelling product emerges. There are multiple ways to ensure that your key functionality and business rules remain independent and modular, outside of the vendor service package; one such approach is sketched below.
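Here, the business rule lives in your code and the vendor sits behind a thin adapter you own; swapping vendors means rewriting only the adapter. Class, method and endpoint names are illustrative, not any vendor’s actual API.

```python
# Keep business rules out of the vendor product: core code calls only
# your own interface, and the vendor hides behind a replaceable adapter.
from abc import ABC, abstractmethod

class CrmGateway(ABC):
    """Your interface -- the only thing core code is allowed to call."""
    @abstractmethod
    def record_sale(self, customer_id: str, amount: float) -> None: ...

class VendorSaasAdapter(CrmGateway):
    """Thin translation layer; no business rules live in this class."""
    def record_sale(self, customer_id: str, amount: float) -> None:
        # map to the vendor's API here (endpoint below is hypothetical)
        print(f"POST /vendor/api/sales {{'cust': '{customer_id}', 'amt': {amount}}}")

def apply_loyalty_discount(amount: float, years_as_customer: int) -> float:
    """Business rule kept in YOUR code, not configured inside the vendor."""
    return amount * (0.95 if years_as_customer >= 5 else 1.0)

gateway: CrmGateway = VendorSaasAdapter()
gateway.record_sale("C-1001", apply_loyalty_discount(200.0, years_as_customer=6))
```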

With these caveats in mind, and a critical eye on your contract to avoid onerous terms and lock-ins, you will be successful with the project-level decisions. But you should drive optimization at the portfolio level as well. If you are a medium to large enterprise, you should be driving your internal infrastructure to mature its offering into an internal private cloud. Virtualization, already widespread in the industry, is just the first step. You should move to eliminate or minimize your custom configurations (preferably to less than 20% of your server population). Next, invest in the tools, process and engineering so you can heavily automate the provisioning and management of the data center. Doing this will also improve the quality of service.

Make sure that you do not shift so much of your processing to SaaS that you ‘balkanize’ your own utility. Your data center utility would then operate subscale and inefficiently. Should you overreach, expect to incur heavy integration costs on subsequent initiatives (because your functionality will be spread across multiple SaaS vendors in many data centers). You can also expect to experience performance issues as your systems operate at WAN speeds versus LAN speeds across these centers. And expect to lose negotiating position with SaaS providers because you have lost your ‘in-source’ strength.

I would venture that over the next five years a well-managed IT shop will see:

  • the most growth in its SaaS and engineered stack portfolio, 
  • a conversion from custom infrastructure to a robust private cloud with a small sliver of custom remaining for unconverted legacy systems
  • minimal growth in PaaS and IaaS (growth here is actually driven by small to medium firms)
This transition is represented symbolically in the chart below:
Data Center Transition Over the Next 5 Years

So, on the road to a cloud future, SaaS and engineered stacks will be a part of nearly every company’s portfolio. Vendor lock-in could be around every corner, but good IT shops will leverage these capabilities judiciously, develop their own private cloud capabilities, retain critical IP and avoid the lock-ins. We will see far greater efficiency in the data center as custom configurations are heavily reduced. So while the prospects are indeed ‘cloudy’, the future is potentially bright for the thoughtful IT shop.

What changes or guidelines would you apply when considering cloud computing and the many offerings? I look forward to your perspective.

This post appeared in its original version at InformationWeek on January 4. I have extended and revised it since then.

Best, Jim Ditmore

A Cloudy Future: SaaS and Balkanization of the Data Center

As I mentioned in my previous post, I will be exploring infrastructure trends, and in particular, cloud computing. But while cloud computing is getting most of the marketing press, there are two additional phenomena that are capturing as much if not more of the market: computer appliances and SaaS. I have just covered computer appliances in my preceding post and we will cover SaaS here. This will then set the stage for a comprehensive cloud discussion that will yield effective cloud strategies for IT leaders.

While SaaS started back in the 1960s with service bureaus, it was reborn with the advent of the Internet, which made it much easier for client companies and their users to access the services. Salesforce.com is arguably one of the earliest and best examples of SaaS. It now serves over 100,000 customers and 1 million subscribers with an entire ecosystem of software services and the ability to build custom extensions and applications above the base CRM and sales force functions. SaaS has continued to see robust growth rates (up to 20%) and will exceed an estimated $14B in annual revenues per Gartner (out of a total industry of $300 to 400B). Growth can be attributed to several advantages:

  • startup costs are low and as you increase in scale the costs are variable based on your usage
  • you can quickly implement to a solid level of base functionality
  • you do not need a large staff of specialized experts to build and maintain your systems

SaaS is here to stay and will continue to grow smartly given its particular advantages and the typically greater margins for service providers. In fact, many traditional software providers are adding SaaS options to their services to enter into this higher margin area. As a consumer of software, perhaps that is the first tipoff for ‘buyer beware’ for SaaS: the high desire of the software industry to move consumers to SaaS from traditional perpetual license models.

What are the advantages of SaaS versus traditional offerings to a software firm? Instead of a potentially large upfront license payment and then much smaller maintenance payments, a SaaS provider receives ongoing strong payments that grow over time — a much more reliable revenue stream. And because they provide software and infrastructure, the total sale is higher. With enough of a base, the margins can actually be even greater than those for traditional software. And, best of all, it is harder for your customers to leave or switch to an alternate supplier. The end result is greater and more reliable revenue and margins, prompting a higher valuation for the software company (and the great desire to migrate customers to SaaS).

As a consumer of SaaS, these results as well as other service factors and business implications must be kept in mind before entering into the service contract. First, some consideration must be given to the business criticality of the software. As a business, you in essence have three alternative means to encode your processes and knowledge: custom software, packaged software, and SaaS services. (There is also broader outsourcing, but I omit that here.) For peripheral services, such as payroll or finance, it is entirely appropriate to use packaged software or SaaS (e.g., ADP or Workday). Typically, there is no competitive advantage beyond having a competent payroll service that maintains confidentiality of employee data. ADP has been providing this type of service for decades to a plethora of companies. But when you begin to utilize SaaS for more core business functions, then you have potential competitive risk for your business. Your rate of innovation and product improvement will be limited to a large degree by the rate of advance of the SaaS provider. Even with packaged software, your IT team can apply significant extensions and customizations that enable substantial differentiation. This ability becomes minimized with SaaS. For small and medium sized companies these drawbacks likely do not outweigh the benefits of leveraging a ‘good’ platform with little upfront investment. But for a large company, a SaaS course can minimize your advantages over the competition. For an excellent historical example, take my time at JPMC/BankOne and its use of the service provider First Data for credit card software. While First Data provided adequate capabilities and services, the service eventually turned into an anchor. As JPMC tried to drive a faster cycle of features and capabilities to distance itself from the competition, First Data lagged behind in its cycle time – in part because, as a SaaS, it was maintaining a broad industry utility. Even worse, once a new feature was introduced for JPMC (and primarily funded by JPMC), it would be available 6 or 12 months later to the rest of the First Data customers. It was not possible to negotiate better exclusivity conditions, nor was it in First Data’s interest to have divergent code bases and capabilities for its customers. After detailed analysis, and consideration of the scale economies mentioned below, the only way to achieve sustainable business advantage was to in-source a modern version of credit card services and then customize and build it out to achieve feature and product supremacy. This is exactly what JPMC then did. While this may be the extreme example (where you are a top competitor in an industry), it represents the underlying limitations of using a SaaS service (in essence, a function utility) for key business capability. The ability to differentiate from competitors and win in the marketplace will be compromised.

Even services that appear to have minimal business criticality can have such impact. For example, if you decide to leverage SaaS for e-mail and chat capabilities, then how will you handle integration of presence (usually provided by email and chat utilities) into your call center apps or other business applications, or deliver next generation collaboration capabilities across your workforce? Such integration across a service provider, your applications, and multiple data centers becomes much more difficult (especially for performance-sensitive applications like call center and CTI) than integrating comparable services within your data center that are easily accessible to your core applications.

Second, particularly in this time of greater risk awareness and regulation, consideration must be given to data and security. You must carefully inspect the data protection and security measures of the SaaS provider. While you would certainly review security practices for any significant supplier, with a SaaS firm it is even more important, as they will have your data on their premises. Further, it is possible that there is ‘intermingling’, where other SaaS customers’ data is stored in the same databases and accessed by the same systems. Protections must be in place to prevent viewing or leaking of each company’s data. And there may be additional regulatory requirements regarding where and how data is stored (especially for customer or employee data) that you must ensure the SaaS provider also meets.

In addition to data protection measures, you should also ensure that you will always be able to access and extract your data. Since your data will be at the SaaS site and in their database format, your firm is at a disadvantage if there is a contract dispute or you wish to transfer services to another supplier or in-house. Thus, it is important to get safeguards as part of the original agreement that provide for such an extract. Even better, a daily extract of your data in a usable format back to your data center ensures you have access and control of it. Alternately, a data ‘escrow’ provision in the contract can ensure you can access your data. Perhaps the best advice I have heard on this matter is ‘Don’t get into a SaaS arrangement until you know how you are going to get your data out’.
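A hedged sketch of the daily-extract safeguard follows: pull yesterday’s data from the provider’s export API back to your own storage on a nightly schedule. The endpoint, authentication scheme and format are hypothetical – real providers differ widely, which is exactly why the contract must guarantee an extract mechanism.

```python
# Nightly pull of yesterday's data from a (hypothetical) SaaS export API
# back to your own storage. Endpoint, auth and format are illustrative.
import datetime
import json
import urllib.request

EXPORT_URL = "https://api.example-saas.com/v1/export"  # hypothetical
API_TOKEN = "REPLACE_ME"

def daily_extract(target_dir: str = ".") -> str:
    day = (datetime.date.today() - datetime.timedelta(days=1)).isoformat()
    req = urllib.request.Request(
        f"{EXPORT_URL}?date={day}",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        records = json.load(resp)
    path = f"{target_dir}/saas_extract_{day}.json"
    with open(path, "w") as f:
        json.dump(records, f)
    return path  # schedule nightly; reconcile record counts as a check
```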

A third SaaS consideration for IT shops is the potential loss of scale and ‘balkanization’ of their data center due to cumulative SaaS decisions. In other words, while a handful or even multiple SaaS alternatives may make sense when each is considered singly, the cumulative effect of dozens of such agreements will be to reduce the scale of your internal estate and undermine the IT economics. For example, if you have out-sourced 40 or 50% of your IT capacity to SaaS, then the remaining infrastructure must be substantially resized and will no longer operate at the same economic level as previously. This will then likely cause increased costs for the remaining internal applications. Realize that with this ‘balkanization’, your data and processing will be executed in, and spread out across, many different data centers. This results in potential performance and operational issues as you try to manage a largely ‘out-sourced’ shop through many different service agreements. Moreover, as you subsequently try to integrate various components to achieve process or product or customer advantages, you will find significant issues as you try to tie together multiple SaaS vendors (who are not interested in working with each other) with your remaining systems. Thus, the cumulative effects of multiple SaaS decisions can be far-reaching, well beyond just the services being evaluated.
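A stylized calculation makes the scale effect visible: data center unit costs carry a large fixed component, so halving the internal workload raises the unit cost of everything that remains. The figures are purely illustrative.

```python
# Illustrative scale economics: fixed data-center costs spread over
# fewer workload units after a 50% shift to SaaS.
fixed_cost, variable_cost_per_unit = 10_000_000, 500  # annual, illustrative

def unit_cost(units: int) -> float:
    return fixed_cost / units + variable_cost_per_unit

before, after = 20_000, 10_000  # workload units before/after the shift
print(f"unit cost before: ${unit_cost(before):,.0f}")   # $1,000
print(f"unit cost after:  ${unit_cost(after):,.0f}")    # $1,500, a 50% rise
```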

So, on the road to a cloud future, SaaS will be a part of nearly every company’s portfolio. There are a number of advantages to SaaS, and there are very good SaaS offerings on the market, especially for small to medium sized companies – and a few that I would recommend for large companies (e.g., ServiceNow). But medium and large enterprises should choose their SaaS vendors wisely using the following guidelines:

  • think long and hard before using SaaS for core functions, processes, or customer data
  • effectively treat such a service as an out-source and have contractual safeguards on the service levels. Closely review current customer experience (not just their references either) as this is a strong indicator of what you will find. And consider and plan (in the contract) how you will exit the service if it does not work out.
  • ensure you have control of your data, and you know how you will get the data out if all does not go well
  • consider the cumulative effects of SaaS and ensure you do not lose enterprise integration or efficiencies as you consider each offering singly.

By leveraging these guidelines, I think you can be successful with SaaS and minimize the downside from a poor service. My forecast is that it will be a significant part of most portfolios in the next 5 years. This is particularly true if you are a small or medium sized company and your objective is to quickly follow (or stay with) the market in capabilities. Large companies, though, will benefit as well when SaaS is applied judiciously.

In my next post, I will cover the Cloud futures overall and how best to navigate in the coming years (it will be based on both this post as well as the previous post on appliances).

What changes or guidelines would you apply when looking at SaaS? I look forward to your perspective.

Best, Jim Ditmore

A Cloudy Future: The Rise of Appliances and SaaS

As I mentioned in my previous post, I will be exploring infrastructure trends, and in particular, cloud computing. But while cloud computing is getting most of the marketing press, there are two additional phenomena that are capturing as much if not more of the market: computer appliances and SaaS. So, before we dive deep into cloud, let’s explore these other two trends and then set the stage for a comprehensive cloud discussion that will yield effective strategies for IT leaders.

Computer appliances have been available for decades, typically in the network, security, database and specialized compute spaces. Firewalls and other security devices have long leveraged an appliance approach where generic technology (CPU, storage, OS) is closely integrated with additional special purpose software and sold and serviced as a packaged solution. Specialized database appliances for data warehousing were quite successful starting in the early 1990s (remember Teradata?).

The tighter integration of appliances gives significant advantage over traditional approaches with generic systems. First, the integrator of the package is often also the supplier of the software, and thus can achieve improved tuning of the software’s performance and capacity on a specific OS and hardware set. Further, this integrated stack requires much less install and implementation effort by the customer. The end result can be impressive performance for similar cost to a traditional generic stack, without the implementation effort or difficulties. Thus appliances can have a compelling performance and business case for the typical medium and large enterprise. And they are compelling for the technology supplier as well, because they command higher prices and are much higher margin than the individual components.

It is important to recognize that appliances are part of a normal tug and pull between generic and specialized solutions. In essence, throughout the past 40 years of computing, there has been constant improvement in generic technologies under the march of Moore’s law. And with each advance there are two paths to take: leverage generic technologies and keep your stack loosely coupled so you can continue to leverage the advance of generic components, or closely integrate your stack with the then most current components and drive much better performance from this integration.

By their very nature though, appliances become rooted in a particular generation of technology. The initial iteration can be done with the latest version of technology, but the integration will likely result in tight links to the OS, hardware and other underlying layers to wring out every performance improvement available. These tight links yield both the performance improvement and the chains to a particular generation of technology. Once an appliance is developed and marketed successfully, ongoing evolutionary improvements will continue to be made, layering in further links to the original base technology. And the margins themselves are addictive, with suppliers doing everything possible to maintain them (thus evolutionary, low cost advances will occur, but revolutionary, next generation advances will likely require too high an investment to maintain the margins). This then spells the eventual fading and demise of that appliance, as the generic technologies continue their relentless advance and typically surpass the appliance in 2 or 3 generations. This is represented in the chart below and can be seen in the evolution of data warehousing.

The Leapfrog of Appliances and Generics

The first instances of data warehousing were done using the primary generic platform of the time (the mainframe) and mainstream databases. But with the rise of another generic technology – proprietary chipsets out of the midrange and high end workstation sector – Teradata and others combined these chipsets with specialized hardware and database software to develop much more powerful data warehouse appliances. From the late 1980s through the 1990s, the Teradata appliance maintained a significant performance and value edge over generic alternatives. But that began to fray around 2000 with the continued rise of mainstream databases and server chipsets, along with low cost operating systems and storage, which could be combined to match the performance of Teradata at much lower cost. In this instance, the Teradata appliance held a significant performance advantage for about 10 years before falling back into or below the mainstream generic performance. The value advantage diminished much sooner, of course. Typically, the appliance performance advantage lasts 4 to 6 years at most. Thus, early in the cycle (typically 3 to 4 generic generations or 4 to 5 years), an appliance offering will present material performance and possibly cost advantages over traditional, generic solutions.
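A toy model of this leapfrog dynamic: assume an appliance launches with a 4x edge and improves slowly thereafter, while generic performance doubles roughly every two years. The crossover lands in the window described above; all figures are illustrative.

```python
# Toy leapfrog model: appliance starts 4x ahead but improves ~10%/year;
# generics compound at ~41%/year (doubling roughly every two years).
appliance_perf, generic_perf = 4.0, 1.0
appliance_growth, generic_growth = 0.10, 0.41

for year in range(1, 9):
    appliance_perf *= 1 + appliance_growth
    generic_perf *= 1 + generic_growth
    marker = "  <- generics catch up" if generic_perf >= appliance_perf else ""
    print(f"year {year}: appliance {appliance_perf:.1f}, "
          f"generic {generic_perf:.1f}{marker}")
```

Under these assumed rates the generics overtake the appliance around year six, consistent with the 4-to-6-year advantage window.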

As a technology leader, I recommend the following considerations when looking at appliances:

  • If you have real business needs that will drive significant benefit from such performance, then investigate the appliance solution.
  • Keep in mind that in the mid-term the appliance solution will steadily lose its advantage and subsequently cost more than the generic solution. Understand where the appliance solution is in its evolution – this will determine its effective life and the likely length of your advantage over generic systems.
  • Factor in the hurdle, or ‘switchback’, costs at the end of its life. (The appliance will likely require a hefty investment to transition back to generic solutions that have steadily marched forward.)
  • The switchback costs will be much higher where business logic is layered in (e.g., for middleware, database or business software appliances) versus network or security appliances (where there is minimal custom business logic layered in).
  • Include the level of integration effort and cost required. Often a few appliances within a generic infrastructure will integrate smoothly and at less cost. On the other hand, weaving multiple appliances within a service stack can cause much higher integration costs and not yield the desired results. Remember that you have limited flexibility with an appliance due to its integrated nature, and this can cause issues when they are strung together (e.g., a security appliance with a load balancer appliance with a middleware appliance with a business application appliance and a data warehouse appliance!).
  • Note that for certain areas, security and network in particular, the follow-on to an appliance will often be a next generation appliance from the same or a different vendor. This is because there is minimal business logic incorporated in the system (yes, there are lots of parameter settings, like firewall rules customized for a business, but the firewall operates essentially the same regardless of the business that uses it).

With these guidelines, you should be able to make better decisions about when to use an appliance and how much of a premium you should pay.

In my next post, I will cover SaaS and I will then bring these views together with a perspective on cloud in a final post.

What changes or additions would you make when considering appliances? I look forward to your perspective.

Best, Jim Ditmore

 

IT Security in the Headlines – Again

Again. Headlines are splashed across front pages and business journals where banks, energy companies, and government web sites have been attacked. As I called out six months ago, the pace, scale and intensity of attacks has increased dramatically in the past year and is likely to continue to grow. Given that one of the most important responsibilities of a CIO and senior IT leaders is to protect the data and services of the firm or entity, security must be a bedrock capability and focus. And while I have seen a significant uptick in awareness and investment in security over the past 5 years, there is much more to be done at many firms to reach proper protection. Further, as IT leaders, we must understand IT is in a deadly arms race that requires urgent and comprehensive action.

The latest set of incidents are DDoS attacks against US financial institutions. These have been conducted by Muslim hacker groups, purportedly in retaliation for the Innocence of Muslims film. But this weekend’s Wall Street Journal outlined that the groups behind the attacks are sponsored by the Iranian government – ‘the attacks bore “signatures” that allowed U.S. investigators to trace them to the Iranian government’. This is another expansion of the ‘advanced persistent threats’ or APTs that now dominate hacker activity. APTs are well-organized, highly capable entities, funded by either governments or broad fraud activities, which enables them to carry out hacking activities at unprecedented scale and sophistication. As this wave of attacks migrates from large financial institutions like JP Morgan Chase and Wells Fargo to mid-sized firms, IT departments should be rechecking their defenses against DDoS as well as other hazards. If you do not already have explicit protection against DDoS, I recommend leveraging a carrier network-based DDoS service as well as having a third party validate your external defenses against penetration. While the stakes currently appear to be a loss of access to your websites, any weaknesses found by the attackers will invariably be subsequently exploited for fraud and potential data destruction. This is exactly the path of the attacks against energy companies, including Saudi Aramco, that recently preceded the financial institution attack wave. And no less than Leon Panetta has spoken about the most recent attacks and their consequences. As CIO, your firm cannot be exposed as lagging in this arena without possible significant impact to reputation, profits, and competitiveness.

So, what are the measures you should take or ensure are in place? In addition to the network-based DDoS service mentioned above, you should implement these fundamental security measures first outlined in my April post and then consider the advanced measures to keep pace in the IT security arms race.

Fundamental Measures:

1. Establish a thoughtful password policy. Sure, this is pretty basic, but it’s worth revisiting and a key link in your security. Definitely require that users change their passwords regularly, but set a reasonable frequency – any less than three months and users will write their passwords down, compromising security. As for password complexity, require at least six characters, with one capital letter and one number or other special character. (A simple validator for this sort of policy is sketched after this list.)

2. Publicize best security and confidentiality practices. Do a bit of marketing to raise user awareness and improve security and confidentiality practices. No security tool can be everywhere. Remind your employees that security threats can follow them home from work or to work from home.

3. Install and update robust antivirus software on your network and client devices. Enough said, but keep it up-to-date and make it comprehensive (all devices).

4. Review access regularly. Also, ensure that all access is provided on a “need-to-know” or “need-to- do” basis. This is an integral part of any Sarbanes-Oxley review, and it’s a good security practice as well. Educate your users at the same time you ask them to do the review. This will reduce the possibility of a single employee being able to commit fraud resulting from retained access from a previous position.

5. Put in place laptop bootup hard drive encryption. This encryption will make it very difficult to expose confidential company information via lost or stolen laptops, which is still a big problem. Meanwhile, educate employees to avoid leaving laptops in their vehicles or other insecure places.

6. Require secure access for “superuser” administrators. Given their system privileges, any compromise to their access can open up your systems completely. Ensure that they don’t use generic user IDs, that their generic passwords are changed to a robust strength, and that all their commands are logged (and subsequently reviewed by another engineering team and management). Implement two-factor authentication for any remote superuser ID access.

7. Maintain up-to-date patching. Enough said.

8. Encrypt critical data (and only critical data). Any customer or other confidential information transmitted from your organization should be encrypted. The same precaution applies to any login transactions that transmit credentials across public networks.
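
As an illustration only, and assuming the third-party Python `cryptography` package, encrypting a record before it leaves your organization can be as simple as:

```python
from cryptography.fernet import Fernet  # pip install cryptography

# In practice the key lives in a key-management system, never in code.
key = Fernet.generate_key()
cipher = Fernet(key)

record = b"customer=A. Jones; account=12345678"
token = cipher.encrypt(record)          # safe to transmit or store
assert cipher.decrypt(token) == record  # recoverable by the key holder
```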

9. Perform regular penetration testing. Have a reputable firm test your perimeter defenses regularly.

10. Implement a DDoS network-based service. Work with your carriers to implement the ability to shed bogus requests, enabling you to thwart a DDoS attack.
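
The real shedding happens in the carrier’s scrubbing centers, at network scale; purely to illustrate the principle of dropping requests above a sustained rate, here is a token-bucket sketch (the rates are arbitrary):

```python
import time

class TokenBucket:
    """Admit requests up to a sustained rate plus a burst allowance;
    anything beyond is shed rather than served."""
    def __init__(self, rate: float, burst: float):
        self.rate, self.capacity = rate, burst
        self.tokens, self.last = burst, time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # shed this request

bucket = TokenBucket(rate=100, burst=200)  # ~100 requests/second sustained
```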

Advanced Practices: 

a. Provide two-factor authentication for customers. Some of your customers’ personal devices are likely to be compromised, so requiring two-factor authentication for access to accounts prevents easy exploitation. Also, notify customers when certain transactions have occurred on their accounts (for example, changes in payment destination, email address, physical address, etc.).
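
For a sense of how the second factor works, here is a sketch of a time-based one-time password per RFC 6238, the scheme behind most authenticator apps (the secret shown is a demo value):

```python
import base64, hashlib, hmac, struct, time

def totp(secret_b32: str, digits: int = 6, interval: int = 30) -> str:
    """Time-based one-time password (RFC 6238): HMAC the current
    30-second window with a shared secret, then truncate to 6 digits."""
    key = base64.b32decode(secret_b32)
    counter = struct.pack(">Q", int(time.time()) // interval)
    mac = hmac.new(key, counter, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

print(totp("JBSWY3DPEHPK3PXP"))  # demo secret only
```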

b. Secure all mobile devices. Equip all mobile devices with passcodes, encryption, and remote-wipe capability. Encrypt your USB flash memory devices. On secured internal networks, minimize encryption to enable detection of unauthorized activity as well as diagnosis and resolution of production and performance problems.

c. Further strengthen access controls. Permit certain commands or functions (e.g., superuser) to be executed only from specific network segments (not remotely). Permit contractor network access via a partitioned secure network or secured client device.
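
A sketch of the segment restriction, using an illustrative admin subnet (the actual ranges would come from your network design):

```python
import ipaddress

# Illustrative policy: superuser functions only from the admin segment.
ADMIN_SEGMENT = ipaddress.ip_network("10.20.0.0/24")

def may_run_superuser(source_ip: str) -> bool:
    """Permit privileged functions only from the designated network
    segment, never from remote or general-population addresses."""
    return ipaddress.ip_address(source_ip) in ADMIN_SEGMENT

assert may_run_superuser("10.20.0.15")
assert not may_run_superuser("203.0.113.7")
```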

d. Secure your sites from inadvertent outside channels. Implement your own secured wireless network, one that can detect unauthorized access, at all corporate sites. Regularly scan for rogue network devices, such as DSL modems set up by employees, that let outgoing traffic bypass your controls.

e. Prevent data from leaving. Continuously monitor for transmission of customer and confidential corporate data, with the automated ability to shut down illicit flows using tools such as NetWitness. Establish permissions whereby sensitive data can be accessed only from certain IP ranges and sent only to another limited set. Continuously monitor traffic destinations in conjunction with a top-tier carrier in order to identify traffic going to fraudulent sites or unfriendly nations.
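
Real DLP tooling does far more, but the core idea, scanning outbound payloads for patterns that validate as sensitive data, can be sketched like this (card numbers validated with the Luhn check):

```python
import re

CARD_RE = re.compile(r"\b\d{13,16}\b")

def luhn_ok(digits: str) -> bool:
    """Standard Luhn checksum used to validate card numbers."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def suspect_card_numbers(payload: str) -> list:
    """Flag outbound payloads that appear to carry valid card numbers."""
    return [m for m in CARD_RE.findall(payload) if luhn_ok(m)]

print(suspect_card_numbers("order ref 4111111111111111 shipped"))
```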

f. Keep your eyes and ears open. Continually monitor underground forums (“Dark Web”) for mentions of your company’s name and/or your customers’ data for sale. Help your marketing and PR teams by monitoring social networks and other media for corporate mentions, providing a twice-daily report to summarize activity.

g. Raise the bar on suppliers. Audit and assess how your company’s suppliers handle critical corporate data. Don’t hesitate to prune suppliers with inadequate security practices. Be careful about having a fully open door between their networks and yours.

h. Put in place critical transaction process checks. Ensure that crucial transactions (e.g., large transfers) require two personnel to execute, and that regular reporting and management review of such transactions occur.
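
A crude sketch of two-person control for large transfers (the threshold and names are hypothetical):

```python
class TransferRequest:
    """Two-person control: above a threshold, a transfer executes only
    after a second, distinct staff member approves it."""
    def __init__(self, amount: float, initiator: str):
        self.amount, self.initiator = amount, initiator
        self.approvers = set()

    def approve(self, staff_id: str) -> None:
        if staff_id == self.initiator:
            raise PermissionError("initiator cannot approve their own transfer")
        self.approvers.add(staff_id)

    def executable(self, threshold: float = 100_000) -> bool:
        # two personnel = the initiator plus at least one distinct approver
        return self.amount < threshold or len(self.approvers) >= 1

req = TransferRequest(250_000, initiator="alice")
req.approve("bob")
assert req.executable()
```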

i. Establish 7×24 security monitoring. If your firm has a 7×24 production and operations center, you should supplement that team with security operations specialists and the capability to monitor security events across your company and take immediate action. And if you are not big enough for a 7×24 capability, then enlist a reputable third party to provide this service for you.

I recommend that you communicate the seriousness of these threats to your senior business management and ensure that you have the investment budget and resources to implement these measures. Understand that the measures above will bring you current, but you will need to remain vigilant given the arms race underway. Ensure your 2013 budget allows further investment, even if as a placeholder. For those security pros out there, what else would you recommend?

In the next week, I will outline recommendations on cloud, which I think could be very helpful given the marketing hype and the widely differing services and products now broadcast as ‘cloud’ solutions.

Best, Jim Ditmore

 

Both Sides of the Staffing Coin: Building a High Performance Team -and- Building a Great IT Career

I find it remarkable that, despite the slow recovery, the IT job market remains very tight. This poses significant hurdles for IT managers looking to add talent. In the post below, I cover how to build a great team and tap into good seams of talent. I think this will be a significant issue for IT managers for the next three to four years – finding and growing talent to enable them to build high performance teams.

And for IT staffers, I have mapped out seasoned advice on how to build your capabilities and experience to enable a great career in IT. Improving IT staff skills and capabilities is of keen interest not just to the staff but also to IT management, since it makes the team much more productive and capable. And on a final note, I would suggest that anyone in the IT field consider reaching out to high schoolers and college students and encouraging them to consider a career in IT. Currently, in the US, there are fewer IT graduates each year than IT jobs that open, and this gap is expected to widen in the coming years. So IT will continue to be a good field for employees, and IT leaders will need to encourage others to join so we can meet the expected staffing needs.

Please do check out both sides of the coin, and I look forward to your perspectives. Note that I did publish variants on these posts in InformationWeek over the past few months.

Best, Jim Ditmore

Building a High Performance Team Despite 4% IT Unemployment

Despite a national unemployment rate of more than 8%, the overall IT unemployment rate is a much lower 4% or less. Further, the unemployment rates for IT specialties such as networking, IT security, or database are even lower, at 1% or less. This makes finding capable IT staff difficult, and the difficulty is compounded because IT professionals are less likely to take new opportunities (turnover rates have been well below average over the past 10 years). Unfortunately, these tough IT staffing conditions are likely to continue and perhaps be exacerbated if the recovery actually picks up pace. With such a tight IT job market, how do you build or sustain your high performance IT team?

I recommend several tactics to incorporate into your current staffing approach that should allow you to improve your current team and acquire the additional talent needed for your business to compete. Let’s focus first on acquiring talent. In a tight market you must always be present in the market so you can acquire talent when it first considers looking for a position. You must move to a ‘persistent’ recruiting mode. If your group still opens positions only after someone leaves or after a clear funding approval is granted, you are late to the game. Given extended recruiting times, you will likely not acquire the staff in time to meet your needs, nor will you consistently be in the market when candidates are seeking employment. Look instead to do ‘pipeline recruiting’. That is, for those common positions that you know you will need over the next 12 months, set up an enduring position and have your HR team persistently recruit for these ‘pipeline positions’. Good examples would be Java or mobile developers, project managers, and network engineers. Constantly recruit and interview, and when you find an ‘A’ caliber candidate, hire them, whether you have the exact position open or not. You can be certain that you will need the talent, so hire them and put them on the next appropriate project from your demand list. Not only will you have talent sourced and available when you need it because you are always out in the market, you will develop a reputation as a place where talent is sought, and you will have an edge when those ‘A’ players who seldom look for work decide to seek a new opportunity.

Another key tactic is to extend the pipeline recruiting to interns and graduates. Too many firms only look for experienced candidates and neglect this source. In many companies, graduates can be a key long term source of their best senior engineers. Moreover, they can often contribute much more than most managers give them credit for, especially if you have good onboarding programs and robust training and education offerings for your staff. I have seen uplifting results for legacy teams when they have brought on bright, enthusiastic talent and combined it with their experienced engineers: everyone’s performance often lifts. They will bring energy to your shop, and you will have the added dividend of increasing the pool of available, experienced talent. And while it will take 7 to 15 years for them to become the senior engineers and leaders of tomorrow, they will be at your company, not at someone else’s (if you don’t start, you will never have them).

The investment in robust training and education for graduates should pay off for your current staff and potential hires as well. Your current staff, by leveraging training, can improve their skills and productivity. And for potential hires, an attractive attribute of a new company is a strong training program and a focus on staff development. These are wise investments, as they will pay back in higher productivity and engagement, and greater retention and attraction of staff. You should couple the training program with clearly defined job positions and career paths. These should spell out for your team the competencies and capabilities of their current position as well as what is needed to move to the next step in their career. Their ability to progress with clarity will be a key advantage in your staff’s growth and retention as well as in attracting new team members. And in a tight job market, this will let your company stand out in the crowd.

Another tactic to apply is to leverage additional locations to acquire talent. If you limit yourself to one or a few metropolitan areas, you are limiting the potential IT population you are drawing from. Often, you can use additional locations to tap entirely new sources of talent at potentially lower costs than your traditional locations. Given the lower mobility of today’s candidates, it may be effective to open a location in the midwest, in rustbelt cities with good universities, or in cities such as Charlotte or Richmond. Such 2nd tier cities can harbor surprisingly strong IT populations with lower costs and better retention than 1st tier locations like California or Boston or New York. The same is true of Europe and India. Your costs are likely to be 20 to 40% less than in headline locations, with attrition rates perhaps one-third less.

And you can go farther afield as well. Nearshore and offshore locations from Ireland to Eastern Europe to India should be considered. Though again, it is worth avoiding the headline locations and going to places like Lithuania or Romania, or 2nd tier cities in India or Poland. You should look to tap the global IT workforce and gain advantage through diverse talent, the ability to work longer through a ‘follow the sun’ approach, and optimized costs and capacity. Wherever you go, though, you will need to enable an effective distributed workforce. This requires a minimum critical mass at each site, proper allocation of activities in a holistic manner, robust audio and video conferencing capabilities, and effective collaboration and configuration management tools. If done well, a global workforce can deliver more at lower costs and with better skills and time to market. For large companies, such a workforce is really a mandatory requirement to achieve competitive IT capabilities. And to some degree, you could say IT resources are like oil: you go wherever in the world you can to find and acquire them.

Don’t forget to review your recruiting approach as well. Maintain high standards and ensure you select the right candidates by using effective interviewing and evaluation techniques. Apply a metrics-based improvement approach to your recruiting process. What is the candidate yield of each recruiting method? Where are your best candidates coming from? Invest more in the recruiting approaches that yield good numbers of strong candidates. One observation from many years of analyzing recruiting results: your best source of strong candidates is usually referrals, while weak returns typically come from search firms and broad-sweep advertising. Building a good reputation in the marketplace to attract strong candidates takes time, persistence, and most important, an engaging and rewarding work environment.

With those investments, you will be able to recruit, build and sustain a high performance team even in the tightest of markets. While I know this is a bit like revealing your favorite fishing spot, what other techniques have you been able to apply successfully?

Best, Jim Ditmore

 

 

In the Heat of Battle: Good Guidelines for Production

If you have been in IT for any stretch, you will have experienced a significant service outage and the blur of pages, conference calls, analysis, and actions to recover. Usually such a service incident call occurs at 2 AM, and a fog descends as a diverse and distributed team tries to sort through a problem and its impacts while seeking to restore service. Often, poor decisions are made or ineffective directions taken in this fog, which extend the outage. Further, as part of the confusion, there can be poor communications with your business partners or customers. Even for large companies with a dedicated IT operations team and command center, wrong actions and decisions can be made in the heat of battle as work is being done to restore service. While you can chalk many of the errors up to either inherent engineering optimism or a loss of orientation after working a complex problem for many hours, to achieve outstanding service availability you must enable crisp, precise service restoration when an incident occurs. Such precision and avoidance of mistakes in ‘the heat of battle’ comes from a clear command line and operational approach. This ‘best practice’ clarity includes defined incident roles and an operational approach communicated and ready well before such an event. Then everyone operates as a well-coordinated team to restore service as quickly as possible.

We explore these best practice roles and operational approaches in today’s post. These recommended practices have been derived over many years at IT shops that have achieved sustained first quartile production performance*. The first step is to have a production incident management process based on the ITIL approach. Some variation and adaptation of ITIL is of course appropriate to ensure the best fit for your company and operation, but ensure you are leveraging these fundamental industry practices and that your team is fully up to speed on them. Further, it is preferable to have a dedicated command center which monitors production and has the resources for managing a significant incident when it occurs.

Assuming those capabilities are in place,  there should be clear roles for your technology team in handling a production issue. The incident management roles that should be employed include:

  • Technical leads — there may be one or more technical leads for an incident depending on the nature of the issue and impact. These leads should have a full understanding of the production environment and be highly capable senior engineers in their specialty. Their role is  to diagnose and lead a problem resolution effort in their component area (e.g. storage, network, DBMS, etc). They also must reach out and coordinate with other technical leads to solve those issues that lie between specialties (e.g. DBMS and storage).
  • Service lead — the service lead is also an experienced engineer or manager and one who understands all systems aspects and delivery requirements of the service that has been impacted. This lead will help direct what restoration efforts are a priority based on their knowledge of what is most important to the business. They would also be familiar with and be able to direct service restoration routines or procedures (e.g. a restart). They also will have full knowledge of the related services and potential downstream impacts that must be considered or addressed. And they will know which business units and contacts must be engaged to enact issue mitigation while the incident is being worked.
  • Incident lead — the incident lead is a command centre member who is experienced in incident management, has strong command skills, and understands problem diagnosis and resolution. Their general knowledge and experience should extend from the systems monitoring and diagnostics tools available to application and infrastructure components and engineering tools as well as a base understanding of the services IT must deliver for the business. The incident lead will drive all problem resolution actions as needed including
    • engaging and directing component and application technical leads, teams, and restoration efforts,
    • collecting and reporting impact data, and
    • escalating as required to ensure adequate resources and talent are focused on the issue.
  • Incident coordinator – in addition to the incident lead there should also be an incident coordinator. This command centre member is knowledgeable on the incident management process and procedures and handles key logistics including setting up conference calls, calling or paging resources, drafting and issuing communications, and importantly, managing to the incident clock for both escalation and task progress. The coordinator can be supplemented by additional command centre staff for a given incident particularly if multiple technical resolution calls are spawned by the incident.
  • Senior IT operations management – for critical issues, it is also appropriate for senior IT operations management to be present on the technical bridge, ensuring proper escalation and response occur. Further, communications may need to be drafted for senior business personnel providing status, impact, and prognosis. If it is a public issue, it may also be necessary to coordinate with corporate public relations and provide information on the issue.
  • Senior management – As is often the case with a major incident, senior management from all areas of IT and perhaps even the business will look to join the technical call and discussions focused on service restoration and problem resolution. While this should be viewed as a natural desire (perhaps similar to slowing down to stare at a traffic accident), business and senior management presence can be disruptive and prevent timely resolution. So here is what they are not to do:
    • Don’t join the bridge, announce yourself, and ask what is going on; this deflects the team’s attention from the work at hand and wastes several minutes bringing you up to speed, extending the problem resolution time (I have seen this happen far too often).
    • Don’t look to blame; the team will likely slow or even shut down out of fear of repercussions just when honest, open dialogue is needed most to understand the problem.
    • Don’t jump to conclusions on the problem; the team could be led down the wrong path. Few senior managers are up-to-date enough on the technology and have strong enough problem resolution skills to provide reliable suggestions. If you are one of the few, go ahead and let the team leverage your experience, but be careful if your track record says otherwise.

Before we get to the guidelines to practice during an incident, I also recommend ensuring your team has the appropriate attitude and understanding at the start of the incident. Far too often, problems start small or the local team thinks it has things well in hand. They then avoid escalating the issue or reporting it as a potential critical issue. Meanwhile, critical time is lost, and mistakes made by the local team can compound the issue. By the time escalation to the command centre does occur, the customer impact has become severe and the options to resolve are far more limited. I refer to this as trying to put out the fire with a garden hose. It is important to communicate to the team that it is far better to over-report an issue than to report it late. There is no ‘crying wolf’ when it comes to production. The team should first call the fire department (the command center) with a full potential severity alert, and then go back to putting out the fire with the garden hose. Meanwhile, the command center will mobilize all the needed resources to ensure the fire is put out. If everyone arrives and the fire is already out, all will be happy. And if the fire is raging, you now have the full set of resources to properly overcome the issue.

Now let’s turn our attention to best practice guidelines to leverage during a serious IT incident.

Guidelines in the Heat of Battle:

1. One change at a time (and track all changes)

2. Focus on restoring service first, but list out the root causes as you come across them. Remember most root cause analysis and work comes long after service is restored.

3. Ensure configuration information is documented and maintained through the changes

4. Go back to the last known stable configuration (back out all changes if necessary to get back to the stable configuration). Don’t let engineering ‘optimism’ forward engineer to a new solution unless it is the only option.

5. Establish clear command lines (one for technical, one business interface) and ensure full command center support. It is best for the business not to participate in the technology calls — it is akin to watching sausage get made (no one would eat it if they saw it being made). Your business will feel the same way about technology if they are on the calls.

6. Overwhelm the problem (escalate and bring in the key resources – yours and the vendor’s). Don’t dribble in resources because it is 4 AM. If you work in IT and you want to be good, this is part of the job. Get the key resources on the call and hold the vendor to the same bar as you hold your team.

7. Work in parallel wherever reasonable and possible. This should include spawning parallel activities (and technical bridges) to work multiple reasonable solutions or backups.

8. Follow the clock and use the command center to ensure activities stay on schedule. You must be able to decide when a path is not working and focus resources on better options; the clock is a key input to that decision. And escalation and communication must occur with rigor to maintain confidence and bring the necessary resources to bear.

9. Peer plan, review and implement. Everything done in an emergency (here, to restore service and fix a problem) is highly likely to inject further defects into your systems. Too many issues have been compounded during a change implementation when a typo occurs or a command is executed in the wrong environment. Peer planning, review, and implementation will significantly improve the quality of the changes you implement.

10. Be ready for the worst: have additional options and a backout plan for the fix. You will save time and drive better, more creative solutions if you address potential setbacks proactively rather than waiting for them to happen and then reacting.
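
Several of these guidelines (one change at a time, peer review, always a backout plan) lend themselves to a simple, disciplined change log during the incident. A minimal sketch, with purely illustrative field names and values:

```python
import datetime

incident_log = []

def record_change(change: str, implementer: str, reviewer: str, backout: str):
    """Guidelines 1, 9, and 10 in practice: one tracked change at a
    time, peer-reviewed, always with a backout plan."""
    if incident_log and incident_log[-1]["status"] == "in-progress":
        raise RuntimeError("previous change still in progress: one at a time")
    incident_log.append({
        "when": datetime.datetime.now().isoformat(timespec="seconds"),
        "change": change, "implementer": implementer,
        "reviewer": reviewer, "backout": backout, "status": "in-progress",
    })

record_change("restart app node 3", "lead_a", "lead_b",
              backout="fail traffic back to nodes 1 and 2")
incident_log[-1]["status"] = "complete"
```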

Recall that the  ITIL incident management objective is to ‘restore normal operations as quickly as possible with the least possible impact on either the business or the user, at a cost-effective price.’  These best practice guidelines will help you drive to a best practice incident management capability.

What would you add or change in the guidelines? How have you been able to achieve excellent service restoration and problem management? I look forward to hearing from you.

P.S. Please note that these best practices have been honed over the years in world class availability shops for major corporations with significant contributions from such colleagues as Gary Greenwald, Cecilia Murphy, Jim Borendame, Chris Gushue, Marty Metzker, Peter Josse, Craig Bright, and Nick Beavis (and others).

Riding with the Technology Peloton

One of the most important decisions technology leaders make is when to strike out and leverage new and unique technologies for competitive advantage, and when to stay with the rest of the industry on a common technology platform. Nearly every project and component contains a micro decision of the custom versus common path. And while it is often easy to have great confidence in our ability and capacity to build and integrate new technologies, striking out on new technologies ahead of the crowd is often much harder and has less payback than we realize. In fact, I would suggest that the payback is similar to what occurs during cycling’s Tour de France: many, many riders strike out in small groups to beat the majority of cyclists (the peloton), only to be caught by the peloton and, having expended enormous energy, fall further behind the pack.

In the peloton, everyone does some of the work. The leaders of the peloton take on the most wind resistance but rotate with others in the pack so that the work is balanced. In this way the peloton can move as quickly as any individual cyclist but at 20 or 30% less energy due to much lower wind resistance. Thus, with energy conserved, later in the race the peloton can move much faster than individual cyclists. Similarly, in developing a new technology or advancing an existing one, with enough industry mass and customers (a peloton), the technology can be advanced as quickly as, or more quickly than, by an individual firm or small group, and at much less individual cost. Striking out on your own to develop highly customized capabilities (or doing so in concert with a vendor) could leave you with a high-cost capability that provides a brief competitive lead, only to be quickly passed by the technology mainstream or peloton.

If you have ever watched one of the stages of the Tour de France, what can be most thrilling is to see a small breakaway group of riders trying to build or preserve their lead over the peloton. As the race progresses closer to the finish, the peloton relentlessly (usually) reels in and then passes the early leaders because of its far greater efficiency. Of course, those riders who time it correctly and have the capacity and determination to maintain their lead can reap huge time gains to their advantage.

Similarly, I think, in technology and business, you need to choose your breakaways wisely. You must identify where you can reap gains commensurate with the potential costs. For example, breaking away on commodity infrastructure technology is typically not wise. Plowing ahead and being the first to incorporate the latest in infrastructure or cloud or data center technology where there is little competitive advantage is not where you should invest your energy (unless that is your business). Instead, your focus should be on those areas where an early lead can be driven to business advantage and then sustained. Getting closer to your customer, being able to better cross-sell to them, significantly improving cycle time or quality or usability or convenience, or being first to market with a new product — these are all things that will win in the marketplace and customers will value. That is where you should make your breakaway. And when you do look to customize or lead the pack, understand that it will require extra effort and investment and be prepared to make and sustain it.

And while I urge care in selecting the breakaway course, particularly in this technology environment where industry change is already on an accelerated cycle, I also caution against riding at the back of the peloton. There, just as in the Tour de France, when you are lagging at the back it is too easy to be dropped by the group. And once you drop from the peloton, you must work even harder, on your own, just to get back in. Similarly, once an IT shop falls significantly behind the advance of technology and loses pace with its peers, further consequences accrue. It becomes harder to recruit and retain talent because the technology is dated and the reputation is stodgy. Extra engineering and repair work must be done to patch older systems that don’t work well with newer components. And extra investment must be justified with the business to ‘catch’ technology back up. So you must keep pace with the peloton, and better yet be a leader among your peers in technology areas of potential competitive advantage. That way, when you do see a breakaway opportunity for competitive advantage, you are positioned to make it.

The number of breakaways you can attempt depends, of course, on the size of your shop and the intensity of IT investment in your industry. The larger you are, and the greater the investment, the more breakaways you can afford. But make sure they are truly competitive investments with strong potential to yield benefits. Otherwise you are far better off staying at the front of the peloton, leveraging best-in-class practices and common but leading technology approaches. Or as an outstanding CEO I worked for once said, ‘There should be no hobbies’. Having a cool lab environment without rigorous business purpose and ongoing returns (plenty of failures are fine as long as there are successful projects as well) is a breakaway with no purpose.

I am sure there are some experienced cyclists among our readers — how does this resonate? What ‘breakaways’ worked for you or your company? Which ones got reeled in by the industry peloton?

I look forward to hearing from you.

Best, Jim Ditmore