Production Ready is an important extension of your change management process.
Let us presume that you have already defined a solid change process based on ITIL. You have clear change review teams and proper change windows, and you have moved to deliveries through releases. Let’s also assume you are using the change data to identify which groups are late in their preparation for changes, and where change defects cluster and why. These insights are regularly used to improve and streamline the change processes. You use the failed change data as your gold mine of information to understand which groups need to improve and where they should improve. As with all important data, you are transparent with it, publishing the results by team and by root cause cluster. As an IT leader, you make the necessary investments and align efforts to correct the identified deficiencies and avoid future outages.
Now you are ready to extend the change process by introducing Production Ready. Production Ready is when a system or major update can be introduced into production because it is ready on all the key performance aspects: security, recoverability, reliability, maintainability, usability, and operability. In our typical rush to deliver key features or products, the sustainability of the system is often neglected or omitted. The Production Ready process minimizes poor delivery in these areas.
Production Ready is a set of criteria, defined by system criticality, that are fed into the requirements and design process, tested against during build, and then verified by Operations as part of the change approval process. Critical systems that must operate at high availability are held to the highest criteria; less critical systems are held to lower levels. The criteria can be adjusted based on organizational needs, but in essence they are consolidated design points for the normal non-functional requirements that ensure security, recoverability, maintainability, etc. Further, consolidating such production criteria into one place, as opposed to having different approval steps for security, architecture, or other teams, eases the burden on the application teams. Importantly, the Production Ready practice is envisioned to leverage templates and tools as opposed to a checklist or review approach. This enables re-use and rapid production of real and usable artifacts by the application team. It also enables Operations to obtain a consistently engineered set of deliverables that are much more usable.
Let’s cover the three parts of Production Ready:
- Criteria
- Production Ready Steps and Roles
- Tools and Templates
The Production Ready criteria should cover all of the core non-functional requirements of the enterprise. Normally, this should include:
- Security
- Recoverability
- Availability
- Maintainability
- Operability
- Usability
These non-functional requirements differ of course by the criticality of the system. Generally, a system that must have high availability, must also be highly recoverable, secure, operable, etc. In some instances, systems that have highly confidential data (e.g. a reporting system with customer confidential information) may not require stringent availability or recoverability, but these are the exception not the general rule.
Usually, when classifying, organizations end up with 5 classes of systems:
- Critical – requires the highest levels of availability and recoverability, etc.
- Important – requires robust levels of availability and recoverability, etc.
- Standard – requires typical levels of availability and recoverability, etc.
- Departmental – requires minimal availability and recoverability
- Confidential but not critical – requires robust security but typical levels of availability and recoverability
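As a rough illustration, the five classes above can be captured in a small lookup table so that later steps (templates, reviews) can key off the class. This is only a sketch: the names `SystemClass`, `Levels`, `LEVELS`, and `levels_for` are hypothetical, and the security levels for every class other than "Confidential but not critical" are assumptions, since the list above only says "etc." for them.

```python
from dataclasses import dataclass
from enum import Enum


class SystemClass(Enum):
    CRITICAL = "Critical"
    IMPORTANT = "Important"
    STANDARD = "Standard"
    DEPARTMENTAL = "Departmental"
    CONFIDENTIAL = "Confidential but not critical"


@dataclass(frozen=True)
class Levels:
    availability: str
    recoverability: str
    security: str


# Availability and recoverability follow the wording of the class list.
# Security levels are an ASSUMPTION except for the confidential class,
# where the list states "robust security" explicitly.
LEVELS = {
    SystemClass.CRITICAL: Levels("highest", "highest", "highest"),
    SystemClass.IMPORTANT: Levels("robust", "robust", "robust"),
    SystemClass.STANDARD: Levels("typical", "typical", "typical"),
    SystemClass.DEPARTMENTAL: Levels("minimal", "minimal", "typical"),
    SystemClass.CONFIDENTIAL: Levels("typical", "typical", "robust"),
}


def levels_for(system_class: SystemClass) -> Levels:
    """Look up the required non-functional levels for a system class."""
    return LEVELS[system_class]
```

An organization with a different set of classes would simply replace the entries in the table; the point is that the class, once assigned, determines the levels the system is held to.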
Your Production Ready criteria should match the system classes you establish. For example, the Production Ready template for Critical Systems would include:
- Systems architecture must use one of the approved high availability design templates
- Systems placement must be in the high availability data centers and server farms using the highly available network design
- Security design must be developed in concert with security engineering and use approved methods
- Security penetration testing and access management review must be conducted prior to production
- Backup and recovery design must use the approved critical recovery design and the designated storage pools
- Proper testing of backups and data recovery verification must be done in concert with Operations
- System design and system documentation should be to highest standard and placed in the operational documentation database using the available template
- A design workshop should be held with Operations level 2 personnel prior to production
- Customer and usability documentation should be placed in the Service Desk Knowledge Base prior to production along with a workshop explaining the usage of the system and likely questions
- The asset management database must be updated for all changes including relationship links at the time of production
- Monitoring and instrumentation must be completed for the entire channel along with workshops with Operations to ensure proper labeling, alerting, and SLAs are established for system performance
- Recoverability documentation must use the available templates and be placed in the Operations database
- Recoverability must be tested jointly with Operations and any issues or changes noted and addressed
- Batch job streams must be fully documented according to the available template and automated where possible with the approved toolset. A handover workshop with Operations must also occur.
Less critical systems would have less stringent criteria to be applied.
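One way to make such a criteria set usable during the change review is to hold it as data and compare it against the evidence the application team has supplied. The sketch below assumes a shortened, paraphrased subset of the Critical Systems criteria; the names `CRITICAL_CRITERIA`, `outstanding_items`, and `is_production_ready` are hypothetical and not part of any particular tool.

```python
# Paraphrased subset of the Critical Systems criteria, for illustration only.
CRITICAL_CRITERIA = [
    "Approved high availability design template used",
    "Penetration test and access management review completed",
    "Backups and data recovery tested with Operations",
    "Recoverability tested jointly with Operations",
    "Asset management database updated",
    "Monitoring and alerting agreed with Operations",
]


def outstanding_items(criteria, completed):
    """Return the criteria not yet evidenced, in their original order."""
    done = set(completed)
    return [item for item in criteria if item not in done]


def is_production_ready(criteria, completed):
    """A change is ready only when no criterion is outstanding."""
    return not outstanding_items(criteria, completed)


# Example: everything is done except the joint recoverability test.
completed = [c for c in CRITICAL_CRITERIA if not c.startswith("Recoverability")]
remaining = outstanding_items(CRITICAL_CRITERIA, completed)
```

A less critical class would use a shorter list with the same functions, which keeps the review consistent across classes while varying only the stringency.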
Production Ready must be applied early in the design process in a collaborative manner so that it does not end up as an insurmountable wall for the application team at release time. It is far better, if the application team is delivering a new critical channel or a major upgrade to one, that all teams – Security, Operations, Architecture, Infrastructure – assist them in building a solid, high-performing system. This is done by having well-understood design templates that meet the requirements. For example, there are only a few suitable patterns for handling a high volume of transactions from the internet in a highly available and recoverable manner. These patterns should be well documented by the architecture team and take into account security, data center, and other infrastructure requirements. The application team can then use this pattern (or its next release) as the foundation for their design. It is critical that this collaboration occur early so that money is not wasted on unviable solutions, leaving IT unable to deliver effectively for the business. So for each criterion, there should be a template and representative examples of good implementations. If these are not available, then Operations should sponsor getting them built by their respective owning organizations (e.g., Security or Architecture).
As the systems progress through the build stages, it is expected that the owning organizations will conduct appropriate inspections of early deliverables against the criteria. These are done with the application teams with the aim of identifying defects as early as possible. Once the system has been built and tested, as part of the change process for major changes, a Production Ready review is conducted jointly by Operations and the application team. This will identify any last remaining items and ensure a smooth entry into production.
By leveraging Production Ready for your important system changes, you will be able to address existing operational technical debt (e.g., poor documentation for major system channels) as part of the normal lifecycle of change and do it in a smooth and proactive manner. Further, you will bring greater awareness and appreciation of non-functional requirements to your application developers, resulting in better system designs and implementations.
What adjustments or extensions would you make to Production Ready?
Best, Jim Ditmore