The Azure Well-Architected Framework is a set of guidelines spanning five key pillars that can be used to optimise your workloads. In the previous blogs we covered Reliability, Security and Cost Optimisation alongside relevant services, processes and assessments. This time we’ll focus on the Operational Excellence pillar of the framework.
The services and technologies you use in the cloud differ hugely from those on-premises. What doesn’t differ is the requirement that all deployments and environments are reliable and predictable. Operational Excellence is the fourth pillar of the Well-Architected Framework and covers the operational processes you need to keep applications running.
The key processes that fall within operational excellence are Workload Automation, Workload Release, Monitoring and Testing. The end goal is to achieve superior operational practices.
As with the previous Security and Cost Optimisation pillars, Operational Excellence must be considered throughout the lifecycle of a workload, including the design and architecture phases, but especially once the workload is running. The management of a service and its related processes should not be retrofitted to environments or services. Think about these areas early on, as doing so will reduce management overhead in the long term.
A Well-Architected workload viewed through the lens of Operational Excellence is a workload that is released in an automated manner and monitored and tested efficiently, so that the application provides value not just to your customers but also to your internal development and operations teams.
Specific to Operational Excellence, at a high level you should be thinking about the following areas and processes:
When designing for Operational Excellence in Azure, the Framework sets out a number of principles that you must think about. Those principles include:
Some of the best tips or recommendations for operational excellence are as follows:
Azure Policy is a free Azure service that allows you to enforce resource-level rules across your Azure estate, which can assist in the adoption of operational best practices. Azure Policy is also a great tool for managing and monitoring configuration drift. For example, Azure Policy can ensure all workloads adhere to a specific set of security rules, such as enforcing HTTPS or a minimum TLS version.
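To make the HTTPS example concrete, here is a minimal sketch of what such a rule could look like, written as a Python dictionary in the if/then shape that Azure Policy definitions use. It targets storage accounts only and is a simplification rather than a copy of any built-in policy. The `violates_rule` helper is a hypothetical local check added purely for illustration; it is not how Azure evaluates policy.

```python
import json

# Policy rule in the if/then shape used by Azure Policy definitions.
# Assumption: we only care about storage accounts and their
# "secure transfer required" (HTTPS only) setting.
HTTPS_ONLY_STORAGE_RULE = {
    "if": {
        "allOf": [
            {"field": "type", "equals": "Microsoft.Storage/storageAccounts"},
            {
                "field": "Microsoft.Storage/storageAccounts/supportsHttpsTrafficOnly",
                "notEquals": "true",
            },
        ]
    },
    "then": {"effect": "deny"},
}


def violates_rule(resource: dict) -> bool:
    """Hypothetical local check that mirrors the rule above for illustration."""
    is_storage = resource.get("type") == "Microsoft.Storage/storageAccounts"
    https_only = resource.get("properties", {}).get("supportsHttpsTrafficOnly", False)
    return is_storage and not https_only


if __name__ == "__main__":
    # Print the rule as JSON, the format a policy definition is authored in.
    print(json.dumps(HTTPS_ONLY_STORAGE_RULE, indent=2))
```

Starting with an `audit` effect instead of `deny` is a common way to see which existing resources would be flagged before anything is blocked.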
Azure Advisor is a fantastic resource that provides a set of Azure Policy recommendations that, in turn, can be used to identify opportunities to implement best practices across your workloads.
Use the DevOps checklist to review your design and management from a DevOps standpoint. The checklist covers culture, development, testing, release, monitoring and management. The checklist can be found here.
Strangler Fig is a cloud design pattern for incrementally migrating a legacy system by gradually replacing specific pieces of functionality with new apps or services. Over time, the older system is ‘strangled’ by the new system, which eventually takes over entirely.
Take time to understand and plan your operating model and internal teams. For example, managing a loosely coupled architecture requires procedural decoupling, as teams shouldn’t have to depend on partner teams to support, approve or operate their workloads.
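The routing idea behind Strangler Fig can be sketched in a few lines of Python. This is an illustrative, in-process facade only: the handler functions, the `StranglerFacade` class and the `/orders` path are all hypothetical names, and in a real migration the facade would usually be a gateway or reverse proxy sitting in front of both systems.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def legacy_system(path: str) -> str:
    # Stand-in for the existing system that still serves most requests.
    return f"legacy handled {path}"


def new_orders_service(path: str) -> str:
    # Stand-in for a newly built service that replaces one piece of functionality.
    return f"new service handled {path}"


class StranglerFacade:
    """Routes a request to a migrated handler if one matches, else to legacy."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, prefix: str, handler: Handler) -> None:
        # Register a path prefix that the new system now owns.
        self._migrated[prefix] = handler

    def handle(self, path: str) -> str:
        # Longest matching prefix wins so more specific routes take priority.
        for prefix in sorted(self._migrated, key=len, reverse=True):
            if path.startswith(prefix):
                return self._migrated[prefix](path)
        return self._legacy(path)


facade = StranglerFacade(legacy_system)
facade.migrate("/orders", new_orders_service)
print(facade.handle("/orders/42"))   # served by the new service
print(facade.handle("/invoices/7"))  # still served by the legacy system
```

Each call to `migrate` moves one more slice of functionality to the new system. Once every prefix has been migrated, the legacy handler no longer receives traffic and can be retired.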
We will continue to cover the remaining pillars throughout this series of blogs. As highlighted in previous posts, you can review your current posture against the five Well-Architected pillars using a free assessment tool, which can be accessed here.
For a more in-depth Architecture Review or a specific Operational Excellence Review, feel free to reach out to our Azure Cloud Experts.