Metrics - Measuring Continuous Delivery
This post was originally written for the TechTownTraining blog.
This is Part 4 in the series on Continuous Delivery. In the previous parts, we introduced the concept of continuous delivery, how to prepare your organization for adopting CD, and how to define your product architecture so that it easily supports a Continuous Delivery development model.
In this article, we’re going to discuss Continuous Delivery metrics. By measuring the right metrics while you adopt Continuous Delivery, you can correct your course or make necessary adjustments to your plans at any stage.
These metrics help you understand and monitor how your team is performing over time during your journey of adopting the continuous delivery model.
Before you begin…
Continuous delivery means different things to different people, depending on their role in the organization. For higher management, it might be a cultural change, and for developers it might be a new way of developing and delivering software. For the customers you serve, it might revolutionize the way they do business.
Depending on how you define your goals, some of these metrics will matter more to your team than others. Hence, the first step is to identify what you expect from your continuous delivery practices.
Before you figure out what metrics to track, you need to identify what challenges and problems you are trying to solve - not deploying often enough, too many defects leaking into production, slow turnaround when implementing new features, etc. There are many types of metrics you can measure, and we’ll run through each of them in this article.
Identifying the correct problems to solve will lead you to the appropriate metrics to measure.
If you are using an Agile methodology, you are probably aware of what velocity means.
Velocity is a metric for ‘work done’. Measuring velocity is sometimes called velocity tracking. The velocity metric is used for planning sprints and measuring team performance.
You want to ship code as fast and as often as possible. Velocity tracks how fast you can do this. Of course, it will vary from team to team, but what really matters is how each team performs over a period of time.
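As a rough illustration of tracking velocity over time, the sketch below averages the work completed in the most recent sprints. It assumes work is counted in story points per sprint, which is one common convention and not something this article prescribes; the function name and the sample numbers are hypothetical.

```python
def rolling_velocity(completed_points, window=3):
    """Average work completed over the last `window` sprints.

    `completed_points` is a list of story points finished per sprint,
    oldest first. Story points are an assumed unit of 'work done'.
    """
    recent = completed_points[-window:]
    if not recent:
        return 0.0
    return sum(recent) / len(recent)


# Hypothetical sprint history for one team, oldest sprint first.
sprint_history = [21, 18, 24, 27]
current_velocity = rolling_velocity(sprint_history)
```

Comparing this number for the same team across months says more than comparing it between teams, since each team estimates work differently.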
At the end of the day, if you are shipping software to the customers frequently, but with low quality, you would be dousing production fires all the time!
You should measure how you are doing on quality. Tools are available for this in most programming languages. There are also static code analysis tools that you can use to catch issues before deploying into production. Measuring quality is also important if you care less about how fast you ship and more about the quality of what you ship.
Measuring how often you deploy into production is likely to pay off. If deployments are too infrequent, it could indicate that you are packing too many code changes and features into each deployment cycle. Ultimately, the goal is to do smaller deployments as frequently as possible.
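One simple way to look at deployment frequency is to count deployments per calendar week. The sketch below assumes you can export the dates of your production deployments from your CI/CD tool; the function name and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def deployments_per_week(deploy_dates):
    """Count deployments per ISO (year, week number)."""
    counts = Counter()
    for deployed_on in deploy_dates:
        iso_year, iso_week, _ = deployed_on.isocalendar()
        counts[(iso_year, iso_week)] += 1
    return dict(counts)


# Hypothetical production deployment dates.
deploys = [date(2024, 1, 2), date(2024, 1, 4), date(2024, 1, 9)]
weekly = deployments_per_week(deploys)
```

Weeks with no entry at all are as interesting as weeks with a low count, since a long gap usually means the next deployment will be a large one.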
Lead time is the time between the initiation and completion of a production process. It is one of the parameters that has its roots in Lean Manufacturing.
For software development, lead time is the amount of time that elapses between starting on a work item and deploying it on the customer’s production system. Together with the velocity that we discussed earlier, this gives an indication of how long it takes on average for work to get to production.
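The definition above can be sketched as a small calculation. This example assumes each work item carries a start timestamp and a production deployment timestamp; where those come from (an issue tracker, a deployment log) is up to you, and the names and data here are hypothetical.

```python
from datetime import datetime
from statistics import mean


def lead_time_days(started, deployed):
    """Lead time of one work item, in days."""
    return (deployed - started).total_seconds() / 86400


# Hypothetical work items as (work started, deployed to production).
work_items = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 4, 9, 0)),
    (datetime(2024, 3, 2, 9, 0), datetime(2024, 3, 7, 9, 0)),
]

average_lead_time = mean(lead_time_days(s, d) for s, d in work_items)
```

An average hides outliers, so it can be worth looking at the slowest items separately as well.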
The number of defects is a good indicator of application quality. Ideally, you want most bugs to be identified during the development phase or another testing phase within the organization. The worst possible place to find a bug is in the production environment with the customer.
Also, most organizations ‘just fix’ bugs when they happen. However, I would encourage you to perform a root cause analysis and put automated steps (like additional unit tests) in place so that the same bugs do not recur.
The Defect Escape Rate, i.e. how many defects are found in production versus in internal testing, is also of great importance.
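As a minimal sketch, the Defect Escape Rate can be expressed as the share of all known defects that were found in production. This particular formula is an assumption; some teams instead report the ratio of production defects to internally found defects. The function name and the counts are hypothetical.

```python
def defect_escape_rate(found_in_production, found_internally):
    """Fraction of all known defects that escaped to production.

    Assumed formula: production / (production + internal).
    Returns 0.0 when no defects have been recorded at all.
    """
    total = found_in_production + found_internally
    if total == 0:
        return 0.0
    return found_in_production / total


# Hypothetical counts for one release: 5 production bugs, 45 caught internally.
escape_rate = defect_escape_rate(found_in_production=5, found_internally=45)
```

A falling rate over several releases suggests that internal testing is catching more of what it should.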
As we discussed in previous articles, automated tests are the foundation of a well-built CD practice. It is highly recommended that your team makes extensive use of unit and integration tests, all of them automated.
Since we rely heavily on automated testing, a good set of metrics includes how well your automated tests work, what percentage of tests fail on each commit, and how many of these failures are severe.
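The per-commit percentages mentioned above could be computed along these lines. The record layout is an assumption for illustration: it presumes your CI system can report, for each commit, the number of tests run, the number that failed, and how many of those failures your team classified as severe.

```python
from dataclasses import dataclass


@dataclass
class CommitTestRun:
    """Hypothetical summary of one CI test run."""
    commit: str
    total: int
    failed: int
    severe_failures: int


def failure_percentage(run):
    """Percentage of tests that failed in this run."""
    if run.total == 0:
        return 0.0
    return 100.0 * run.failed / run.total


def severe_share(run):
    """Fraction of the failures that were classified as severe."""
    if run.failed == 0:
        return 0.0
    return run.severe_failures / run.failed


# Hypothetical run: 200 tests, 5 failures, 1 of them severe.
run = CommitTestRun(commit="abc123", total=200, failed=5, severe_failures=1)
```

Plotting `failure_percentage` per commit over time makes it easier to spot both flaky tests and commits that genuinely broke something.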
Most customers now expect software to be running flawlessly 24/7.
But is your software actually capable of doing that? The last thing you want is for your application to be down at a critical time for your customer. Tracking all outages, both planned and unplanned, gives an indication of where things are going wrong.
The service level agreement (SLA) of your software and its requirements have to be continuously monitored and refined.
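To check tracked outages against an SLA, you can turn them into an availability percentage. The sketch below assumes outage durations are recorded in hours and that the SLA is stated as a minimum availability percentage; the 99.9 target and the outage figures are made-up examples.

```python
def availability_percent(period_hours, outage_hours):
    """Percentage of the period during which the service was up.

    `outage_hours` lists the duration of every outage in the period,
    planned and unplanned alike.
    """
    downtime = sum(outage_hours)
    return 100.0 * (period_hours - downtime) / period_hours


# Hypothetical 30-day month (720 hours) with two outages.
availability = availability_percent(720, [2.0, 1.6])

# Hypothetical SLA target of 99.9% availability.
sla_target = 99.9
meets_sla = availability >= sla_target
```

With 3.6 hours of downtime in a 720-hour month, availability comes out at roughly 99.5%, which would miss a 99.9% target.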
Tracking errors and exceptions in your application, even if they are not visible to the customer and end users, is critical. They might indicate ongoing issues that could blow up at any time.
Identifying exceptions thrown in your code and capturing timings of important operations such as database connections, remote invocations, and network latency point to areas of your software that need improvement.
Because production systems usually deal with high data load, it is important you have a mechanism in place to keep a check on the errors and look for spikes.
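One possible mechanism for spotting spikes is to compare the latest error count with a recent baseline. The sketch below flags a spike when the current count is more than a chosen number of standard deviations above the mean of recent intervals. The threshold of 3 and the sample counts are assumptions, and real monitoring tools typically offer more robust alerting than this.

```python
from statistics import mean, pstdev


def is_spike(history, current, threshold=3.0):
    """Return True if `current` is unusually high compared to `history`.

    `history` holds error counts from recent intervals (e.g. per minute).
    """
    if len(history) < 2:
        return False  # Not enough data to form a baseline.
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # Perfectly flat history: any increase counts as a spike.
        return current > baseline
    return (current - baseline) / spread > threshold


# Hypothetical error counts per minute before the latest reading.
recent_errors = [10, 12, 11, 9, 13]
```

Here `is_spike(recent_errors, 40)` flags a spike, while `is_spike(recent_errors, 12)` does not.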
Additionally, after a deployment, you would want to monitor the traffic on the system as well. If you suddenly have no traffic or a giant spike in traffic, something could be wrong, for example a DoS attack.
If you are using a microservices-based architecture, you should use an aggregator service to collect logs and other monitoring information, as it is hard, if not almost impossible, to keep an eye on each service separately.
Time to Recovery/Resolution
Once you have identified issues in the production environment, this metric helps you track how long it takes to resolve or recover from them. This could be a number of days (most commonly seen in organizations) or a number of hours (well done, that’s excellent!).
If your system supports automatic recovery from failures, this time could indicate ‘Time to Recovery’ instead of ‘Time to Resolution’. It is typically measured in hours.
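A minimal sketch of this metric, assuming each incident is recorded with the time it was detected and the time it was resolved or recovered from. The function name and the incident data are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average duration between detection and recovery.

    `incidents` is a list of (detected, resolved) datetime pairs.
    Returns None when there are no incidents.
    """
    if not incidents:
        return None
    durations = [resolved - detected for detected, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incidents: one resolved in 2 hours, one in 4 hours.
incidents = [
    (datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 12, 0)),
    (datetime(2024, 5, 8, 14, 0), datetime(2024, 5, 8, 18, 0)),
]
mttr = mean_time_to_recovery(incidents)
```

Because the result is a `timedelta`, it can be reported in hours or days, whichever suits how quickly your organization typically recovers.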
Above and beyond the metrics listed above, there may be other metrics worth tracking that are specific to your application and organization.
They need not be related to continuous delivery, but they’ll be critical for monitoring the usage of your applications in the production environment.
With this, we come to the conclusion of the series on Continuous Delivery. Remember that adopting Continuous Delivery does not have a fixed destination; it is rather a journey with a number of challenges and aches! Once these are overcome, the resulting product and the value brought to your organization will be priceless!