What Are DORA Metrics & How They Can Help Your Software Delivery Process
Software engineering is hard. Building a world-class software engineering org is even harder.
You need to deliver high-quality products on time, every time. Without structure, the delivery process crumbles before it is even set up, let alone ships great products consistently.
This is where frameworks like DORA Metrics come into play.
Developed by the DevOps Research and Assessment (DORA) team at Google, these metrics provide a data-driven approach to gauging and refining your software delivery process.
And no, my friend, this is not just about crunching numbers: you're actually taking steps to transform your development pipeline into a seamless, efficient powerhouse.
What Are DORA Metrics?
Think of DORA metrics as the four pillars of the proverbial building that is your software process. Construct them well and they become a strong foundation.
These are team-level metrics that give you a clear picture of the general health of your software delivery process.
However, these four metrics are not the complete picture. Building a great software engineering team also means putting serious effort into avoiding developer burnout, building a strong and diverse team, and, most importantly, focusing on overall developer happiness!
Also worth mentioning: the baseline metrics will differ between teams and products.
A product with 10 million MAU will not have the same DORA goals as a product that launched a week ago.
Alright, let’s look into the 4 core pillars and how we can leverage them to improve the delivery process.
These pillars offer a 360° view of how well your org ships software, focusing on four crucial areas:
Deployment Frequency: How often your team successfully gets code into production. Think of it as your deployment heartbeat.
Lead Time for Changes: The interval from a developer's code commit to its deployment. It's your sprint time from idea to implementation.
Change Failure Rate: The proportion of releases that result in failures in production. This metric is your quality gatekeeper.
Mean Time to Restore (MTTR): The average time to recover from a production failure. It’s your measure of resilience and response efficiency.
Tracking these metrics allows your team to diagnose issues, streamline processes, and make data-driven decisions to continuously improve your software delivery.
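To make the four metrics concrete, here is a minimal sketch in plain Node.js that computes each one from raw records. The record shapes (`deployments`, `incidents`) and field names are made up for illustration; in practice these would come from your CI/CD and incident-tracking tools.

```javascript
// Deployments per day over the observed window
function deploymentFrequency(deployments, days) {
  return deployments.length / days;
}

// Average hours from code commit to production deployment
function avgLeadTimeHours(deployments) {
  const total = deployments.reduce(
    (sum, d) => sum + (d.deployedAt - d.committedAt), 0);
  return total / deployments.length / (1000 * 60 * 60);
}

// Share of deployments that caused a production failure
function changeFailureRate(deployments) {
  const failed = deployments.filter((d) => d.causedFailure).length;
  return failed / deployments.length;
}

// Average hours from incident start to resolution
function meanTimeToRestoreHours(incidents) {
  const total = incidents.reduce(
    (sum, i) => sum + (i.resolvedAt - i.startedAt), 0);
  return total / incidents.length / (1000 * 60 * 60);
}

// Hypothetical data: two deployments over one week, one incident
const hour = 1000 * 60 * 60;
const deployments = [
  { committedAt: 0, deployedAt: 24 * hour, causedFailure: false },
  { committedAt: 0, deployedAt: 48 * hour, causedFailure: true },
];
const incidents = [{ startedAt: 0, resolvedAt: 2 * hour }];

console.log(deploymentFrequency(deployments, 7)); // deployments per day
console.log(avgLeadTimeHours(deployments));       // 36
console.log(changeFailureRate(deployments));      // 0.5
console.log(meanTimeToRestoreHours(incidents));   // 2
```

The arithmetic is deliberately simple; the point is that all four metrics fall out of timestamps you most likely already collect.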
Deployment Frequency
Deployment Frequency tracks how often your team pushes code to production.
This metric is vital for gauging the agility and efficiency of your development process.
A high deployment frequency signals that your team can rapidly deliver new features, enhancements, and bug fixes, ensuring your product stays competitive and quickly adapts to user feedback.
Examples and Implementation
Example 1: Continuous Integration and Deployment (CI/CD) Pipeline
Implementing a robust CI/CD pipeline is one of the most effective ways to increase deployment frequency.
Here’s a simple example using GitHub Actions:
name: CI/CD Pipeline

on:
  push:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Node.js
        uses: actions/setup-node@v2
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm install

      - name: Run tests
        run: npm test

      - name: Deploy to production
        if: success()
        run: npm run deploy
This pipeline automatically runs tests and deploys code to production every time a new commit is pushed to the main branch, ensuring rapid and consistent deployments.
Example 2: Feature Flags
Using feature flags allows you to deploy code frequently without exposing incomplete features to end-users.
Here’s a simple implementation using Unleash in a Node.js application:
const { initialize } = require('unleash-client');

const unleash = initialize({
  url: 'https://app.unleash-hosted.com/demo/api/',
  appName: 'my-node-app',
  instanceId: 'my-instance-id',
  customHeaders: {
    Authorization: 'YOUR_API_KEY'
  }
});

unleash.on('error', console.error);

unleash.on('ready', () => {
  const isEnabled = unleash.isEnabled('new-feature-flag', { userId: 'user-key' });
  if (isEnabled) {
    console.log('New feature is enabled!');
  } else {
    console.log('New feature is disabled.');
  }
});
This example checks whether the 'new-feature-flag' is enabled for a specific user and logs the result.
Lead Time for Changes
Lead Time for Changes measures the time it takes for a code commit to reach production deployment.
This metric is all about the efficiency of your development pipeline. Short lead times indicate a streamlined workflow with minimal bottlenecks, ensuring faster delivery of updates and new features.
Examples and Implementation
Example 1: Automated Testing
Automated testing can significantly reduce lead time by catching bugs early in the development cycle.
Here’s an example using Jest in a React application:
// App.test.js
import React from 'react';
import { render, screen } from '@testing-library/react';
import '@testing-library/jest-dom/extend-expect';
import App from './App';

// Test to check if the "Learn React" link is rendered
test('renders learn react link', () => {
  render(<App />);
  const linkElement = screen.getByText(/learn react/i);
  expect(linkElement).toBeInTheDocument();
});

// Test to check if the welcome message is rendered
test('renders welcome message', () => {
  render(<App />);
  const welcomeMessage = screen.getByText(/Welcome to the React application!/i);
  expect(welcomeMessage).toBeInTheDocument();
});
Explanation:
Basic Render Test: Ensures that the "Learn React" link is rendered correctly in the DOM.
Welcome Message Test: Verifies that the welcome message is displayed as expected.
By using automated tests like these, you can catch issues early in the development cycle, reducing the lead time for changes and ensuring a smooth deployment process.
These tests can be run automatically on each commit, providing immediate feedback and maintaining a high-quality codebase.
Example 2: Code Review Automation
Automating code reviews can also expedite the lead time for changes.
Here is a GitHub Actions example for integrating SonarQube into your CI/CD pipeline for automated code review.
I’m assuming that you have a SonarQube server set up and the token stored in GitHub Secrets.
name: Code Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  sonarqube:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up JDK 11
        uses: actions/setup-java@v1
        with:
          java-version: '11'

      - name: Cache SonarQube dependencies
        uses: actions/cache@v2
        with:
          path: ~/.sonar/cache
          key: ${{ runner.os }}-sonar

      - name: Install SonarQube Scanner
        run: |
          wget https://binaries.sonarsource.com/Distribution/sonar-scanner-cli/sonar-scanner-cli-4.6.2.2472-linux.zip
          unzip sonar-scanner-cli-4.6.2.2472-linux.zip -d $HOME/
          # An exported PATH does not persist across steps; append to GITHUB_PATH instead
          echo "$HOME/sonar-scanner-4.6.2.2472-linux/bin" >> $GITHUB_PATH

      - name: Run SonarQube analysis
        run: |
          sonar-scanner \
            -Dsonar.projectKey=YOUR_PROJECT_KEY \
            -Dsonar.sources=. \
            -Dsonar.host.url=https://your-sonarqube-server.com \
            -Dsonar.login=${{ secrets.SONARQUBE_TOKEN }}
Change Failure Rate
Change Failure Rate measures the percentage of deployments that result in production failures.
CFR evaluates the stability and reliability of your software.
A high change failure rate points to weaknesses in your testing and deployment processes, highlighting the areas that need improvement to ensure smoother releases.
Examples and Implementation
Example 1: Enhanced Testing
Incorporating comprehensive testing strategies, including unit tests, integration tests, and end-to-end tests, can help reduce the change failure rate.
Here’s an example of a Cypress end-to-end test:
describe('User Login Flow', () => {
  beforeEach(() => {
    // Visit the login page before each test
    cy.visit('https://yourapp.com/login');
  });

  it('should display the login form', () => {
    // Check if the login form is visible
    cy.get('form#login').should('be.visible');
  });

  it('should show an error message for invalid credentials', () => {
    // Enter invalid credentials and submit the form
    cy.get('input[name="email"]').type('invalid@example.com');
    cy.get('input[name="password"]').type('wrongpassword');
    cy.get('button[type="submit"]').click();

    // Check for error message
    cy.get('.error-message').should('contain', 'Invalid email or password');
  });
});
This example test covers various scenarios, including:
Form Visibility: Ensuring the login form is visible when the page loads.
Invalid Credentials: Checking that the application correctly handles invalid login attempts.
Example 2: Canary Releases
Canary releases are a deployment strategy that allows you to roll out new features or updates to a small subset of users before a full-scale release.
This helps in identifying and mitigating potential issues early on, reducing the risk of wider failures.
By gradually increasing the traffic to the canary deployment, you can monitor the performance and stability under real-world conditions.
If any issues arise, you can quickly revert to the stable version with minimal impact on users.
This helps ensure that new features and updates are thoroughly tested in production before a full rollout, reducing the risk of failures and improving the reliability of your software.
Mean Time to Restore (MTTR)
Mean Time to Restore (MTTR) gives you the average duration required to recover from a production failure.
This number serves as a key indicator of your system's resilience and the efficiency of your incident response processes.
A lower MTTR shows your team's ability to swiftly diagnose and resolve problems, reducing downtime and minimizing disruption for users.
Examples and Implementation
Example 1: Comprehensive Monitoring and Alerting
Implementing robust monitoring and alerting systems using tools like Prometheus and Grafana ensures that you can detect and respond to incidents promptly:
# Prometheus configuration for monitoring
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

rule_files:
  - "alert.rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']

# Example Prometheus alert rule configuration (alert.rules.yml)
groups:
  - name: example_alert_rules
    rules:
      - alert: HighCPUUsage
        # node_cpu_seconds_total is a counter, so derive a busy-CPU
        # percentage from the idle rate rather than comparing it directly
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High CPU Usage Detected"
          description: "CPU usage has been above 80% for more than 2 minutes."
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance Down"
          description: "An instance is down for more than 1 minute."
In this example, Prometheus is configured to scrape metrics from both itself and a node_exporter instance every 15 seconds.
Alert rules are defined in a separate file (alert.rules.yml) and include alerts for high CPU usage and instances going down.
These alerts will help your team quickly detect and address issues, helping improve MTTR.
Example 2: Incident Response Automation
Automating incident response with tools like PagerDuty can help even more:
{
  "incident": {
    "type": "incident",
    "title": "Database connectivity issue",
    "service": {
      "id": "PABC123",
      "type": "service_reference"
    },
    "priority": {
      "id": "P1",
      "type": "priority_reference"
    },
    "urgency": "high",
    "body": {
      "type": "incident_body",
      "details": "Database connection timeout observed on production environment. Initial investigation suggests a high number of concurrent connections."
    },
    "escalation_policy": {
      "id": "PEZ0X1",
      "type": "escalation_policy_reference"
    }
  }
}
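A payload like the one above is typically sent to PagerDuty's REST API to open an incident programmatically. Here is a rough Node.js sketch (the API token, e-mail address, and helper names are placeholders, not PagerDuty-provided code; PagerDuty requires the `From` header to be a valid user on the account):

```javascript
// Build the request for PagerDuty's "create incident" endpoint
function buildIncidentRequest(incident) {
  return {
    url: 'https://api.pagerduty.com/incidents',
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: 'Token token=YOUR_API_TOKEN', // placeholder
        From: 'oncall@example.com',                  // a valid account user
      },
      body: JSON.stringify({ incident }),
    },
  };
}

// Fire the request (Node 18+ ships a global fetch)
async function triggerIncident(incident) {
  const { url, options } = buildIncidentRequest(incident);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`PagerDuty API error: ${response.status}`);
  }
  return response.json();
}
```

Wiring this into your alerting pipeline means an on-call engineer is paged the moment a failure is detected, which directly shortens MTTR.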
Integrating DORA Metrics into Your Process in 5 Minutes
To leverage DORA metrics effectively, integrate them into your existing development and operations workflows.
Middleware Open-Source gives you a seamless way to track these metrics in real-time, offering actionable insights to drive your development efforts.
By integrating the dashboard with your CI/CD pipeline, you can visualize trends and identify areas for improvement.
This continuous feedback loop helps in making informed decisions, optimizing processes, and enhancing overall software delivery performance.
Installing Middleware
Ensure that you have Docker installed and running. For details, check out the README in the repo.
Open the terminal and run the following command:
docker volume create middleware_postgres_data
docker volume create middleware_keys

docker run --name middleware \
  -p 3333:3333 \
  -v middleware_postgres_data:/var/lib/postgresql/data \
  -v middleware_keys:/app/keys \
  -d middlewareeng/middleware:latest

docker logs -f middleware
Wait a little while for the services to come up.
The app will be available on your host at http://localhost:3333.
3 Quick Practical Tips
Automate Everything: From testing to deployment, automation reduces human error, accelerates processes, and, of course, makes DORA metrics measurable across your org.
Continuous Monitoring: Implement robust monitoring to quickly detect and resolve issues.
Regular Reviews: Conduct regular reviews of your metrics to understand trends and the areas needing improvement.
TL;DR
DORA Metrics give you a sharp, quantifiable view into your software delivery process.
By zeroing in on Deployment Frequency, Lead Time for Changes, Change Failure Rate, and MTTR, you can pinpoint bottlenecks, boost collaboration, and speed up your release cycles.
Integrate these metrics into your workflow with tools like Middleware Open Source to get real-time insights and drive continuous improvement in your development practices.
All this effort keeps your org ahead and helps engineering leaders balance technical and strategic business goals.
That’s all for today folks!