Hugging Face and DORA Metrics: Fast Code, Slow Response
Guess this constellation? That's the reviewer dependency graph for Hugging Face's Transformers 🤗 (generated with MiddlewareHQ).
Amid the whirlwind of AI advancements, Hugging Face has emerged as the backbone of innovation—much like how GitHub revolutionized code. It’s difficult to envision the current pace of AI development without Hugging Face’s contributions, especially its Transformers repository.
Let’s dive into the pace of development in Hugging Face’s Transformers repo by applying DORA metrics using Middleware Open Source. We’ll cover three key aspects—no more, no less:
Thesis: More waiting, less building (with long delays and slow recovery times).
Strengths: Highlighting the rapid roadmap-building process and extensive contributions.
Leveraging Strengths: Using what works well to tackle what doesn’t.
We’ll explore how Hugging Face’s fast-moving development is being held back by prolonged response times, extended rework cycles, and slow recovery, even after approvals are secured.
1. Thesis: Shackled Beast
While Hugging Face powers through quick iterations, it finds itself "shackled" by delays in response time, rework, and post-approval wait times. The numbers tell the story:
June 2024: Deployment frequency hit 201 releases.
July-September 2024: Deployment frequency dipped slightly but still held at a robust 170-188 releases per month.
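For a rough sense of how a deployment-frequency number like this can be derived, here is a minimal sketch that buckets merged PRs on the main branch by month using the GitHub REST API. Treat it as an illustrative proxy under our own assumptions; it is not Middleware's implementation, and the figures above come from Middleware Open Source, not from this script.

```python
# Illustrative proxy for deployment frequency: merged PRs per month on main.
# NOT Middleware's implementation; unauthenticated GitHub API calls are rate-limited.
from collections import Counter
import requests

API = "https://api.github.com/repos/huggingface/transformers/pulls"

def merged_prs_per_month(pages: int = 5) -> Counter:
    counts: Counter = Counter()
    for page in range(1, pages + 1):
        resp = requests.get(
            API,
            params={"state": "closed", "base": "main", "sort": "updated",
                    "direction": "desc", "per_page": 100, "page": page},
            headers={"Accept": "application/vnd.github+json"},
        )
        resp.raise_for_status()
        for pr in resp.json():
            if pr.get("merged_at"):               # skip closed-but-unmerged PRs
                counts[pr["merged_at"][:7]] += 1  # bucket by "YYYY-MM"
    return counts

if __name__ == "__main__":
    for month, n in sorted(merged_prs_per_month().items()):
        print(month, n)
```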
However, the team’s growing workload is evident in longer lead times and rising rework:
Lead Time: 8 days in June stretched to nearly 12 days by September.
Merge Time: Grew from 2.9 to 4.7 days over the same period.
Rework: Jumped from 2.3 to 3.7 days.
These delays indicate that while the team is highly productive, much of their effort is spent waiting—for first responses, reviews, and rework to be completed.
The Cycle:
A contributor submits a PR (pull request), but it can take days to get a first response. Then comes the rework cycle—further extending the lead time. Even after approval, the code waits in limbo before deployment.
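To make that cycle concrete, here is a minimal sketch of how a single merged PR's cycle time could be split into first-response, rework, and post-approval wait segments from GitHub review timestamps. It only loosely mirrors the metrics reported above and is not how Middleware Open Source computes them; the PR number in the usage comment is a placeholder.

```python
# Rough breakdown of a merged PR's cycle time into first-response, rework,
# and post-approval merge wait, derived from GitHub review timestamps.
# Illustrative only; not Middleware Open Source's computation.
from datetime import datetime, timedelta
import requests

API = "https://api.github.com/repos/huggingface/transformers"

def _ts(value: str) -> datetime:
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")

def cycle_breakdown(pr_number: int) -> dict[str, timedelta]:
    pr = requests.get(f"{API}/pulls/{pr_number}").json()
    reviews = requests.get(f"{API}/pulls/{pr_number}/reviews").json()
    created, merged = _ts(pr["created_at"]), _ts(pr["merged_at"])  # assumes a merged PR
    review_times = sorted(_ts(r["submitted_at"]) for r in reviews)
    approvals = [_ts(r["submitted_at"]) for r in reviews if r["state"] == "APPROVED"]

    first_response = (review_times[0] - created) if review_times else (merged - created)
    merge_wait = (merged - max(approvals)) if approvals else timedelta(0)
    rework = (merged - created) - first_response - merge_wait  # time spent iterating
    return {"first_response": first_response, "rework": rework, "merge_wait": merge_wait}

# e.g. cycle_breakdown(33000)  # hypothetical PR number
```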
This pattern not only affects development velocity but also hampers the team’s ability to respond quickly to incidents. As the team grows busier, recovery times from incidents have hovered around 4 days, keeping HF in the less desirable category of the 2023 State of DevOps Report for recovery metrics.
2. Strengths: Riding the AI Wave
Despite these delays, the Transformers repository is a powerhouse of innovation, driven by a thriving AI community and an aggressive roadmap. A few highlights:
Over 54 contributors in recent months, pushing forward new features, bug fixes, and documentation at a rapid pace.
A consistent focus on CI/CD improvements ensures that when deployments do happen, they are seamless.
AI advancements keep HF at the cutting edge, adding substantial new features like support for Bloom models and Llama3.
The list of contributors and the richness of features being added speak to the vibrancy of HF’s ecosystem. Frequent PR merges, robust testing pipelines, and collaborative reviews keep the deployment engine running, even if there are slowdowns along the way.
3. Using Strengths to Overcome Weaknesses
How can Hugging Face overcome these bottlenecks?
1. More Reviewers, Less Waiting
If you refer to the reviewer dependency graph above (generated using MiddlewareHQ), you'll see that right now only three maintainers bear the brunt of reviewing hundreds of PRs, which inevitably leads to delays. Spreading the load by training frequent contributors as reviewers could ease this bottleneck and improve response times.
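As a rough way to quantify how concentrated that review load is, one could count submitted reviews per reviewer across a sample of PRs. This is a simplified take on the idea behind the reviewer dependency graph, not MiddlewareHQ's implementation, and the PR numbers in the usage comment are placeholders.

```python
# Count submitted reviews per reviewer across a sample of PRs to see how
# concentrated the review load is. A sketch of the idea behind the reviewer
# dependency graph, not MiddlewareHQ's implementation.
from collections import Counter
import requests

API = "https://api.github.com/repos/huggingface/transformers"

def review_load(pr_numbers: list[int]) -> Counter:
    load: Counter = Counter()
    for number in pr_numbers:
        reviews = requests.get(f"{API}/pulls/{number}/reviews").json()
        for review in reviews:
            if review.get("user"):
                load[review["user"]["login"]] += 1
    return load

# Hypothetical usage with placeholder PR numbers:
# review_load([33000, 33001, 33002]).most_common(5)
```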
While this is tougher for open-source organisations, a commercial tech team can start leveraging small tricks for quicker reviews without disrupting reviewers' schedules:
Block a fixed daily time slot for reviews with the key reviewers.
Start encouraging other key contributors to review. This may require writing down coding guidelines that cover the generic, language-specific, and business-specific context reviewers need to watch for.
Adding a PR checklist might help reduce the back-and-forth during reviews and make the PR more review-ready for the reviewer. You can get started with a checklist as simple as the one sketched below.
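For instance, a starter checklist could live in a PR template file and look something like this. It is an illustrative example of the idea, not the Transformers project's actual template.

```markdown
<!-- .github/PULL_REQUEST_TEMPLATE.md (illustrative example only) -->
## What does this PR do?
A short summary and a link to the related issue or discussion.

## Checklist
- [ ] The approach was discussed in an issue or forum thread before coding
- [ ] Tests were added or updated to cover the change
- [ ] Documentation and docstrings were updated where needed
- [ ] Code style and lint checks pass locally
- [ ] The description explains the "why", not just the "what"
- [ ] Relevant reviewers are tagged
```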
2. Consensus Before Code
One major source of rework stems from differing ideas on the approach to a problem. Encouraging contributors to discuss solutions before diving into code could avoid a lot of back-and-forth revisions later. One can use this guide to encourage a holistic technical plan before starting to code.
3. Faster Iteration, Faster Recovery
As deployment speed increases, the team’s ability to respond to incidents will naturally improve. Shorter lead times mean quicker iterations and ultimately faster recovery times, getting Hugging Face out of the slow-recovery zone and turning the whole cycle into a positive spiral of productivity!
Fun Fact
Did you know? Hugging Face’s README lists 100 different projects that use Transformers, a milestone we hope to mirror by reaching 100 case studies on open-source projects using DORA metrics.
Conclusion: Fast Roadmap, Slower Execution
Hugging Face excels at shipping features fast, with a high deployment frequency month over month. However, challenges like rework, delayed reviews, and slow incident recovery times are pulling the shackled beast back. By leveraging its strengths—more reviewers, better pre-code consensus, and faster iterations—Hugging Face can continue setting the pace for AI development while reducing operational drag.
If you’re in a similar situation to HF’s Transformers, we’d really encourage you to give DORA metrics a shot using Middleware Open Source. You could follow this guide to analyse your team, or write to us at productivity[at]middlewarehq.com with your questions and we’ll be happy to generate a study with suggestions for your repo, pro bono 😊
We at Middleware really enjoy sharing productivity insights from the best teams, and we’d feel really encouraged if you shared this with your community and friends as well 🙏🏼