QE Score: A Powerful Tool or Just Another Metric Trap?
- Simon CHAMPENOIS
The QE Score, presented as a software quality maturity indicator, aims to standardize and objectify the assessment of Quality Engineering practices. Integrated into the DevOps tool ecosystem (CI/CD, SonarQube, Jira…), it is intended as a catalyst for best practices. But like any indicator, it has its strengths… and its weaknesses.
Here’s an overview of the potential limitations and contradictions of the QE Score, to encourage thoughtful, strategic, and, above all, human-centered use.

📏 1. The Illusion of Objectivity
The QE Score presents itself as an “objective” indicator of software quality, relying on measurable data from standard tools like SonarQube, Jira, or CI pipelines. This promise of objectivity is appealing—especially in an environment where quality is often seen as vague or subjective.
But this objectivity is, in fact, relative. The data it relies on is shaped by human decisions: how quality rules are configured in SonarQube, how bugs are categorized in Jira, how tests are structured in the pipelines… Two teams using the same tools can achieve very different scores—not because one is “better,” but because they have different levels of maturity in using these tools or different priorities.
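To make this concrete, here is a minimal Python sketch of a hypothetical composite score. The thresholds, weights, and formula are invented for this illustration and are not the actual QE Score calculation; the point is simply that the same raw metrics can yield opposite results depending on how each team has configured its rules.

```python
# Hypothetical composite score: thresholds, weights, and the formula itself
# are invented for this example and do not reflect the real QE Score.

raw_metrics = {"coverage": 72.0, "open_bugs": 14, "pipeline_success": 0.91}

def composite_score(metrics, config):
    """Check each metric against a team-defined threshold, then weight the result."""
    score = 0.0
    for name, (threshold, weight) in config.items():
        value = metrics[name]
        # 'open_bugs' is better when low; the other metrics are better when high.
        passed = value <= threshold if name == "open_bugs" else value >= threshold
        score += weight if passed else 0.0
    return round(100 * score, 1)

# Same tools, same codebase, different team configurations.
team_a = {"coverage": (70, 0.5), "open_bugs": (20, 0.2), "pipeline_success": (0.90, 0.3)}
team_b = {"coverage": (80, 0.6), "open_bugs": (10, 0.2), "pipeline_success": (0.95, 0.2)}

print(composite_score(raw_metrics, team_a))  # 100.0: every gate passes for team A
print(composite_score(raw_metrics, team_b))  # 0.0: every gate fails for team B
```

The gap between the two results comes entirely from configuration choices, not from the code being measured.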
The QE Score gives an impression of neutrality and rigor, but it stands on an interpretive, contextual, and sometimes arbitrary foundation. It should therefore be approached with caution, enriched with qualitative insights, and never treated as absolute truth.
🛠️ 2. A Tool-Centric Bias
The QE Score relies heavily on data analysis from technical tools: SonarQube for code quality, Jira for issue tracking, and CI/CD pipelines for test and deployment automation. This approach offers clear advantages in terms of standardization and traceability, but it introduces a fundamental bias: it only measures the visible, instrumentable, and automatable aspects of quality.

This bias can create a tunnel vision effect, where what can be measured is prioritized, to the detriment of other equally critical aspects that are harder to quantify—such as user experience, functional clarity, long-term maintainability, documentation, or the quality of communication between teams.
In short: what the QE Score doesn’t measure risks being treated as if it doesn’t exist, and quietly dropping out of the quality strategy.
The risk: directing efforts towards what’s measurable, rather than what’s truly important for the product’s success.
The QE Score remains a powerful tool for objectifying certain technical aspects, but it must be accompanied by a critical, multidimensional perspective, particularly by integrating human feedback and qualitative analysis.
⚖️ 3. Comparing the Incomparable?
The QE Score aims to be a benchmarking tool between projects, products, or teams. But can we really compare a microservices scale-up, a legacy banking application, and an exploratory R&D product using the same evaluation criteria?
Two major issues arise:
• Contexts vary drastically.
Some teams have advanced automation capabilities, others do not. Some work on APIs, others on systems with no exposed interfaces. Some can easily instrument their code, while others operate in highly constrained environments (legacy, security, hardware, etc.).
• Not all criteria apply.
For instance, a product without a public API logically cannot be evaluated on API testing. Similarly, if the source code cannot be audited by a tool like SonarQube, either because it is compiled or built with an unsupported technology, then an entire dimension of the score, the one related to code quality, will be missing or underestimated.
The standardization of the score can create an illusion of fairness, while masking the real diversity of contexts, constraints, and objectives.
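To illustrate the second point, here is a small, hedged sketch: the dimension names and values are invented, and this is not the real QE Score aggregation. A naive average silently penalizes a product for a dimension that simply cannot apply to it, while averaging only over applicable dimensions gives a fairer, if still partial, picture.

```python
# Hypothetical illustration of the "missing dimension" problem.
# Dimension names and scores are invented; None marks a dimension that
# does not apply to the product (e.g. no public API to test).

dimensions = {"code_quality": 85, "api_testing": None, "ci_stability": 90, "bug_management": 80}

def naive_average(dims):
    """Treat non-applicable dimensions as 0, which penalizes the product."""
    return sum(v or 0 for v in dims.values()) / len(dims)

def normalized_average(dims):
    """Average only over the dimensions that actually apply."""
    applicable = [v for v in dims.values() if v is not None]
    return sum(applicable) / len(applicable)

print(naive_average(dimensions))       # 63.75: dragged down by a dimension that cannot apply
print(normalized_average(dimensions))  # 85.0: reflects only what can actually be measured
```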
🧼 4. Score-Washing and Goodhart's Law
"When a measure becomes a target, it ceases to be a good measure." — Goodhart's Law

When it is constantly displayed, scrutinized, and praised, the QE Score can end up becoming an end in itself, rather than a tool for continuous improvement: teams start optimizing the score… without necessarily improving the real quality of the product.
Concrete example: A team, under pressure to increase its QE Score, decides to massively add automated tests to boost its coverage rate. However, these tests are superficial, fail to cover critical cases, and are rarely maintained. The score goes up, but the real value of these tests is low. The illusion of quality is present, but the software risk remains unchanged.
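To show what such a superficial test looks like in practice, here is a minimal pytest sketch; process_payment, InvalidAmountError, and the test names are hypothetical stand-ins, not code from any real product. The first test drives up line coverage without verifying anything; the second actually protects a critical case.

```python
import pytest

class InvalidAmountError(ValueError):
    """Raised when a payment amount is not strictly positive."""

def process_payment(order_id: int, amount: float) -> str:
    # Stand-in for real business logic, just enough to make the tests runnable.
    if amount <= 0:
        raise InvalidAmountError(amount)
    return f"order {order_id} charged {amount:.2f}"

# A "coverage-boosting" test: it executes the happy path, so line coverage
# climbs, but it asserts nothing about the result.
def test_process_payment_runs():
    process_payment(order_id=42, amount=19.99)

# A test that actually guards against regressions: it checks behavior,
# including the failure path the superficial test never exercises.
def test_negative_amount_is_rejected():
    with pytest.raises(InvalidAmountError):
        process_payment(order_id=42, amount=-5.00)
```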
Another common scenario: bugs are quickly closed in Jira to prevent them from lowering the score, sometimes without proper resolution or by classifying them as "non-reproducible."
The danger here is twofold: on one hand, decisions are made based on a distorted indicator, and on the other, a "cosmetic" performance culture is created, which can undermine the deep quality of the product.
This is what we call score-washing: masking deep issues behind flattering indicators.
👤 5. The End User: The Great Absent
The QE Score evaluates software quality through technical indicators: test coverage, open bugs, code quality, pipeline stability… But it overlooks a fundamental element: the end-user experience. None of the criteria in the score measure customer satisfaction, real product adoption, or churn rate.

In other words, a product can show a high technical score—flawless automated tests, well-structured code, and no known bugs—and yet still fail once it’s in production. Why? Because it might be difficult to use, unintuitive, or simply irrelevant to the actual expectations of users.
Let’s take a concrete example: an expense management app. From a technical quality standpoint, it’s exemplary: complete automated tests, zero critical bugs, flawless performance. Yet, end users find it cumbersome: too many steps, unclear interface, and a frustrating mobile experience. The result: adoption stagnates, field teams prefer to continue using Excel files, and the product is rarely used despite its high QE Score.
This is the limitation of a purely technical indicator: it says nothing about perceived usefulness, ease of use, or user satisfaction. After all, what is a quality product if not one that is useful, reliable, and appreciated?
🤖 6. Automation Doesn’t Tell the Whole Story
The QE Score relies heavily on automation: tests, pipelines, static analysis… This allows for the generation of regular and reliable indicators, but it only covers a technically visible portion of quality.
Crucial aspects are left out of this logic: code readability, maintainability, documentation, the real relevance of tests, and architectural scalability. These elements are difficult to capture automatically, yet they have a direct impact on perceived quality and long-term productivity.
Example: An application may have an excellent QE Score, but be a nightmare to maintain due to lack of clarity or understandable structure. Conversely, a simpler system, but well-designed and documented, will offer a much better experience… even though this won’t show up in the score.
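A hedged sketch of that gap, using an invented example: both functions below do the same job, pass a typical linter, and can be fully covered by tests, so an automated score treats them identically. Only a human reviewer sees that the first is a maintenance liability.

```python
from typing import Dict, Optional

# Two equivalent functions. Static analysis and coverage tooling score them
# the same; only a reader notices the difference in maintainability.
# (The 20% VAT rate and the pricing scenario are invented for illustration.)

def calc(d, t=None):
    return (sum(v for k, v in d.items() if k.startswith(t)) if t else sum(d.values())) * 1.2

VAT_RATE = 1.2

def total_including_vat(prices_by_product: Dict[str, float],
                        category_prefix: Optional[str] = None) -> float:
    """Sum the prices of one category (or of all products) and apply VAT."""
    if category_prefix is None:
        net_total = sum(prices_by_product.values())
    else:
        net_total = sum(
            price
            for product, price in prices_by_product.items()
            if product.startswith(category_prefix)
        )
    return net_total * VAT_RATE
```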
Automation is an asset, but it cannot replace human analysis. Therefore, the QE Score cannot be the sole reflection of a product’s quality.
🔐 7. Privacy and Integration Challenges
Aggregating data from multiple tools requires deep access to the development environment. This raises questions around security, data governance, and the complexity of integration in heterogeneous ecosystems.
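To make the access question tangible, here is a minimal collection sketch assuming the standard SonarQube and Jira REST endpoints; the URLs, project keys, environment variable names, and JQL are placeholders. The point is that each call needs a credential with broad read rights over the project, which is precisely where the security and governance questions arise.

```python
# Minimal sketch of what "aggregating data from multiple tools" implies in
# practice: the collector needs authenticated, fairly broad read access to
# each system. URLs, project keys, and token names below are placeholders.
import os
import requests

SONAR_URL = "https://sonarqube.example.com"
JIRA_URL = "https://jira.example.com"

def fetch_sonar_measures(project_key: str) -> dict:
    # Requires a SonarQube token with at least "Browse" permission on the project.
    resp = requests.get(
        f"{SONAR_URL}/api/measures/component",
        params={"component": project_key, "metricKeys": "coverage,bugs,code_smells"},
        auth=(os.environ["SONAR_TOKEN"], ""),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def fetch_open_bugs(project_key: str) -> dict:
    # Requires a Jira account or API token that can read every issue in the project.
    resp = requests.get(
        f"{JIRA_URL}/rest/api/2/search",
        params={"jql": f'project = "{project_key}" AND issuetype = Bug AND resolution = Unresolved'},
        auth=(os.environ["JIRA_USER"], os.environ["JIRA_TOKEN"]),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```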
🧭 Conclusion: The QE Score - A Valuable Tool, but Not Infallible
The QE Score is a compass, not a map.
It can guide efforts, spark discussions, and identify potential issues. However, it cannot, on its own, summarize the quality of a product or process.
It is up to us, quality practitioners, to maintain critical thinking, context, and, most importantly, the human element at the heart of our engineering approaches. Because once we become aware of its technical, human, and structural limitations, the QE Score becomes a true lever for continuous improvement. Used with perspective and intelligence, it can structure conversations, align teams on quality goals, and highlight real progress.
It is in this balance between measurement and discernment that its true power lies.