We ran NLP topic modeling across the backstage/backstage repository. 41,652 signals. 5 topic clusters. Here's what the data says about where engineering energy goes — and what it costs.
Backstage is a CNCF incubating project — an open platform for building developer portals, originally created at Spotify. Hundreds of companies use it to manage services, docs, and tooling at scale.
This isn't a feature review or a star count. It's a look at the language of the issue tracker — what people are actually filing, fixing, and fighting about.
Analyzed by Beyond The Alignment
We ran NLP frequency analysis across issue titles in the repository. The most common words paint a clear picture:
What this tells us: The dominant words are maintenance verbs (update, fix, bump, chore). Words like feature, add, and create barely register. When a project's most common language is about upkeep rather than building, that's a signal worth paying attention to.
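The frequency step can be sketched in a few lines. The post doesn't publish its tokenizer or stopword list, so this is a minimal sketch assuming lowercase alphabetic tokens and a small hand-picked stopword set. `STOPWORDS`, `MAINTENANCE_WORDS`, `FEATURE_WORDS`, `word_frequencies`, and `vocabulary_share` are hypothetical names; the two vocabularies are built only from the words named above, not from the analyzer's actual lists.

```python
import re
from collections import Counter

# Hypothetical stopword set; the original analysis does not say which list it used.
STOPWORDS = {"a", "an", "and", "for", "from", "in", "of", "on", "the", "to", "with"}

# Illustrative vocabularies taken from the words discussed in the post.
MAINTENANCE_WORDS = {"update", "fix", "bump", "chore"}
FEATURE_WORDS = {"feature", "add", "create"}


def tokenize(title: str) -> list[str]:
    """Lowercase a title and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", title.lower())


def word_frequencies(titles: list[str]) -> Counter:
    """Count every non-stopword token across all titles."""
    counts = Counter()
    for title in titles:
        counts.update(tok for tok in tokenize(title) if tok not in STOPWORDS)
    return counts


def vocabulary_share(counts: Counter, vocab: set[str]) -> float:
    """Fraction of all counted tokens that belong to `vocab` (0.0 if nothing was counted)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[word] for word in vocab) / total
```

Comparing `vocabulary_share(counts, MAINTENANCE_WORDS)` with `vocabulary_share(counts, FEATURE_WORDS)` turns "maintenance verbs dominate" into a number that can be tracked over time, though the result depends heavily on which words each vocabulary includes.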
Pattern: Linguistic Drift
All 41,652 signals were classified into 5 topic clusters using LDA. The distribution is lopsided:
What this tells us: Topic 2 alone accounts for more signals than all other topics combined. That's not a topic — it's a center of gravity. When one concern consumes this much bandwidth, everything else is competing for the margins.
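The post doesn't say which LDA implementation or settings produced the five clusters, or how a signal was mapped to a single cluster. As an illustration only, the stdlib sketch below fits LDA with collapsed Gibbs sampling (one standard inference method; scikit-learn's `LatentDirichletAllocation`, for example, uses variational Bayes instead), assumes titles are already tokenized, and assigns each signal to its highest-count topic. `lda_gibbs`, `topic_shares`, and `has_center_of_gravity` are hypothetical names, and the hyperparameters are placeholder defaults. The last function encodes the "center of gravity" test: one topic holding more than 50% of signals, i.e. more than all other topics combined.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]             # n(d, k)
    topic_word = [[0] * V for _ in range(num_topics)]        # n(k, w)
    topic_total = [0] * num_topics                           # n(k)
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = assignments[d][i], word_id[w]
                # Take the token out of the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | rest), up to a constant factor.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic


def topic_shares(doc_topic):
    """Assign each document to its dominant topic; return each topic's share of documents."""
    n = len(doc_topic)
    if n == 0:
        return {}
    dominant = Counter(max(range(len(row)), key=row.__getitem__) for row in doc_topic)
    return {t: dominant[t] / n for t in range(len(doc_topic[0]))}


def has_center_of_gravity(shares):
    """True if one topic holds more documents than all other topics combined."""
    return bool(shares) and max(shares.values()) > 0.5
```

Pure-Python Gibbs sampling is far too slow for tens of thousands of signals at useful iteration counts; the sketch is meant to show the mechanics, and a real run would use an optimized library.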
Pattern: Gravitational Collapse
Health Status: Attention
The analyzer flagged two recommended actions:
What this tells us: Every bump, every chore, every update is a cycle not spent on features. Over half of the repository's signals, and by extension much of its engineering energy, go to keeping third-party packages from breaking the platform. This is the maintenance tax, and it compounds.
Pattern: Velocity Lock
What this tells us: Component development and backend configuration are holding up. But the three flagged topics share a common thread: they're all downstream of dependency health. When attention areas outnumber healthy ones, that's a leading indicator, not a trailing one.
Pattern: Cascading Decay
The numbers above don't change, but what they mean depends on where you sit. Here's how five different roles would read this report.
If you're maintaining this repo, the data confirms what you probably already feel: most of your work is keeping things from breaking, not building things that work. Every "minor bump" compounds into hours of labor that never shows up on a roadmap.
The pattern here is what we call the treadmill effect — full velocity, zero displacement. The CI pipeline runs, PRs merge, version numbers increment. But the product doesn't move forward. Recognizing this pattern early is the difference between managing it and being consumed by it.
Velocity metrics look normal because the team is busy. But busy isn't the same as productive. When 3 of 5 topic areas are flagged, a slipping roadmap is less likely a planning failure than a sign that invisible toil is eating the capacity.
This is the signal loss problem. The useful information — which topics are healthy, which are degrading — gets buried under the volume of routine maintenance noise. Without a way to separate the two, resource allocation decisions are based on incomplete data.
For a CNCF incubating project, these signals matter at the ecosystem level. Backstage is infrastructure that other companies build on. When dependency health degrades in a project like this, the downstream effects ripple across hundreds of adopters.
By the time a project stops shipping features, it's usually too late to intervene. The early warning almost always shows up first in the issue tracker, in the volume of unaddressed friction. This snapshot suggests the project's institutional memory is concentrated in maintenance patterns rather than development patterns.
GitHub stars are a vanity metric. What the topic model reveals is the maintenance-to-feature ratio — the actual cost of keeping this codebase alive versus moving it forward. A ~4:1 ratio means for every unit of new capability, four units go to upkeep.
This is the semantic drift problem applied to due diligence. The public narrative (adoption, community, stars) drifts from the engineering reality (toil, dependency debt, maintenance burden). The gap between the two is where risk hides.
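The ~4:1 figure can also be read as a share: an r:1 maintenance-to-feature ratio puts r/(r+1) of the labeled signals on the maintenance side, so 4:1 works out to 4/5, or 80%. The post doesn't say how signals were labeled maintenance or feature, so this is a minimal sketch assuming Conventional Commits-style title prefixes. The prefix lists and the names `classify`, `maintenance_ratio`, and `maintenance_share` are hypothetical.

```python
from collections import Counter

# Hypothetical prefix mapping; the original analysis does not describe its labeling rule.
MAINTENANCE_PREFIXES = ("chore", "fix", "build", "ci")
FEATURE_PREFIXES = ("feat",)


def classify(title: str) -> str:
    """Label a title 'feature', 'maintenance', or 'other' by its leading prefix."""
    head = title.lower().lstrip()
    if head.startswith(FEATURE_PREFIXES):
        return "feature"
    if head.startswith(MAINTENANCE_PREFIXES):
        return "maintenance"
    return "other"


def maintenance_ratio(titles):
    """Maintenance-to-feature ratio, or None when no feature titles were found."""
    counts = Counter(classify(t) for t in titles)
    if counts["feature"] == 0:
        return None
    return counts["maintenance"] / counts["feature"]


def maintenance_share(ratio: float) -> float:
    """Convert an r:1 maintenance-to-feature ratio into the maintenance fraction r / (r + 1)."""
    return ratio / (ratio + 1)
```

Titles that match neither prefix list count toward neither side, so the ratio only compares signals that could be labeled. Plain prefix matching is also crude (a title starting with "fixture" would count as maintenance); a real classifier would need a sturdier rule.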
Dependency health is security health. Topics 4 and 5 — templates and infrastructure — are also flagged. If maintainers are saturated with Topic 2 churn, the question becomes: who is watching the vulnerability surface in the areas that don't generate as much noise?
The quiet topics are the dangerous ones. Topics 4 and 5 are small by signal volume but critical by function. When the loudest topic consumes all the attention, the lower-volume topics, where security and infrastructure issues live, go under-monitored. That's the blind spot.
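The blind spot can be made explicit by crossing signal volume with health status: list the flagged topics whose share of signals is small, quietest first. This is a minimal sketch assuming each topic has a signal count and a status label; the `"Attention"` label matches the health status used in this report, while `TopicStat`, `quiet_flagged_topics`, and the 10% cutoff are hypothetical choices, not the analyzer's.

```python
from typing import NamedTuple


class TopicStat(NamedTuple):
    signals: int   # number of signals assigned to the topic
    status: str    # health label, e.g. "Attention" or "Healthy"


def quiet_flagged_topics(stats, max_share=0.10):
    """Return (topic, share) for flagged topics below max_share of all signals, quietest first."""
    total = sum(s.signals for s in stats.values())
    if total == 0:
        return []
    quiet = [
        (name, s.signals / total)
        for name, s in stats.items()
        if s.status == "Attention" and s.signals / total < max_share
    ]
    return sorted(quiet, key=lambda item: item[1])
```

Sorting by volume ascending puts the topics least likely to draw attention on their own at the top of the list, which is the opposite of how a noisy tracker naturally orders work.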
Backstage is a critical piece of cloud-native infrastructure. The issue tracker tells us the project is healthy in its core areas but is carrying a significant maintenance burden that, left unaddressed, will compound.
This analysis was generated by Beyond The Alignment — NLP-powered signal analysis for open source projects.
Methodology: LDA topic modeling + Claude API for topic enhancement, health classification, and recommended actions.
© 2026 Beyond The Alignment