Building Data-Intensive Applications · Concept Map

Principles over tools.

This is a field guide for building modern applications and systems. No vendor logos — just the concepts that keep meaning stable, risk bounded, and systems trustworthy.

This work was inspired by Stewart Brand's Pace Layering and Martin Kleppmann's Designing Data-Intensive Applications. We're grateful for their frameworks and for sharing them generously with the community.

Orrery Atlas. A mechanical map of principles. The closer a principle is to the core, the more it anchors meaning. Rings show pace: fast surfaces move quickly, slow foundations change rarely, and cross‑cutting forces pull on every layer.
Meaning
Alignment
Interfaces
Experiments
Search & Analytics
Data Contracts
Pipelines
Integration
Ontology
Storage
Transactions
Governance
Security
Observability
Legend · Pace: fast / medium / slow / cross‑cutting · LLM concern: high / medium / low
Plate 01 · Orrery Of Meaning
If the core drifts, every orbit inherits the ambiguity.
01 · Map · Click any concept. Opens the inspector: role, key decision, deeper topics, and see-also links.
02 · Lenses · Toggle overlays. Highlight semantic drift, LLM risk, pace layer, coupling, or failure mode across the map.
03 · Inspector · Read, decide, navigate. Each node explains its role, surfaces the decision it forces, and links to related concepts.
04 · Orrery · Click any planet. Jumps to that concept on the map. Orbit speed reflects pace; the fast ring changes weekly.

A quick reference for what to consider when building applications and systems. Start with the Map, then flip on lenses to see risk, drift, and pace.

Bring a notebook. This guide is meant for annotating tradeoffs, not choosing tools.

Use the Lenses tab to reveal overlays and the legend key.

Field Notes · Architecture Map
Layers

Draw a boundary where language changes, not where teams sit. Most modernization failures happen after the tools are chosen — because meanings were never aligned.

Quick Reference — Beyond Semantic Risk

LLM systems checklist
Reliability & Safety
Guardrails, fallback behavior, and safe failure modes when the model is wrong.
Determinism & Invariants
Where LLM output must obey rules (transactions, workflows, calculations).
Data Contracts
Strict schemas for tool calls, events, and outputs; validation is non‑negotiable.
Observability
Semantic regression tests, drift alerts, and traceability for model decisions.
Security & Abuse
Prompt injection, data exfiltration, and policy compliance across tools.
Human Escalation
Clear handoff paths, overrides, and accountability when the model is uncertain.
Evaluation Harness
Red‑team tests, regression suites, and prompt/model versioning discipline.
Cost & Latency
Token budget, caching, and model routing that preserves UX and margins.
Use this checklist alongside the map: first decide what must be true (principles), then choose how fast each part can change (pace).
Plate 04 · LLM Reference Notes
If you can’t test it, you can’t trust it.
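The "Data Contracts" and "Determinism & Invariants" items above can be made concrete. A minimal sketch, in Python, of validating an LLM-produced tool call against a strict schema before executing it; the `get_invoice` tool and its fields are hypothetical, and a real system would likely use a schema library rather than this hand-rolled check:

```python
# Minimal sketch: reject an LLM tool call that violates its contract
# before it runs. The tool name and field names are illustrative only.
TOOL_SCHEMAS = {
    "get_invoice": {
        "required": {"invoice_id": str},
        "optional": {"include_line_items": bool},
    },
}

def validate_tool_call(name, args):
    """Return a list of violations; an empty list means the call may run."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return [f"unknown tool: {name}"]
    errors = []
    for field, ftype in schema["required"].items():
        if field not in args:
            errors.append(f"missing required field: {field}")
        elif not isinstance(args[field], ftype):
            errors.append(f"{field}: expected {ftype.__name__}")
    allowed = set(schema["required"]) | set(schema["optional"])
    for field in args:
        if field not in allowed:
            # Unknown fields are rejected, not silently dropped: the
            # checklist's "validation is non-negotiable" stance.
            errors.append(f"unexpected field: {field}")
    return errors
```

The design choice worth noting is that unknown fields fail the call rather than being ignored, which keeps model-invented parameters from quietly becoming load-bearing.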

Field Notes — Analogies & Rubrics

Notes for reflection

Use these analogies to explain the stack without mentioning tools. Pair them with the rubric to assess LLM‑specific risk.

Foundations
Like building codes: you rarely change them, but everything depends on them.
System Models
Like architectural blueprints: they define the shape of meaning across systems.
Dataflow
Like logistics networks: timing and routing determine what the system can promise.
Serving
Like storefronts: fast‑changing experiences that must still honor the core language.
Operations
Like air‑traffic control: continuous coordination that keeps the whole system safe.
LLM Concern Rubric
Concern = drift likelihood × blast radius. High means meaning can change and many systems/users are affected.
Plate 06 · Analogies Field Notes
An analogy that holds under pressure is a model worth keeping.
Field Notes · Lens Guide

Risk Intersection — The Danger Zone

Where lenses overlap

When high semantic drift, high LLM concern, and fast pace converge, you get the highest-risk nodes. These are where meaning shifts quickly, models amplify the drift, and users feel it immediately.

APIs & Query
Surface contracts change often, many consumers depend on them, and LLM output flows through here.
Search & Indexing
Relevance models shift, labels drift, and users notice quality changes immediately.
Event-Driven Systems
Event schemas bind producers and consumers. Schema evolution is silent until something breaks.
ML & Features
Feature stores, training/serving skew, and label drift compound over time without detection.
Experiences (UX)
Terminology changes cascade to user understanding. Misaligned labels erode trust.
Data Contracts
The line of visibility. When contracts slip, both sides of the boundary lose alignment.
Invest in contracts, regression tests, and semantic monitoring at these nodes first.

Semantic Drift

Lens

Meaning diverges quietly at boundaries — teams, services, schema versions — until the same word refers to different things in different places.

"Customer" in billing is not "customer" in support. "Active" in analytics is not "active" in auth. Each drift is defensible in isolation; together they make integration expensive and trust fragile. The nodes highlighted in this lens are where divergence is most likely to compound undetected.

What to look for: Schema migrations that rename without aligning, API versions that fork meaning, dashboard metrics that use different definitions of the same term, and handoffs where no one owns the canonical definition.
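One way to surface the drift this lens describes is to diff the definitions each system holds for the same term. A minimal sketch, assuming each team publishes a small glossary (the system names and definitions below are invented for illustration):

```python
# Minimal sketch: flag terms that carry more than one live definition
# across system glossaries. Inputs are illustrative, not prescriptive.
def find_drift(glossaries):
    """glossaries: {system: {term: definition}}.
    Returns {term: {definition: [systems]}} for terms with >1 definition."""
    by_term = {}
    for system, terms in glossaries.items():
        for term, definition in terms.items():
            by_term.setdefault(term, {}).setdefault(definition, []).append(system)
    return {t: defs for t, defs in by_term.items() if len(defs) > 1}

glossaries = {
    "billing": {"customer": "entity with a paid subscription"},
    "support": {"customer": "anyone who filed a ticket"},
    "auth": {"active": "logged in within 30 days"},
}
drifted = find_drift(glossaries)  # "customer" drifts; "active" does not
```

Exact-string comparison is deliberately crude; the point is that even this catches the "customer in billing is not customer in support" case before integration makes it expensive.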

LLM Concern

Lens

Concern = drift likelihood × blast radius.

A concept has high LLM concern when model output directly shapes decisions and many users or systems are affected. Hallucination in search is worse than in a batch log. Schema violation in an API contract breaks downstream consumers.

What to look for: LLM output consumed without validation, model-generated labels becoming canonical, prompt changes that alter system behavior.
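The rubric above (concern = drift likelihood × blast radius) is simple enough to score directly. A sketch with both inputs on a 0.0 to 1.0 scale; the band cutoffs of 0.5 and 0.2 are assumptions, not part of the rubric, and should be tuned to your system:

```python
# Minimal sketch of the concern rubric: concern = drift likelihood
# x blast radius. Band thresholds (0.5, 0.2) are assumed, not canonical.
def llm_concern(drift_likelihood, blast_radius):
    """Both inputs in [0.0, 1.0]; returns (score, coarse band)."""
    score = drift_likelihood * blast_radius
    band = "high" if score >= 0.5 else "medium" if score >= 0.2 else "low"
    return score, band
```

For example, a search-relevance concept (drift 0.9, radius 0.8) scores high, while a batch log (drift 0.1, radius 0.9) scores low, matching the "hallucination in search is worse than in a batch log" intuition.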

Pace Layers

Lens

Not all parts of a system should change at the same speed — and forcing them to is a primary source of architectural debt.

Fast layers (serving, product interfaces) change weekly or monthly. Slow layers (data models, storage infrastructure, foundational schemas) change yearly. The biggest risk is pace mismatch: a fast-layer team that owns a slow-layer dependency, or a slow-layer team pressured to ship at fast-layer velocity. Cross-cutting ops concerns don't change on a schedule — they're always active.

What to look for: Dependencies that cross pace boundaries in the wrong direction, governance processes that can't keep up with deployment cadence, and slow-layer decisions being made by teams who feel fast-layer pressure.

Coupling

Lens

How entangled a concept is with the rest of the system.

High coupling means changes cascade — data models, encoding formats, transaction semantics. Low coupling means you can swap implementations — storage engines, consensus algorithms. When modernizing, start with low-coupling concepts; when protecting, prioritize high-coupling ones.

What to look for: Concepts where a rename would require changes in five or more systems, schemas that three or more teams depend on, interfaces that both humans and machines consume.
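The "rename would require changes in five or more systems" heuristic is easy to encode once you track fan-in. A sketch, assuming you can enumerate the systems that would change on a rename; the concept names are invented, and the threshold of 5 simply mirrors the heuristic above:

```python
# Minimal sketch: flag high-coupling concepts by fan-in. The threshold
# of 5 mirrors the "five or more systems" heuristic; tune as needed.
def high_coupling(dependents, threshold=5):
    """dependents: {concept: set of systems that change on a rename}."""
    return sorted(c for c, systems in dependents.items()
                  if len(systems) >= threshold)

dependents = {
    "customer_id": {"billing", "support", "auth", "search", "reporting"},
    "cache_ttl": {"serving"},
}
hot = high_coupling(dependents)  # only customer_id crosses the threshold
```

Per the guidance above, concepts in `hot` are the ones to protect first; low-fan-in concepts like `cache_ttl` are the safer places to start modernizing.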

Failure Modes

Lens

The failures you don't see are worse than the ones that page you at 3am.

Loud failures — storage outages, consensus failures, crashed services — are hard to miss and tend to be well-instrumented. Silent failures are the opposite: semantic drift in event schemas, stale derived data presented as fresh, governance gaps that compound over quarters. Toggle the Failure Mode lens on the map to see which nodes carry which risk. The most dangerous intersection is silent + high LLM concern: the model amplifies a drift no one has noticed.

Loud failures
Storage, consensus, replication, transactions — hard to miss, usually alertable. Design for fast recovery.
Silent failures
Encoding drift, consistency violations, governance gaps, stale caches — compound over weeks. Design for detection, not just prevention.
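"Design for detection, not just prevention" can start very small. A sketch of one silent-failure detector: stale derived data served as fresh. It assumes each feed records a last-update timestamp and a freshness budget; the feed names and budgets are illustrative.

```python
import time

# Minimal sketch: detect derived data that has outlived its freshness
# budget. Without a check like this, staleness is served silently.
def stale_feeds(last_updated, max_age_s, now=None):
    """last_updated: {feed: epoch seconds of last refresh}.
    max_age_s: {feed: freshness budget in seconds}."""
    now = time.time() if now is None else now
    return sorted(f for f, ts in last_updated.items()
                  if now - ts > max_age_s[f])

last_updated = {"orders": 400, "sessions": 990}
max_age_s = {"orders": 500, "sessions": 60}
overdue = stale_feeds(last_updated, max_age_s, now=1000)  # orders is stale
```

The same shape generalizes to the other silent failures listed: any invariant you can timestamp or count can be turned into an alert instead of a quarter-long surprise.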
Plate 05 · Lens Field Notes
Every lens is a question. The answer depends on your system.
Field Notes · Pace Map
The same system, viewed by pace. Front stage changes fast, back stage changes slow, and the line of visibility is where alignment is enforced.
Front Stage · Fast Pace
Serving & Product surfaces
Product Interfaces
APIs, UX copy, and surface semantics that change frequently.
3 concepts · Fast
Experimentation
A/B tests, feature flags, and shifting labels.
2 concepts · Fast
Query Experiences
Search relevance, dashboards, and analyst views.
2 concepts · Fast
Line Of Visibility
Dataflow & contracts
Data Contracts
Schemas, naming standards, and compatibility policies.
4 concepts · Medium
Dataflow & Pipelines
Batch, streaming, CDC, and transformation semantics.
4 concepts · Medium
Integration Surface
APIs, events, and shared data products.
3 concepts · Medium
Back Stage · Slow Pace
Models & foundations
Ontology & Core Semantics
Canonical definitions, invariants, and entity models.
3 concepts · Slow
Storage & Replication
Data models, transactions, and consistency guarantees.
4 concepts · Slow
Platform Runtime
Resilience, scalability, and recovery mechanics.
3 concepts · Slow
Cross-Cutting · Everywhere · Always
Semantic Governance
Ontology stewardship, glossary review, and drift detection.
4 concepts · Cross-cutting
Quality & Observability
Testing, monitoring, and semantic regression alerts.
3 concepts · Cross-cutting

Cross-cutting capabilities like governance and security usually cost less when they’re designed early, not retrofitted.

Plate 03 · Pace Field Notes
Fast layers change first, but slow layers decide the cost.