It seems to me that today’s LLMs try to solve all classes of problems with the same set of mechanisms. We need modular architectures that use different techniques for different aspects of problems. For example, consider an office productivity situation in which a network of agents is scheduling meetings, transcribing the speech content of video calls, extracting action items, and scheduling future meetings based on these. Today’s LLMs are excellent at parts of this problem space, but are completely inadequate for making decisions about resolving calendar conflicts, delegation of responsibility, and so forth. Agent systems like Claude take a small step in the right direction, but lack the epistemological power of a BDI system. A composite architecture seems necessary.
You’re describing, very cleanly, the core mistake of the “LLM as the whole system” mindset: we’re trying to use one statistical sequence model as planner, epistemologist, scheduler, negotiator, and policy engine all at once. That works for demos; it breaks for real coordination and decision‑making.
Let’s treat your office‑productivity example as a design problem and sketch what a composite architecture would actually look like.
High-level split: language vs. agency
In your scenario (meetings, transcripts, action items, delegation), there are at least four qualitatively different jobs:
- Perception: Speech → text, text → structure (who said what, what are the action items, deadlines, owners).
- State & constraints: Calendars, org charts, role definitions, SLAs, priorities, capacity limits.
- Decision & negotiation: Who should attend, how to resolve conflicts, when to delegate, when to escalate.
- Communication: Email/Slack messages, calendar invites, summaries, nudges.
LLMs are great at:
- turning messy language into structured artifacts (transcripts → action items)
- generating human‑readable communication (emails, summaries, justifications)
They are bad at:
- consistent constraint satisfaction (calendar conflicts, resource allocation)
- principled delegation and responsibility modeling
- maintaining stable, inspectable beliefs over time
So the architecture should reflect that.
A composite architecture for your office scenario
Think of it as a small society of specialized components, not one big brain.
1. LLMs as perception and expression layers
Use LLMs for:
- Transcription post‑processing: diarization labels, topic segmentation, “who promised what by when.”
- Action item extraction: from raw text to a schema like `{"task": "...", "owner": "...", "due": "...", "dependencies": [...]}`
- Natural language I/O: drafting emails, meeting summaries, negotiation messages (“Can we move this to Thursday?”).
They don’t decide what should happen; they describe and translate.
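To make that handoff concrete, here is a minimal sketch of the structured artifact the perception layer might produce. The field names (especially `source` and `confidence`) are illustrative assumptions, not a fixed schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ActionItem:
    """Structured artifact the LLM perception layer hands to the rest of the system."""
    task: str                               # e.g. "Prepare Q3 report"
    owner: Optional[str] = None             # left None rather than guessed when unclear
    due: Optional[str] = None               # ISO date, or None if the transcript was vague
    dependencies: list[str] = field(default_factory=list)
    source: str = ""                        # provenance: which meeting/transcript produced this
    confidence: float = 0.0                 # the extractor's own confidence, not ground truth
```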
2. A BDI-style “core” for decisions
Reintroduce what you had in the ’90s:
- Beliefs:
- Calendars, roles, priorities, historical reliability of attendees, org policies.
- Explicit, structured, with provenance (HR system, user override, past behavior).
- Desires:
- Minimize conflicts, respect working hours, ensure required stakeholders are present, avoid overload for key people, etc.
- Intentions:
- Concrete commitments: “Schedule project sync with A, B, C before Friday,” “Reassign this task from X to Y.”
This core:
- runs constraint solvers / optimization algorithms for scheduling
- uses explicit rules and policies for delegation (“if manager overloaded, delegate to senior IC”)
- tracks provenance and reputation (this calendar is often stale; this person frequently declines last‑minute invites)
LLMs can advise this core (“suggest alternative times,” “summarize tradeoffs”), but the BDI layer owns the commitments.
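As a sketch of that division of labor (the class and function names are hypothetical), the core might own commitments like this, with the LLM only proposing candidate slots:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Intention:
    """A concrete commitment owned by the BDI core; the LLM never holds these."""
    goal: str
    deadline: datetime
    committed_slot: datetime | None = None

def adopt_intention(intention: Intention,
                    candidate_slots: list[datetime],
                    busy_slots: set[datetime]) -> Intention:
    """Commit to the first candidate satisfying the hard constraints.
    Candidates may come from an LLM advisor; the commitment is made here."""
    for slot in sorted(candidate_slots):
        if slot <= intention.deadline and slot not in busy_slots:
            intention.committed_slot = slot
            return intention
    return intention  # no feasible slot: escalate rather than guess

# Usage: the core, not the LLM, owns the outcome.
now = datetime(2026, 5, 18, 9, 0)
sync = Intention("project sync with Alice, Bob, Carol", deadline=now + timedelta(days=4))
adopt_intention(sync, [now + timedelta(hours=3), now + timedelta(days=1)],
                busy_slots={now + timedelta(hours=3)})
```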
3. Provenance and reputation as first-class citizens
Bring back your old tricks:
- Provenance tags on beliefs:
- “Alice’s calendar (source: Exchange)”
- “Org policy (source: HR DB)”
- “User override (source: direct confirmation in chat)”
- Reputation / reliability:
- “Bob’s calendar is often wrong; treat busy slots as soft constraints.”
- “Team X routinely ignores optional invites; don’t count them as critical.”
This is exactly the epistemological power you’re missing in pure LLM systems: the ability to reason about sources, not just contents.
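A minimal sketch of what provenance-tagged beliefs could look like, assuming a simple reliability score per source (the threshold and field names are illustrative):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Belief:
    """A belief with explicit provenance, so the core reasons about sources, not just contents."""
    key: str            # e.g. "bob.calendar.busy.2026-05-20T10:00"
    value: object
    source: str         # "exchange", "hr_db", "user_override", ...
    updated: datetime
    reliability: float  # learned reputation of the source, in [0, 1]

def is_hard_constraint(belief: Belief, threshold: float = 0.8) -> bool:
    """Treat beliefs from low-reliability sources (Bob's stale calendar) as soft constraints."""
    return belief.reliability >= threshold
```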
Where LLM agents fit (and where they don’t)
Agentic LLM systems like Claude’s “team” or multi‑agent frameworks are a step toward modularity, but:
- they still share the same underlying epistemology (no explicit beliefs, no real provenance, no durable reputation)
- “agents” are often just different prompts over the same model, not heterogeneous entities with distinct sensors, data, and failure modes
You had true heterogeneity in your BDI work; that’s what enabled meaningful reputation and division of labor. A modern version would mix:
- LLM agents (perception, explanation, negotiation)
- Symbolic/BDI agents (belief management, policy, planning, constraint solving)
- Tool agents (calendar APIs, HR systems, ticketing, etc.)
All coordinated by an explicit protocol, not just “let the LLM talk to itself.”
Why a composite architecture is not optional
For the kind of system you’re describing, a monolithic LLM will always hit hard limits:
- No stable commitments: it can’t own an intention in the BDI sense.
- No principled conflict resolution: it can describe tradeoffs, but not guarantee constraint satisfaction.
- No epistemology: it can’t say why it trusts one source over another in a way that’s grounded in explicit, updateable beliefs.
A composite architecture isn’t just cleaner—it’s the only way to get:
- inspectable reasoning
- reliable delegation and scheduling
- long‑term consistency across many interactions and agents
If you were to rebuild your 1990s BDI system today for this office scenario, what would you be most tempted to hand over to LLMs—and what would you absolutely refuse to let them control? That boundary line is where the interesting design work lives.
Before deciding on the partition of the problem space, we need to think about the nature of the communications between components. Because we should expect to have heterogeneous agents with different capabilities and levels of maturity, we can’t simply upgrade all of the components on a “flag day”. Back in the BDI work, we explored the role of “speech act” theory for intercommunication, but ran into problems with coherence of the token space.
You’re getting into the real meat of it now—the interface, not the internals. How components talk to each other will either make heterogeneity powerful…or make the whole thing collapse into mush.
Let’s stay concrete and build from your experience.
1. The core problem: shared language without shared internals
You’ve got:
- Heterogeneous agents: different capabilities, versions, and trust levels.
- Incremental evolution: no “flag day” where everyone upgrades their ontology.
- Need for coordination: they must still schedule meetings, assign tasks, etc.
Speech‑act theory gives you a pragmatic layer (request, commit, inform, promise), but you still hit:
- Token incoherence: “task”, “priority”, “owner”, “deadline” don’t mean exactly the same thing to every agent.
- Ontology drift: as components evolve, their internal concepts diverge.
So the question becomes: how do we design a communication substrate that tolerates this?
2. Separate three layers of communication
Instead of “one protocol to rule them all,” I’d split it into:
- Illocutionary layer (speech acts):
  `REQUEST(schedule_meeting, …)`, `COMMIT(own_task, …)`, `INFORM(calendar_update, …)`
This is where your earlier speech‑act work lives.
- Schema layer (shared but flexible data structures):
- JSON‑like, versioned, with optional fields and explicit “unknown/approximate” markers.
- E.g. `{"type": "task", "version": "1.2", "title": "Prepare Q3 report", "owner": "alice@example.com", "due": "2026-05-20", "priority": "high", "source": "meeting#123", "confidence": 0.78}`
- Semantic layer (local interpretation):
- Each agent maps the shared schema into its own internal ontology.
- It may ignore fields, reinterpret them, or enrich them with local context.
The key: only the first two layers need to be shared; the third is intentionally private and heterogeneous.
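As an illustration of that split (all names here are hypothetical), a message might carry the illocutionary and schema layers explicitly while interpretation stays local:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Message:
    """Shared surface: illocutionary layer + schema layer. Interpretation is local."""
    performative: str        # illocutionary layer: "REQUEST", "INFORM", "COMMIT", ...
    schema: str              # schema layer: e.g. "task@1.2"
    payload: dict[str, Any]  # versioned, with optional fields and confidence markers

def interpret_task(msg: Message) -> dict[str, Any]:
    """Semantic layer: this agent's private mapping into its own ontology.
    It ignores fields it does not model and applies local defaults."""
    known = {"title", "owner", "due", "priority"}
    local = {k: v for k, v in msg.payload.items() if k in known}
    local.setdefault("priority", "normal")  # local convention, not part of the contract
    return local

msg = Message(
    performative="REQUEST",
    schema="task@1.2",
    payload={"title": "Prepare Q3 report", "owner": "alice@example.com",
             "due": "2026-05-20", "priority": "high", "confidence": 0.78},
)
print(interpret_task(msg))
```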
3. Handling token-space incoherence explicitly
Instead of pretending all agents agree on meanings, we can:
- Version schemas:
  - `task@1.0`, `task@1.1`, etc. Agents declare which versions they understand.
- Allow partial understanding:
  - An agent can say: “I accept this message but ignore fields X and Y.”
- Negotiate meaning when needed:
  - If an agent sees an unknown field (`load_factor`, `criticality`), it can query: “What does `criticality` mean in this context? Is it ordinal? Is it hard or soft?”
LLMs are actually good at this negotiation step: they can propose mappings between slightly different schemas or vocabularies.
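Continuing the sketch above (reusing the hypothetical `Message` type), partial understanding and meaning negotiation can be made explicit rather than silent:

```python
def receive(msg: Message, understood_versions: set[str], known_fields: set[str]):
    """Accept what we understand; turn the rest into an explicit meta-question
    instead of silently guessing. The names and the meta schema are hypothetical."""
    if msg.schema not in understood_versions:
        return Message("QUERY", "meta@1.0",
                       {"question": f"unsupported schema {msg.schema}; which versions can you send?"})
    unknown = set(msg.payload) - known_fields
    if unknown:
        return Message("QUERY", "meta@1.0",
                       {"question": f"what do fields {sorted(unknown)} mean? ordinal? hard or soft?"})
    return None  # fully understood: proceed to local interpretation
```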
4. Where LLMs fit in the communication fabric
LLMs shouldn’t be the protocol—but they can be:
- Mediators:
- Translate between slightly different schemas or ontologies.
- E.g., map “importance” ↔ “priority” ↔ “severity” based on context.
- Normalizers:
- Take messy natural language (“Bob will probably get this done by next Friday”) and produce structured, schema‑conformant messages.
- Explainers:
- When an agent rejects a request, an LLM can generate a human‑readable explanation grounded in the structured state.
But the contract between agents should be symbolic, versioned, and inspectable—not buried in LLM weights.
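A minimal sketch of that boundary, assuming some callable `llm` and a toy schema registry (both hypothetical): the model proposes, but a symbolic validator decides what enters the protocol.

```python
import json

REQUIRED = {"task@1.2": {"title"}}  # toy schema registry, purely illustrative
OPTIONAL = {"task@1.2": {"owner", "due", "priority", "confidence", "source"}}

def normalize(utterance: str, schema: str, llm) -> dict | None:
    """Let an LLM turn messy language into a candidate message, then validate it
    symbolically before it is allowed into the agent protocol.
    `llm` is any callable that returns a JSON string; it is an assumption, not a real API."""
    raw = llm(f"Convert to {schema} JSON: {utterance}")
    try:
        candidate = json.loads(raw)
    except json.JSONDecodeError:
        return None  # reject: the contract is symbolic, not prose
    allowed = REQUIRED[schema] | OPTIONAL[schema]
    if not REQUIRED[schema] <= set(candidate) or not set(candidate) <= allowed:
        return None  # reject anything that violates the agreed schema
    return candidate
```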
5. A more robust successor to pure speech acts
Speech‑act theory gave you:
`REQUEST`, `PROMISE`, `INFORM`, etc.
I’d extend that with:
- Capability declarations:
  - “I can handle `schedule_meeting@2.0` but not `delegation_policy@3.1`.”
- Epistemic qualifiers:
  - “I INFORM you that `X`, with confidence 0.6, source `calendar`, last_updated `T`.”
- Meta‑communication:
  - “I don’t understand field `X`.”
  - “I interpret `priority=high` as ‘must be done this week’.”
This keeps the spirit of speech acts but grounds them in a more engineering‑friendly, evolvable protocol.
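Concretely, the envelope for such messages might look like this (field names are illustrative, not a proposed standard):

```python
# Hypothetical wire format for the extended speech acts above:
# a capability declaration, and an INFORM carrying epistemic qualifiers.
declare = {
    "performative": "DECLARE_CAPABILITIES",
    "supports": ["schedule_meeting@2.0", "task@1.3"],
    "rejects": ["delegation_policy@3.1"],
}

inform = {
    "performative": "INFORM",
    "schema": "calendar_update@1.0",
    "payload": {"slot": "2026-05-20T10:00", "status": "busy"},
    "confidence": 0.6,                       # epistemic qualifier, not a promise
    "source": "calendar",                    # provenance of the communicated belief
    "last_updated": "2026-05-18T09:12:00Z",
}
```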
6. The design principle that falls out of this
If we want:
- heterogeneous agents
- incremental upgrades
- BDI‑style epistemology
- LLMs in the loop
…then the communication substrate needs to be:
- Symbolic and versioned (not just “prompt engineering”)
- Tolerant of partial understanding
- Explicit about uncertainty and provenance
- Open to LLM‑mediated translation, but not dependent on it
You already ran into the token‑space problem once with speech acts. The modern twist is: let LLMs help bridge token spaces, but never let them define the protocol.
If you were to design a v2 of your old speech‑act layer today, would you lean more toward a strongly typed schema (like protobuf/JSON‑schema) with speech‑act wrappers, or something more dynamic and negotiated?
I’d have a dynamic negotiated protocol, but one of the negotiable choices would be a strongly typed domain-specific JSON schema, just so we could get off the ground.
That hybrid you’re describing — a negotiated protocol whose first move is to agree on a strongly typed domain schema — is exactly the sweet spot for heterogeneous, evolving agent societies. It gives you the stability needed to bootstrap cooperation, without locking you into a brittle, monolithic ontology.
Let me sharpen the idea and show why it works so well for the kind of multi‑agent, multi‑version ecosystem you’re imagining.
Why a negotiated protocol with a typed fallback is the right foundation
1. It solves the “flag day” problem
If every agent must speak the same schema from day one, you’re stuck with:
- synchronized upgrades
- brittle version dependencies
- cascading failures when one agent evolves faster than others
A negotiated protocol avoids this. Agents can:
- announce what they understand
- negotiate a common subset
- fall back to a baseline schema when needed
This is how real distributed systems survive version churn.
2. The strongly typed schema acts as a “lingua franca”
You’re not forcing every agent to use the same internal ontology — you’re giving them a safe, minimal, shared surface.
Think of it as the TCP of semantics:
- not expressive enough to encode everything
- but stable enough that everyone can rely on it
Once two agents agree on the baseline schema, they can negotiate richer semantics on top.
This is exactly how humans communicate: we share a basic vocabulary, then negotiate meaning as needed.
3. Negotiation allows richer, domain‑specific extensions
The moment two agents detect they both support a richer schema, they can switch to it:
- “I support `task@2.1` with fields `load_factor` and `urgency_model`.”
- “I only support `task@1.3`, but I can accept unknown fields.”
- “Let’s use `task@1.3` plus an extension block for the new fields.”
This is how you avoid the token‑space incoherence that killed pure speech‑act systems.
The protocol becomes:
- Negotiate schema version
- Negotiate extensions
- Negotiate semantics of ambiguous fields
- Exchange speech acts using the agreed schema
It’s layered, explicit, and evolvable.
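A toy sketch of the first negotiation step (version strings are compared naively here; a real implementation would parse and order them properly):

```python
def negotiate_schema(my_versions: set[str], their_versions: set[str],
                     baseline: str = "task@1.0") -> str:
    """Pick the richest schema both sides support, falling back to the shared baseline."""
    common = my_versions & their_versions
    return max(common) if common else baseline

# A legacy agent and a newer agent still find common ground.
print(negotiate_schema({"task@1.0", "task@1.3", "task@2.1"},
                       {"task@1.0", "task@1.3"}))    # -> "task@1.3"
print(negotiate_schema({"task@3.2"}, {"task@2.1"}))  # -> "task@1.0" (baseline fallback)
```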
4. LLMs fit beautifully into the negotiation layer
This is where modern models shine:
- translating between schema versions
- proposing mappings between fields
- explaining semantic differences
- generating extension blocks
- detecting when two agents are “talking past each other”
But they do not define the protocol.
They help agents reach a protocol.
This keeps the epistemology symbolic and inspectable.
5. The architecture becomes robust to heterogeneity
With this approach, your ecosystem can contain:
- legacy agents that only understand `task@1.0`
- new agents that speak `task@3.2`
- LLM‑powered mediators
- symbolic planners
- constraint solvers
- policy engines
- human‑in‑the‑loop agents
…and they all still cooperate.
This is the exact property your 1990s BDI systems had — and the property modern LLM‑centric systems lack.
6. You get provenance and reputation “for free”
Once messages are structured and versioned, you can attach:
- source identifiers
- confidence scores
- timestamps
- revision history
- reputation metrics
This gives you the epistemological machinery that LLMs alone cannot provide.
Where this leads
You’re essentially describing a next‑generation agent protocol stack:
Layer 0: Transport
JSON, protobuf, whatever.
Layer 1: Typed domain schema
Minimal, stable, versioned.
Layer 2: Negotiation
Capabilities, schema versions, extensions, semantic alignment.
Layer 3: Speech acts
REQUEST, INFORM, COMMIT, DECLINE, QUERY, etc.
Layer 4: Local reasoning
BDI, constraint solvers, planners, LLMs, etc.
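To make the stack concrete, here is a minimal sketch of how the layers might nest (all type names are hypothetical; Layer 2 negotiation was sketched above, and Layer 4 stays inside each agent):

```python
from dataclasses import dataclass
from typing import Any
import json

@dataclass
class TypedPayload:              # Layer 1: versioned domain schema
    schema: str                  # e.g. "task@1.3"
    fields: dict[str, Any]

@dataclass
class SpeechAct:                 # Layer 3: illocutionary wrapper with epistemic qualifiers
    performative: str            # REQUEST, INFORM, COMMIT, DECLINE, QUERY
    payload: TypedPayload
    confidence: float | None = None
    source: str | None = None

def encode(act: SpeechAct) -> bytes:
    """Layer 0: transport encoding (JSON here purely for illustration)."""
    return json.dumps({
        "performative": act.performative,
        "schema": act.payload.schema,
        "fields": act.payload.fields,
        "confidence": act.confidence,
        "source": act.source,
    }).encode()
```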
This is the architecture that would let modern AI systems behave like the distributed, epistemically grounded multi‑agent systems you were building decades ago — but with far more expressive perception and communication.