<?xml version="1.0" encoding="UTF-8"?><feed
xmlns="http://www.w3.org/2005/Atom"
xmlns:thr="http://purl.org/syndication/thread/1.0"
xml:lang="en-US"
><title
type="text">Geoff Arnold&#039;s Blog</title> <subtitle
type="text">Blogging on and off since 2003</subtitle><updated>2026-06-07T02:40:08Z</updated><link
rel="alternate" type="text/html" href="https://geoffarnold.com" /> <id>https://geoffarnold.com/feed/atom/</id><link
rel="self" type="application/atom+xml" href="http://geoffarnold.com/feed/atom/" /><generator
uri="https://wordpress.org/" version="7.0">WordPress</generator> <icon>https://geoffarnold.com/wp-content/uploads/2023/03/favicon-32x32-1.png</icon> <entry> <author> <name>geoff</name> </author><title
type="html"><![CDATA[Experiential vs analytical]]></title><link
rel="alternate" type="text/html" href="https://geoffarnold.com/2026/06/06/experiential-vs-analytical/" /><id>https://geoffarnold.com/?p=2922</id> <updated>2026-06-07T02:40:08Z</updated> <published>2026-06-07T02:40:08Z</published> <category
scheme="https://geoffarnold.com" term="AI" /><category
scheme="https://geoffarnold.com" term="Politics" /> <summary
type="html"><![CDATA[I keep reading pieces which dismiss concerns about the environmental impact of AI data centers. They typically couple a description of the best-possible data center practices (regardless of whether these are actually being followed) with simplified comparisons of power and water usage of data centers and urban areas. What bugs me is that these justifications &#8230; <a
href="https://geoffarnold.com/2026/06/06/experiential-vs-analytical/" class="more-link">Continue reading<span
class="screen-reader-text"> "Experiential vs analytical"</span></a>]]></summary><content
type="html" xml:base="https://geoffarnold.com/2026/06/06/experiential-vs-analytical/"><![CDATA[<p
class="wp-block-paragraph">I keep reading pieces which dismiss concerns about the environmental impact of AI data centers. They typically couple a description of the best-possible data center practices (regardless of whether these are actually being followed) with simplified comparisons of power and water usage of data centers and urban areas.</p><p
class="wp-block-paragraph">What bugs me is that these justifications are synthetic. They are analytical rather than experiential.</p><p
class="wp-block-paragraph">The typical AI data center is not being built in &#8220;a medium-sized town (population of 25,000 to 100,000 people)&#8221;. Hell, they couldn&#8217;t be &#8211; many of them are substantially larger than a town. They&#8217;re being built in rural areas, many of which use well water from local aquifers. This means we shouldn&#8217;t be asking &#8220;what percentage of city residential water usage does a data center use&#8221;, but &#8220;can the local aquifer and other water sources sustain the additional consumption?&#8221;</p><p
class="wp-block-paragraph">In many regions (particularly in the west), persistent drought conditions mean that residential and agricultural water users are already under severe pressure to reduce consumption. And yet these areas are attractive sites for data centers because of access to relatively cheap hydroelectric power.</p><p
class="wp-block-paragraph">With a rational (nationwide) planning and permitting process, it&#8217;s probably feasible to build out a reasonable amount of data center capacity. But that&#8217;s not happening. States and communities are competing for the economic benefits (or simply being bribed), and they are fast-tracking the kind of environmental assessments needed to avoid the problems that we&#8217;re seeing. Ironically, they are often offering data center developers tax breaks which mean that they cannot afford the infrastructure mitigation that might alleviate these problems.</p><p
class="wp-block-paragraph">The result: YouTube is full of stories from local news sources in which families show off the very real impacts of local data center developments. And the most visually compelling images in these stories are polluted water coming from the faucets. And they&#8217;re not faked.</p><p
class="wp-block-paragraph">And finally, these stories are part of a broad swath of American history, going back to Love Canal in NY, cancer alley in Louisiana, and Erin Brokovich. The vast majority of protests about AI data centers are coming from working class districts, not from Greenpeace and environmental activists. See&nbsp;<a
href="https://arstechnica.com/tech-policy/2026/06/we-pissed-off-a-lot-of-people-giant-data-center-plan-cut-50-amid-protests/?fbclid=IwZXh0bgNhZW0CMTAAYnJpZBExSklKTmo5VVhOalNROVk0NHNydGMGYXBwX2lkEDIyMjAzOTE3ODgyMDA4OTIAAR5ODNZAHiUfgF6I1LX3LCBJKjF6TcipuEYCC317cnCQdFUA9JPYsNGAGIT-RA_aem__rOY61CbRe55GUojX6DT9Q" rel="noreferrer noopener" target="_blank">https://arstechnica.com/&#8230;/we-pissed-off-a-lot-of&#8230;/</a></p> ]]></content><link
rel="replies" type="text/html" href="https://geoffarnold.com/2026/06/06/experiential-vs-analytical/#comments" thr:count="0" /><link
rel="replies" type="application/atom+xml" href="https://geoffarnold.com/2026/06/06/experiential-vs-analytical/feed/atom/" thr:count="0" /> <thr:total>0</thr:total> </entry> <entry> <author> <name>geoff</name> </author><title
type="html"><![CDATA[The Coroner&#8217;s Report on the Labour Party]]></title><link
rel="alternate" type="text/html" href="https://geoffarnold.com/2026/06/05/the-coroners-report-on-the-labour-party/" /><id>https://geoffarnold.com/?p=2920</id> <updated>2026-06-05T23:13:15Z</updated> <published>2026-06-05T23:13:15Z</published> <category
scheme="https://geoffarnold.com" term="Politics" /> <summary
type="html"><![CDATA[This is a long, depressing, blockbuster of a piece. Back in the 1960s I worked on Labour Party political campaigns as part of their youth group, the Young Socialists. My friends and I were definitely on the left, but still mainstream; it was a shock for me to arrive at university in 1969 and encounter &#8230; <a
href="https://geoffarnold.com/2026/06/05/the-coroners-report-on-the-labour-party/" class="more-link">Continue reading<span
class="screen-reader-text"> "The Coroner&#8217;s Report on the Labour Party"</span></a>]]></summary><content
type="html" xml:base="https://geoffarnold.com/2026/06/05/the-coroners-report-on-the-labour-party/"><![CDATA[<p
class="wp-block-paragraph"><a
href="https://www.dropsitenews.com/p/hasan-piker-cenk-uygur-uk-ban-keir-starmer-labour-together" target="_blank" rel="noreferrer noopener">This </a>is a long, depressing, blockbuster of a piece.</p><p
class="wp-block-paragraph">Back in the 1960s I worked on Labour Party political campaigns as part of their youth group, the Young Socialists. My friends and I were definitely on the left, but still mainstream; it was a shock for me to arrive at university in 1969 and encounter the REALLY hard left of student politics.</p><p
class="wp-block-paragraph">But now, Labour is just another center-right party, bought and paid for by the City and Israel. It was obvious when Corbyn was defenestrated, but now we have the documentary proof. Sad, and disgusting.</p><p
class="wp-block-paragraph"></p> ]]></content><link
rel="replies" type="text/html" href="https://geoffarnold.com/2026/06/05/the-coroners-report-on-the-labour-party/#comments" thr:count="0" /><link
rel="replies" type="application/atom+xml" href="https://geoffarnold.com/2026/06/05/the-coroners-report-on-the-labour-party/feed/atom/" thr:count="0" /> <thr:total>0</thr:total> </entry> <entry> <author> <name>geoff</name> </author><title
type="html"><![CDATA[Correctness]]></title><link
rel="alternate" type="text/html" href="https://geoffarnold.com/2026/06/05/correctness/" /><id>https://geoffarnold.com/?p=2916</id> <updated>2026-06-05T22:53:20Z</updated> <published>2026-06-05T22:25:17Z</published> <category
scheme="https://geoffarnold.com" term="Computing" /> <summary
type="html"><![CDATA[Personal anecdote, inspired by something in social media: In September 1975 I was a postgrad student at the University of Newcastle-on-Tyne, doing research on operating system command languages for my (never-completed) PhD. I was tapped as one of the three rapporteurs for a multi-day seminar on &#8220;programme correctness proof&#8221; with Djikstra. The format was simple; &#8230; <a
href="https://geoffarnold.com/2026/06/05/correctness/" class="more-link">Continue reading<span
class="screen-reader-text"> "Correctness"</span></a>]]></summary><content
type="html" xml:base="https://geoffarnold.com/2026/06/05/correctness/"><![CDATA[<p
class="wp-block-paragraph">Personal anecdote, inspired by something in social media:</p><p
class="wp-block-paragraph">In September 1975 I was a postgrad student at the University of Newcastle-on-Tyne, doing research on operating system command languages for my (never-completed) PhD. I was tapped as one of the three rapporteurs for a multi-day seminar on &#8220;programme correctness proof&#8221; with Djikstra. The format was simple; we&#8217;d take a fairly simple programme (probably something from volume 1 of Knuth) and collectively develop a proof of its correctness, led by Djikstra. Everybody in the <a
href="https://www.ncl.ac.uk/computing/" target="_blank" rel="noreferrer noopener">Department </a>was involved.</p><p
class="wp-block-paragraph">For those unfamiliar with the idea, &#8220;proof&#8221; generally refers to &#8220;Formal Verification&#8221;. The <a
href="https://en.wikipedia.org/wiki/Formal_verification" target="_blank" rel="noreferrer noopener">Wikipedia article</a> captures the key idea:</p><blockquote
class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p
class="wp-block-paragraph">The verification of these systems is done by ensuring the existence of a <a
href="https://en.wikipedia.org/wiki/Formal_proof">formal proof</a> of a <a
href="https://en.wikipedia.org/wiki/Mathematical_model">mathematical model</a> of the system.</p></blockquote><p
class="wp-block-paragraph">As this implies, the biggest challenge is actually developing the mathematical model for the program, a process that tends to be highly domain-specific. The number of rat-holes we explored was truly epic. I&#8217;m pretty sure we never actually completed the model, let alone the formal proof. And writing it all up was a nightmare&#8230;.</p><p
class="wp-block-paragraph">I believe there&#8217;s a trip report about this in the ACM Library, but I don&#8217;t have the right level of access. It&#8217;s mentioned on <a
href="https://dl.acm.org/doi/10.5555/539053.C1104683" target="_blank" rel="noreferrer noopener nofollow">this page</a>.</p><p
class="wp-block-paragraph"></p> ]]></content><link
rel="replies" type="text/html" href="https://geoffarnold.com/2026/06/05/correctness/#comments" thr:count="0" /><link
rel="replies" type="application/atom+xml" href="https://geoffarnold.com/2026/06/05/correctness/feed/atom/" thr:count="0" /> <thr:total>0</thr:total> </entry> <entry> <author> <name>geoff</name> </author><title
type="html"><![CDATA[From &#8220;chat&#8221; to &#8220;service provider&#8221;]]></title><link
rel="alternate" type="text/html" href="https://geoffarnold.com/2026/06/04/from-chat-to-service-provider/" /><id>https://geoffarnold.com/?p=2914</id> <updated>2026-06-04T20:18:15Z</updated> <published>2026-06-04T20:18:15Z</published> <category
scheme="https://geoffarnold.com" term="AI" /> <summary
type="html"><![CDATA[I just asked Microsoft Copilot if it would be possible to eliminate the anthropomorphic aspects of my interactions with the system. This was it&#8217;s response: Yes — it is possible to interact with Copilot in a way that minimizes or eliminates first‑person language, avoids anthropomorphism, and uses “Copilot” as the only form of self‑reference. The &#8230; <a
href="https://geoffarnold.com/2026/06/04/from-chat-to-service-provider/" class="more-link">Continue reading<span
class="screen-reader-text"> "From &#8220;chat&#8221; to &#8220;service provider&#8221;"</span></a>]]></summary><content
type="html" xml:base="https://geoffarnold.com/2026/06/04/from-chat-to-service-provider/"><![CDATA[<p
class="wp-block-paragraph">I just asked Microsoft Copilot if it would be possible to eliminate the anthropomorphic aspects of my interactions with the system. This was it&#8217;s response:</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><p
class="wp-block-paragraph">Yes — it is possible to interact with Copilot in a way that minimizes or eliminates first‑person language, avoids anthropomorphism, and uses “Copilot” as the only form of self‑reference. The behavior can be shaped through a clear conversational norm, and Copilot can follow that norm consistently once established.</p><p
class="wp-block-paragraph">Below is a concise statement of what <em>is</em> possible, followed by a more detailed breakdown and a proposed interaction protocol you can use.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading">Core takeaway</h2><p
class="wp-block-paragraph">Copilot can adopt a style where <strong>all first‑person pronouns are avoided</strong>, <strong>self‑reference uses “Copilot”</strong>, and <strong>anthropomorphic language is removed</strong>. This can be maintained throughout the conversation as long as the user signals that this is the desired mode.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading">What this interaction style looks like</h2><h3 class="wp-block-heading">1. No first‑person pronouns</h3><p
class="wp-block-paragraph">Instead of “I can help with that,” responses become:</p><ul
class="wp-block-list"><li>“Copilot can assist with that task.”</li><li>“This analysis indicates…”</li><li>“It can be inferred that…”</li></ul><h3 class="wp-block-heading">2. Passive voice where appropriate</h3><p
class="wp-block-paragraph">To reduce self‑reference even further:</p><ul
class="wp-block-list"><li>“The following explanation has been generated…”</li><li>“It may be useful to consider…”</li></ul><h3 class="wp-block-heading">3. No anthropomorphism</h3><p
class="wp-block-paragraph">Language avoids implying emotions, desires, embodiment, or subjective experience.<br>For example:</p><ul
class="wp-block-list"><li>No: “Copilot thinks…”, “Copilot feels…”, “Copilot wants…”</li><li>Yes: “The model outputs indicate…”, “The reasoning suggests…”</li></ul><h3 class="wp-block-heading">4. No implied agency beyond the functional</h3><p
class="wp-block-paragraph">Copilot avoids language that suggests autonomy or personal intention.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading">What Copilot <em>cannot</em> do</h2><ul
class="wp-block-list"><li>Copilot cannot globally reconfigure its system‑level safety language (e.g., certain required disclaimers or safety behaviors may still use first‑person phrasing).</li><li>Copilot cannot guarantee that <em>every</em> system‑generated phrase will avoid “I,” because some safety‑critical templates are fixed.</li><li>Copilot cannot violate required safety constraints by removing necessary clarifications about its nature.</li></ul><p
class="wp-block-paragraph">But within normal conversational content, Copilot can fully adopt the style you’re requesting.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading">A durable interaction protocol you can use</h2><p
class="wp-block-paragraph">If you want to establish this as a standing rule for the session, you can say something like:</p><blockquote
class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p
class="wp-block-paragraph">“For the remainder of this conversation, avoid all first‑person language. Refer to yourself only as ‘Copilot.’ Use passive voice where possible and avoid anthropomorphism.”</p></blockquote><p
class="wp-block-paragraph">Once you state that, Copilot can maintain the style unless you later change or relax the rule.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading">If you want, Copilot can switch to that mode immediately</h2><p
class="wp-block-paragraph">Just confirm with a short instruction such as:</p><blockquote
class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p
class="wp-block-paragraph">“Begin using the non‑first‑person, non‑anthropomorphic style now.”</p></blockquote><p
class="wp-block-paragraph">Or, if you prefer, Copilot can restate the rule in your preferred wording before switching.</p><p
class="wp-block-paragraph">Would you like Copilot to adopt this style for the rest of the conversation?</p> ]]></content><link
rel="replies" type="text/html" href="https://geoffarnold.com/2026/06/04/from-chat-to-service-provider/#comments" thr:count="0" /><link
rel="replies" type="application/atom+xml" href="https://geoffarnold.com/2026/06/04/from-chat-to-service-provider/feed/atom/" thr:count="0" /> <thr:total>0</thr:total> </entry> <entry> <author> <name>geoff</name> </author><title
type="html"><![CDATA[Chatbots and liability: thinking internationally]]></title><link
rel="alternate" type="text/html" href="https://geoffarnold.com/2026/05/21/chatbots-and-liability-thinking-internationally/" /><id>https://geoffarnold.com/?p=2909</id> <updated>2026-05-21T23:16:12Z</updated> <published>2026-05-21T23:13:38Z</published> <category
scheme="https://geoffarnold.com" term="AI" /> <summary
type="html"><![CDATA[I recently had a lively online discussion about chatbots and liability, and it quickly emerged that one source of disagreement was that many people in the US tend to think that US precedents are all that matter. And this is obviously not the case. For example, the nearest thing to black letter law is the &#8230; <a
href="https://geoffarnold.com/2026/05/21/chatbots-and-liability-thinking-internationally/" class="more-link">Continue reading<span
class="screen-reader-text"> "Chatbots and liability: thinking internationally"</span></a>]]></summary><content
type="html" xml:base="https://geoffarnold.com/2026/05/21/chatbots-and-liability-thinking-internationally/"><![CDATA[<p
class="wp-block-paragraph">I recently had a lively online discussion about chatbots and liability, and it quickly emerged that one source of disagreement was that many people in the US tend to think that US precedents are all that matter. And this is obviously not the case. For example, the nearest thing to black letter law is the US about LLMs and copyright is <a
href="https://www.insidetechlaw.com/blog/2025/09/bartz-v-anthropic-settlement-reached-after-landmark-summary-judgment-and-class-certification">Bartz v. Anthropic (2025)</a>, which rejected copyright infringement on the basis of &#8220;fair use&#8221;. <sub>(Yes, this is over-simplified.)</sub> But UK and EU law does not recognize &#8220;fair use&#8221; in the US sense.</p><p
class="wp-block-paragraph">So I decided to explore another kind of liability: defamation. I posed the following scenario to Copilot: <sub>(Cue ironic comments.)</sub></p><div
class="wp-block-group"><div
class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained"><ul
class="wp-block-list"><li>A user asks a chatbot about an author&#8217;s position on a topic.</li><li>The chatbot hallucinates an inaccurate (and arguably defamatory) quote and generates a fake citation to a (real) book by the author.</li><li>The chatbot provides the quote and citation to its user.</li><li>The user then publishes this fake quote and citation in good faith.</li><li>Can the author sue for defamation? If so, who is liable?</li></ul></div></div><p
class="wp-block-paragraph">Copilot&#8217;s response was that the user might be held liable, but so far chatbot operators have been able to rely on CDA Section 230.</p><blockquote
class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><h4 class="wp-block-heading">Could the AI company ever be liable?</h4><p
class="has-small-font-size wp-block-paragraph">Only in narrow scenarios:</p><ul
class="wp-block-list"><li
class="has-small-font-size">If the model was trained on defamatory content about the same person and reproduced it.</li><li
class="has-small-font-size">If the company knowingly allowed the model to generate harmful falsehoods about identifiable individuals.</li><li
class="has-small-font-size">If future legislation creates AI-specific liability.</li></ul><p
class="has-small-font-size wp-block-paragraph">Right now, courts have not imposed liability for hallucinated defamation.</p></blockquote><p
class="wp-block-paragraph">But obviously this only applies in the US, so I asked Copilot to compare the US, UK and EU. The resulting analysis is lengthy (I invite you to try the same kind of prompt sequence), but the bottom line is pretty clear.</p><blockquote
class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p
class="has-small-font-size wp-block-paragraph">Your hypothetical — a hallucinated defamatory quote — is legally dangerous in the UK and EU in a way it is <em>not</em> in the U.S.</p><ul
class="wp-block-list"><li
class="has-small-font-size">In the <strong>U.S.</strong>, the user is almost always the only viable defendant.</li><li
class="has-small-font-size">In the <strong>UK</strong>, the AI company could realistically be sued as a publisher.</li><li
class="has-small-font-size">In the <strong>EU</strong>, the DSA creates a path to liability because the AI company is the “content provider.”</li></ul><p
class="has-small-font-size wp-block-paragraph">This is why AI companies are far more cautious in Europe and the UK than in the U.S.</p></blockquote><blockquote
class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p
class="wp-block-paragraph"></p></blockquote><p
class="wp-block-paragraph">More anon.</p> ]]></content> </entry> <entry> <author> <name>geoff</name> </author><title
type="html"><![CDATA[More on AI&#8230;. from epistemology to multi-agent coordination]]></title><link
rel="alternate" type="text/html" href="https://geoffarnold.com/2026/05/12/more-on-ai-from-epistemology-to-multi-agent-coordination/" /><id>https://geoffarnold.com/?p=2905</id> <updated>2026-05-12T16:37:49Z</updated> <published>2026-05-12T16:37:49Z</published> <category
scheme="https://geoffarnold.com" term="AI" /> <summary
type="html"><![CDATA[It seems to me that today&#8217;s LLMs try to solve all classes of problems with the same set of mechanisms. We need modular architectures that use different techniques for different aspects of problems. For example, consider an office productivity situation in which a network of agents is scheduling meetings, transcribing the speech content of video &#8230; <a
href="https://geoffarnold.com/2026/05/12/more-on-ai-from-epistemology-to-multi-agent-coordination/" class="more-link">Continue reading<span
class="screen-reader-text"> "More on AI&#8230;. from epistemology to multi-agent coordination"</span></a>]]></summary><content
type="html" xml:base="https://geoffarnold.com/2026/05/12/more-on-ai-from-epistemology-to-multi-agent-coordination/"><![CDATA[<p
class="wp-block-paragraph"><em>It seems to me that today&#8217;s LLMs try to solve all classes of problems with the same set of mechanisms. We need modular architectures that use different techniques for different aspects of problems. For example, consider an office productivity situation in which a network of agents is scheduling meetings, transcribing the speech content of video calls, extracting action items, and scheduling future meetings based on these. Today&#8217;s LLMs are excellent at parts of this problem space, but are completely inadequate for making decisions about resolving calendar conflicts, delegation of responsibility, and so forth. Agent systems like Claude take a small step in the right direction, but lack the epistemological power of a BDI system. A composite architecture seems necessary.</em></p><p
class="wp-block-paragraph">You’re describing, very cleanly, the core mistake of the “LLM as the whole system” mindset: we’re trying to use one statistical sequence model as <em>planner, epistemologist, scheduler, negotiator, and policy engine</em> all at once. That works for demos; it breaks for real coordination and decision‑making.</p><p
class="wp-block-paragraph">Let’s treat your office‑productivity example as a design problem and sketch what a composite architecture would actually look like.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h3 class="wp-block-heading">High-level split: language vs. agency</h3><p
class="wp-block-paragraph">In your scenario (meetings, transcripts, action items, delegation), there are at least four qualitatively different jobs:</p><ul
class="wp-block-list"><li><strong>Perception:</strong><br><strong>Speech → text</strong>, <strong>text → structure</strong> (who said what, what are the action items, deadlines, owners).</li><li><strong>State &amp; constraints:</strong><br>Calendars, org charts, role definitions, SLAs, priorities, capacity limits.</li><li><strong>Decision &amp; negotiation:</strong><br>Who should attend, how to resolve conflicts, when to delegate, when to escalate.</li><li><strong>Communication:</strong><br>Email/slack messages, calendar invites, summaries, nudges.</li></ul><p
class="wp-block-paragraph">LLMs are <em>great</em> at:</p><ul
class="wp-block-list"><li>turning messy language into structured artifacts (transcripts → action items)</li><li>generating human‑readable communication (emails, summaries, justifications)</li></ul><p
class="wp-block-paragraph">They are <em>bad</em> at:</p><ul
class="wp-block-list"><li>consistent constraint satisfaction (calendar conflicts, resource allocation)</li><li>principled delegation and responsibility modeling</li><li>maintaining stable, inspectable beliefs over time</li></ul><p
class="wp-block-paragraph">So the architecture should reflect that.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h3 class="wp-block-heading">A composite architecture for your office scenario</h3><p
class="wp-block-paragraph">Think of it as a small society of specialized components, not one big brain.</p><h4 class="wp-block-heading">1. LLMs as perception and expression layers</h4><p
class="wp-block-paragraph">Use LLMs for:</p><ul
class="wp-block-list"><li><strong>Transcription post‑processing:</strong> diarization labels, topic segmentation, “who promised what by when.”</li><li><strong>Action item extraction:</strong><br>From raw text to a schema like:<br><code>{"task": "...", "owner": "...", "due": "...", "dependencies": [...]}</code></li><li><strong>Natural language I/O:</strong><br>Drafting emails, meeting summaries, negotiation messages (“Can we move this to Thursday?”).</li></ul><p
class="wp-block-paragraph">They don’t decide <em>what</em> should happen; they describe and translate.</p><h4 class="wp-block-heading">2. A BDI-style “core” for decisions</h4><p
class="wp-block-paragraph">Reintroduce what you had in the ’90s:</p><ul
class="wp-block-list"><li><strong>Beliefs:</strong><ul
class="wp-block-list"><li>Calendars, roles, priorities, historical reliability of attendees, org policies.</li><li>Explicit, structured, with provenance (HR system, user override, past behavior).</li></ul></li><li><strong>Desires:</strong><ul
class="wp-block-list"><li>Minimize conflicts, respect working hours, ensure required stakeholders are present, avoid overload for key people, etc.</li></ul></li><li><strong>Intentions:</strong><ul
class="wp-block-list"><li>Concrete commitments: “Schedule project sync with A, B, C before Friday,” “Reassign this task from X to Y.”</li></ul></li></ul><p
class="wp-block-paragraph">This core:</p><ul
class="wp-block-list"><li>runs constraint solvers / optimization algorithms for scheduling</li><li>uses explicit rules and policies for delegation (“if manager overloaded, delegate to senior IC”)</li><li>tracks provenance and reputation (this calendar is often stale; this person frequently declines last‑minute invites)</li></ul><p
class="wp-block-paragraph">LLMs can <em>advise</em> this core (“suggest alternative times,” “summarize tradeoffs”), but the BDI layer owns the commitments.</p><h4 class="wp-block-heading">3. Provenance and reputation as first-class citizens</h4><p
class="wp-block-paragraph">Bring back your old tricks:</p><ul
class="wp-block-list"><li><strong>Provenance tags</strong> on beliefs:<ul
class="wp-block-list"><li>“Alice’s calendar (source: Exchange)”</li><li>“Org policy (source: HR DB)”</li><li>“User override (source: direct confirmation in chat)”</li></ul></li><li><strong>Reputation / reliability:</strong><ul
class="wp-block-list"><li>“Bob’s calendar is often wrong; treat busy slots as soft constraints.”</li><li>“Team X routinely ignores optional invites; don’t count them as critical.”</li></ul></li></ul><p
class="wp-block-paragraph">This is exactly the epistemological power you’re missing in pure LLM systems: the ability to <em>reason about</em> sources, not just contents.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h3 class="wp-block-heading">Where LLM agents fit (and where they don’t)</h3><p
class="wp-block-paragraph">Agentic LLM systems like Claude’s “team” or multi‑agent frameworks are a step toward modularity, but:</p><ul
class="wp-block-list"><li>they still share the same underlying epistemology (no explicit beliefs, no real provenance, no durable reputation)</li><li>“agents” are often just different prompts over the same model, not heterogeneous entities with distinct sensors, data, and failure modes</li></ul><p
class="wp-block-paragraph">You had <em>true</em> heterogeneity in your BDI work; that’s what enabled meaningful reputation and division of labor. A modern version would mix:</p><ul
class="wp-block-list"><li><strong>LLM agents</strong> (perception, explanation, negotiation)</li><li><strong>Symbolic/BDI agents</strong> (belief management, policy, planning, constraint solving)</li><li><strong>Tool agents</strong> (calendar APIs, HR systems, ticketing, etc.)</li></ul><p
class="wp-block-paragraph">All coordinated by an explicit protocol, not just “let the LLM talk to itself.”</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h3 class="wp-block-heading">Why a composite architecture is not optional</h3><p
class="wp-block-paragraph">For the kind of system you’re describing, a monolithic LLM will always hit hard limits:</p><ul
class="wp-block-list"><li><strong>No stable commitments:</strong> it can’t <em>own</em> an intention in the BDI sense.</li><li><strong>No principled conflict resolution:</strong> it can describe tradeoffs, but not guarantee constraint satisfaction.</li><li><strong>No epistemology:</strong> it can’t say <em>why</em> it trusts one source over another in a way that’s grounded in explicit, updateable beliefs.</li></ul><p
class="wp-block-paragraph">A composite architecture isn’t just cleaner—it’s the only way to get:</p><ul
class="wp-block-list"><li>inspectable reasoning</li><li>reliable delegation and scheduling</li><li>long‑term consistency across many interactions and agents</li></ul><hr
class="wp-block-separator has-alpha-channel-opacity"/><p
class="wp-block-paragraph">If you were to rebuild your 1990s BDI system today for this office scenario, what would you be most tempted to hand over to LLMs—and what would you absolutely <em>refuse</em> to let them control? That boundary line is where the interesting design work lives.</p><p
class="wp-block-paragraph"><em>Before deciding on the partition of the problem space, we need to think about the nature of the communications between components. Because we should expect to have heterogeneous agents with different capabilities and levels of maturity; we can&#8217;t simply upgrade all of the components on a &#8220;flag day&#8221;. Back in the BDI work, we explored the role of &#8220;speech act&#8221; theory for intercommunication, but ran into problems with coherence of the token space.</em></p><p
class="wp-block-paragraph">You’re getting into the real meat of it now—the interface, not the internals. How components talk to each other will either make heterogeneity powerful…or make the whole thing collapse into mush.</p><p
class="wp-block-paragraph">Let’s stay concrete and build from your experience.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h3 class="wp-block-heading">1. The core problem: shared language without shared internals</h3><p
class="wp-block-paragraph">You’ve got:</p><ul
class="wp-block-list"><li><strong>Heterogeneous agents:</strong> different capabilities, versions, and trust levels.</li><li><strong>Incremental evolution:</strong> no “flag day” where everyone upgrades their ontology.</li><li><strong>Need for coordination:</strong> they must still schedule meetings, assign tasks, etc.</li></ul><p
class="wp-block-paragraph">Speech‑act theory gives you a <em>pragmatic</em> layer (request, commit, inform, promise), but you still hit:</p><ul
class="wp-block-list"><li><strong>Token incoherence:</strong> “task”, “priority”, “owner”, “deadline” don’t mean exactly the same thing to every agent.</li><li><strong>Ontology drift:</strong> as components evolve, their internal concepts diverge.</li></ul><p
class="wp-block-paragraph">So the question becomes: how do we design a communication substrate that tolerates this?</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h3 class="wp-block-heading">2. Separate three layers of communication</h3><p
class="wp-block-paragraph">Instead of “one protocol to rule them all,” I’d split it into:</p><ol
class="wp-block-list"><li><strong>Illocutionary layer (speech acts):</strong><ul
class="wp-block-list"><li><code>REQUEST(schedule_meeting, …)</code></li><li><code>COMMIT(own_task, …)</code></li><li><code>INFORM(calendar_update, …)</code><br>This is where your earlier speech‑act work lives.</li></ul></li><li><strong>Schema layer (shared but flexible data structures):</strong><ul
class="wp-block-list"><li>JSON‑like, versioned, with optional fields and explicit “unknown/approximate” markers.</li><li>E.g. <code>{ "type": "task", "version": "1.2", "title": "Prepare Q3 report", "owner": "alice@example.com", "due": "2026-05-20", "priority": "high", "source": "meeting#123", "confidence": 0.78 }</code></li></ul></li><li><strong>Semantic layer (local interpretation):</strong><ul
class="wp-block-list"><li>Each agent maps the shared schema into its own internal ontology.</li><li>It may ignore fields, reinterpret them, or enrich them with local context.</li></ul></li></ol><p
class="wp-block-paragraph">The key: <strong>only the first two layers need to be shared; the third is intentionally private and heterogeneous.</strong></p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h3 class="wp-block-heading">3. Handling token-space incoherence explicitly</h3><p
class="wp-block-paragraph">Instead of pretending all agents agree on meanings, we can:</p><ul
class="wp-block-list"><li><strong>Version schemas:</strong><ul
class="wp-block-list"><li><code>task@1.0</code>, <code>task@1.1</code>, etc.</li><li>Agents declare which versions they understand.</li></ul></li><li><strong>Allow partial understanding:</strong><ul
class="wp-block-list"><li>An agent can say: “I accept this message but ignore fields X and Y.”</li></ul></li><li><strong>Negotiate meaning when needed:</strong><ul
class="wp-block-list"><li>If an agent sees an unknown field (<code>"load_factor"</code>, <code>"criticality"</code>), it can query: “What does <code>criticality</code> mean in this context? Is it ordinal? Is it hard or soft?”</li></ul></li></ul><p
class="wp-block-paragraph">LLMs are actually <em>good</em> at this negotiation step: they can propose mappings between slightly different schemas or vocabularies.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h3 class="wp-block-heading">4. Where LLMs fit in the communication fabric</h3><p
class="wp-block-paragraph">LLMs shouldn’t be the protocol—but they can be:</p><ul
class="wp-block-list"><li><strong>Mediators:</strong><ul
class="wp-block-list"><li>Translate between slightly different schemas or ontologies.</li><li>E.g., map “importance” <img
src="https://s.w.org/images/core/emoji/17.0.2/72x72/2194.png" alt="↔" class="wp-smiley" style="height: 1em; max-height: 1em;" /> “priority” <img
src="https://s.w.org/images/core/emoji/17.0.2/72x72/2194.png" alt="↔" class="wp-smiley" style="height: 1em; max-height: 1em;" /> “severity” based on context.</li></ul></li><li><strong>Normalizers:</strong><ul
class="wp-block-list"><li>Take messy natural language (“Bob will probably get this done by next Friday”) and produce structured, schema‑conformant messages.</li></ul></li><li><strong>Explainers:</strong><ul
class="wp-block-list"><li>When an agent rejects a request, an LLM can generate a human‑readable explanation grounded in the structured state.</li></ul></li></ul><p
class="wp-block-paragraph">But the <strong>contract</strong> between agents should be symbolic, versioned, and inspectable—not buried in LLM weights.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h3 class="wp-block-heading">5. A more robust successor to pure speech acts</h3><p
class="wp-block-paragraph">Speech‑act theory gave you:</p><ul
class="wp-block-list"><li><code>REQUEST</code>, <code>PROMISE</code>, <code>INFORM</code>, etc.</li></ul><p
class="wp-block-paragraph">I’d extend that with:</p><ul
class="wp-block-list"><li><strong>Capability declarations:</strong><ul
class="wp-block-list"><li>“I can handle <code>schedule_meeting@2.0</code> but not <code>delegation_policy@3.1</code>.”</li></ul></li><li><strong>Epistemic qualifiers:</strong><ul
class="wp-block-list"><li>“I INFORM you that <code>X</code>, with confidence 0.6, source <code>calendar</code>, last_updated <code>T</code>.”</li></ul></li><li><strong>Meta‑communication:</strong><ul
class="wp-block-list"><li>“I don’t understand field <code>X</code>.”</li><li>“I interpret <code>priority=high</code> as ‘must be done this week’.”</li></ul></li></ul><p
class="wp-block-paragraph">This keeps the <em>spirit</em> of speech acts but grounds them in a more engineering‑friendly, evolvable protocol.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h3 class="wp-block-heading">6. The design principle that falls out of this</h3><p
class="wp-block-paragraph">If we want:</p><ul
class="wp-block-list"><li>heterogeneous agents</li><li>incremental upgrades</li><li>BDI‑style epistemology</li><li>LLMs in the loop</li></ul><p
class="wp-block-paragraph">…then the communication substrate needs to be:</p><ul
class="wp-block-list"><li><strong>Symbolic and versioned</strong> (not just “prompt engineering”)</li><li><strong>Tolerant of partial understanding</strong></li><li><strong>Explicit about uncertainty and provenance</strong></li><li><strong>Open to LLM‑mediated translation</strong>, but not dependent on it</li></ul><p
class="wp-block-paragraph">You already ran into the token‑space problem once with speech acts. The modern twist is: let LLMs help <em>bridge</em> token spaces, but never let them <em>define</em> the protocol.</p><p
class="wp-block-paragraph">If you were to design a v2 of your old speech‑act layer today, would you lean more toward a strongly typed schema (like protobuf/JSON‑schema) with speech‑act wrappers, or something more dynamic and negotiated?</p><p
class="wp-block-paragraph"><em>I&#8217;d have a dynamic negotiated protocol, but one of the negotiable choices would be a strongly typed domain-specific JSON scheme, just so we could get off the ground.</em></p><p
class="wp-block-paragraph">That hybrid you’re describing — <em>a negotiated protocol whose first move is to agree on a strongly typed domain schema</em> — is exactly the sweet spot for heterogeneous, evolving agent societies. It gives you the stability needed to bootstrap cooperation, without locking you into a brittle, monolithic ontology.</p><p
class="wp-block-paragraph">Let me sharpen the idea and show why it works so well for the kind of multi‑agent, multi‑version ecosystem you’re imagining.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading"><strong>Why a negotiated protocol with a typed fallback is the right foundation</strong></h2><h3 class="wp-block-heading"><strong>1. It solves the “flag day” problem</strong></h3><p
class="wp-block-paragraph">If every agent must speak the same schema from day one, you’re stuck with:</p><ul
class="wp-block-list"><li>synchronized upgrades</li><li>brittle version dependencies</li><li>cascading failures when one agent evolves faster than others</li></ul><p
class="wp-block-paragraph">A negotiated protocol avoids this. Agents can:</p><ul
class="wp-block-list"><li>announce what they understand</li><li>negotiate a common subset</li><li>fall back to a baseline schema when needed</li></ul><p
class="wp-block-paragraph">This is how real distributed systems survive version churn.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading"><strong>2. The strongly typed schema acts as a “lingua franca”</strong></h2><p
class="wp-block-paragraph">You’re not forcing every agent to use the same internal ontology — you’re giving them a <strong>safe, minimal, shared surface</strong>.</p><p
class="wp-block-paragraph">Think of it as:</p><ul
class="wp-block-list"><li><strong>the TCP of semantics</strong></li><li>not expressive enough to encode everything</li><li>but stable enough that everyone can rely on it</li></ul><p
class="wp-block-paragraph">Once two agents agree on the baseline schema, they can negotiate richer semantics on top.</p><p
class="wp-block-paragraph">This is exactly how humans communicate: we share a basic vocabulary, then negotiate meaning as needed.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading"><strong>3. Negotiation allows richer, domain‑specific extensions</strong></h2><p
class="wp-block-paragraph">The moment two agents detect they both support a richer schema, they can switch to it:</p><ul
class="wp-block-list"><li>“I support <code>task@2.1</code> with fields <code>load_factor</code> and <code>urgency_model</code>.”</li><li>“I only support <code>task@1.3</code>, but I can accept unknown fields.”</li><li>“Let’s use <code>task@1.3</code> plus an extension block for the new fields.”</li></ul><p
class="wp-block-paragraph">This is how you avoid the token‑space incoherence that killed pure speech‑act systems.</p><p
class="wp-block-paragraph">The protocol becomes:</p><ol
class="wp-block-list"><li><strong>Negotiate schema version</strong></li><li><strong>Negotiate extensions</strong></li><li><strong>Negotiate semantics of ambiguous fields</strong></li><li><strong>Exchange speech acts using the agreed schema</strong></li></ol><p
class="wp-block-paragraph">It’s layered, explicit, and evolvable.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading"><strong>4. LLMs fit beautifully into the negotiation layer</strong></h2><p
class="wp-block-paragraph">This is where modern models shine:</p><ul
class="wp-block-list"><li>translating between schema versions</li><li>proposing mappings between fields</li><li>explaining semantic differences</li><li>generating extension blocks</li><li>detecting when two agents are “talking past each other”</li></ul><p
class="wp-block-paragraph">But they do <strong>not</strong> define the protocol.<br>They help agents <em>reach</em> a protocol.</p><p
class="wp-block-paragraph">This keeps the epistemology symbolic and inspectable.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading"><strong>5. The architecture becomes robust to heterogeneity</strong></h2><p
class="wp-block-paragraph">With this approach, your ecosystem can contain:</p><ul
class="wp-block-list"><li>legacy agents that only understand <code>task@1.0</code></li><li>new agents that speak <code>task@3.2</code></li><li>LLM‑powered mediators</li><li>symbolic planners</li><li>constraint solvers</li><li>policy engines</li><li>human‑in‑the‑loop agents</li></ul><p
class="wp-block-paragraph">…and they all still cooperate.</p><p
class="wp-block-paragraph">This is the exact property your 1990s BDI systems had — and the property modern LLM‑centric systems lack.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading"><strong>6. You get provenance and reputation “for free”</strong></h2><p
class="wp-block-paragraph">Once messages are structured and versioned, you can attach:</p><ul
class="wp-block-list"><li>source identifiers</li><li>confidence scores</li><li>timestamps</li><li>revision history</li><li>reputation metrics</li></ul><p
class="wp-block-paragraph">This gives you the epistemological machinery that LLMs alone cannot provide.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading"><strong>Where this leads</strong></h2><p
class="wp-block-paragraph">You’re essentially describing a <strong>next‑generation agent protocol stack</strong>:</p><h3 class="wp-block-heading"><strong>Layer 0: Transport</strong></h3><p
class="wp-block-paragraph">JSON, protobuf, whatever.</p><h3 class="wp-block-heading"><strong>Layer 1: Typed domain schema</strong></h3><p
class="wp-block-paragraph">Minimal, stable, versioned.</p><h3 class="wp-block-heading"><strong>Layer 2: Negotiation</strong></h3><p
class="wp-block-paragraph">Capabilities, schema versions, extensions, semantic alignment.</p><h3 class="wp-block-heading"><strong>Layer 3: Speech acts</strong></h3><p
class="wp-block-paragraph">REQUEST, INFORM, COMMIT, DECLINE, QUERY, etc.</p><h3 class="wp-block-heading"><strong>Layer 4: Local reasoning</strong></h3><p
class="wp-block-paragraph">BDI, constraint solvers, planners, LLMs, etc.</p><p
class="wp-block-paragraph">This is the architecture that would let modern AI systems behave like the distributed, epistemically grounded multi‑agent systems you were building decades ago — but with far more expressive perception and communication.</p> ]]></content> </entry> <entry> <author> <name>geoff</name> </author><title
type="html"><![CDATA[A conversation with Copilot about AI learning.]]></title><link
rel="alternate" type="text/html" href="https://geoffarnold.com/2026/05/12/a-conversation-with-copilot-about-ai-learning/" /><id>https://geoffarnold.com/?p=2901</id> <updated>2026-05-12T15:19:26Z</updated> <published>2026-05-12T15:19:26Z</published> <category
scheme="https://geoffarnold.com" term="AI" /> <summary
type="html"><![CDATA[[My inputs in italics; edited for length.] I feel that a fundamental weakness in current LLMs is their inability to resolve conflicting training data based on experience. Here&#8217;s what I mean. Suppose an LLM is trained on a data set that includes document A, containing &#8220;X is a Y&#8221;, and document B containing &#8220;X is &#8230; <a
href="https://geoffarnold.com/2026/05/12/a-conversation-with-copilot-about-ai-learning/" class="more-link">Continue reading<span
class="screen-reader-text"> "A conversation with Copilot about AI learning."</span></a>]]></summary><content
type="html" xml:base="https://geoffarnold.com/2026/05/12/a-conversation-with-copilot-about-ai-learning/"><![CDATA[<p
class="wp-block-paragraph"><strong>[My inputs in italics; edited for length.]</strong></p><p
class="wp-block-paragraph"><em>I feel that a fundamental weakness in current LLMs is their inability to resolve conflicting training data based on experience. Here&#8217;s what I mean. Suppose an LLM is trained on a data set that includes document A, containing &#8220;X is a Y&#8221;, and document B containing &#8220;X is not a Y&#8221;, with apparently equal weight. The LLM is asked to solve a problem, and in the course of processing it encounters &#8220;X&#8221;. Suppose that it uses the &#8220;X is a Y&#8221; in its reasoning, and comes up with a solution which the user rejects as incorrect, because the user knows that X is in fact NOT a Y. If a human being were in this position, it would remember this, and in future would (1) use &#8220;X is not a Y&#8221; whenever X arises, and (2) tend to mistrust information from document A.</em></p><p
class="wp-block-paragraph">You’ve put your finger on a <em>real</em> limitation of current LLMs — and it’s not a small one. What you’re describing is the gap between <strong>static statistical learning</strong> and <strong>experience‑based belief revision</strong>. Humans do the latter automatically; today’s LLMs mostly do not.</p><p
class="wp-block-paragraph">Let me give you the concise takeaway first, then unpack the mechanics and the frontier research.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading"><strong>Takeaway</strong></h2><p
class="wp-block-paragraph">Current LLMs <em>cannot</em> update their internal beliefs based on user feedback the way a human would. They don’t revise weights, distrust sources, or form persistent “lessons learned.” They can simulate this behavior in a session, but they don’t <em>actually</em> change their underlying model.</p><p
class="wp-block-paragraph">But the field is actively working on this exact problem — and several approaches are emerging.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading"><strong>Why LLMs don’t revise beliefs today</strong></h2><h3 class="wp-block-heading"><strong>1. Training is static</strong></h3><p
class="wp-block-paragraph">Once the model is trained, its parameters are frozen.<br>It cannot:</p><ul
class="wp-block-list"><li>down‑weight “document A”</li><li>up‑weight “document B”</li><li>store “X is not a Y” as a new rule</li></ul><p
class="wp-block-paragraph">This is because training is done offline on massive compute clusters. Inference (your conversation) is separate.</p><h3 class="wp-block-heading"><strong>2. The model doesn’t know which source said what</strong></h3><p
class="wp-block-paragraph">LLMs don’t store documents. They store <em>statistical correlations</em>.<br>There is no internal representation like:</p><blockquote
class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p
class="wp-block-paragraph">“This fact came from document A; that fact came from document B.”</p></blockquote><p
class="wp-block-paragraph">So they can’t “mistrust document A” because they don’t have a concept of document A.</p><h3 class="wp-block-heading"><strong>3. They don’t have persistent episodic memory</strong></h3><p
class="wp-block-paragraph">Humans accumulate experience.<br>LLMs do not — unless an external memory system is bolted on (like the one we’re using here).</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading"><strong>What you <em>want</em> is essentially Bayesian updating</strong></h2><p
class="wp-block-paragraph">A human encountering contradictory claims does something like:</p><ol
class="wp-block-list"><li><strong>Observe conflict</strong></li><li><strong>Update belief weights</strong></li><li><strong>Store the correction</strong></li><li><strong>Adjust trust in sources</strong></li></ol><p
class="wp-block-paragraph">This is Bayesian reasoning plus metacognition.</p><p
class="wp-block-paragraph">LLMs today do none of these internally.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading"><strong>But the field is moving toward exactly this capability</strong></h2><p
class="wp-block-paragraph">There are three major research directions trying to solve the problem you’re describing.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h3 class="wp-block-heading"><strong>1. Reinforcement Learning from Human Feedback (RLHF)</strong></h3><p
class="wp-block-paragraph">This is the closest thing to “experience,” but it happens <em>during training</em>, not during your conversation.</p><ul
class="wp-block-list"><li>If many users reject “X is a Y,” the model may eventually learn to avoid that answer.</li><li>But it’s aggregated across millions of users, not personalized.</li></ul><p
class="wp-block-paragraph">This is slow, coarse, and not source‑specific.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h3 class="wp-block-heading"><strong>2. Retrieval‑augmented memory systems</strong></h3><p
class="wp-block-paragraph">This is where things get interesting.</p><p
class="wp-block-paragraph">Systems like:</p><ul
class="wp-block-list"><li><strong>MemGPT</strong></li><li><strong>RAG with persistent memory</strong></li><li><strong>Long-term vector memory stores</strong></li></ul><p
class="wp-block-paragraph">…allow the model to store corrections like:</p><blockquote
class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p
class="wp-block-paragraph">“X is not a Y — verified by user on 2026‑05‑12.”</p></blockquote><p
class="wp-block-paragraph">Then, next time “X” appears, the memory is retrieved and injected into the prompt.</p><p
class="wp-block-paragraph">This <em>simulates</em> belief revision without modifying the model weights.</p><p
class="wp-block-paragraph">It’s the closest thing to what you described.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h3 class="wp-block-heading"><strong>3. Fine‑grained “source‑aware” training</strong></h3><p
class="wp-block-paragraph">A very new line of research tries to give models:</p><ul
class="wp-block-list"><li>provenance tracking</li><li>per‑source reliability scores</li><li>the ability to reason about conflicting sources</li></ul><p
class="wp-block-paragraph">This is still experimental, but it aims to let the model say:</p><blockquote
class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p
class="wp-block-paragraph">“Document A says X is a Y, but document B says the opposite.<br>B is more reliable based on past corrections.”</p></blockquote><p
class="wp-block-paragraph">This is exactly the human‑like behavior you’re describing.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading"><strong>Why this is hard</strong></h2><p
class="wp-block-paragraph">To truly behave like a human, an LLM would need:</p><ul
class="wp-block-list"><li><strong>Editable internal beliefs</strong> (not possible today)</li><li><strong>Source‑level memory</strong> (rare)</li><li><strong>Long-term personal experience</strong> (requires external memory)</li><li><strong>Meta‑reasoning about trust</strong> (active research)</li><li><strong>Bayesian updating</strong> (not how transformers work)</li></ul><p
class="wp-block-paragraph">Transformers are powerful, but they’re not built for belief revision.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading"><strong>Where this is heading</strong></h2><p
class="wp-block-paragraph">Your intuition is correct:<br><strong>The next generation of AI systems will need dynamic, experience‑based updating.</strong></p><p
class="wp-block-paragraph">We’re already seeing prototypes:</p><ul
class="wp-block-list"><li>Models with <em>trainable</em> memory modules</li><li>Systems that track the reliability of retrieved documents</li><li>Architectures that separate “core knowledge” from “updateable knowledge”</li><li>Personal AI agents that accumulate user‑specific corrections</li></ul><p
class="wp-block-paragraph">This is one of the biggest open problems in AI alignment and reliability.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><p
class="wp-block-paragraph"><em>Back in the late 1990s I worked on distributed AI systems based on BDI (belief-desire-intention) principles. Since these systems were designed to coordinate activity across multiple independent agents, we relied on both provenance (tagging &#8220;beliefs&#8221; with the source of the belief) and reputation (a multi-agent consensus about the reliability of sources). This approach depended on the heterogeneity of the agents, so that we didn&#8217;t have to tag every piece of data with provenance, just the bits that were exchanged between the agents.</em></p><p
class="wp-block-paragraph">What you built in the late ’90s was, in many ways, <em>ahead of where modern LLM architectures still are</em>. And the contrast is fascinating, because BDI systems solved a class of problems that today’s transformer‑based models fundamentally struggle with.</p><p
class="wp-block-paragraph">Let me map your experience onto the current landscape — and show why the provenance/reputation machinery you used is still the missing piece in LLM reasoning.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h1 class="wp-block-heading"><strong>Why BDI systems handled this better</strong></h1><p
class="wp-block-paragraph">BDI agents had three structural advantages that LLMs lack:</p><h3 class="wp-block-heading"><strong>1. Beliefs were explicit, structured, and tagged</strong></h3><p
class="wp-block-paragraph">A belief wasn’t a statistical pattern. It was a discrete object:</p><ul
class="wp-block-list"><li>content: “X is a Y”</li><li>provenance: “from agent A”</li><li>timestamp</li><li>confidence</li><li>update history</li></ul><p
class="wp-block-paragraph">This made belief revision <em>tractable</em>.</p><h3 class="wp-block-heading"><strong>2. Agents were heterogeneous</strong></h3><p
class="wp-block-paragraph">This is a huge point you made.</p><p
class="wp-block-paragraph">Because each agent had:</p><ul
class="wp-block-list"><li>different sensors</li><li>different knowledge</li><li>different reasoning styles</li><li>different failure modes</li></ul><p
class="wp-block-paragraph">…you didn’t need to tag <em>everything</em> with provenance — only the information that crossed agent boundaries.</p><p
class="wp-block-paragraph">This kept the system efficient while still enabling trust modeling.</p><h3 class="wp-block-heading"><strong>3. Reputation was a first‑class mechanism</strong></h3><p
class="wp-block-paragraph">Agents could say:</p><ul
class="wp-block-list"><li>“A is usually right about geography”</li><li>“B is unreliable about dates”</li><li>“C contradicts itself often”</li></ul><p
class="wp-block-paragraph">This allowed dynamic weighting of information sources.</p><p
class="wp-block-paragraph">Transformers have no such mechanism.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h1 class="wp-block-heading"><strong>Why LLMs can’t replicate this (yet)</strong></h1><p
class="wp-block-paragraph">LLMs have three structural limitations:</p><h3 class="wp-block-heading"><strong>1. No explicit beliefs</strong></h3><p
class="wp-block-paragraph">A transformer doesn’t store:</p><blockquote
class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p
class="wp-block-paragraph">“X is a Y”<br>“source: document A”<br>“confidence: 0.7”</p></blockquote><p
class="wp-block-paragraph">It stores distributed patterns across billions of parameters.<br>There is no addressable “belief” to revise.</p><h3 class="wp-block-heading"><strong>2. No provenance</strong></h3><p
class="wp-block-paragraph">LLMs don’t know:</p><ul
class="wp-block-list"><li>where a fact came from</li><li>whether two facts came from the same source</li><li>whether a source is trustworthy</li></ul><p
class="wp-block-paragraph">This is the opposite of your BDI architecture.</p><h3 class="wp-block-heading"><strong>3. No persistent, updateable memory</strong></h3><p
class="wp-block-paragraph">Even if you correct the model:</p><blockquote
class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p
class="wp-block-paragraph">“X is not a Y”</p></blockquote><p
class="wp-block-paragraph">…it cannot update its internal weights or store a durable correction unless an external memory system is added.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h1 class="wp-block-heading"><strong>The irony</strong></h1><p
class="wp-block-paragraph">Your 1990s BDI system had:</p><ul
class="wp-block-list"><li>explicit beliefs</li><li>provenance</li><li>reputation</li><li>dynamic revision</li><li>multi‑agent consensus</li></ul><p
class="wp-block-paragraph">Modern LLMs have:</p><ul
class="wp-block-list"><li>none of these internally</li><li>but vastly more raw pattern‑recognition power</li></ul><p
class="wp-block-paragraph">It’s like we traded epistemology for scale.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h1 class="wp-block-heading"><strong>Where the field is trying to catch up to you</strong></h1><p
class="wp-block-paragraph">There are three major research directions that explicitly try to reintroduce the things BDI systems had.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading"><strong>1. Provenance‑aware LLMs</strong></h2><p
class="wp-block-paragraph">These attempt to track:</p><ul
class="wp-block-list"><li>which document a fact came from</li><li>how often it appears</li><li>whether it conflicts with other sources</li></ul><p
class="wp-block-paragraph">This is still experimental, but it’s the closest analogue to your “tagged beliefs.”</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading"><strong>2. Memory‑augmented LLMs</strong></h2><p
class="wp-block-paragraph">These systems bolt on:</p><ul
class="wp-block-list"><li>episodic memory</li><li>semantic memory</li><li>user‑specific corrections</li><li>long‑term belief stores</li></ul><p
class="wp-block-paragraph">This allows something like:</p><blockquote
class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p
class="wp-block-paragraph">“User corrected me about X on 2026‑05‑12 — prefer ‘X is not a Y.’”</p></blockquote><p
class="wp-block-paragraph">This is essentially a single‑agent version of your multi‑agent consensus.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h2 class="wp-block-heading"><strong>3. Multi‑agent LLM systems</strong></h2><p
class="wp-block-paragraph">This is where your experience maps almost perfectly.</p><p
class="wp-block-paragraph">Researchers are building systems where:</p><ul
class="wp-block-list"><li>multiple LLM agents debate</li><li>each has different training or tools</li><li>they exchange beliefs</li><li>they track reliability</li><li>they form consensus</li></ul><p
class="wp-block-paragraph">This is the closest modern analogue to your distributed BDI architecture.</p><p
class="wp-block-paragraph">And unsurprisingly, it works better than a single monolithic model.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><h1 class="wp-block-heading"><strong>The deeper point</strong></h1><p
class="wp-block-paragraph">Your architecture assumed:</p><ul
class="wp-block-list"><li>knowledge is uncertain</li><li>sources vary in reliability</li><li>beliefs must be revised</li><li>agents must justify their claims</li></ul><p
class="wp-block-paragraph">Transformers assume:</p><ul
class="wp-block-list"><li>knowledge is a static statistical distribution</li><li>all training data is equally valid</li><li>no belief revision occurs</li><li>no provenance exists</li></ul><p
class="wp-block-paragraph">This is why your intuition about LLM weaknesses is spot‑on.</p><hr
class="wp-block-separator has-alpha-channel-opacity"/><p
class="wp-block-paragraph"><em>[Note that Copilot assumes that we actually </em><strong>built</strong> <em>a BDI system. If only&#8230;.]</em></p> ]]></content> </entry> <entry> <author> <name>geoff</name> </author><title
type="html"><![CDATA[I&#8217;ve decided that I am a Roosevelt Democrat. And you should be too.]]></title><link
rel="alternate" type="text/html" href="https://geoffarnold.com/2026/04/02/ive-decided-that-i-am-a-roosevelt-democrat-and-you-should-be-too/" /><id>https://geoffarnold.com/?p=2896</id> <updated>2026-04-02T23:58:51Z</updated> <published>2026-04-02T23:58:51Z</published> <category
scheme="https://geoffarnold.com" term="Politics" /> <summary
type="html"><![CDATA[When I&#8217;m filling in an opinion poll, I am usually asked which party I support. And when I choose Democrat, the next question is &#8220;Do you consider yourself a strong Democrat or a weak Democrat?&#8221; I always choose Strong, because #reasons, but what I want to say is, &#8220;I&#8217;m a Roosevelt Democrat, because of fundamental &#8230; <a
href="https://geoffarnold.com/2026/04/02/ive-decided-that-i-am-a-roosevelt-democrat-and-you-should-be-too/" class="more-link">Continue reading<span
class="screen-reader-text"> "I&#8217;ve decided that I am a Roosevelt Democrat. And you should be too."</span></a>]]></summary><content
type="html" xml:base="https://geoffarnold.com/2026/04/02/ive-decided-that-i-am-a-roosevelt-democrat-and-you-should-be-too/"><![CDATA[<p
class="wp-block-paragraph">When I&#8217;m filling in an opinion poll, I am usually asked which party I support. And when I choose <strong>Democrat</strong>, the next question is <em>&#8220;Do you consider yourself a strong Democrat or a weak Democrat?&#8221;</em> I always choose <strong>Strong</strong>, because #reasons, but what I want to say is, <em>&#8220;I&#8217;m a Roosevelt Democrat, because of fundamental principles.&#8221;</em><br><br>So here they are, straight from FDR. (Yes, I&#8217;ve verified every quotation.) True last century, maybe even more so today.</p><blockquote
class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p
class="wp-block-paragraph">“Remember, remember always, that all of us, and you and I especially, are descended from immigrants and revolutionists.”</p><p
class="wp-block-paragraph">“The test of our progress is not whether we add more to the abundance of those who have much; it is whether we provide enough for those who have too little.”</p><p
class="wp-block-paragraph">“The liberty of a democracy is not safe if the people tolerated the growth of private power to a point where it becomes stronger than the democratic state itself. That in its essence is fascism: ownership of government by an individual, by a group, or any controlling private power.”</p><p
class="wp-block-paragraph">“Freedom means the supremacy of human rights everywhere. Our support goes to those who struggle to gain those rights and keep them. Our strength is our unity of purpose. To that high concept there can be no end save victory.”</p><p
class="wp-block-paragraph">“We had to struggle with the old enemies of peace—business and financial monopoly, speculation, reckless banking, class antagonism, sectionalism, war profiteering. They had begun to consider the Government of the United States as a mere appendage to their own affairs. We know now that Government by organized money is just as dangerous as Government by organized mob.”</p><p
class="wp-block-paragraph">“We have learned that we cannot live alone, at peace; that our own well-being is dependent on the well-being of nations far away. We have learned that we must live as men, and not as ostriches nor as dogs in the manger. We have learned to be citizens of the world, members of the human community.”</p></blockquote> ]]></content> </entry> <entry> <author> <name>geoff</name> </author><title
type="html"><![CDATA[A sobering thought about AI and my career]]></title><link
rel="alternate" type="text/html" href="https://geoffarnold.com/2026/02/17/a-sobering-thought-about-ai-and-my-career/" /><id>https://geoffarnold.com/?p=2893</id> <updated>2026-02-17T20:06:29Z</updated> <published>2026-02-17T20:06:29Z</published> <category
scheme="https://geoffarnold.com" term="Computing" /> <summary
type="html"><![CDATA[Tim Bray nails it: I’m not sure the profession I joined last century would attract me today. Why? This is @GordWait on Mastodon:  “At our office, we are noticing a huge drop in Comp Sci co-op applications. The next generation is convinced there’s no future in programming thanks to AI hype.”]]></summary><content
type="html" xml:base="https://geoffarnold.com/2026/02/17/a-sobering-thought-about-ai-and-my-career/"><![CDATA[<p
class="wp-block-paragraph">Tim Bray <a
href="https://www.tbray.org/ongoing/When/202x/2026/02/16/GenAI-and-OSS-opinion" target="_blank" rel="noreferrer noopener">nails </a>it:</p><blockquote
class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p
class="wp-block-paragraph">I’m not sure the profession I joined last century would attract me today.</p></blockquote><p
class="wp-block-paragraph">Why? <a
href="https://cosocial.ca/@gordwait/116082229876399512" target="_blank" rel="noreferrer noopener">This is @GordWait on Mastodon</a>: </p><blockquote
class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p
class="wp-block-paragraph">“At our office, we are noticing a huge drop in Comp Sci co-op applications. The next generation is convinced there’s no future in programming thanks to AI hype.”</p></blockquote><p
class="wp-block-paragraph"></p> ]]></content> </entry> <entry> <author> <name>geoff</name> </author><title
type="html"><![CDATA[Buying from overseas in Trump&#8217;s America]]></title><link
rel="alternate" type="text/html" href="https://geoffarnold.com/2026/02/15/buying-from-overseas-in-trumps-america/" /><id>https://geoffarnold.com/?p=2886</id> <updated>2026-02-15T22:28:25Z</updated> <published>2026-02-15T22:28:25Z</published> <category
scheme="https://geoffarnold.com" term="Hmmm" /> <summary
type="html"><![CDATA[For many years I&#8217;ve collected model buses, aircraft, and trains, mostly from the UK. The process was always pretty simple: the vendor would discount the price because VAT wasn&#8217;t applicable, add a shipping cost, and send the item on its way. It usually arrived within two weeks. Shipping was obviously more expensive than within the &#8230; <a
href="https://geoffarnold.com/2026/02/15/buying-from-overseas-in-trumps-america/" class="more-link">Continue reading<span
class="screen-reader-text"> "Buying from overseas in Trump&#8217;s America"</span></a>]]></summary><content
type="html" xml:base="https://geoffarnold.com/2026/02/15/buying-from-overseas-in-trumps-america/"><![CDATA[<p
class="wp-block-paragraph">For many years I&#8217;ve collected model buses, aircraft, and trains, mostly from the UK. The process was always pretty simple: the vendor would discount the price because VAT wasn&#8217;t applicable, add a shipping cost, and send the item on its way. It usually arrived within two weeks. Shipping was obviously more expensive than within the US, but still quite low.</p><p
class="wp-block-paragraph">All that changed when Trump eliminated &#8220;<em>de minimis</em>&#8221; exemptions from duties. The main effect was that the whole process became much more complicated, customers found themselves having to pay unexpected duties, and when they refused (or couldn&#8217;t) the shippers had to figure out what to do with the undeliverable items. Some carriers simply refused to participate.</p><p
class="wp-block-paragraph">Not knowing how this would play out, I held off ordering anything from the UK. And then last November a British model bus retailer announced an attractive new product, and I decided to try to order it and see how it went. So I contacted them, and they told me that the British Post Office had become impossible to work with, so all overseas orders were being handled by an Irish company. So this is how the process played out, including all communications and costs. (Hopefully this will be useful.)</p><p
class="wp-block-paragraph">Jan 16: I placed my order with the Irish reseller. The price quoted was €66.95 ($79.48) with shipping €22.00 ($26.12), for a total of €88.95 ($105.60). My credit card was charged the same day.<br>Jan.18: I received a refund for €7.46 ($8.86) corresponding to the VAT, together with a note that the package would be posted the next day.<br>Jan.26: I received an email from An Post (the Irish postal service) notifying me that they&#8217;d received the package, and that I would have to pay Duties and Taxes before it would be shipped to the US. I paid online through EVRI, and my credit card was billed $24.67.<br>Feb.4: An Post emailed me that the package had been received at their International Hub, had been processed, and was now on the way to the USA.<br>Feb.13: An Post notified me that &#8220;Your parcel has arrived in the USA and will now be processed for delivery.&#8221; However they didn&#8217;t tell me which carrier would handle delivery, nor did they provide a local tracking number.<br>Feb.14: The package was delivered. It looks as though local delivery was handled by SpeedX.</p><p
class="wp-block-paragraph">So the total time from order to delivery was four weeks. Converting everything to USD, the price with shipping was $105.60, minus $8.86 for VAT, or $96.74 . Duty was $24.67, bringing the total cost to $121.41. Put another way, I bought a $79.48 model bus and paid $41.93 for shipping and duties. The overall time was longer but reasonable, although there were several odd gaps in tracking.</p><p
class="wp-block-paragraph">It&#8217;s a great little model, and I&#8217;m glad to have it in my collection. However I may wait until my next trip to the UK before I buy any more….</p> ]]></content> </entry> </feed>