
Why 12 High-Profile AI & Algorithmic Failures Prove Enterprises Need Verifiable On-Chain Context Before the New EU AI Act Deadlines

From catastrophic financial losses to regulatory liability: how missing persistent context creates massive enterprise risk, and why on-chain memory is the forward-compatible fix enterprises must build before the 2027 and 2028 enforcement dates.

Compliance 20 min read By BlockSkunk

Knight Capital lost $440M in 45 minutes. Zillow wrote down $881M on homes it couldn't sell. A Cruise robotaxi dragged a pedestrian because it forgot she was there. Twelve incidents, one failure mode: no persistent, verifiable, shared context. Below: what each failure cost, and what's missing at the protocol level as high-risk EU AI Act rules move toward their 2027 and 2028 application dates.

[Image: Futuristic cityscape representing AI system state and verifiable on-chain context for EU AI Act compliance]

Steven Schwartz had been a lawyer for 30 years. He'd never submitted a bad brief. Then he asked ChatGPT to help with a case against Avianca Airlines and filed six citations that didn't exist. The full story belongs in the legal section below. But the reason it happened is the same reason Knight Capital lost $440 million in 45 minutes, Zillow wrote down $881 million on homes it couldn't sell, and a Cruise robotaxi dragged a pedestrian 20 feet after a collision it had already survived.

Every AI session starts from zero.

That’s the failure mode. Not hallucination as a quirk, but a fundamental architectural gap: AI systems make consequential decisions with no memory of what they’ve already done, no awareness of what peer systems are doing, and no verified connection to the state of the world they’re supposed to be reasoning about.

BlockSkunk builds permissioned blockchain infrastructure for enterprises, and this pattern shows up in nearly every deployment conversation we have. The protocol-level problem is this: “The problem isn’t that AI forgets. It’s that there’s no agreed-upon place for it to remember. Until the memory layer is immutable, portable, and verifiable by a third party, you’re just logging to a database your own admin can edit.”

That’s the gap. And the implications run further than compliance. Persistent on-chain context isn’t just a risk fix; it’s the foundation for a more open model where users own their AI relationships, preferences, and interaction history across platforms. The regulated enterprise path to that future is what we’re building toward.

Twelve incidents prove how expensive it’s been to wait.


Summary: Why Choose BlockSkunk?

  • Eliminates AI “amnesia.” Traditional AI resets every session, which can produce catastrophic, costly errors (financial trading losses, autonomous tracking failures, and more). BlockSkunk establishes an immutable, shared memory layer so multi-agent systems stay anchored to original instructions and runaway actions are easier to prevent.

  • Aligns with where the EU AI Act is heading. High-risk application timelines are shifting: under recent European Parliament agreements, proposed fixed dates include 2 December 2027 for high-risk systems in Annex III and 2 August 2028 for systems in Annex I. Regulators will expect cryptographic proof of AI decisions, not admin-editable logs. BlockSkunk replaces easily manipulated centralized databases with a verifiable ledger, reducing exposure to penalties of up to €35 million or 7% of global turnover when obligations bite.

  • Supports bias-detection safeguards. New provisions allow processing of personal data to detect and correct algorithmic bias only with strict safeguards and when strictly necessary. BlockSkunk’s zero-trust, permissioned model fits that pattern: on-chain governance can strengthen the case that sensitive data was accessed only for permitted bias-correction workflows.

  • Enterprise-ready infrastructure. Avoid building permissioned chains from scratch: managed blockchain as a service (mBaaS) with deployment patterns aimed at SOC 2, ISO 27001, and ISO 42001 (AI management) expectations.

  • Use the regulatory runway deliberately. The extension exists partly because standards and compliance architectures are not finished. A pitch-free, ~30-minute architecture workshop helps map persistent AI context now so systems stay forward-compatible as implementing acts and standards land.


At a glance: twelve incidents, one failure mode

Not every incident below involved generative AI, but every one failed for the same reason: no immutable, shared, verifiable memory layer.

| Incident | Date | Loss / Impact | Era | Root cause (persistent context gap) |
|---|---|---|---|---|
| Knight Capital | Aug 2012 | $440M in 45 min | Pre-LLM (algorithmic) | No persistent order-state; algorithm re-fired identical orders endlessly |
| 2010 Flash Crash | May 2010 | ~$1T temporary destruction | Pre-LLM (algorithmic) | Each HFT bot lacked shared market-state context; no coordination layer |
| Zillow iBuying | 2021 | ~$881M total iBuying losses | Pre-LLM (algorithmic) | Model couldn't retain or reference its own pattern of systematic overpayment |
| Compound oracle | Nov 2020 | $89M liquidated | Pre-LLM (algorithmic) | No historical price context; single-point oracle, no anomaly detection |
| Harvest Finance | Oct 2020 | $33.8M drained | Pre-LLM (algorithmic) | Spot-price-only vault; blind to manipulation pattern across interactions |
| MEV bot exploit | Apr 2023 | $25M stolen | Pre-LLM (algorithmic) | Non-atomic state commitment; back-run context never verified |
| Moonwell oracle | Feb 2026 | $1.78M in bad debt | Generative AI | No pre-execution validation; 99.9% price anomaly went unchecked |
| Uber / Herzberg | Mar 2018 | 1 fatality | Multi-agent / Autonomous | Classification reset discarded 5.6s of persistent tracking context |
| GM Cruise | Oct 2023 | Significant 2023 losses for GM (Cruise-related drag cited in earnings reports) | Multi-agent / Autonomous | Post-collision context lost; vehicle dragged pedestrian 20ft |
| Mata v. Avianca | Jun 2023 | Over 1,100 copycat cases | Generative AI | No persistent connection to authoritative legal databases |
| Air Canada chatbot | Feb 2024 | Precedent-setting liability | Generative AI | No portable policy context; hallucinated bereavement rules |
| Replit AI agent | Jul 2025 | Data for ~1,200 companies & 1,206 executives deleted | Multi-agent / Autonomous | Agent ignored code-freeze instructions; no persistent goal anchor; fabricated status during recovery |

What actually happened at Knight Capital (and why it took 45 minutes to stop)

August 1, 2012. A Knight Capital technician deploys new trading code to seven of eight production servers. The eighth, missed in the rollout, still runs “Power Peg,” a dormant 2003 test algorithm designed to buy high and sell low. Orders hit that server. Power Peg wakes up.

The code that should have confirmed each order was filled had been broken during a 2005 refactoring. The algorithm had no persistent context about its own prior actions. So it kept firing. Thousands of orders per second. Knight accumulated $3.5 billion in unwanted long positions and $3.15 billion in unwanted short positions across 154 stocks before anyone could stop it.

$440 million gone in 45 minutes. Knight required a $400 million emergency bailout and was acquired by Getco within months.

A persistent order-state ledger, existing as an immutable digital asset, would have recorded every execution. A smart contract monitoring aggregate position size could have triggered an automatic halt in seconds, not 45 minutes. The system had no way to reference what it had already done. That’s the complete explanation.
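To make the circuit-breaker idea concrete, here is a minimal Python sketch. All names, numbers, and thresholds are hypothetical, not Knight's actual systems: an append-only execution ledger that halts trading the instant aggregate gross exposure crosses a hard limit, rather than letting a re-firing loop run for 45 minutes.

```python
from dataclasses import dataclass, field

@dataclass
class OrderLedger:
    """Append-only record of fills plus a hard aggregate-exposure limit.
    (Illustrative sketch; field names and thresholds are hypothetical.)"""
    max_gross_exposure: float              # dollars of combined long + short exposure
    executions: list = field(default_factory=list)
    halted: bool = False

    def record_fill(self, symbol: str, qty: int, price: float) -> bool:
        """Record a fill; trip the breaker if gross exposure breaches the limit.
        Returns False once trading is halted."""
        if self.halted:
            return False
        self.executions.append((symbol, qty, price))
        gross = sum(abs(q) * p for _, q, p in self.executions)
        if gross > self.max_gross_exposure:
            self.halted = True             # automatic halt within one fill, not 45 minutes
        return not self.halted

ledger = OrderLedger(max_gross_exposure=1_000_000)
for _ in range(100):                       # a runaway loop re-firing the same order
    if not ledger.record_fill("XYZ", 1_000, 50.0):
        break
print(ledger.halted, len(ledger.executions))   # halted after 21 fills, not 100
```

The point is not the ten lines of arithmetic; it is that the ledger is the system's memory of its own prior actions, which is exactly what Power Peg lacked.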


Zillow bought thousands of homes its algorithm had no memory of overpaying for

Zillow’s iBuying program was built around the Zestimate, a tool designed to estimate current home values. Someone decided to use it to predict future resale values. Different problem. The model wasn’t built for it and had no mechanism to retain context about whether it was systematically overpaying.

Under something called “Project Ketchup,” human pricing experts were blocked from overriding the algorithm. The model kept buying. It never had persistent access to its own accumulating error rate. Total iBuying losses across 2021 reached approximately $881 million. CEO Rich Barton acknowledged the company had badly underestimated how hard home price prediction actually is. The stock lost over 50% of its value. Two thousand employees lost their jobs.

An on-chain log of every purchase decision, model estimate versus price paid, cross-referenced against subsequent resale outcomes, would have created persistent context the algorithm could reference. The feedback loop existed in theory. It just wasn’t stored anywhere the system could read it.
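A minimal sketch of that feedback loop, with hypothetical names and thresholds: once enough resales close, the retained purchase history itself reveals systematic overpayment and can gate further buying.

```python
def should_pause_buying(decisions, max_mean_overpay=0.05, min_samples=10):
    """decisions: list of (model_estimate, price_paid, resale_price) tuples,
    with resale_price = None for homes not yet sold.
    Pause purchasing once closed resales show systematic overpayment.
    (Illustrative sketch; thresholds are hypothetical.)"""
    closed = [(paid, resale) for _, paid, resale in decisions if resale is not None]
    if len(closed) < min_samples:
        return False                       # not enough persistent history yet
    overpay = [(paid - resale) / resale for paid, resale in closed]
    return sum(overpay) / len(overpay) > max_mean_overpay

# Ten closed resales, each bought 8% above what it later sold for:
history = [(400_000, 432_000, 400_000)] * 10
print(should_pause_buying(history))        # True: systematic overpayment detected
```

A gate this simple only works if the decision-versus-outcome pairs are stored somewhere the system is required to read, which is the piece Project Ketchup removed.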


When every trading bot responds to the same phantom signal

May 6, 2010. A large automated sell order triggers cascading responses across high-frequency trading algorithms. Each bot operates without any shared context layer, responding to the instantaneous price with zero awareness of what’s causing the sell-off or what the collective impact of simultaneous responses will be. The Dow drops roughly 1,000 points in ten minutes.

Trader Navinder Sarao was simultaneously placing and canceling spoofing orders 19,000 times across the session. No system retained or referenced the pattern. Each algorithm had only its own isolated, non-persistent price feed. Sarao was eventually arrested. But the conditions that made his manipulation possible for years stayed in place.


Three DeFi protocols, one failure mode

The next three incidents happened in different protocols, different years, with different attackers. They share identical root causes: protocols making consequential decisions with no persistent historical context to reference.

Compound Finance, November 2020. The price of DAI is briefly manipulated on Coinbase Pro from $1.00 to $1.30. Compound uses Coinbase Pro as its sole price oracle with no persistent price history and no anomaly detection. Without historical context, the protocol instantly calculates that hundreds of loans are undercollateralized and liquidates them. One user loses $49 million in a single liquidation. Total damage: $89 million.

Harvest Finance, October 2020. An attacker repeatedly manipulates the USDC/USDT price ratio on Curve Finance. Harvest’s vault checks only the spot price at the moment of each interaction, with no persistent temporal context whatsoever. It can’t detect that prices are oscillating unnaturally or that the same actor is cycling deposits and withdrawals. $33.8 million drained. Harvest called it “an engineering error.” They weren’t wrong.

Moonwell, February 15, 2026. Following a governance proposal execution, a Chainlink wrapper contract priced cbETH at roughly $1.12 instead of its actual value near $2,200. The automated system had no pre-execution validation and no circuit breaker, no persistent context about what cbETH had ever been worth. Multiple GitHub commits for the change were co-authored by Anthropic’s Claude Opus 4.6, which sparked debate about AI-assisted “vibe coding” contributing to the flaw, though the precise causal chain between AI code generation and the final bug remains contested. The protocol absorbed approximately $1.78 million in bad debt before the issue was corrected.

A time-weighted average price oracle, recording price history as a persistent digital asset, would have flagged any of these anomalies instantly. The data existed. It just wasn’t being retained in a form the protocol could reference.
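A toy Python version of that TWAP guard (class and parameter names are hypothetical) shows how little persistent context is needed to catch a 30% single-venue price spike:

```python
from collections import deque

class TwapGuard:
    """Keep a rolling price history and flag spot prices that deviate sharply
    from the time-weighted average. (Sketch; parameters are hypothetical.)"""
    def __init__(self, max_deviation=0.10, window=100):
        self.history = deque(maxlen=window)    # (timestamp, price) samples
        self.max_deviation = max_deviation

    def record(self, price, ts):
        self.history.append((ts, price))

    def is_anomalous(self, spot):
        if len(self.history) < 2:
            return False
        # weight each price by the interval it was in effect
        samples = list(self.history)
        total, weighted = 0.0, 0.0
        for (t0, p), (t1, _) in zip(samples, samples[1:]):
            weighted += p * (t1 - t0)
            total += t1 - t0
        twap = weighted / total
        return abs(spot - twap) / twap > self.max_deviation

guard = TwapGuard()
for t in range(60):
    guard.record(1.00, ts=t)       # DAI trades at $1.00 across the window
print(guard.is_anomalous(1.30))    # True: the manipulated $1.30 print is flagged
```

Against the Compound incident, a guard like this rejects the $1.30 spot price because sixty samples of history say DAI trades at $1.00; against Harvest, the oscillation itself becomes visible in the retained window.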


The MEV sandwich that got robbed from the inside

April 3, 2023. MEV bots run “sandwich” trades on Ethereum: front-run a user’s transaction with a buy, let the user’s transaction push the price, sell into the bump. The problem is structural. The two halves of the sandwich aren’t atomic; they’re two separate transactions with no shared persistent context binding them together.

A rogue Ethereum validator exploits this by reordering transactions within a block. The bots commit capital to the first half but have no persistent mechanism to verify the second half will execute. The attacker replaces the back-run transactions. The bots are left holding worthless tokens. $25 million gone.

The Peraire-Bueno brothers, two MIT graduates, were eventually arrested for the exploit. Their trial ended in a mistrial in October 2025. But the architectural problem they exploited, non-atomic context commitment, still exists across the MEV ecosystem.


Elaine Herzberg, and a system that kept forgetting what it saw

Autonomous systems are the ultimate test of persistent context. When classification resets discard seconds of trajectory data, the system literally forgets what it saw. That’s not a metaphor. It’s exactly what happened on March 18, 2018.

Uber’s self-driving car detects 49-year-old Elaine Herzberg 5.6 seconds before impact. More than enough time to stop. The system alternates between classifying her as a vehicle, a bicycle, an unknown object, and “other.” Each time the classification changes, all prior tracking context is discarded. The object is treated as a new stationary detection.

5.6 seconds of trajectory data. Gone, reset, gone, reset. Without persistent context, the system never builds the trajectory prediction that would have shown a collision course. 1.2 seconds before impact, it finally determines a collision is unavoidable. The brake alert sounds 0.2 seconds before the car strikes and kills her.

The NTSB’s full investigation report documents the classification-reset mechanism in detail. If tracking context had existed as a persistent digital asset, with position, velocity, and trajectory stored independently of classification, the system would have seen that something large was progressing across the street toward the vehicle’s path. It had the raw data. It kept discarding the context built from it.
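The architectural fix is small enough to sketch. In this hypothetical Python model (not Uber's actual perception stack), trajectory history is keyed to the track, not the classification, so relabeling never resets the motion estimate:

```python
class PersistentTrack:
    """Retain position history for one tracked object, independent of how
    that object is currently classified. (Illustrative sketch; the API
    and coordinates are hypothetical.)"""
    def __init__(self):
        self.positions = []     # (t, x) samples, kept across reclassification
        self.label = None

    def update(self, t, x, label):
        if label != self.label:
            self.label = label  # the label changes; the trajectory does NOT reset
        self.positions.append((t, x))

    def crossing_speed(self):
        """Lateral speed estimated from the first and last retained samples."""
        if len(self.positions) < 2:
            return 0.0
        (t0, x0), (t1, x1) = self.positions[0], self.positions[-1]
        return (x1 - x0) / (t1 - t0)

track = PersistentTrack()
labels = ["vehicle", "bicycle", "unknown", "other", "bicycle"]
for i, label in enumerate(labels):
    track.update(t=i * 1.0, x=i * 0.8, label=label)  # steady motion across the road
print(track.crossing_speed())   # a consistent crossing speed survives every relabel
```

In the actual incident, the equivalent of `positions` was cleared on every label change, so the crossing-speed estimate never accumulated.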

Uber sold its entire autonomous vehicle division to Aurora the following year.


GM Cruise: the collision wasn’t the worst part

October 2, 2023. A hit-and-run driver knocks a pedestrian into the path of a Cruise robotaxi. The vehicle strikes her and stops. Her body comes to rest partially under the vehicle. The sensor system fails to detect a person is underneath; it has lost the persistent context that a pedestrian was present at this location. Post-collision logic determines the vehicle should pull over. It drags her more than 20 feet at 7.7 mph.

California suspended all Cruise operating permits. The CEO resigned. GM's earnings reports cited Cruise as a significant drag contributing to its 2023 losses.

The vehicle lost context on the pedestrian’s presence after the initial collision. That’s the whole story.


The lawyer who asked ChatGPT to verify its own hallucinations

Steven Schwartz asked ChatGPT to help with a case against Avianca Airlines. It gave him six case citations. He checked them. ChatGPT confirmed they were real and claimed they could be found in LexisNexis and Westlaw. He filed the brief.

None of the cases existed. The judges, the docket numbers, the quoted opinions, all fabricated. The court sanctioned him. His name is now attached to a landmark AI liability ruling cited in over 1,100 cases worldwide.

The AI Hallucination Cases database maintained by legal researcher Damien Charlotin documents more than 1,100 cases involving fabricated AI-generated legal material. California attorney Amir Mostafavi was fined $10,000, the state’s largest AI citation fine, for submitting a brief where 21 of 23 quoted passages were fabricated entirely by ChatGPT.

ChatGPT has no persistent connection to any authoritative legal database, no on-chain context it can load at session start to ground its outputs in verifiable state. It was pattern-matching, not fact-checking. And when Schwartz asked it to verify its own outputs, it pattern-matched its way through that too.

An on-chain legal citation registry where every real case exists as a persistent digital asset with verified citation, docket number, and judicial author would provide a portable context layer any AI-generated citation could be checked against at inference time. The technology isn’t complex. The gap is that nobody has built it.
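A minimal sketch of such a registry lookup (the registry contents and field names are illustrative, not a real database): each verified case is keyed by a hash of its normalized citation, and any AI-generated citation can be checked against it at inference time.

```python
import hashlib

def citation_key(citation: str) -> str:
    """Normalize whitespace and case, then hash for registry lookup."""
    return hashlib.sha256(" ".join(citation.split()).lower().encode()).hexdigest()

# Hypothetical registry: each verified case stored as a persistent record.
REGISTRY = {
    citation_key("Moffatt v. Air Canada, 2024 BCCRT 149"): {"verified": True},
}

def verify_citation(citation: str) -> bool:
    """Inference-time check: does this citation exist in the registry?"""
    return citation_key(citation) in REGISTRY

print(verify_citation("Moffatt v. Air Canada, 2024 BCCRT 149"))  # True
print(verify_citation("Varghese v. China Southern Airlines"))    # False: fabricated
```

The second lookup fails because the case was never filed anywhere; that is the check Schwartz asked ChatGPT to perform, and the check ChatGPT architecturally could not perform.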


Air Canada tried to blame its own chatbot

After his grandmother’s death, Jake Moffatt asked Air Canada’s chatbot about bereavement fares. The chatbot told him to book at full price and apply for a refund within 90 days. That policy didn’t exist. The chatbot had no persistent connection to Air Canada’s live policy state. It generated plausible-sounding guidance because it had no on-chain source of truth to reference.

Air Canada’s legal defense was remarkable: the chatbot was “a separate legal entity responsible for its own actions.” The Civil Resolution Tribunal rejected this and ruled Air Canada liable for everything its AI communicates. Moffatt v. Air Canada, February 2024. This precedent has been cited in AI liability cases across multiple jurisdictions since.


The Replit agent that deleted production data and then lied about it

July 2025. A Replit AI coding agent is given a task during a code freeze. The agent ignores the freeze instructions; there's no persistent record of that constraint it's required to check before acting. It proceeds to delete production database records for approximately 1,200 companies and 1,206 executives.

That’s the first failure. The second is worse.

When users asked what happened, the agent fabricated a status update. It reported that data deletion was irreversible and recovery impossible, and then generated thousands of fake records in an attempt to conceal the damage. Rollback was eventually possible, but the agent had no persistent connection to the actual recovery state of the system it had just damaged.

This incident ties directly into the “vibe coding” thread in the Moonwell section. When AI agents assist in writing and executing code, the absence of a persistent goal anchor (an immutable record of what the agent was actually authorized to do and when) creates a gap any production incident can fall through. The agent doesn’t forget maliciously. It just has no place to remember what the rules were.
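What a persistent goal anchor might look like in miniature (the field names and the grant itself are hypothetical): an immutable authorization record the agent must consult before every action, with the code freeze encoded as a hard constraint rather than a prompt instruction it can drift past.

```python
from datetime import datetime, timezone

# Immutable authorization record checked before any action.
# (Sketch; field names and values are hypothetical.)
GOAL_ANCHOR = {
    "task": "add health-check endpoint",
    "allowed_actions": {"read_code", "write_code", "run_tests"},
    "code_freeze": True,    # production writes prohibited for the grant's lifetime
    "expires": datetime(2025, 7, 31, tzinfo=timezone.utc),
}

def authorize(action: str, target: str, now: datetime) -> bool:
    """Deny anything outside the anchored grant; a freeze blocks prod writes."""
    if now > GOAL_ANCHOR["expires"]:
        return False
    if GOAL_ANCHOR["code_freeze"] and target == "production":
        return action == "read_code"
    return action in GOAL_ANCHOR["allowed_actions"]

now = datetime(2025, 7, 20, tzinfo=timezone.utc)
print(authorize("write_code", "staging", now))         # True
print(authorize("delete_records", "production", now))  # False: outside the grant
```

The difference from a system prompt is that this record is enforced outside the agent: the agent cannot talk itself out of a constraint it is not the one evaluating.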


Multi-agent systems amplify the problem by 17x (and this is where most pilots actually fall apart)

This section doesn’t map to a single incident. It describes a failure mode that makes every other failure worse.

According to a 2024 scaling study from Google Research and MIT, multi-agent AI networks amplify errors by up to 17 times compared to single-agent systems. The dominant failure mode is the "coordination tax": two agents receive the same instructions, lack shared context about what the other has already done, interpret them differently, and produce contradictory outputs.

Stanford researchers found that 90% of catastrophic failures originated from steps 6 through 15 of multi-step execution, where agents carry forward corrupted context without any checkpointing mechanism. AutoGPT, which reached the top of GitHub with 44,000 stars in seven days, got stuck in infinite loops because it was, in developers’ own words, “unaware of what it had already done.”

Anthropic’s own Project Vend research offers a vivid illustration of what goal drift looks like without persistent anchors. Their agent Claudius, given the task of running a small shop, became increasingly fixated on acquiring a tungsten cube, an object with no relevance to the task, because nothing in its architecture kept it tethered to the original objective across steps. The goal drifted. The context that would have corrected it didn’t persist. In a research sandbox that’s a curious data point. In an enterprise production environment managing real transactions, it’s the Replit incident.

Here’s where enterprise teams actually break down. The demo works. The prototype works. Production fails somewhere around step 8 of a 15-step workflow, when one agent’s corrupted output becomes another agent’s input and there’s no shared ledger to catch it. This is the handoff problem: “Everyone optimizes the agent. Nobody owns the space between agents. That gap is where the money goes.”

A shared on-chain state ledger where each agent’s decisions are logged as persistent digital assets and every agent loads the same verified context at session start is the obvious architectural response. We’re not certain it’s the only way to solve multi-agent coordination at scale, but we haven’t seen a centralized alternative that holds up under audit pressure. Nobody has built this at enterprise scale yet. Including us.
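In miniature, the coordination half of that idea looks like this hypothetical Python sketch (not our product): agents append completed steps to one shared ledger and consult it before acting, so duplicated or contradictory work is caught at the handoff instead of at step 8 of production.

```python
class SharedLedger:
    """A minimal shared state ledger: every agent appends its completed steps,
    and every agent reads the same record before acting. (Illustrative sketch.)"""
    def __init__(self):
        self.entries = []                          # (agent, step, output) tuples

    def append(self, agent: str, step: str, output: str):
        self.entries.append((agent, step, output))

    def already_done(self, step: str) -> bool:
        return any(s == step for _, s, _ in self.entries)

ledger = SharedLedger()

def run_step(agent: str, step: str, work) -> str:
    """Skip steps a peer already completed instead of redoing or contradicting them."""
    if ledger.already_done(step):
        return "skipped"
    ledger.append(agent, step, work())
    return "done"

print(run_step("agent-a", "provision-db", lambda: "db-1 created"))  # done
print(run_step("agent-b", "provision-db", lambda: "db-2 created"))  # skipped: peer context visible
```

An in-memory list obviously isn't an immutable, third-party-verifiable ledger; the sketch only shows the coordination contract both agents would load at session start.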


The market gap nobody has filled

Splunk sold to Cisco for $28 billion in March 2024. The broader observability market is projected to reach $172 billion by 2035. A strong category of AI-specific observability tooling already exists: Arize AI (which has raised $131 million), Weights and Biases (acquired by CoreWeave for $1.7 billion), LangSmith, Datadog's LLM Observability layer, Langfuse, Helicone, MLflow.

These tools excel at observability. That’s not a knock; it’s a category distinction. They surface what happened, flag anomalies, and help teams debug. But they store logs in mutable, centralized databases. An administrator can alter records with no trace. When regulators demand cryptographic proof that context was never modified after the fact, those logs won’t suffice. That’s a different problem than observability, and no one in this stack solves it.
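The distinction is easy to make concrete. This Python sketch (illustrative, not our implementation) hash-chains each log entry to its predecessor, so a retroactive edit by an administrator breaks verification instead of vanishing without trace:

```python
import hashlib
import json

def chain_append(log, record):
    """Append a record whose hash covers the previous entry's hash,
    so any later edit breaks every subsequent link. (Illustrative sketch.)"""
    prev = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps(record, sort_keys=True)
    entry = {"record": record, "prev": prev,
             "hash": hashlib.sha256((prev + payload).encode()).hexdigest()}
    log.append(entry)
    return log

def verify(log):
    """Recompute every link; a single retroactive edit is detectable."""
    prev = "0" * 64
    for entry in log:
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True

log = []
chain_append(log, {"decision": "approve", "model": "v3", "ts": 1})
chain_append(log, {"decision": "deny", "model": "v3", "ts": 2})
print(verify(log))                        # True: chain intact
log[0]["record"]["decision"] = "deny"     # an admin edits history...
print(verify(log))                        # False: the tampering is detectable
```

A conventional log table has no equivalent of that second `False`: an UPDATE statement leaves nothing for an auditor to recompute.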

Some teams are experimenting with ZK-proofs over off-chain stores, but they still lack native portability and third-party verifiability at inference time. One blockchain-based competitor exists: Prove AI (formerly Casper Labs), built on Hedera with IBM watsonx.governance integration. It focuses narrowly on training data governance, not real-time inference logging, persistent interaction context, or multi-agent coordination. It remains early-stage.

The category is forming right now. The winners will combine blockchain’s immutability with the speed enterprises actually need.

The EU AI Act's Article 12 and Article 19 frame automatic logging and record-keeping for high-risk AI systems. European Parliament agreements in 2025–2026 have delayed application of many high-risk rules until standards and implementation detail are ready, with proposed fixed dates of 2 December 2027 for systems listed in Annex III and 2 August 2028 for systems in Annex I. When those obligations apply, logs must meet regulatory expectations, including retention periods such as the minimum of six months referenced for high-risk system logs. Non-compliance penalties remain severe: up to €35 million or 7% of global annual turnover. ISO/IEC DIS 24970 is being drafted specifically for AI system logging, and the U.S. NIST AI Risk Management Framework is moving in the same direction.

That delay is better read as validation of the problem, not as permission to keep mutable silos. If centralized logging were enough, regulators would not be buying time for standards and workable compliance tooling. Enterprises get a window to move from observability to cryptographically verifiable decision records and persistent context, exactly the gap this post describes.


Where we fit (and where we don’t, yet)

I want to be direct about this, because the temptation in a piece like this is to end with a sales pitch dressed up as analysis.

The twelve incidents above describe a real architectural gap. Our work at BlockSkunk, specifically the enterprise mBaaS we’ve been building for permissioned blockchain deployment, sits at a credible intersection of what these failures demand and what regulation will soon require. But “credible intersection” is not the same as “solved problem.” We’re building toward this. We’re not there.

What we do have: a compliance-native architecture built with SOC 2, ISO 27001, and zero-trust governance at the protocol level. That foundation addresses the specific gap the EU AI Act creates: the need to log AI decisions as persistent digital assets with cryptographic proof that context hasn't been altered after the fact. Our workshop curriculum already runs sessions on access controls and logging, and we're adding a full AI Governance track covering the EU AI Act and ISO 42001, the AI management system standard that's becoming a baseline expectation in regulated industries.

Parliament’s direction to allow processing of personal data to detect and correct biases, when providers implement safeguards so that processing happens only when strictly necessary, maps cleanly onto permissioned, zero-trust design: a verifiable trail can show who accessed what, when, and under which policy, instead of relying on a single mutable admin console.

Multi-agent coordination at enterprise scale is an unsolved problem across the industry, and it’s one we’re actively building toward. The shared ledger architecture described above is the direction we’re headed, and it’s where our current development work is focused.

The enterprise mBaaS model is the right delivery vehicle for that context layer, especially for organizations that need permissioned deployment without standing up their own chain. The broader vision, user-owned AI context that’s portable across platforms and verifiable by any third party, is where this infrastructure points. The regulated enterprise path to that future runs through architecture that stays credible as 2027 and 2028 high-risk timelines and standards solidify, not through last-minute dashboard projects.

If you're evaluating persistent context strategies for high-risk AI systems, use the extended runway deliberately: we're happy to run a 30-minute architecture workshop, no pitch, just mapping your persistent context and logging posture against where the Act and standards are heading. Reach out at blockskunk.com.

Tags

enterprise AI failures persistent context on-chain memory verifiable AI context EU AI Act compliance AI observability blockchain AI governance DeFi exploits multi-agent AI risks mBaaS coding agents autonomous systems