Discussion about this post

Janusz Hain

This is the first post that seems really distant from my experience, at least in the Android space. I have built some workflows that can automatically check Figma, tickets, and other data. We have clear ACs (acceptance criteria). And yet my AI can't find the relevant context by itself; I need to pass it in, otherwise it burns a lot of tokens and changes the wrong screen. On top of that, I need to check each artifact, and it is often not one-shot: I need to fix it. Then it runs once or several times, with me reviewing and telling it to fix something. If I see so many quality and logic errors, how can I let it run alone, even with supervision, for such a long time?

Can you give me examples of tasks and the technology used? I compared my AI work to what web developers do, and we have completely different workflows; maybe that's the difference? What about quality, errors, or other issues with that approach? How is understanding of the codebase preserved?

A lot of questions, especially seeing some products degrade in quality and in understanding of the codebase (like Microsoft's Windows, GitHub, or Claude Code). I am not sceptical at all; I would like to try introducing this into my workflow, but it doesn't seem like a perfect solution yet.

Stratabase

The brain/hands/session split is one of the cleaner decompositions of where agents actually fail. Most production failures aren't brain failures; the model can reason fine in isolation. They're session failures: state gets corrupted, context goes stale, and the orchestrator collapses what should be three independent layers into one mutable blob.

This maps almost directly onto LangChain's 2026 report finding that around 60% of agent failures trace to the harness rather than the model. "Harness" in their framing is essentially your "session" layer — the place where state and recovery live, and the place most stacks treat as configuration rather than as a first-class system. The brain gets all the design attention; the session gets a YAML file.

The Ralph Loop is interesting because it solves the problem by refusing to merge the layers in the first place. Bash plus JSON minimalism isn't a stylistic choice — it's a structural one. State lives in files. Recovery is just rerunning. The "simplicity" is doing the work that frameworks try (and often fail) to do with abstraction. Probably the most underrated point in the post.
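The file-based pattern described above can be sketched in a few lines. This is a hedged illustration, not the post's implementation: the comment describes Bash plus JSON, whereas this stand-in uses Python, and the file name, the `pending`/`done` state schema, and the `brain`/`hands` stubs are all hypothetical. What it shows is that the loop carries no in-memory state between iterations: every pass re-reads the JSON file, and progress is written back atomically, so recovering from a crash means running the loop again.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def load_state(path: Path) -> dict:
    """Session layer: all state lives in one JSON file on disk."""
    if path.exists():
        return json.loads(path.read_text())
    # Hypothetical initial task list, used only when no state file exists yet.
    return {"pending": ["task-a", "task-b", "task-c"], "done": []}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename over the original, so a crash
    mid-write never leaves a half-written state file behind."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def brain(state: dict) -> Optional[str]:
    """Brain (hypothetical stub): pick the next task from state alone.
    A real loop would ask the model here."""
    return state["pending"][0] if state["pending"] else None


def hands(task: str) -> bool:
    """Hands (hypothetical stub): execute the task and report success.
    A real loop would invoke the agent or tools here."""
    return True


def run_loop(path: Path, max_iters: int = 100) -> dict:
    for _ in range(max_iters):
        state = load_state(path)  # fresh read each pass; nothing kept in memory
        task = brain(state)
        if task is None:
            return state  # nothing left to do
        if hands(task):
            state["pending"].remove(task)
            state["done"].append(task)
            save_state(path, state)  # persist progress after every task
    return load_state(path)
```

If the process dies partway through, the file still reflects the last completed task, so calling `run_loop` again on the same path resumes from the remaining `pending` entries. Keeping `brain` and `hands` as separate functions that only communicate through the persisted state is the point of the split: neither one owns the session.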
