4 Comments
Pawel Jozefiak

The Dreaming concept is what I am watching most. Every agent setup I have tried degrades over time because memory compounds in the wrong direction: old, irrelevant rules accumulate and the agent slows down. Memory refinement between sessions would change that.

Routines are table stakes at this point; agents running overnight already have something similar. The evaluation layer is the real gap. Most setups trust the agent to self-report on quality, and that does not hold up past the first week of real use.

Rohan Jaiswal

The 1,600 exchanges on the Claude Typewriter exhibit make the proof-by-volume argument I hadn't seen anyone make: reliability comes from the architecture around the model. Your line 'when your system is misbehaving, your fix is usually fewer agents and better tools' names exactly the mistake I see founders repeat: they add another agent layer to fix the last agent layer.

At theaifounder.substack.com we're building on these systems, and the question I can't resolve is how you determine in advance which tools are worth the depth investment. When you strip back scaffolding to match a more capable model, how do you distinguish a guardrail that earned its keep from one that was just adding latency?

Gino Ftn

Routines as a concept are the real unlock. Claude performs best when you stop prompting and start designing workflows. Great recap of the Claude Code conference ✨

Jay

Outstanding overview!