Discussion about this post

Pawel Jozefiak

The 'verifiable work' distinction is the useful framing here. Code either compiles or it doesn't, and that tight feedback loop is why software engineering is moving faster than other domains.

What Amodei's timeline undersells is that the transition isn't happening at the level of the frontier models. It's happening in agent orchestration: the wrapper that turns a model into a continuous worker.

I've been running an agent on real production tasks overnight. The gap isn't model capability; it's knowing when to escalate versus when to proceed. That judgment layer is what's actually hard to automate: https://thoughts.jock.pl/p/building-ai-agent-night-shifts-ep1

What verifiability mechanisms have you found work best for non-code tasks?
