Posts

All the articles I've posted.

May 25, 2026

What Do We Mean When We Say Evals?

A practical framing of AI evals — what to evaluate across model, agent, and application layers, when to run them, and how to turn vibe coding into engineering.

May 6, 2026

spek

Here is how I built a simple coding agent

Walking through spek, a small LLM-powered coding agent that turns a markdown spec into a working, tested Python package — and the six design questions I had to answer to build it.

March 25, 2025

Poking Around ChatGPT's Sandbox

An exploration of what ChatGPT's code execution environment can and can't do — filesystem access, process introspection, networking, and the curious 'prove it' prompting pattern.