#ai-engineering

8 pieces of content

Engineering the Evidence Layer

When I started building AI applications, I thought the hardest part would be getting the models to produce good answers. I was wrong. AI produces evidence, not decisions.

Jun 2026 article

AI Makes Code Cheap, But Reviewing It Is the New Bottleneck

The shift from building 'cool' AI agents to maintainable real-world applications requires model awareness, cost awareness, and a persistent human harness.

May 2026 article

I Evaluated Fine-Tuning Across 3 Projects — None of Them Needed It

Three projects, three evaluations, zero cases where fine-tuning was justified. Here's the decision framework, the cost math, and why simpler approaches won every time.

Mar 2026 article

How ReAct Agents Recover from Their Own Mistakes

ReAct agents recover from their own mistakes — not because the model is clever, but because of how tools return errors and how the loop is structured. Here's what that looks like in practice.

Mar 2026 article

Letting the Model Pick Its Own Tools: How Tool Use Inverts Control Flow

The model autonomously combined keyword search and vector search in the optimal sequence — without being told to. Then I ran experiments to measure what vague descriptions, over-calling, and temperature actually do to tool selection.

Mar 2026 article

When Embeddings Fail: Why Vector Search Can't Judge Capability

I added vector search to the screening pipeline and watched it rank a junior frontend developer above a Principal Engineer who processed 1B+ events/day. The embedding model matched vocabulary, not capability.

Mar 2026 article

My LLM Pipeline Passed Every Manual Check — Then 36 Tests Proved Otherwise

Five manual runs looked fine. Then 36 automated tests exposed non-deterministic sourcing, biased scoring, and a confidence threshold that fired randomly.

Mar 2026 article

Auditing My AI Systems: Patterns, Tradeoffs, and Gaps I Was Working Around

I catalogued every AI decision across three production systems and found a consistent pattern — along with five gaps I'd been working around instead of solving.

Mar 2026