Tag

#testing

1 piece of content

My LLM Pipeline Passed Every Manual Check — Then 36 Tests Proved Otherwise

Five manual runs looked fine. Then 36 automated tests exposed non-deterministic sourcing, biased scoring, and a confidence threshold that fired randomly.

Mar 2026