The idea
A workflow’s cost is mostly its model choices: the OCR engine and the language model each step uses. Those are configuration, not code, so they are easy to change. Evaluations tell you whether a change broke anything.
Do it
Change the model on the relevant step, push a new version, and run the experiment. If the score still clears the threshold, you just lowered cost with no loss of reliability.
compare shows you exactly which examples moved, so a small quality drop on a non-critical field is a deliberate decision rather than a surprise.
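The check described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the product's API: RunResult, moved_examples, safe_to_switch, the model names, the example ids, and the 0.6 threshold are all hypothetical, and it assumes each experiment run yields one score per example between 0.0 and 1.0.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Per-example scores from one experiment run (hypothetical shape)."""
    model: str
    scores: dict[str, float]  # example id -> score in [0.0, 1.0]

    @property
    def mean_score(self) -> float:
        if not self.scores:
            raise ValueError("run has no scored examples")
        return sum(self.scores.values()) / len(self.scores)


def safe_to_switch(candidate: RunResult, threshold: float) -> bool:
    """True if the candidate run's mean score still clears the threshold."""
    return candidate.mean_score >= threshold


def moved_examples(baseline: RunResult, candidate: RunResult,
                   tolerance: float = 1e-9) -> dict[str, float]:
    """Return {example id: score delta} for examples whose score changed."""
    deltas: dict[str, float] = {}
    for example_id, old in baseline.scores.items():
        new = candidate.scores.get(example_id)
        if new is None:
            continue  # example missing from the candidate run; skip it
        if abs(new - old) > tolerance:
            deltas[example_id] = new - old
    return deltas


# Made-up runs: the same three examples scored with two different models.
baseline = RunResult("large-model", {"invoice-1": 1.0, "invoice-2": 1.0, "invoice-3": 0.5})
candidate = RunResult("small-model", {"invoice-1": 1.0, "invoice-2": 0.5, "invoice-3": 0.5})

print(safe_to_switch(candidate, threshold=0.6))  # True
print(moved_examples(baseline, candidate))       # {'invoice-2': -0.5}
```

Here the cheaper model still clears the threshold, and the per-example deltas show that only invoice-2 got worse. Whether that drop is acceptable depends on which field it affects, which is the decision the comparison is meant to surface.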