Once evaluations define what correct means, cost becomes a search problem instead of a guess. The quality bar is fixed, so you can try cheaper configurations and keep any that still pass. The goal is not the most capable model; it is the cheapest one that does the job.

The idea

A workflow’s cost is mostly its model choices: the OCR engine and the language model each step uses. Those are configuration, not code, so they are easy to change. Evaluations tell you whether a change broke anything.
Define the quality bar with evaluators
        |
Try a configuration (cheaper OCR / smaller model)
        |
Run the experiment over the dataset
        |
Passes?  ->  keep it.  Fails?  ->  revert.
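
The loop above amounts to a small selection rule: among the configurations that clear the threshold, take the cheapest. The Python below is a minimal sketch of that rule, not part of the eigenpal tooling. `Candidate`, `cheapest_passing`, the configuration names, the costs, and the scores are all hypothetical; in practice each score would come from an experiment run over the dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str            # hypothetical configuration label
    cost_per_doc: float  # hypothetical cost per processed document
    score: float         # evaluation score from an experiment run, 0..1


def cheapest_passing(candidates: list[Candidate], threshold: float) -> Optional[Candidate]:
    """Return the lowest-cost candidate whose score clears the threshold."""
    passing = [c for c in candidates if c.score >= threshold]
    # default=None covers the case where nothing passes: keep the current config.
    return min(passing, key=lambda c: c.cost_per_doc, default=None)


# Illustrative numbers only.
candidates = [
    Candidate("large-model", cost_per_doc=0.040, score=0.97),
    Candidate("medium-model", cost_per_doc=0.012, score=0.96),
    Candidate("small-model", cost_per_doc=0.003, score=0.88),
]

best = cheapest_passing(candidates, threshold=0.95)
print(best.name if best else "no configuration passes")
```

With these made-up numbers the smallest model is cheapest but fails the bar, so the middle option is selected. The threshold never moves during the search; only the configuration does.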

Do it

Change the model on the relevant step, push a new version, and run the experiment. If the score still clears the threshold, you have lowered cost while still meeting the quality bar.
# in workflow.yaml, set a cheaper model on the step, e.g. ai.parse / ai.extract
eigenpal workflow push
eigenpal workflow experiment run <workflow-id> --wait
eigenpal workflow experiment compare <expensiveBatchId> <cheapBatchId>
The compare command shows you exactly which examples changed between the two batches, so a small quality drop on a non-critical field is a deliberate decision rather than a surprise.
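
To make "which examples moved" concrete, here is a rough Python sketch of a per-example comparison between two batches. It is not how the compare command is implemented, and the data shape is an assumption: each batch is modeled as a hypothetical mapping from example id to pass/fail. The function name `moved_examples` and the example ids are made up for illustration.

```python
def moved_examples(baseline: dict[str, bool], candidate: dict[str, bool]) -> dict[str, list[str]]:
    """Group example ids by how their pass/fail result changed between two batches."""
    # Only examples present in both batches can be compared.
    shared = sorted(baseline.keys() & candidate.keys())
    return {
        "regressed": [e for e in shared if baseline[e] and not candidate[e]],
        "improved": [e for e in shared if not baseline[e] and candidate[e]],
    }


# Hypothetical per-example results for the expensive and cheap configurations.
expensive = {"invoice-001": True, "invoice-002": True, "invoice-003": False}
cheap = {"invoice-001": True, "invoice-002": False, "invoice-003": True}

print(moved_examples(expensive, cheap))
```

An aggregate score can stay flat while individual examples swap places, as in this toy data. Looking at the regressed list is what lets you judge whether the affected examples or fields matter.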

Keep it optimal over time

Prices change and new models ship. Because the evaluations are permanent, you can re-run this search whenever that happens and re-pick the cheapest passing configuration. Operating cost stays at the minimum required to hold the agreed quality level, without anyone re-checking outputs by hand.