Everything in EigenPal follows from five ideas. Each one is backed by a concrete capability, not just a slogan.

1. Evaluations define correctness

A workflow defines behavior. An evaluation defines what “correct” means for the data it produces. Models, prompts, and steps all change over time; the evaluation is the fixed point that says whether the output is still right. Weighted evaluators roll up to one score against a pass threshold, so correctness is a number you can agree on and operate against.
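As a rough illustration of how weighted evaluators might roll up to one score, the sketch below assumes a plain weighted average over evaluator scores in the range 0 to 1. The class and function names (`EvaluatorResult`, `weighted_score`, `passes`), the evaluator names, and the averaging rule itself are assumptions made for this example, not EigenPal's actual API or scoring formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluatorResult:
    """One evaluator's verdict (hypothetical shape, for illustration only)."""
    name: str
    score: float   # assumed to be normalized to the range [0, 1]
    weight: float  # relative importance of this evaluator


def weighted_score(results):
    """Combine evaluator scores into one number using a weighted average."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("total evaluator weight must be positive")
    return sum(r.score * r.weight for r in results) / total_weight


def passes(results, threshold):
    """Return True when the combined score meets the pass threshold."""
    return weighted_score(results) >= threshold


# Illustrative values only: field accuracy counts three times as much
# as either of the other two checks.
results = [
    EvaluatorResult("field_accuracy", score=0.95, weight=3.0),
    EvaluatorResult("schema_valid", score=1.0, weight=1.0),
    EvaluatorResult("date_format", score=0.5, weight=1.0),
]

overall = weighted_score(results)  # about 0.87 for these values
print(f"score={overall:.2f} pass={passes(results, threshold=0.85)}")
```

With these made-up numbers the run passes a threshold of 0.85 and fails a threshold of 0.90, which is the sense in which correctness becomes a single number a team can agree on.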

2. Production issues become tests

Every failure should make the system stronger. When a real document produces a wrong answer, you promote that run into the evaluation dataset with the corrected output. From then on, the bug is a permanent test case: no future version can reintroduce it without an evaluation catching it. See Eval-first development.
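The promotion step can be pictured with a small in-memory model. Everything in the sketch below is hypothetical: `Run`, `EvalCase`, `Dataset.promote`, and the toy workflows are stand-ins invented for illustration, and the real product may store and compare cases quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """A production run: the input document and what the workflow returned."""
    run_id: str
    document: str
    output: dict


@dataclass
class EvalCase:
    """A dataset entry: an input plus the output that counts as correct."""
    source_run_id: str
    document: str
    expected: dict


@dataclass
class Dataset:
    cases: list = field(default_factory=list)

    def promote(self, run, corrected_output):
        """Turn a failed run into a permanent test case."""
        case = EvalCase(run.run_id, run.document, dict(corrected_output))
        self.cases.append(case)
        return case


def failing_cases(dataset, workflow):
    """Return every case whose expected output the workflow does not reproduce."""
    return [c for c in dataset.cases if workflow(c.document) != c.expected]


def buggy_workflow(document):
    # Bug: keeps only the digits before the thousands separator.
    return {"total": document.split("Total: ")[1].split(",")[0]}


def fixed_workflow(document):
    # Fix: remove the thousands separator instead of truncating at it.
    return {"total": document.split("Total: ")[1].replace(",", "")}


document = "Invoice 17. Total: 1,250"
bad_run = Run("run-001", document, buggy_workflow(document))

dataset = Dataset()
dataset.promote(bad_run, corrected_output={"total": "1250"})

print(len(failing_cases(dataset, buggy_workflow)))  # the old bug is caught
print(len(failing_cases(dataset, fixed_workflow)))  # the fixed version passes
```

Once the case is in the dataset, any later version that reintroduces the truncation bug shows up in `failing_cases` again.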

3. Optimize outcomes, not models

The goal is not to use the most capable model. It is to satisfy a quality requirement at the lowest cost. Once evaluations fix the quality bar, you can search across OCR engines and language models, pick the cheapest combination that still passes, and repeat the search whenever prices change or new models appear. Models are implementation details. See Optimize cost.
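The selection rule described here amounts to "filter by the quality bar, then minimize cost". The sketch below shows that rule under stated assumptions: the `Candidate` type, the engine and model names, the per-document costs, and the evaluation scores are all invented for illustration and do not come from EigenPal or from any real price list.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """One OCR engine plus language model pairing (hypothetical values)."""
    ocr: str
    model: str
    cost_per_doc: float  # assumed cost in an arbitrary currency unit
    eval_score: float    # assumed evaluation score in [0, 1]


def cheapest_passing(candidates, threshold) -> Optional[Candidate]:
    """Return the lowest-cost candidate that meets the threshold, or None."""
    passing = [c for c in candidates if c.eval_score >= threshold]
    return min(passing, key=lambda c: c.cost_per_doc, default=None)


candidates = [
    Candidate("ocr-a", "model-large", cost_per_doc=0.040, eval_score=0.97),
    Candidate("ocr-a", "model-small", cost_per_doc=0.006, eval_score=0.88),
    Candidate("ocr-b", "model-medium", cost_per_doc=0.012, eval_score=0.93),
]

choice = cheapest_passing(candidates, threshold=0.90)
print(choice)

# If the large model's price drops, re-running the same search picks it instead.
repriced = [
    replace(c, cost_per_doc=0.010) if c.model == "model-large" else c
    for c in candidates
]
print(cheapest_passing(repriced, threshold=0.90))
```

Note that the cheapest candidate overall (`model-small` in this made-up data) is never chosen while it scores below the threshold. The evaluation sets the bar first, and price only decides among the candidates that clear it.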

4. Treat AI systems like software

Workflows, datasets, evaluators, and agent source are artifacts: plain files you version in Git, review in pull requests, export between environments, and generate programmatically. The same discipline you apply to code applies here, which is also what lets an AI agent build and improve the whole thing.

5. Separate definition from execution

What a workflow does is configuration; how it runs is the engine’s job. The definition is portable and transparent, and the stateful, queue-based engine handles retries, load, and long-running work underneath it. That separation is why there is no lock-in: the definitions and their history are yours to export at any time.
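One way to picture the split is a definition that is nothing but data, next to a runner that owns retries. The sketch below is a deliberately simplified, synchronous stand-in: the definition schema (`steps`, `uses`, `max_retries`), the handler names, and `run_workflow` are assumptions for illustration, and the loop does not model the stateful, queue-based engine itself.

```python
import json

# The definition is plain data: it says what happens, not how it is run.
DEFINITION = {
    "name": "invoice-extraction",
    "steps": [
        {"id": "ocr", "uses": "ocr", "max_retries": 2},
        {"id": "extract", "uses": "extract_total", "max_retries": 0},
    ],
}


def run_workflow(definition, handlers, payload):
    """Run each step in order, retrying on RuntimeError as the definition allows."""
    for step in definition["steps"]:
        handler = handlers[step["uses"]]
        attempts = step.get("max_retries", 0) + 1
        for attempt in range(attempts):
            try:
                payload = handler(payload)
                break
            except RuntimeError:
                if attempt == attempts - 1:
                    raise  # retries exhausted, surface the failure
    return payload


def make_flaky_ocr(failures):
    """Build a fake OCR step that fails a fixed number of times, then succeeds."""
    state = {"remaining": failures}

    def ocr(payload):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise RuntimeError("transient OCR failure")
        return payload.upper()

    return ocr


def extract_total(text):
    return {"total": text.split("TOTAL: ")[1]}


handlers = {"ocr": make_flaky_ocr(failures=1), "extract_total": extract_total}
print(run_workflow(DEFINITION, handlers, "total: 42"))

# Because the definition is only data, exporting it is a serialization call.
exported = json.dumps(DEFINITION, indent=2)
```

The retry policy lives in the definition as a number, while the retry loop lives in the runner, so the same definition could be handed to a different executor without changes.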