Design principles

EigenPal is built around a few engineering constraints that make AI workflows easier to test, review, and operate.

1. Evaluations define correctness

A workflow defines behavior. An evaluation defines the output contract for a dataset. Models, prompts, and steps can change; the evaluation is the check that says whether the output still matches what the workflow is meant to produce. Each evaluator carries a weight, and the weighted results roll up into a single score that is compared against a pass threshold.
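The following is a minimal sketch of how weighted evaluator scores could roll up into one pass/fail decision. It assumes that each evaluator returns a score in [0, 1] and that the roll-up is a weighted mean. The `EvaluatorResult` type, the evaluator names, the weights, and the 0.8 threshold are hypothetical illustrations, not EigenPal's API.

```python
from dataclasses import dataclass


@dataclass
class EvaluatorResult:
    """Hypothetical result of one evaluator on one output."""
    name: str
    score: float   # assumed to lie in [0, 1]
    weight: float  # relative importance of this evaluator


def rollup(results: list[EvaluatorResult]) -> float:
    """Combine evaluator scores into a single weighted-mean score."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("evaluator weights must sum to a positive number")
    return sum(r.score * r.weight for r in results) / total_weight


def passes(results: list[EvaluatorResult], threshold: float) -> bool:
    """An output passes when the rolled-up score meets the threshold."""
    return rollup(results) >= threshold


results = [
    EvaluatorResult("field_exact_match", score=0.9, weight=3.0),
    EvaluatorResult("schema_valid", score=1.0, weight=1.0),
    EvaluatorResult("summary_quality", score=0.6, weight=1.0),
]
print(round(rollup(results), 2))       # 0.86
print(passes(results, threshold=0.8))  # True
```

Because the score is a weighted mean, a heavily weighted evaluator can outweigh a weak result elsewhere. In this example, `summary_quality` scores 0.6, but the output still passes because field accuracy carries three times the weight.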

2. Production issues become tests

When a real document produces a wrong answer, add that input and corrected output to the evaluation dataset. From then on, the case is part of every experiment and comparison. See Eval-first development.
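The following sketch shows one way to turn a production failure into a permanent test case. It assumes that the dataset is stored as a JSON Lines file with one case per line. The file layout, the `input`/`expected_output` field names, and both helper functions are hypothetical; they are not EigenPal's dataset format.

```python
import json
import tempfile
from pathlib import Path


def add_regression_case(dataset_path: Path, input_ref: str, expected: dict) -> None:
    """Append a failing production input and its corrected output as a new case."""
    case = {"input": input_ref, "expected_output": expected}
    with dataset_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(case) + "\n")


def load_cases(dataset_path: Path) -> list[dict]:
    """Each experiment reads the whole dataset, so added cases are always included."""
    with dataset_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "invoices.jsonl"
    add_regression_case(path, "docs/invoice-123.pdf", {"total": "1240.00", "currency": "EUR"})
    print(len(load_cases(path)))  # 1
```

Appending a case, rather than editing a separate test list, means the fix applies to every later experiment without further changes.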

3. Keep humans in the loop with Reviews

Reviews let people judge production runs, track open issues, correct expected output, and monitor quality over time. Human feedback should not live in Slack threads or spreadsheets; it should become part of the run history and the improvement loop.

4. Optimize against evaluation scores

The goal is to meet the quality bar at the lowest acceptable cost. Once evaluations define that bar, you can compare OCR engines, prompts, and language models and keep the cheapest version that still passes. See Optimize cost.
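The selection rule "keep the cheapest version that still passes" can be sketched as follows. The `Variant` record, the variant names, and the scores and costs are made-up illustrations, not measured results. The sketch assumes that each variant has already been run over the same evaluation dataset and has produced one rolled-up score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    """Hypothetical summary of one experiment over the evaluation dataset."""
    name: str                # which OCR engine / prompt / model combination was used
    score: float             # rolled-up evaluation score
    cost_per_doc_usd: float  # average cost per document


def cheapest_passing(variants: list[Variant], threshold: float) -> Optional[Variant]:
    """Return the lowest-cost variant whose score meets the threshold, or None."""
    passing = [v for v in variants if v.score >= threshold]
    if not passing:
        return None
    return min(passing, key=lambda v: v.cost_per_doc_usd)


variants = [
    Variant("large-model + premium OCR", score=0.95, cost_per_doc_usd=0.040),
    Variant("small-model + premium OCR", score=0.91, cost_per_doc_usd=0.012),
    Variant("small-model + basic OCR", score=0.78, cost_per_doc_usd=0.004),
]
best = cheapest_passing(variants, threshold=0.9)
print(best.name if best else "no variant passes")  # small-model + premium OCR
```

The threshold acts as a hard filter and cost acts as the ranking. A higher score above the bar earns nothing extra, which is why the most accurate variant is not selected here.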

5. Treat AI systems like software

Workflows, datasets, evaluators, and agent source are artifacts: plain files you version in Git, review in pull requests, export between environments, and generate programmatically. The same discipline you apply to code applies here.

6. Separate definition from execution

What a workflow does is configuration; how it runs is the engine’s job. The definition is portable and transparent, while the stateful, queue-based engine handles retries, load, and long-running work. Definitions and their history are exportable.
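The following is a simplified, in-process sketch of the split between definition and execution. The workflow is plain JSON data that lists its steps, and a separate `run` function decides how those steps execute, including retries. The JSON shape, step names, `run` signature, and retry policy are all hypothetical. A real stateful, queue-based engine would also persist state and distribute work, which this sketch does not attempt.

```python
import json
import time
from typing import Any, Callable

# Definition: portable data describing *what* the workflow does (hypothetical shape).
WORKFLOW = json.loads('{"name": "invoice-extraction", "steps": ["ocr", "extract_fields", "validate"]}')

StepFn = Callable[[Any], Any]


def run(definition: dict, registry: dict[str, StepFn], payload: Any,
        max_attempts: int = 3, backoff_s: float = 0.0) -> Any:
    """Execution: run each named step in order, retrying failed steps."""
    for step_name in definition["steps"]:
        step = registry[step_name]
        for attempt in range(1, max_attempts + 1):
            try:
                payload = step(payload)
                break  # step succeeded; move on to the next one
            except Exception:
                # Broad catch for illustration; re-raise once retries are exhausted.
                if attempt == max_attempts:
                    raise
                time.sleep(backoff_s * attempt)  # linear backoff between attempts
    return payload


calls = {"ocr": 0}


def flaky_ocr(doc: str) -> str:
    """Simulated step that fails transiently on its first call only."""
    calls["ocr"] += 1
    if calls["ocr"] == 1:
        raise RuntimeError("transient OCR failure")
    return doc + " | text"


registry = {
    "ocr": flaky_ocr,
    "extract_fields": lambda x: x + " | fields",
    "validate": lambda x: x + " | ok",
}
print(run(WORKFLOW, registry, "invoice.pdf"))  # invoice.pdf | text | fields | ok
```

Because `WORKFLOW` contains no execution logic, it can be exported, diffed, and reviewed on its own, while retry behavior changes only in the engine.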
