Build a dataset

A dataset tests a workflow against real inputs. It is a folder of examples, each with an input and an optional expected output. The folder is the manifest; there is no separate config file.

Archive layout

dataset/
  examples/
    acme-invoice/
      input.json               full run input object
      input/
        Invoice.pdf             file referenced by input.json
      expected.json             optional ground-truth output
      meta.json                 optional { rowOrder, annotation, overrides }

input.json is the full run input. Use { "$file": "input/Invoice.pdf" } wherever the input should receive a file.
Files under input/ and expected/ are included only when referenced from input.json or expected.json.
expected.json is the ground truth an evaluator compares against. Leave it out for examples you only want to run, not score.
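
As a rough sketch of hand-authoring the layout above, the following shell commands create one example folder. The folder and file names come from the layout; the "document" key in input.json and the fields in expected.json are hypothetical, and a placeholder file stands in for a real PDF.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Example name taken from the layout above; substitute your own.
name="acme-invoice"
example_dir="dataset/examples/${name}"

mkdir -p "${example_dir}/input"

# Placeholder standing in for a real PDF so the sketch runs end to end.
printf 'placeholder' > "${example_dir}/input/Invoice.pdf"

# input.json is the full run input. The "document" key is an assumed field
# name; use whichever input field of your workflow should receive the file.
cat > "${example_dir}/input.json" <<'EOF'
{
  "language": "en",
  "document": { "$file": "input/Invoice.pdf" }
}
EOF

# expected.json is optional ground truth. These fields are illustrative only.
cat > "${example_dir}/expected.json" <<'EOF'
{ "vendor": "Acme", "total": 1250.0 }
EOF

# Show what was created.
find dataset -type f | sort
```

Omit the expected.json step for an example you only want to run, not score.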

Example folder names must match [a-z0-9][a-z0-9-_]*.
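
To catch a bad folder name before pushing, the pattern can be checked locally. This is a minimal sketch, assuming the pattern applies to the whole name; it moves the hyphen to the end of the character class so it is read as a literal hyphen, which is the usual interpretation of the pattern as written. The is_valid_name helper is a hypothetical name, not part of the CLI.

```shell
#!/usr/bin/env bash

# Anchored form of the documented pattern, hyphen last so it stays literal.
pattern='^[a-z0-9][a-z0-9_-]*$'

# Hypothetical helper: succeeds when the name matches the pattern.
# LC_ALL=C keeps [a-z] from matching uppercase letters in some locales.
is_valid_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq -- "$pattern"
}

for name in "acme-invoice" "2024_q1" "Acme-Invoice" "-draft" "acme invoice"; do
  if is_valid_name "$name"; then
    echo "ok       $name"
  else
    echo "invalid  $name"
  fi
done
```

Under this reading, the first two names pass; the others fail because of an uppercase letter, a leading hyphen, and a space.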

Build it

You can hand-author the folders, or create scalar-only examples from the CLI:

# Create a scalar-only example
eigenpal workflow dataset example create <workflow-id> \
  --name acme-invoice \
  --input-json '{"language":"en"}' \
  --expected-file expected/acme-output.json

# Validate the local dataset layout before pushing
eigenpal workflow dataset validate

For examples with file inputs, use the archive layout above: place files under examples/<name>/input/, reference them from input.json, then push the dataset. That keeps original filenames and mirrors the import/export format.
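
Because files under input/ are included only when input.json references them, a quick local check for unreferenced files can help before a push. The sketch below is an assumption-laden approximation: it finds { "$file": "..." } references by text matching rather than parsing JSON, and the unreferenced_inputs helper is a hypothetical name, not a CLI feature.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print files under <example>/input/ that input.json
# never names in a "$file" reference. Text-based match, not a JSON parser.
unreferenced_inputs() {
  local example_dir="$1"
  local refs
  refs="$(grep -o '"\$file"[[:space:]]*:[[:space:]]*"[^"]*"' "${example_dir}/input.json" \
    | sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/' || true)"
  [ -d "${example_dir}/input" ] || return 0
  ( cd "${example_dir}" && find input -type f | sort ) | while IFS= read -r path; do
    if ! printf '%s\n' "$refs" | grep -Fxq -- "$path"; then
      echo "$path"
    fi
  done
}

# Demo on a throwaway example: Invoice.pdf is referenced, notes.txt is not.
demo="$(mktemp -d)/acme-invoice"
mkdir -p "${demo}/input"
printf 'placeholder' > "${demo}/input/Invoice.pdf"
printf 'placeholder' > "${demo}/input/notes.txt"
printf '%s\n' '{ "document": { "$file": "input/Invoice.pdf" } }' > "${demo}/input.json"
unreferenced_inputs "${demo}"
```

The demo prints input/notes.txt, the one file that input.json does not reference.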

Push and pull

The dataset round-trips. Push your local folder, or pull the live dataset to edit it:

eigenpal workflow dataset push <workflow-id>            # append by default
eigenpal workflow dataset push <workflow-id> --mode replace
eigenpal workflow dataset pull <workflow-id>            # download to ./dataset

push appends by default; --mode replace swaps the whole dataset and asks you to confirm. The archive layout is identical in both directions, so a single-example export re-imports cleanly when you need to move one case between environments.

Next

Score the dataset in Evaluate a workflow.
