A dataset is how you test a workflow against real documents. It is a folder of examples, each with an input and an optional expected output. The folder itself is the manifest: there is no separate config file, so a human or an agent can build one easily.

Archive layout

dataset/
  examples/
    acme-invoice/
      input/
        arguments.json          scalar/object args, e.g. { "language": "en" }
        document/Invoice.pdf     one folder per file argument
      expected/
        output.json             optional ground-truth output
      meta.json                 optional { rowOrder, annotation, overrides }
  • Keys in arguments.json are the scalar and object arguments.
  • Each folder under input/ is a file argument. The folder name is the argument name; drop one or more files inside it. Original filenames are preserved.
  • expected/output.json is the ground truth an evaluator compares against. Leave it out for examples you only want to run, not score.
Example and argument folder names must match [a-z0-9][a-z0-9-_]*.
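
The layout above can be hand-authored with standard shell tools. The following is a minimal sketch, assuming a POSIX shell. The example name, the `document` argument, the placeholder file contents, and the `total` field in output.json are illustrative and not taken from a real workflow. The name check writes the pattern with the hyphen last (`[a-z0-9_-]`) so that `grep -E` reads it as a literal hyphen; it is intended to accept the same names as the pattern above.

```shell
set -eu

# Work in a scratch directory so the sketch never touches a real dataset.
root=$(mktemp -d)
example="acme-invoice"          # hypothetical example name
arg="document"                  # hypothetical file-argument name

# Check a folder name against [a-z0-9][a-z0-9-_]* (hyphen moved last for grep -E).
valid_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9][a-z0-9_-]*$'
}

valid_name "$example" || { echo "bad example name: $example" >&2; exit 1; }
valid_name "$arg" || { echo "bad argument name: $arg" >&2; exit 1; }

dir="$root/dataset/examples/$example"
mkdir -p "$dir/input/$arg" "$dir/expected"

# Scalar/object arguments go in arguments.json.
printf '%s\n' '{ "language": "en" }' > "$dir/input/arguments.json"

# One folder per file argument; a placeholder stands in for a real PDF here.
printf 'placeholder\n' > "$dir/input/$arg/Invoice.pdf"

# Optional ground truth; the field below is illustrative only.
printf '%s\n' '{ "total": 0 }' > "$dir/expected/output.json"

# List what was created.
find "$root/dataset" -type f | sort
```

In practice you would replace the placeholder with a copy of the real document (for example with `cp`) and write the expected output your workflow actually produces.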

Build it

You can hand-author the folders, or create examples from the CLI:
# Create an example from local files
eigenpal workflow dataset example create <workflow-id> \
  --name acme-invoice \
  -F document=@invoices/acme.pdf \
  --arguments '{"language":"en"}'

# Validate the local dataset layout before pushing
eigenpal workflow dataset validate

Push and pull

The dataset round-trips. Push your local folder, or pull the live dataset to edit it:
eigenpal workflow dataset push <workflow-id>            # append by default
eigenpal workflow dataset push <workflow-id> --mode replace
eigenpal workflow dataset pull <workflow-id>            # download to ./dataset
push defaults to append; replace swaps the whole dataset and asks you to confirm. Because the archive layout is identical in both directions, a single-example export re-imports cleanly, which is the supported way to move one example between environments.
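
Because an example is just a folder, isolating one for transfer can be sketched with plain file operations. The source does not describe how a single-example export is produced, so this sketch assumes it is simply a dataset folder containing one example. The folder names and the scratch setup (which stands in for a previously pulled ./dataset) are hypothetical.

```shell
set -eu

# Scratch setup standing in for a pulled dataset (normally ./dataset already exists).
work=$(mktemp -d)
src="$work/dataset"
mkdir -p "$src/examples/acme-invoice/input" "$src/examples/other-invoice/input"
printf '%s\n' '{ "language": "en" }' > "$src/examples/acme-invoice/input/arguments.json"
printf '%s\n' '{ "language": "de" }' > "$src/examples/other-invoice/input/arguments.json"

# Copy just one example into a fresh tree with the same layout.
name="acme-invoice"             # hypothetical example to move
dst="$work/single/dataset"
mkdir -p "$dst/examples"
cp -R "$src/examples/$name" "$dst/examples/"

# Show which examples ended up in the new tree.
ls "$dst/examples"
```

The new tree has the same shape as the archive layout, so under that assumption it could then be pushed to the target environment in the default append mode.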

Next

Score the dataset in Evaluate a workflow.