1. Build the workflow
Parse the document, then extract the fields you need. Keep it simple to start.
2. Define what correct means
Collect a handful of real statements and write down the right answer for each. That is your dataset: an input document plus the expected output. Then add evaluators that score a run against that expected output. For structured extraction, exact-diff checks that the transactions and
account fields match exactly. For the fuzzy parts, an llm-judge can grade
something like “does this capture recurring salary credits and exclude refunds.”
Evaluators are weighted and roll up to one score against a pass threshold.
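
To make the roll-up concrete, here is a minimal sketch of a dataset case, two evaluators, and a weighted score checked against a pass threshold. Everything in it is an assumption for illustration: the names `Case`, `Evaluator`, `exact_diff`, `stub_judge`, `weighted_score`, and `PASS_THRESHOLD`, the weights, and the field names are hypothetical and not any particular product's API. The judge is a deterministic stand-in so the example runs offline; a real llm-judge would call a model there.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    """One dataset entry: an input document plus the hand-written expected output."""
    document: str
    expected: dict


@dataclass
class Evaluator:
    """A named, weighted scorer mapping (actual, expected) to a score in [0, 1]."""
    name: str
    weight: float
    score: Callable[[dict, dict], float]


def exact_diff(actual: dict, expected: dict) -> float:
    """Return 1.0 only if the account and transactions fields match exactly."""
    keys = ("account", "transactions")
    return 1.0 if all(actual.get(k) == expected.get(k) for k in keys) else 0.0


def stub_judge(actual: dict, expected: dict) -> float:
    """Stand-in for an llm-judge (assumption): compares the salary-credit list."""
    same = actual.get("recurring_salary") == expected.get("recurring_salary")
    return 1.0 if same else 0.0


def weighted_score(evaluators: list, actual: dict, expected: dict) -> float:
    """Weighted average of evaluator scores, normalised by total weight."""
    total = sum(e.weight for e in evaluators)
    return sum(e.weight * e.score(actual, expected) for e in evaluators) / total


PASS_THRESHOLD = 0.8  # hypothetical threshold

transactions = [
    {"date": "2024-01-31", "amount": 2500.0, "description": "ACME PAYROLL"},
    {"date": "2024-02-03", "amount": 40.0, "description": "REFUND STORE"},
]
case = Case(
    document="(raw statement text would go here)",
    expected={
        "account": "12345678",
        "transactions": transactions,
        "recurring_salary": [2500.0],  # the refund is excluded
    },
)

evaluators = [
    Evaluator("exact-diff", 3.0, exact_diff),
    Evaluator("llm-judge", 1.0, stub_judge),
]

# A run that extracts the fields correctly but wrongly counts the refund as salary.
actual = dict(case.expected, recurring_salary=[2500.0, 40.0])

score = weighted_score(evaluators, actual, case.expected)
passed = score >= PASS_THRESHOLD
print(f"score={score:.2f} passed={passed}")
```

In this run the structured fields match, so exact-diff scores 1.0, while the stand-in judge scores 0.0 because the refund leaked into the salary credits. With the assumed weights of 3 and 1 the combined score falls below the assumed threshold, so the run fails even though most of the extraction was right.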