transform.text-chunker
Split long text into chunks with regex-anchored boundaries, overlap, and header preservation. Accepts raw text or a parsed-document object; when pages are provided, each chunk carries its source page indexes.
Configuration
Configuration goes inside the step’s with: block.
Either raw text or a parsed-document object { pages: [{ pageIndex, text }] } (e.g. {{ steps.parse.output }}). Pages preserve per-chunk page provenance.
Target chunk size in characters. The hard ceiling per chunk is 1.5× this value.
Characters duplicated at chunk boundaries (default 0). Must be < maxChars / 2.
Ordered list of regexes; the first that matches near the chunk boundary wins. Falls back to a plain character cut when none match. Tip: list the narrowest pattern first (e.g. /\d+\.\d+\s+/ before /\n\n+/).
Safety cap on the number of chunks; chunks beyond the cap are dropped and summary.truncated flips to true.
Prepend the first N characters of the input to every chunk (good for “always include the contract title”).
Trailing chunks shorter than this are merged into the previous chunk.
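The options above interact in a fixed order, which may be easier to follow in code. The Python sketch below is only an illustration of the described behavior, not the step's actual implementation: the function name chunk_text and every parameter name are hypothetical, and the boundary search window (half the target size up to the 1.5× ceiling), the ordering of tail merging before the safety cap, and leaving the first chunk without a duplicated header are all assumptions. Page provenance is omitted.

```python
import re


def chunk_text(text, max_chars, overlap=0, boundaries=(), max_chunks=None,
               header_chars=0, min_tail_chars=0):
    """Illustrative sketch only; every parameter name here is hypothetical."""
    if not 0 <= overlap < max_chars / 2:
        raise ValueError("overlap must be >= 0 and < max_chars / 2")
    ceiling = int(max_chars * 1.5)  # hard ceiling: 1.5x the target size
    patterns = [re.compile(p) for p in boundaries]
    spans, start, n = [], 0, len(text)

    while start < n:
        if n - start <= max_chars:
            end = n  # the remainder fits in one chunk
        else:
            target = start + max_chars
            end = target  # fallback: plain character cut
            # Assumed search window: half the target size up to the ceiling.
            lo, hi = start + max_chars // 2, min(start + ceiling, n)
            for pattern in patterns:  # ordered: first pattern with a match wins
                ends = [m.end() for m in pattern.finditer(text, lo, hi)]
                if ends:
                    # Cut after the match that lands closest to the target.
                    end = min(ends, key=lambda e: abs(e - target))
                    break
        spans.append((start, end))
        if end >= n:
            break
        # Step back by the overlap, but always move forward at least one char.
        start = max(end - overlap, start + 1)

    # Merge a too-short trailing chunk into the previous one.
    # (This sketch does not re-check the ceiling after merging.)
    if len(spans) > 1 and spans[-1][1] - spans[-1][0] < min_tail_chars:
        spans[-2:] = [(spans[-2][0], spans[-1][1])]

    # Safety cap (assumed to apply after merging): drop later chunks.
    truncated = max_chunks is not None and len(spans) > max_chunks
    if truncated:
        spans = spans[:max_chunks]

    # Assumption: the first chunk already begins with the header text,
    # so the header is only prepended to the chunks after it.
    header = text[:header_chars]
    chunks = [text[s:e] if s == 0 else header + text[s:e] for s, e in spans]
    return {"chunks": chunks, "summary": {"truncated": truncated}}
```

Calling chunk_text(document, 1000, overlap=100, boundaries=[r"\n\n+"]) would, under these assumptions, prefer paragraph breaks near each 1000-character mark and repeat the last 100 characters of one chunk at the start of the next.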