Text Chunker

transform.text-chunker: Splits long text into chunks using regex-anchored boundaries, optional overlap, and header preservation. Accepts raw text or a parsed-document object; chunks carry source page indexes when pages are provided.

Configuration

Configuration goes inside the step’s with: block.

string | record<string, unknown>

required

Either raw text or a parsed-document object { pages: [{ pageIndex, text }] } (e.g. {{ steps.parse.output }}). Pages preserve per-chunk page provenance.
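As a rough illustration of the parsed-document shape and of how page provenance could be tracked, consider the Python sketch below. Only the `{ pages: [{ pageIndex, text }] }` shape comes from the description above; the helper names `flatten_pages` and `pages_for_range`, and the choice to join pages with a newline, are assumptions made for the example and not the step's documented internals.

```python
def flatten_pages(doc):
    """Join page texts and remember which character span each page covers."""
    parts, spans, pos = [], [], 0
    for page in doc["pages"]:
        text = page["text"]
        spans.append((page["pageIndex"], pos, pos + len(text)))
        parts.append(text)
        pos += len(text) + 1  # +1 for the "\n" separator (an assumption)
    return "\n".join(parts), spans


def pages_for_range(spans, start, end):
    """Return page indexes whose span overlaps the half-open range [start, end)."""
    return [idx for idx, s, e in spans if s < end and start < e]


doc = {"pages": [{"pageIndex": 0, "text": "Title page"},
                 {"pageIndex": 1, "text": "Clause 1.1 applies."}]}
full_text, spans = flatten_pages(doc)
print(pages_for_range(spans, 5, 15))  # a range straddling both pages -> [0, 1]
```

With raw-string input there are no page spans to consult, which is consistent with chunks reporting an empty page list in that case.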

integer

required

Target chunk size in characters. The hard ceiling per chunk is 1.5× this value.

integer

default:"0"

Number of characters duplicated across adjacent chunk boundaries. Must be less than maxChars / 2.
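To make the overlap rule concrete, here is a minimal Python sketch of plain character-cut chunking with overlap. The function name `char_chunks` and its parameter names are hypothetical; only the constraint that the overlap must be smaller than half of maxChars is taken from the description above.

```python
def char_chunks(text, max_chars, overlap=0):
    """Cut text every max_chars characters, repeating `overlap` chars at each seam."""
    if not 0 <= overlap < max_chars / 2:
        raise ValueError("overlap must be >= 0 and < max_chars / 2")
    step = max_chars - overlap  # how far the window advances per chunk
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this chunk already reached the end of the text
        start += step
    return chunks


chunks = char_chunks("abcdefghij", max_chars=4, overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Keeping the overlap under half the chunk size guarantees the window advances by more than half a chunk each step, so consecutive chunks never consist mostly of repeated text.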

array<string>

Ordered list of regexes; the first one that matches near the chunk boundary wins. Falls back to a plain character cut when none match. Tip: list the narrowest pattern first (e.g. /\d+\.\d+\s+/ before /\n\n+/).
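The ordered-pattern rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the step's actual algorithm: the name `pick_cut` is hypothetical, "near the chunk boundary" is assumed to mean between 0.5× and 1.5× the target size, and the cut is assumed to land at the start of the match.

```python
import re


def pick_cut(text, start, max_chars, patterns):
    """Return the index at which the chunk beginning at `start` should end."""
    target = start + max_chars
    if target >= len(text):
        return len(text)
    lo = start + max_chars // 2                        # assumed lower edge of "near"
    hi = min(len(text), start + (max_chars * 3) // 2)  # 1.5x hard ceiling
    for pattern in patterns:  # ordered: the first pattern with a hit wins
        hits = [m.start() for m in re.finditer(pattern, text)
                if lo <= m.start() <= hi]
        if hits:
            return min(hits, key=lambda i: abs(i - target))
    return target  # char-cut fallback when no pattern matches in the window


text = "Intro text here.\n\n1.2 Scope of work follows."
cut_numbered = pick_cut(text, 0, 12, [r"\d+\.\d+\s+", r"\n\n+"])  # 18
cut_blank = pick_cut(text, 0, 12, [r"\n\n+"])                     # 16
cut_fallback = pick_cut(text, 0, 12, [r"ZZZ"])                    # 12
```

Listing the section-number pattern first makes the cut land just before "1.2 " even though the blank line sits closer to the target, which is why the narrowest pattern belongs at the top of the list.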

integer

default:"64"

Safety cap on the number of chunks; chunks beyond the cap are dropped and summary.truncated is set to true.
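A small Python sketch of the cap behaviour, assuming the cap counts chunks. The function and parameter names (`apply_chunk_cap`, `max_chunks`) are hypothetical; the default of 64 and the `truncated` flag are taken from the description above.

```python
def apply_chunk_cap(chunks, max_chunks=64):
    """Keep at most max_chunks chunks and report whether any were dropped."""
    truncated = len(chunks) > max_chunks
    return chunks[:max_chunks], {"truncated": truncated}


kept, summary = apply_chunk_cap(["a", "b", "c", "d", "e"], max_chunks=3)
print(kept, summary)  # ['a', 'b', 'c'] {'truncated': True}
```

A downstream step can check `summary.truncated` to decide whether the input needs a larger chunk size or a higher cap.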

integer

default:"0"

Prepend the first N characters of the input to every chunk (good for “always include the contract title”).
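Header preservation can be pictured with the Python sketch below. The name `with_prefix` and its parameters are hypothetical, and whether the real step inserts a separator between the prefix and the chunk body is not stated, so none is added here.

```python
def with_prefix(chunks, source_text, prefix_chars=0):
    """Prepend the first prefix_chars characters of the input to every chunk."""
    prefix = source_text[:prefix_chars]
    return [prefix + chunk for chunk in chunks]


source = "MASTER SERVICES AGREEMENT\nClause one. Clause two."
prefixed = with_prefix(["Clause one.", "Clause two."], source, prefix_chars=26)
print(prefixed[1])  # the title line, then "Clause two."
```

Because the prefix is added to every chunk, it is worth keeping N small relative to the chunk size.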

integer

default:"0"

Trailing chunks shorter than this are merged into the previous chunk.
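The tail-merge rule can be sketched in Python as below. The names `merge_short_tail` and `min_chars` are hypothetical, and how the real step reconciles a merged chunk with overlap or the 1.5× ceiling is not specified in the description, so this sketch ignores both.

```python
def merge_short_tail(chunks, min_chars=0):
    """Fold a too-short final chunk into the chunk before it."""
    merged = list(chunks)
    while len(merged) > 1 and len(merged[-1]) < min_chars:
        tail = merged.pop()
        merged[-1] += tail
    return merged


merged = merge_short_tail(["aaaa", "bbbb", "c"], min_chars=3)
print(merged)  # ['aaaa', 'bbbbc']
```

With the default of 0 no chunk counts as too short, so the list is returned unchanged.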

Output

chunks

Each chunk lists the page indexes that contributed text to it (empty for raw-string input).

summary

Carries the truncated flag, which is true when chunks were dropped by the safety cap.
