ai.split: Split a parsed document into named sections using an LLM. Consumes ai.parse output; emits per-section page ranges and text ready for downstream ai.extract via control.parallel_map.
What this section looks like in the document. Use the document’s own terminology (e.g. “Príloha 2 / Anlage 2”) and visual cues. Multilingual variants belong here.
Natural-language hints fed to the LLM about what marks the END of this section (e.g. [“PRÍPAD PORUŠENIA ZMLUVY”, “Koniec prílohy”]). The LLM uses these as guidance, not a regex match, so casing variants, missing diacritics, and multilingual phrasings all close the section correctly.
Overrides the per-window token ceiling for this step. Defaults to the SPLIT_WINDOW_TOKEN_BUDGET environment variable if set, otherwise 20000. Smaller windows give sharper anchors on contract-style documents because the LLM has less competing context to mis-anchor on; raise the budget to 50k–100k tokens when sections routinely span more pages than fit in one window.
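The precedence described above (explicit per-step override, then the SPLIT_WINDOW_TOKEN_BUDGET environment variable, then 20000) can be sketched as follows. This is a minimal illustration, not the step's actual implementation: the function name `resolve_window_budget` and its parameters are hypothetical, and only the environment variable name and the 20000 fallback come from the text.

```python
import os
from typing import Mapping, Optional

# Fallback stated in the field description above.
DEFAULT_WINDOW_TOKEN_BUDGET = 20000


def resolve_window_budget(
    override: Optional[int] = None,
    env: Mapping[str, str] = os.environ,
) -> int:
    """Pick the per-window token ceiling (hypothetical helper).

    Order: explicit step-level override, then the environment
    variable, then the documented default.
    """
    if override is not None:
        return override
    raw = env.get("SPLIT_WINDOW_TOKEN_BUDGET")
    if raw:
        return int(raw)
    return DEFAULT_WINDOW_TOKEN_BUDGET


# Usage: a contract-style document keeps the small default,
# a document with very long sections raises the ceiling.
contract_budget = resolve_window_budget(env={})
long_section_budget = resolve_window_budget(override=80000, env={})
```

Passing `env` explicitly keeps the helper easy to exercise without touching the real process environment.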
Sections found in the document, in page order. Absent sections are omitted.
splits properties:
Section name from the config
[startIndex, endIndex] inclusive, 0-based page indices
LLM confidence at the anchor page. Reported as a coarse enum (low | medium | high) rather than a number, because numeric scores cluster meaninglessly at 0.85-0.95.
LLM’s justification for the anchor, useful for debugging
Structured start-anchor evidence. Parse this instead of running regexes over notes.
evidence properties:
Verbatim heading text the LLM cited as the section start
Page where start_heading_text appears
Set when the LLM detected an end-of-section cue (matched against the section’s endHints or an explicit closing marker). Absent when the section was closed by continuity-fill alone.
end_evidence properties:
LAST page of the section (inclusive)
LLM’s justification for the end. Pair with start notes for reconciliation.
Joined per-page content for this section, ready for downstream ai.extract
Raw per-page records covered by this split (in page order). Iterate when downstream needs per-page context (e.g. control.parallel_map over section pages).
pages properties:
Whether this page is direct LLM evidence (anchored) or continuity-filled (inferred)
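As a rough illustration of how a consumer might read one split record, the sketch below derives a page count from the inclusive 0-based range, checks whether the section was closed by an explicit cue, and lists continuity-filled pages. Most key names here are assumptions made for the example, since the text above describes the fields but does not name all of them: `name`, `range`, `index`, and `source` are hypothetical, while `pages`, `end_evidence`, and the anchored/inferred distinction come from the descriptions.

```python
from typing import Any, Dict, List


def summarize_split(split: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce one split record to a few facts worth checking (hypothetical helper)."""
    # Assumed key "range": [startIndex, endIndex], inclusive and 0-based.
    start, end = split["range"]
    pages: List[Dict[str, Any]] = split.get("pages", [])
    return {
        "name": split["name"],  # assumed key for the section name
        # Inclusive range, so add 1 to get the number of pages.
        "page_count": end - start + 1,
        # 1-based numbers for display to a human reviewer.
        "display_pages": (start + 1, end + 1),
        # end_evidence is absent when continuity-fill alone closed the section.
        "closed_by_cue": "end_evidence" in split,
        # Assumed per-page keys "index" and "source".
        "inferred_pages": [p["index"] for p in pages if p.get("source") == "inferred"],
    }


# Usage with a made-up record (not real step output).
sample = {
    "name": "annex_2",
    "range": [3, 6],
    "pages": [
        {"index": 3, "source": "anchored"},
        {"index": 4, "source": "inferred"},
        {"index": 5, "source": "inferred"},
        {"index": 6, "source": "anchored"},
    ],
    "end_evidence": {"notes": "closing marker found on the last page"},
}
summary = summarize_split(sample)
```

Sections that come back with `closed_by_cue` set to false or with many inferred pages are reasonable candidates for a manual look at the start and end notes.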