ai.parse turns a document into text. It is almost always the first step in a
document-understanding workflow: point it at an uploaded file, get back clean
text plus per-page content, then feed that into ai.extract,
ai.split, or a transform.script.
When to use it
- You have a PDF, image, or Office document and need its text. ai.parse handles PDFs, images (PNG/JPG), and Office formats (DOCX, PPTX, XLSX, ODT, and more), normalizing them to one text representation.
- You need per-page boundaries. The pages array carries page-scoped text so downstream steps can cite evidence by page or split a document into sections.
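As a rough illustration of citing evidence by page, the sketch below searches a pages array for a phrase and reports the page numbers where it appears. The shape of each entry (a dict with `page` and `text` keys) and the helper name `pages_mentioning` are assumptions made for this example, not the documented output schema.

```python
from typing import Dict, List


def pages_mentioning(pages: List[Dict], term: str) -> List[int]:
    """Return the page numbers whose text contains `term` (case-insensitive)."""
    needle = term.lower()
    return [p["page"] for p in pages if needle in p["text"].lower()]


# Hypothetical pages array; the "page" and "text" field names are assumed.
sample_pages = [
    {"page": 1, "text": "Invoice #1042\nBill to: Acme Corp"},
    {"page": 2, "text": "Line items\nWidget x3"},
    {"page": 3, "text": "Total due: $120.00\nPayment terms: net 30"},
]

print(pages_mentioning(sample_pages, "total due"))
```

A downstream step could attach the returned page numbers to an extracted value so a reviewer knows where to look in the source document.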
How parsing is chosen
The step picks a strategy based on the document and your config:
- Native text (nativeText: true) pulls embedded text straight from a PDF. Fastest, uses no credits, and falls back to OCR/VLM when the PDF has no text layer (a scan).
- OCR (ocrModel) runs optical character recognition for scanned PDFs and images.
- Vision LLM (llmModel) reads page images with a vision model. Best for complex layouts where OCR struggles. pagesPerBatch and maxConcurrency tune how page images are batched across requests.
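The selection and batching described above can be modeled with a small sketch. This is not the step's actual implementation: the function names, the rule that OCR is preferred over the vision LLM when both are configured, and the error raised when neither is set are all assumptions made for illustration.

```python
from typing import List, Optional


def choose_strategy(
    has_text_layer: bool,
    native_text: bool = False,
    ocr_model: Optional[str] = None,
    llm_model: Optional[str] = None,
) -> str:
    """Hypothetical model of how a parsing strategy might be picked."""
    # Native text only works when the PDF actually has an embedded text layer.
    if native_text and has_text_layer:
        return "native"
    # Assumption: OCR takes precedence over the vision LLM when both are set.
    if ocr_model:
        return "ocr"
    if llm_model:
        return "vlm"
    # Assumption: with nothing to fall back to, parsing cannot proceed.
    raise ValueError("no OCR or vision model configured for fallback")


def batch_pages(page_count: int, pages_per_batch: int) -> List[List[int]]:
    """Group 1-based page numbers into batches of at most `pages_per_batch`."""
    pages = list(range(1, page_count + 1))
    return [pages[i:i + pages_per_batch] for i in range(0, page_count, pages_per_batch)]


# A scanned PDF (no text layer) with nativeText enabled falls back to OCR.
print(choose_strategy(has_text_layer=False, native_text=True, ocr_model="some-ocr"))
# Five pages with pagesPerBatch = 2 become three requests.
print(batch_pages(5, 2))
```

In this model, maxConcurrency would cap how many of those batches are in flight at once. Larger batches mean fewer requests, but each request carries more page images.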
Example
markdown output keeps headings and tables, which makes the downstream
ai.extract prompt far more reliable than unstyled plain text.
Configuration
Configuration goes inside the step’s with: block.
Storage reference or template expression for the document
OCR provider ID for PDF/image parsing
LLM provider ID for vision-based parsing
Max concurrent VLM batch requests
Number of page images per VLM request
Custom extraction prompt
OCR language hints
Format for extracted text. markdown (default) keeps structure and is best for LLM extraction; plain is unstyled text; djot/html preserve more layout. Only the native (Kreuzberg) parser respects this setting; OCR/VLM always emit markdown.
Extract native/embedded text from PDFs without OCR/VLM. Faster and uses no credits. Falls back to OCR/VLM if the PDF has no embedded text.
Output
Extracted text content (combined from all pages)
Per-page content
Document metadata