How it works

A clean, observable pipeline. Four stages, fully inspectable.

CHUNKZA replaces the black box of ad-hoc chunking with a pipeline you can see, diff, and replay. Here's exactly what happens to your documents.

See the features

Ingest

Bring your corpora in any form.

Connect a source (a folder, a bucket, a Notion workspace, or a Confluence space) or upload files directly. CHUNKZA normalizes PDF, DOCX, PPTX, Markdown, HTML, and plain text into a single structural representation, preserving headings, tables, lists, and captions.

PDF, DOCX, PPTX, Markdown, HTML, plain text, Notion, Confluence
OCR pass for scanned documents
Source URI and provenance preserved
Incremental sync for live sources
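
To make "a single structural representation" concrete, here is a minimal sketch of what such a representation could look like for Markdown input. It is an illustration only, not CHUNKZA's actual schema or API: the `Block` type, its fields, and `normalize_markdown` are hypothetical names, and it covers just headings, list items, and paragraphs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical normalized unit; not CHUNKZA's real schema."""
    kind: str        # "heading", "list_item", or "paragraph"
    text: str
    level: int       # heading depth; 0 for non-headings
    source_uri: str  # provenance carried on every block


def normalize_markdown(text: str, source_uri: str) -> list[Block]:
    """Turn Markdown lines into typed blocks, keeping the source URI."""
    blocks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no structure in this sketch
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            blocks.append(Block("heading", line[level:].strip(), level, source_uri))
        elif line.startswith(("- ", "* ")):
            blocks.append(Block("list_item", line[2:].strip(), 0, source_uri))
        else:
            blocks.append(Block("paragraph", line, 0, source_uri))
    return blocks


if __name__ == "__main__":
    sample = "# Title\n\n- item one\nBody text"
    for block in normalize_markdown(sample, "file:///docs/a.md"):
        print(block)
```

Other formats would need their own parsers that emit the same block type, which is what lets the later stages stay format-agnostic.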

Parse & split

Chunk by structure, then by meaning.

Layout-aware segmentation identifies structural boundaries first. A semantic boundary model then refines the splits inside long passages, predicting where topics shift. Parent and child chunks are linked automatically, with metadata injected at every level.

Layout-aware structural segmentation
Semantic boundary detection on long passages
Parent-child linking with shared metadata
Per-section policy overrides
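
The two-pass idea (structure first, meaning second) can be sketched in a few lines. This is a toy under stated assumptions, not CHUNKZA's implementation: word-overlap (Jaccard) similarity between adjacent sentences stands in for the semantic boundary model, headings are assumed to be Markdown `#` lines, and `Chunk`, `chunk_document`, `max_chars`, and `threshold` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    chunk_id: str
    text: str
    parent_id: Optional[str]  # None for section-level (parent) chunks
    metadata: dict = field(default_factory=dict)


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a crude stand-in for a semantic model."""
    wa = set(re.findall(r"[a-z0-9]+", a.lower()))
    wb = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Structural pass: (heading, body) pairs split at Markdown headings."""
    sections, heading, body = [], "", []
    for line in markdown.splitlines():
        if line.startswith("#"):
            if heading or body:
                sections.append((heading, " ".join(body)))
            heading, body = line.lstrip("#").strip(), []
        elif line.strip():
            body.append(line.strip())
    if heading or body:
        sections.append((heading, " ".join(body)))
    return sections


def refine(body: str, threshold: float) -> list[str]:
    """Semantic pass: start a new chunk where adjacent sentences diverge."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", body) if s]
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if jaccard(prev, cur) < threshold:
            groups.append([cur])
        else:
            groups[-1].append(cur)
    return [" ".join(g) for g in groups]


def chunk_document(markdown: str, source_uri: str,
                   max_chars: int = 200, threshold: float = 0.1) -> list[Chunk]:
    chunks = []
    for i, (heading, body) in enumerate(split_sections(markdown)):
        parent_id = f"sec-{i}"
        meta = {"source_uri": source_uri, "heading": heading}
        chunks.append(Chunk(parent_id, body, None, dict(meta)))
        if len(body) > max_chars:  # only long passages get refined
            for j, part in enumerate(refine(body, threshold)):
                chunks.append(Chunk(f"{parent_id}.{j}", part, parent_id, dict(meta)))
    return chunks
```

Every child carries its parent's ID and a copy of the section metadata, so a retriever can match on the small chunk and still fetch the surrounding section.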

Visualize

Inspect every boundary before you ship.

Open the diagnostic panel to preview chunk boundaries in context, inspect metadata on each chunk, and project embeddings into 2D to spot clusters, outliers, and duplicates. Diff any two strategies side by side and watch which boundaries move.

Live boundary preview with token budgets
Embedding distribution map
Strategy diff with recall impact
Metadata and schema validation
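
For a sense of what a strategy diff reports, the sketch below compares the boundaries produced by two chunkings of the same text. It is illustrative only: it assumes each strategy's chunks concatenate back to the identical source text, measures boundaries as character offsets rather than tokens, and uses hypothetical function names.

```python
def boundaries(chunks: list[str]) -> set[int]:
    """Character offsets where one chunk ends and the next begins."""
    offsets, pos = set(), 0
    for chunk in chunks[:-1]:  # the end of the last chunk is not a boundary
        pos += len(chunk)
        offsets.add(pos)
    return offsets


def diff_strategies(a: list[str], b: list[str]) -> dict[str, list[int]]:
    """Classify each boundary as kept, removed, or added going from a to b."""
    ba, bb = boundaries(a), boundaries(b)
    return {
        "kept": sorted(ba & bb),
        "removed": sorted(ba - bb),
        "added": sorted(bb - ba),
    }


if __name__ == "__main__":
    strategy_a = ["aaaa", "bbbb", "cccc"]
    strategy_b = ["aaaa", "bbbbcc", "cc"]
    print(diff_strategies(strategy_a, strategy_b))
    # {'kept': [4], 'removed': [8], 'added': [10]}
```

A removed boundary paired with a nearby added one is a boundary that moved, which is the case worth inspecting in context.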

Retrieve

Export, replay, and measure.

Push the chunked corpus to your vector store in one command. Replay real queries against any chunking version to see which chunks surfaced, in what order, and with what score. Iterate on the policy, re-export, and compare recall@k across versions to confirm that retrieval quality improved.

One-command export to Pinecone, Weaviate, Qdrant, pgvector
Retrieval replay with score breakdowns
Recall@k and context-token dashboards
Versioned chunking policies, fully reproducible
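
Recall@k is the fraction of a query's relevant chunks that appear in the top k results. The sketch below shows how replayed queries could be scored per chunking version. It assumes you already have labeled relevant chunk IDs for each query; the function names and the version labels are hypothetical, not part of CHUNKZA.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of relevant chunk IDs found in the top-k retrieved IDs."""
    if not relevant:
        return 0.0  # no labels for this query; scored as zero in this sketch
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


def compare_versions(replays: dict[str, list[tuple[list[str], set[str]]]],
                     k: int) -> dict[str, float]:
    """Mean recall@k for each chunking version, keyed by version label."""
    return {version: mean_recall_at_k(runs, k) for version, runs in replays.items()}


if __name__ == "__main__":
    replays = {
        "policy-v1": [(["c1", "c7", "c3"], {"c1", "c3"}), (["c2", "c9"], {"c2"})],
        "policy-v2": [(["c1", "c3", "c7"], {"c1", "c3"}), (["c2", "c9"], {"c2"})],
    }
    print(compare_versions(replays, k=2))
    # {'policy-v1': 0.75, 'policy-v2': 1.0}
```

Scoring the same query set against each version keeps the comparison fair: only the chunking policy changes between runs.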

Ready to retrieve?

Export to your vector store and replay your first query in minutes.

Request a demo

See it on your own documents

Bring a sample corpus. We'll run it through the pipeline and show you the diagnostic panel live.

Book a walkthrough
Explore features