build

Validate, embed, and write the search index.

Usage

mdvs build [path] [flags]

Flags

| Flag | Default | Description |
|---|---|---|
| path | . | Directory containing mdvs.toml |
| --set-model | | Change embedding model (requires --force) |
| --set-revision | | Pin model to a specific HuggingFace revision (requires --force) |
| --set-chunk-size | | Change max chunk size in characters (requires --force) |
| --force | | Confirm config changes or trigger a full rebuild |
| --no-update | | Skip auto-update before building |

Global flags (-o, -v, --logs) are described in Configuration.
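
For scripts that drive the CLI, the flag rules in the table can be mirrored in a small helper. This is a minimal sketch, not part of mdvs: the function name build_argv and its keyword arguments are hypothetical, and it only encodes what the table states (the three --set-* flags require --force).

```python
def build_argv(path=".", *, set_model=None, set_revision=None,
               set_chunk_size=None, force=False, no_update=False):
    """Assemble an `mdvs build` command line (hypothetical helper).

    Mirrors the documented rule that every --set-* flag requires --force.
    """
    argv = ["mdvs", "build", path]
    overrides = {
        "--set-model": set_model,
        "--set-revision": set_revision,
        "--set-chunk-size": set_chunk_size,
    }
    for flag, value in overrides.items():
        if value is None:
            continue  # flag not requested
        if not force:
            raise ValueError(f"{flag} requires --force")
        argv += [flag, str(value)]
    if force:
        argv.append("--force")
    if no_update:
        argv.append("--no-update")
    return argv


print(build_argv(set_model="minishlab/potion-base-32M", force=True))
```

The resulting list can be passed directly to subprocess.run, which avoids shell quoting issues.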

What it does

build creates (or updates) the search index in .mdvs/. The pipeline:

  1. Read config: parse mdvs.toml. If the [embedding_model], [chunking], or [search] sections are missing, they’re added with defaults and written back.

By default, build auto-updates the schema before building (see [build].auto_update). Use --no-update to skip this.

  2. Scan: walk the directory and extract frontmatter.
  3. Validate: check frontmatter against the schema (same as check). If violations are found, the build aborts.
  4. Classify: compare scanned files against the existing index to determine what needs embedding.
  5. Load model: download or load the cached embedding model. Skipped if nothing needs embedding.
  6. Embed: chunk and embed new/edited files.
  7. Write index: write files.parquet and chunks.parquet to .mdvs/.

See Search & Indexing for details on chunking, embedding, and how the index is structured.

Incremental builds

Build is incremental by default. It classifies each file by comparing its content hash against the existing index:

| Status | Condition | Action |
|---|---|---|
| new | file not in existing index | chunk + embed |
| edited | file in index, content changed | chunk + re-embed |
| unchanged | file in index, content matches | keep existing chunks |
| removed | file in index, no longer on disk | drop from index |
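
The classification rules in the table above can be sketched as a comparison of two path-to-hash mappings. This is an illustrative sketch only, not the mdvs implementation: the Classification class, the classify function, and the dictionary shapes are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Classification:
    """Buckets matching the four statuses in the table (hypothetical type)."""
    new: list[str] = field(default_factory=list)
    edited: list[str] = field(default_factory=list)
    unchanged: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)

    @property
    def needs_embedding(self) -> bool:
        # Only new and edited files are chunked and embedded.
        return bool(self.new or self.edited)


def classify(scanned: dict[str, str], indexed: dict[str, str]) -> Classification:
    """Compare {path: content_hash} from the scan against the existing index."""
    result = Classification()
    for path, content_hash in sorted(scanned.items()):
        if path not in indexed:
            result.new.append(path)        # not in existing index
        elif indexed[path] != content_hash:
            result.edited.append(path)     # in index, content changed
        else:
            result.unchanged.append(path)  # in index, content matches
    # In the index but no longer on disk.
    result.removed = sorted(set(indexed) - set(scanned))
    return result


indexed = {"README.md": "aaa", "notes/a.md": "bbb", "old.md": "ccc"}
scanned = {"README.md": "aaa", "notes/a.md": "zzz", "scratch.md": "ddd"}
result = classify(scanned, indexed)
print(result)
print("load model:", result.needs_embedding)
```

When needs_embedding is false, the model-loading step can be skipped entirely, which matches the pipeline description.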

Content hash covers the file body only (after frontmatter extraction). Frontmatter-only changes don’t trigger re-embedding — but files.parquet is always rewritten with fresh frontmatter from the current scan.
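
To illustrate why frontmatter-only edits leave the hash unchanged, here is a minimal sketch of body-only hashing. It assumes YAML frontmatter delimited by --- lines and uses SHA-256; the actual delimiter handling and hash function used by mdvs are not stated here, and the function names are hypothetical.

```python
import hashlib


def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a Markdown document into (frontmatter, body).

    Assumes YAML frontmatter fenced by '---' lines at the top of the file.
    """
    lines = text.splitlines(keepends=True)
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "".join(lines[1:i]), "".join(lines[i + 1:])
    return "", text  # no closed frontmatter block: everything is body


def body_hash(text: str) -> str:
    """Hash only the body, so frontmatter edits do not change the result."""
    _, body = split_frontmatter(text)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


before = "---\ntitle: Draft\n---\nHello world.\n"
after = "---\ntitle: Final\ntags: [a]\n---\nHello world.\n"
edited = "---\ntitle: Draft\n---\nHello there.\n"

print(body_hash(before) == body_hash(after))   # True: frontmatter-only change
print(body_hash(before) == body_hash(edited))  # False: body changed
```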

When nothing needs embedding, the model is never loaded.

Config changes

build detects when the embedding configuration has changed since the last build by comparing mdvs.toml against metadata stored in the parquet files. If a mismatch is found, the build refuses to proceed unless you pass --force:

config changed since last build:
  model: 'minishlab/potion-base-8M' → 'minishlab/potion-base-32M'
Use --force to rebuild with new config

The --set-model, --set-revision, and --set-chunk-size flags update mdvs.toml and require --force (since they change the config and trigger a full re-embed). For example, to switch to a larger model:

mdvs build --set-model minishlab/potion-base-32M --force

--set-revision pins the model to a specific HuggingFace commit SHA, ensuring reproducible embeddings even if the model is updated upstream:

mdvs build --set-revision abc123def --force

The revision is stored in mdvs.toml under [embedding_model].revision and checked against the parquet metadata on subsequent builds. See Embedding for the full list of available models.

On the first build (no existing .mdvs/), --force is never needed.
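
The mismatch check described in this section can be sketched as a comparison between the current config and the values recorded at the last build. This is a hedged illustration rather than the mdvs code: the ConfigChanged exception, the function names, and the flat dictionaries (with keys such as model) are assumptions; only the message layout is taken from the sample output above.

```python
class ConfigChanged(Exception):
    """Raised when the config differs from the stored metadata (hypothetical)."""


def config_diff(current: dict[str, str], stored: dict[str, str]) -> list[str]:
    """List one 'key: old → new' line per setting that differs."""
    lines = []
    for key in sorted(current.keys() | stored.keys()):
        old, new = stored.get(key), current.get(key)
        if old != new:
            lines.append(f"  {key}: {old!r} → {new!r}")
    return lines


def check_config(current, stored, force=False) -> bool:
    """Return True when a full rebuild should happen.

    `stored` is None on the first build, when no index exists yet.
    """
    if stored is None:
        return False  # first build: nothing to compare, --force never needed
    changes = config_diff(current, stored)
    if changes and not force:
        raise ConfigChanged("\n".join(
            ["config changed since last build:", *changes,
             "Use --force to rebuild with new config"]))
    return force  # --force confirms changes or requests a full rebuild


stored = {"model": "minishlab/potion-base-8M"}
current = {"model": "minishlab/potion-base-32M"}
try:
    check_config(current, stored)
except ConfigChanged as exc:
    print(exc)
```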

Output

Compact (default)

Incremental build with one new file:

Built index — 44 files, 60 chunks

╭──────────────────────────┬─────────────────────────┬─────────────────────────╮
│ embedded                 │ 1 file                  │ 1 chunk                 │
│ unchanged                │ 43 files                │ 59 chunks               │
╰──────────────────────────┴─────────────────────────┴─────────────────────────╯

When nothing needs embedding:

Built index — 43 files, 59 chunks

╭──────────────────────────┬─────────────────────────┬─────────────────────────╮
│ unchanged                │ 43 files                │ 59 chunks               │
╰──────────────────────────┴─────────────────────────┴─────────────────────────╯

When violations are found, the build aborts:

Build aborted — 6 violation(s) found. Run `mdvs check` for details.

Verbose (-v)

Read config: example_kb/mdvs.toml
Scan: 44 files
Validate: 44 files — no violations
Classify: 44 files (full rebuild)
Load model: "minishlab/potion-base-8M" (256d)
Embed: 44 files (60 chunks)
Write index: 44 files, 60 chunks

Built index — 44 files, 60 chunks (full rebuild)

╭─────────────────────────┬─────────────────────────┬──────────────────────────╮
│ embedded                │ 44 files                │ 60 chunks                │
├─────────────────────────┴─────────────────────────┴──────────────────────────┤
│   - "README.md" (7 chunks)                                                   │
│   - "blog/drafts/grant-ideas.md" (2 chunks)                                  │
│   - "blog/drafts/upcoming-talk.md" (1 chunk)                                 │
│   ...                                                                        │
│   - "scratch.md" (1 chunk)                                                   │
╰──────────────────────────────────────────────────────────────────────────────╯

Verbose output shows each pipeline step with its result, and expands embedded files with per-file chunk counts.

Exit codes

| Code | Meaning |
|---|---|
| 0 | Build completed successfully |
| 1 | Violations found; build aborted |
| 2 | Pipeline error (missing config, scan failure, config mismatch, model failure) |
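
In automation, the exit codes above can be used to decide what to do next. The following is a minimal sketch assuming mdvs is on PATH; the run_build helper and the EXIT_MEANINGS mapping are hypothetical names for this example. The demonstration call uses the Python interpreter as a stand-in command so the snippet runs without mdvs installed.

```python
import subprocess
import sys

# Meanings taken from the exit code table.
EXIT_MEANINGS = {
    0: "build completed successfully",
    1: "violations found, build aborted",
    2: "pipeline error",
}


def run_build(argv: list[str]) -> tuple[int, str]:
    """Run a build command and translate its exit code into a message."""
    completed = subprocess.run(argv)
    code = completed.returncode
    return code, EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


# Stand-in for ["mdvs", "build"]: a child process that exits with code 1.
code, meaning = run_build([sys.executable, "-c", "import sys; sys.exit(1)"])
print(code, meaning)
```

In a real script the call would be run_build(["mdvs", "build"]); on code 1, running mdvs check shows the violations in detail.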

Errors

| Error | Cause |
|---|---|
| no mdvs.toml found | Config doesn’t exist; run mdvs init first |
| config changed since last build | Config differs from parquet metadata; use --force |
| --set-model requires --force | Changing model triggers full re-embed |
| --set-chunk-size requires --force | Changing chunk size triggers full re-embed |
| dimension mismatch | Model produces different dimensions than existing index (incremental build only; --force bypasses this) |