build

Validate, embed, and write the search index.

Usage

mdvs build [path] [flags]

Flags

| Flag | Default | Description |
| --- | --- | --- |
| path | . | Directory containing mdvs.toml |
| --set-model | | Change embedding model (requires --force) |
| --set-revision | | Pin model to a specific HuggingFace revision (requires --force) |
| --set-chunk-size | | Change max chunk size in characters (requires --force) |
| --force | | Confirm config changes or trigger a full rebuild |
| --no-update | | Skip auto-update before building |

Global flags (-o, -v, --logs) are described in Configuration.

What it does

build creates (or updates) the search index in .mdvs/. The pipeline:

  1. Read config: parse mdvs.toml. If the [embedding_model], [chunking], or [search] sections are missing, they’re added with defaults and written back.

     By default, build auto-updates the schema before building (see [build].auto_update). Use --no-update to skip this.

  2. Scan: walk the directory and extract frontmatter.
  3. Validate: check frontmatter against the schema (same as check). If violations are found, the build aborts.
  4. Classify: compare scanned files against the existing index to determine what needs embedding.
  5. Load model: download or load the cached embedding model. Skipped if nothing needs embedding.
  6. Embed: chunk and embed new/edited files.
  7. Write index: write the Lance dataset at .mdvs/index.lance/ (one row per chunk), then create the indexes inside it. A full-text BM25 index on chunk_text is always created. A cosine IVF-PQ vector index on embedding is created only above 10,000 chunks; smaller vaults rely on LanceDB’s exact flat scan.

See Search & Indexing for details on chunking, embedding, and how the index is structured.
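
The index rule in the last pipeline step can be sketched in a few lines of Python. This is an illustration only: the function name and the string labels are hypothetical and are not mdvs or LanceDB identifiers, and it reads "above 10,000 chunks" as strictly greater than 10,000.

```python
from typing import List

# Threshold quoted in the pipeline description; the constant name is made up.
VECTOR_INDEX_THRESHOLD = 10_000


def plan_indexes(chunk_count: int) -> List[str]:
    """Return descriptive labels for the indexes a build would create."""
    indexes = ["bm25(chunk_text)"]  # the full-text index is always created
    if chunk_count > VECTOR_INDEX_THRESHOLD:
        # Large vaults get an approximate vector index.
        indexes.append("ivf_pq_cosine(embedding)")
    # Below the threshold, vector search falls back to an exact flat scan,
    # so no vector index appears in the plan.
    return indexes
```

For the 59-chunk example vault used later on this page, the plan contains only the full-text index.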

Incremental builds

Build is incremental by default. It classifies each file by comparing its content hash against the existing index:

| Status | Condition | Action |
| --- | --- | --- |
| new | file not in existing index | chunk + embed |
| edited | file in index, content changed | chunk + re-embed |
| unchanged | file in index, content matches | keep existing chunks |
| removed | file in index, no longer on disk | drop from index |
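
The classification table can be read as a simple comparison of two path-to-hash maps. The sketch below is a minimal Python model of that logic, not the mdvs implementation; the function names and the dictionary shapes are assumptions made for illustration.

```python
from typing import Dict


def classify(scanned: Dict[str, str], indexed: Dict[str, str]) -> Dict[str, str]:
    """Map each file path to new / edited / unchanged / removed.

    scanned: path -> content hash from the current scan (hypothetical shape)
    indexed: path -> content hash stored in the existing index
    """
    status: Dict[str, str] = {}
    for path, digest in scanned.items():
        if path not in indexed:
            status[path] = "new"        # chunk + embed
        elif indexed[path] != digest:
            status[path] = "edited"     # chunk + re-embed
        else:
            status[path] = "unchanged"  # keep existing chunks
    for path in indexed:
        if path not in scanned:
            status[path] = "removed"    # drop from index
    return status


def needs_model(status: Dict[str, str]) -> bool:
    """The embedding model is only needed when something must be embedded."""
    return any(s in ("new", "edited") for s in status.values())
```

When `needs_model` returns False, a build along these lines could skip the model load entirely.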

The content hash covers the file body only (after frontmatter extraction). Frontmatter-only changes don’t trigger re-embedding, but every chunk row is still rewritten with fresh frontmatter from the current scan.
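
To see why a frontmatter-only edit leaves the hash untouched, here is a small Python sketch of body-only hashing. It assumes YAML frontmatter delimited by `---` lines and uses SHA-256; the page does not say which hash function or delimiter handling mdvs uses, so both are illustrative choices, as are the function names.

```python
import hashlib
from typing import Tuple


def split_frontmatter(text: str) -> Tuple[str, str]:
    """Split a Markdown file into (frontmatter, body).

    Assumes a frontmatter block delimited by '---' lines at the top of the file.
    """
    lines = text.splitlines(keepends=True)
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "".join(lines[1:i]), "".join(lines[i + 1:])
    return "", text  # no frontmatter block found


def body_hash(text: str) -> str:
    """Hash only the body, so frontmatter edits leave the digest unchanged."""
    _, body = split_frontmatter(text)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

Two files with different frontmatter but the same body produce the same digest, so neither would be classified as edited.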

When nothing needs embedding, the model is never loaded.

Config changes

build detects when the embedding configuration has changed since the last build by comparing mdvs.toml against metadata stored on the Lance dataset. If a mismatch is found, the build refuses to proceed unless you pass --force:

config changed since last build:
  model: 'minishlab/potion-base-8M' → 'minishlab/potion-base-32M'
Use --force to rebuild with new config
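
The mismatch check can be modelled as a diff between two dictionaries. This Python sketch is a rough illustration and not the mdvs code: the function names, the dictionary keys, and the `chunk_size` value in the usage note are assumptions, and only the message format is taken from the output above.

```python
from typing import List


def config_diff(current: dict, stored: dict) -> List[str]:
    """Return one "key: old → new" line per setting that differs."""
    lines = []
    for key in sorted(set(current) | set(stored)):
        old, new = stored.get(key), current.get(key)
        if old != new:
            lines.append(f"  {key}: {old!r} → {new!r}")
    return lines


def check_config(current: dict, stored: dict, force: bool = False) -> bool:
    """Refuse to proceed on a mismatch unless force is set.

    Returns True when a full rebuild is required, False when configs match.
    """
    diff = config_diff(current, stored)
    if diff and not force:
        message = ["config changed since last build:", *diff,
                   "Use --force to rebuild with new config"]
        raise RuntimeError("\n".join(message))
    return bool(diff)
```

For example, calling `check_config({"model": "minishlab/potion-base-32M", "chunk_size": 1024}, {"model": "minishlab/potion-base-8M", "chunk_size": 1024})` raises an error whose text matches the message shown above, while the same call with `force=True` returns True.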

The same check covers schema changes. A hash of the post-translation JSON Schema is stored on the Lance dataset; if the current schema doesn’t match, the build refuses with:

schema: fields, types, constraints, path-scoping, or preprocessors have changed
Use --force to rebuild with new schema

This catches edits to [[fields.field]] definitions, constraint changes, preprocessor changes, and path-scoping changes — anything that affects what gets stored in the data column of the index.
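
One common way to implement this kind of schema check is to hash a canonical serialization of the schema and compare digests. The Python sketch below shows the idea under stated assumptions: canonical JSON with sorted keys and SHA-256 are illustrative choices, and the function name is hypothetical. The page does not specify how mdvs serializes or hashes the schema.

```python
import hashlib
import json


def schema_fingerprint(schema: dict) -> str:
    """Hash a JSON Schema so that key order does not affect the digest."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def schema_changed(current: dict, stored_fingerprint: str) -> bool:
    """Compare the current schema against the fingerprint saved at build time."""
    return schema_fingerprint(current) != stored_fingerprint
```

Reordering keys leaves the fingerprint unchanged, while editing a type or constraint produces a different one.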

The --set-model, --set-revision, and --set-chunk-size flags update mdvs.toml and require --force (since they change the config and trigger a full re-embed). For example, to switch to a larger model:

mdvs build --set-model minishlab/potion-base-32M --force

--set-revision pins the model to a specific HuggingFace commit SHA, ensuring reproducible embeddings even if the model is updated upstream:

mdvs build --set-revision abc123def --force

The revision is stored in mdvs.toml under [embedding_model].revision and checked against the Lance dataset metadata on subsequent builds. See Embedding for the full list of available models.

On the first build (no existing .mdvs/), --force is never needed.

Output

Compact (default)

When nothing needs embedding (incremental build, all files unchanged):

Built index — 43 files, 59 chunks

┌──────────────────────────┬───────────────────────────────────────────────────┐
│ full rebuild             │ false                                             │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ files total              │ 43                                                │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ files embedded           │ 0                                                 │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ files unchanged          │ 43                                                │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ files removed            │ 0                                                 │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ chunks total             │ 59                                                │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ chunks embedded          │ 0                                                 │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ chunks unchanged         │ 59                                                │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ chunks removed           │ 0                                                 │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ new fields               │ (none)                                            │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ embedded files           │ (none)                                            │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ removed files            │ (none)                                            │
└──────────────────────────┴───────────────────────────────────────────────────┘

When violations are found, the build aborts:

Build aborted — 6 violation(s) found. Run `mdvs check` for details.

Verbose (-v)

Verbose output adds pipeline timing lines before the result:

Read config: example_kb/mdvs.toml (4ms)
Scan: 43 files (4ms)
Infer: 37 field(s) (0ms)
Validate: 43 files — no violations (87ms)
Classify: 43 files (full rebuild) (0ms)
Load model: minishlab/potion-base-8M (24ms)
Embed: 43 files, 59 chunks (12ms)
Write index: 43 files, 59 chunks (1ms)
Built index — 43 files, 59 chunks (full rebuild)

┌──────────────────────────┬───────────────────────────────────────────────────┐
│ full rebuild             │ true                                              │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ files total              │ 43                                                │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ files embedded           │ 43                                                │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ files unchanged          │ 0                                                 │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ ...                                                                          │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ embedded files           │ README.md (7 chunks)                              │
│                          │ blog/drafts/grant-ideas.md (2 chunks)             │
│                          │ ...                                               │
├──────────────────────────┼───────────────────────────────────────────────────┤
│ removed files            │ (none)                                            │
└──────────────────────────┴───────────────────────────────────────────────────┘

The key-value table has the same layout in both modes; verbose only adds the step lines with their processing times. When files are embedded, the embedded files row lists each file with its chunk count.

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Build completed successfully |
| 1 | Violations found; build aborted |
| 2 | Pipeline error (missing config, scan failure, config mismatch, model failure) |
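
In a script or CI job, the exit code is the simplest signal to branch on. The following Python sketch wraps the command with `subprocess`; the helper names are hypothetical, the meanings are copied from the table above, and `run_build` assumes an `mdvs` binary is on the PATH.

```python
import subprocess
from typing import Tuple

# Meanings copied from the exit code table; the dictionary itself is illustrative.
EXIT_MEANINGS = {
    0: "build completed successfully",
    1: "violations found, build aborted",
    2: "pipeline error",
}


def describe_exit(code: int) -> str:
    """Translate an mdvs build exit code into a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_build(path: str = ".") -> Tuple[int, str]:
    """Run `mdvs build <path>` and return (exit code, description)."""
    result = subprocess.run(["mdvs", "build", path])
    return result.returncode, describe_exit(result.returncode)
```

A caller might treat code 1 as a content problem to fix with `mdvs check`, and code 2 as an environment or configuration problem.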

Errors

| Error | Cause |
| --- | --- |
| no mdvs.toml found | Config doesn’t exist; run mdvs init first |
| config changed since last build | Config differs from Lance dataset metadata; use --force |
| --set-model requires --force | Changing model triggers full re-embed |
| --set-chunk-size requires --force | Changing chunk size triggers full re-embed |
| dimension mismatch | Model produces different dimensions than existing index (incremental build only; --force bypasses this) |