Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Configuration

All configuration lives in mdvs.toml, created by init and updated by update. This page is a complete reference of every section and field.

Sections overview

mdvs.toml has two groups of sections:

Validation (always present):

  • [scan] — file discovery
  • [check] — check command settings
  • [fields] — field definitions and ignore list

Build & search (written by init, model/chunking filled by first build):

Global flags

These flags apply to all commands:

FlagValuesDefaultDescription
-o, --outputtext, jsontextOutput format
-v, --verboseShow detailed output (pipeline steps, expanded records)
--logsinfo, debug, trace(none)Enable diagnostic logging to stderr

[scan]

Controls how markdown files are discovered.

[scan]
glob = "**"
include_bare_files = true
skip_gitignore = false
FieldTypeDefaultDescription
globString"**"Glob pattern for matching markdown files
include_bare_filesBooleantrueInclude files without YAML frontmatter
skip_gitignoreBooleanfalseDon’t read .gitignore patterns during scan

When include_bare_files is true, files without frontmatter participate in inference (empty field set) and validation (can trigger MissingRequired). When false, they’re excluded from the scan entirely.

[update]

Placeholder for future update-specific settings. Currently empty — this section is hidden from mdvs.toml by default.

[check]

Check command settings.

[check]
auto_update = true
FieldTypeDefaultDescription
auto_updateBooleanfalseAuto-run update before validating

When auto_update is true, check runs the update pipeline (scan, infer, write config) before validating. Set to false or use --no-update for deterministic CI validation against the committed mdvs.toml.

[embedding_model]

Specifies the embedding model for semantic search. See Embedding for available models.

[embedding_model]
provider = "model2vec"
name = "minishlab/potion-base-8M"
FieldTypeDefaultDescription
providerString"model2vec"Embedding provider (currently only "model2vec")
nameString"minishlab/potion-base-8M"HuggingFace model ID
revisionString(none)Pin to a specific HuggingFace commit SHA for reproducibility

The provider field can be omitted — it defaults to "model2vec". The revision field only appears when explicitly set (e.g., via build --set-revision).

Changing the model or revision after a build requires build --force to re-embed all files.

[chunking]

Controls semantic text splitting before embedding.

[chunking]
max_chunk_size = 1024
FieldTypeDefaultDescription
max_chunk_sizeInteger1024Maximum chunk size in characters

The text splitter breaks each file’s body into semantic chunks respecting markdown structure (headings, paragraphs, lists). Changing the chunk size after a build requires build --force.

[build]

Build workflow settings.

[build]
auto_update = true
FieldTypeDefaultDescription
auto_updateBooleanfalseAuto-run update before building

When auto_update is true, build runs the update pipeline before building. Use --no-update to skip.

Settings for the search command, including how internal columns are named in --where queries.

[search]
default_limit = 10
FieldTypeDefaultDescription
default_limitInteger10Maximum results when --limit is not specified
internal_prefixString""Prefix for internal column names in --where queries
aliasesMap{}Per-column name overrides for internal columns
auto_updateBooleanfalseAuto-run update before building (when auto_build is true)
auto_buildBooleanfalseAuto-run build before searching

Internal column names

Beyond your frontmatter fields, the search index stores bookkeeping columns that mdvs uses internally. These internal columns are available in --where queries:

ColumnContains
filepathRelative file path (e.g., blog/post.md)
file_idUnique identifier for each file
content_hashHash of the file body
built_atTimestamp of last build

By default, these are exposed with their raw names:

--where "filepath LIKE 'blog/%'"

If a frontmatter field name collides with an internal column (e.g., you have a field called filepath), search will error and suggest resolutions:

  1. Set a prefix to namespace all internal columns:

    [search]
    internal_prefix = "_"
    

    Internal columns become _filepath, _file_id, etc.

  2. Set a per-column alias to rename just the colliding column:

    [search.aliases]
    filepath = "path"
    

    The internal column becomes path, your frontmatter filepath stays bare.

  3. Rename the frontmatter field in your markdown files.

Aliases take precedence over the prefix. See the Search Guide for full --where reference.

[fields]

Defines field constraints and the ignore list. This is the largest section — it contains one [[fields.field]] entry per constrained field.

Ignore list

[fields]
ignore = ["internal_id", "temp_notes"]

Fields in the ignore list are known but unconstrained — they skip all validation and are not reported as new fields by check or update. A field cannot be in both ignore and [[fields.field]].

Field definitions

Each [[fields.field]] entry defines constraints on a frontmatter field:

[[fields.field]]
name = "title"
type = "String"
allowed = ["blog/**", "projects/**"]
required = ["blog/**", "projects/**"]
nullable = false
FieldTypeDefaultDescription
nameString(required)Frontmatter key
typeFieldType"String"Expected value type
allowedString[]["**"]Glob patterns where the field may appear
requiredString[][]Glob patterns where the field must be present
nullableBooleantrueWhether null values are accepted

All fields except name have permissive defaults. A minimal entry with just a name:

[[fields.field]]
name = "title"

is equivalent to:

[[fields.field]]
name = "title"
type = "String"
allowed = ["**"]
required = []
nullable = true

This is not the same as putting the field in the ignore list. Both prevent the field from being reported as new during update, but a [[fields.field]] entry tracks the field — it appears in info output with its type and patterns, and can be targeted by update --reinfer. The ignore list simply silences the field: no validation, no detail in info.

Type syntax

Scalar types are plain strings:

type = "String"    # also: "Boolean", "Integer", "Float"

Arrays use an inline table:

type = { array = "String" }

Objects use a nested inline table:

type = { object = { author = "String", count = "Integer" } }

See Types for the full type system, including widening rules.

Path patterns

allowed and required are lists of glob patterns matched against relative file paths:

allowed = ["blog/**", "projects/alpha/**"]
required = ["blog/published/**"]

Patterns must end with /* (direct children) or /** (full subtree), or be exactly * or **. Bare paths like blog or file names like blog/post.md are not valid.

The invariant required ⊆ allowed is enforced — every required glob must be covered by some allowed glob. For example, allowed = ["meetings/**"] covers required = ["meetings/all-hands/**"] because any path matching the required pattern also matches the allowed one.

See Schema Inference for how these patterns are computed.

Example

A representative subset from example_kb/mdvs.toml (37 fields total, 4 shown):

[scan]
glob = "**"
include_bare_files = true
skip_gitignore = false

[embedding_model]
provider = "model2vec"
name = "minishlab/potion-base-8M"

[chunking]
max_chunk_size = 1024

[search]
default_limit = 10

[fields]
ignore = []

[[fields.field]]
name = "title"
type = "String"
allowed = ["blog/**", "meetings/**", "people/**", "projects/**", "reference/protocols/**"]
required = ["blog/**", "meetings/**", "people/**", "projects/**", "reference/protocols/**"]
nullable = false

[[fields.field]]
name = "tags"
allowed = ["blog/**", "projects/alpha/*", "projects/alpha/notes/**", "projects/archived/**", "projects/beta/*", "projects/beta/notes/**"]
required = ["blog/published/**", "projects/alpha/notes/**", "projects/archived/**", "projects/beta/notes/**"]
nullable = false
type = { array = "String" }

[[fields.field]]
name = "drift_rate"
type = "Float"
allowed = ["projects/alpha/notes/**"]
required = ["projects/alpha/notes/**"]
nullable = true

[[fields.field]]
name = "calibration"
allowed = ["projects/alpha/notes/**"]
required = []
nullable = false
type = { object = { adjusted = { object = { intensity = "Float", wavelength = "Float" } }, baseline = { object = { intensity = "Float", notes = "String", wavelength = "Float" } } } }