Custom Embedding Models

Oxidoc's bundled BGE Micro v2 model works well for English documentation, but you may want a different model for non-English content, specialized domains, or different quality/size tradeoffs.

Using a Custom Model

Point model_path to your GGUF file:

oxidoc.toml
[search]
semantic = true
model_path = "./models/my-custom-model.gguf"

Oxidoc loads your model instead of the bundled one. At build time, it embeds all pages using your model. The model file is copied to the output directory for browser-side query embedding.

Model Requirements

Your model must meet three requirements:

1. GGUF format

A single .gguf file with the tokenizer embedded. This is the standard format for efficient model distribution: one file contains everything needed for inference.
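As a quick sanity check before a build, you can verify that a downloaded file really is a GGUF container. This is a small standalone sketch (not part of Oxidoc); it only checks the container magic and version, not whether the model inside is an embedding model:

```python
import struct

def gguf_version(path):
    """Return the GGUF container version, or None if the file
    does not start with the GGUF magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            return None
        # A little-endian uint32 version follows the 4-byte magic.
        (version,) = struct.unpack("<I", f.read(4))
        return version
```

A file that fails this check (e.g. a raw safetensors download renamed to .gguf) will not load; recent GGUF files typically report version 3.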

2. Sentence embedding model

The model must produce fixed-size vectors from text input: a sentence-transformer or embedding model, not a generative LLM. Models like BERT, BGE, E5, GTE, and MiniLM work.
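The "fixed-size" property comes from pooling: however many tokens go in, the per-token vectors are collapsed (commonly by mean pooling in sentence-transformer models) into one vector of the model's embedding dimension. A minimal illustration, independent of any real model:

```python
def mean_pool(token_vectors):
    """Collapse a variable-length list of per-token vectors into one
    fixed-size sentence vector by averaging each dimension."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]
```

A 3-token query and a 500-token page both come out as one vector of the same length, which is what makes queries and pages directly comparable.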

3. Compatible architecture

Standard BERT-family sentence transformers work out of the box with Oxidoc's boostr GGUF loader. This covers the vast majority of embedding models on Hugging Face.

When to Use a Custom Model

Non-English documentation

The bundled model is optimized for English. For Japanese, Chinese, Korean, Arabic, or multilingual docs, use a model trained on your target language. Semantic search quality drops significantly when the model doesn't understand the content language.

Domain-specific content

Medical, legal, scientific, or financial documentation uses specialized terminology. A domain-fine-tuned model understands that "injection" means something different in medicine vs. software security.

Higher quality embeddings

Larger models (100MB+) with more dimensions (768, 1024) capture more semantic nuance. If search quality is critical and your users have fast connections, a larger model is worth the tradeoff.

Smaller footprint

If 17.5 MB is too large for your users (mobile-first docs, users on slow connections), use a smaller quantized model. Q4 quantization can cut size by 50-75% with modest quality loss.

Finding Models

Search Hugging Face for GGUF sentence embedding models. When evaluating models, look for:

  • "sentence-transformers" or "embedding" in the model name
  • GGUF variants — look for filenames like model-q5_k_m.gguf or model-f16.gguf
  • Language coverage — check the model card for supported languages
  • Embedding dimensions — higher dimensions = better quality but larger vectors
  • MTEB benchmark scores — the standard benchmark for embedding quality

| Use Case | Model | Size | Dimensions | Notes |
|---|---|---|---|---|
| English (default) | BGE Micro v2 | 17.5 MB | 384 | Bundled, good quality/size ratio |
| English (high quality) | bge-small-en-v1.5 | ~130 MB | 384 | Better quality, larger download |
| Multilingual | multilingual-e5-small | ~120 MB | 384 | 100+ languages, good for i18n sites |
| Japanese / CJK | multilingual-e5-small | ~120 MB | 384 | Strong CJK support |
| Chinese | bge-small-zh-v1.5 | ~100 MB | 512 | Optimized specifically for Chinese |
| Minimal size | all-MiniLM-L6-v2 | ~23 MB | 384 | Smallest useful model |

Model size = browser download

Your custom model file is served to the browser for query-time embedding. A 500 MB model means a 500 MB download before semantic search works. Lexical search still works immediately — semantic search becomes available once the model loads.
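The back-of-the-envelope cost is easy to compute: size in megabytes times eight, divided by the connection speed in megabits per second. The numbers below are illustrative, not measurements:

```python
def download_seconds(size_mb, link_mbps):
    # Megabytes -> megabits, divided by link speed in Mbit/s.
    # Ignores latency and protocol overhead, so treat it as a floor.
    return size_mb * 8 / link_mbps

# The bundled 17.5 MB model on a 10 Mbit/s connection:
print(download_seconds(17.5, 10))   # 14.0 seconds
# A ~130 MB model on the same link:
print(download_seconds(130, 10))    # 104.0 seconds
```

That difference is why the larger models in the table above are best reserved for audiences on fast connections.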

Converting Models to GGUF

If you find a model on Hugging Face in safetensors or PyTorch format but not GGUF, you can convert it:

1. Clone the model

git clone https://huggingface.co/BAAI/bge-small-en-v1.5
2. Convert to GGUF

Use the conversion tools from llama.cpp or ggml:

python convert-hf-to-gguf.py bge-small-en-v1.5 --outtype f16
3. Quantize (optional)

Reduce file size with quantization:

./quantize bge-small-en-v1.5-f16.gguf bge-small-en-v1.5-q5_k_m.gguf Q5_K_M

Quantization levels (smaller file, lower quality):

  • F16 — full precision, largest file
  • Q8_0 — near-lossless, ~50% smaller
  • Q5_K_M — good balance, ~65% smaller
  • Q4_K_M — aggressive, ~75% smaller
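Using the approximate reductions above, expected file sizes work out as follows. The percentages are rough guides, and actual sizes depend on the model's tensor shapes:

```python
# Approximate size reduction vs. F16, taken from the levels listed above.
REDUCTION = {"F16": 0.00, "Q8_0": 0.50, "Q5_K_M": 0.65, "Q4_K_M": 0.75}

def quantized_size_mb(f16_mb, level):
    """Estimate the quantized file size from the F16 size."""
    return f16_mb * (1 - REDUCTION[level])

# A ~130 MB F16 embedding model quantized to Q5_K_M:
print(quantized_size_mb(130, "Q5_K_M"))  # ~45.5 MB
```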
4. Use in Oxidoc

oxidoc.toml
[search]
semantic = true
model_path = "./models/bge-small-en-v1.5-q5_k_m.gguf"

Embedding models only

Only convert sentence/embedding models (BERT, BGE, E5, GTE, MiniLM). Do not use generative LLMs (GPT, Llama, Mistral) — they don't produce fixed-size embeddings and won't work with Oxidoc's search engine.

Testing Your Model

After switching models, test search quality by trying:

  1. Exact term searches — "oxidoc.toml", "CodeBlock" (should still work via lexical)
  2. Conceptual searches — "how to change colors" (should find theming page)
  3. Synonym searches — "setup" (should find installation page)
  4. Your domain terms — whatever specialized vocabulary your docs use

If semantic results are poor, try a larger model or one specifically trained on your domain/language.
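Semantic rankings like these are typically driven by cosine similarity between the query vector and each page vector (Oxidoc's exact scoring function isn't specified here, but cosine similarity is the standard approach for embedding search). A minimal pure-Python scorer, useful for inspecting raw embeddings from any model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors: 1.0 for
    identical directions, 0.0 for orthogonal, -1.0 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# If "setup" and "installation" embed to nearby vectors, their
# similarity is high and the installation page ranks well.
print(cosine([1.0, 0.0], [1.0, 0.0]))  # 1.0
```

A good embedding model for your content should score related phrase pairs (like the synonym searches above) noticeably higher than unrelated ones.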