# Custom Embedding Models
Oxidoc's bundled BGE Micro v2 model works well for English documentation, but you may want a different model for non-English content, specialized domains, or different quality/size tradeoffs.
## Using a Custom Model
Point `model_path` to your GGUF file:

```toml
[search]
semantic = true
model_path = "./models/my-custom-model.gguf"
```

Oxidoc loads your model instead of the bundled one. At build time, it embeds all pages using your model. The model file is copied to the output directory for browser-side query embedding.
## Model Requirements
Your model must meet three requirements:
### GGUF format
A single .gguf file with the tokenizer embedded. This is the standard format for efficient model distribution — one file contains everything needed for inference.
### Sentence embedding model
The model must produce fixed-size vectors from text input. It must be a sentence-transformer or embedding model, not a generative LLM. Models like BERT, BGE, E5, GTE, and MiniLM work.
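The "fixed-size vectors" requirement is the key difference from a generative model. Sentence-embedding models typically pool a variable-length sequence of per-token vectors into one vector whose length depends only on the model's hidden dimension. A minimal sketch of mean pooling (384 dimensions chosen to match the bundled model; the pooling strategy of any particular model may differ):

```python
# Mean pooling: average per-token vectors into one sentence vector.
# The output length equals the model's hidden dimension (384 here),
# regardless of how many tokens the input had.

def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(tv[i] for tv in token_vectors) / n for i in range(dim)]

short_input = [[1.0] * 384] * 3    # 3-token input
long_input = [[1.0] * 384] * 50    # 50-token input

# Both inputs produce vectors of the same size.
assert len(mean_pool(short_input)) == len(mean_pool(long_input)) == 384
```

This fixed size is what makes the vectors comparable: every page and every query lands in the same vector space.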
### Compatible architecture
Standard BERT-family sentence transformers work out of the box with Oxidoc's boostr GGUF loader. This covers the vast majority of embedding models on Hugging Face.
## When to Use a Custom Model
### Non-English documentation
The bundled model is optimized for English. For Japanese, Chinese, Korean, Arabic, or multilingual docs, use a model trained on your target language. Semantic search quality drops significantly when the model doesn't understand the content language.
### Domain-specific content
Medical, legal, scientific, or financial documentation uses specialized terminology. A domain-fine-tuned model understands that "injection" means something different in medicine vs. software security.
### Higher quality embeddings
Larger models (100MB+) with more dimensions (768, 1024) capture more semantic nuance. If search quality is critical and your users have fast connections, a larger model is worth the tradeoff.
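Higher dimensions also grow the page-embedding index linearly. Back-of-envelope arithmetic (assuming float32 storage and one embedding per page — the actual index layout is Oxidoc's internal detail):

```python
# Rough index size: pages x dimensions x 4 bytes (float32 assumed).

def index_size_mb(pages: int, dims: int, bytes_per_float: int = 4) -> float:
    return pages * dims * bytes_per_float / 1_000_000

small = index_size_mb(1000, 384)    # ~1.5 MB for 1,000 pages at 384 dims
large = index_size_mb(1000, 1024)   # ~4.1 MB at 1024 dims
```

For most doc sites the index is small either way; the model download dominates the tradeoff.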
### Smaller footprint
If 17.5 MB is too large for your users (mobile-first docs, users on slow connections), use a smaller quantized model. Q4 quantization can cut size by 50-75% with modest quality loss.
## Finding Models
Search Hugging Face for GGUF sentence embedding models. When evaluating models, look for:
- "sentence-transformers" or "embedding" in the model name
- GGUF variants — look for filenames like `model-q5_k_m.gguf` or `model-f16.gguf`
- Language coverage — check the model card for supported languages
- Embedding dimensions — higher dimensions = better quality but larger vectors
- MTEB benchmark scores — the standard benchmark for embedding quality
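When a repo ships several GGUF variants, the quantization tag is usually the last hyphen-separated token of the filename. A hypothetical helper for picking variants out of a repo file listing (for example, one returned by `huggingface_hub`'s `HfApi().list_repo_files`; the naming convention is common but not guaranteed):

```python
# Map quantization tag (e.g. "f16", "q5_k_m") -> filename, based on
# the common "<name>-<tag>.gguf" convention. Purely illustrative.

def gguf_variants(files: list[str]) -> dict[str, str]:
    out = {}
    for f in files:
        if f.endswith(".gguf"):
            stem = f.rsplit("/", 1)[-1].removesuffix(".gguf")
            tag = stem.rsplit("-", 1)[-1].lower()
            out[tag] = f
    return out

files = ["README.md", "model-f16.gguf", "model-q5_k_m.gguf"]
print(gguf_variants(files))
# {'f16': 'model-f16.gguf', 'q5_k_m': 'model-q5_k_m.gguf'}
```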
## Recommended Models by Use Case
| Use Case | Model | Size | Dimensions | Notes |
|----------|-------|------|------------|-------|
| English (default) | BGE Micro v2 | 17.5 MB | 384 | Bundled, good quality/size ratio |
| English (high quality) | bge-small-en-v1.5 | ~130 MB | 384 | Better quality, larger download |
| Multilingual | multilingual-e5-small | ~120 MB | 384 | 100+ languages, good for i18n sites |
| Japanese / CJK | multilingual-e5-small | ~120 MB | 384 | Strong CJK support |
| Chinese | bge-small-zh-v1.5 | ~100 MB | 512 | Optimized specifically for Chinese |
| Minimal size | all-MiniLM-L6-v2 | ~23 MB | 384 | Smallest useful model |
**Model size = browser download.** Your custom model file is served to the browser for query-time embedding. A 500 MB model means a 500 MB download before semantic search works. Lexical search still works immediately — semantic search becomes available once the model loads.
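A quick way to reason about this tradeoff is to estimate the one-time download delay before semantic search activates (link speeds here are illustrative):

```python
# Download time for the model file: size_in_megabytes * 8 bits/byte
# divided by link speed in megabits per second.

def download_seconds(model_mb: float, mbit_per_s: float) -> float:
    return model_mb * 8 / mbit_per_s

bundled = download_seconds(17.5, 10)   # 14 s on a 10 Mbit/s link
huge = download_seconds(500, 10)       # 400 s -- impractical for most users
```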
## Converting Models to GGUF
If you find a model on Hugging Face in safetensors or PyTorch format but not GGUF, you can convert it:
### Clone the model

```sh
git clone https://huggingface.co/BAAI/bge-small-en-v1.5
```
### Convert to GGUF

Use the conversion tools from llama.cpp or ggml:

```sh
python convert-hf-to-gguf.py bge-small-en-v1.5 --outtype f16
```
### Quantize (optional)

Reduce file size with quantization:

```sh
./quantize bge-small-en-v1.5-f16.gguf bge-small-en-v1.5-q5_k_m.gguf Q5_K_M
```
Quantization levels (smaller = less quality):
- F16 — full precision, largest file
- Q8_0 — near-lossless, ~50% smaller
- Q5_K_M — good balance, ~65% smaller
- Q4_K_M — aggressive, ~75% smaller
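Applying the savings percentages above to the ~130 MB F16 file from the table gives rough expected file sizes (the exact ratios vary by model architecture):

```python
# Approximate size reduction per quantization level, from the list above.
SAVINGS = {"f16": 0.0, "q8_0": 0.50, "q5_k_m": 0.65, "q4_k_m": 0.75}

def quantized_mb(f16_mb: float, level: str) -> float:
    return f16_mb * (1 - SAVINGS[level])

full = quantized_mb(130, "f16")     # 130 MB, no savings
q8 = quantized_mb(130, "q8_0")      # 65 MB
q5 = quantized_mb(130, "q5_k_m")    # ~45.5 MB
q4 = quantized_mb(130, "q4_k_m")    # 32.5 MB
```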
### Use in Oxidoc

```toml
[search]
semantic = true
model_path = "./models/bge-small-en-v1.5-q5_k_m.gguf"
```

**Embedding models only.** Only convert sentence/embedding models (BERT, BGE, E5, GTE, MiniLM). Do not use generative LLMs (GPT, Llama, Mistral) — they don't produce fixed-size embeddings and won't work with Oxidoc's search engine.
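One cheap sanity check before pointing `model_path` at a converted file: every GGUF file begins with the 4-byte magic `GGUF`. This only confirms the container format, not that the model inside is an embedding model:

```python
# Check the GGUF magic bytes at the start of the file.

def looks_like_gguf(path: str) -> bool:
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

If this returns `False`, the conversion produced something other than a GGUF file and Oxidoc will not be able to load it.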
## Testing Your Model
After switching models, test search quality by trying:
- Exact term searches — "oxidoc.toml", "CodeBlock" (should still work via lexical)
- Conceptual searches — "how to change colors" (should find theming page)
- Synonym searches — "setup" (should find installation page)
- Your domain terms — whatever specialized vocabulary your docs use
If semantic results are poor, try a larger model or one specifically trained on your domain/language.
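Under the hood, the semantic half of these tests comes down to vector similarity: the query embedding is compared against each page embedding, typically by cosine similarity. A toy sketch with made-up 3-dimensional vectors (real embeddings have 384+ dimensions):

```python
import math

# Cosine similarity: dot product divided by the product of magnitudes.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1, 0.0]          # "how to change colors" (illustrative)
theming_page = [0.8, 0.2, 0.1]   # conceptually close vector
install_page = [0.0, 0.1, 0.9]   # unrelated vector

# A good model places related texts closer together than unrelated ones.
assert cosine(query, theming_page) > cosine(query, install_page)
```

A model that doesn't understand your language or domain produces vectors where this ordering breaks down, which is exactly what the conceptual and synonym searches above probe for.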