Custom Embedding Models

Oxidoc's bundled BGE Micro v2 model works well for English documentation, but you may want a different model for non-English content, specialized domains, or different quality/size tradeoffs.

Using a Custom Model

Point model_path to your GGUF file:

oxidoc.toml
[search]
semantic = true
model_path = "./models/my-custom-model.gguf"

Oxidoc loads your model instead of the bundled one. At build time, it embeds all pages using your model. The model file is copied to the output directory for browser-side query embedding.

Model Requirements

Your model must meet three requirements:

1. GGUF format

A single .gguf file with the tokenizer embedded. This is the standard format for efficient model distribution: one file contains everything needed for inference.
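As a quick sanity check before a build, you can verify that a downloaded file really is a GGUF container. This is a small standalone sketch (not part of Oxidoc); it only checks the container magic and version, not whether the model inside is an embedding model:

```python
import struct

def gguf_version(path):
    """Return the GGUF container version, or None if the file
    does not start with the GGUF magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            return None
        # A little-endian uint32 version follows the 4-byte magic.
        (version,) = struct.unpack("<I", f.read(4))
        return version
```

A file that fails this check (e.g. a raw safetensors download renamed to .gguf) will not load; recent GGUF files typically report version 3.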

2. Sentence embedding model

The model must produce fixed-size vectors from text input: a sentence-transformer or embedding model, not a generative LLM. Models like BERT, BGE, E5, GTE, and MiniLM work.
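The "fixed-size" property comes from pooling: however many tokens go in, the per-token vectors are collapsed (commonly by mean pooling in sentence-transformer models) into one vector of the model's embedding dimension. A minimal illustration, independent of any real model:

```python
def mean_pool(token_vectors):
    """Collapse a variable-length list of per-token vectors into one
    fixed-size sentence vector by averaging each dimension."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]
```

A 3-token query and a 500-token page both come out as one vector of the same length, which is what makes queries and pages directly comparable.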

3. Compatible architecture

Standard BERT-family sentence transformers work out of the box with Oxidoc's boostr GGUF loader. This covers the vast majority of embedding models on Hugging Face.

When to Use a Custom Model

Non-English documentation

The bundled model is optimized for English. For Japanese, Chinese, Korean, Arabic, or multilingual docs, use a model trained on your target language. Semantic search quality drops significantly when the model doesn't understand the content language.

Domain-specific content

Medical, legal, scientific, or financial documentation uses specialized terminology. A domain-fine-tuned model understands that "injection" means something different in medicine vs. software security.

Higher quality embeddings

Larger models (100MB+) with more dimensions (768, 1024) capture more semantic nuance. If search quality is critical and your users have fast connections, a larger model is worth the tradeoff.

Smaller footprint

If 17.5 MB is too large for your users (mobile-first docs, users on slow connections), use a smaller quantized model. Q4 quantization can cut size by 50-75% with modest quality loss.

Finding Models

Search Hugging Face for GGUF sentence embedding models. When evaluating models, look for:

  • "sentence-transformers" or "embedding" in the model name
  • GGUF variants — look for filenames like model-q5_k_m.gguf or model-f16.gguf
  • Language coverage — check the model card for supported languages
  • Embedding dimensions — higher dimensions = better quality but larger vectors
  • MTEB benchmark scores — the standard benchmark for embedding quality

| Use Case | Model | Size | Dimensions | Notes |
|---|---|---|---|---|
| English (default) | BGE Micro v2 | 17.5 MB | 384 | Bundled, good quality/size ratio |
| English (high quality) | bge-small-en-v1.5 | ~130 MB | 384 | Better quality, larger download |
| Multilingual | multilingual-e5-small | ~120 MB | 384 | 100+ languages, good for i18n sites |
| Japanese / CJK | multilingual-e5-small | ~120 MB | 384 | Strong CJK support |
| Chinese | bge-small-zh-v1.5 | ~100 MB | 512 | Optimized specifically for Chinese |
| Minimal size | all-MiniLM-L6-v2 | ~23 MB | 384 | Smallest useful model |

Model size = browser download

Your custom model file is served to the browser for query-time embedding. A 500 MB model means a 500 MB download before semantic search works. Lexical search still works immediately — semantic search becomes available once the model loads.
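The back-of-the-envelope cost is easy to compute: size in megabytes times eight, divided by the connection speed in megabits per second. The numbers below are illustrative, not measurements:

```python
def download_seconds(size_mb, link_mbps):
    # Megabytes -> megabits, divided by link speed in Mbit/s.
    # Ignores latency and protocol overhead, so treat it as a floor.
    return size_mb * 8 / link_mbps

# The bundled 17.5 MB model on a 10 Mbit/s connection:
print(download_seconds(17.5, 10))   # 14.0 seconds
# A ~130 MB model on the same link:
print(download_seconds(130, 10))    # 104.0 seconds
```

That difference is why the larger models in the table above are best reserved for audiences on fast connections.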

Converting Models to GGUF

If you find a model on Hugging Face in safetensors or PyTorch format but not GGUF, you can convert it:

1. Clone the model

git clone https://huggingface.co/BAAI/bge-small-en-v1.5
2. Convert to GGUF

Use the conversion tools from llama.cpp or ggml:

python convert-hf-to-gguf.py bge-small-en-v1.5 --outtype f16
3. Quantize (optional)

Reduce file size with quantization:

./quantize bge-small-en-v1.5-f16.gguf bge-small-en-v1.5-q5_k_m.gguf Q5_K_M

Quantization levels (smaller file, lower quality):

  • F16 — full precision, largest file
  • Q8_0 — near-lossless, ~50% smaller
  • Q5_K_M — good balance, ~65% smaller
  • Q4_K_M — aggressive, ~75% smaller
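Using the approximate reductions above, expected file sizes work out as follows. The percentages are rough guides, and actual sizes depend on the model's tensor shapes:

```python
# Approximate size reduction vs. F16, taken from the levels listed above.
REDUCTION = {"F16": 0.00, "Q8_0": 0.50, "Q5_K_M": 0.65, "Q4_K_M": 0.75}

def quantized_size_mb(f16_mb, level):
    """Estimate the quantized file size from the F16 size."""
    return f16_mb * (1 - REDUCTION[level])

# A ~130 MB F16 embedding model quantized to Q5_K_M:
print(quantized_size_mb(130, "Q5_K_M"))  # ~45.5 MB
```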
4. Use in Oxidoc

oxidoc.toml
[search]
semantic = true
model_path = "./models/bge-small-en-v1.5-q5_k_m.gguf"

Embedding models only

Only convert sentence/embedding models (BERT, BGE, E5, GTE, MiniLM). Do not use generative LLMs (GPT, Llama, Mistral) — they don't produce fixed-size embeddings and won't work with Oxidoc's search engine.

Testing Your Model

After switching models, test search quality by trying:

  1. Exact term searches — "oxidoc.toml", "CodeBlock" (should still work via lexical)
  2. Conceptual searches — "how to change colors" (should find theming page)
  3. Synonym searches — "setup" (should find installation page)
  4. Your domain terms — whatever specialized vocabulary your docs use

If semantic results are poor, try a larger model or one specifically trained on your domain/language.
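Semantic rankings like these are typically driven by cosine similarity between the query vector and each page vector (Oxidoc's exact scoring function isn't specified here, but cosine similarity is the standard approach for embedding search). A minimal pure-Python scorer, useful for inspecting raw embeddings from any model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors: 1.0 for
    identical directions, 0.0 for orthogonal, -1.0 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# If "setup" and "installation" embed to nearby vectors, their
# similarity is high and the installation page ranks well.
print(cosine([1.0, 0.0], [1.0, 0.0]))  # 1.0
```

A good embedding model for your content should score related phrase pairs (like the synonym searches above) noticeably higher than unrelated ones.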