Level 1 — Minimal Deployment
Single static binary, embedded candle inference + in-process candle sentence embeddings, zero C/C++ dependencies.
This guide covers deploying Isartor as a standalone process — no sidecars, no Docker Compose, no orchestrator. The firewall binary embeds a Gemma-2-2B-IT GGUF model via candle for Layer 2 classification and uses candle's BertModel (sentence-transformers/all-MiniLM-L6-v2) for Layer 1 semantic cache embeddings — all entirely in-process, pure Rust.
When to Use Level 1
| ✅ Good Fit | ❌ Consider Level 2/3 Instead |
|---|---|
| €5–€20/month VPS (Hetzner, DigitalOcean, Linode) | GPU inference for generation quality |
| ARM edge devices (Raspberry Pi 5, Jetson Nano) | More than ~50 concurrent users |
| Air-gapped / offline environments | Production observability stack required |
| Development & local experimentation | Multi-node high-availability |
| CI/CD test runners | |
Prerequisites
| Requirement | Minimum | Recommended |
|---|---|---|
| RAM | 2 GB free | 4 GB free |
| Disk | 2 GB (model download) | 5 GB |
| CPU | 2 cores | 4+ cores (AVX2 recommended) |
| Rust (build from source) | 1.75+ | Latest stable |
| OS | Linux (x86_64 / aarch64), macOS | Ubuntu 22.04 LTS |
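Before installing, you can check a candidate host against these minimums (Linux-specific; free and /proc/cpuinfo do not exist on macOS):

```shell
# Report CPU cores, free RAM, and AVX2 support on a Linux host
nproc
free -h | awk '/^Mem:/ {print "free RAM:", $4}'
if grep -qm1 avx2 /proc/cpuinfo; then echo "AVX2: yes"; else echo "AVX2: no"; fi
```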
Memory budget: Gemma-2-2B Q4_K_M ≈ 1.5 GB, candle BertModel ≈ 90 MB, tokenizer ≈ 4 MB, firewall runtime ≈ 50 MB. Total: ~1.7 GB resident.
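As a sanity check, these components sum as follows (sizes in MB, rounded from the figures above):

```shell
# Level 1 memory budget, component by component (MB)
model=1536      # Gemma-2-2B Q4_K_M, ~1.5 GB
embedder=90     # candle BertModel
tokenizer=4
runtime=50      # firewall runtime
total=$((model + embedder + tokenizer + runtime))
echo "estimated resident: ${total} MB"
```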
Option A: One-Click Install (Recommended)
The fastest way to get started is to use the pre-built, cross-platform binaries published by the CI/CD pipeline.
Install via script:
curl -fsSL https://raw.githubusercontent.com/isartor-ai/Isartor/main/install.sh | sh
Windows (PowerShell):
irm https://raw.githubusercontent.com/isartor-ai/Isartor/main/install.ps1 | iex
This script detects your operating system and CPU architecture, downloads the matching release binary, and adds it to your PATH automatically.
Option B: Build from Source
1. Clone & Build
git clone https://github.com/isartor-ai/Isartor.git
cd Isartor
cargo build --release
The release binary is at ./target/release/isartor (~5 MB statically linked).
2. Configure Environment
Create a minimal .env file or export variables directly:
# Required — your cloud LLM key for Layer 3 fallback
export ISARTOR__EXTERNAL_LLM_API_KEY="sk-..."
# Optional — override defaults
export ISARTOR__GATEWAY_API_KEY="my-secret-key"
export ISARTOR__HOST_PORT="0.0.0.0:8080"
export ISARTOR__LLM_PROVIDER="openai" # openai | azure | anthropic | xai
export ISARTOR__EXTERNAL_LLM_MODEL="gpt-4o-mini"
# Cache mode — "both" enables exact + semantic cache. Semantic embeddings
# are generated in-process via candle BertModel — no sidecar needed.
export ISARTOR__CACHE_MODE="both"
# Pluggable backends — Level 1 uses the defaults (no change needed):
# ISARTOR__CACHE_BACKEND=memory — in-process LRU (ahash + parking_lot)
# ISARTOR__ROUTER_BACKEND=embedded — in-process Candle GGUF SLM
# These are ideal for a single-process deployment with zero dependencies.
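If you keep these settings in the .env file rather than exporting them by hand, a common shell pattern is to source the file with auto-export before launching (the firewall itself simply reads process environment variables):

```shell
# Write a minimal .env and source it with auto-export (set -a), so the
# ISARTOR__* variables reach any process launched from this shell
cat > .env <<'EOF'
ISARTOR__GATEWAY_API_KEY=my-secret-key
ISARTOR__CACHE_MODE=both
EOF
set -a; . ./.env; set +a
echo "cache mode: $ISARTOR__CACHE_MODE"
```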
3. Start the Firewall
./target/release/isartor up
On first start, the embedded classifier will auto-download the Gemma-2-2B-IT GGUF model from Hugging Face Hub (~1.5 GB). Subsequent starts load from the local cache (~/.cache/huggingface/).
INFO isartor > Listening on 0.0.0.0:8080
INFO isartor::layer1::embeddings > Initialising candle TextEmbedder (all-MiniLM-L6-v2)...
INFO isartor::layer1::embeddings > TextEmbedder ready (~90 MB BertModel loaded)
INFO isartor::services::local_inference > Downloading model from mradermacher/gemma-2-2b-it-GGUF...
INFO isartor::services::local_inference > Model loaded (1.5 GB), ready for inference
4. Verify
# Health check
curl http://localhost:8080/health
# Test the firewall
curl -s http://localhost:8080/api/chat \
-H "Content-Type: application/json" \
-H "X-API-Key: my-secret-key" \
-d '{"prompt": "Hello, how are you?"}' | jq .
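For CI runners (a Level 1 use case above), the same check can be scripted so a pipeline step reports a non-200 response; a sketch assuming the firewall is running locally with the key configured earlier:

```shell
# Scripted smoke test: capture the HTTP status code of a firewall round-trip
status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
  -H "Content-Type: application/json" \
  -H "X-API-Key: my-secret-key" \
  -d '{"prompt": "Hello, how are you?"}' \
  http://localhost:8080/api/chat || true)
echo "HTTP $status"
[ "$status" = "200" ] || echo "firewall check failed (HTTP $status)" >&2
```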
Option C: Docker (Single Container)
For environments where you prefer a container but don't need a full Compose stack.
Build the Image
cd Isartor
docker build -t isartor:latest -f docker/Dockerfile .
Run
docker run -d \
--name isartor \
-p 8080:8080 \
-e ISARTOR__GATEWAY_API_KEY="my-secret-key" \
-e ISARTOR__EXTERNAL_LLM_API_KEY="sk-..." \
-e ISARTOR__CACHE_MODE="both" \
-e HF_HOME=/tmp/huggingface \
-v isartor-models:/tmp/huggingface \
isartor:latest
Note: The -v flag mounts a named volume for the Hugging Face cache so the model downloads persist across container restarts. The official Docker image runs as non-root and uses HF_HOME=/tmp/huggingface to ensure the cache is writable.
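Docker can also track liveness against the /health endpoint from the Verify section; a hedged sketch using Docker's built-in health-check flags (it assumes curl exists inside the image, which this guide does not state, and the intervals are illustrative — the long start period covers the first-run model download):

```shell
docker run -d \
  --name isartor \
  -p 8080:8080 \
  -e ISARTOR__GATEWAY_API_KEY="my-secret-key" \
  -e ISARTOR__EXTERNAL_LLM_API_KEY="sk-..." \
  -e HF_HOME=/tmp/huggingface \
  -v isartor-models:/tmp/huggingface \
  --health-cmd 'curl -fsS http://localhost:8080/health || exit 1' \
  --health-interval 30s \
  --health-start-period 120s \
  --health-retries 3 \
  isartor:latest
```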
Option D: systemd Service (Production Linux)
For long-running production deployments on bare metal or VPS.
1. Install the Binary
# Build
cargo build --release
# Install to /usr/local/bin
sudo cp target/release/isartor /usr/local/bin/isartor
sudo chmod +x /usr/local/bin/isartor
2. Create a System User
sudo useradd --system --no-create-home --shell /usr/sbin/nologin isartor
3. Create Environment File
sudo mkdir -p /etc/isartor
sudo tee /etc/isartor/env <<'EOF'
ISARTOR__HOST_PORT=0.0.0.0:8080
ISARTOR__GATEWAY_API_KEY=your-production-key
ISARTOR__EXTERNAL_LLM_API_KEY=sk-...
ISARTOR__LLM_PROVIDER=openai
ISARTOR__EXTERNAL_LLM_MODEL=gpt-4o-mini
ISARTOR__CACHE_MODE=both
ISARTOR__CACHE_BACKEND=memory
ISARTOR__ROUTER_BACKEND=embedded
# Model cache location; must be writable under the hardened unit (ReadWritePaths)
HF_HOME=/var/cache/isartor
RUST_LOG=isartor=info
EOF
sudo chmod 600 /etc/isartor/env
4. Create systemd Unit
sudo tee /etc/systemd/system/isartor.service <<'EOF'
[Unit]
Description=Isartor Prompt Firewall
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=isartor
Group=isartor
EnvironmentFile=/etc/isartor/env
ExecStart=/usr/local/bin/isartor up
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
ReadWritePaths=/var/cache/isartor
[Install]
WantedBy=multi-user.target
EOF
5. Create Model Cache Directory
sudo mkdir -p /var/cache/isartor
sudo chown isartor:isartor /var/cache/isartor
6. Enable & Start
sudo systemctl daemon-reload
sudo systemctl enable isartor
sudo systemctl start isartor
# Check status
sudo systemctl status isartor
sudo journalctl -u isartor -f
Model Pre-Caching (Air-Gapped / Offline)
If the deployment target has no internet access, pre-download the model on a connected machine and copy it over.
On the Connected Machine
# Install huggingface-cli
pip install huggingface-hub
# Download the GGUF file
huggingface-cli download mradermacher/gemma-2-2b-it-GGUF \
gemma-2-2b-it.Q4_K_M.gguf \
--local-dir ./models
# Also grab the tokenizer (from the base model)
huggingface-cli download google/gemma-2-2b-it \
tokenizer.json \
--local-dir ./models
Transfer to Target
scp -r ./models/ user@target-host:/var/cache/isartor/
By default, hf-hub uses ~/.cache/huggingface/. In the official Docker image, Isartor sets HF_HOME=/tmp/huggingface (non-root safe). Set HF_HOME or ISARTOR_HF_CACHE_DIR to point to your pre-cached directory if needed.
Level 1 Configuration Reference
These are the most relevant ISARTOR__* variables for Level 1 deployments. For the full reference, see the Configuration Reference.
| Variable | Default | Level 1 Notes |
|---|---|---|
| ISARTOR__HOST_PORT | 0.0.0.0:8080 | Bind address |
| ISARTOR__GATEWAY_API_KEY | (empty) | Set to enable gateway auth |
| ISARTOR__CACHE_MODE | both | both recommended — candle BertModel provides in-process semantic embeddings |
| ISARTOR__CACHE_BACKEND | memory | In-process LRU — ideal for single-process Level 1 |
| ISARTOR__ROUTER_BACKEND | embedded | In-process Candle GGUF SLM — zero external dependencies |
| ISARTOR__CACHE_TTL_SECS | 300 | Cache TTL in seconds |
| ISARTOR__CACHE_MAX_CAPACITY | 10000 | Max entries per cache |
| ISARTOR__LLM_PROVIDER | openai | openai · azure · anthropic · xai |
| ISARTOR__EXTERNAL_LLM_API_KEY | (empty) | Required for Layer 3 fallback |
| ISARTOR__EXTERNAL_LLM_MODEL | gpt-4o-mini | Cloud LLM model name |
| ISARTOR__ENABLE_MONITORING | false | Enable for stdout OTel (no collector needed) |
Embedded Classifier Defaults (Compiled)
| Setting | Default Value | Description |
|---|---|---|
| repo_id | mradermacher/gemma-2-2b-it-GGUF | HF repo for the GGUF model |
| gguf_filename | gemma-2-2b-it.Q4_K_M.gguf | Model file (~1.5 GB) |
| max_classify_tokens | 20 | Token limit for classification |
| max_generate_tokens | 256 | Token limit for simple task execution |
| temperature | 0.0 | Greedy decoding for classification |
| repetition_penalty | 1.1 | Avoids degenerate loops |
Performance Expectations
| Metric | Typical Value (4-core x86_64) |
|---|---|
| Cold start (model download) | 30–120 s (depends on bandwidth; ~1.5 GB Gemma + ~90 MB candle BertModel) |
| Warm start (cached model) | 3–8 s |
| Classification latency | 50–200 ms |
| Simple task execution | 200–2000 ms |
| Firewall overhead (no inference) | < 1 ms |
| Memory (steady state) | ~1.6 GB |
| Binary size | ~5 MB |
Upgrading to Level 2
When your traffic outgrows Level 1, the migration path is straightforward:
- Add the generation sidecar — ISARTOR__LAYER2__SIDECAR_URL=http://127.0.0.1:8081 (replaces embedded candle with the more powerful Phi-3-mini on GPU).
- Optionally add an embedding sidecar — ISARTOR__EMBEDDING_SIDECAR__SIDECAR_URL=http://127.0.0.1:8082 (only needed for external embedding inference; the default L1b semantic cache already uses in-process candle BertModel).
- Deploy via Docker Compose — see Level 2 — Sidecar Deployment.
Note: The pluggable backend defaults (cache_backend=memory, router_backend=embedded) remain appropriate for Level 2 single-host deployments. You only need to switch to cache_backend=redis and router_backend=vllm at Level 3 when scaling horizontally.
No code changes required — only environment variables and infrastructure.
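In other words, the upgrade is an environment-variable delta; the values below are exactly those listed above, and the URLs assume sidecars on the same host:

```shell
# Level 2 delta: point Layer 2 at the generation sidecar
export ISARTOR__LAYER2__SIDECAR_URL=http://127.0.0.1:8081
# Optional: external embedding inference via an embedding sidecar
export ISARTOR__EMBEDDING_SIDECAR__SIDECAR_URL=http://127.0.0.1:8082
echo "router sidecar: $ISARTOR__LAYER2__SIDECAR_URL"
```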