Troubleshooting
Common issues, diagnostic steps, and FAQ for operating Isartor.
Table of Contents
- Startup Errors
- Cache Issues
- Embedding & SLM Issues
- Cloud LLM Issues
- Observability Issues
- Performance & Degraded Operation
- Docker & Deployment Issues
- FAQ
Startup Errors
Failed to initialize candle TextEmbedder
Symptom: Gateway panics on startup with:
Failed to initialize candle TextEmbedder (all-MiniLM-L6-v2)
Causes & Fixes:
| Cause | Fix |
|---|---|
| Model files not downloaded | Run once with internet access; candle auto-downloads to ~/.cache/huggingface/ |
| Corrupted model cache | Delete ~/.cache/huggingface/ and restart |
| Cache directory not writable (Permission denied (os error 13)) | Set HF_HOME (or ISARTOR_HF_CACHE_DIR) to a writable path (e.g. /tmp/huggingface). In Docker, mount a volume there: -e HF_HOME=/tmp/huggingface -v isartor-hf:/tmp/huggingface |
| Insufficient memory | Ensure ≥ 256 MB available for the embedding model |
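For the permission-denied case, a quick sanity check is the sketch below. It assumes the default candle/huggingface cache location; HF_HOME overrides it.

```shell
# Resolve the cache dir the embedder will use (HF_HOME overrides the default).
CACHE_DIR="${HF_HOME:-$HOME/.cache/huggingface}"
# Try to create it and write a probe file.
if mkdir -p "$CACHE_DIR" 2>/dev/null && touch "$CACHE_DIR/.write-probe" 2>/dev/null; then
  rm -f "$CACHE_DIR/.write-probe"
  echo "cache dir OK: $CACHE_DIR"
else
  echo "cache dir NOT writable: $CACHE_DIR (set HF_HOME to a writable path)"
fi
```

If the second message prints, point HF_HOME at a writable path before restarting the gateway.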
Address already in use
Symptom:
Error: error creating server listener: Address already in use (os error 48)
Fix:
# Find the process using port 8080
lsof -i :8080
# Kill it, or change the port:
export ISARTOR__HOST_PORT=0.0.0.0:9090
missing field or config deserialization errors
Symptom:
Error: missing field `layer2` in config
Fix: Ensure all required environment variables have the correct prefix
and separator. Isartor uses double-underscore (__) as separator:
# Correct:
export ISARTOR__LAYER2__SIDECAR_URL=http://127.0.0.1:8081
# Wrong:
export ISARTOR_LAYER2_SIDECAR_URL=http://127.0.0.1:8081
See the Configuration Reference for the full list of variables.
Gateway auth / 401 Unauthorized
Symptom: All requests return 401 Unauthorized.
By default, gateway_api_key is empty and auth is disabled — you should not see 401 errors unless you (or your deployment) explicitly set ISARTOR__GATEWAY_API_KEY.
If you enabled auth by setting a key, every request must include it:
export ISARTOR__GATEWAY_API_KEY=your-secret-key
Common causes of unexpected 401s:
- The key in your request header doesn't match ISARTOR__GATEWAY_API_KEY.
- You forgot to include X-API-Key or Authorization: Bearer in the request.
Cache Issues
Low Cache Hit Rate
Symptom: Deflection rate below expected levels despite repeated traffic.
Diagnostic steps:
- Check the cache mode:
  echo $ISARTOR__CACHE_MODE  # should be "both" for most workloads
- Check the similarity threshold:
  echo $ISARTOR__SIMILARITY_THRESHOLD  # default: 0.85
  If too high (> 0.92), similar prompts won't match. Try lowering to 0.80.
- Check the TTL:
  echo $ISARTOR__CACHE_TTL_SECS  # default: 300
  A short TTL evicts entries before they can be reused.
- Check Jaeger for cosine_similarity values on semantic cache spans. If scores are just below the threshold, lower it.
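To see why a small threshold change flips hits into misses, here is a toy cosine-similarity calculation. The vectors are invented for the arithmetic (not real embeddings); a pair scoring 0.88 hits at the 0.85 default but misses at 0.92:

```shell
# Toy vectors chosen so their cosine similarity comes out near 0.88.
awk 'BEGIN {
  split("1 0 0", a); split("0.88 0.475 0", b)
  for (i = 1; i <= 3; i++) { dot += a[i]*b[i]; na += a[i]^2; nb += b[i]^2 }
  sim = dot / (sqrt(na) * sqrt(nb))
  printf "cosine=%.2f hit@0.85=%s hit@0.92=%s\n", sim,
         (sim >= 0.85 ? "yes" : "no"), (sim >= 0.92 ? "yes" : "no")
}'
# → cosine=0.88 hit@0.85=yes hit@0.92=no
```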
Stale Cache Responses
Symptom: Users receive outdated answers from cache.
Fix: Reduce TTL or restart the gateway to clear in-memory caches:
export ISARTOR__CACHE_TTL_SECS=60 # 1 minute
For Redis-backed caches, you can flush explicitly:
redis-cli -u $ISARTOR__REDIS_URL FLUSHDB
Redis Connection Refused
Symptom:
Layer 1a: Redis connection error — falling through
Diagnostic steps:
- Verify Redis is running:
  redis-cli -u $ISARTOR__REDIS_URL ping  # Expected: PONG
- Check network connectivity (especially in Docker/K8s):
  # Inside the gateway container:
  curl -v telnet://redis:6379
- Verify the URL format:
  # Correct formats:
  export ISARTOR__REDIS_URL=redis://127.0.0.1:6379
  export ISARTOR__REDIS_URL=redis://user:password@redis.svc:6379/0
- Check the Redis memory limit — if Redis is OOM, it will reject writes.
Fallback behaviour: When Redis is unreachable, Isartor falls through to the next layer. No data is lost, but deflection rate drops.
Cache Memory Growing Unbounded
Symptom: Gateway memory usage increases over time.
Fix: The in-memory cache uses bounded LRU eviction. Check:
echo $ISARTOR__CACHE_MAX_CAPACITY # default: 10000
If set too high, reduce it. Each entry ≈ 2–4 KB, so 10K entries ≈ 20–40 MB.
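The sizing arithmetic above can be run directly against your configured capacity (the 2–4 KB per-entry figure is the same rough estimate; real usage depends on prompt/response length):

```shell
# Back-of-envelope memory estimate: capacity × per-entry size (2–4 KB).
CAP="${ISARTOR__CACHE_MAX_CAPACITY:-10000}"
awk -v cap="$CAP" 'BEGIN {
  printf "~%d-%d MB for %d entries\n", cap * 2 / 1000, cap * 4 / 1000, cap
}'
# → ~20-40 MB for 10000 entries (at the default capacity)
```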
Embedding & SLM Issues
Slow Embedding Generation
Symptom: L1b latency > 10 ms.
Causes & Fixes:
| Cause | Fix |
|---|---|
| CPU-bound contention | Increase CPU allocation for the container |
| Large prompt text | Embedder truncates to model max length (512 tokens), but longer text = more CPU |
| Cold start | First embedding call warms up the candle BertModel (~2 s). Subsequent calls are fast. |
SLM Sidecar Unreachable
Symptom:
Layer 2: Failed to connect to SLM sidecar — falling through
Diagnostic steps:
- Check if the sidecar is running:
  curl http://127.0.0.1:8081/v1/models
- Verify the configuration:
  echo $ISARTOR__LAYER2__SIDECAR_URL  # default: http://127.0.0.1:8081
- Check the sidecar logs for errors (model loading, OOM, etc.).
- Increase the timeout if the sidecar is slow:
  export ISARTOR__LAYER2__TIMEOUT_SECONDS=60
Fallback behaviour: When the SLM sidecar is unreachable, Isartor treats all requests as COMPLEX and forwards to Layer 3.
SLM Misclassification (Tiered: TEMPLATE / SNIPPET / COMPLEX)
The default classifier mode is tiered, which sorts requests into three categories instead of the legacy binary SIMPLE/COMPLEX split:
| Tier | Description |
|---|---|
| TEMPLATE | Config files, type definitions, documentation, boilerplate |
| SNIPPET | Short single-function code, simple middleware (<50 lines) |
| COMPLEX | Multi-file implementations, test suites, full endpoints |
TEMPLATE and SNIPPET requests are answered locally by the SLM; COMPLEX
requests are forwarded to Layer 3. The legacy binary mode (SIMPLE/COMPLEX)
is still available via ISARTOR__LAYER2__CLASSIFIER_MODE=binary.
An answer quality guard also rejects SLM answers that are too short (<10 chars) or start with uncertainty phrases, escalating them to Layer 3.
Symptom: Users receive low-quality answers for complex questions (misclassified as TEMPLATE/SNIPPET) or unnecessarily hit the cloud for simple ones.
Diagnostic steps:
- In Jaeger, search for the router.decision attribute to see the classification distribution across TEMPLATE, SNIPPET, and COMPLEX.
- Send known-simple and known-complex prompts and check the classification:
  curl -s -X POST http://localhost:8080/api/chat \
    -H "Content-Type: application/json" \
    -H "X-API-Key: $KEY" \
    -d '{"prompt": "Generate a tsconfig.json"}' | jq '.layer'
  # Expected: layer 2 (TEMPLATE)
- Consider switching to a larger SLM model for better classification accuracy.
- To fall back to the legacy binary classifier, set ISARTOR__LAYER2__CLASSIFIER_MODE=binary.
Embedded Candle Engine Errors
Symptom:
Layer 2: Embedded classification failed – falling through
Causes & Fixes:
| Cause | Fix |
|---|---|
| Model file missing | Set ISARTOR__EMBEDDED__MODEL_PATH to a valid GGUF file |
| Insufficient memory | Candle GGUF models need 1–4 GB RAM |
| Feature not compiled | Build with --features embedded-inference |
Cloud LLM Issues
502 Bad Gateway from Layer 3
Symptom: Requests that reach Layer 3 return 502.
Diagnostic steps:
- Check provider connectivity:
  curl -s $ISARTOR__EXTERNAL_LLM_URL \
    -H "Authorization: Bearer $ISARTOR__EXTERNAL_LLM_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}'
- Verify the API key is valid and has quota.
- For Azure OpenAI, check the deployment ID and API version:
  echo $ISARTOR__AZURE_DEPLOYMENT_ID
  echo $ISARTOR__AZURE_API_VERSION
Rate Limiting from Cloud Provider
Symptom: Intermittent 429 errors from the cloud LLM.
Fix:
- Increase deflection rate (lower threshold, longer TTL) to reduce cloud traffic.
- Request higher rate limits from your provider.
- Implement client-side retry with exponential backoff (application level).
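A minimal application-level sketch of the retry advice. The `with_backoff` helper and the gateway URL/headers in the usage comment are illustrative, not part of Isartor:

```shell
# Retry a command with exponential backoff: wait 1s, 2s, 4s... up to 5 attempts.
with_backoff() {
  attempt=1; max=5; delay=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $max attempts" >&2
      return 1
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Usage (placeholder URL/key): curl -f makes HTTP errors such as 429 fail,
# which triggers the retry loop.
# with_backoff curl -sf -X POST http://localhost:8080/api/chat \
#   -H "Content-Type: application/json" -H "X-API-Key: $KEY" \
#   -d '{"prompt": "ping"}'
```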
Wrong Provider Configured
Symptom: Authentication errors or unexpected response formats.
Fix: Verify the provider matches the URL and API key:
# OpenAI
export ISARTOR__LLM_PROVIDER=openai
# Azure
export ISARTOR__LLM_PROVIDER=azure
# Anthropic
export ISARTOR__LLM_PROVIDER=anthropic
# xAI
export ISARTOR__LLM_PROVIDER=xai
# Google Gemini
export ISARTOR__LLM_PROVIDER=gemini
# Ollama (local — no API key required)
export ISARTOR__LLM_PROVIDER=ollama
See the Configuration Reference for the full list of supported providers.
Observability Issues
No Traces in Jaeger
| Cause | Fix |
|---|---|
| Monitoring disabled | export ISARTOR__ENABLE_MONITORING=true |
| Wrong endpoint | export ISARTOR__OTEL_EXPORTER_ENDPOINT=http://otel-collector:4317 |
| Collector not running | docker compose -f docker-compose.observability.yml up otel-collector |
| Firewall blocking gRPC | Ensure port 4317 is open between gateway and collector |
No Metrics in Prometheus
| Cause | Fix |
|---|---|
| Prometheus not scraping collector | Check prometheus.yml targets include otel-collector:8889 |
| Collector metrics pipeline broken | Verify otel-collector-config.yaml exports to Prometheus |
| No requests sent yet | Send a test request — metrics appear after first request |
Grafana Shows "No Data"
| Cause | Fix |
|---|---|
| Data source not configured | Add Prometheus source: URL http://prometheus:9090 |
| Wrong time range | Expand the time range in Grafana to cover the test period |
| Dashboard not provisioned | Check docker/grafana/provisioning/ paths are mounted |
Console Shows "OTel disabled" Despite Setting env var
Cause: Config file takes precedence, or the env var prefix is wrong.
Fix:
# Correct (double underscore):
export ISARTOR__ENABLE_MONITORING=true
# Wrong (single underscore):
export ISARTOR_ENABLE_MONITORING=true # ❌ not picked up
Performance & Degraded Operation
High Tail Latency (P99 > 10 s)
Diagnostic steps:
- Check which layer is the bottleneck:
  histogram_quantile(0.99,
    sum by (le, layer_name) (
      rate(isartor_layer_duration_seconds_bucket[5m])
    )
  )
- Common causes:
  - L3 Cloud: provider is slow → switch to a faster model or provider.
  - L2 SLM: model inference is slow → use a smaller quantised model.
  - L1b Semantic: embedding is slow → check CPU contention.
Gateway OOM (Out of Memory)
Diagnostic steps:
- Check the cache capacity:
  echo $ISARTOR__CACHE_MAX_CAPACITY
- Reduce the capacity or switch to the Redis backend.
- If using the embedded SLM, check the model size against the container memory limit.
Requests Queuing / High Connection Count
Symptom: Clients see connection timeouts or slow responses even for cache hits.
Causes & Fixes:
| Cause | Fix |
|---|---|
| Too many concurrent requests | Scale horizontally (add replicas) |
| spawn_blocking pool exhaustion | Increase Tokio blocking threads: TOKIO_WORKER_THREADS=8 |
| SLM inference blocking async runtime | Ensure SLM runs on blocking pool (default in Isartor) |
Degraded Mode (SLM Down, Cache Only)
When the SLM sidecar is unreachable, Isartor automatically degrades:
- L1a/L1b cache still works → cached requests are served.
- L2 SLM → all requests treated as COMPLEX (regardless of classifier mode) → forwarded to L3.
- Impact: Higher cloud costs, but no downtime.
Monitor with:
# If SLM layer stops resolving requests, something is wrong
sum(rate(isartor_requests_total{final_layer="L2_SLM"}[5m])) == 0
Docker & Deployment Issues
Docker Build Fails
Symptom: cargo build fails inside Docker.
Common fixes:
- Ensure Dockerfile uses the correct Rust toolchain version.
- For aws-lc-rs (TLS): install cmake, gcc, and make in the build stage.
- Check that .dockerignore isn't excluding required files.
Container Can't Reach Host Services
Symptom: Gateway inside Docker can't connect to sidecar on localhost.
Fix: Use Docker network names or host.docker.internal:
# docker-compose.yml
environment:
- ISARTOR__LAYER2__SIDECAR_URL=http://sidecar:8081 # service name
# or for host:
- ISARTOR__LAYER2__SIDECAR_URL=http://host.docker.internal:8081
Health Check Failing
Symptom: Orchestrator keeps restarting the container.
Fix: The health endpoint is GET /healthz. Ensure the health check
matches:
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/healthz"]
interval: 10s
timeout: 5s
retries: 3
FAQ
Q: What is cache_mode and which should I use?
A: cache_mode controls which cache layers are active:
| Mode | What it does | Best for |
|---|---|---|
| exact | Only SHA-256 hash match | Deterministic agent loops |
| semantic | Only cosine similarity | Diverse user queries |
| both | Exact first, then semantic | Most workloads (default) |
Q: What happens if Redis goes down?
A: Isartor gracefully falls through. The exact cache layer logs a warning and forwards the request downstream. No crash, no data loss. Deflection rate drops until Redis recovers, and more requests reach the cloud LLM (higher cost).
Q: Can I change the embedding model?
A: Yes. The in-process embedder uses candle with a pure-Rust BertModel, which supports multiple models. Set:
export ISARTOR__EMBEDDING_MODEL=bge-small-en-v1.5
The model is auto-downloaded on first startup. Note: changing the model invalidates the semantic cache (different embedding dimensions/space).
Q: How much does Isartor cost to run?
A: Isartor itself is free (Apache 2.0). The infrastructure cost depends on your deployment:
| Mode | Estimated Cost |
|---|---|
| Minimalist (single binary, no GPU) | ~$5–15/month (small VM or container) |
| With SLM sidecar (CPU) | ~$20–50/month (4-core VM) |
| With SLM on GPU | ~$50–200/month (GPU instance) |
| Enterprise (K8s + Redis + vLLM) | ~$200–500/month |
The ROI comes from cloud LLM savings. At 70 % deflection and $0.01/1K tokens, Isartor typically pays for itself within the first week.
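As a rough worked example of that claim (the traffic volume is hypothetical; substitute your own numbers):

```shell
# Hypothetical workload: 100K requests/day, ~1K tokens each, $0.01 per 1K tokens.
awk 'BEGIN {
  reqs = 100000; tokens_per_req = 1000; price_per_1k = 0.01; deflection = 0.70
  baseline = reqs * (tokens_per_req / 1000) * price_per_1k  # cloud cost with no caching
  saved = baseline * deflection                             # cost deflected by Isartor
  printf "baseline $%.0f/day, saved $%.0f/day\n", baseline, saved
}'
# → baseline $1000/day, saved $700/day
```

At that scale, even the enterprise-tier infrastructure cost is recovered in a day.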
Q: Is Isartor production-ready?
A: Isartor is designed for production use with:
- ✅ Bounded, concurrent caches (no unbounded memory growth)
- ✅ Graceful degradation (every layer has a fallback)
- ✅ OpenTelemetry observability (traces, metrics, structured logs)
- ✅ Health check endpoint (/healthz)
- ✅ Configurable via environment variables (12-factor app)
- ✅ Integration tests covering all middleware layers
For enterprise deployments, use Redis-backed caches and a production Kubernetes cluster. See the Enterprise Guide.
Q: Can I use Isartor with LangChain / LlamaIndex / AutoGen?
A: Yes. Isartor exposes an OpenAI-compatible API. Point any SDK at the gateway URL:
import openai
client = openai.OpenAI(
base_url="http://your-isartor-host:8080/v1",
api_key="your-gateway-key",
)
See Integrations for full examples.
Q: How do I upgrade Isartor?
A:
# Binary
cargo install --path . --force
# Docker
docker pull ghcr.io/isartor-ai/isartor:latest
docker compose up -d --pull always
In-memory caches are cleared on restart. Redis caches persist.
Q: Why does isartor update or GitHub access fail with localhost:8081 / Connection refused after I stopped Isartor?
A: Your shell likely still has proxy environment variables from a prior
isartor connect ... session, so non-Isartor commands are still trying to
reach GitHub through the local CONNECT proxy on localhost:8081.
Fix on macOS / Linux:
unset HTTPS_PROXY HTTP_PROXY ALL_PROXY https_proxy http_proxy all_proxy
unset NODE_EXTRA_CA_CERTS SSL_CERT_FILE REQUESTS_CA_BUNDLE
unset ISARTOR_COPILOT_ENABLED ISARTOR_ANTIGRAVITY_ENABLED
Then confirm the shell is clean:
env | grep -i proxy
You can also clean up client-side configuration:
isartor connect copilot --disconnect
isartor connect claude --disconnect
isartor connect antigravity --disconnect
Q: Why does isartor update fail with Permission denied (os error 13)?
A: Your current isartor binary is installed in a system-managed directory.
Recommended fix: move to a user-writable install location:
mkdir -p ~/.local/bin
cp /usr/local/bin/isartor ~/.local/bin/isartor
chmod +x ~/.local/bin/isartor
export PATH="$HOME/.local/bin:$PATH"
hash -r
Then confirm: which isartor
Q: Why does isartor keep my terminal busy?
A: isartor runs the API gateway in the foreground by default. Start in detached mode:
isartor up --detach
Stop later with: isartor stop
Q: How do I monitor deflection rate in real-time?
A: Use the Grafana dashboard included in dashboards/prometheus-grafana.json
or the PromQL query:
1 - (
sum(rate(isartor_requests_total{final_layer="L3_Cloud"}[5m]))
/
sum(rate(isartor_requests_total[5m]))
)
Q: Can I run Isartor without any cloud LLM?
A: Partially. Layers 1 and 2 work standalone (cache + SLM), but Layer 3 requires an LLM API key. Without one, uncached COMPLEX requests will return a 502 error. For fully local operation, ensure your SLM can handle all traffic: classify aggressively toward TEMPLATE/SNIPPET in tiered mode (or toward SIMPLE in legacy binary mode).
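One way to approximate fully local operation is to point Layer 3 at a local Ollama instance, which this guide lists as the keyless local provider. This is a sketch: port 11434 is Ollama's default, and whether the ollama provider reads ISARTOR__EXTERNAL_LLM_URL should be confirmed against the Configuration Reference.

```shell
# Local-only sketch: no cloud API key required with the ollama provider.
export ISARTOR__LLM_PROVIDER=ollama
export ISARTOR__EXTERNAL_LLM_URL=http://127.0.0.1:11434  # assumed: default Ollama port
# Keep deflection high so most traffic never reaches Layer 3 anyway:
export ISARTOR__CACHE_MODE=both
export ISARTOR__SIMILARITY_THRESHOLD=0.80
```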
See also: Performance Tuning · Metrics & Tracing · Configuration Reference