Configuration Reference

Complete reference for every Isartor configuration variable, CLI command, and provider option.


Configuration Loading Order

Isartor loads configuration in the following order (later sources override earlier ones):

  1. Compiled defaults — baked into the binary
  2. isartor.toml — if present in the working directory or ~/.isartor/
  3. Environment variables — ISARTOR__... with double-underscore separators
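
For example, with port = 8080 set in isartor.toml, a matching environment variable set at launch wins (values here are illustrative):

export ISARTOR__PORT=9090   # overrides [server] port from isartor.toml
isartor up                  # gateway now binds to 9090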

Generate a starter config file with:

isartor init

Master Configuration Table

TOML Key | Environment Variable | Type | Default | Description
server.host | ISARTOR__HOST | string | 0.0.0.0 | Host address for server binding
server.port | ISARTOR__PORT | int | 8080 | Port for HTTP server
exact_cache.provider | ISARTOR__CACHE_BACKEND | string | memory | Layer 1a cache backend: memory or redis
exact_cache.redis_url | ISARTOR__REDIS_URL | string | (none) | Redis connection string (if provider=redis)
exact_cache.redis_db | ISARTOR__REDIS_DB | int | 0 | Redis database index
semantic_cache.provider | ISARTOR__SEMANTIC_BACKEND | string | candle | Layer 1b semantic cache: candle (in-process) or tei (external)
semantic_cache.remote_url | ISARTOR__TEI_URL | string | (none) | TEI endpoint (if provider=tei)
slm_router.provider | ISARTOR__ROUTER_BACKEND | string | embedded | Layer 2 router: embedded or vllm
slm_router.remote_url | ISARTOR__VLLM_URL | string | (none) | vLLM/TGI endpoint (if provider=vllm)
slm_router.model | ISARTOR__VLLM_MODEL | string | gemma-2-2b-it | Model name/path for SLM router
slm_router.model_path | ISARTOR__MODEL_PATH | string | (baked-in) | Path to GGUF model file (embedded mode)
slm_router.classifier_mode | ISARTOR__LAYER2__CLASSIFIER_MODE | string | tiered | Classifier mode: tiered (TEMPLATE/SNIPPET/COMPLEX) or binary (legacy SIMPLE/COMPLEX)
slm_router.max_answer_tokens | ISARTOR__LAYER2__MAX_ANSWER_TOKENS | u64 | 2048 | Max tokens the SLM may generate for a local answer
fallback.openai_api_key | ISARTOR__OPENAI_API_KEY | string | (none) | OpenAI API key for Layer 3 fallback
fallback.anthropic_api_key | ISARTOR__ANTHROPIC_API_KEY | string | (none) | Anthropic API key for Layer 3 fallback
llm_provider | ISARTOR__LLM_PROVIDER | string | openai | LLM provider (see below for full list)
external_llm_model | ISARTOR__EXTERNAL_LLM_MODEL | string | gpt-4o-mini | Model name to request from the provider
model_aliases.<alias> | ISARTOR__MODEL_ALIASES__<ALIAS> | string | (none) | Request-time alias that resolves to a real model ID
external_llm_api_key | ISARTOR__EXTERNAL_LLM_API_KEY | string | (none) | API key for the configured LLM provider (not needed for ollama)
provider_keys | ISARTOR__PROVIDER_KEYS | JSON array | [] | Optional multi-key pool for the primary provider
key_rotation_strategy | ISARTOR__KEY_ROTATION_STRATEGY | string | round_robin | Multi-key selection strategy: round_robin or priority
key_cooldown_secs | ISARTOR__KEY_COOLDOWN_SECS | u64 | 60 | Cooldown applied after a key hits rate limits or quota exhaustion
l3_timeout_secs | ISARTOR__L3_TIMEOUT_SECS | u64 | 120 | HTTP timeout applied to all Layer 3 provider requests
enable_context_optimizer | ISARTOR__ENABLE_CONTEXT_OPTIMIZER | bool | true | Master switch for the L2.5 context optimiser
context_optimizer_dedup | ISARTOR__CONTEXT_OPTIMIZER_DEDUP | bool | true | Enable cross-turn instruction deduplication
context_optimizer_minify | ISARTOR__CONTEXT_OPTIMIZER_MINIFY | bool | true | Enable static minification (comments, rules, blanks)
enable_request_logs | ISARTOR__ENABLE_REQUEST_LOGS | bool | false | Opt-in request/response debug logging with redaction
request_log_path | ISARTOR__REQUEST_LOG_PATH | string | ~/.isartor/request_logs | Directory for rotating JSONL request logs
usage_log_path | ISARTOR__USAGE_LOG_PATH | string | ~/.isartor | Directory that stores usage.jsonl for usage stats and quotas
usage_retention_days | ISARTOR__USAGE_RETENTION_DAYS | u64 | 30 | Retention window for persisted usage events
usage_window_hours | ISARTOR__USAGE_WINDOW_HOURS | u64 | 24 | Default reporting window for isartor stats --usage
provider_health_check_interval_secs | ISARTOR__PROVIDER_HEALTH_CHECK_INTERVAL_SECS | u64 | 300 | Background provider ping cadence for dashboard and health status (0 disables)
classifier_routing.enabled | ISARTOR__CLASSIFIER_ROUTING__ENABLED | bool | false | Enable the MiniLM multi-head routing pass before the Layer 1 cache
classifier_routing.artifacts_path | ISARTOR__CLASSIFIER_ROUTING__ARTIFACTS_PATH | string | (empty) | Path to the JSON artifact containing MiniLM routing heads
classifier_routing.confidence_threshold | ISARTOR__CLASSIFIER_ROUTING__CONFIDENCE_THRESHOLD | float | 0.55 | Minimum overall confidence required before routing rules can match
classifier_routing.fallback_to_existing_routing | ISARTOR__CLASSIFIER_ROUTING__FALLBACK_TO_EXISTING_ROUTING | bool | true | When false, fail closed with 503 instead of falling back to the normal routing path
classifier_routing.rules | ISARTOR__CLASSIFIER_ROUTING__RULES | JSON array | [] | Ordered routing rules matching classifier labels to provider/model targets
classifier_routing.matrix | ISARTOR__CLASSIFIER_ROUTING__MATRIX | TOML table | {} | Model matrix: 2D grid of complexity → task_type → "provider/model" (compiled to rules at startup)
quota.<provider>.* | ISARTOR__QUOTA__<PROVIDER>__* | mixed | (none) | Per-provider token/cost quota policy and action

Sections

Server

  • server.host, server.port: Bind address and port.

Layer 1a: Exact Match Cache

  • exact_cache.provider: memory or redis
  • exact_cache.redis_url, exact_cache.redis_db: Redis config

Layer 1b: Semantic Cache

  • semantic_cache.provider: candle or tei
  • semantic_cache.remote_url: TEI endpoint
  • Requests that carry x-isartor-session-id, x-thread-id, x-session-id, or x-conversation-id are isolated into a session-aware cache scope. The same scope can also be provided in request bodies via session_id, thread_id, conversation_id, or metadata.*. If no session identifier is present, Isartor keeps the legacy global-cache behavior.
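
A sketch of a session-scoped request against the OpenAI-compatible surface (endpoint, model, and session value are illustrative; any of the headers above works):

curl -s http://localhost:8080/v1/chat/completions \
  -H "content-type: application/json" \
  -H "x-isartor-session-id: session-42" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hello"}]}'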

Layer 2: SLM Router

  • slm_router.provider: embedded or vllm
  • slm_router.remote_url, slm_router.model, slm_router.model_path: Router config
  • slm_router.classifier_mode: tiered (default — TEMPLATE/SNIPPET/COMPLEX) or binary (legacy SIMPLE/COMPLEX)
  • slm_router.max_answer_tokens: Max tokens the SLM may generate for a local answer (default 2048)
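
The same settings can be pinned through environment variables, for example to force the legacy binary classifier with a smaller answer budget (illustrative values):

export ISARTOR__LAYER2__CLASSIFIER_MODE=binary
export ISARTOR__LAYER2__MAX_ANSWER_TOKENS=1024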

Layer 0.5: MiniLM classifier routing

  • classifier_routing.enabled: Enables the pre-cache MiniLM routing pass.
  • classifier_routing.artifacts_path: JSON artifact path loaded at startup. The artifact contains four heads: task_type, complexity, persona, and domain.
  • classifier_routing.confidence_threshold: Global minimum for overall_confidence before any rule matches.
  • classifier_routing.fallback_to_existing_routing: Default true. When false, requests fail closed with 503 if the classifier artifact is missing, classification fails, or no rule matches.
  • classifier_routing.rules: Ordered rule list. Each rule may match any subset of task_type, complexity, persona, and domain, and must supply at least one route target: provider and/or model.
  • classifier_routing.matrix: Optional model matrix — a 2D grid mapping complexity × task_type to "provider/model" targets. Matrix entries compile into rules at startup. Explicit rules take priority. Use "local" for cells that should stay on the cache/SLM path. Use "default" in either dimension as a wildcard.

Example:

[classifier_routing]
enabled = true
artifacts_path = "./minilm-routing-artifact.json"
confidence_threshold = 0.60
fallback_to_existing_routing = true

[[classifier_routing.rules]]
name = "codegen-backend-builder"
task_type = "codegen"
complexity = "complex"
persona = "builder"
domain = "backend"
provider = "groq"
model = "llama-3.3-70b-versatile"

Model matrix example

[classifier_routing.matrix.complex]
code_generation = "groq/llama-3.3-70b-versatile"
analysis        = "anthropic/claude-sonnet-4-20250514"
conversation    = "openai/gpt-4o"
default         = "groq/llama-3.3-70b-versatile"

[classifier_routing.matrix.simple]
code_generation = "groq/llama-3.1-8b-instant"
analysis        = "groq/llama-3.1-8b-instant"
default         = "local"

  • Rows = complexity labels, columns = task_type labels.
  • "provider/model" pins both; "provider" alone pins only the provider.
  • "local" = stay on the cache/SLM path (no L3 provider override).
  • "default" in either dimension acts as a wildcard.
  • More-specific cells are tried first: complex/codegen → complex/default → default/default.

Monitor classifier-guided routing with:

  • x-isartor-provider response header
  • isartor stats / isartor stats --by-tool
  • GET /debug/providers
  • opt-in request logs via enable_request_logs

Layer 2.5: Context Optimiser

L2.5 compresses repeated instruction payloads (CLAUDE.md, copilot-instructions.md, skills blocks) before they reach the cloud, reducing input tokens on every L3 call.

  • enable_context_optimizer: Master switch (default true). Set to false to disable L2.5 entirely.
  • context_optimizer_dedup: Enable cross-turn instruction deduplication (default true). When the same instruction block is seen in consecutive turns of the same session, it is replaced with a compact hash reference.
  • context_optimizer_minify: Enable static minification (default true). Strips HTML/XML comments, decorative horizontal rules, consecutive blank lines, and Unicode box-drawing decoration.

The pipeline processes system/instruction messages from OpenAI, Anthropic, and native request formats. See Deflection Stack — L2.5 for architecture details.
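
For example, to keep minification but switch off cross-turn deduplication while debugging (illustrative; all three default to true):

export ISARTOR__ENABLE_CONTEXT_OPTIMIZER=true
export ISARTOR__CONTEXT_OPTIMIZER_DEDUP=false
export ISARTOR__CONTEXT_OPTIMIZER_MINIFY=true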

Request debug logging

Isartor can optionally record request and response payloads to a separate JSONL log for troubleshooting provider or client integrations.

  • enable_request_logs: Default false. Set to true only while debugging.
  • request_log_path: Directory where rotating request log files are written. Default ~/.isartor/request_logs.
  • provider_health_check_interval_secs: Default 300. Controls the dashboard/runtime background provider ping loop. Set to 0 to disable periodic pings.

Important behavior:

  • request logs are separate from isartor.log and OpenTelemetry output
  • sensitive headers such as Authorization, api-key, and x-api-key are redacted automatically
  • bodies are truncated to a bounded size per entry to keep logs manageable
  • isartor logs --requests shows or follows the request log stream
  • dashboard Test actions also update the in-memory provider health badge immediately, while the background ping loop keeps it fresh between real routed requests
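
A typical debugging session, using the defaults from the table above:

export ISARTOR__ENABLE_REQUEST_LOGS=true
export ISARTOR__REQUEST_LOG_PATH=~/.isartor/request_logs
isartor up --detach
isartor logs --requests      # follow the redacted JSONL request stream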

Layer 3: Cloud Fallbacks

  • fallback.openai_api_key, fallback.anthropic_api_key: API keys for external LLMs
  • llm_provider: Select the active provider. All providers are powered by rig-core except copilot, which uses Isartor's native GitHub Copilot adapter:
    • openai (default), azure, anthropic, xai
    • gemini, mistral, groq, cerebras, nebius, siliconflow, fireworks, nvidia, chutes, deepseek
    • cohere, galadriel, hyperbolic, huggingface
    • mira, moonshot, ollama (local, no key), openrouter
    • perplexity, together
    • copilot (GitHub Copilot subscription-backed L3)
  • external_llm_model: Model name for the selected provider (e.g. gpt-4o-mini, gemini-2.0-flash, mistral-small-latest, llama-3.1-8b-instant, deepseek-chat, command-r, sonar, moonshot-v1-128k)
  • Many OpenAI-compatible providers ship with built-in default endpoints now, so set-key, setup, and check work directly for providers such as Cerebras, Nebius, SiliconFlow, Fireworks, NVIDIA, and Chutes.
  • model_aliases: Optional map of friendly names to real model IDs. Alias resolution happens at the HTTP boundary before L1 cache keys are built, so model="fast" and the resolved real model share the same canonical cache behavior.
  • external_llm_api_key: API key for the configured provider (not needed for ollama)
  • provider_keys: Optional array-of-tables or ISARTOR__PROVIDER_KEYS JSON array for multiple credentials on the same provider. Each entry supports key, priority, and optional label.
  • key_rotation_strategy: round_robin (default) or priority
  • key_cooldown_secs: Cooldown window, in seconds, after a key hits 429 / quota-style upstream failures
  • l3_timeout_secs: Shared timeout, in seconds, for all Layer 3 provider HTTP calls
  • fallback_providers: Optional ordered backup chain. Keep the current top-level provider as the primary, then add [[fallback_providers]] entries with provider, model, api_key, provider_keys, key_rotation_strategy, and url. Azure fallbacks can also set azure_deployment_id and azure_api_version.
  • ISARTOR__FALLBACK_PROVIDERS: Environment override for the same chain as a JSON array of provider objects. Example:
export ISARTOR__FALLBACK_PROVIDERS='[
  {"provider":"nvidia","model":"meta/llama-3.1-8b-instruct","api_key":"nvapi-...","url":"https://integrate.api.nvidia.com/v1/chat/completions"},
  {"provider":"openrouter","model":"openai/gpt-4o-mini","api_key":"sk-or-...","url":"https://openrouter.ai/api/v1/chat/completions"}
]'
  • Failover happens only after the current provider exhausts its own retry budget, and only for provider-side errors that are safe to cascade (for example 429, 5xx, timeouts, and quota-style failures). Invalid request / bad request errors do not move to the next provider.
  • Successful Layer 3 responses include an x-isartor-provider header naming the upstream that actually answered.
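
One way to confirm which upstream answered is to inspect the response headers (sketch; endpoint and model are illustrative):

curl -si http://localhost:8080/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"ping"}]}' \
  | grep -i x-isartor-provider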

Provider status / health

Isartor keeps an in-memory status tracker for the configured Layer 3 chain. It is intentionally process-local and resets on restart.

  • GET /debug/providers: Authenticated debug endpoint that returns the active provider plus every configured primary/fallback entry, including model, endpoint, request/error counts, and the last-known success/error timestamps and message.
  • Provider status now also includes masked key-pool entries, their strategy, request counts, rate-limit counts, and cooldown state.
  • isartor providers: CLI view that reads /debug/providers when the gateway is reachable and falls back to local config inspection when it is not.
  • The tracker is updated only by real Layer 3 request outcomes. It does not persist across restarts and does not write to Redis or other storage.
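
Assuming the gateway is reachable locally and any required debug authentication is satisfied, the status can be inspected directly (sketch):

curl -s http://localhost:8080/debug/providers
# or via the CLI, which falls back to local config inspection when the gateway is down:
isartor providers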

Supported inbound API surfaces

Isartor currently accepts four user-facing request formats at the gateway boundary:

  • Native Isartor: POST /api/chat and POST /api/v1/chat
  • OpenAI-compatible: POST /v1/chat/completions
  • Anthropic-compatible: POST /v1/messages
  • Gemini-native: POST /v1beta/models/{model}:generateContent and POST /v1beta/models/{model}:streamGenerateContent

Gemini-native requests use the model embedded in the URL path as the canonical request model when the body omits a model field. Cache entries stay namespaced by API surface, so Gemini JSON responses never collide with native, OpenAI, or Anthropic cache entries even when the underlying prompt text is identical.
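
For example, the same prompt sent through the OpenAI-compatible and Anthropic-compatible surfaces (sketches; models are illustrative):

# OpenAI-compatible
curl -s http://localhost:8080/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hello"}]}'

# Anthropic-compatible
curl -s http://localhost:8080/v1/messages \
  -H "content-type: application/json" \
  -d '{"model":"claude-3-haiku-20240307","max_tokens":256,"messages":[{"role":"user","content":"hello"}]}'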

Model aliases

Use aliases when you want clients to send stable names like fast, smart, or code instead of raw provider model IDs:

[model_aliases]
fast = "gpt-4o-mini"
smart = "gpt-4o"
code = "gpt-4.1"

You can also write them from the CLI:

isartor set-alias --alias fast --model gpt-4o-mini

Aliases currently resolve only to model IDs within the configured provider; they do not switch providers. They are surfaced by GET /v1/models alongside the configured real model IDs.
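
Clients then send the alias as the model name, and Isartor resolves it before the cache lookup (sketch; assumes the [model_aliases] example above):

curl -s http://localhost:8080/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{"model":"fast","messages":[{"role":"user","content":"hello"}]}'
# served as gpt-4o-mini after alias resolution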


TOML Config Example

Generate a scaffold with isartor init, then edit isartor.toml:

[server]
host = "0.0.0.0"
port = 8080

[exact_cache]
provider = "memory"           # "memory" or "redis"
# redis_url = "redis://127.0.0.1:6379"
# redis_db = 0

[semantic_cache]
provider = "candle"           # "candle" or "tei"
# remote_url = "http://localhost:8082"

[slm_router]
provider = "embedded"         # "embedded" or "vllm"
# remote_url = "http://localhost:8000"
# model = "gemma-2-2b-it"

# L2.5 Context Optimiser (all enabled by default)
# enable_context_optimizer = true
# context_optimizer_dedup = true
# context_optimizer_minify = true

[fallback]
# openai_api_key = "sk-..."
# anthropic_api_key = "sk-ant-..."

# llm_provider = "openai"
# external_llm_model = "gpt-4o-mini"
# external_llm_api_key = "sk-..."
# key_rotation_strategy = "round_robin"
# key_cooldown_secs = 60
#
# [[provider_keys]]
# key = "sk-primary"
# priority = 1
# label = "primary"
#
# [[provider_keys]]
# key = "sk-team-shared"
# priority = 2
# label = "team-shared"
#
# [[fallback_providers]]
# provider = "nvidia"
# model = "meta/llama-3.1-8b-instruct"
# api_key = "nvapi-..."
# url = "https://integrate.api.nvidia.com/v1/chat/completions"
#
# [model_aliases]
# fast = "gpt-4o-mini"
# smart = "gpt-4o"

Per-Tier Defaults

Setting | Level 1 (Minimal) | Level 2 (Sidecar) | Level 3 (Enterprise)
Cache backend | memory | memory | redis
Semantic backend | candle | candle | tei (optional)
SLM router | embedded | embedded or sidecar | vllm
LLM provider | openai | openai | any
Monitoring | false | true | true

Provider-Specific Configuration

Each provider requires ISARTOR__EXTERNAL_LLM_API_KEY (except Ollama) and a matching ISARTOR__LLM_PROVIDER value:

# OpenAI (default)
export ISARTOR__LLM_PROVIDER=openai
export ISARTOR__EXTERNAL_LLM_MODEL=gpt-4o-mini

# Azure OpenAI
export ISARTOR__LLM_PROVIDER=azure

# Anthropic
export ISARTOR__LLM_PROVIDER=anthropic
export ISARTOR__EXTERNAL_LLM_MODEL=claude-3-haiku-20240307

# xAI (Grok)
export ISARTOR__LLM_PROVIDER=xai

# Google Gemini
export ISARTOR__LLM_PROVIDER=gemini
export ISARTOR__EXTERNAL_LLM_MODEL=gemini-2.0-flash

# Ollama (local — no API key required)
export ISARTOR__LLM_PROVIDER=ollama
export ISARTOR__EXTERNAL_LLM_MODEL=llama3

# GitHub Copilot (configured automatically by `isartor connect claude-copilot`)
export ISARTOR__LLM_PROVIDER=copilot
export ISARTOR__EXTERNAL_LLM_MODEL=claude-sonnet-4.5

Setting API Keys with the CLI

Use isartor set-key for interactive key management:

isartor set-key --provider openai
isartor set-key --provider anthropic
isartor set-key --provider xai

This writes the key to isartor.toml or the appropriate env file.


CLI Commands

Command | Description
isartor up | Start the API gateway only (recommended default). Flag: --detach to run in background
isartor up <copilot|claude|antigravity> | Start the gateway plus the CONNECT proxy for that client
isartor init | Generate a commented isartor.toml config scaffold
isartor demo | Run the post-install showcase (cache-only, or live + cache when a provider is configured)
isartor check | Audit outbound connections
isartor connect <client> | Configure AI clients to route through Isartor
isartor connect copilot | Configure Copilot CLI with CONNECT proxy + TLS MITM
isartor connect claude-copilot | Configure Claude Code to use GitHub Copilot through Isartor
isartor stats | Show total prompts, counts by layer, and recent prompt routing history
isartor set-key --provider <name> | Set an LLM provider API key (writes to isartor.toml or the env file)
isartor set-alias --alias <name> --model <id> | Set a request-time model alias in isartor.toml
isartor providers | Show the active provider config plus last-known in-memory Layer 3 health
isartor logs --requests | Show or follow the separate request/response debug log
isartor stop | Stop a running Isartor instance (uses PID file). Flags: --force (SIGKILL), --pid-file <path>
isartor update | Self-update to the latest (or a specific) version. Flags: --version <tag>, --dry-run, --force

See also: Architecture · Metrics & Tracing · Troubleshooting

Usage analytics

Isartor can persist provider/model usage events to ~/.isartor/usage.jsonl and expose aggregated summaries through GET /debug/usage and isartor stats --usage. Configure usage_log_path, usage_retention_days, usage_window_hours, and usage_pricing.<provider>.{input_cost_per_million_usd,output_cost_per_million_usd} via ISARTOR__... environment variables or isartor.toml.
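
A minimal pricing block in isartor.toml (the rates below are placeholders, not real prices):

[usage_pricing.openai]
input_cost_per_million_usd = 0.15
output_cost_per_million_usd = 0.60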

Provider quotas

Quota policies reuse the same persisted usage tracker instead of introducing a second accounting store. Define a policy per provider with [quota.<provider>], then set any mix of token and cost ceilings:

  • daily_token_limit, weekly_token_limit, monthly_token_limit
  • daily_cost_limit_usd, weekly_cost_limit_usd, monthly_cost_limit_usd
  • warning_threshold_ratio (default 0.8)
  • action_on_limit = warn, block, or fallback

Behavior notes:

  • warnings are emitted when projected usage for the in-flight request crosses the configured threshold
  • block returns HTTP 429
  • fallback skips the current provider and continues down the existing Layer 3 provider chain
  • quota windows reset on UTC boundaries: daily at midnight, weekly on Monday 00:00, monthly on the first day of the month
  • isartor check prints the current quota window status for each configured provider target

Example:

[quota.openai]
daily_token_limit = 500000
monthly_cost_limit_usd = 25.0
warning_threshold_ratio = 0.8
action_on_limit = "fallback"

[quota.anthropic]
daily_cost_limit_usd = 10.0
action_on_limit = "block"
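
The same policies can be set through environment variables using the ISARTOR__QUOTA__<PROVIDER>__* pattern from the master table (sketch, illustrative values):

export ISARTOR__QUOTA__OPENAI__DAILY_TOKEN_LIMIT=500000
export ISARTOR__QUOTA__OPENAI__ACTION_ON_LIMIT=fallback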