Air-Gapped / Offline Deployment
Overview
Isartor is designed to be air-gap friendly. Its pure-Rust, statically compiled binary embeds all inference models at build time, requires no runtime dependencies, and validates licenses with an offline HMAC check, so Isartor itself does not initiate unsolicited telemetry or license calls to external services.
The zero-phone-home guarantee applies to Isartor-managed network paths: the
--offline flag disables L3 cloud routing and external observability backends
at the application layer, and our CI phone-home audit test (see
tests/phone_home_audit.rs) exercises these code paths on every commit.
Supported regulated industries: defense, healthcare (HIPAA), finance (SOX), and government (FedRAMP).
Pre-Deployment Checklist
Complete these steps before deploying Isartor in an air-gapped environment:
-
Download the air-gapped Docker image
docker pull ghcr.io/isartor-ai/isartor:latest-airgapped
This image bundles a local copy of the L1b embedding model, so the gateway does not need to download it on first run. See Image Size Comparison for size details, and follow any additional configuration steps your environment requires to operate fully offline.
-
Transfer to your air-gapped environment via your organisation's approved media transfer process (USB, air-gap data diode, etc.).
-
Enable offline mode
export ISARTOR__OFFLINE_MODE=true
Alternatively, pass --offline on the command line:
isartor --offline
-
Disable L3 or point it at an internal LLM endpoint
- For strictly air-gapped / zero-egress deployments, you must enable offline mode (step 3). Leaving ISARTOR__EXTERNAL_LLM_API_KEY unset alone does not prevent the gateway from attempting outbound L3 calls to the default external endpoint on cache misses.
- To run fully local (cache + SLM only) with no outbound attempts, enable offline mode and leave ISARTOR__EXTERNAL_LLM_API_KEY unset.
- To route L3 to a self-hosted model, see Connecting to an Internal LLM.
-
Run isartor check to confirm zero external connections:
isartor check
Expected output (with offline mode active):
Isartor Connectivity Audit
──────────────────────────
Required (L3 cloud routing):
  → api.openai.com:443 [NOT CONFIGURED] (BLOCKED — offline mode active)
Optional (observability / monitoring):
  → http://localhost:4317 [NOT CONFIGURED]
Internal only (no external):
  → (in-memory cache — no network connection) [CONFIGURED - internal]
Zero hidden telemetry connections: ✓ VERIFIED
Air-gap compatible: ✓ YES (L3 disabled or offline mode active)
-
Run isartor audit verify (planned, see issue #3) to confirm the signed audit log is functioning correctly.
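The download and transfer steps in the checklist above can be sketched as a pair of shell helpers. This is only an illustration: the tarball name, the checksum side-file, and the function names are assumptions and not an Isartor convention. It relies on the standard `docker save` / `docker load` commands and on `sha256sum` from GNU coreutils, and your approved media transfer process may impose its own integrity checks.

```shell
#!/bin/sh
# Sketch only: bundle naming and layout are hypothetical.
IMAGE="ghcr.io/isartor-ai/isartor:latest-airgapped"

# Write <bundle>.sha256 next to the bundle (relative name, so it can be
# verified from any directory the bundle is copied to).
write_checksum() {
  ( cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# Connected machine: export the already-pulled image and checksum it.
# $1 = output tarball path, e.g. ./isartor-airgapped.tar (hypothetical name)
export_image() {
  docker save -o "$1" "$IMAGE" && write_checksum "$1"
}

# Air-gapped machine: succeed only if the bundle matches its checksum file.
verify_bundle() {
  ( cd "$(dirname "$1")" && sha256sum -c "$(basename "$1").sha256" >/dev/null 2>&1 )
}

# Air-gapped machine: load the image only after the checksum verifies.
import_image() {
  verify_bundle "$1" && docker load -i "$1"
}
```

On the connected host you would call `export_image ./isartor-airgapped.tar`, move the tarball and its `.sha256` file across, and then call `import_image ./isartor-airgapped.tar` inside the perimeter. A checksum only detects corruption or tampering in transit; it does not replace image signature verification if your organisation requires it.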
Connecting to an Internal LLM
In this configuration Isartor acts as a fully air-gapped deflection layer in front of an internal LLM. 100% of traffic stays inside the perimeter: L1a and L1b handle cached / semantically similar prompts locally, and only genuine cache misses are forwarded to your self-hosted model over the internal network.
# Route L3 to a self-hosted vLLM instance on the internal network.
export ISARTOR__EXTERNAL_LLM_URL=http://vllm.internal.corp:8000/v1
export ISARTOR__LLM_PROVIDER=openai # vLLM exposes an OpenAI-compat API
export ISARTOR__EXTERNAL_LLM_MODEL=meta-llama/Llama-3-8B-Instruct
# Enable offline mode to block any accidental external connections.
export ISARTOR__OFFLINE_MODE=true
# Start the gateway.
isartor
Note: ISARTOR__EXTERNAL_LLM_URL sets the L3 endpoint URL. Point it at your internal vLLM or TGI server.
With this configuration:
- L1a (exact cache) deflects duplicate prompts instantly (< 1 ms).
- L1b (semantic cache) deflects semantically similar prompts (1–5 ms).
- L3 forwards surviving cache-miss prompts to your internal vLLM.
- Zero bytes leave the network perimeter.
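Before starting the gateway with this configuration, it can be useful to check that the variables really describe an internal-only setup. The following is a minimal pre-flight sketch, not part of Isartor: the function names and the `INTERNAL_SUFFIX` variable are hypothetical, and the suffix `.internal.corp` is simply taken from the example hostname above. The host parsing is deliberately simple and does not handle IPv6 literals.

```shell
#!/bin/sh
# Hypothetical: DNS suffix that identifies hosts inside your perimeter.
INTERNAL_SUFFIX="${INTERNAL_SUFFIX:-.internal.corp}"

# Print the host part of a URL such as http://vllm.internal.corp:8000/v1.
url_host() {
  rest="${1#*://}"              # drop the scheme
  rest="${rest%%/*}"            # drop the path
  rest="${rest##*@}"            # drop any userinfo
  printf '%s\n' "${rest%%:*}"   # drop the port
}

# Succeed if the host is loopback or ends with the internal suffix.
is_internal_host() {
  case "$1" in
    localhost|127.0.0.1) return 0 ;;
    *"$INTERNAL_SUFFIX") return 0 ;;
    *) return 1 ;;
  esac
}

# Succeed only if offline mode is on and the L3 URL points at an internal host.
preflight() {
  [ "${ISARTOR__OFFLINE_MODE:-false}" = "true" ] ||
    { echo "ISARTOR__OFFLINE_MODE is not true" >&2; return 1; }
  [ -n "${ISARTOR__EXTERNAL_LLM_URL:-}" ] ||
    { echo "ISARTOR__EXTERNAL_LLM_URL is not set" >&2; return 1; }
  is_internal_host "$(url_host "$ISARTOR__EXTERNAL_LLM_URL")" ||
    { echo "L3 endpoint is not an internal host" >&2; return 1; }
}
```

A start script could then run `preflight && isartor`. This check is an operator-side guard on configuration only; it does not replace network-level egress controls.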
Startup Status Banner
When offline mode is active, Isartor prints a status banner at startup so operators can confirm the configuration at a glance:
┌──────────────────────────────────────────────────────┐
│ [Isartor] OFFLINE MODE ACTIVE │
├──────────────────────────────────────────────────────┤
│ ✓ L1a Exact Cache: active │
│ ✓ L1b Semantic Cache: active │
│ - L2 SLM Router: disabled (ENABLE_SLM_ROUTER=false)│
│ ✗ L3 Cloud Logic: DISABLED (offline mode) │
│ ✗ Telemetry export: DISABLED if external endpoint │
│ ✓ License validation: offline HMAC check │
└──────────────────────────────────────────────────────┘
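The banner can also be checked automatically, for example as a deployment gate. The helper below is a sketch under two assumptions that this document does not confirm: that the banner is written to the process's standard output or standard error, and that the heading text `OFFLINE MODE ACTIVE` appears verbatim as shown above. The function name is hypothetical.

```shell
#!/bin/sh
# Succeed if the text on stdin contains the offline-mode banner heading.
offline_banner_present() {
  grep -q 'OFFLINE MODE ACTIVE'
}

# Example use against captured startup logs (the file name is hypothetical):
#   offline_banner_present < isartor-startup.log || echo "offline mode NOT active" >&2
# Or against a running container (the container name is hypothetical):
#   docker logs isartor 2>&1 | offline_banner_present
```

Matching on the heading alone keeps the check robust to column padding inside the box; it confirms the mode, not the state of each individual layer.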
Environment Variables Reference
| Variable | Default | Description |
|---|---|---|
| ISARTOR__OFFLINE_MODE | false | Enable air-gap mode. Blocks L3 cloud calls. |
| ISARTOR__EXTERNAL_LLM_URL | — | Internal LLM endpoint (vLLM, TGI, etc.). |
| ISARTOR__EXTERNAL_LLM_MODEL | gpt-4o-mini | Model name passed to the internal LLM. |
| ISARTOR__SIMILARITY_THRESHOLD | 0.85 | Cosine similarity threshold for L1b cache hits. Lower values increase local deflection. |
| ISARTOR__OTEL_EXPORTER_ENDPOINT | http://localhost:4317 | OTel collector endpoint. External URLs are suppressed in offline mode. |
For the complete variable listing, see the Configuration Reference.
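To make the effect of ISARTOR__SIMILARITY_THRESHOLD concrete, the sketch below computes the cosine similarity of two small vectors and compares it with a threshold. It is an illustration of the metric only, not Isartor's implementation: the function names are hypothetical, real embeddings have hundreds of dimensions, and whether the gateway treats a score exactly equal to the threshold as a hit is an assumption here.

```shell
#!/bin/sh
# Cosine similarity of two space-separated vectors, printed to 4 decimals.
cosine_similarity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); m = split(b, y, " ")
    if (n == 0 || n != m) exit 1          # vectors must have equal, non-zero length
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]; na += x[i] * x[i]; nb += y[i] * y[i]
    }
    if (na == 0 || nb == 0) exit 1        # zero vectors have no direction
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

# Succeed when the score meets the threshold ($2, default 0.85).
# Assumption: a score equal to the threshold counts as a hit.
is_cache_hit() {
  awk -v s="$1" -v t="${2:-0.85}" 'BEGIN { exit !(s + 0 >= t + 0) }'
}
```

For example, `cosine_similarity "1 1" "1 0"` prints `0.7071`, which is a miss at the default threshold of 0.85 but a hit at 0.70. This is the trade-off behind the table entry: lowering the threshold deflects more prompts locally, at the cost of serving cached answers to prompts that are less alike.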
Image Size Comparison
| Image | Tag | Includes models | Compressed size |
|---|---|---|---|
| Base | latest | No (downloads on first run) | ~120 MB |
| Air-gapped | latest-airgapped | Yes (all-MiniLM-L6-v2 embedded) | ~210 MB |
The latest-airgapped image is approximately 90 MB larger due to the
pre-bundled embedding model. This is the recommended image for any environment
with restricted outbound internet access.
Compliance Notes
FedRAMP / NIST 800-53
This deployment posture supports the following NIST 800-53 controls:
| Control | Description | How Isartor Supports It |
|---|---|---|
| AU-2 | Audit Events | Every prompt, deflection decision, and L3 call is logged as a structured JSON event with tracing spans. |
| SC-7 | Boundary Protection | ISARTOR__OFFLINE_MODE=true blocks L3 cloud routing and external observability endpoints at the application layer. The phone-home audit CI test (tests/phone_home_audit.rs) exercises these code paths on every commit. |
| SI-4 | Information System Monitoring | OpenTelemetry traces + Prometheus metrics provide real-time visibility into the deflection stack. Internal-only OTel endpoints are supported. |
| CM-6 | Configuration Settings | All settings are controlled via environment variables with documented defaults. No runtime code changes are needed. |
HIPAA
When ISARTOR__OFFLINE_MODE=true and L3 is pointed at an internal model:
- PHI in prompts never leaves the network perimeter.
- The L1b semantic cache computes embeddings in-process using a pure-Rust candle model, with no external API calls.
- Audit logs are written to stdout for ingestion by your internal SIEM.
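Because audit events share stdout with other output such as the startup banner, a collector may want to separate the two before forwarding. The filter below is a rough sketch that assumes each audit event is a single-line JSON object; this document says events are structured JSON but does not specify the framing, so treat that as an assumption. The function name and file path are hypothetical.

```shell
#!/bin/sh
# Pass through only lines that look like a single-line JSON object.
# Assumption: one audit event per line; everything else is dropped.
json_events_only() {
  grep '^[[:space:]]*{.*}[[:space:]]*$' || true   # no matches is not an error
}

# Example (path is hypothetical):
#   isartor 2>&1 | json_events_only >> /var/log/isartor/audit.jsonl
```

This is a shape check only and does not validate the JSON. Most SIEM agents can do equivalent filtering themselves, in which case forwarding raw stdout is simpler.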
Disclaimer
This document describes deployment architecture. The controls described above are architectural claims based on code behaviour — they are not a formal compliance certification. Consult your compliance team and engage a qualified assessor for formal FedRAMP authorization or HIPAA compliance review.