Deployment & Scaling
Octoprox runs as a single instance for development or as a horizontally-scaled cluster fronted by an L4 load balancer. The same binary serves both — you only change the topology.
Two Deployment Shapes
Single instance (default)
One Octoprox process, one Postgres, one Redis. Fine for development, demos, and small production workloads (one host, tens of thousands of proxies, a few hundred concurrent tunnels).
client ──┐
         ▼
┌────────────────────┐         ┌──────────┐
│ Octoprox           │ ──────▶ │ Postgres │
│ :8000 API + UI     │         │ Redis    │
│ :8080 proxy port   │         └──────────┘
└────────────────────┘
         │
         ▼
upstream proxy pool
Compose files: docker-compose.yml (build from source) or docker-compose.ghcr.yml (pre-built image).
Multi-instance cluster (HA + horizontal scaling)
Multiple identical Octoprox processes behind an L4 load balancer (HAProxy in the bundled compose; any TCP/HTTP-capable LB works). All instances share one Postgres and one Redis.
client ──▶ ┌────────────┐     ┌── Octoprox-1 ──┐
           │  HAProxy   │     │  :8000 :8080   │ ──┐
           │  :8000     │ ───▶├── Octoprox-2 ──┤   │
           │  :8080     │     │  :8000 :8080   │ ──┼──▶ Postgres
           │  :8404 UI  │     ├── Octoprox-3 ──┤   │    Redis
           └────────────┘     │  :8000 :8080   │ ──┘
                              └────────────────┘
                                      │
                                      ▼
                              upstream proxy pool
Compose files: docker-compose.cluster.yml (build from source) or docker-compose.cluster.ghcr.yml (pre-built image). HAProxy config: haproxy/haproxy.cfg.
# Local-build cluster (Makefile targets, fast iteration)
make cluster-up
make cluster-logs
make cluster-down
# Production-ready cluster (pre-built GHCR image) — invoke docker compose directly
docker compose -f docker-compose.cluster.ghcr.yml up -d
docker compose -f docker-compose.cluster.ghcr.yml logs -f
docker compose -f docker-compose.cluster.ghcr.yml down
What you gain by running N instances
- High availability. Any instance can die — clients keep being served by the others. Background workers (metrics flusher, autoscaler, etc.) fail over to a surviving instance within ~5 seconds.
- Request throughput. Tunnel termination, TLS/MITM relay, credential resolution, and routing decisions all run per-request on the receiving instance — N instances ≈ N× concurrent connections handled in parallel.
- Aggregate bandwidth + file descriptors. Each host contributes its own NIC and ulimit.
- Sharded health-check capacity. Each proxy is checked by exactly one instance at a time (rendezvous-hashed by proxy_id across the live membership). Adding instances divides the workload.
- Zero-downtime deploys. Rolling-restart one instance at a time.
What does not scale by adding instances
- In-memory cache. Every instance still holds the full projects/credentials/connectors/proxies cache in RAM. 5 instances = 5× the same memory footprint. (See the TODO-control-data-plane-split.md plan for the future tier-split that fixes this.)
- Postgres write throughput. Definitions and historical metrics still live in one DB; the metrics flusher is leader-elected so only one instance writes at a time.
- Redis throughput. All instances hit the same Redis for sticky sessions, rate-limit windows, quarantine state, metrics counters, heartbeats, and leases.
- Cloud-provider API quotas. Per-connector lease means only one instance calls AWS/GCP/Azure for a given connector at a time — intentional, so you don’t get throttled by the cloud.
Rule of thumb: a cluster scales concurrent request handling and gives you HA. To scale beyond what a single Redis or a single host’s memory can take, see TODO-control-data-plane-split.md for the planned tier split.
How it works under the hood
A few small mechanisms keep N instances in sync without a coordinator.
Heartbeat (instance discovery)
Every instance writes a Redis key instance_registry:<instance_id> with a
10-second TTL, refreshed every 5 seconds. On graceful shutdown it deletes
its key; on hard kill, Redis expires it. The set of live keys is the
membership snapshot used by everything below.
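
The heartbeat scheme above can be sketched without a live Redis. The snippet below is a minimal in-memory illustration, not Octoprox's code: FakeTTLStore, heartbeat, and live_members are hypothetical names, and the store only imitates Redis key expiry. The 10-second TTL and 5-second refresh interval are the values stated above.

```python
import time
from typing import Callable, Dict, List

HEARTBEAT_TTL_S = 10.0      # key TTL, from the text above
HEARTBEAT_INTERVAL_S = 5.0  # how often a live instance would call heartbeat()


class FakeTTLStore:
    """Hypothetical in-memory stand-in for Redis keys with expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def set_with_ttl(self, key: str, ttl_s: float) -> None:
        # Writing an existing key simply pushes its expiry forward.
        self._expires_at[key] = self._clock() + ttl_s

    def delete(self, key: str) -> None:
        self._expires_at.pop(key, None)

    def live_keys(self, prefix: str) -> List[str]:
        now = self._clock()
        return sorted(
            key for key, expires in self._expires_at.items()
            if key.startswith(prefix) and expires > now
        )


def heartbeat(store: FakeTTLStore, instance_id: str) -> None:
    """Refresh this instance's registry key."""
    store.set_with_ttl(f"instance_registry:{instance_id}", HEARTBEAT_TTL_S)


def live_members(store: FakeTTLStore) -> List[str]:
    """Membership snapshot: IDs of every instance whose key has not expired."""
    prefix = "instance_registry:"
    return [key[len(prefix):] for key in store.live_keys(prefix)]
```

With a 10-second TTL and a 5-second refresh, an instance has to miss two consecutive refreshes before it drops out of the snapshot, so a single delayed heartbeat does not cause a spurious membership change.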
Cross-instance event bus
Mutations on one instance reach the others over Redis Pub/Sub on the
octoprox:events channel. Messages carry only
(signal_name, instance_id, entity_id, op); receivers re-read the entity
from Postgres or Redis and update their cache. The publishing instance
ignores its own echo, so an event is never re-processed by its sender.
Cross-instance signals: project_changed, credential_changed,
connector_changed, proxy_changed, proxy_quarantine_changed.
A full reload from Postgres runs every 60 seconds in the background as a safety net for any messages dropped by Redis Pub/Sub (which is fire-and-forget).
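
To make the message flow concrete, here is a small sketch of the receive path. The four fields are the ones listed above; everything else is an assumption for illustration: the JSON wire encoding, the Event class, and the encode, handle_message, and reload_entity names are hypothetical and may not match what Octoprox actually does.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    signal_name: str  # e.g. "proxy_changed"
    instance_id: str  # publisher's instance ID
    entity_id: str    # which row changed
    op: str           # operation label; the values are not specified in the text


def encode(event: Event) -> str:
    """Serialize an event for the channel (JSON is an assumed encoding)."""
    return json.dumps(asdict(event))


def handle_message(
    raw: str,
    my_instance_id: str,
    reload_entity: Callable[[str, str], None],
) -> bool:
    """Process one Pub/Sub payload; return True if the local cache was refreshed."""
    event = Event(**json.loads(raw))
    if event.instance_id == my_instance_id:
        # Our own echo: the publisher has already applied this change locally.
        return False
    # The payload carries no entity data, so re-read it from the source of truth.
    reload_entity(event.signal_name, event.entity_id)
    return True
```

Because the payload is only a pointer to the changed entity, a dropped or duplicated message can at worst leave a cache stale until the next reload; it cannot write wrong data into it.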
Leader election for singleton workers
Some background work must run on exactly one instance at a time. Octoprox
uses Redis leases (SET NX PX with refresh + owner-checked release) for
this. If the leader dies, its lease expires within ~5 seconds and a
standby takes over on its next poll.
| Worker | Scope | Why leader-elected |
|---|---|---|
| Metrics flusher | Global | Two writers would double-count Postgres rows |
| Metrics compactor | Global | Compaction races on the same source rows |
| Autoscaler | Per-connector | Two scalers would double-provision cloud VMs |
| Provider syncer | Per-connector | Two syncers would call provider APIs twice |
Per-connector leases mean different connectors can be served by different instances in parallel — only the same connector is single-writer.
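
The lease lifecycle (acquire, refresh, owner-checked release) can be sketched in memory as follows. This is an illustration of the semantics only: FakeLeaseStore and its method names are hypothetical, and the 5-second TTL in the usage note is an assumption. Against a real Redis, acquire corresponds to SET key owner NX PX ttl, and the owner check plus write in refresh and release is normally wrapped in a Lua script so that it stays atomic.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FakeLeaseStore:
    """Hypothetical in-memory model of Redis-style leases."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._leases: Dict[str, Tuple[str, float]] = {}  # key -> (owner, expires_at)

    def holder(self, key: str) -> Optional[str]:
        """Current owner, or None if the lease is free or has expired."""
        entry = self._leases.get(key)
        if entry is None or entry[1] <= self._clock():
            self._leases.pop(key, None)
            return None
        return entry[0]

    def acquire(self, key: str, owner: str, ttl_s: float) -> bool:
        """Take the lease only if nobody holds it (the NX part)."""
        if self.holder(key) is not None:
            return False
        self._leases[key] = (owner, self._clock() + ttl_s)
        return True

    def refresh(self, key: str, owner: str, ttl_s: float) -> bool:
        """Extend the lease, but only if `owner` still holds it."""
        if self.holder(key) != owner:
            return False
        self._leases[key] = (owner, self._clock() + ttl_s)
        return True

    def release(self, key: str, owner: str) -> bool:
        """Drop the lease, but only if `owner` still holds it."""
        if self.holder(key) != owner:
            return False
        del self._leases[key]
        return True
```

A worker loop would call acquire("lease:metrics_flusher", my_id, 5.0) on each poll, do its work only while it holds the lease, and call refresh well before the TTL runs out. The owner check on release matters: without it, an instance whose lease already expired could delete the lease of the instance that took over.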
Sharded health checks
Health-checking is the highest-volume background task. Each proxy is
assigned to exactly one instance at a time via rendezvous hashing (HRW)
over the live instance_registry:* membership. When an instance joins
or leaves, only ~1/N of proxies move owner; the rest stay put.
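
Rendezvous hashing is short enough to show in full. The sketch below is a generic HRW implementation, not Octoprox's own: the owner_of name, the SHA-256 scoring, and the "instance|proxy" key format are assumptions chosen for illustration.

```python
import hashlib
from typing import Iterable, Optional


def _score(instance_id: str, proxy_id: str) -> int:
    """Deterministic pseudo-random weight for an (instance, proxy) pair."""
    digest = hashlib.sha256(f"{instance_id}|{proxy_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner_of(proxy_id: str, live_instances: Iterable[str]) -> Optional[str]:
    """Return the instance with the highest score for this proxy.

    Every instance evaluates this over the same membership snapshot and
    reaches the same answer, so no coordinator is needed.
    """
    return max(
        live_instances,
        key=lambda instance_id: _score(instance_id, proxy_id),
        default=None,
    )
```

Each instance would health-check only the proxies for which owner_of(proxy_id, members) returns its own ID. When an instance leaves, the scores of the remaining instances are unchanged, so only the proxies it owned get a new owner; that is where the ~1/N figure comes from.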
Per-request side effects
Sticky-session bindings live in Redis (key
sticky:<project_id>:<session_id>) and are read-through on every
selection, so a session opened on one instance keeps its upstream proxy
even if the next request lands on a different one. The rate-limiter
sliding window lives in a Redis sorted set written atomically via a Lua
script, so N instances see one combined request rate per proxy.
Quarantine is a TTL’d Redis key, with a proxy_quarantine_changed Pub/Sub
event so peers refresh their local quarantine cache the moment one
instance trips a limit.
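
The sliding-window check can be sketched as plain logic. The class below is an in-process illustration only: SlidingWindowLimiter and its parameters are hypothetical, and a sorted Python list stands in for the Redis sorted set. In the cluster, the same three steps (trim, count, add) would run inside the Lua script so that all instances share one window.

```python
import bisect
from typing import Dict, List


class SlidingWindowLimiter:
    """Hypothetical single-process model of a per-proxy sliding window."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self._hits: Dict[str, List[float]] = {}  # proxy_id -> sorted timestamps

    def allow(self, proxy_id: str, now: float) -> bool:
        hits = self._hits.setdefault(proxy_id, [])
        # 1. Trim: drop timestamps that have aged out of the window.
        cutoff = now - self.window_s
        del hits[:bisect.bisect_right(hits, cutoff)]
        # 2. Count: refuse if the window is already full.
        if len(hits) >= self.limit:
            return False
        # 3. Add: record this request.
        bisect.insort(hits, now)
        return True
```

The three steps have to execute as one unit. If two instances could interleave between the count and the add, both could admit a request that pushes the proxy over its limit, which is why a script executed atomically on the Redis side is the natural fit.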
Operating the cluster
Endpoints
| Port | Served by | What |
|---|---|---|
| 8000 | HAProxy | API + Web UI (HTTP, round-robin across replicas) |
| 8080 | HAProxy | Proxy traffic (TCP, least-conn across replicas) |
| 8404 | HAProxy | HAProxy stats UI |
Both 8000 and 8080 use the same HTTP /health probe (on port 8000) to
decide whether a backend is healthy. A container with a crashed API server
gets pulled out of the proxy-traffic backend automatically.
Inspecting cluster state
# Live membership (--scan iterates incrementally; KEYS blocks Redis on a large keyspace)
docker compose -f docker-compose.cluster.yml exec redis \
redis-cli --scan --pattern 'instance_registry:*'
# Active leases (and who holds them)
docker compose -f docker-compose.cluster.yml exec -T redis \
redis-cli --scan --pattern 'lease:*' | while read -r k; do
# </dev/null stops the inner exec from swallowing the remaining keys on stdin
echo "$k held by $(docker compose -f docker-compose.cluster.yml exec -T redis redis-cli GET "$k" </dev/null)"
done
# HAProxy backend status (CSV)
curl -s 'http://localhost:8404/;csv' | awk -F, '/^api_http|^proxy_tcp/{print $1"/"$2,$18}'
Failover smoke test
# Note who's holding the metrics flusher lease
docker compose -f docker-compose.cluster.yml exec redis \
redis-cli GET lease:metrics_flusher
# Kill that instance
docker stop octoprox-1 # (or whichever holds it)
# Within ~5s a different instance owns the lease
docker compose -f docker-compose.cluster.yml exec redis \
redis-cli GET lease:metrics_flusher
Production checklist
Before pointing real traffic at the cluster, edit either compose file:
- Set OCTOPROX_AUTH_PASSWORD to something strong (not admin).
- Set OCTOPROX_JWT_SECRET to a long random string.
- Set OCTOPROX_DB_PASSWORD (and the matching POSTGRES_PASSWORD).
- Decide whether to expose Postgres (5433) and Redis (6379) on the host. For an internet-facing host you almost certainly want to remove those port mappings and keep them on the internal Docker network only.
- Mount the MITM CA from durable storage (or a Secret manager) so all instances trust the same root and CA rotations propagate cleanly.
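
One way to produce the random values the checklist asks for is Python's standard secrets module. This is a convenience sketch, not part of Octoprox: random_secret is a hypothetical helper, and the 48-byte length is an arbitrary choice.

```python
import secrets


def random_secret(num_bytes: int = 48) -> str:
    """Return a URL-safe random string built from `num_bytes` of OS entropy."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print one value per variable named in the checklist above.
    for name in (
        "OCTOPROX_AUTH_PASSWORD",
        "OCTOPROX_JWT_SECRET",
        "OCTOPROX_DB_PASSWORD",
    ):
        print(f"{name}={random_secret()}")
```

Remember that POSTGRES_PASSWORD must be set to the same value as OCTOPROX_DB_PASSWORD, and that every instance in the cluster needs the same JWT secret so a token issued by one replica is accepted by the others.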
When to outgrow this
A fleet of identical instances sharing one Redis + Postgres scales request handling and gives you HA. It hits a ceiling when one of the shared resources saturates — typically Redis throughput at a few tens of thousands of proxied requests per second, or per-host memory when the proxy pool grows past ~100k entries.
The next step is a tiered topology that splits the control plane (CRUD, config) from the data plane (request termination) and replaces the “every instance loads everything” cache with a snapshot pushed from the control plane. The design is documented in TODO-control-data-plane-split.md in the repo — read that when you genuinely need it, not before.