
Data Flow and Encryption

Where sensitive data lands during a Query lifecycle and what protection each surface has by default.

Ark runs inside your Kubernetes cluster. Encryption at rest and many in-transit protections depend on your cluster and cloud configuration, not on Ark defaults. Review every row in the tables below against your deployment before requesting InfoSec sign-off.

Query lifecycle data surfaces

A single Query execution touches multiple storage and communication surfaces. The diagram below shows the path from submission to response; the tables that follow describe each surface.

                 ┌────────────────────┐
                 │ Client / Dashboard │
                 └─────────┬──────────┘
                           │ HTTPS
                 ┌─────────▼──────────┐
                 │      ark-api       │
                 └─────────┬──────────┘
        ┌──────────────────┼────────────────────┐
        │                  │                    │
┌───────▼────────┐  ┌──────▼───────┐  ┌─────────▼────────┐
│ Query CR       │  │ Broker       │  │ OTel collector   │
│ etcd / PG      │  │ mem store    │  │                  │
└───────┬────────┘  └──────────────┘  └──────────────────┘
┌───────▼─────────┐
│ Controller      │
└───────┬─────────┘
        │ A2A HTTP
┌───────▼─────────┐
│ Executor        │──── LLM Provider ── HTTPS
│ (built-in or    │──── MCP Servers ── HTTP(S)
│ external)       │──── Session files
└─────────────────┘

Encryption at rest

| Surface | What lands there | Encrypted by default? | How to enable |
| --- | --- | --- | --- |
| etcd (default storage) | All CRs: Query inputs/outputs, Agent prompts, Model secret refs, Tool configs | No. etcd stores data unencrypted unless the cluster enables EncryptionConfiguration. | Configure kube-apiserver --encryption-provider-config with an aesgcm or secretbox provider. See K8s docs. |
| PostgreSQL (optional storage) | Same as etcd: full spec and status JSONB columns in the resources table | No. | Enable Transparent Data Encryption (TDE) or use an encrypted volume for the PG data directory. |
| Kubernetes Secrets | API keys, credentials referenced by Models and Tools | No. Secrets are base64-encoded, not encrypted, in etcd. | Same etcd encryption config as above. Consider an external secrets operator (Vault, AWS Secrets Manager, GCP Secret Manager). |
| Broker memory store | Conversation history, streaming chunks, OTel spans, controller events | In-memory by default (ephemeral). If persistence.enabled: true, stored as plaintext JSON at /data/. | Use an encrypted PersistentVolume or mount an encrypted filesystem. |
| Broker messages (Postgres) | Message bodies, when backends.message: postgres is enabled | No. Plaintext in the messages table. | Enable Transparent Data Encryption (TDE) or use an encrypted volume for the PG data directory. |
| Executor session files | Conversation state, tool outputs (Claude Agent SDK: /data/sessions/<conversationId>/) | No. Plaintext on pod filesystem. | Use an encrypted PersistentVolume. Restrict access via Pod Security Standards. |
| OTel collector | Span attributes: session IDs, query/target names, dispatch addresses, error messages, token counts | Depends on your collector and its backend. | Configure your collector’s exporter with encryption (e.g., encrypted Elasticsearch, encrypted S3). |
| Pod logs | Query/target names, error messages, dispatch addresses. Input/output truncated to 48 chars at info level; verbose mode (-v 1) adds session details. | Depends on your log aggregator. | Ensure your log pipeline encrypts at rest. Review verbose logging before enabling in production. |
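
As a minimal sketch of the etcd row above, the following Python generates an EncryptionConfiguration document with an aesgcm provider and a freshly generated 32-byte key. The API group `ark.example.com` is a placeholder, not Ark's real group; substitute the group your installed CRDs use. Wildcard resource names such as `*.<group>` are only accepted by recent Kubernetes releases, so list the resources explicitly if your cluster rejects them.

```python
import base64
import json
import secrets

# ASSUMPTION: placeholder API group for Ark custom resources. Replace it with
# the group reported by `kubectl api-resources` in your cluster.
ARK_API_GROUP = "ark.example.com"


def build_encryption_config(key_name: str = "key1") -> dict:
    """Return an EncryptionConfiguration covering Secrets and Ark CRs."""
    # aesgcm accepts 16-, 24-, or 32-byte keys; generate 32 random bytes.
    key = base64.b64encode(secrets.token_bytes(32)).decode("ascii")
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [
            {
                "resources": ["secrets", f"*.{ARK_API_GROUP}"],
                "providers": [
                    # The first provider is used to encrypt new writes.
                    {"aesgcm": {"keys": [{"name": key_name, "secret": key}]}},
                    # identity keeps data written before encryption readable.
                    {"identity": {}},
                ],
            }
        ],
    }


if __name__ == "__main__":
    # JSON is a subset of YAML, so the output can be saved as the file passed
    # to kube-apiserver via --encryption-provider-config.
    print(json.dumps(build_encryption_config(), indent=2))
```

The generated file contains key material, so treat it like any other secret. Existing objects stay unencrypted until they are rewritten through the API server.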

In-transit encryption

| Hop | Protocol | TLS by default? | Notes |
| --- | --- | --- | --- |
| Client → ark-api | HTTPS | Depends on your Ingress / load balancer. | Configure TLS termination at the Ingress or use cert-manager. |
| ark-api → kube-apiserver | HTTPS | Yes. Kubernetes API server requires TLS. | Managed by the cluster. |
| Controller → executor (A2A) | HTTP | No. Uses Go http.DefaultTransport. | Deploy a service mesh (Istio, Linkerd) for automatic mTLS between pods, or configure the executor behind a TLS-terminating sidecar. |
| Executor → LLM provider | HTTPS | Yes. Ark’s validating webhook rejects non-HTTPS model base URLs. | Exception: *.svc.cluster.local addresses are allowed over HTTP for in-cluster mock services. |
| Executor → MCP servers | HTTP(S) | No enforcement. Depends on the MCP server address scheme. | Use HTTPS addresses for MCP servers that handle sensitive data. No webhook validation exists for MCP URLs today. |
| Controller / API → broker | HTTP | No. | Service mesh mTLS or a TLS-terminating sidecar. |
| Controller → OTel collector | HTTP | No. Uses the URL in OTEL_EXPORTER_OTLP_ENDPOINT. | Set an https:// endpoint URL. Per-namespace endpoint discovery is available via the otel-environment-variables Secret. |
| Controller → PostgreSQL | TCP | No. sslMode: disable is the default. | Set storage.postgresql.sslMode to require or verify-full in Helm values. |
| Broker → PostgreSQL | TCP | No. sslmode=disable is used in local dev. | Set sslmode=require (or stricter) in the broker’s DATABASE_URL. For verify-full with certificate mounting, see Securing the Postgres connection in the broker service docs. |
| Metrics scrape | HTTPS | Yes. Controller metrics on port 8443 are TLS-protected via cert-manager. | ServiceMonitor uses insecureSkipVerify: false by default. |
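
Two of the rows above lend themselves to a pre-deployment check: the HTTPS rule for model base URLs (with its *.svc.cluster.local exception) and the sslmode setting on Postgres connection URLs. The sketch below is illustrative only. The function names are hypothetical, and the first function restates the rule as described in the table rather than reproducing Ark's webhook code.

```python
from urllib.parse import parse_qs, urlparse

# sslmode values that force TLS on a libpq-style connection URL.
TLS_SSLMODES = {"require", "verify-ca", "verify-full"}


def model_url_allowed(url: str) -> bool:
    """Accept HTTPS URLs, or HTTP URLs that target *.svc.cluster.local."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme == "https":
        return True
    return parsed.scheme == "http" and host.endswith(".svc.cluster.local")


def postgres_tls_enforced(dsn: str) -> bool:
    """Return True if a postgres:// URL sets an sslmode that requires TLS."""
    query = parse_qs(urlparse(dsn).query)
    # A missing sslmode is treated as "not enforced" for audit purposes.
    mode = query.get("sslmode", [""])[0]
    return mode in TLS_SSLMODES
```

For example, `postgres_tls_enforced("postgres://u:p@db:5432/ark?sslmode=disable")` returns False, which would flag the local-dev default before it reaches production.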

Log sanitization

Ark redacts sensitive data from logs in several components:

  • ark-api: SensitiveDataFilter redacts access_token, refresh_token, client_secret, code_verifier, and authorization values from MCP auth logs.
  • ark-broker: Pino logger auto-redacts authorization, cookie, x-api-key, x-auth-token, x-csrf-token, set-cookie, proxy-authorization headers and fields like password, token, secret, apiKey.
  • Controller: Query input/output is truncated to 48 characters in operation data. Full payloads are not logged at info level.

Log redaction does not cover query inputs, agent prompts, or tool call payloads. If your workload processes PII or regulated data, configure your log aggregator to filter or mask these fields before long-term storage.
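
Field-level masking in a log pipeline can be as simple as a recursive pass over each structured log record. The sketch below assumes JSON-style records and uses hypothetical field names (`input`, `prompt`, `arguments`) for the payloads that Ark's built-in redaction does not cover; check the field names your components actually emit before relying on it.

```python
# ASSUMPTION: hypothetical field names for query inputs, agent prompts, and
# tool call payloads. Adjust to the keys present in your own log records.
MASKED_KEYS = {"input", "prompt", "arguments"}


def mask_record(record, keys=MASKED_KEYS, placeholder="[REDACTED]"):
    """Return a copy of a JSON-like record with the listed keys masked."""
    if isinstance(record, dict):
        return {
            key: placeholder
            if key.lower() in keys
            else mask_record(value, keys, placeholder)
            for key, value in record.items()
        }
    if isinstance(record, list):
        return [mask_record(item, keys, placeholder) for item in record]
    # Scalars (strings, numbers, booleans, None) pass through unchanged.
    return record
```

The function builds a new structure rather than mutating its argument, so the original record remains available to any in-process consumer that is allowed to see it.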

Data retention and disposal

Ark does not manage data lifecycle. Retention periods, automated purging, and secure disposal are deployment-level concerns configured through your Kubernetes cluster, storage layer, and external backends.

| Surface | What Ark does | What you configure |
| --- | --- | --- |
| etcd / PostgreSQL (Query CRs) | Query resources persist indefinitely. Completed Queries are not automatically deleted. | Set up a CronJob or operator policy to garbage-collect completed Query CRs after your retention window. For PostgreSQL, configure row-level retention or partition management. |
| Kubernetes Secrets | Secrets persist until explicitly deleted. | Rotate secrets on a schedule. If using an external secrets operator, configure its TTL and rotation policies. |
| Broker memory store | In-memory data is lost on pod restart. If persistence is enabled, JSON files grow until the configured limits (maxMessages, maxChunks, etc.) are reached. | Size the limits to your retention needs. For persistent storage, schedule cleanup of the data volume or rely on pod restarts to flush in-memory state. |
| Broker messages (Postgres) | Messages expire via a TTL snapshot (expires_at) and are filtered out on read. The TTL aligns with the originating query’s TTL (spec.ttl); the default MESSAGE_VISIBILITY_TTL_SECONDS is 30 days. | Tune the query TTL (ArkConfig.spec.queryTTL) or MESSAGE_VISIBILITY_TTL_SECONDS. Soft-deleted rows remain until purged; schedule a job to hard-delete expired rows if you need to reclaim space. |
| Executor session files | Session directories accumulate on the PVC. No built-in TTL or cleanup. | Schedule a CronJob to prune /data/sessions/ directories older than your retention period. In scheduler mode, set shutdownPolicy: Delete and sessionIdleTTL to bound sandbox lifetime. |
| OTel traces | Ark exports spans but does not control downstream storage. | Configure retention and deletion policies in your collector backend (e.g., Elasticsearch ILM, S3 lifecycle rules). |
| Pod logs | Ark writes to stdout/stderr. Retention depends on the node and log aggregator. | Configure your log pipeline’s retention and rotation policies. |

Ark does not perform secure erasure (zero-fill, cryptographic wipe) on any surface. If your compliance framework requires secure disposal, use encrypted volumes: destroying the volume's encryption key (cryptographic erasure) is the usual way to sanitize data on shared storage.
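
The session-file row in the table above suggests a scheduled job that prunes old directories under /data/sessions/. A minimal sketch of such a job's core logic follows. The function name is hypothetical, and it uses each directory's modification time as a rough proxy for session age, which may not reflect writes to files nested deeper inside the directory. It performs ordinary deletion, not secure erasure.

```python
import os
import shutil
import time
from typing import List, Optional


def prune_sessions(
    root: str, max_age_seconds: float, now: Optional[float] = None
) -> List[str]:
    """Delete session directories under root older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    with os.scandir(root) as entries:
        # Collect candidates first so deletion does not disturb iteration.
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            mtime = entry.stat(follow_symlinks=False).st_mtime
            if now - mtime > max_age_seconds:
                stale.append((entry.name, entry.path))
    for _, path in stale:
        shutil.rmtree(path)
    return sorted(name for name, _ in stale)


if __name__ == "__main__":
    # Example: remove sessions untouched for more than 30 days.
    print(prune_sessions("/data/sessions", 30 * 24 * 3600))
```

Run from a CronJob that mounts the same PVC as the executor, this bounds how long conversation state stays on disk.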

Data loss prevention

Ark does not inspect, filter, or classify data content at any point in the Query lifecycle. Prompts, tool inputs, agent outputs, and MCP payloads flow through the platform without content-level scanning.

DLP controls are a deployment responsibility. Common integration points:

  • Egress proxy — Route LLM provider traffic through a DLP-aware proxy that inspects request/response bodies before they leave the cluster. The Model CRD baseUrl field can point to the proxy.
  • LLM provider policies — Most providers offer content filtering, usage policies, and data retention controls at the API level. Configure these in your provider account.
  • Admission webhooks — A Kubernetes validating webhook can inspect Query CRD .spec.input at creation time to block or flag content that matches DLP rules.
  • MCP server layer — MCP servers can implement input validation and output filtering before returning tool results to the executor.
  • Log aggregator — Configure field-level masking in your log pipeline to prevent PII from reaching long-term storage.
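
To make the admission-webhook option above more concrete, here is a sketch of the decision logic such a webhook might run. It takes an AdmissionReview request as a dict and returns an AdmissionReview response. It assumes `.spec.input` is a plain string, and the two regular expressions are illustrative stand-ins for real DLP rules. The HTTP server, TLS setup, and webhook registration are omitted.

```python
import re

# ASSUMPTION: illustrative patterns only; real rules come from your DLP policy.
DLP_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def review_query(admission_review: dict) -> dict:
    """Build an AdmissionReview response for a Query create/update request."""
    request = admission_review["request"]
    # "object" is null on DELETE, so fall back to an empty dict.
    spec = (request.get("object") or {}).get("spec") or {}
    # ASSUMPTION: spec.input is a string; other shapes are stringified.
    text = str(spec.get("input", ""))
    hits = sorted(
        name for name, pattern in DLP_PATTERNS.items() if pattern.search(text)
    )
    response = {"uid": request["uid"], "allowed": not hits}
    if hits:
        response["status"] = {
            "code": 403,
            "message": "Query input matched DLP rules: " + ", ".join(hits),
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

A webhook like this only sees what is submitted as a Query resource. Content that enters later, such as tool outputs or MCP payloads, needs one of the other integration points in the list.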