Models
Models define AI language model configurations for agents to use. Agents use the model named default if no specific model is configured.
OpenAI
```yaml
# Example OpenAI model.
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: default
spec:
  # The 'completion' type is for text completion models.
  # The 'openai' provider supports any OpenAI specification compatible model,
  # including OpenAI, Google Gemini (in OpenAI compatibility mode), Anthropic
  # Claude and so on.
  provider: openai
  type: completion
  model:
    # The specific model type.
    value: gpt-4o
  config:
    openai:
      # API endpoint URL
      baseUrl:
        value: "https://api.openai.com/v1"
      # API authentication key - this should be set from a Kubernetes Secret
      # for security purposes.
      apiKey:
        valueFrom:
          secretKeyRef:
            name: default-model-token
            key: token
      # Optional model generation parameters
      properties:
        temperature:
          value: "0.7"
        max_tokens:
          value: "4096"
---
# Example of a secret that can be used to configure the API key for a model.
apiVersion: v1
kind: Secret
metadata:
  name: default-model-token
type: Opaque
stringData:
  token: "your-api-key-here"
```

An API key secret can also be created like so:

```bash
kubectl create secret generic default-model-token --from-literal=token="your-api-key-here"
```

Azure OpenAI
Azure models support three authentication methods via config.azure.auth: API Key (default), Managed Identity (AKS node identity), and Workload Identity (K8s ServiceAccount federated to Azure). Use exactly one.
API Key (legacy or explicit):
```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gpt-4o-mini
spec:
  provider: azure
  type: completion
  model:
    value: gpt-4o-mini
  config:
    azure:
      baseUrl:
        value: "https://your-resource.openai.azure.com"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: azure-openai-key
            key: token
      apiVersion:
        value: "2024-12-01-preview"
```

Managed Identity (AKS): Use when Ark runs on AKS and the cluster or node pool has a User-Assigned Managed Identity with access to the Azure OpenAI resource. No API key is stored on the cluster.
```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gpt-4o-managed-identity
spec:
  provider: azure
  model:
    value: gpt-4o
  config:
    azure:
      baseUrl:
        value: "https://your-resource.openai.azure.com"
      apiVersion:
        value: "2024-02-15-preview"
      auth:
        managedIdentity: {}
        # Or with a user-assigned identity:
        # managedIdentity:
        #   clientId:
        #     value: "12345678-1234-1234-1234-123456789abc"
```

Workload Identity: Use when running on any Kubernetes cluster (including non-Azure) with Azure Workload Identity configured. The pod's ServiceAccount is federated to an Azure Managed Identity.
```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gpt-4o-workload-identity
spec:
  provider: azure
  model:
    value: gpt-4o
  config:
    azure:
      baseUrl:
        value: "https://your-resource.openai.azure.com"
      apiVersion:
        value: "2024-02-15-preview"
      auth:
        workloadIdentity:
          clientId:
            value: "12345678-1234-1234-1234-123456789abc"
          tenantId:
            value: "87654321-4321-4321-4321-210987654321"
```

Testing Azure auth:

1. API Key: create a Secret with your key, apply a model using `config.azure.auth.apiKey` (or the legacy top-level `apiKey`), then run a query.
2. Managed Identity: on AKS with managed identity enabled, apply a model with `auth.managedIdentity` and ensure the identity has the "Cognitive Services User" role on the Azure OpenAI resource.
3. Workload Identity: configure federation (e.g. Azure AD Workload Identity), apply a model with `auth.workloadIdentity` and the matching `clientId`/`tenantId`, then run a query from a pod using the federated ServiceAccount.
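For the workload identity setup, the federated ServiceAccount is typically annotated with the managed identity's client ID, following the standard Azure Workload Identity convention. A minimal sketch (the ServiceAccount name and client ID are placeholders; the client ID should match the `clientId` in the model's `auth` block):

```yaml
# Hypothetical ServiceAccount federated to an Azure Managed Identity via
# Azure Workload Identity. Pods using this ServiceAccount also need the
# label azure.workload.identity/use: "true" for the token to be projected.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ark-model-identity
  annotations:
    azure.workload.identity/client-id: "12345678-1234-1234-1234-123456789abc"
```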
AWS Bedrock
```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: claude-haiku
spec:
  # The AWS Bedrock provider with completion model type.
  provider: bedrock
  type: completion
  model:
    value: "us.anthropic.claude-3-5-haiku-20241022-v1:0"
  config:
    bedrock:
      # AWS region (optional; the default region is used if omitted)
      region:
        value: "us-west-2"
      # Base URL (optional; only needed if a non-default endpoint is required)
      baseUrl:
        value: "https://aws-bedrock.prod.ai-gateway.quantumblack.com/your-project-id"
      # Explicit credentials (optional; defaults to the IAM role)
      accessKeyId:
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: access-key-id
      secretAccessKey:
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: secret-access-key
      # Session token for temporary credentials or JWT tokens
      sessionToken:
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: session-token
      # Custom model ARN (optional)
      modelArn:
        value: "arn:aws:bedrock:..."
      properties:
        temperature:
          value: "0.7"
        max_tokens:
          value: "4096"
```

Google Gemini and Anthropic Models
Both Google Gemini and Anthropic provide OpenAI-compatible endpoints, allowing you to use their models with the openai provider and completion type. The base URLs are:
- `https://generativelanguage.googleapis.com/v1beta/openai` for Google Gemini
- `https://api.anthropic.com/v1` for Anthropic Claude
Most other providers also support OpenAI compatible base URLs - check their docs for details.
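For example, a Gemini model can be configured through the OpenAI-compatible endpoint like this (a sketch; the model name `gemini-2.0-flash` and the `gemini-api-key` secret are illustrative):

```yaml
# Hypothetical Gemini model using the openai provider against Google's
# OpenAI-compatible endpoint. The API key secret must hold a Google AI key.
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gemini-flash
spec:
  provider: openai
  type: completion
  model:
    value: gemini-2.0-flash
  config:
    openai:
      baseUrl:
        value: "https://generativelanguage.googleapis.com/v1beta/openai"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: gemini-api-key
            key: token
```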
Model Properties
All model providers support a flexible properties system that allows you to customize model behavior by setting parameters like temperature, max tokens, and other OpenAI ChatCompletion parameters.
Basic Properties Example
```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gpt-4-custom
spec:
  provider: openai
  type: completion
  model:
    value: gpt-4o
  config:
    openai:
      properties:
        temperature:
          value: "0.1"
        max_tokens:
          value: "1000"
      baseUrl:
        value: "https://api.openai.com/v1"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: openai-secret
            key: token
```

Any OpenAI ChatCompletion parameters can be provided through the properties system, including temperature, max_tokens, top_p, frequency_penalty, presence_penalty, stop, seed, and more.
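Other sampling and reproducibility parameters follow the same pattern: each property is a string value passed through to the ChatCompletion request. A fragment illustrating this (the values shown are arbitrary examples):

```yaml
# Illustrative properties fragment; each entry maps to the ChatCompletion
# parameter of the same name.
config:
  openai:
    properties:
      top_p:
        value: "0.9"
      presence_penalty:
        value: "0.5"
      seed:
        value: "42"
```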
Custom HTTP Headers
OpenAI and Azure models support custom HTTP headers for advanced authentication and routing scenarios. Headers can be specified with direct values or loaded from Kubernetes Secrets and ConfigMaps.
Supported Providers:
- OpenAI
- Azure OpenAI
Basic Headers Example
```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: default
spec:
  provider: azure
  type: completion
  model:
    value: gpt-4o
  config:
    azure:
      baseUrl:
        value: "https://your-resource.openai.azure.com"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: azure-openai-key
            key: token
      apiVersion:
        value: "2024-12-01-preview"
      # Custom HTTP headers sent with every request
      headers:
        - name: X-Custom-Header
          value:
            value: "direct-header-value"
        - name: X-Request-ID
          value:
            value: "my-app-v1"
```

Headers from Secrets and ConfigMaps
Load sensitive header values from Kubernetes Secrets or configuration from ConfigMaps:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: gateway-credentials
type: Opaque
stringData:
  api-key: "your-gateway-api-key"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  user-agent: "MyApp/1.0"
---
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: default
spec:
  provider: azure
  type: completion
  model:
    value: gpt-4o
  config:
    azure:
      baseUrl:
        value: "https://your-resource.openai.azure.com"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: azure-openai-key
            key: token
      apiVersion:
        value: "2024-12-01-preview"
      headers:
        # Load from a Secret
        - name: X-API-Gateway-Key
          value:
            valueFrom:
              secretKeyRef:
                name: gateway-credentials
                key: api-key
        # Load from a ConfigMap
        - name: User-Agent
          value:
            valueFrom:
              configMapKeyRef:
                name: app-config
                key: user-agent
        # Direct value
        - name: X-Client-ID
          value:
            value: "production-client"
```

OpenAI Provider Headers
Headers work the same way with the openai provider:

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: openai-with-headers
spec:
  provider: openai
  type: completion
  model:
    value: gpt-4o
  config:
    openai:
      baseUrl:
        value: "https://api.openai.com/v1"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: openai-secret
            key: token
      headers:
        - name: X-Custom-Header
          value:
            value: "my-value"
```

Status and Health Checking
Ark continuously monitors model availability through periodic health checks. The model controller probes each model at regular intervals to ensure it remains accessible and functional.
Health Check Configuration
The pollInterval field controls how often the model is probed:
```yaml
spec:
  pollInterval: 1m  # Default: 1 minute
```

Status Conditions
Model status is tracked using the standard Kubernetes conditions pattern. The primary condition is ModelAvailable:

```yaml
status:
  conditions:
    - type: ModelAvailable
      status: "True"  # True/False/Unknown
      reason: "Available"  # Short reason for the condition
      message: "Model is available and probed successfully"
      lastTransitionTime: "2024-01-15T10:30:00Z"
```

Condition States:
- ModelAvailable: True - Model successfully responds to test prompts
- ModelAvailable: False - Model probe failed (network error, authentication issue, etc.)
- ModelAvailable: Unknown - Initial state before first probe completes
Viewing Model Status
Check model availability using kubectl:
```bash
# List models with availability status
kubectl get models

NAME           TYPE      MODEL                AVAILABLE   AGE
gpt-4-model    azure     gpt-4.1-mini         True        5m
claude-model   bedrock   claude-3-sonnet-v1   False       3m

# Get detailed status
kubectl describe model gpt-4-model
```

The AVAILABLE column shows the current state of the ModelAvailable condition, making it easy to identify models that may have connectivity or configuration issues.
Agent Model Configuration
Agents can specify which model to use. If no model is specified, the default model is used. If an agent references a model that doesn't exist, the agent remains in a pending state. The modelRef field specifies the model name:
```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Agent
metadata:
  name: weather-agent
spec:
  prompt: "You are a helpful weather assistant"
  # Explicitly set the model to use
  modelRef:
    # Specify the model name. If no modelRef is provided then 'default' is used.
    name: gpt-4o-mini
```