Models
Models define AI language model configurations for agents to use. Agents use the model named `default` if no specific model is configured.
OpenAI
```yaml
# Example OpenAI model.
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: default
spec:
  # The 'openai' type is any OpenAI-specification-compatible model. This
  # includes OpenAI, Google Gemini (in OpenAI compatibility mode), Anthropic
  # Claude and so on.
  type: openai
  model:
    # The specific model name.
    value: gpt-4o
  config:
    openai:
      # API endpoint URL.
      baseUrl:
        value: "https://api.openai.com/v1"
      # API authentication key - this should be set from a Kubernetes Secret
      # for security purposes.
      apiKey:
        valueFrom:
          secretKeyRef:
            name: default-model-token
            key: token
      # Optional model generation parameters.
      properties:
        temperature:
          value: "0.7"
        max_tokens:
          value: "4096"
---
# Example of a secret that can be used to configure the API key for a model.
apiVersion: v1
kind: Secret
metadata:
  name: default-model-token
type: Opaque
stringData:
  token: "your-api-key-here"
```
An API key secret can also be created like so:

```bash
kubectl create secret generic default-model-token --from-literal=token="your-api-key-here"
```
Azure OpenAI
```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gpt-4o-mini
spec:
  # The Azure OpenAI model type.
  type: azure
  model:
    value: gpt-4o-mini
  config:
    azure:
      baseUrl:
        value: "https://your-resource.openai.azure.com"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: azure-openai-key
            key: token
      # Azure-specific API version.
      apiVersion:
        value: "2024-12-01-preview"
      # Optional properties.
      properties:
        temperature:
          value: "0.7"
        max_tokens:
          value: "4096"
```
AWS Bedrock
```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: claude-haiku
spec:
  # The AWS Bedrock model type.
  type: bedrock
  model:
    value: "us.anthropic.claude-3-5-haiku-20241022-v1:0"
  config:
    bedrock:
      # AWS region (optional, uses the default region if omitted).
      region:
        value: "us-west-2"
      # Base URL (optional, only needed if a non-default endpoint is required).
      baseUrl:
        value: "https://aws-bedrock.prod.ai-gateway.quantumblack.com/your-project-id"
      # Explicit credentials (optional, defaults to the IAM role).
      accessKeyId:
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: access-key-id
      secretAccessKey:
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: secret-access-key
      # Session token for temporary credentials or JWT tokens.
      sessionToken:
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: session-token
      # Custom model ARN (optional).
      modelArn:
        value: "arn:aws:bedrock:..."
      properties:
        temperature:
          value: "0.7"
        max_tokens:
          value: "4096"
```
Google Gemini and Anthropic Models
Both Google Gemini and Anthropic provide OpenAI-compatible endpoints, allowing you to use their models with the `openai` type. The base URLs are:
- `https://generativelanguage.googleapis.com/v1beta/openai` for Google Gemini
- `https://api.anthropic.com/v1` for Anthropic Claude

Most other providers also support OpenAI-compatible base URLs - check their documentation for details.
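For example, a Gemini model can be configured with the `openai` type as follows (a sketch; the resource name, model name, and `gemini-token` Secret are illustrative):

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gemini-flash
spec:
  type: openai
  model:
    value: gemini-2.0-flash
  config:
    openai:
      # Google's OpenAI-compatible endpoint.
      baseUrl:
        value: "https://generativelanguage.googleapis.com/v1beta/openai"
      # A Secret holding a Google AI API key.
      apiKey:
        valueFrom:
          secretKeyRef:
            name: gemini-token
            key: token
```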
Model Properties
All model providers support a flexible properties system that lets you customize model behavior by setting OpenAI ChatCompletion parameters such as temperature and max tokens.
Basic Properties Example
```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gpt-4-custom
spec:
  type: openai
  model:
    value: gpt-4o
  config:
    openai:
      properties:
        temperature:
          value: "0.1"
        max_tokens:
          value: "1000"
      baseUrl:
        value: "https://api.openai.com/v1"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: openai-secret
            key: token
```
Any OpenAI ChatCompletion parameter can be provided through the properties system, including `temperature`, `max_tokens`, `top_p`, `frequency_penalty`, `presence_penalty`, `stop`, `seed`, and more.
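As a sketch, several of these parameters can be combined in one `properties` block; note that all values are supplied as strings, as in the examples above (the specific values here are illustrative):

```yaml
properties:
  temperature:
    value: "0.2"
  top_p:
    value: "0.9"
  # A fixed seed for more reproducible generations.
  seed:
    value: "42"
```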
Custom HTTP Headers
OpenAI and Azure models support custom HTTP headers for advanced authentication and routing scenarios. Headers can be specified with direct values or loaded from Kubernetes Secrets and ConfigMaps.
Supported Providers:
- OpenAI
- Azure OpenAI
Basic Headers Example
```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: default
spec:
  type: azure
  model:
    value: gpt-4o
  config:
    azure:
      baseUrl:
        value: "https://your-resource.openai.azure.com"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: azure-openai-key
            key: token
      apiVersion:
        value: "2024-12-01-preview"
      # Custom HTTP headers sent with every request.
      headers:
        - name: X-Custom-Header
          value:
            value: "direct-header-value"
        - name: X-Request-ID
          value:
            value: "my-app-v1"
```
Headers from Secrets and ConfigMaps
Load sensitive header values from Kubernetes Secrets or configuration from ConfigMaps:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: gateway-credentials
type: Opaque
stringData:
  api-key: "your-gateway-api-key"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  user-agent: "MyApp/1.0"
---
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: default
spec:
  type: azure
  model:
    value: gpt-4o
  config:
    azure:
      baseUrl:
        value: "https://your-resource.openai.azure.com"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: azure-openai-key
            key: token
      apiVersion:
        value: "2024-12-01-preview"
      headers:
        # Load from a Secret.
        - name: X-API-Gateway-Key
          value:
            valueFrom:
              secretKeyRef:
                name: gateway-credentials
                key: api-key
        # Load from a ConfigMap.
        - name: User-Agent
          value:
            valueFrom:
              configMapKeyRef:
                name: app-config
                key: user-agent
        # Direct value.
        - name: X-Client-ID
          value:
            value: "production-client"
```
OpenAI Provider Headers
Headers work the same way with the OpenAI provider:
```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: openai-with-headers
spec:
  type: openai
  model:
    value: gpt-4o
  config:
    openai:
      baseUrl:
        value: "https://api.openai.com/v1"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: openai-secret
            key: token
      headers:
        - name: X-Custom-Header
          value:
            value: "my-value"
```
Status and Health Checking
ARK continuously monitors model availability through periodic health checks. The model controller probes each model at regular intervals to ensure it remains accessible and functional.
Health Check Configuration
The `pollInterval` field controls how often the model is probed:

```yaml
spec:
  pollInterval: 1m # Default: 1 minute
```
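In context, a model that should only be probed every five minutes might look like this (a sketch; the interval and names are illustrative):

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: default
spec:
  # Probe every five minutes instead of the one-minute default.
  pollInterval: 5m
  type: openai
  model:
    value: gpt-4o
  config:
    openai:
      baseUrl:
        value: "https://api.openai.com/v1"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: default-model-token
            key: token
```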
Status Conditions
Model status is tracked using the Kubernetes conditions pattern. The primary condition is `ModelAvailable`:

```yaml
status:
  conditions:
    - type: ModelAvailable
      status: "True"      # True/False/Unknown
      reason: "Available" # Short reason for the condition
      message: "Model is available and probed successfully"
      lastTransitionTime: "2024-01-15T10:30:00Z"
```
Condition States:
- ModelAvailable: True - Model successfully responds to test prompts
- ModelAvailable: False - Model probe failed (network error, authentication issue, etc.)
- ModelAvailable: Unknown - Initial state before first probe completes
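Because availability is surfaced as a standard Kubernetes condition, `kubectl wait` can block until a model becomes ready, which is handy in deployment scripts (a sketch; the model name and timeout are illustrative):

```bash
# Wait up to 60 seconds for the ModelAvailable condition to become True.
kubectl wait --for=condition=ModelAvailable model/default --timeout=60s
```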
Viewing Model Status
Check model availability using kubectl:
```bash
# List models with availability status
kubectl get models

NAME           TYPE      MODEL                AVAILABLE   AGE
gpt-4-model    azure     gpt-4.1-mini         True        5m
claude-model   bedrock   claude-3-sonnet-v1   False       3m

# Get detailed status
kubectl describe model gpt-4-model
```
The `AVAILABLE` column shows the current state of the `ModelAvailable` condition, making it easy to identify models that may have connectivity or configuration issues.
Agent Model Configuration
Agents can specify which model to use. If no model is specified, the `default` model is used. If an agent references a model that doesn't exist, the agent remains in a `pending` state. The `modelRef` parameter specifies the model name:
```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Agent
metadata:
  name: weather-agent
spec:
  prompt: "You are a helpful weather assistant"
  # Explicitly set the model to use.
  modelRef:
    # Specify the model name. If no modelRef is provided then 'default' is used.
    name: gpt-4o-mini
```