
Models

Models define AI language model configurations for agents to use. Agents use the model named default if no specific model is configured.

OpenAI

```yaml
# Example OpenAI model.
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: default
spec:
  # The 'completion' type is for text completion models.
  # The 'openai' provider supports any OpenAI specification compatible model,
  # including OpenAI, Google Gemini (in OpenAI compatibility mode), Anthropic
  # Claude and so on.
  provider: openai
  type: completion
  model:
    # The specific model to use.
    value: gpt-4o
  config:
    openai:
      # API endpoint URL
      baseUrl:
        value: "https://api.openai.com/v1"
      # API authentication key - this should be set to a Kubernetes Secret
      # for security purposes.
      apiKey:
        valueFrom:
          secretKeyRef:
            name: default-model-token
            key: token
      # Optional model generation parameters
      properties:
        temperature:
          value: "0.7"
        max_tokens:
          value: "4096"
---
# Example of a secret that can be used to configure the API key for a model.
apiVersion: v1
kind: Secret
metadata:
  name: default-model-token
type: Opaque
stringData:
  token: "your-api-key-here"
```

An API key secret can also be created like so:

```bash
kubectl create secret generic default-model-token --from-literal=token="your-api-key-here"
```

Azure OpenAI

Azure models support three authentication methods via config.azure.auth: API Key (default), Managed Identity (AKS node identity), and Workload Identity (K8s ServiceAccount federated to Azure). Use exactly one.

API Key (legacy or explicit):

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gpt-4o-mini
spec:
  provider: azure
  type: completion
  model:
    value: gpt-4o-mini
  config:
    azure:
      baseUrl:
        value: "https://your-resource.openai.azure.com"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: azure-openai-key
            key: token
      apiVersion:
        value: "2024-12-01-preview"
```

Managed Identity (AKS): Use when Ark runs on AKS and the cluster or node pool has a User-Assigned Managed Identity with access to the Azure OpenAI resource. No API key is stored on the cluster.

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gpt-4o-managed-identity
spec:
  provider: azure
  model:
    value: gpt-4o
  config:
    azure:
      baseUrl:
        value: "https://your-resource.openai.azure.com"
      apiVersion:
        value: "2024-02-15-preview"
      auth:
        managedIdentity: {}
        # Or with user-assigned identity:
        # managedIdentity:
        #   clientId:
        #     value: "12345678-1234-1234-1234-123456789abc"
```

Workload Identity: Use when running on any Kubernetes cluster (including non-Azure) with Azure Workload Identity configured. The pod’s ServiceAccount is federated to an Azure Managed Identity.

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gpt-4o-workload-identity
spec:
  provider: azure
  model:
    value: gpt-4o
  config:
    azure:
      baseUrl:
        value: "https://your-resource.openai.azure.com"
      apiVersion:
        value: "2024-02-15-preview"
      auth:
        workloadIdentity:
          clientId:
            value: "12345678-1234-1234-1234-123456789abc"
          tenantId:
            value: "87654321-4321-4321-4321-210987654321"
```

Testing Azure auth:

  • API Key: create a Secret with your key, apply a model using config.azure.auth.apiKey (or the legacy top-level apiKey), then run a query.
  • Managed Identity: on AKS with managed identity enabled, apply a model with auth.managedIdentity and ensure the identity has the “Cognitive Services User” role on the Azure OpenAI resource.
  • Workload Identity: configure federation (e.g. Azure AD Workload Identity), apply a model with auth.workloadIdentity and the matching clientId/tenantId, then run a query from a pod using the federated ServiceAccount.
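A model can be exercised directly with a query. The following is a minimal sketch, assuming ARK's Query resource and the workload-identity model defined above (the query name and input are illustrative):

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Query
metadata:
  name: azure-auth-smoke-test
spec:
  input: "Reply with the single word: ok"
  targets:
    # Target the model directly rather than an agent.
    - type: model
      name: gpt-4o-workload-identity
```

If authentication is working, the query status should contain a response; a failure here usually points at a missing role assignment or a federation misconfiguration rather than at the model itself.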

AWS Bedrock

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: claude-haiku
spec:
  # The AWS Bedrock provider with completion model type.
  provider: bedrock
  type: completion
  model:
    value: "us.anthropic.claude-3-5-haiku-20241022-v1:0"
  config:
    bedrock:
      # AWS region (optional, uses default)
      region:
        value: "us-west-2"
      # Base URL - optional; only needed if a non-default endpoint is required.
      baseUrl:
        value: "https://aws-bedrock.prod.ai-gateway.quantumblack.com/your-project-id"
      # Explicit credentials (optional, defaults to IAM role)
      accessKeyId:
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: access-key-id
      secretAccessKey:
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: secret-access-key
      # Session token for temporary credentials or JWT tokens
      sessionToken:
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: session-token
      # Custom model ARN (optional)
      modelArn:
        value: "arn:aws:bedrock:..."
      properties:
        temperature:
          value: "0.7"
        max_tokens:
          value: "4096"
```

Google Gemini and Anthropic Models

Both Google Gemini and Anthropic provide OpenAI-compatible endpoints, allowing you to use their models with the openai provider and completion type. The base URLs are:

  • https://generativelanguage.googleapis.com/v1beta/openai for Google Gemini
  • https://api.anthropic.com/v1 for Anthropic Claude

Most other providers also support OpenAI-compatible base URLs; check their documentation for details.
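As a sketch, a Gemini model configured through the openai provider might look like the following (the model name gemini-2.0-flash and the secret name are illustrative):

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gemini-flash
spec:
  provider: openai
  type: completion
  model:
    value: gemini-2.0-flash
  config:
    openai:
      # Gemini's OpenAI-compatibility endpoint.
      baseUrl:
        value: "https://generativelanguage.googleapis.com/v1beta/openai"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: gemini-model-token
            key: token
```

The same pattern applies to Anthropic Claude with the base URL listed above and an Anthropic API key in the referenced Secret.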

Model Properties

All model providers support a flexible properties system that allows you to customize model behavior by setting parameters like temperature, max tokens, and other OpenAI ChatCompletion parameters.

Basic Properties Example

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gpt-4-custom
spec:
  provider: openai
  type: completion
  model:
    value: gpt-4o
  config:
    openai:
      properties:
        temperature:
          value: "0.1"
        max_tokens:
          value: "1000"
      baseUrl:
        value: "https://api.openai.com/v1"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: openai-secret
            key: token
```

Any OpenAI ChatCompletion parameters can be provided through the properties system, including temperature, max_tokens, top_p, frequency_penalty, presence_penalty, stop, seed, and more.
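For example, a config fragment biasing a model towards reproducible output might look like this (the parameter names follow the OpenAI ChatCompletion API; the values are illustrative, and note that all property values are strings):

```yaml
config:
  openai:
    properties:
      # Low temperature for near-deterministic sampling
      temperature:
        value: "0"
      top_p:
        value: "1"
      # Fixed seed for best-effort reproducibility
      seed:
        value: "42"
```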

Custom HTTP Headers

OpenAI and Azure models support custom HTTP headers for advanced authentication and routing scenarios. Headers can be specified with direct values or loaded from Kubernetes Secrets and ConfigMaps.

Supported Providers:

  • OpenAI
  • Azure OpenAI

Basic Headers Example

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: default
spec:
  provider: azure
  type: completion
  model:
    value: gpt-4o
  config:
    azure:
      baseUrl:
        value: "https://your-resource.openai.azure.com"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: azure-openai-key
            key: token
      apiVersion:
        value: "2024-12-01-preview"
      # Custom HTTP headers sent with every request
      headers:
        - name: X-Custom-Header
          value:
            value: "direct-header-value"
        - name: X-Request-ID
          value:
            value: "my-app-v1"
```

Headers from Secrets and ConfigMaps

Load sensitive header values from Kubernetes Secrets or configuration from ConfigMaps:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: gateway-credentials
type: Opaque
stringData:
  api-key: "your-gateway-api-key"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  user-agent: "MyApp/1.0"
---
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: default
spec:
  provider: azure
  type: completion
  model:
    value: gpt-4o
  config:
    azure:
      baseUrl:
        value: "https://your-resource.openai.azure.com"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: azure-openai-key
            key: token
      apiVersion:
        value: "2024-12-01-preview"
      headers:
        # Load from Secret
        - name: X-API-Gateway-Key
          value:
            valueFrom:
              secretKeyRef:
                name: gateway-credentials
                key: api-key
        # Load from ConfigMap
        - name: User-Agent
          value:
            valueFrom:
              configMapKeyRef:
                name: app-config
                key: user-agent
        # Direct value
        - name: X-Client-ID
          value:
            value: "production-client"
```

OpenAI Provider Headers

Headers work the same way with the OpenAI provider:

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: openai-with-headers
spec:
  provider: openai
  type: completion
  model:
    value: gpt-4o
  config:
    openai:
      baseUrl:
        value: "https://api.openai.com/v1"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: openai-secret
            key: token
      headers:
        - name: X-Custom-Header
          value:
            value: "my-value"
```

Status and Health Checking

ARK continuously monitors model availability through periodic health checks. The model controller probes each model at regular intervals to ensure it remains accessible and functional.

Health Check Configuration

The pollInterval field controls how often the model is probed:

```yaml
spec:
  pollInterval: 1m  # Default: 1 minute
```

Status Conditions

Model status is tracked using Kubernetes conditions pattern. The primary condition is ModelAvailable:

```yaml
status:
  conditions:
    - type: ModelAvailable
      status: "True"  # True/False/Unknown
      reason: "Available"  # Short reason for the condition
      message: "Model is available and probed successfully"
      lastTransitionTime: "2024-01-15T10:30:00Z"
```

Condition States:

  • ModelAvailable: True - Model successfully responds to test prompts
  • ModelAvailable: False - Model probe failed (network error, authentication issue, etc.)
  • ModelAvailable: Unknown - Initial state before first probe completes

Viewing Model Status

Check model availability using kubectl:

```bash
# List models with availability status
kubectl get models

NAME           TYPE      MODEL                AVAILABLE   AGE
gpt-4-model    azure     gpt-4.1-mini         True        5m
claude-model   bedrock   claude-3-sonnet-v1   False       3m

# Get detailed status
kubectl describe model gpt-4-model
```

The AVAILABLE column shows the current state of the ModelAvailable condition, making it easy to identify models that may have connectivity or configuration issues.
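The condition can also be read directly with a JSONPath query, e.g. for a model named default:

```bash
# Print the status of the ModelAvailable condition (True/False/Unknown)
kubectl get model default \
  -o jsonpath='{.status.conditions[?(@.type=="ModelAvailable")].status}'
```

Because ModelAvailable follows the standard Kubernetes conditions pattern, kubectl wait model/default --for=condition=ModelAvailable --timeout=60s can be used to block until the model becomes available, for example in CI or bootstrap scripts.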

Agent Model Configuration

Agents can specify which model to use; if no model is specified, the default model is used. If an agent references a model that doesn’t exist, the agent remains in a pending state. The modelRef field specifies the model name:

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Agent
metadata:
  name: weather-agent
spec:
  prompt: "You are a helpful weather assistant"
  # Explicitly set the model to use
  modelRef:
    # Specify the model name. If no modelRef is provided then 'default' is used.
    name: gpt-4o-mini
```