
Models

Models define AI language model configurations for agents to use. Agents use the model named default if no specific model is configured.

OpenAI

# Example OpenAI model.
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: default
spec:
  # The 'openai' type is any OpenAI-specification-compatible model. This
  # includes OpenAI, Google Gemini (in OpenAI compatibility mode), Anthropic
  # Claude and so on.
  type: openai
  model:
    # The specific model type.
    value: gpt-4o
  config:
    openai:
      # API endpoint URL
      baseUrl:
        value: "https://api.openai.com/v1"
      # API authentication key - this should be set to a Kubernetes Secret
      # for security purposes.
      apiKey:
        valueFrom:
          secretKeyRef:
            name: default-model-token
            key: token
      # Optional model generation parameters
      properties:
        temperature:
          value: "0.7"
        max_tokens:
          value: "4096"
---
# Example of a secret that can be used to configure the API key for a model.
apiVersion: v1
kind: Secret
metadata:
  name: default-model-token
type: Opaque
stringData:
  token: "your-api-key-here"

An API key secret can also be created like so:

kubectl create secret generic default-model-token --from-literal=token="your-api-key-here"

Azure OpenAI

apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gpt-4o-mini
spec:
  # The Azure OpenAI model type.
  type: azure
  model:
    value: gpt-4o-mini
  config:
    azure:
      baseUrl:
        value: "https://your-resource.openai.azure.com"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: azure-openai-key
            key: token
      # Azure-specific API version
      apiVersion:
        value: "2024-12-01-preview"
      # Optional properties.
      properties:
        temperature:
          value: "0.7"
        max_tokens:
          value: "4096"

AWS Bedrock

apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: claude-haiku
spec:
  # The AWS Bedrock model type.
  type: bedrock
  model:
    value: "us.anthropic.claude-3-5-haiku-20241022-v1:0"
  config:
    bedrock:
      # AWS region (optional, uses the default if omitted)
      region:
        value: "us-west-2"
      # Base URL - optional, only needed if a non-default endpoint is required.
      baseUrl:
        value: "https://aws-bedrock.prod.ai-gateway.quantumblack.com/your-project-id"
      # Explicit credentials (optional, defaults to the IAM role)
      accessKeyId:
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: access-key-id
      secretAccessKey:
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: secret-access-key
      # Session token for temporary credentials or JWT tokens
      sessionToken:
        valueFrom:
          secretKeyRef:
            name: aws-credentials
            key: session-token
      # Custom model ARN (optional)
      modelArn:
        value: "arn:aws:bedrock:..."
      properties:
        temperature:
          value: "0.7"
        max_tokens:
          value: "4096"

Google Gemini and Anthropic Models

Both Google Gemini and Anthropic provide OpenAI-compatible endpoints, allowing you to use their models with the openai type. The base URLs are:

  • https://generativelanguage.googleapis.com/v1beta/openai for Google Gemini
  • https://api.anthropic.com/v1 for Anthropic Claude

Most other providers also support OpenAI-compatible base URLs - check their docs for details.
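As a sketch, a Gemini model configured through the openai type might look like the following (the model value gemini-2.0-flash and the secret name gemini-model-token are illustrative assumptions, not part of the docs above):

```yaml
# Sketch: Google Gemini used via its OpenAI-compatible endpoint.
# Model value and secret name below are illustrative assumptions.
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gemini
spec:
  type: openai
  model:
    value: gemini-2.0-flash
  config:
    openai:
      baseUrl:
        value: "https://generativelanguage.googleapis.com/v1beta/openai"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: gemini-model-token
            key: token
```

The same pattern applies to Anthropic Claude by swapping in the Anthropic base URL and a Claude model name.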

Model Properties

All model providers support a flexible properties system that allows you to customize model behavior by setting parameters like temperature, max tokens, and other OpenAI ChatCompletion parameters.

Basic Properties Example

apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gpt-4-custom
spec:
  type: openai
  model:
    value: gpt-4o
  config:
    openai:
      properties:
        temperature:
          value: "0.1"
        max_tokens:
          value: "1000"
      baseUrl:
        value: "https://api.openai.com/v1"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: openai-secret
            key: token

Any OpenAI ChatCompletion parameters can be provided through the properties system, including temperature, max_tokens, top_p, frequency_penalty, presence_penalty, stop, seed, and more.
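For instance, a properties block combining several of these parameters might look like the following sketch (the specific values are illustrative assumptions):

```yaml
# Sketch: additional ChatCompletion parameters via the properties system.
# All values below are illustrative.
properties:
  temperature:
    value: "0.2"
  top_p:
    value: "0.9"
  frequency_penalty:
    value: "0.5"
  seed:
    value: "42"
```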

Custom HTTP Headers

OpenAI and Azure models support custom HTTP headers for advanced authentication and routing scenarios. Headers can be specified with direct values or loaded from Kubernetes Secrets and ConfigMaps.

Supported Providers:

  • OpenAI
  • Azure OpenAI

Basic Headers Example

apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: default
spec:
  type: azure
  model:
    value: gpt-4o
  config:
    azure:
      baseUrl:
        value: "https://your-resource.openai.azure.com"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: azure-openai-key
            key: token
      apiVersion:
        value: "2024-12-01-preview"
      # Custom HTTP headers sent with every request
      headers:
        - name: X-Custom-Header
          value:
            value: "direct-header-value"
        - name: X-Request-ID
          value:
            value: "my-app-v1"

Headers from Secrets and ConfigMaps

Load sensitive header values from Kubernetes Secrets or configuration from ConfigMaps:

apiVersion: v1
kind: Secret
metadata:
  name: gateway-credentials
type: Opaque
stringData:
  api-key: "your-gateway-api-key"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  user-agent: "MyApp/1.0"
---
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: default
spec:
  type: azure
  model:
    value: gpt-4o
  config:
    azure:
      baseUrl:
        value: "https://your-resource.openai.azure.com"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: azure-openai-key
            key: token
      apiVersion:
        value: "2024-12-01-preview"
      headers:
        # Load from Secret
        - name: X-API-Gateway-Key
          value:
            valueFrom:
              secretKeyRef:
                name: gateway-credentials
                key: api-key
        # Load from ConfigMap
        - name: User-Agent
          value:
            valueFrom:
              configMapKeyRef:
                name: app-config
                key: user-agent
        # Direct value
        - name: X-Client-ID
          value:
            value: "production-client"

OpenAI Provider Headers

Headers work the same way with the OpenAI provider:

apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: openai-with-headers
spec:
  type: openai
  model:
    value: gpt-4o
  config:
    openai:
      baseUrl:
        value: "https://api.openai.com/v1"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: openai-secret
            key: token
      headers:
        - name: X-Custom-Header
          value:
            value: "my-value"

Status and Health Checking

ARK continuously monitors model availability through periodic health checks. The model controller probes each model at regular intervals to ensure it remains accessible and functional.

Health Check Configuration

The pollInterval field controls how often the model is probed:

spec:
  pollInterval: 1m  # Default: 1 minute

Status Conditions

Model status is tracked using the Kubernetes conditions pattern. The primary condition is ModelAvailable:

status:
  conditions:
    - type: ModelAvailable
      status: "True"  # True/False/Unknown
      reason: "Available"  # Short reason for the condition
      message: "Model is available and probed successfully"
      lastTransitionTime: "2024-01-15T10:30:00Z"

Condition States:

  • ModelAvailable: True - Model successfully responds to test prompts
  • ModelAvailable: False - Model probe failed (network error, authentication issue, etc.)
  • ModelAvailable: Unknown - Initial state before first probe completes

Viewing Model Status

Check model availability using kubectl:

# List models with availability status
kubectl get models

NAME           TYPE      MODEL                AVAILABLE   AGE
gpt-4-model    azure     gpt-4.1-mini         True        5m
claude-model   bedrock   claude-3-sonnet-v1   False       3m

# Get detailed status
kubectl describe model gpt-4-model

The AVAILABLE column shows the current state of the ModelAvailable condition, making it easy to identify models that may have connectivity or configuration issues.
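For scripting, the condition can also be read directly with a JSONPath query, or waited on with kubectl wait - a sketch using standard kubectl features, where the model name default is an assumption:

```shell
# Print the ModelAvailable condition status for the 'default' model
kubectl get model default \
  -o jsonpath='{.status.conditions[?(@.type=="ModelAvailable")].status}'

# Block until the model becomes available (or time out after 2 minutes)
kubectl wait --for=condition=ModelAvailable model/default --timeout=2m
```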

Agent Model Configuration

Agents can specify which model to use via the modelRef parameter. If no model is specified, the model named default is used. If an agent references a model that doesn't exist, the agent remains in a pending state:

apiVersion: ark.mckinsey.com/v1alpha1
kind: Agent
metadata:
  name: weather-agent
spec:
  prompt: "You are a helpful weather assistant"
  # Explicitly set the model to use
  modelRef:
    # Specify the model name. If no modelRef is provided then 'default' is used.
    name: gpt-4o-mini