
Custom Resource Definitions (CRDs)

This page provides detailed specifications for each ARK custom resource. For an overview of how these resources work with Kubernetes, see Kubernetes Integration.

Resource Reference

| Resource | API Version | Description |
| --- | --- | --- |
| Agent | ark.mckinsey.com/v1alpha1 | AI agents with prompts and tools |
| Team | ark.mckinsey.com/v1alpha1 | Teams of agents with execution strategies |
| Model | ark.mckinsey.com/v1alpha1 | LLM service configurations |
| Query | ark.mckinsey.com/v1alpha1 | Queries to agents or teams |
| Tool | ark.mckinsey.com/v1alpha1 | Custom tools for agents |
| MCPServer | ark.mckinsey.com/v1alpha1 | Model Context Protocol servers |
| Evaluator | ark.mckinsey.com/v1alpha1 | AI-powered query assessment services |
| Evaluation | ark.mckinsey.com/v1alpha1 | Multi-mode AI output assessments |
| A2AServer | ark.mckinsey.com/v1prealpha1 | Agent-to-Agent protocol servers |
| ExecutionEngine | ark.mckinsey.com/v1prealpha1 | External execution engines |

Models

Models define connections to AI model providers and handle authentication and configuration.

Specification

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: default  # Special name 'default' used when agents don't specify a model
spec:
  type: azure  # openai, azure, bedrock
  model:
    value: gpt-4.1-mini
  config:
    azure:
      baseUrl:
        value: "https://lxo.openai.azure.com"
      apiKey:
        valueFrom:
          secretKeyRef:
            name: default-model-token
            key: token
      apiVersion:
        value: "2024-12-01-preview"
```

Supported Providers

  • Azure OpenAI: Enterprise-grade OpenAI models
  • OpenAI: Direct OpenAI API access
  • AWS Bedrock: Amazon’s managed AI service
  • Gemini: Google’s AI models

Configuration Options

  • API Keys: Stored securely in Kubernetes secrets
  • Base URLs: Custom endpoints for different providers
  • API Versions: Provider-specific API versions
  • Model Parameters: Temperature, max tokens, etc.
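As an illustration, a direct-OpenAI model could follow the same shape as the Azure example above. This is a sketch only: the `openai` config key and the secret name are assumptions modeled on the Azure block, not verified schema.

```yaml
# Hypothetical OpenAI model. The `openai` config key and the
# secret name `openai-token` mirror the Azure example above and
# are assumptions, not confirmed fields.
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: gpt-4-model
spec:
  type: openai
  model:
    value: gpt-4.1-mini
  config:
    openai:
      apiKey:
        valueFrom:
          secretKeyRef:
            name: openai-token
            key: token
```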

Agents

Agents are AI entities that process inputs using AI models and can use tools to extend their capabilities.

Specification

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Agent
metadata:
  name: weather
spec:
  description: Weather forecasting agent that provides current conditions and forecasts
  prompt: |
    You are a helpful weather assistant. You can provide weather
    forecasts and current conditions for any location.
    You should be concise, direct, and to the point.
    You should NOT answer with unnecessary preamble or postamble.
  tools:
    - type: custom
      name: get-coordinates
    - type: custom
      name: get-forecast
```

Key Fields

  • prompt: Defines the agent’s behavior and instructions
  • description: Human-readable description of the agent’s purpose
  • tools: List of tools the agent can use
  • modelRef: Optional reference to a specific model (uses β€˜default’ if not specified)
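For example, pinning an agent to a named Model rather than the default might look like the following sketch. The nested `name` field under `modelRef` is an assumption inferred from the key-field list above.

```yaml
# Sketch: agent pinned to a specific Model via modelRef.
# The modelRef shape (a nested `name`) is an assumption.
apiVersion: ark.mckinsey.com/v1alpha1
kind: Agent
metadata:
  name: researcher
spec:
  description: Research agent using a pinned model
  prompt: |
    You are a research assistant.
  modelRef:
    name: gpt-4-model  # omit to fall back to the Model named 'default'
```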

Teams

Teams coordinate multiple agents working together using different execution strategies.

Specification

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Team
metadata:
  name: team-seq
spec:
  members:
    - name: agent-seq
      type: agent
    - name: agent-seq
      type: agent
    - name: agent-seq
      type: agent
  strategy: "sequential"
```

Execution Strategies

  • sequential: Agents process input one after another
  • parallel: Agents process input simultaneously
  • round-robin: Agents take turns processing inputs
  • selector: Dynamic agent selection based on criteria

Member Types

  • agent: Reference to an Agent resource
  • team: Reference to another Team resource (nested teams)
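A nested team could therefore be declared as in the sketch below; the member names are hypothetical, but the `type: team` member kind follows directly from the list above.

```yaml
# Sketch: a team that nests another team as a member.
# Member names here are hypothetical examples.
apiVersion: ark.mckinsey.com/v1alpha1
kind: Team
metadata:
  name: team-nested
spec:
  members:
    - name: weather     # an Agent resource
      type: agent
    - name: team-seq    # another Team resource (nested team)
      type: team
  strategy: "sequential"
```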

Advanced Features

Teams support complex workflows including:

  • Graph-based strategies: Custom execution flows
  • Conditional routing: Dynamic member selection
  • Termination conditions: Early completion criteria

Queries

Queries represent requests sent to agents or teams and track their execution and results.

Specification

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Query
metadata:
  name: weather-query
spec:
  input: 'What is the weather today in New York?'
  targets:
    - type: agent
      name: weather
```

Target Types

  • agent: Send query to a specific agent
  • team: Send query to a team of agents
  • selector: Use label selectors to choose targets dynamically

Advanced Features

  • Session Management: Group related queries with sessionId
  • Template Parameters: Use variables in query input
  • Timeout Control: Set maximum execution time
  • Multiple Targets: Send same query to multiple agents/teams
  • Evaluators: Automatic assessment of query results
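Combining these features, a single query could be fanned out to both an agent and a team and grouped into a session. This is a sketch: the `sessionId` field placement and value are assumptions based on the feature list above.

```yaml
# Sketch: one query sent to multiple targets, grouped in a session.
# The `sessionId` placement is an assumption, not verified schema.
apiVersion: ark.mckinsey.com/v1alpha1
kind: Query
metadata:
  name: weather-fanout
spec:
  input: 'What is the weather today in New York?'
  sessionId: trip-planning-001
  targets:
    - type: agent
      name: weather
    - type: team
      name: team-seq
```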

Evaluators

Evaluators provide deterministic or AI-powered assessment of teams, agents, queries, and tools to support quality control and testing. For non-deterministic cases, they use the "LLM-as-a-Judge" pattern to automatically evaluate agent responses.

How Evaluators Work

  • Optional Integration: Queries can optionally reference an Evaluator for quality assessment
  • LLM-as-a-Judge: Evaluators use AI models to assess response quality across multiple criteria
  • Automatic Triggering: When a query completes, if an evaluator is specified, the query enters β€œevaluating” phase
  • Quality Gating: Only after successful evaluation is the query marked as β€œdone”
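A query can opt in to automatic evaluation simply by carrying a label that an Evaluator's selector matches. The sketch below assumes an Evaluator whose selector uses `matchLabels: evaluate: "true"`; the query and agent names are hypothetical.

```yaml
# Sketch: a query labeled so that an Evaluator with a matching
# selector (matchLabels: evaluate: "true") evaluates it automatically.
apiVersion: ark.mckinsey.com/v1alpha1
kind: Query
metadata:
  name: auto-evaluated-query
  labels:
    evaluate: "true"
spec:
  input: 'What is the weather today in New York?'
  targets:
    - type: agent
      name: weather
```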

Specification

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Evaluator
metadata:
  name: llm-evaluator
spec:
  type: llm-judge
  description: "LLM-as-a-Judge evaluator for query assessment"
  address:
    valueFrom:
      serviceRef:
        name: evaluator-llm
        port: "http"
        path: "/evaluate"
  selector:  # optional - for automatic query evaluation
    resourceType: Query
    matchLabels:
      evaluate: "true"  # will automatically evaluate any query with label evaluate=true
  parameters:  # optional - including model configuration
    - name: model.name  # specify model to use (default: "default")
      value: "gpt-4-model"
    - name: model.namespace  # specify model namespace (default: evaluator's namespace)
      value: "models"
    - name: min-score  # custom parameter passed to the evaluation service
      value: "0.8"
```

Auto-triggered Evaluation:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Evaluator with β”‚ Auto-triggers β”‚ New Query β”‚ β”‚ selector: β”‚ ◄───────────────── β”‚ labels: β”‚ β”‚ matchLabels: β”‚ when created β”‚ evaluate: β”‚ β”‚ evaluate: true β”‚ or modified β”‚ "true" β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key fields

  • address: Service reference targeting the service that executes the evaluation.
  • selector: Automatically matches queries to be evaluated using labels.
  • parameters: Default evaluation parameters passed to the target evaluation service.

Evaluations

Evaluations assess different metrics using various modes including direct assessment, dataset comparison, and query result evaluation.
The current design follows the principle: one Evaluation = one Evaluator = one specific assessment.

Overview

Evaluations work with Evaluators to assess AI outputs, both deterministic and non-deterministic. Multiple evaluation modes can target different evaluators, and evaluators can automatically process evaluations based on label selectors.

Evaluation Flow:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Evaluation │◄─────│ Evaluator(s)β”‚ β”‚ Ark Evaluation β”‚ β”‚ Mode │─────►│ │─────►│ Service(s) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ └─────► (*) Query ────► (Agent/Tool/Team)

Key Fields

  • type: Evaluation type (direct, baseline, query, batch, event)
  • evaluator: Reference to the Evaluator resource
  • config: Type-specific configuration with embedded fields
  • status.score: Evaluation score (0-1)
  • status.passed: Whether evaluation passed

Evaluation Types

Direct Type

Evaluate a single input/output pair:

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Evaluation
metadata:
  name: direct-eval
spec:
  type: direct
  evaluator:
    name: quality-evaluator
  config:
    input: "What's the weather in NYC?"
    output: "It's 72°F and sunny in New York City"
```

Baseline Type

Evaluate against baseline datasets to measure performance and verify that the evaluator produces the expected metrics:

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Evaluation
metadata:
  name: baseline-eval
spec:
  type: baseline
  evaluator:
    name: llm-judge
    parameters:
      - name: dataset.serviceRef.name
        value: postgres-memory
      - name: dataset.datasetId
        value: weather-test-suite-v2
      - name: dataset.testCaseIds
        value: "test-nyc-001,test-la-002"
  config:
    baseline: {}
```

Query Type

Evaluate existing query results:

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Evaluation
metadata:
  name: query-eval
spec:
  type: query
  evaluator:
    name: accuracy-evaluator
    parameters:  # optional
      - name: dataset.serviceRef.name
        value: dataset-service
      - name: dataset.serviceRef.namespace
        value: default
      - name: dataset.datasetId
        value: expected-results-v1
  config:
    queryRef:
      name: weather-query-123
      responseTarget: "weather-agent"
```

Batch Type

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Evaluation β”‚ Aggregates multiple child evaluations β”‚ type=Batch β”‚ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┐ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Eval 1 β”‚ β”‚ Eval 2 β”‚ β”‚ Eval n β”‚ β”‚ type=Query β”‚ β”‚ type=Query β”‚ β”‚ type=Direct β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Aggregate results from multiple evaluations:

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Evaluation
metadata:
  name: batch-eval
spec:
  type: batch
  evaluator:
    name: aggregator-evaluator
  config:
    evaluations:
      - name: weather-eval
        namespace: default  # optional
      - name: agent-eval
      - name: tool-eval
```

Event Type

Rule-based evaluations using CEL (Common Expression Language):

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Evaluation
metadata:
  name: event-eval
spec:
  type: event
  evaluator:
    name: tool-usage-evaluator
  config:
    rules:
      - name: "weather-tool-called"
        expression: 'event.type == "tool_call" && event.tool_name == "get-weather"'
        description: "Validates weather tool was called"
        weight: 1
      - name: "response-contains-temperature"
        expression: 'response.content.contains("°F") || response.content.contains("°C")'
        description: "Ensures temperature is in response"
        weight: 2
```

Using Evaluators in Queries

Reference an evaluator in your query to enable automatic assessment:

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Query
metadata:
  name: research-query
spec:
  input: "Analyze renewable energy trends"
  targets:
    - type: agent
      name: research-agent
  evaluator:
    name: llm-evaluator
```

Evaluation Process

  1. Query Execution: Agent processes the query and generates response
  2. Evaluation Trigger: Query enters β€œevaluating” phase
  3. Assessment: Evaluator analyzes response quality using configured criteria
  4. Scoring: Evaluator provides numerical score and qualitative feedback
  5. Completion: Query marked as β€œdone” with evaluation results

Evaluation Results

Evaluation results are stored in the Query status:

```yaml
status:
  phase: done
  evaluations:
    - evaluatorName: llm-evaluator
      passed: true
      score: "85"
      metadata:
        reasoning: "Response provides comprehensive analysis with supporting data"
        criteria:
          accuracy: "high"
          completeness: "good"
          relevance: "excellent"
```

A2A Servers

A2A (Agent-to-Agent) Servers enable hosting external agent frameworks within ARK.

Specification

```yaml
apiVersion: ark.mckinsey.com/v1prealpha1
kind: A2AServer
metadata:
  name: langchain-agents
spec:
  address:
    valueFrom:
      serviceRef:
        name: langchain-service
        port: "8080"
```

Execution Engines

Execution Engines provide custom runtime environments for specialized agent execution.

Specification

```yaml
apiVersion: ark.mckinsey.com/v1prealpha1
kind: ExecutionEngine
metadata:
  name: custom-engine
spec:
  type: external
  endpoint: "http://custom-engine-service:8080"
```

Resource Relationships

ARK resources work together in common patterns:

  • Agent + Model + Tools: Basic agent with capabilities
  • Team + Multiple Agents: Multi-agent collaboration
  • Query + Targets: Requests to agents or teams
  • MCP Server + Tools: Standardized tool integration
  • Memory + Sessions: Persistent conversations

Next: Learn about CLI Tools for working with these resources.
