Evaluator LLM Service

AI-powered query evaluation service that uses large language models as judges to assess response quality automatically.

Overview

The Evaluator LLM service implements the LLM-as-a-Judge pattern, providing automated evaluation of query responses across multiple quality dimensions. It integrates with the ARK platform to provide quality gating for agent interactions.

Features

  • LLM-as-a-Judge Pattern: Uses advanced language models to evaluate response quality objectively
  • Multi-Criteria Assessment: Evaluates responses across five key dimensions
  • Model Flexibility: Supports OpenAI and Azure OpenAI configurations
  • Kubernetes Native: Deploys as Evaluator custom resource
  • REST API: Simple HTTP interface for evaluation requests

Installation

Deploy the evaluator service using Helm:

```bash
# Install the evaluator-llm service
helm install evaluator-llm ./services/evaluator-llm/chart

# Verify deployment
kubectl get pods -l app.kubernetes.io/name=evaluator-llm
kubectl get evaluator evaluator-llm
```

Usage

1. Create Model Configuration

First, ensure you have a model configured for evaluation:

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: evaluation-model
spec:
  type: openai
  url: https://api.openai.com/v1/chat/completions
  model: gpt-4
  apiKey: your-api-key
```

2. Configure Evaluator

The evaluator is automatically created by the Helm chart, but you can customize it:

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Evaluator
metadata:
  name: llm-evaluator
spec:
  type: llm-judge
  description: "LLM-as-a-Judge evaluator for query assessment"
  address:
    valueFrom:
      serviceRef:
        name: evaluator-llm
        port: "http"
        path: "/evaluate"
  modelRef:
    name: evaluation-model
```

3. Use in Queries

Reference the evaluator in your queries:

```yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Query
metadata:
  name: research-query
spec:
  input: "Explain the benefits of renewable energy"
  targets:
    - type: agent
      name: research-agent
  evaluator:
    name: llm-evaluator
```
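
Queries are typically applied with kubectl, but they can also be created programmatically. Below is a minimal sketch using the official Kubernetes Python client; the `queries` plural and the `default` namespace are assumptions, not confirmed by this page:

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (in-cluster config also works).
config.load_kube_config()
api = client.CustomObjectsApi()

query = {
    "apiVersion": "ark.mckinsey.com/v1alpha1",
    "kind": "Query",
    "metadata": {"name": "research-query"},
    "spec": {
        "input": "Explain the benefits of renewable energy",
        "targets": [{"type": "agent", "name": "research-agent"}],
        "evaluator": {"name": "llm-evaluator"},
    },
}

api.create_namespaced_custom_object(
    group="ark.mckinsey.com",
    version="v1alpha1",
    namespace="default",
    plural="queries",  # assumed plural for the Query CRD
    body=query,
)
```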

Evaluation Process

When a query with an evaluator completes:

  1. Query Execution: Agent generates response normally
  2. Evaluation Trigger: Query status changes to “evaluating”
  3. AI Assessment: Evaluator analyzes response using configured model
  4. Quality Scoring: Response scored across multiple criteria
  5. Completion: Query marked as “done” after evaluation
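
To observe these phase transitions as they happen, you can watch Query objects with the Kubernetes Python client. A sketch, assuming the phase is surfaced at `status.phase` (this page names the phases but not the exact field):

```python
from kubernetes import client, config, watch

config.load_kube_config()
api = client.CustomObjectsApi()

# Stream events for ARK Query objects and print each phase transition.
w = watch.Watch()
for event in w.stream(
    api.list_namespaced_custom_object,
    group="ark.mckinsey.com",
    version="v1alpha1",
    namespace="default",
    plural="queries",  # assumed plural for the Query CRD
):
    obj = event["object"]
    name = obj["metadata"]["name"]
    phase = obj.get("status", {}).get("phase", "<none>")  # assumed status field
    print(f"{event['type']:8} {name}: {phase}")
    if name == "research-query" and phase == "done":
        w.stop()
```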

Evaluation Criteria

The service evaluates responses across five dimensions (0-100 scale):

  • Relevance: How well the response addresses the query
  • Accuracy: Factual correctness and reliability
  • Completeness: Comprehensiveness of the information
  • Clarity: Readability and ease of understanding
  • Usefulness: Practical value to the user

A response with an overall score ≥70 is considered “passed”.
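
For illustration, here is how the pass/fail decision might look if the overall score is the mean of the five criteria; the aggregation method is an assumption, since this page only documents the per-criterion scale and the ≥70 threshold:

```python
# Hypothetical aggregation: mean of the five criteria scores (0-100 each).
CRITERIA = ("relevance", "accuracy", "completeness", "clarity", "usefulness")
PASS_THRESHOLD = 70  # responses scoring >= 70 overall are marked "passed"

def overall_score(scores: dict[str, int]) -> float:
    """Average the per-criterion scores into a single 0-100 value."""
    return sum(scores[c] for c in CRITERIA) / len(CRITERIA)

scores = {"relevance": 90, "accuracy": 85, "completeness": 75,
          "clarity": 80, "usefulness": 70}
total = overall_score(scores)
print(f"overall={total:.0f}, passed={total >= PASS_THRESHOLD}")  # overall=80, passed=True
```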

API Reference

Health Endpoints

  • GET /health - Service health status
  • GET /ready - Service readiness check

Evaluation Endpoint

  • POST /evaluate - Evaluate query responses

Request Format:

{ "queryId": "query-uuid", "input": "user query text", "responses": [ { "target": {"type": "agent", "name": "agent-name"}, "content": "agent response content" } ], "query": {...}, "model": { "spec": {...}, "metadata": {...} } }

Response Format:

{ "score": "85", "passed": true, "metadata": { "reasoning": "Response demonstrates good accuracy...", "criteria_scores": "relevance=90, accuracy=85, ..." } }

Configuration

Model Support

The evaluator supports these model types:

  • OpenAI: Standard OpenAI API endpoints
  • Azure OpenAI: Azure-hosted OpenAI services

Model configuration is passed automatically from the Evaluator custom resource.
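
As a sketch of what the two configurations imply, here is client construction with the official `openai` Python package; the service implements its own HTTP client, so this is illustrative only, and the Azure endpoint and API version shown are placeholders:

```python
from openai import OpenAI, AzureOpenAI

def make_client(model_type: str):
    if model_type == "openai":
        # Standard OpenAI endpoint; key read from OPENAI_API_KEY by default.
        return OpenAI()
    if model_type == "azure":
        # Azure-hosted deployment; endpoint and api_version are placeholders.
        return AzureOpenAI(
            azure_endpoint="https://my-resource.openai.azure.com",
            api_version="2024-02-01",
        )
    raise ValueError(f"unsupported model type: {model_type}")
```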

Evaluation Parameters

The service uses optimized parameters for consistent evaluation:

  • Temperature: 0.1 (low for consistent scoring)
  • Max Tokens: 1000 (sufficient for detailed evaluation)
  • Timeout: 30 seconds per evaluation
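
Expressed as a chat-completions call via the `openai` package (again illustrative, not the service's actual client code), the parameters map as follows:

```python
from openai import OpenAI

client = OpenAI(timeout=30.0)  # 30-second timeout per evaluation

completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are an impartial judge. Score the response."},
        {"role": "user", "content": "Query: ...\nResponse: ..."},
    ],
    temperature=0.1,  # low temperature for consistent scoring
    max_tokens=1000,  # enough room for reasoning plus per-criterion scores
)
print(completion.choices[0].message.content)
```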

Monitoring

Monitor evaluator performance:

```bash
# Check service logs
kubectl logs -l app.kubernetes.io/name=evaluator-llm

# View evaluator status
kubectl get evaluator evaluator-llm -o yaml

# Monitor query evaluation phases
kubectl get query -w
```

Development

For local development:

```bash
cd services/evaluator-llm

# Install dependencies
make init

# Run locally
make dev

# Run tests
make test

# Check code quality
make lint
```

Architecture

The evaluator service consists of:

  • FastAPI Application: REST API server with async endpoints
  • LLM Evaluator: Core evaluation logic with structured prompting
  • LLM Client: HTTP client supporting OpenAI and Azure APIs
  • Type System: Pydantic models for request/response validation (sketched at the end of this section)

The service integrates with ARK through:

  • Evaluator CRD: Kubernetes custom resource for configuration
  • ValueSource Resolution: Dynamic address and model resolution
  • Operation Tracking: Telemetry and monitoring integration
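
The request and response formats from the API reference might map to Pydantic models along these lines (field names follow the examples above; everything else is an assumption):

```python
from pydantic import BaseModel

class Target(BaseModel):
    type: str
    name: str

class EvaluatedResponse(BaseModel):
    target: Target
    content: str

class EvaluationRequest(BaseModel):
    queryId: str
    input: str
    responses: list[EvaluatedResponse]

class EvaluationResult(BaseModel):
    score: str  # serialized as a string, e.g. "85"
    passed: bool
    metadata: dict[str, str]  # reasoning and per-criterion scores
```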
