
End-to-End Testing

ARK uses Chainsaw to declaratively create resources, run scripts, and validate resource state. For example, we can create agents, teams, and queries, then validate the status of each and the success of a query or evaluation.

Setup

Install Tools

Set up your cluster and install the testing tools:

```shell
make quickstart
```

Install Chainsaw CLI

Install chainsaw for running tests locally:

```shell
# Install with Go
go install github.com/kyverno/chainsaw@latest

# Install with Homebrew
brew tap kyverno/chainsaw https://github.com/kyverno/chainsaw
brew install kyverno/chainsaw/chainsaw
```

Running Tests Locally

Simulate GitHub E2E Environment

To replicate the GitHub workflow environment locally:

```shell
# Install k3d and create test cluster
brew install k3d
k3d cluster create ark-e2e

# Setup ARK with all dependencies (cert-manager, postgres, etc.)
./.github/actions/setup-e2e/setup-local.sh

# Run preferred chainsaw tests...
(cd tests && chainsaw test --selector '!evaluated')

# Cleanup
k3d cluster delete ark-e2e
```

Model Tests

Use the models e2e test as a sample:

```shell
# Setup required env vars - these are pre-configured for GitHub actions.
export E2E_TEST_AZURE_OPENAI_KEY="your-key"
export E2E_TEST_AZURE_OPENAI_BASE_URL="your-endpoint"

# Run any specific tests.
chainsaw test ./tests/models --fail-fast
```

Test Execution Details

Chainsaw tests will:

  • Check required environment variables are set (e.g., API keys)
  • Apply the test resources in a new namespace
  • Assert the resources reach the expected state
  • Clean up resources after test completion

You can see the resources that are created in the namespace during test execution in the chainsaw output.
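Sketched as a minimal Chainsaw test (names and file paths here are illustrative, not from the repo), the lifecycle above looks like:

```yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: minimal-lifecycle   # illustrative name
spec:
  steps:
    - try:
        - apply:            # create test resources in a fresh namespace
            file: resources.yaml
        - assert:           # block until the expected state is reached
            file: assert-ready.yaml
# the namespace and its resources are cleaned up automatically afterwards
```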

Testing Workflows Locally

Use act to test GitHub workflows locally:

```shell
# Install act, then run workflows locally
act pull_request
```

Developing New Tests

Test Structure

Chainsaw tests follow this typical pattern:

```yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: azure-openai-model-test
spec:
  steps:
    # Validate required environment variables
    - name: check-env-vars
      try:
        - script:
            content: |
              if [ -z "$E2E_TEST_AZURE_OPENAI_KEY" ]; then
                echo "E2E_TEST_AZURE_OPENAI_KEY is required"
                exit 1
              fi
    # Generate templated resources and apply them
    - name: apply
      try:
        - script:
            content: |
              kustomize build manifests | envsubst > /tmp/test-resources.yaml
        - apply:
            file: /tmp/test-resources.yaml
      finally:
        - script:
            content: rm -f /tmp/test-resources.yaml
    # Wait for model to reach ready state
    - name: assert
      try:
        - assert:
            file: assert-ready.yaml
```

Writing Test Assertions

Create assertion files to validate resource states:

```yaml
# assert-ready.yaml
apiVersion: v1alpha1
kind: Model
metadata:
  name: test-model
status:
  phase: ready
```

Environment Variable Templating

Use envsubst for dynamic resource generation:

```yaml
# In your manifest template
apiVersion: v1alpha1
kind: Model
metadata:
  name: test-model
spec:
  source: azure-openai
  config:
    endpoint: $E2E_TEST_AZURE_OPENAI_BASE_URL
    apiKey: $E2E_TEST_AZURE_OPENAI_KEY
```

Test Organization

Structure tests by component:

  • tests/models/ - Core model resource tests
  • services/{service}/test/ - Service-specific integration tests
  • ark/test/e2e/ - Controller and webhook tests

Debugging Tests

Verbose Output

Run chainsaw with verbose flags for debugging:

```shell
# Detailed output
chainsaw test ./tests/models --verbose

# Keep test namespaces for inspection
chainsaw test ./tests/models --cleanup=false

# Run specific test steps
chainsaw test ./tests/models --test-dir=specific-test
```

Inspecting Resources

When tests fail, inspect the created resources:

```shell
# List namespaces created by chainsaw
kubectl get ns | grep chainsaw

# Check resources in test namespace
kubectl get all -n chainsaw-test-namespace

# View logs from failed pods
kubectl logs -n chainsaw-test-namespace pod/failing-pod
```

Summarizing Chainsaw Test Results

For a quick summary of your Chainsaw test results, you can use the provided scripts/chainsaw_summary.py script. This script reads a Chainsaw JSON report and prints a concise table showing which tests passed or failed.

Usage

  1. Run your Chainsaw tests with JSON reporting enabled (e.g., chainsaw test ... --report-json /tmp/coverage-reports/chainsaw-report.json).

  2. Run the summary script:

    python3 scripts/chainsaw_summary.py /tmp/coverage-reports/chainsaw-report.json

    If you omit the report path, it defaults to /tmp/coverage-reports/chainsaw-report.json.

Example Output

```
Test Name            | Result
---------------------|----------
query-model-target   | ✅ Passed
admission-failures   | ❌ Failed
query-label-selector | ✅ Passed
query-event-recorder | ✅ Passed
queries              | ✅ Passed
models               | ✅ Passed
```

  3. Include the evaluation summary with the --append-evals flag:

    python3 scripts/chainsaw_summary.py --append-evals

Example Output

```
Evaluation            | Score | Evaluator
----------------------|-------|--------------
chicago-weather-query | 30    | evaluator-llm
research-query        | 95    | evaluator-llm
```

Common Issues

Environment Variables Not Set

  • Ensure all required env vars are exported before running tests
  • Use env | grep TEST to verify variables are set
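A small helper can make the first check mechanical; `require_env` is a hypothetical name for illustration, not a script in the repo:

```shell
# Hypothetical helper: fail fast when a required variable is unset or empty.
require_env() {
  for name in "$@"; do
    # indirect variable lookup without bash-specific syntax
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is required" >&2
      return 1
    fi
  done
}

# Example:
# require_env E2E_TEST_AZURE_OPENAI_KEY E2E_TEST_AZURE_OPENAI_BASE_URL
```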

Resource Not Ready

  • Increase timeout in assertion files
  • Check controller logs for resource processing errors
  • Verify all dependencies are deployed
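Timeouts can be raised per test in the Chainsaw `Test` spec; the durations and names below are illustrative:

```yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: slow-resource-test   # illustrative name
spec:
  timeouts:
    apply: 60s    # how long apply operations may take
    assert: 300s  # give slow resources longer to become ready
  steps:
    - try:
        - assert:
            file: assert-ready.yaml
```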

Test Namespace Conflicts

  • Use unique test names to avoid namespace collisions
  • Clean up previous test runs with --cleanup=true

Available Environment Variables for GitHub Actions

These environment variables are available on GitHub runners for your tests:

| Variable | Description |
| --- | --- |
| `E2E_TEST_AZURE_OPENAI_KEY` | Azure OpenAI API key for testing model deployments |
| `E2E_TEST_AZURE_OPENAI_BASE_URL` | Azure OpenAI endpoint URL (e.g., `https://your-instance.openai.azure.com`) |

HTTP API Testing with Hurl

Overview

Hurl is used to test the HTTP APIs of services within chainsaw tests. It provides a full-featured HTTP client with JSONPath validation and test assertions.

Service Test Structure

Services with HTTP APIs use this test structure:

```
services/{service-name}/test/
├── test.hurl                    # HTTP test definitions
├── chainsaw-test.yaml           # Chainsaw integration
└── manifests/
    ├── pod-{service}-test.yaml  # Test pod with hurl image
    └── configmap.yaml           # ConfigMap mounting hurl files
```

Basic Hurl Test Patterns

Health Check Testing

```hurl
# Test service health endpoint
GET http://service-name/health
HTTP 200
[Asserts]
body == "OK"
```

JSON API Testing

```hurl
# Test JSON endpoint with validation
GET http://service-name/api/endpoint
HTTP 200
[Asserts]
jsonpath "$.status" == "ready"
jsonpath "$.data" exists
jsonpath "$.data.items" count >= 1
```

Sending a JSON Body

```hurl
# Send JSON data to API
PUT http://service-name/api/resource/session-id
Content-Type: application/json
{
  "data": {
    "field": "value",
    "items": ["item1", "item2"]
  }
}
HTTP 200
[Asserts]
jsonpath "$.success" == true
```

Real-World Examples

PostgreSQL Memory Service

From services/postgres-memory/test/test.hurl:

```hurl
# Test message storage and retrieval
PUT http://postgres-memory/message/test-session
Content-Type: application/json
{
  "message": {
    "role": "user",
    "content": "Test message"
  }
}
HTTP 200

# Verify message retrieval
GET http://postgres-memory/message/test-session
HTTP 200
[Asserts]
jsonpath "$.messages" count == 1
jsonpath "$.messages[0].role" == "user"
jsonpath "$.messages[0].content" == "Test message"

# Test session isolation
GET http://postgres-memory/message/other-session
HTTP 200
[Asserts]
jsonpath "$.messages" == null
```

A2A Gateway Service

From services/a2agw/test/test.hurl:

```hurl
# Test agent discovery
GET http://a2agw:8080/agents
HTTP 200
[Asserts]
jsonpath "$" count >= 1
jsonpath "$[*]" contains "weather-bot"

# Test JSON-RPC messaging
POST http://a2agw:8080/agent/weather-bot/jsonrpc
Content-Type: application/json
{
  "jsonrpc": "2.0",
  "method": "message/send",
  "params": {
    "message": {
      "kind": "message",
      "messageId": "test-1",
      "role": "user",
      "parts": [{"text": "What's the weather?"}]
    }
  },
  "id": 1
}
HTTP 200
[Asserts]
jsonpath "$.jsonrpc" == "2.0"
jsonpath "$.result.messageId" exists
```

Chainsaw Integration

Test Pod Setup

```yaml
# Pod with hurl Docker image
- apply:
    resource:
      apiVersion: v1
      kind: Pod
      metadata:
        name: service-test
      spec:
        containers:
          - name: test-client
            image: ghcr.io/orange-opensource/hurl:6.1.1
            command: ["sleep", "300"]
            volumeMounts:
              - name: test-files
                mountPath: /tests
        volumes:
          - name: test-files
            configMap:
              name: hurl-test-files
```

Test Execution

```yaml
# Execute hurl tests inside pod
- name: run-hurl-tests
  try:
    - script:
        content: |
          kubectl exec service-test -n $NAMESPACE -- hurl --test /tests/test.hurl
        timeout: 120s
```

Combined Testing Pattern

Services typically combine HTTP API testing with ARK integration testing:

```yaml
# First test HTTP endpoints directly
- name: run-hurl-tests
  try:
    - script:
        content: kubectl exec test-pod -- hurl --test /tests/test.hurl

# Then test ARK integration
- name: test-ark-integration
  try:
    - assert:
        resource:
          apiVersion: ark.mckinsey.com/v1alpha1
          kind: Query
          status:
            phase: done
```

This validates both the service’s HTTP API functionality and its integration with ARK resources.

Chainsaw Functions and Expressions

Basic Expressions

Comparison Operators

```yaml
# Assert that there are exactly 3 pods ready in a Deployment
(status.readyReplicas == `3`): true

# Assert that the number of available pods is not zero
(status.availableReplicas != `0`): true

# Assert that the number of pods is greater than or equal to 2
(status.replicas >= `2`): true

# Assert that the number of unavailable pods is less than 1
(status.unavailableReplicas < `1`): true

# Combined conditions: at least 2 pods, but no more than 5
(status.replicas >= `2` && status.replicas <= `5`): true

# Either all pods are ready, or the deployment is progressing
(status.readyReplicas == status.replicas || status.conditions[*].type contains 'Progressing'): true
```

Type Conversion

```yaml
# Convert string to number
(to_number(evaluations[0].score) >= `0`): true

# Convert to string
(to_string(evaluations[0].passed) == 'true'): true

# Type checking
status:
  (type(evaluations[0].evaluatorName) == 'string'): true
  (type(tokenUsage.promptTokens) == 'number'): true
  (type(evaluations[0].passed) == 'boolean'): true
  (type(evaluations) == 'array'): true
  (type(evaluations[0]) == 'object'): true
```

Array and Object Functions

Array Operations

```yaml
# Array length
(length(responses)): 1
(length(evaluations) > `0`): true

# Array contains
(contains(responses[*].target.name, 'agent-name')): true
(contains(['a', 'b', 'c'], 'b')): true

# Array indexing
(responses[0].content != ''): true
(evaluations[0].passed): true
```

Object Operations

```yaml
# Check if field exists
(has(evaluations[0].metadata)): true
(has(status.phase)): true

# Get object keys
(contains(keys(evaluations[0].metadata), 'reasoning')): true
(length(keys(metadata)) > `0`): true

# Get object values
(contains(values(metadata), 'success')): true
```

String Functions

String Operations

```yaml
# String length
(length(responses[0].content) > `50`): true

# String contains
(contains(responses[0].content, 'Chicago')): true
(contains(responses[0].content, 'weather')): true

# String join
(length(join('', responses[*].content)) > `50`): true
(join(',', responses[*].target.name) == 'agent1,agent2'): true

# String matching
(responses[0].content =~ 'pattern'): true
```

Advanced Validation Patterns

Range Validation

```yaml
# Numeric range (0-100)
(to_number(evaluations[0].score) >= `0` && to_number(evaluations[0].score) <= `100`): true

# String length range
(length(responses[0].content) >= `10` && length(responses[0].content) <= `1000`): true
```

Multi-condition Validation

```yaml
# All conditions must be true
(evaluations[0].passed && to_number(evaluations[0].score) > `70` && evaluations[0].evaluatorName != ''): true

# At least one condition must be true
(contains(responses[0].content, 'Chicago') || contains(responses[0].content, 'chicago') || contains(responses[0].content, 'CHICAGO')): true
```

Nested Field Validation

```yaml
# Deep object access
(evaluations[0].metadata.reasoning != ''): true
(responses[0].target.type == 'agent'): true

# Array of objects
(responses[*].target.name contains 'agent-name'): true
(evaluations[*].passed contains true): true
```

Common ARK Testing Patterns

Resource Status Validation

```yaml
# Basic resource ready state
status:
  phase: ready

# Query completion
status:
  phase: done
  (length(responses) > `0`): true
  (length(evaluations) > `0`): true
```

Evaluation Validation

```yaml
# Complete evaluation check
(length(evaluations)): 1
(has(evaluations[0].passed)): true
(to_number(evaluations[0].score) >= `0` && to_number(evaluations[0].score) <= `100`): true
(evaluations[0].evaluatorName != ''): true
(has(evaluations[0].metadata)): true
(contains(keys(evaluations[0].metadata), 'reasoning')): true
```

Response Content Validation

```yaml
# Response existence and content
(length(responses)): 1
(contains(responses[*].target.name, 'agent-name')): true
(length(responses[0].content) > `10`): true

# Multiple content patterns
(contains(responses[0].content, 'Chicago') || contains(responses[0].content, 'chicago')): true
(contains(responses[0].content, 'weather') || contains(responses[0].content, 'forecast') || contains(responses[0].content, 'temperature')): true
```

Error Handling

```yaml
# Check for absence of errors
(has(status.error)): false
(status.error == null): true

# Validate error states
status:
  phase: error
  (has(status.error)): true
  (status.error != ''): true
```

Best Practices

Readable Assertions

```yaml
# Good: Multiple clear assertions
(length(evaluations)): 1
(evaluations[0].passed): true
(to_number(evaluations[0].score) >= `70`): true

# Avoid: Complex single assertion
(length(evaluations) == 1 && evaluations[0].passed && to_number(evaluations[0].score) >= `70`): true
```

Defensive Validation

```yaml
# Check existence before accessing
(has(evaluations[0])): true
(has(evaluations[0].metadata)): true
(contains(keys(evaluations[0].metadata), 'reasoning')): true

# Validate types
(type(evaluations[0].passed) == 'boolean'): true
(type(evaluations[0].score) == 'string'): true
```

Flexible Content Matching

```yaml
# Case-insensitive matching
(contains(to_lower(responses[0].content), 'chicago')): true

# Multiple acceptable values
(evaluations[0].passed in [true, false]): true
(status.phase in ['done', 'completed']): true
```

Additional Testing Approaches

Go-based E2E Tests

For controller-specific testing, use the Go-based e2e tests:

```shell
cd ark/
make setup-test-e2e   # Setup Kind cluster
make test-e2e         # Run Ginkgo tests
make cleanup-test-e2e # Cleanup
```

These tests validate:

  • Controller deployment and health
  • Webhook configuration and certificates
  • Custom resource processing
  • Metrics endpoints
