End-to-End Testing
ARK uses Chainsaw to declaratively create resources, run scripts, and validate resource state. For example, we can create agents, teams, and queries, then validate the status of each and the success state of a query or evaluation.
Setup
Ensure you have installed ARK.
Install Chainsaw CLI
Install chainsaw for running tests locally:
# Install with Homebrew:
brew tap kyverno/chainsaw https://github.com/kyverno/chainsaw
brew install kyverno/chainsaw/chainsaw
# Or with Go:
go install github.com/kyverno/chainsaw@latest
Running Tests Locally
Simulate GitHub E2E Environment
To replicate the GitHub workflow environment locally:
# Install k3d and create test cluster
brew install k3d
k3d cluster create ark-e2e
# Run standard tests.
(cd tests && chainsaw test --selector 'standard')
# Run LLM tests. Requires model credentials to be configured.
(cd tests && chainsaw test --selector 'llm')
# Run evaluated tests (requires evals to be set up)
(cd tests && chainsaw test --selector 'evaluated')
Model Tests
Use the models e2e test as a sample:
# Setup required env vars - these are pre-configured for GitHub actions.
export E2E_TEST_AZURE_OPENAI_KEY="your-key"
export E2E_TEST_AZURE_OPENAI_BASE_URL="your-endpoint"
# Run any specific tests.
chainsaw test ./tests/models --fail-fast
Test Execution Details
Chainsaw tests will:
- Check required environment variables are set (e.g., API keys)
- Apply the test resources in a new namespace
- Assert the resources reach the expected state
- Clean up resources after test completion
You can see the resources that are created in the namespace during test execution in the chainsaw output.
Testing Workflows Locally
Use act to test GitHub workflows locally:
# Install act, then run workflows locally
act pull_request
Developing New Tests
Test Structure
Chainsaw tests follow this typical pattern:
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: azure-openai-model-test
spec:
  steps:
    # Validate required environment variables
    - name: check-env-vars
      try:
        - script:
            content: |
              if [ -z "$E2E_TEST_AZURE_OPENAI_KEY" ]; then
                echo "E2E_TEST_AZURE_OPENAI_KEY is required"
                exit 1
              fi
    # Generate templated resources and apply them
    - name: apply
      try:
        - script:
            content: |
              kustomize build manifests | envsubst > /tmp/test-resources.yaml
        - apply:
            file: /tmp/test-resources.yaml
      finally:
        - script:
            content: rm -f /tmp/test-resources.yaml
    # Wait for model to reach ready state
    - name: assert
      try:
        - assert:
            file: assert-ready.yaml
Writing Test Assertions
Create assertion files to validate resource states:
# assert-ready.yaml
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: test-model
status:
  conditions:
    - type: "Ready"
      status: "True"
      reason: "ModelResolved"
      message: "Model successfully resolved and validated"
      observedGeneration: 1
    - type: "Discovering"
      status: "False"
      reason: "ValidationComplete"
      message: "Model validation completed successfully"
      observedGeneration: 1
Environment Variable Templating
Use envsubst for dynamic resource generation:
# In your manifest template
apiVersion: ark.mckinsey.com/v1alpha1
kind: Model
metadata:
  name: test-model
spec:
  source: azure-openai
  config:
    endpoint: $E2E_TEST_AZURE_OPENAI_BASE_URL
    apiKey: $E2E_TEST_AZURE_OPENAI_KEY
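To check the substitution locally before running a test, render the template with envsubst and do a client-side dry run. A minimal sketch, assuming the template above is saved as a hypothetical model-template.yaml file:
# Export the variables the template expects, then render and dry-run apply.
export E2E_TEST_AZURE_OPENAI_BASE_URL="https://your-instance.openai.azure.com"
export E2E_TEST_AZURE_OPENAI_KEY="your-key"
envsubst < model-template.yaml | kubectl apply --dry-run=client -f -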
Test Organization
Structure tests by component:
- tests/models/ - Core model resource tests
- services/{service}/test/ - Service-specific integration tests
- ark/test/e2e/ - Controller and webhook tests
Debugging Tests
Verbose Output
Run chainsaw with verbose flags for debugging:
# Detailed output
chainsaw test ./tests/models --verbose
# Keep test namespaces for inspection
chainsaw test ./tests/models --cleanup=false
# Run tests from a specific directory
chainsaw test ./tests/models --test-dir=specific-test
Inspecting Resources
When tests fail, inspect the created resources:
# List namespaces created by chainsaw
kubectl get ns | grep chainsaw
# Check resources in test namespace
kubectl get all -n chainsaw-test-namespace
# View logs from failed pods
kubectl logs -n chainsaw-test-namespace pod/failing-pod
Summarizing Chainsaw Test Results
For a quick summary of your Chainsaw test results, you can use the provided scripts/chainsaw_summary.py script. This script reads a Chainsaw JSON report and prints a concise table showing which tests passed or failed.
Usage
- Run your Chainsaw tests with JSON reporting enabled (e.g., chainsaw test ... --report-json /tmp/coverage-reports/chainsaw-report.json).
- Run the summary script: python3 scripts/chainsaw_summary.py /tmp/coverage-reports/chainsaw-report.json. If you omit the report path, it defaults to /tmp/coverage-reports/chainsaw-report.json.
Example Output
Test Name | Result
------------------------------------------
query-model-target | ✅ Passed
admission-failures | ❌ Failed
query-label-selector | ✅ Passed
query-event-recorder | ✅ Passed
queries | ✅ Passed
models | ✅ Passed
- Include the evaluation summary with the --append-evals flag: python3 scripts/chainsaw_summary.py --append-evals
Example Output
Evaluation | Score | Evaluator
--------------------------------------------------
chicago-weather-query | 30 | evaluator-llm
research-query | 95 | evaluator-llm
Common Issues
Environment Variables Not Set
- Ensure all required env vars are exported before running tests
- Use env | grep TEST to verify variables are set
Resource Not Ready
- Increase the timeout on assert steps (see the sketch below)
- Check controller logs for resource processing errors
- Verify all dependencies are deployed
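A minimal sketch of raising an assert timeout, reusing the assert-ready.yaml assertion file from earlier; the 2m value is only an example and should be tuned to your environment:
- name: assert
  try:
    - assert:
        # Allow more time for slow controllers or cold clusters.
        timeout: 2m
        file: assert-ready.yaml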
Test Namespace Conflicts
- Use unique test names to avoid namespace collisions
- Clean up previous test runs with --cleanup=true
Available Environment Variables for GitHub Actions
These environment variables are available on GitHub runners for your tests:
| Variable | Description |
|---|---|
| E2E_TEST_AZURE_OPENAI_KEY | Azure OpenAI API key for testing model deployments |
| E2E_TEST_AZURE_OPENAI_BASE_URL | Azure OpenAI endpoint URL (e.g., https://your-instance.openai.azure.com) |
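As an illustration of how these variables can be wired into a workflow, the sketch below maps repository secrets with the same names into a job's environment; the job name and steps are hypothetical, and chainsaw/cluster setup is omitted for brevity:
jobs:
  e2e-llm:
    runs-on: ubuntu-latest
    env:
      # Assumes secrets with matching names exist in the repository.
      E2E_TEST_AZURE_OPENAI_KEY: ${{ secrets.E2E_TEST_AZURE_OPENAI_KEY }}
      E2E_TEST_AZURE_OPENAI_BASE_URL: ${{ secrets.E2E_TEST_AZURE_OPENAI_BASE_URL }}
    steps:
      - uses: actions/checkout@v4
      - run: (cd tests && chainsaw test --selector 'llm')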
Testing with Mocked LLMs, Mocked A2A Servers, or Mocked MCP Servers
We have built and use mock-llm to define how LLMs, A2A servers, and MCP servers should behave, which allows for deterministic testing of behaviours that rely on models.
Search the codebase for mock-llm to see examples.
HTTP API Testing with Hurl
Hurl is used for testing HTTP APIs of services within chainsaw tests. It provides comprehensive HTTP client functionality with JSON path validation and test assertions. This is used to test some MCP servers.
Search the codebase for hurl for examples.
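As a rough illustration of the format (the endpoint and JSON path below are hypothetical, not taken from an ARK service), a .hurl file pairs requests with assertions:
# Call a health endpoint and assert on the JSON response body.
GET http://localhost:8080/health
HTTP 200
[Asserts]
jsonpath "$.status" == "ok"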
Simple Test Example
Setting Up a Mock LLM Model
Install the mock-llm Helm chart. An Ark Model named mock-gpt-4.1 will be created:
# Mock-LLM doesn't need to gracefully end, so we can be fast terminating it.
terminationGracePeriodSeconds: 3
# When Mock-LLM is installed, create an Ark Model that points to it and uses
# the following configuration:
ark:
  model:
    enabled: true
    name: mock-gpt-4.1
    type: openai
    model: gpt-4.1
    pollInterval: 3s
    apiKey: mock-api-key
# Specify rules - how the model will respond to specific inputs.
config:
  # On any incoming completions request, echo the message.
  rules:
    - path: "/v1/chat/completions"
      match: "@"
      response:
        status: 200
        content: |
          {
            "id": "mock-{{timestamp}}",
            "object": "chat.completion",
            "model": "{{jmes request body.model}}",
            "choices": [{
              "message": {{jmes request body.messages[0]}},
              "finish_reason": "stop"
            }]
          }
Install in your chainsaw test:
- script:
    content: |
      helm install mock-llm oci://ghcr.io/dwmkerr/charts/mock-llm \
        --version 0.1.23 \
        --namespace $NAMESPACE \
        --values mock-llm-values.yaml
    env:
      - name: NAMESPACE
        value: ($namespace)
Wait for the Model to become available - this step will pass when the Mock-LLM server is responding to the health checks sent by the Ark controller and the model is Available:
- assert:
    # 10 seconds is usually enough for mock-llm and the controller to reconcile
    # the model.
    timeout: 10s
    resource:
      apiVersion: ark.mckinsey.com/v1alpha1
      kind: Model
      metadata:
        name: mock-gpt-4.1
      status:
        conditions:
          - type: ModelAvailable
            status: "True"
catch:
  - describe:
      apiVersion: ark.mckinsey.com/v1alpha1
      kind: Model
      name: mock-gpt-4.1
Keep a short timeout for asserts to keep tests deterministic - if resources fail to reach the expected conditions early, the test is likely failing.
Also use catch and describe to show events and resources if assertions fail.
Create an Agent and assert it’s available:
- apply:
    resource:
      apiVersion: ark.mckinsey.com/v1alpha1
      kind: Agent
      metadata:
        name: test-agent
      spec:
        modelRef:
          name: mock-gpt-4.1
        prompt: You are a test agent.
- assert:
    timeout: 10s # Agents are quick to create
    resource:
      apiVersion: ark.mckinsey.com/v1alpha1
      kind: Agent
      metadata:
        name: test-agent
      status:
        conditions:
          - type: Available
            status: "True"
catch:
  - describe:
      apiVersion: ark.mckinsey.com/v1alpha1
      kind: Agent
      name: test-agent
Create a Query and validate the response:
- apply:
    resource:
      apiVersion: ark.mckinsey.com/v1alpha1
      kind: Query
      metadata:
        name: test-query
      spec:
        input: What is your name?
        targets:
          - type: agent
            name: test-agent
catch:
  - events: {}
  - describe:
      apiVersion: ark.mckinsey.com/v1alpha1
      kind: Agent
      name: test-agent
- assert:
    timeout: 10s # Queries should be quick with mock-llm
    resource:
      apiVersion: ark.mckinsey.com/v1alpha1
      kind: Query
      metadata:
        name: test-query
      status:
        (conditions[?type == 'Completed']):
          - status: 'True'
catch:
  - events: {}
  - describe:
      apiVersion: ark.mckinsey.com/v1alpha1
      kind: Query
      name: test-query
Many more examples are in the ./tests/ folder.
Next Steps
- Services - Learn about ARK services
- Observability - Monitor your ARK applications