
Troubleshooting

Troubleshooting guide for challenges when installing or using Ark.

Ark Installation

This section covers common issues and solutions when installing Ark on various platforms.

Prerequisites

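Before installing Ark, ensure you have Node.js, kubectl, Helm, and a Kubernetes cluster to install into (for example Minikube, Kind, or OrbStack).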

The easiest way to install dependencies on macOS:

brew install node kubectl helm
# Install a cluster such as:
# brew install minikube
# brew install kind
# brew install orbstack

Verify your cluster context:

kubectl config get-contexts
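As a quick pre-flight check, the sketch below reports which of the tools installed above are missing from your PATH. The helper name missing_tools is illustrative and is not part of the Ark CLI.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not part of the Ark CLI.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: check the tools this guide installs with Homebrew.
# missing_tools node kubectl helm
```

Empty output means every listed tool was found.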

Common Installation Issues

Installing Ark on Windows

Ark requires a Kubernetes cluster to run. On Windows, a number of issues can arise from permissions, organizational policies and so on, particularly around running hypervisors (which are needed to run services like Docker and local Kubernetes clusters).

WSL and Docker

In most cases, you should install and enable Windows Subsystem for Linux with Ubuntu or another Linux distribution.

You should then install Docker Desktop for Windows. Hyper-V must be enabled: enable it, restart, and then verify in “Task Manager” → “Performance” → “CPU” that “Virtualization” shows as “Enabled”.

Kubernetes Cluster

Docker Desktop for Windows has built-in Kubernetes support. Go to Docker Desktop Settings → Kubernetes → Enable Kubernetes.

Another option is Minikube for Windows. Install it and start a cluster with minikube start.

Windows Installation Troubleshooting

Many Windows installation issues stem from enterprise security policies and restricted environments. Potential issues are:

  • Virtualization disabled - Hyper-V is often disabled by policy for security reasons
  • Restricted user permissions - Standard users may not be able to modify system settings
  • Security software - Endpoint protection may flag container operations as suspicious

Node.js Installation Hangs

The Node.js installer has been seen to hang on some Windows installs. When this happens, terminate the installer. You can typically then run npm install -g @agents-at-scale/ark and continue as normal; if that fails, retry the Node.js installation.

Missing Dependencies

If ark status shows that kubectl or helm is missing, your local cluster tooling is not set up; follow the guides above. The easiest way to install the dependencies is via Chocolatey:

# Check system status
ark status

# Install Chocolatey first (requires PowerShell as Administrator)
Set-ExecutionPolicy Bypass -Scope Process -Force
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072
iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

# Install the core dependencies.
choco install docker
choco install helm

# Install minikube if you want to use this as your cluster - this'll also install
# the kubectl binary.
choco install minikube

# Other cluster options are Kind and so on.
# choco install kind

Minikube fails with “Hyper-V PowerShell Module is not available”

Enable Hyper-V (requires restart):

Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V-Tools-All -All

If the error persists after restart, check whether your organization has a specific tool or process for enabling Hyper-V. Virtualization may also need to be enabled at the BIOS level.

Minikube “IP not found” Errors

Update WSL, delete and recreate the minikube cluster:

wsl --update
minikube delete
minikube start

If you are using Docker Desktop, ensure it is running with the permissions it needs.

Security Software Blocking Operations

If you are unable to install Hyper-V, WSL2, Docker or Kubernetes, you may need to work with your organization's IT department to request access to these services.

Download Failures

Failed downloads, network errors and disconnects while fetching packages or installers are sometimes caused by a corporate VPN. If your policies allow it, temporarily disabling the VPN may help.

Avoiding Kubernetes on Windows

If Kubernetes or Docker cannot be run in your environment, the best solution would be to run a remote Kubernetes cluster, on AWS/GCP or similar. Check the Operations Guide on how to do this.

Docker Desktop Issues

If you are using Docker Desktop to launch your Kubernetes cluster, you may encounter an issue where pods fail to start. When you describe those pods, you may see an error like this:

Warning Failed pod/ark-controller-xxxxxxxxx-xxxx Error: container has runAsNonRoot and image will run as root

This happens because the Docker Desktop Kubernetes cluster tries to run the containers as root when it should not.

To fix this, in your Docker Desktop Kubernetes setup:

  • Stop the Kubernetes cluster
  • Start the Kubernetes cluster again, but change the cluster type from Kubeadm to kind
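To confirm that this is the error affecting your pods before changing the cluster type, you can filter warning events for the message shown above. This is a minimal sketch; find_root_image_failures is a hypothetical helper name.

```shell
#!/bin/sh
# List warning events that carry the runAsNonRoot error, across all namespaces.
# find_root_image_failures is a hypothetical helper name.
# Exit status is non-zero when no matching event is found.
find_root_image_failures() {
  kubectl get events --all-namespaces --field-selector type=Warning 2>/dev/null \
    | grep 'container has runAsNonRoot and image will run as root'
}

# Example:
# find_root_image_failures || echo "no runAsNonRoot failures found"
```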

Webhook Call Fails on Kind Cluster

See issue #393 for tracking of this bug.

Error Message

When running Ark locally on a Kind cluster (especially multi-node clusters), model creation or installation fails with a timeout error when attempting to reach the validating webhook. The same setup works without issues on Minikube.

ark-api dev:ark-api logs HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Internal error occurred: failed calling webhook \"vmodel-v1.kb.io\": failed to call webhook: Post \"https://ark-webhook-service.ark-system.svc:443/validate-ark-mckinsey-com-v1alpha1-model?timeout=10s\": context deadline exceeded","reason":"InternalError","details":{"causes":[{"message":"failed calling webhook \"vmodel-v1.kb.io\": failed to call webhook: Post \"https://ark-webhook-service.ark-system.svc:443/validate-ark-mckinsey-com-v1alpha1-model?timeout=10s\": context deadline exceeded"}]},"code":500}

Cause

This failure occurs because the webhook service (ark-webhook-service) cannot be reached from inside the Kind cluster within the default 10-second timeout. This is more common in multi-node Kind clusters due to networking differences and potential delays in service discovery.
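Before changing any settings, it can help to rule out the simpler case where the webhook Service has no backing pod at all. The sketch below checks whether ark-webhook-service in ark-system (both names taken from the error message) has any endpoint addresses. The helper name is hypothetical, and a passing check does not prove the network path is fast enough.

```shell
#!/bin/sh
# Succeed when the webhook Service has at least one endpoint address.
# Service and namespace names come from the error message above;
# webhook_has_endpoints is a hypothetical helper name.
webhook_has_endpoints() {
  ips=$(kubectl get endpoints ark-webhook-service -n ark-system \
    -o jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null)
  [ -n "$ips" ]
}

# Example:
# webhook_has_endpoints || echo "ark-webhook-service has no ready endpoints"
```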

Fix

The issue can be resolved by increasing the webhook timeout or, as a temporary workaround, by changing the failure policy. Update your Helm values using one of the options below:

Option 1: Increase Timeout (Recommended)

Create or update a values.yaml file with:

webhook:
  enable: true
  timeoutSeconds: 30 # Increase from default 10s to 30s
  failurePolicy: Fail

Then upgrade your installation:

helm upgrade ark-controller ./ark/dist/chart -f values.yaml -n ark-system
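After the upgrade, it can be worth confirming that the new timeout actually reached the cluster. The sketch below assumes the chart renders a ValidatingWebhookConfiguration containing the vmodel-v1.kb.io webhook named in the error above. Both helper names are hypothetical.

```shell
#!/bin/sh
# Print name, timeoutSeconds and failurePolicy (tab-separated) for every
# validating webhook registered in the cluster.
list_webhook_settings() {
  kubectl get validatingwebhookconfigurations \
    -o jsonpath='{range .items[*].webhooks[*]}{.name}{"\t"}{.timeoutSeconds}{"\t"}{.failurePolicy}{"\n"}{end}'
}

# Succeed only if the named webhook reports the expected timeout in seconds.
webhook_timeout_is() {
  list_webhook_settings | awk -F '\t' -v name="$1" -v want="$2" \
    '$1 == name && $2 == want { found = 1 } END { exit !found }'
}

# Example:
# webhook_timeout_is vmodel-v1.kb.io 30 || echo "timeout was not updated"
```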

Option 2: Use Ignore Policy (workaround for local development only; not advised, and the codebase should keep Fail)

If increasing the timeout doesn’t work, you can temporarily set the failure policy to Ignore:

webhook:
  enable: true
  timeoutSeconds: 30
  failurePolicy: Ignore # Allows resources to be created even if webhook fails

Note: Using Ignore means webhook validation will be bypassed, which is not recommended for production but can be used as a temporary workaround for local development.

Access Denied When Installing agents-at-scale

Error Message

Release "agents-at-scale" does not exist. Installing it now.
Error: GET "https://ghcr.io/v2/mck-private/qb-fm-labs-legacyx/charts/legacyx/tags/list": response status code 403: denied: permission_denied: read_package

Cause

The issue occurs when ark install attempts to pull the agents-at-scale Helm chart from the GitHub Container Registry (ghcr.io) under the private mck-private organization. If your GitHub token is missing the read:packages scope or is not authorized for SSO access to that organization, the registry request fails with a 403 Permission Denied error.
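One way to check the scope half of this diagnosis is to ask the GitHub API which scopes your token carries. The sketch below assumes a classic personal access token stored in a GITHUB_TOKEN environment variable (an assumed name); fine-grained tokens do not report scopes this way, and SSO authorization cannot be verified with this check. It also assumes that write:packages implies read access. Helper names are hypothetical.

```shell
#!/bin/sh
# Print the OAuth scopes GitHub reports for a classic personal access token.
# Reads the token from GITHUB_TOKEN (an assumed variable name).
token_scopes() {
  curl -sS -I -H "Authorization: Bearer ${GITHUB_TOKEN}" https://api.github.com/user \
    | tr -d '\r' \
    | awk -F ': ' 'tolower($1) == "x-oauth-scopes" { print $2 }'
}

# Succeed if the comma-separated scope list on stdin grants package read access.
# Assumption: write:packages is treated as including read access.
has_read_packages() {
  tr ',' '\n' | tr -d ' ' | grep -Eqx 'read:packages|write:packages'
}

# Example:
# token_scopes | has_read_packages || echo "token lacks read:packages"
```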

Workaround

A fix for this issue is in progress. Until it lands, you can install agents-at-scale manually by following the setup steps in its repository, ‘agent-at-scale-user’.

This ensures the required charts are locally available for Ark to reference.

403 Denied When Pulling Helm Charts from GHCR

Error Message

Error: GET "https://ghcr.io/v2/.../tags/list": response status code 403: denied: denied

Cause

Your GitHub Personal Access Token (PAT) has expired.

Fix

Re-authenticate with a valid PAT:

docker login ghcr.io
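If you prefer not to type the token at an interactive prompt, docker login can read it from standard input. This is a minimal sketch; the environment variable names GITHUB_USER and GITHUB_TOKEN and the helper name ghcr_login are assumptions.

```shell
#!/bin/sh
# Log in to ghcr.io non-interactively, passing the token on stdin so it does
# not appear in the process list or shell history.
# GITHUB_USER and GITHUB_TOKEN are assumed environment variable names.
ghcr_login() {
  if [ -z "${GITHUB_USER:-}" ] || [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_USER and GITHUB_TOKEN must both be set" >&2
    return 1
  fi
  printf '%s' "$GITHUB_TOKEN" | docker login ghcr.io -u "$GITHUB_USER" --password-stdin
}

# Example:
# GITHUB_USER=<your-username> GITHUB_TOKEN=<new-pat> ghcr_login
```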

‘Field is immutable’ errors on install or upgrade

When installing or upgrading Ark, you may encounter errors like the following if a previous installation exists that was not installed with the expected installation name, or that has only been partly uninstalled:

# Example 1:
Error: rendered manifests contain a resource that already exists. Unable to continue with install: Deployment "ark-controller-manager" in namespace "ark-system" exists and cannot be imported into the current release

# Example 2:
The CustomResourceDefinition "agents.ark.ai" is invalid: spec.versions[0].schema.openAPIV3Schema.properties[spec].properties[model].properties[name]: Invalid value: "string": field is immutable

# Example 3:
Error: UPGRADE FAILED: cannot patch "ark-controller-manager" with kind Deployment: Deployment.apps "ark-controller-manager" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/name":"ark", "app.kubernetes.io/instance":"ark"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable

The fastest way to deal with this is to re-install on a clean cluster (e.g. for local development by doing minikube delete && minikube start).

To clean up manually, find and delete all Ark resources:

# Search for any ark installation, then uninstall it.
helm list --all-namespaces | grep ark
# For each namespace/installation found:
# helm uninstall <installation_found> -n <namespace_found>

# Go through each ark resource type (agent, query etc)...
kubectl get crds | grep ark | awk '{print $1}' | while read crd; do
  # Delete all resources of this type, then the definition itself.
  kubectl delete "$crd" --all --all-namespaces --force --grace-period=0 --timeout=10s || true
  kubectl delete crd "$crd" --force --grace-period=0
done
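Because the cleanup is destructive, it may be safer to print the uninstall commands first and review them before running anything. The sketch below does that for the helm list step. It matches "ark" anywhere in the line, as the grep above does, so check each line of output. The helper name is hypothetical.

```shell
#!/bin/sh
# Print a "helm uninstall" command for every release line containing "ark".
# Nothing is executed; review the output before running any of it.
# print_ark_uninstall_commands is a hypothetical helper name.
print_ark_uninstall_commands() {
  helm list --all-namespaces 2>/dev/null \
    | awk 'NR > 1 && /ark/ { printf "helm uninstall %s -n %s\n", $1, $2 }'
}

# Example:
# print_ark_uninstall_commands
```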

Agents

Agents without a model show as ‘Unavailable’

Agents created prior to version v0.1.34 that do not have a model set may show as “Unavailable” or have events such as ModelNotFound.

To resolve, set the agent modelRef to default. See v0.1.34 Upgrade Guide for more details.
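For an agent that already exists in the cluster, one possible way to apply this is a merge patch. The sketch below assumes the reference lives at spec.modelRef.name and that the resource can be addressed as agent; confirm both against the Agent CRD and the upgrade guide before using it. Helper names and arguments are hypothetical.

```shell
#!/bin/sh
# Build a JSON merge patch that points an agent at the model named "default".
# Assumption: the field path is spec.modelRef.name; verify against the Agent CRD.
default_model_patch() {
  printf '{"spec":{"modelRef":{"name":"default"}}}'
}

# Apply the patch to one agent. $1 is the agent name, $2 is its namespace.
set_default_model() {
  kubectl patch agent "$1" -n "$2" --type merge -p "$(default_model_patch)"
}

# Example:
# set_default_model <agent-name> <namespace>
```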
