Logging and Events
Ark diagnostics and informational logging are handled in two ways: direct logging in the code, or raising events. The ‘Events’ page on the dashboard can be used to view and filter events.
Logs can be viewed in a number of ways, for example via k9s.
This guide covers how to view events and logs, and best practices for raising events and writing log messages.
Viewing Events
You can view events via kubectl:
# Show events for a specific resource
kubectl describe query my-query
# Show all events in namespace
kubectl get events --sort-by='.lastTimestamp'
# Show events for specific resource type
kubectl get events --field-selector involvedObject.kind=Query
You can also view events in k9s with the :events command.
Finally, events can be seen on the Events page in the Ark Dashboard.
Viewing Logs
View the controller logs with:
# Controller logs
kubectl logs -f deployment/ark-controller-manager -n ark-system
Use the --follow parameter (the -f shorthand above) to tail logs. You can also view logs in k9s: run the :namespace command, choose ark-system, and select the ark-controller pod. The log view offers many options to wrap, search, copy, and more.
Event and Logging Guidelines
Use events for:
- User-facing information visible in kubectl describe
- State transitions and lifecycle events
- Successful completion of significant operations
- Error reporting that application owners need to see
As an example, query execution emits these events:
- Normal/LLMCallComplete: LLM call with token usage details
- Normal/AgentExecutionComplete: agent execution completion with duration
- Normal/TargetExecutionComplete: target execution completion
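For illustration, emitting one of these events from a reconciler might look like the sketch below. The eventData format and the token variables are assumptions, not necessarily the exact format Ark uses:
// Hypothetical token counts; Ark's actual message format may differ.
eventData := fmt.Sprintf("prompt_tokens=%d completion_tokens=%d total_tokens=%d",
    promptTokens, completionTokens, totalTokens)
r.Recorder.Event(&query, corev1.EventTypeNormal, "LLMCallComplete", eventData)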
Use logging for:
- Internal debugging and troubleshooting
- Detailed operational information
- Development insights
- Expected failures (log as INFO, not ERROR)
As an example, controllers log operational details:
log.Info("Memory processed successfully", "memory", memory.Name, "address", resolvedAddress)
log.Error(err, "failed to resolve ExecutionEngine address", "executionEngine", executionEngine.Name)
// Expected failures like probe failures are INFO, not ERROR
log.Info("model probe failed", "model", model.Name, "status", errorMessage)
How to Log Events
Controllers with event recording include an EventRecorder field. Events must be either Normal (informational) or Warning (errors/warnings), as per the Kubernetes Event API documentation:
import (
    "fmt"

    corev1 "k8s.io/api/core/v1"
)
// Event recording in reconciler - follow {Resource}{Action}{State} convention
r.Recorder.Event(&query, corev1.EventTypeNormal, "QueryLLMCallComplete", eventData)
r.Recorder.Event(&query, corev1.EventTypeWarning, "QueryExecutionFailed", fmt.Sprintf("Error: %v", err))
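The recorder itself is typically obtained from the controller-runtime manager at setup time. A minimal sketch, assuming a QueryReconciler (the name is illustrative):
import (
    "k8s.io/client-go/tools/record"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// QueryReconciler carries the recorder alongside the client.
type QueryReconciler struct {
    client.Client
    Recorder record.EventRecorder
}

// In main.go, when constructing the reconciler (mgr is the controller-runtime manager):
r := &QueryReconciler{
    Client:   mgr.GetClient(),
    Recorder: mgr.GetEventRecorderFor("query-controller"),
}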
Stable Messages for Conditions and Events
Event and condition messages must be simple strings, not JSON. Importantly, condition messages must be stable: they should not include request-specific details that change on every attempt. This is because controllers watch for status changes. If a condition message includes changing details (like request IDs), it triggers unnecessary reconciliation loops.
For example, in this case a request ID is included in the error, so the condition message changes on every attempt and reconciliation is retriggered (creating a ‘tight loop’ and a lot of noise and performance issues):
r.setCondition(&model, "Available", metav1.ConditionFalse, "ModelProbeFailed",
    err.Error()) // the raw error string includes a request ID that changes on every attempt...
Compare with this example, which passes only the stable message:
r.setCondition(&model, "Available", metav1.ConditionFalse, "ModelProbeFailed",
    err.Message) // err.Message is stable across attempts...
This produces the following line in ‘events’, which is much clearer: the ‘x9 over 10m’ count shows repeated occurrences aggregated into one line, keeping the view clean:
Warning ModelProbeFailed 64s (x9 over 10m) model-controller The security token included in the request is invalid. (403)
The full details of the error can be stored in the logs instead.
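One way to achieve this separation is to carry the stable summary and the volatile details on a typed error, logging the full error while passing only the stable part to the condition. A sketch, assuming a hypothetical probeError type (not necessarily how Ark structures its errors):
import (
    "errors"
    "fmt"
)

// probeError separates a stable summary from volatile, per-attempt details.
type probeError struct {
    Message   string // stable summary, safe for condition messages
    Code      int    // stable status code
    RequestID string // changes on every attempt; logs only
}

func (e *probeError) Error() string {
    return fmt.Sprintf("%s (%d) [request-id: %s]", e.Message, e.Code, e.RequestID)
}

// Expected failure: log the full detail at INFO (per the guidance above),
// but set the condition with only the stable message.
log.Info("model probe failed", "model", model.Name, "error", err.Error())
var perr *probeError
if errors.As(err, &perr) {
    r.setCondition(&model, "Available", metav1.ConditionFalse, "ModelProbeFailed", perr.Message)
}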
Log Verbosity Configuration
The ARK controller supports configurable log verbosity levels (0-3, default 0):
- Level 0: Production monitoring - only query/model resolution events
- Level 1: Standard operations - includes all agent/team execution
- Level 2: Detailed debugging - adds LLM call tracking
- Level 3: Full debugging - includes response content
Configure via the ZAPLOGLEVEL environment variable, the --zap-log-level argument, or the Helm chart logLevel value.
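Assuming the controller logs through logr via controller-runtime (as the zap options suggest), these levels correspond to log.V(n) calls: a message is emitted only when the configured verbosity is at least n. A sketch with illustrative messages:
import (
    ctrl "sigs.k8s.io/controller-runtime"
)

log := ctrl.Log.WithName("query-controller")
log.Info("query resolved", "query", query.Name)               // level 0: always emitted
log.V(1).Info("agent execution started", "agent", agent.Name) // level 1 and above
log.V(2).Info("LLM call issued", "model", model.Name)         // level 2 and above
log.V(3).Info("LLM response received", "content", content)    // level 3 only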