---
title: "Observability"
description: "Execution-layer observability for agentsh — session reports, database proxy events, OpenTelemetry export, Watchtower transport, and audit log integrity."
doc_version: "1.0"
last_updated: "2026-05-29"
canonical: "https://www.agentsh.org/docs/observability/"
---

# Observability

Execution-layer observability for agentsh: monitor, audit, and trace everything your agents do with session reports, OpenTelemetry export, and tamper-evident audit logs.

## Session Reports

Generate markdown reports summarizing session activity for auditing, debugging, and compliance.

```bash
# Quick summary of latest session
agentsh report latest --level=summary

# Detailed investigation with full timeline
agentsh report <session-id> --level=detailed --output=report.md

# Offline mode (no server required)
agentsh report latest --level=summary --direct-db
```

### Report Levels

| Level | Contents |
| --- | --- |
| `summary` | Overview, activity counts, security findings, decision summary |
| `detailed` | Everything in summary plus command history, file access, network connections, resource usage, and full event timeline |

### Example Summary Report

```text
# Session Report: sess-abc123

**Generated:** 2025-01-15T10:31:00Z
**Report Level:** Summary

## Session Overview

| Property | Value |
|----------|-------|
| Session ID | sess-abc123 |
| Duration | 25s |
| Workspace | /home/user/project |

## Activity Summary

| Metric | Count |
|--------|-------|
| Commands Executed | 6 |
| Files Accessed | 1 |
| Network Connections | 2 |
| Policy Denials | 2 |

## Security Findings

### Critical
- **Dangerous command blocked**: `rm -rf /` - rm -rf blocked for safety

### Warning
- **Network access denied**: Connection to `internal.corp.local:80` blocked

## Policy Decisions

| Decision | Count |
|----------|-------|
| Allow | 5 |
| Deny | 2 |
| Redirect | 0 |
```

### Example Detailed Report (excerpt)

````text
## Command History

| Time | Command | Decision | Exit Code | Duration |
|------|---------|----------|-----------|----------|
| 10:30:01 | `ls -la` | allow | 0 | 126ms |
| 10:30:05 | `git status` | allow | 0 | 149ms |
| 10:30:15 | `rm -rf /` | **deny** | - | - |
| 10:30:20 | `curl https://api.github.com` | allow | 0 | 499ms |

## Network Connections

| Time | Domain | Port | Decision | Rule |
|------|--------|------|----------|------|
| 10:30:20 | api.github.com | 443 | allow | github.com allowed |
| 10:30:25 | internal.corp.local | 80 | **deny** | internal networks blocked |

## Event Timeline

```
10:30:00.000 [session_created] Session started in /home/user/project
10:30:01.123 [command_policy] ls -la → allow
10:30:15.000 [command_policy] rm -rf / → DENY (rm -rf blocked)
10:30:20.010 [net_connect] api.github.com:443 → allow
10:30:25.010 [net_connect] internal.corp.local:80 → DENY
```
````

## OpenTelemetry Export

agentsh exports audit events as **OpenTelemetry log records** via OTLP to any OTEL-compatible collector (Grafana Alloy, Datadog Agent, Honeycomb, etc.).

```yaml
audit:
  otel:
    enabled: true
    endpoint: otel-collector.internal:4317
    protocol: grpc   # grpc or http
```

Events are converted to OTEL `LogRecord`s, batched, and exported asynchronously. If the collector is unreachable, the SDK retries with exponential backoff and silently drops events after exhausting retries. The primary SQLite store always has the authoritative copy. Export failures never block the caller, and if the OTEL store fails to initialize at startup, the server logs an error and continues without it.

**Environment variable overrides**

`AGENTSH_OTEL_ENDPOINT` and `AGENTSH_OTEL_PROTOCOL` override the config file. The standard `OTEL_EXPORTER_OTLP_ENDPOINT` is also respected as a fallback when `AGENTSH_OTEL_ENDPOINT` is not set.
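The precedence described above can be sketched in a few lines of Python. This is an illustration, not agentsh's implementation: the function name `resolve_otel_endpoint` is hypothetical, and the sketch assumes that an empty or whitespace-only variable counts as unset and that the standard `OTEL_EXPORTER_OTLP_ENDPOINT` variable, when used as the fallback, also takes priority over the config file.

```python
import os


def resolve_otel_endpoint(config_endpoint, env=None):
    """Return the endpoint to use, checking env vars before the config value.

    Assumed order: AGENTSH_OTEL_ENDPOINT, then OTEL_EXPORTER_OTLP_ENDPOINT,
    then the value from the config file.
    """
    env = os.environ if env is None else env
    for name in ("AGENTSH_OTEL_ENDPOINT", "OTEL_EXPORTER_OTLP_ENDPOINT"):
        value = env.get(name, "").strip()
        if value:  # assumption: empty or whitespace-only means "not set"
            return value
    return config_endpoint
```

Passing `env` explicitly keeps the function easy to exercise without touching the real process environment.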

**Plaintext warning**

When `tls.enabled` is `false`, agentsh logs a warning at startup: *OTEL export is configured without TLS; event data will be sent in plaintext*. Enable TLS for production deployments.

### Configuration

```yaml
audit:
  otel:
    enabled: false
    endpoint: localhost:4317       # collector host:port
    protocol: grpc                 # grpc or http

    tls:
      enabled: false
      cert_file: ""               # client certificate
      key_file: ""                # client key
      insecure: false             # skip server cert verification (dev only)

    headers:                        # custom headers (e.g. auth tokens)
      Authorization: "Bearer ${OTEL_TOKEN}"

    timeout: 10s                   # export timeout per batch

    signals:
      logs: true                   # export as OTEL log records
      spans: true                  # accepted but not yet implemented

    batch:
      max_size: 512                # records per batch
      timeout: 5s                 # auto-flush interval

    filter:
      include_types: []             # glob patterns: ["file_*", "net_*"]
      exclude_types: []             # glob patterns: ["file_stat"]
      include_categories: []        # exact: ["file", "network"]
      exclude_categories: []
      min_risk_level: ""            # low, medium, high, or critical

    resource:
      service_name: agentsh         # OTEL resource service.name
      extra_attributes: {}          # additional resource key-values
```

| Field | Default | Description |
| --- | --- | --- |
| `enabled` | `false` | Enable OTEL event export. When `false` the entire OTEL pipeline is skipped. |
| `endpoint` | `localhost:4317` | Collector address (`host:port`). Required when enabled—validation fails without it. |
| `protocol` | `grpc` | `grpc` or `http` (OTLP). Any other value is rejected at startup. |
| `tls.enabled` | `false` | Enable TLS for the exporter connection. When `false`, a plaintext warning is logged. |
| `tls.cert_file` | *none* | Path to client certificate for mutual TLS. Only used when `tls.enabled` is `true`. |
| `tls.key_file` | *none* | Path to client key for mutual TLS. Must be set together with `cert_file`. |
| `tls.insecure` | `false` | Skip server certificate verification. Development only—do not use in production. |
| `headers` | *none* | Custom HTTP headers sent with every export request (e.g. `Authorization` tokens). |
| `timeout` | `10s` | Export timeout per batch. Must be a valid Go duration string. |
| `signals.logs` | `true`* | Export events as OTEL log records. |
| `signals.spans` | `true`* | Accepted in config but span export is not yet implemented; has no effect. |
| `batch.max_size` | `512` | Maximum records per export batch. When reached the batch is flushed immediately. |
| `batch.timeout` | `5s` | Auto-flush interval. Pending records are exported even if the batch is not full. |
| `resource.service_name` | `agentsh` | OTEL `service.name` resource attribute. |
| `resource.extra_attributes` | *none* | Additional key-value pairs added to the OTEL resource (e.g. `deployment.environment: prod`). |

\* `signals` default: when **neither** `logs` nor `spans` is explicitly set, both default to `true`. If you explicitly set one (e.g. `logs: true`), the other stays `false`.
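The `signals` defaulting rule is easy to misread, so here is a small Python sketch of it. The function name `resolve_signals` is hypothetical, and `None` is used to stand in for "not present in the config file".

```python
from typing import Optional, Tuple


def resolve_signals(logs: Optional[bool] = None,
                    spans: Optional[bool] = None) -> Tuple[bool, bool]:
    """Return the effective (logs, spans) pair for the documented defaults."""
    if logs is None and spans is None:
        # Neither key was set explicitly: both signals default to true.
        return True, True
    # At least one key was set: any key left unset stays false.
    return bool(logs), bool(spans)
```

For example, `resolve_signals(logs=True)` yields `(True, False)`, matching the case called out in the footnote.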

**TLS in production**

When `tls.enabled` is `true` and `tls.insecure` is `false` (the default), the OS certificate store is used for server verification. Supply `cert_file` and `key_file` for mutual TLS.

### Event Filtering

Filters reduce export volume by selecting which events reach the collector. When all filter fields are empty (the default), every event is exported.

| Filter | Type | When empty | Semantics |
| --- | --- | --- | --- |
| `include_types` | glob list | All types pass | Event type must match at least one pattern |
| `exclude_types` | glob list | Nothing excluded | Events matching any pattern are dropped |
| `include_categories` | exact list | All categories pass | Event category must be in the list |
| `exclude_categories` | exact list | Nothing excluded | Events in matching categories are dropped |
| `min_risk_level` | string | No risk filtering | Only export events at or above this level (`low` < `medium` < `high` < `critical`). Events that do not carry a risk level always pass this filter. |

Evaluation order: include types → include categories → exclude types → exclude categories → min risk level. Glob patterns support `*` and `?` wildcards. Valid values for `min_risk_level` are `low`, `medium`, `high`, and `critical`—any other value is rejected at startup.

```yaml
# Only high-risk file and network events
filter:
  include_categories: [file, network]
  min_risk_level: high

# Everything except noisy stat/list operations
filter:
  exclude_types: [file_stat, dir_list]
```
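To make the evaluation order concrete, the following Python sketch applies the five checks in the documented sequence. It is not agentsh's code: the function name `should_export` and the event dictionary keys (`type`, `category`, `risk_level`) are assumptions for illustration. Note also that Python's `fnmatchcase` accepts `[seq]` character classes in addition to `*` and `?`, which is more than the documented glob syntax.

```python
from fnmatch import fnmatchcase

RISK_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def should_export(event, flt):
    """Return True if the event passes every configured filter."""
    etype = event.get("type", "")
    category = event.get("category", "")

    # 1. include_types: when non-empty, the type must match some pattern.
    include_types = flt.get("include_types") or []
    if include_types and not any(fnmatchcase(etype, p) for p in include_types):
        return False

    # 2. include_categories: when non-empty, the category must be listed.
    include_categories = flt.get("include_categories") or []
    if include_categories and category not in include_categories:
        return False

    # 3. exclude_types: any matching pattern drops the event.
    if any(fnmatchcase(etype, p) for p in flt.get("exclude_types") or []):
        return False

    # 4. exclude_categories: a listed category drops the event.
    if category in (flt.get("exclude_categories") or []):
        return False

    # 5. min_risk_level: events without a risk level always pass.
    min_risk = flt.get("min_risk_level") or ""
    risk = event.get("risk_level") or ""
    if min_risk and risk in RISK_ORDER:
        return RISK_ORDER[risk] >= RISK_ORDER[min_risk]
    return True
```

With the first example filter above, a `file_write` event at risk `low` is dropped, while the same event with no risk level attached is exported.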

### Watchtower Transport

The Watchtower transport sends compacted audit events to a Watchtower endpoint and can receive signed policy pushes for live enforcement updates. v0.20.1 introduced an explicit `audit.watchtower.agent_id` field for the operator-visible identity on the wire.

```yaml
audit:
  watchtower:
    enabled: true
    endpoint: "watchtower.internal:443"
    agent_id: "agent-edge-001"
    session_id: ""   # optional; generated when empty
```

When `agent_id` is unset or whitespace, agentsh falls back to `<hostname>-<pid>`. If hostname lookup fails, the fallback uses `unknown-<pid>`.
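A minimal Python sketch of that fallback, for readers who want to predict the identity a given host will report. The function name `effective_agent_id` is hypothetical, and trimming whitespace from a configured value is an assumption rather than documented behavior.

```python
import os
import socket


def effective_agent_id(configured):
    """Return the configured agent_id, or a <hostname>-<pid> fallback."""
    agent_id = (configured or "").strip()
    if agent_id:
        return agent_id
    try:
        host = socket.gethostname() or "unknown"
    except OSError:
        # Hostname lookup failed: use the documented "unknown" prefix.
        host = "unknown"
    return f"{host}-{os.getpid()}"
```

Because the fallback embeds the process ID, it changes on every daemon restart, which is one reason to set `agent_id` explicitly.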

Policy pushes are installed only when `policies.dir` and `policies.signing.trust_store` are configured. agentsh verifies the pushed Ed25519 signature against the local trust store, checks the wire content hash, writes the policy YAML and companion `.sig` atomically, reloads the policy manager, and swaps the active engine when available. Without those local policy-signing settings, the daemon logs receipt of the pushed policy but leaves enforcement unchanged.

### Event Reference

Every operation intercepted by agentsh produces a typed event. Events flow to all configured stores (SQLite, JSONL, webhooks, OTEL). Use the type and category names below with `include_types`, `exclude_types`, `include_categories`, and `exclude_categories` filters.

#### File (category: `file`)

| Event Type | Description |
| --- | --- |
| `file_open` | File opened for reading or writing |
| `file_read` | File contents read |
| `file_write` | File contents written |
| `file_create` | New file created |
| `file_delete` | File deleted |
| `file_rename` | File renamed or moved |
| `file_stat` | File metadata queried |
| `file_chmod` | File permissions changed |
| `dir_create` | Directory created |
| `dir_delete` | Directory deleted |
| `dir_list` | Directory contents listed |

#### Network (category: `network`)

| Event Type | Description |
| --- | --- |
| `dns_query` | DNS resolution attempt |
| `net_connect` | Outbound TCP/network connection |
| `net_listen` | Socket bound for listening |
| `net_accept` | Incoming connection accepted |
| `dns_redirect` | DNS resolution redirected to different address |
| `connect_redirect` | Network connection redirected to different destination |
| `connect_redirect_fallback` | Redirect target unreachable, fell back to original destination |

#### Database (category: `database`)

| Event Type | Description |
| --- | --- |
| `db_statement` | Postgres statement or governed CancelRequest evaluated by the DB proxy. Fields include `db_service`, `effects`, `statement_digest`, `decision`, `result`, `tx_context`, and redirect metadata. |
| `db_listener_auth_fail` | Connection to the per-session DB proxy listener failed SessionID peer authentication. |
| `db_handshake_fail` | Postgres startup, TLS, or auth forwarding failed closed. |
| `degraded_visibility_warning` | Policy allowed a DB mode where statement visibility is degraded, such as replication or GSS encryption passthrough. |
| `db_cancel_unmatched` | CancelRequest did not match a known proxy-owned BackendKeyData mapping. |
| `db_cancel_after_disconnect` | CancelRequest arrived after the original backend connection was gone. |
| `db_cancel_forward_failed` | Proxy failed while forwarding an allowed cancel request upstream. |
| `db_cancel_mapping_fail` | Proxy could not create or commit the synthetic cancel-key mapping. |
| `db_bypass_attempt` | Generated DB unavoidability rule denied direct DB access. Fields include `db_service`, `bypass_mode`, `destination`, `process_identity`, and `suppressed_count`. |

#### Process (category: `process`)

| Event Type | Description |
| --- | --- |
| `process_start` | Process started |
| `process_spawn` | Child process created |
| `process_exit` | Process exited |
| `process_tree_kill` | Entire process tree terminated |

#### Environment (category: `environment`)

| Event Type | Description |
| --- | --- |
| `env_read` | Environment variable read |
| `env_write` | Environment variable set or modified |
| `env_list` | Environment variables enumerated |
| `env_blocked` | Environment variable access blocked by policy |

#### Trash (category: `trash`)

| Event Type | Description |
| --- | --- |
| `soft_delete` | File diverted to trash instead of deleted |
| `trash_restore` | File restored from trash |
| `trash_purge` | Trash entries permanently purged |

#### Shell (category: `shell`)

| Event Type | Description |
| --- | --- |
| `shell_invoke` | Shell shim intercepted a shell invocation |
| `shell_passthrough` | Shell shim bypassed (not in agentsh mode) |
| `session_autostart` | Server auto-started by shim on first invocation |

#### Command (category: `command`)

| Event Type | Description |
| --- | --- |
| `command_intercept` | Command evaluated by the policy engine |
| `command_redirect` | Command redirected to a different binary |
| `command_blocked` | Command denied by policy |
| `path_redirect` | File path redirected to a different location |

#### Resource (category: `resource`)

| Event Type | Description |
| --- | --- |
| `resource_limit_set` | Resource limits applied to process or session |
| `resource_limit_warning` | Resource usage approaching configured threshold |
| `resource_limit_exceeded` | Resource limit exceeded |
| `resource_usage_snapshot` | Periodic resource usage snapshot |

#### Cgroup (category: `cgroup`)

| Event Type | Description |
| --- | --- |
| `cgroup_mode` | Cgroup v2 mode probed at startup. Fields include `mode`, `reason`, `own_cgroup`, `slice_dir`, `io_available`, and `leaf_moved`. |
| `cgroup_orphans_reaped` | Orphaned cgroup members were cleaned up during session or slice maintenance. |
| `cgroup_unavailable_refusal` | Resource-limit or network enforcement refused to continue because required cgroup support was unavailable. |

#### IPC (category: `ipc`)

| Event Type | Description |
| --- | --- |
| `unix_socket_connect` | Unix domain socket connection |
| `unix_socket_bind` | Unix domain socket bound |
| `unix_socket_blocked` | Unix socket operation blocked by policy |
| `named_pipe_open` | Windows named pipe opened |
| `named_pipe_blocked` | Windows named pipe blocked by policy |
| `ipc_observed` | IPC activity detected (audit only, no enforcement) |

#### Seccomp (category: `seccomp`)

| Event Type | Description |
| --- | --- |
| `seccomp_blocked` | Process killed by seccomp for a blocked syscall |
| `notify_handler_panic` | Seccomp notify handler recovered from a panic (includes stack trace) |
| `seccomp_file_denied` | File operation denied by seccomp-notify file enforcement |
| `seccomp_file_emulated` | File open emulated via AddFD (supervisor opened file and injected fd) |
| `seccomp_io_uring_blocked` | io_uring syscall blocked to prevent seccomp bypass |
| `seccomp_socket_family_blocked` | `socket(2)` / `socketpair(2)` denied for a blocked `AF_*` family (`engine` field reports `seccomp` or `ptrace`) |

#### Signal (category: `signal`)

| Event Type | Description |
| --- | --- |
| `signal_sent` | Signal delivered to a process |
| `signal_blocked` | Signal blocked by policy |
| `signal_redirected` | Signal redirected to a different target or signal number |
| `signal_absorbed` | Signal absorbed (not delivered) |
| `signal_approved` | Signal approved after pending human approval |
| `signal_would_deny` | Signal would be denied (audit mode, not enforced) |

#### MCP (category: `mcp`)

| Event Type | Description |
| --- | --- |
| `mcp_tool_seen` | MCP tool detected and registered |
| `mcp_tool_changed` | MCP tool definition changed (rug-pull detection) |
| `mcp_tool_called` | MCP tool call observed in agent request |
| `mcp_detection` | MCP security pattern detected |
| `mcp_tool_call_intercepted` | MCP tool call evaluated by proxy (allow or block) |
| `mcp_cross_server_blocked` | Cross-server attack rule triggered (shadow, burst, read-then-send, flow) |
| `mcp_network_connection` | Network connection to a known MCP server address |
| `mcp_server_name_similarity` | MCP server name suspiciously similar to a known server (typosquat detection) |

#### Policy (category: `policy`)

| Event Type | Description |
| --- | --- |
| `policy_loaded` | Policy loaded (at startup, reload, or via API) |
| `policy_changed` | Active policy replaced with a new version |

#### Package (category: `package`)

| Event Type | Description |
| --- | --- |
| `package_check_started` | Package install security check initiated for a command |
| `package_check_completed` | Package check finished with an overall verdict |
| `package_blocked` | Package install blocked by policy (critical vulnerability, malware, etc.) |
| `package_approved` | Package install approved after human approval or policy allow |
| `package_warning` | Package check produced warnings but install was permitted |
| `package_provider_error` | A check provider failed (timeout, API error, rate-limited) |

#### Ptrace (category: `ptrace`)

| Event Type | Description |
| --- | --- |
| `ptrace_attached` | Ptrace tracer attached to a process |
| `ptrace_detached` | Ptrace tracer detached from a process |
| `ptrace_syscall_blocked` | Syscall blocked by policy via ptrace interception |
| `ptrace_syscall_redirected` | Syscall arguments modified for redirect/steering |
| `ptrace_dns_redirected` | DNS resolution redirected via ptrace DNS proxy |
| `ptrace_connect_redirected` | Outbound connection steered to alternative endpoint |
| `ptrace_tls_sni_rewritten` | TLS Server Name Indication rewritten on outbound connection |
| `ptrace_signal_blocked` | Signal syscall blocked by policy via ptrace interception |
| `ptrace_signal_redirected` | Signal number rewritten for kill/tgkill/tkill |
| `ptrace_prefilter_injected` | seccomp BPF pre-filter injected into tracee |
| `ptrace_tracee_timeout` | Held tracee released after timeout |
| `ptrace_exit_verify_denied` | Exit-time path verification denied an openat (symlink bypass detected) |
| `ptrace_soft_delete` | File soft-deleted via ptrace syscall injection (unlinkat → renameat2) |
| `ptrace_vfork_fastpath` | Syscall fast-pathed in vfork child (skipped policy evaluation) |

### Attributes

Each log record carries attributes following OTEL semantic conventions where applicable, plus agentsh-specific fields under the `canyonroad.*` namespace.

#### Semantic Conventions

| Attribute | Description |
| --- | --- |
| `process.pid` | Process ID |
| `process.parent_pid` | Parent process ID |
| `process.executable.path` | Binary path |

#### agentsh Namespace

| Attribute | Description |
| --- | --- |
| `canyonroad.product` | Product identifier (always `"agentsh"`) |
| `canyonroad.event.id` | Unique event identifier |
| `canyonroad.event.type` | Event type (always present) |
| `canyonroad.session.id` | Session identifier |
| `canyonroad.command.id` | Command identifier |
| `canyonroad.source` | Event source |
| `canyonroad.path` | File path |
| `canyonroad.domain` | Network domain |
| `canyonroad.remote` | Remote address |
| `canyonroad.operation` | Operation name |
| `canyonroad.effective_action` | Final action taken |
| `canyonroad.decision` | Policy decision |
| `canyonroad.policy.rule` | Matching policy rule |

#### Well-Known Fields

These are extracted from the event's `fields` map when present. Only non-empty values are included.

| Attribute | Type | Description |
| --- | --- | --- |
| `canyonroad.risk_level` | string | Risk level (low/medium/high/critical) |
| `canyonroad.agent_id` | string | Agent identifier |
| `canyonroad.agent_type` | string | Agent type |
| `canyonroad.agent_framework` | string | Agent framework name |
| `canyonroad.tenant_id` | string | Tenant identifier |
| `canyonroad.workspace_id` | string | Workspace identifier |
| `canyonroad.policy_name` | string | Name of the matching policy |
| `canyonroad.latency_us` | int | Total latency in microseconds |
| `canyonroad.queue_time_us` | int | Queue wait time in microseconds |
| `canyonroad.policy_eval_us` | int | Policy evaluation time in microseconds |
| `canyonroad.intercept_us` | int | Intercept processing time in microseconds |
| `canyonroad.backend_us` | int | Backend processing time in microseconds |
| `canyonroad.error` | string | Error message |
| `canyonroad.error_code` | string | Error code |
| `canyonroad.ptrace.syscall` | string | Syscall name (ptrace events only) |
| `canyonroad.ptrace.tracee_pid` | int | Traced process PID |
| `canyonroad.ptrace.redirect_target` | string | Redirect destination (DNS/connect redirects) |
| `canyonroad.ptrace.signal` | int | Signal number (signal events only) |
| `canyonroad.ptrace.attach_mode` | string | Ptrace attach mode (children or pid) |

#### Severity Mapping

The policy decision determines the log record severity:

| Decision | Severity |
| --- | --- |
| `allow`, `audit` | INFO |
| `redirect`, `approve`, `soft_delete` | WARN |
| `deny` | ERROR |
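The mapping can be expressed as a lookup table, shown here as a Python sketch. The severity numbers (9, 13, 17) are the base values OpenTelemetry assigns to INFO, WARN, and ERROR. The function name `severity_for` is hypothetical, and returning `None` for a decision the table does not list is an assumption, since the table does not say how such events are handled.

```python
# Decision -> (severity text, OTEL severity number)
SEVERITY_BY_DECISION = {
    "allow": ("INFO", 9),
    "audit": ("INFO", 9),
    "redirect": ("WARN", 13),
    "approve": ("WARN", 13),
    "soft_delete": ("WARN", 13),
    "deny": ("ERROR", 17),
}


def severity_for(decision):
    """Return (text, number) for a decision, or None if it is not in the table."""
    return SEVERITY_BY_DECISION.get(decision)
```

A collector-side alert rule can then key on `severityNumber >= 17` to catch denials without parsing the record body.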

#### Trace Correlation

If an event's `fields` map contains `trace_id` (32-hex-char) and/or `span_id` (16-hex-char), they are attached to the log record for correlation with distributed traces.

#### Log Record Body

Each record's body is a human-readable summary:

```text
file_write: /workspace/test.go [allow]
net_connect: 1.2.3.4:443 [deny]
dns_query: example.com [redirect]
process_start
```
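The body format shown above appears to be `type: target [decision]`, with the optional parts omitted when absent. The Python sketch below reproduces that shape. The function name `record_body` is hypothetical, and how agentsh chooses the target (path, domain, or remote address) is not specified here, so the caller passes it in directly.

```python
def record_body(event_type, target="", decision=""):
    """Build a human-readable body such as 'file_write: /path [allow]'."""
    body = event_type
    if target:
        body += f": {target}"
    if decision:
        body += f" [{decision}]"
    return body
```

Since the body is meant for humans, queries and alerts are better built on the structured attributes than on parsing this string.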

### Example OTLP Log Records

Below is how two events appear as OTLP JSON log records after conversion. This is the payload sent to the collector.

```json
{
  "resourceLogs": [{
    "resource": {
      "attributes": [
        { "key": "service.name", "value": { "stringValue": "agentsh" } }
      ]
    },
    "scopeLogs": [{
      "scope": { "name": "agentsh" },
      "logRecords": [
        {
          "timeUnixNano": "1708200015000000000",
          "severityNumber": 9,
          "severityText": "INFO",
          "body": { "stringValue": "file_write: /workspace/main.go [allow]" },
          "attributes": [
            { "key": "process.pid",              "value": { "intValue": "48201" } },
            { "key": "process.executable.path",  "value": { "stringValue": "/usr/bin/node" } },
            { "key": "canyonroad.product",           "value": { "stringValue": "agentsh" } },
            { "key": "canyonroad.event.type",       "value": { "stringValue": "file_write" } },
            { "key": "canyonroad.event.id",         "value": { "stringValue": "evt-9f3a2b" } },
            { "key": "canyonroad.session.id",       "value": { "stringValue": "sess-abc123" } },
            { "key": "canyonroad.path",             "value": { "stringValue": "/workspace/main.go" } },
            { "key": "canyonroad.operation",        "value": { "stringValue": "write" } },
            { "key": "canyonroad.decision",         "value": { "stringValue": "allow" } },
            { "key": "canyonroad.policy.rule",      "value": { "stringValue": "workspace-write" } },
            { "key": "canyonroad.risk_level",       "value": { "stringValue": "low" } },
            { "key": "canyonroad.agent_id",         "value": { "stringValue": "claude-code-1" } },
            { "key": "canyonroad.latency_us",       "value": { "intValue": "340" } }
          ],
          "traceId": "",
          "spanId": ""
        },
        {
          "timeUnixNano": "1708200020000000000",
          "severityNumber": 17,
          "severityText": "ERROR",
          "body": { "stringValue": "net_connect: 10.0.0.5:6379 [deny]" },
          "attributes": [
            { "key": "process.pid",              "value": { "intValue": "48201" } },
            { "key": "canyonroad.product",           "value": { "stringValue": "agentsh" } },
            { "key": "canyonroad.event.type",       "value": { "stringValue": "net_connect" } },
            { "key": "canyonroad.event.id",         "value": { "stringValue": "evt-c71d04" } },
            { "key": "canyonroad.session.id",       "value": { "stringValue": "sess-abc123" } },
            { "key": "canyonroad.remote",           "value": { "stringValue": "10.0.0.5:6379" } },
            { "key": "canyonroad.decision",         "value": { "stringValue": "deny" } },
            { "key": "canyonroad.policy.rule",      "value": { "stringValue": "no-internal-network" } },
            { "key": "canyonroad.risk_level",       "value": { "stringValue": "high" } },
            { "key": "canyonroad.agent_id",         "value": { "stringValue": "claude-code-1" } }
          ],
          "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
          "spanId": "00f067aa0ba902b7"
        }
      ]
    }]
  }]
}
```

The first record shows an allowed file write at `INFO` severity. The second shows a denied network connection at `ERROR` severity with trace correlation IDs set—these came from `trace_id` and `span_id` in the event's fields map.
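When debugging a pipeline, it can help to flatten a payload like this one and pull out the denied operations. The Python sketch below does that for the OTLP JSON shape shown above. The function names are hypothetical, and only `stringValue` and `intValue` attributes are handled because those are the only kinds that appear in the example.

```python
def flatten_attributes(attributes):
    """Turn OTLP key/value attribute pairs into a plain dict."""
    flat = {}
    for attr in attributes:
        value = attr.get("value", {})
        if "stringValue" in value:
            flat[attr["key"]] = value["stringValue"]
        elif "intValue" in value:
            # OTLP JSON encodes 64-bit integers as strings.
            flat[attr["key"]] = int(value["intValue"])
    return flat


def denied_records(payload):
    """Return one flat dict per log record whose decision is 'deny'."""
    denied = []
    for resource_logs in payload.get("resourceLogs", []):
        for scope_logs in resource_logs.get("scopeLogs", []):
            for record in scope_logs.get("logRecords", []):
                attrs = flatten_attributes(record.get("attributes", []))
                if attrs.get("canyonroad.decision") == "deny":
                    attrs["body"] = record.get("body", {}).get("stringValue", "")
                    attrs["trace_id"] = record.get("traceId", "")
                    denied.append(attrs)
    return denied
```

Applied to the example payload (after `json.loads`), `denied_records` would return a single entry for the blocked connection to `10.0.0.5:6379`, including its trace ID.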

### W3C Distributed Tracing

agentsh supports **W3C Trace Context** propagation, allowing you to correlate agentsh events with traces from external observability systems. When a trace context is set on a session, every subsequent event carries `trace_id`, `span_id`, and `trace_flags` that integrate with your existing OpenTelemetry pipelines.

#### Setting Trace Context via the REST API

External orchestrators or CI systems can inject a trace context into a running session:

```bash
# Set trace context for a session
curl -X PUT http://localhost:18080/api/v1/sessions/$SID/trace-context \
  -H "Content-Type: application/json" \
  -d '{
    "trace_id": "0af7651916cd43dd8448eb211c80319c",
    "span_id": "b7ad6b7169203331",
    "trace_flags": "01"
  }'
```

| Field | Format | Required | Description |
| --- | --- | --- | --- |
| `trace_id` | 32 hex chars | Yes | W3C trace identifier (must not be all zeros) |
| `span_id` | 16 hex chars | No | Parent span identifier (must not be all zeros) |
| `trace_flags` | 2 hex chars | No | Sampling flag: `01` = sampled, `00` = unsampled |

#### Propagation via gRPC and Environment

Trace context also propagates through two additional paths:

- **gRPC metadata:** The `traceparent` header is extracted from gRPC calls using the standard W3C format: `00-{trace_id}-{span_id}-{trace_flags}`

- **Environment variable:** When `TRACEPARENT` is set, the agentsh client automatically propagates it to gRPC calls
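For orchestrators that need to produce or validate a `traceparent` value before handing it to agentsh, here is a small Python sketch of the format described above. The function names are hypothetical. The sketch accepts only version `00` and lowercase hex, and rejects all-zero IDs, in line with the field rules in the table above.

```python
import re

_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header):
    """Parse a W3C traceparent header into its fields, or return None."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, span_id, trace_flags = match.groups()
    # All-zero identifiers are invalid.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return {"trace_id": trace_id, "span_id": span_id, "trace_flags": trace_flags}


def format_traceparent(ctx):
    """Build a traceparent header from a parsed context dict."""
    return f"00-{ctx['trace_id']}-{ctx['span_id']}-{ctx['trace_flags']}"
```

The dictionary returned by `parse_traceparent` uses the same field names as the REST request body, so it can be serialized directly for the `trace-context` endpoint.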

#### Event Injection

Once trace context is set, all events emitted during command execution—file I/O, network connections, policy decisions—include the trace fields:

```text
{
  "type": "file_write",
  "session_id": "sess-abc123",
  "trace_id": "0af7651916cd43dd8448eb211c80319c",
  "span_id": "b7ad6b7169203331",
  "trace_flags": "01",
  "path": "/workspace/main.go"
}
```

When exported via OTEL, the `trace_id` and `span_id` are set on the log record's span context, enabling direct correlation in tools like Grafana Tempo, Jaeger, or Honeycomb. Upstream sampling decisions are respected—an unsampled trace flag (`00`) is propagated as-is rather than being forced to sampled.

## Audit Log Integrity

agentsh audit logs are chained with HMACs (keyed hashes) for tamper detection. Each log entry contains an `integrity` field that links it to the previous entry, forming a cryptographic chain that detects any insertion, deletion, or modification of records.

#### Chain structure

Every audit log entry includes an `integrity` object:

```json
{
  "integrity": {
    "sequence": 42,
    "prev_hash": "a1b2c3d4...",
    "entry_hash": "e5f6a7b8..."
  }
}
```

- **sequence** — monotonically increasing counter, detects deletion

- **prev_hash** — the `entry_hash` of the previous entry, links the chain

- **entry_hash** — `HMAC(key, sequence | prev_hash | canonical_JSON_payload)`

The same input always produces the same hash. A single modified, inserted, or deleted entry breaks the chain from that point forward.
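The chaining idea can be illustrated with a short Python sketch using HMAC-SHA256. The wire details are assumptions, not agentsh's actual encoding: the sketch joins the three inputs with a literal `|` byte, canonicalizes the payload as compact JSON with sorted keys, starts the sequence at 0, and uses an empty `prev_hash` for the first entry. Hashes produced this way are therefore not expected to match real agentsh logs.

```python
import hashlib
import hmac
import json


def canonical_json(payload):
    """Assumed canonical form: sorted keys, no insignificant whitespace."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def entry_hash(key, sequence, prev_hash, payload):
    """HMAC over sequence | prev_hash | canonical payload (assumed layout)."""
    message = b"|".join(
        [str(sequence).encode(), prev_hash.encode(), canonical_json(payload)]
    )
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def append_entry(key, chain, payload):
    """Append a payload to the chain, linking it to the previous entry."""
    if chain:
        last = chain[-1]["integrity"]
        sequence, prev_hash = last["sequence"] + 1, last["entry_hash"]
    else:
        sequence, prev_hash = 0, ""  # assumption: genesis values
    entry = dict(payload)
    entry["integrity"] = {
        "sequence": sequence,
        "prev_hash": prev_hash,
        "entry_hash": entry_hash(key, sequence, prev_hash, payload),
    }
    chain.append(entry)
    return entry
```

Because each `entry_hash` covers the previous one, changing any earlier payload changes every hash that a verifier recomputes after it.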

#### Configuration

```yaml
audit:
  integrity:
    enabled: true
    algorithm: "hmac-sha256"     # or "hmac-sha512"
    key_source: "file"           # file, env, aws_kms, azure_keyvault, hashicorp_vault, gcp_kms
    key_file: "/etc/agentsh/hmac.key"
```

For enterprise key management, use an external KMS provider. See [Setup → Audit Log Integrity](https://www.agentsh.org/docs/setup/#audit-integrity-config) for the full configuration reference including AWS KMS, Azure Key Vault, HashiCorp Vault, and GCP Cloud KMS.

#### Encryption at rest

Audit data can also be encrypted at rest using AES-256-GCM, independently of the integrity chain:

```yaml
audit:
  encryption:
    enabled: true
    key_source: "file"
    key_file: "/etc/agentsh/encrypt.key"
```

#### Verification

Verify the integrity chain of any audit log file offline:

```bash
# Verify audit log integrity with a key file
agentsh audit verify /var/log/agentsh/audit.log --key-file /etc/agentsh/hmac.key

# Verify using an environment variable for the key
agentsh audit verify /var/log/agentsh/audit.log --key-env AGENTSH_HMAC_KEY

# Use SHA-512 algorithm
agentsh audit verify /var/log/agentsh/audit.log --key-file hmac.key --algorithm hmac-sha512
```

The verify command reads JSONL audit log entries line by line, checks `prev_hash` chain continuity, recomputes each HMAC to verify `entry_hash`, and reports any chain breaks with line number and reason.
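The verification steps can be sketched in Python as follows. This illustrates the general technique rather than the `agentsh audit verify` implementation. The message layout (`sequence|prev_hash|payload` over compact, key-sorted JSON) and the explicit sequence check are assumptions, so this sketch should not be expected to validate real agentsh logs.

```python
import hashlib
import hmac
import json


def verify_chain(lines, key, digest=hashlib.sha256):
    """Return a list of (line_number, reason) tuples, one per chain break."""
    problems = []
    prev_hash = None
    prev_seq = None
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        entry = json.loads(line)
        integrity = entry.pop("integrity", None)
        if integrity is None:
            problems.append((lineno, "missing integrity"))
            continue
        # Chain continuity: prev_hash must equal the previous entry_hash.
        if prev_hash is not None and integrity["prev_hash"] != prev_hash:
            problems.append((lineno, "prev_hash mismatch"))
        # Assumed extra check: sequence numbers increase by exactly one.
        if prev_seq is not None and integrity["sequence"] != prev_seq + 1:
            problems.append((lineno, "sequence gap"))
        # Recompute the HMAC over the assumed message layout.
        payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
        message = f"{integrity['sequence']}|{integrity['prev_hash']}|{payload}"
        expected = hmac.new(key, message.encode("utf-8"), digest).hexdigest()
        if not hmac.compare_digest(expected, integrity["entry_hash"]):
            problems.append((lineno, "entry_hash mismatch"))
        prev_hash = integrity["entry_hash"]
        prev_seq = integrity["sequence"]
    return problems
```

Using `hmac.compare_digest` keeps the comparison constant-time. Passing `digest=hashlib.sha512` would correspond to the `hmac-sha512` option.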

**Compliance evidence**

The HMAC integrity chain provides a tamper-evident audit trail of the kind expected under frameworks such as SOC 2, NIST AI RMF, and ISO 27001. Combined with encryption at rest and external KMS integration, it can serve as supporting evidence for enterprise security and compliance reviews.

