Observability is essential for production integrations. Without request-level traces and metrics, troubleshooting matching and rendering failures becomes guesswork.

What to log per request

Request ID / correlation ID
Endpoint path and status code
Retry count and latency
Quota-related failure markers (HTTP 429 responses, daily cap reached)
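
As a rough illustration, the fields above could be emitted as one JSON line per request. This is a minimal sketch, not a prescribed format: the logger name, the field names, the `daily_cap_hit` flag, and the `/v1/example` path are all assumptions made for the example.

```python
import json
import logging
import time
import uuid

# Hypothetical logger name; use whatever your application already configures.
logger = logging.getLogger("integration.requests")


def build_log_record(request_id, path, status_code, retry_count, latency_ms,
                     daily_cap_hit=False):
    """Assemble the per-request fields into one flat dict."""
    return {
        "request_id": request_id,
        "path": path,
        "status_code": status_code,
        "retry_count": retry_count,
        "latency_ms": round(latency_ms, 1),
        # Quota markers: rate limiting (429) and the daily cap are flagged separately.
        "rate_limited": status_code == 429,
        "daily_cap_hit": daily_cap_hit,
    }


def log_request(record):
    """Emit the record as a single JSON line so log tooling can parse it."""
    logger.info(json.dumps(record, sort_keys=True))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    started = time.monotonic()
    # ... the API call would happen here ...
    elapsed_ms = (time.monotonic() - started) * 1000
    # "/v1/example" is a placeholder path, not a real endpoint.
    log_request(build_log_record(str(uuid.uuid4()), "/v1/example", 429, 2, elapsed_ms))
```

Keeping every field in one line per request makes it straightforward to filter by request ID during an incident and to derive metrics from the same records.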

Metrics to track

Success rate by endpoint
P95 and P99 latency
429 volume over time
Queue depth and dead-letter counts
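
In practice a metrics backend usually computes these values, but the definitions can be sketched in a few lines. The example below assumes each request record is a dict with `path`, `status_code`, and `timestamp` (epoch seconds) keys, which are illustrative names only, and it uses the nearest-rank method for percentiles, which is one of several common definitions.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def success_rate_by_endpoint(records):
    """Share of requests per endpoint path that returned a 2xx status."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for rec in records:
        totals[rec["path"]] += 1
        if 200 <= rec["status_code"] < 300:
            successes[rec["path"]] += 1
    return {path: successes[path] / totals[path] for path in totals}


def rate_limited_per_minute(records):
    """Count 429 responses per minute bucket (epoch seconds floored to the minute)."""
    buckets = defaultdict(int)
    for rec in records:
        if rec["status_code"] == 429:
            buckets[int(rec["timestamp"]) // 60 * 60] += 1
    return dict(buckets)
```

For example, `percentile(latencies_ms, 95)` and `percentile(latencies_ms, 99)` give the P95 and P99 values for a list of latency samples. Queue depth and dead-letter counts depend on the queueing system in use and are not covered by this sketch.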

Incident triage flow

Check error class distribution (4xx vs 5xx)
Check auth failures and quota spikes
Identify regressions by comparing error onset against deployment timestamps
Apply rollback or hotfix as needed
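
The first three triage steps can be approximated from request logs. The sketch below is one possible approach under stated assumptions: records are dicts with `status_code` and `timestamp` keys (illustrative names), 401 and 403 are treated as auth failures, and 429 is treated as the quota signal. The rollback decision itself stays with the on-call engineer.

```python
from collections import Counter


def error_class(status_code):
    """Map a status code to a coarse triage class."""
    if status_code in (401, 403):
        return "auth"
    if status_code == 429:
        return "quota"
    if 400 <= status_code < 500:
        return "4xx_other"
    if 500 <= status_code < 600:
        return "5xx"
    # Everything below 400 is treated as non-error for triage purposes.
    return "ok"


def error_distribution(records):
    """Count requests per triage class."""
    return Counter(error_class(r["status_code"]) for r in records)


def error_rate_around_deploy(records, deploy_ts):
    """Return (error rate before, error rate after) a deployment timestamp.

    A side with no requests yields None instead of a rate.
    """
    def rate(subset):
        if not subset:
            return None
        errors = sum(1 for r in subset if error_class(r["status_code"]) != "ok")
        return errors / len(subset)

    before = [r for r in records if r["timestamp"] < deploy_ts]
    after = [r for r in records if r["timestamp"] >= deploy_ts]
    return rate(before), rate(after)
```

A clear jump in the second value relative to the first suggests that the deployment is worth investigating as the cause, though traffic changes around the same time can produce a similar pattern.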

Operations checklist

Dashboards include request rate, failures, and latency
Alerting thresholds are set for sustained 429 and 5xx rates
Runbooks link to exact docs pages and fix procedures
Post-incident review captures root cause and docs updates
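
To illustrate what "sustained" can mean for an alerting threshold, the sketch below fires only when the rate stays above the threshold for several consecutive evaluation windows. The function name, the threshold of 5%, and the window count of 3 are assumptions chosen for the example; most alerting systems offer an equivalent built-in setting that should be preferred.

```python
def sustained_breach(window_rates, threshold, min_windows):
    """True if the most recent `min_windows` rates all exceed `threshold`.

    `window_rates` is ordered oldest to newest, one value per evaluation window.
    """
    if min_windows <= 0:
        raise ValueError("min_windows must be positive")
    if len(window_rates) < min_windows:
        return False  # not enough history yet to call the breach sustained
    return all(rate > threshold for rate in window_rates[-min_windows:])


if __name__ == "__main__":
    # Hypothetical 5xx rates per window: a single spike followed by recovery does not fire.
    print(sustained_breach([0.01, 0.09, 0.01, 0.02], threshold=0.05, min_windows=3))
    # Three consecutive windows above 5% does fire.
    print(sustained_breach([0.01, 0.06, 0.07, 0.08], threshold=0.05, min_windows=3))
```

Requiring several consecutive windows reduces pages caused by brief spikes, at the cost of a slower alert for genuine outages.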