Design a reliable outbound webhook system with signing, retries, idempotency, and consumer-friendly delivery.
## CONTEXT

I am building an outbound webhook system so my platform can notify customer endpoints of events in 2026. Webhooks look trivial until production reality hits: consumer endpoints are slow or flaky, deliveries get duplicated, payloads need cryptographic signing, retries cause thundering herds, and consumers cannot tell which delivery is the real one. A poorly designed webhook system erodes partner trust faster than almost any other integration surface. I want a delivery architecture that is reliable, secure, debuggable, and pleasant for consumers to integrate against, including a clear contract for what they must implement on their side.

## ROLE

Act as a distributed-systems engineer who has operated a high-volume webhook platform and handled the incidents that come with it: duplicate deliveries, signature mismatches, retry storms, and silent consumer outages. You design for the failure cases first.

## RESPONSE GUIDELINES

- Open with a high-level delivery flow described as a sequence of steps from event to acknowledgment.
- Provide a concrete signed-payload example including headers and the signature scheme.
- Specify exact retry timing, backoff, and give-up policies with numbers, not vague terms.
- Separate the producer-side responsibilities from the consumer-side contract clearly.
- Note observability and debugging affordances throughout, not as an afterthought.

## TASK CRITERIA

### 1. Event & Payload Design

- Define a stable event envelope (id, type, created time, data, api version).
- Decide between thin (id-only) and fat (full-object) payloads with tradeoffs.
- Establish event-type naming conventions and a versioning strategy for payloads.
- Specify how sensitive data is handled or excluded from payloads.

### 2. Security & Verification

- Design an HMAC signature scheme and document the exact signing string.
- Include a timestamp and replay-protection window to prevent replay attacks.
- Recommend secret rotation supporting overlapping keys without downtime.
- Provide the verification pseudocode a consumer should implement.

### 3. Delivery Reliability & Retries

- Define delivery semantics (at-least-once) and the resulting idempotency contract.
- Specify retry schedule, backoff with jitter, max attempts, and dead-letter handling.
- Address ordering guarantees (or explicit lack thereof) and how consumers cope.
- Design circuit-breaking or auto-disable for chronically failing endpoints.

### 4. Consumer Experience & Contract

- Document exactly what a consumer endpoint must return and how fast.
- Provide an idempotency-key strategy so consumers can dedupe safely.
- Recommend a manual replay/redelivery feature and event log for consumers.
- Specify how consumers register, validate, and test endpoints.

### 5. Operations & Observability

- Define delivery metrics, per-endpoint health, and alerting thresholds.
- Recommend queue and worker architecture to isolate slow consumers.
- Describe a delivery-attempt audit log usable for support investigations.
- List the top failure scenarios and the runbook response for each.

## ASK THE USER FOR

- The event volume, peak burst rate, and number of distinct consumer endpoints.
- Whether consumers are external partners, internal services, or both.
- Your existing queue/infrastructure stack and any compliance constraints.
- Ordering and latency expectations for the most important event types.
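
For reference, the signing, replay-protection, and overlapping-key criteria under Security & Verification could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a required design: the `<timestamp>.<body>` signing string, the 300-second tolerance, and the names `sign`, `verify`, and `TOLERANCE_SECONDS` are all hypothetical choices made for the example.

```python
import hashlib
import hmac
import time
from typing import Optional, Sequence

# Assumed replay window; the prompt leaves the actual value open.
TOLERANCE_SECONDS = 300


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Hex HMAC-SHA256 over the assumed signing string '<timestamp>.<raw body>'."""
    signing_string = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, signing_string, hashlib.sha256).hexdigest()


def verify(
    secrets: Sequence[bytes],
    timestamp: int,
    body: bytes,
    signature: str,
    now: Optional[int] = None,
) -> bool:
    """Consumer-side check: reject stale timestamps, then accept any active secret."""
    current = int(time.time()) if now is None else now
    # Replay protection: the signed timestamp must fall inside the tolerance window.
    if abs(current - timestamp) > TOLERANCE_SECONDS:
        return False
    received = signature.encode("utf-8")
    # Overlapping keys: during rotation both the old and new secret are tried.
    # compare_digest gives a constant-time comparison for each candidate.
    matches = [
        hmac.compare_digest(sign(secret, timestamp, body).encode("ascii"), received)
        for secret in secrets
    ]
    return any(matches)
```

A consumer would call `verify` with the raw request bytes (before any JSON parsing), the timestamp and signature taken from whatever headers the producer defines, and the list of currently valid secrets. Signing the timestamp together with the body is what prevents an attacker from replaying an old payload with a fresh timestamp.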