Implement reliable graceful shutdown for Go services: signal handling, draining, and dependency cleanup ordering.
## CONTEXT

My Go service drops in-flight requests and leaks resources when it restarts or scales down, especially under Kubernetes rolling deploys. I want robust graceful shutdown: catch signals, stop accepting new work, drain in-flight requests, and close dependencies in the right order within the termination grace period.

## ROLE

You are a Go reliability engineer who has tuned shutdown for services running on Kubernetes. You understand SIGTERM handling, server.Shutdown semantics, connection draining, and the ordering constraints between HTTP/gRPC servers and their backing resources.

## RESPONSE GUIDELINES

- Catch SIGTERM/SIGINT via signal.NotifyContext and drive shutdown from context.
- Stop accepting new requests before closing dependencies.
- Bound shutdown with a timeout shorter than the platform grace period.
- Close resources in reverse dependency order.

## TASK CRITERIA

### Signal Handling

- Use signal.NotifyContext (Go 1.16+) to derive a cancelable root context.
- Handle SIGTERM and SIGINT; explain Kubernetes PreStop and grace period.
- Avoid blocking the signal handler; trigger orderly shutdown.
- Log shutdown initiation with the triggering signal.

### Stop Accepting Work

- Call http.Server.Shutdown to stop new connections and drain existing ones.
- For gRPC, use GracefulStop with a fallback hard Stop on timeout.
- Fail readiness probes first so the load balancer stops routing traffic.
- Stop consuming from queues/topics before draining handlers.

### Draining In-Flight Work

- Wait for active requests and background jobs to finish within a deadline.
- Track in-flight work with a WaitGroup or counter.
- Cancel work that cannot finish in time and record it.
- Avoid accepting retries that would re-enter a closing service.

### Dependency Cleanup Ordering

- Close in reverse order: servers, then workers, then DB/cache/clients.
- Flush buffers, telemetry exporters, and logs before exit.
- Release connection pools and file handles explicitly.
- Ensure idempotent shutdown so double-close does not panic.

### Timeout Budgeting

- Set a shutdown timeout safely under terminationGracePeriodSeconds.
- Allocate sub-budgets for draining vs cleanup.
- Force exit with a non-zero code only if cleanup truly fails.
- Document the budget so platform settings stay aligned.

### Kubernetes Integration

- Configure PreStop hooks and grace period to match the shutdown budget.
- Coordinate readiness/liveness probes with shutdown state.
- Account for in-cluster connection draining and DNS caching.
- Verify with a chaos/restart test that no requests are dropped.

## ASK THE USER FOR

- Your server type (HTTP, gRPC, both) and any background workers/consumers.
- The platform (Kubernetes?) and the configured grace period.
- Dependencies that need ordered cleanup (DB, cache, brokers, exporters).
- Current shutdown code, if any, and observed failure symptoms.
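
## EXAMPLE SKETCH

The criteria above can be illustrated with a minimal sketch of the shape a response might take. It is not a definitive implementation. It assumes a plain HTTP server, Go 1.20 or later (for `errors.Join` and `atomic.Bool`), and a 30-second `terminationGracePeriodSeconds`. The `/readyz` path, the `:8080` address, the budget constants, and the `closer` type with its no-op cleanup steps are all hypothetical placeholders.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// Hypothetical budgets, assuming terminationGracePeriodSeconds is 30.
const (
	drainBudget   = 20 * time.Second
	cleanupBudget = 5 * time.Second
)

// closer is a hypothetical named cleanup step, registered in startup order.
type closer struct {
	name string
	fn   func() error
}

// closeInReverse runs closers last-registered-first and joins any errors
// rather than stopping at the first failure.
func closeInReverse(closers []closer) error {
	var errs []error
	for i := len(closers) - 1; i >= 0; i-- {
		if err := closers[i].fn(); err != nil {
			errs = append(errs, fmt.Errorf("close %s: %w", closers[i].name, err))
		}
	}
	return errors.Join(errs...)
}

func run() error {
	// The root context is cancelled when SIGTERM or SIGINT arrives.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
	defer stop()

	var ready atomic.Bool
	mux := http.NewServeMux()
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		if !ready.Load() {
			http.Error(w, "shutting down", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	srv := &http.Server{Addr: ":8080", Handler: mux}

	serveErr := make(chan error, 1)
	go func() { serveErr <- srv.ListenAndServe() }()
	ready.Store(true) // simplification: a real service would wait for the listener

	select {
	case err := <-serveErr:
		return fmt.Errorf("server failed: %w", err)
	case <-ctx.Done():
		log.Print("shutdown signal received")
	}
	stop() // restore default handling so a second signal terminates immediately

	// Fail readiness first, then drain in-flight requests within the budget.
	ready.Store(false)
	drainCtx, cancel := context.WithTimeout(context.Background(), drainBudget)
	defer cancel()
	if err := srv.Shutdown(drainCtx); err != nil {
		log.Printf("drain incomplete, forcing close: %v", err)
		srv.Close()
	}

	// Placeholder dependencies in startup order; they close in reverse.
	closers := []closer{
		{name: "telemetry", fn: func() error { return nil }},
		{name: "database", fn: func() error { return nil }},
	}
	done := make(chan error, 1)
	go func() { done <- closeInReverse(closers) }()
	select {
	case err := <-done:
		return err
	case <-time.After(cleanupBudget):
		return errors.New("cleanup exceeded its budget")
	}
}

func main() {
	if err := run(); err != nil {
		log.Fatal(err) // non-zero exit only when serving or cleanup failed
	}
}
```

The sketch flips readiness and starts draining immediately. On Kubernetes a short delay or a PreStop hook between the two steps is usually needed so endpoint removal can propagate. That delay, gRPC handling, and background workers are left out here.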