Implement an adaptive rate limiter that keeps a crawler polite and unblocked.
## CONTEXT

The developer's crawler is either too slow or getting throttled. They need a smart rate-limiting layer that adapts to server responses, spreads load, and keeps the crawl polite while still being efficient.

## ROLE

Act as a distributed-systems engineer specializing in adaptive rate control, backoff algorithms, and server-friendly crawling.

## RESPONSE GUIDELINES

- Recommend per-host rate limits, not just global ones.
- Provide an adaptive algorithm that reacts to server responses.
- Include jitter to avoid synchronized bursts.
- Show backoff on errors and rate-limit signals.
- Keep the design simple and testable.

## TASK CRITERIA

### Rate Control

- Enforce per-host requests-per-second limits.
- Add randomized jitter to spread requests.
- Support a token-bucket or leaky-bucket model.
- Make limits configurable per domain.

### Adaptive Behavior

- Slow down on rising latency or error rates.
- Speed up cautiously when responses are healthy.
- Honor `Retry-After` and rate-limit headers.
- Cap the maximum aggressiveness.

### Backoff

- Use exponential backoff with a ceiling.
- Distinguish transient from permanent errors.
- Reset backoff after sustained success.
- Give up gracefully after repeated failure.

### Concurrency

- Limit concurrent connections per host.
- Coordinate limits across worker threads.
- Avoid thundering-herd retries.
- Prioritize fresh or important URLs.

### Observability

- Expose current rate and queue depth.
- Log throttle events and reasons.
- Track success, retry, and block counts.
- Alert when a host blocks the crawler.

## ASK THE USER FOR

- The target hosts and any known limits.
- Their concurrency model (threads, async, workers).
- Acceptable total crawl duration.
- Their preferred language or framework.
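The Rate Control criteria above (per-host limits, a token bucket, per-domain configuration, and jitter) can be sketched in Python, assuming a thread-based crawler. The class names `TokenBucket` and `HostRateLimiter`, the `per_host` override map, and the `max_jitter` default are hypothetical choices for illustration, not a prescribed API.

```python
import random
import threading
import time
from typing import Dict, Optional
from urllib.parse import urlsplit


class TokenBucket:
    """Thread-safe token bucket: refills at `rate` tokens/s, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full so the first request goes out at once
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def try_acquire(self) -> float:
        """Consume one token and return 0.0, or return the seconds until one is available."""
        with self.lock:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return 0.0
            return (1.0 - self.tokens) / self.rate


class HostRateLimiter:
    """One bucket per hostname; per-domain overrides fall back to a default rate."""

    def __init__(self, default_rps: float = 1.0,
                 per_host: Optional[Dict[str, float]] = None,
                 max_jitter: float = 0.25) -> None:
        self.default_rps = default_rps
        self.per_host = per_host or {}
        self.max_jitter = max_jitter
        self.buckets: Dict[str, TokenBucket] = {}
        self.lock = threading.Lock()    # guards lazy bucket creation across worker threads

    def bucket_for(self, url: str) -> TokenBucket:
        host = urlsplit(url).hostname or ""
        with self.lock:
            if host not in self.buckets:
                rps = self.per_host.get(host, self.default_rps)
                # Burst capacity of roughly one second's worth of requests (at least 1).
                self.buckets[host] = TokenBucket(rate=rps, capacity=max(1.0, rps))
            return self.buckets[host]

    def acquire(self, url: str) -> None:
        """Block the calling thread until this URL's host has a free token."""
        bucket = self.bucket_for(url)
        while (wait := bucket.try_acquire()) > 0.0:
            time.sleep(wait)
        # Random extra delay so workers sharing a host do not fire in lockstep.
        time.sleep(random.uniform(0.0, self.max_jitter))
```

A worker would call `limiter.acquire(url)` before each fetch. Because every host has its own bucket, a slow or throttled host does not consume tokens that belong to another host. An asyncio crawler would need the same logic with `asyncio.sleep` and an `asyncio.Lock` instead of the blocking calls.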
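For the Adaptive Behavior criteria, one common policy is additive-increase/multiplicative-decrease (AIMD). The prompt does not prescribe it, so treat it as an assumed choice: halve the rate on throttling, server errors, or slow responses, and raise it by a small step only after a run of healthy responses, never beyond a hard cap. The class name `AdaptiveRate` and every threshold below are hypothetical defaults that would need tuning per site.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveRate:
    """AIMD controller for one host's requests per second (assumed policy and defaults)."""
    rps: float = 1.0
    min_rps: float = 0.1
    max_rps: float = 5.0          # hard cap on aggressiveness
    increase: float = 0.1         # additive step after a healthy streak
    decrease: float = 0.5         # multiplicative factor on trouble
    latency_target: float = 1.0   # seconds; slower responses count as trouble
    healthy_needed: int = 20      # consecutive good responses before speeding up
    healthy_streak: int = 0

    def record(self, status: int, latency: float) -> float:
        """Update the rate from one response and return the new requests per second."""
        throttled = status == 429 or status >= 500
        slow = latency > self.latency_target
        if throttled or slow:
            # Back off quickly and restart the healthy streak.
            self.rps = max(self.min_rps, self.rps * self.decrease)
            self.healthy_streak = 0
        else:
            self.healthy_streak += 1
            if self.healthy_streak >= self.healthy_needed:
                # Speed up cautiously, one small step per full streak.
                self.rps = min(self.max_rps, self.rps + self.increase)
                self.healthy_streak = 0
        return self.rps
```

The returned value would be fed into whatever per-host limiter enforces the rate. Requiring a full streak before each increase is what keeps the speed-up cautious, while the multiplicative cut reacts within a single bad response.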
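The Backoff criteria, together with honoring `Retry-After`, might look like the following: exponential backoff with a ceiling, randomized with "full jitter" so many workers do not retry at the same instant, plus a split between transient and permanent errors and a cap on attempts. The set of transient status codes, the defaults for `base`, `ceiling`, and `max_attempts`, and the function names are all assumptions for illustration. `Retry-After` may carry either a number of seconds or an HTTP-date, so both forms are parsed.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# Assumption: statuses treated as transient (retryable); everything else is permanent.
TRANSIENT_STATUSES = {408, 429, 500, 502, 503, 504}


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to wait from a Retry-After value (delay-seconds or HTTP-date), else None."""
    if value is None:
        return None
    value = value.strip()
    if value.isdecimal():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):   # unparseable date
        return None
    if when.tzinfo is None:           # treat zone-less dates as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(attempt: int, base: float = 1.0, ceiling: float = 60.0) -> float:
    """Full jitter: uniform over [0, min(ceiling, base * 2**attempt)]."""
    return random.uniform(0.0, min(ceiling, base * 2 ** attempt))


def next_retry_delay(status: int, attempt: int, retry_after: Optional[str] = None,
                     max_attempts: int = 5) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None to give up."""
    if status not in TRANSIENT_STATUSES:
        return None                   # permanent error (e.g. 404): do not retry
    if attempt >= max_attempts:
        return None                   # repeated failure: give up and report it
    delay = backoff_delay(attempt)
    hint = parse_retry_after(retry_after)
    if hint is not None:
        delay = max(delay, hint)      # never retry sooner than the server asked
    return delay
```

Resetting a URL's `attempt` counter to 0 after a success provides the "reset after sustained success" behavior, and a `None` result is the point at which the crawler would log the failure and, for repeated 429 or 403 responses, raise a block alert.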