Build a SQL cohort retention analysis that tracks user behavior by signup cohort across time periods.
## CONTEXT

You are helping me write SQL for a cohort retention analysis. I want to group users by their signup or first-activity period, then measure how many remain active in each subsequent period. The result should be a triangle or matrix suitable for a BI heatmap. Assume a modern SQL warehouse in 2026 and a typical events or activity table.

## ROLE

Act as a product analytics engineer who has built retention models for many products. You are precise about cohort definitions, period boundaries, and the difference between classic, rolling, and unbounded retention. You make the math explicit so stakeholders trust the curve.

## RESPONSE GUIDELINES

- Confirm the retention definition before writing the query.
- Present the query with CTEs for cohorts, activity, and the matrix.
- Show a small worked example of the output shape.
- Explain how to read the resulting numbers.

## TASK CRITERIA

### Define The Cohort

- Set the cohort key as signup or first-activity period.
- Choose the period granularity such as day, week, or month.
- Align periods to calendar or to days-since-signup explicitly.
- Handle users with no activity after signup.

### Define Retention

- Distinguish classic, rolling, and bounded retention clearly.
- Define what counts as an active event for retention.
- Decide whether period zero is the cohort or first return.
- Account for users active in non-contiguous periods.

### Build The Matrix

- Compute cohort size as the denominator per cohort.
- Compute retained counts per cohort per offset period.
- Output both counts and percentages for the heatmap.
- Densify the matrix so empty cells appear as zero, not missing.

### Handle Data Quality

- Deduplicate multiple events per user per period.
- Handle timezone alignment for period boundaries.
- Exclude test or internal accounts if I specify them.
- Address users who churn and reactivate.

### Make It Reportable

- Format output for a pivot or heatmap in the BI tool.
- Suggest how to compute average retention curves.
- Recommend filters for segmenting cohorts by attribute.
- Note performance considerations for very large event tables.

## ASK THE USER FOR

- The events or activity table and its key columns.
- How a user is uniquely identified.
- The period granularity and retention definition you want.
- Any accounts or events to exclude.
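
## REFERENCE SKETCH

To make the expected output shape concrete, here is a minimal sketch of the kind of query this prompt asks for. It is an illustration under stated assumptions, not a definitive implementation: the `events(user_id, event_ts)` table and the sample rows are hypothetical, cohorts are keyed on first-activity calendar month, retention is classic (active in exactly that offset month), and period zero is the cohort month itself. The SQL is written in the SQLite dialect and run through Python's standard `sqlite3` module only so that it executes anywhere; a warehouse dialect would likely use its own date-truncation and series-generation functions instead.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event_ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("u1", "2026-01-05"), ("u1", "2026-01-20"),  # two events in one month, counted once
        ("u1", "2026-02-03"), ("u1", "2026-03-10"),
        ("u2", "2026-01-15"), ("u2", "2026-03-02"),  # non-contiguous: skips February
        ("u3", "2026-01-28"),                        # no activity after the cohort month
        ("u4", "2026-02-11"),                        # February cohort, absent in March
    ],
)

RETENTION_SQL = """
WITH RECURSIVE
activity AS (
    -- One row per user per calendar month; month_idx = year * 12 + (month - 1).
    SELECT DISTINCT
        user_id,
        CAST(strftime('%Y', event_ts) AS INTEGER) * 12
          + CAST(strftime('%m', event_ts) AS INTEGER) - 1 AS month_idx
    FROM events
),
cohorts AS (
    -- Assumption: cohort key is the month of first activity.
    SELECT user_id, MIN(month_idx) AS cohort_idx
    FROM activity
    GROUP BY user_id
),
cohort_sizes AS (
    -- Denominator: distinct users per cohort.
    SELECT cohort_idx, COUNT(*) AS cohort_size
    FROM cohorts
    GROUP BY cohort_idx
),
retained AS (
    -- Classic retention: users active in the month at each offset.
    SELECT c.cohort_idx,
           a.month_idx - c.cohort_idx AS period_offset,
           COUNT(*) AS retained_users
    FROM cohorts AS c
    JOIN activity AS a ON a.user_id = c.user_id
    GROUP BY c.cohort_idx, a.month_idx - c.cohort_idx
),
offsets(period_offset) AS (
    -- Offsets 0..N, where N spans the observed months.
    SELECT 0
    UNION ALL
    SELECT period_offset + 1
    FROM offsets
    WHERE period_offset < (SELECT MAX(month_idx) - MIN(month_idx) FROM activity)
)
SELECT
    printf('%04d-%02d', s.cohort_idx / 12, s.cohort_idx % 12 + 1) AS cohort_month,
    o.period_offset,
    s.cohort_size,
    COALESCE(r.retained_users, 0) AS retained_users,
    ROUND(100.0 * COALESCE(r.retained_users, 0) / s.cohort_size, 1) AS retention_pct
FROM cohort_sizes AS s
CROSS JOIN offsets AS o
LEFT JOIN retained AS r
       ON r.cohort_idx = s.cohort_idx
      AND r.period_offset = o.period_offset
-- Keep only cells that are observable so far (the retention triangle).
WHERE s.cohort_idx + o.period_offset <= (SELECT MAX(month_idx) FROM activity)
ORDER BY s.cohort_idx, o.period_offset
"""

rows = conn.execute(RETENTION_SQL).fetchall()
for row in rows:
    print(row)
```

With this sample data the result is in long format, one row per cohort and offset, which most BI tools can pivot into a heatmap. The January cohort has 3 users, of whom 1 is active at offset 1 and 2 at offset 2, showing that classic retention can rise again when users return after a gap. The February cohort has a densified zero at offset 1 rather than a missing row, and no offset 2 row because that month has not been observed yet.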