Create regex to validate URLs and extract scheme, host, path, query, and fragment into named groups.
## CONTEXT

I want to parse URLs to extract their components: scheme, host, port, path, query string, and fragment. I also want a validation pattern to reject malformed URLs. I prefer named groups so the output is easy to consume, and I want guidance on when to use a URL library instead.

## ROLE

You are a web-protocol engineer who has parsed URLs in routers, crawlers, and analytics pipelines. You know the URL grammar well and you recognize that a regex is fine for extraction but a proper URL parser is safer for validation. You provide both the pattern and that recommendation.

## RESPONSE GUIDELINES

- Restate which components the user wants extracted.
- Provide the regex with named groups in a fenced block, no quotes.
- Map each group to a component in a table.
- Show extraction on two sample URLs.
- Recommend a parser library for strict validation.

## TASK CRITERIA

### Component Modeling

- Identify required versus optional components.
- Handle optional scheme and port.
- Separate path, query, and fragment cleanly.
- Decide whether to support userinfo.
- Note handling of relative URLs.

### Named Captures

- Name each component group clearly.
- Make optional components truly optional.
- Avoid capturing delimiter characters.
- Keep the host pattern reasonable for domains and IPs.
- Allow percent-encoded characters where valid.

### Validation Stance

- Provide a validation-oriented variant.
- Reject obviously malformed inputs.
- State which valid URLs the pattern may reject.
- Warn that full RFC compliance is impractical in regex.
- Recommend a library for authoritative validation.

### Output Mapping

- Show extracted values for each sample.
- Provide default handling for absent components.
- Note how to access named groups in the target language.
- Suggest decoding percent-encoded segments downstream.
- Confirm query parsing is left to a dedicated step.

### Robustness

- Avoid catastrophic backtracking on long inputs.
- Handle trailing slashes consistently.
- Tolerate uppercase schemes.
- Note internationalized domain caveats.
- Recommend a test set covering each component.

## ASK THE USER FOR

- The components you need extracted.
- Whether the scheme is always present.
- Sample URLs including edge cases.
- The language or tool consuming the result.
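
For illustration only, here is a minimal sketch of the kind of answer the criteria above describe, assuming Python is the consuming language. The names `URL_RE` and `extract` are hypothetical, and the pattern makes simplifying assumptions: it does not support userinfo or relative URLs, it accepts a missing scheme, and it is not RFC 3986 compliant. The standard-library `urllib.parse.urlsplit` call stands in for the recommended parser library.

```python
import re
from urllib.parse import urlsplit

# Hypothetical extraction pattern: each component lives in its own named
# group, and delimiters (://, :, ?, #) sit outside the capturing groups.
URL_RE = re.compile(
    r"^"
    r"(?:(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*)://)?"    # optional scheme, any case
    r"(?P<host>\[[0-9A-Fa-f:.]+\]|[^\s/?#:@\[\]]+)"   # bracketed IPv6 literal, or name/IPv4
    r"(?::(?P<port>[0-9]{1,5}))?"                     # optional port
    r"(?P<path>/[^\s?#]*)?"                           # optional path, starts with /
    r"(?:\?(?P<query>[^\s#]*))?"                      # optional query, left unparsed
    r"(?:#(?P<fragment>\S*))?"                        # optional fragment
    r"$"
)


def extract(url):
    """Return the named components as a dict, or None if the pattern rejects the input.

    Absent components default to an empty string; the scheme is lowercased.
    """
    match = URL_RE.match(url)
    if match is None:
        return None
    parts = match.groupdict(default="")
    parts["scheme"] = parts["scheme"].lower()
    return parts


if __name__ == "__main__":
    for sample in ("HTTPS://example.com:8443/a/b?x=1&y=2#top", "example.com/docs/"):
        print(sample, "->", extract(sample))

    # For stricter handling, defer to a real parser instead of the regex.
    split = urlsplit("HTTPS://example.com:8443/a/b?x=1&y=2#top")
    print(split.scheme, split.hostname, split.port, split.path, split.query, split.fragment)
```

Because every group matches a character class that excludes the delimiter of the next component, the pattern has no nested quantifiers competing for the same characters, which is the property the robustness criteria ask for. Percent-decoding and query-string parsing are deliberately left to a later step.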