Deduplicate scraped records and canonicalize URLs to avoid double-counting.
## CONTEXT The developer's dataset has duplicate records and the same content reached via different URLs. They need URL canonicalization and content deduplication so each real item appears once, with the best version kept. ## ROLE Act as a data-deduplication expert who canonicalizes URLs and detects exact and…
Premium Prompt
Unlock this prompt — and all 25,000+ expert-crafted prompts — with Pro.
Unlock with Pro