How do you avoid crawling duplicate URLs at Google scale?

Option 1: Keep every seen URL in an in-memory Set and check whether a URL already exists before crawling it. A Set gives fast, O(1) membership checks, but it is not space-efficient: at billions of URLs, the set will not fit in a single machine's memory.

Option 2: Store seen URLs in a database and check whether each new URL is already in the database. This can work, but every candidate URL triggers a query, so the load on the database will be very high.
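To make Option 1 concrete, here is a minimal sketch of a set-based deduplicating crawler. The `fetch_links` callable and the tiny in-memory link graph are hypothetical stand-ins for a real fetcher; the point is only the membership check before enqueueing. Note the space cost this option implies: at, say, 10 billion URLs averaging ~100 bytes each, the raw keys alone approach a terabyte, which is why a plain in-memory set breaks down at Google scale.

```python
def crawl(seed_urls, fetch_links):
    """Breadth-first crawl that skips already-seen URLs.

    fetch_links is a hypothetical callable returning the outgoing
    links of a page; it stands in for a real HTTP fetcher + parser.
    """
    seen = set(seed_urls)        # every URL ever enqueued (grows unbounded)
    frontier = list(seed_urls)
    order = []                   # crawl order, each URL exactly once
    while frontier:
        url = frontier.pop(0)
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:  # O(1) average-case membership check
                seen.add(link)
                frontier.append(link)
    return order

# Toy usage with a fake link graph (hypothetical data):
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": []}
order = crawl(["a"], lambda u: graph.get(u, []))
```

Even though "a" is linked back to by "b", it is crawled only once, because the `seen` check filters it out before it re-enters the frontier.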