Caching is awesome but it doesn't come without a cost, just like many things in life.
One of the issues is the cache miss attack. Please correct me if this is not the right term. It refers to the scenario where the data to fetch doesn't exist in the database and isn't cached either. So every request hits the database eventually, defeating the purpose of using a cache. If a malicious user initiates lots of queries with such keys, the database can easily be overloaded.
The diagram below illustrates the process.
Two approaches are commonly used to solve this problem:
🔹 Cache keys with null value. Set a short TTL (Time to Live) for keys with null value.
🔹 Use a Bloom filter. A Bloom filter is a data structure that can rapidly tell us whether an element is present in a set or not. If the filter reports that the key exists, the request first goes to the cache and then queries the database if needed. If the filter reports that the key is not in the data set, the key exists in neither the cache nor the database. In this case, the query will not hit the cache or database layer.
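Both defences in a small simulation, as an added example that is not part of the original post: the class names, the in-memory dict standing in for the database, and the key formats are assumptions. It counts database reads for one repeated missing key and for never-repeated missing keys, the case where null caching alone does not help.

```python
import hashlib
import time

class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""
    def __init__(self, keys=(), size_bits=1 << 16, num_hashes=4):
        self.size, self.k = size_bits, num_hashes
        self.bits = bytearray(size_bits // 8)
        for key in keys:
            self.add(key)

    def _positions(self, key):
        digest = hashlib.sha256(key.encode()).digest()  # sliced into k 4-byte indexes
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.size for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

_MISSING = object()  # cached marker meaning "the database has no such key"
class ReadThroughCache:
    def __init__(self, db, bloom=None, null_ttl=None):
        self.db, self.bloom, self.null_ttl = db, bloom, null_ttl
        self.cache = {}    # key -> (value, expiry time or None for no expiry)
        self.db_reads = 0  # how many requests reached the database

    def get(self, key):
        if self.bloom is not None and not self.bloom.might_contain(key):
            return None  # definitely absent: touch neither cache nor database
        value, expires = self.cache.get(key, (None, float("-inf")))
        if expires is None or expires > time.monotonic():
            return None if value is _MISSING else value
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = (value, None)
        elif self.null_ttl is not None:
            self.cache[key] = (_MISSING, time.monotonic() + self.null_ttl)
        return value

def simulate(keys, use_bloom=False, null_ttl=None):
    db = {f"user:{i}": {"id": i} for i in range(100)}  # constructed sample table
    cache = ReadThroughCache(db, BloomFilter(db) if use_bloom else None, null_ttl)
    for key in keys:
        cache.get(key)
    return cache.db_reads
```

With null caching, every distinct missing key also leaves an entry in the cache until its TTL expires, which is why the TTL is kept short.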
If a hacker is using randomly generated keys, cache with null value will still have the same issue.
Sometimes, I see it can be called caching penetration.