I don't quite understand how the jobs are "dispatched" from Redis to the workers. Wouldn't workers need to poll the Redis cache in order to figure out the available jobs? Can someone please explain? Also, it is mentioned that it is debatable whether cron was the right choice for Slack. What other options could Slack have considered?
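A common pattern, though not necessarily what Slack does, is a blocking pop on a Redis list: the server parks the worker's connection until a job arrives, so there is no tight polling loop. A minimal sketch with redis-py, where the `jobs:default` list name is made up for illustration:

```python
import json
import redis

QUEUE = "jobs:default"  # hypothetical list the enqueuing side LPUSHes jobs onto

r = redis.Redis(host="localhost", port=6379)

while True:
    # BRPOP blocks on the server until an item is available or the timeout expires,
    # so the worker is not hammering Redis with repeated polls.
    item = r.brpop(QUEUE, timeout=5)
    if item is None:
        continue  # timed out, block again
    _key, payload = item
    job = json.loads(payload)
    print("running job:", job)
```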
This mentions they don't have to keep pods in sync because there is a single leader, which makes sense from the execution perspective, but how do they ensure a shared, consistent schedule across pods? Do they pull the whole schedule from the DB after a pod is elected as leader? I wonder how that would work for millions of scheduled jobs.
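One plausible answer (an assumption on my part; the article doesn't say) is that the leader never needs the whole schedule in memory: on each tick it can query the backing store for just the jobs that are due right now. A rough sketch against a hypothetical `scheduled_jobs` table:

```python
import sqlite3
import time

# Hypothetical schema, indexed on next_run_at in a real deployment.
db = sqlite3.connect("schedule.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS scheduled_jobs (id TEXT, payload TEXT, next_run_at REAL)"
)

def enqueue(job_id, payload):
    print("enqueue", job_id, payload)  # placeholder for the real dispatch call

def scheduler_tick(now: float) -> None:
    # Only the elected leader runs this; follower pods stay idle.
    due = db.execute(
        "SELECT id, payload FROM scheduled_jobs WHERE next_run_at <= ?", (now,)
    ).fetchall()
    for job_id, payload in due:
        enqueue(job_id, payload)  # hand off to the job queue
        # Reschedule; a fixed 60s interval just keeps the sketch short.
        db.execute(
            "UPDATE scheduled_jobs SET next_run_at = ? WHERE id = ?",
            (now + 60, job_id),
        )
    db.commit()

while True:
    scheduler_tick(time.time())
    time.sleep(1)
```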
> While this approach worked fine, there were cases where a script’s execution time exceeded its recurrence intervals, leading to the possibility of two copies running concurrently
> they used flock
How is that? flock does exactly that: it prevents other processes from running while the lock is not released. If the other script starts, it will either wait for the lock or skip the run, depending on configuration.
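For reference, this is roughly what both behaviours look like from Python; `fcntl.flock` is the same mechanism the `flock` utility wraps, and the non-blocking variant is what you would use from cron to skip a run while the previous one still holds the lock:

```python
import fcntl
import sys

# The lock file is only a rendezvous point; its contents don't matter.
lock_file = open("/tmp/my_cron_script.lock", "w")

try:
    # LOCK_NB fails immediately if another process holds the lock;
    # drop LOCK_NB and the call blocks (waits) until the lock is released.
    fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
except BlockingIOError:
    print("previous run still in progress, skipping this run")
    sys.exit(0)

# ... the actual script body runs here with the lock held ...

fcntl.flock(lock_file, fcntl.LOCK_UN)  # also released automatically when the process exits
```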
Why are both Kafka and Redis required here? Wouldn't one be enough?
I had the same question in mind about how the jobs get dispatched. I wonder if they went for pub/sub with Redis (https://redis.io/docs/latest/develop/interact/pubsub/), but yeah, a clarification would be nice.
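If it were pub/sub, the worker side would look roughly like the sketch below (redis-py, with a made-up `jobs:ready` channel). Worth noting that Redis pub/sub is fire-and-forget: a message published while no worker is subscribed is simply dropped, which is one reason a list-based queue is the more common choice for jobs.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

pubsub = r.pubsub()
pubsub.subscribe("jobs:ready")  # hypothetical channel the dispatcher PUBLISHes to

# listen() blocks on the connection and yields messages as they arrive,
# so there is no polling loop on the worker side either.
for message in pubsub.listen():
    if message["type"] != "message":
        continue  # skip the initial subscribe confirmation
    print("received job payload:", message["data"])
```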
It's for historical reasons: their job queue system evolved from plain Redis to Kafka + Redis.
You can check this: https://slack.engineering/scaling-slacks-job-queue/
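For context, that article describes putting Kafka in front of Redis as a durable buffer, with a relay that consumes jobs from Kafka and enqueues them into Redis for the workers. A very rough sketch of that relay idea (the topic and queue names below are made up, and Python is used only for illustration):

```python
import redis
from kafka import KafkaConsumer  # kafka-python package

# Made-up topic/queue names, for illustration only.
consumer = KafkaConsumer(
    "job-queue",
    bootstrap_servers="localhost:9092",
    group_id="jq-relay",
)
r = redis.Redis(host="localhost", port=6379)

# Kafka is the durable, replayable log that absorbs bursts and outages;
# Redis stays the low-latency queue the workers actually pull from.
for record in consumer:
    r.lpush("jobs:default", record.value)
```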
They might have used something like RedisGears to fulfill this.
Yes that might be possible. Thanks for the pointer!
Good to know