How Slack Built a Distributed Cron Execution…

ByteByteGo

May 21, 2024

9 Comments

Vishal Shah

May 22, 2024

Why are both Kafka and Redis required here? Wouldn't one be enough?

Inder

May 21, 2024

I don't quite understand how the jobs are "dispatched" from Redis to the workers. Wouldn't workers need to poll the Redis cache in order to figure out which jobs are available? Can someone please explain? Also, the article mentions that it is debatable whether cron was the right choice for Slack. What other options could Slack have considered?

Gabri

May 22, 2024

I had the same question in mind, I wonder if they went for pub/sub with Redis (https://redis.io/docs/latest/develop/interact/pubsub/) but yeah a clarification would be nice.
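
Pub/sub is one possibility. Another common pattern that also avoids busy-polling is a blocking pop, such as Redis `BLPOP` on a list: the worker sleeps until a job arrives instead of repeatedly asking for one. Below is a minimal sketch of that idea, using Python's in-process `queue.Queue` as a stand-in for Redis. This is an assumption for illustration only; the article does not say which mechanism Slack's workers actually use, and the job names are hypothetical.

```python
import queue
import threading


def worker(jobs, results):
    """Consume jobs until a None sentinel arrives.

    jobs.get() blocks until an item is available, so the worker waits
    instead of polling in a loop (analogous to Redis BLPOP; assumed, not
    Slack's confirmed design).
    """
    while True:
        job = jobs.get()  # blocks; no busy-wait
        if job is None:  # sentinel: stop consuming
            break
        results.append(f"ran {job}")


jobs = queue.Queue()
results = []
t = threading.Thread(target=worker, args=(jobs, results))
t.start()
for name in ["job-a", "job-b"]:  # hypothetical job names
    jobs.put(name)  # the producer (e.g. a scheduler) enqueues work
jobs.put(None)  # tell the worker to stop
t.join()
print(results)  # ['ran job-a', 'ran job-b']
```

With a real Redis deployment the same shape applies: the scheduler pushes onto a list and each worker blocks on a pop with a timeout, so idle workers cost almost nothing.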

Wilber Chao

Jun 8, 2024

It's for historical reasons: their job queue system evolved from Redis alone to Kafka+Redis.

You can check this: https://slack.engineering/scaling-slacks-job-queue/

Ramakant

May 22, 2024

They might have used something like RedisGears for this.

Inder

May 22, 2024

Yes that might be possible. Thanks for the pointer!

Prince Ajaero

May 22, 2024

Good to know

Tomas

Jun 6, 2024 (edited)

This mentions they don't have to keep pods in sync because there is a single leader, which makes sense from an execution perspective, but how do they ensure a shared, consistent schedule across pods? Do they pull the whole schedule from the DB after a pod is elected leader? I wonder how that would work for millions of scheduled jobs.

Oleg Kuralenko

May 24, 2024

> While this approach worked fine, there were cases where a script’s execution time exceeded its recurrence intervals, leading to the possibility of two copies running concurrently

> they used flock

How is that? flock does exactly that: it prevents other processes from running while the lock is held. If another copy of the script starts, it either waits for the lock or skips running, depending on configuration.
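
The two behaviors described above (wait for the lock, or skip the run) can be sketched with Python's `fcntl.flock`. This is a minimal illustration, not the script Slack used; the helper name `run_exclusive` and the lock-file path are hypothetical. Passing `LOCK_NB` corresponds to the `flock -n` (non-blocking) mode of the command-line tool, while omitting it blocks until the lock is free.

```python
import fcntl
import os
import tempfile


def run_exclusive(lock_path, job, wait=False):
    """Run job() only while holding an exclusive flock on lock_path.

    wait=False: skip if another process holds the lock (like `flock -n`).
    wait=True: block until the lock is released.
    Returns True if job ran, False if the run was skipped.
    """
    with open(lock_path, "a") as lock_file:
        flags = fcntl.LOCK_EX if wait else fcntl.LOCK_EX | fcntl.LOCK_NB
        try:
            fcntl.flock(lock_file, flags)
        except BlockingIOError:
            return False  # another copy is still running; skip this run
        try:
            job()
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


lock_path = os.path.join(tempfile.gettempdir(), "demo_cron_job.lock")  # hypothetical path
print(run_exclusive(lock_path, lambda: None))  # True: nobody holds the lock

with open(lock_path, "a") as holder:
    fcntl.flock(holder, fcntl.LOCK_EX)  # simulate a long-running earlier copy
    print(run_exclusive(lock_path, lambda: None))  # False: skipped
```

Either mode prevents two copies from running concurrently; the difference is whether an overlapping run queues up behind the first one or is dropped.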
