Replies: 1 comment 4 replies
set airflow pools on your tasks to restrict the contention and if
Hi all,
I'm not sure what the formatting conventions are here or whether there is a template I should use, so apologies if the setup is unclear. I tried to include all the relevant information.
The problem
In bigger deployments a lot of DAGs end up on the same cron time.
`@daily` always resolves to `0 0 * * *`, so every daily DAG fires at midnight at the exact same moment. On a production MWAA environment I work on, we had around 26 daily DAGs all firing at `00:00`, and it caused task failures from the contention at that boundary.
Right now there is no native way to spread them out. You either:
`@daily` intent and is not reusable.
Neither feels like something every team should be reinventing on their own.
What I would like to propose
A way to add deterministic jitter to a schedule. Something like a `JitteredCronTimetable`, or a jitter option on the existing cron timetables, that offsets each DAG by a stable function of its dag id inside a configurable window (say up to 60 minutes), while keeping the `data_interval` and `logical_date` semantics exactly the same.
The important part is that it stays deterministic rather than random, so a given DAG always lands in the same slot and runs stay stable and predictable across scheduler restarts and timetable serialization.
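To make the idea concrete, here is a minimal sketch of just the offset calculation. It does not use the Airflow timetable API, and it is not the implementation I run in production. The names `jitter_offset` and `jittered_run_after` are hypothetical, and the choice of SHA-256 and one-second granularity is an assumption for illustration only.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def jitter_offset(dag_id: str, window_minutes: int = 60) -> timedelta:
    """Map a dag id to a stable offset inside [0, window_minutes).

    Hypothetical helper: uses hashlib instead of the built-in hash(),
    because hash() on strings is salted per process and would not be
    stable across scheduler restarts.
    """
    if window_minutes <= 0:
        return timedelta(0)
    digest = hashlib.sha256(dag_id.encode("utf-8")).digest()
    # Reduce the first 8 bytes of the digest to a slot, in seconds.
    slot_seconds = int.from_bytes(digest[:8], "big") % (window_minutes * 60)
    return timedelta(seconds=slot_seconds)


def jittered_run_after(nominal: datetime, dag_id: str,
                       window_minutes: int = 60) -> datetime:
    """Shift only the earliest start time; the nominal boundary
    (and anything derived from it) is left untouched."""
    return nominal + jitter_offset(dag_id, window_minutes)


if __name__ == "__main__":
    midnight = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for dag_id in ("sales_daily", "inventory_daily", "billing_daily"):
        print(dag_id, jittered_run_after(midnight, dag_id).isoformat())
```

Because the offset depends only on the dag id and the window, the same DAG lands in the same slot on every run, and only the time the run becomes eligible to start would move.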
Prior art
I looked and could not find anything that already does this. No native option, and no community plugin that I could find. Other schedulers tend to offer something in this space (Kubernetes CronJobs for example).
Questions before I build anything
I already have a working product at my current workplace in production and I am happy to bring tests and docs.