Can I have tasks under one DAG with different start dates in Airflow?
I have a DAG which runs two tasks: A and B.

Instead of specifying the start_date at the DAG level, I have added it as an attribute on the operators (I am using a PythonOperator in this case) and removed it from the DAG's default_args dictionary. Both tasks run daily.
The start_date for A is 2013-01-01 and the start_date for B is 2015-01-01. My problem is that Airflow runs task A for 16 days from 2013-01-01 (I guess because I have left the default dag_concurrency = 16 in my airflow.cfg) and after that it stops. The DAG runs stay in state running and the tasks for B have no status.
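For reference, here is a minimal sketch of the setup described above, using Airflow 1.x-style imports; the dag_id and the callables are placeholders, and only the per-task start_date attributes mirror the question:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator


    def run_a():
        print("running A")  # placeholder callable


    def run_b():
        print("running B")  # placeholder callable


    # No start_date on the DAG itself; each operator carries its own.
    dag = DAG(dag_id="two_tasks_dag", schedule_interval="@daily")

    task_a = PythonOperator(
        task_id="A",
        python_callable=run_a,
        start_date=datetime(2013, 1, 1),  # per-task start_date
        dag=dag,
    )

    task_b = PythonOperator(
        task_id="B",
        python_callable=run_b,
        start_date=datetime(2015, 1, 1),  # later per-task start_date
        dag=dag,
    )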
Clearly I am doing something wrong. I could simply set the start_date at the DAG level and have B run from the start_date of A, but that is not what I want to do. Alternatively, I could split them into separate DAGs, but again, that is not how I want to monitor them.

Is there a way to have a DAG with multiple tasks, each having its own start_date? If so, how do I do this?
UPDATE: I know that a ShortCircuitOperator can be added, but this seems to work only for a flow of tasks that are dependent and have a downstream task. In my case A is independent of B.
How about SubDagOperator? – RyanTheCoder, Mar 25 at 1:14

How can I use it to achieve this? – Newskooler, Mar 25 at 1:50

Can you elaborate on your use case? If the tasks are completely idempotent, just make two DAGs. – dorvak, Mar 25 at 8:13

They are completely idempotent, but it makes logical sense to group them as they are very similar. – Newskooler, Mar 25 at 12:03
1 Answer
Use BranchPythonOperator and check in that task whether your execution_date >= '2015-01-01' or not. If true, it should execute task B; if not, it should execute a dummy task.

However, I would recommend using a separate DAG.

Documentation on branching: https://airflow.readthedocs.io/en/1.10.2/concepts.html#branching
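A rough sketch of the approach described above, written against Airflow 1.10-style imports; the dag_id, task ids, and callables are illustrative assumptions rather than taken from the question:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import BranchPythonOperator, PythonOperator


    def choose_branch(ds, **kwargs):
        # 'ds' is the execution date as a 'YYYY-MM-DD' string, so a plain string
        # comparison works for ISO dates.
        return "B" if ds >= "2015-01-01" else "skip_B"


    with DAG(dag_id="branching_dag",
             start_date=datetime(2013, 1, 1),
             schedule_interval="@daily") as dag:

        task_a = PythonOperator(task_id="A", python_callable=lambda: print("running A"))

        gate = BranchPythonOperator(
            task_id="gate_B",
            python_callable=choose_branch,
            provide_context=True,  # needed in Airflow 1.x to receive ds/execution_date
        )

        task_b = PythonOperator(task_id="B", python_callable=lambda: print("running B"))
        skip_b = DummyOperator(task_id="skip_B")

        gate >> [task_b, skip_b]

For runs before 2015-01-01 the branch routes to the DummyOperator, so B shows up as skipped in the UI rather than sitting with no status.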
Adding a dummy task does not sound like a very good design. Could you please provide an example, and also advise whether this is indeed the best way to address such issues? Wouldn't having two DAGs be cleaner? – Newskooler, Mar 25 at 12:02

After reading about this operator, it does not do what I wish it did, since it branches; so when I check the UI, the job will not be "skipped". – Newskooler, Mar 25 at 12:11

@Newskooler I would definitely recommend using a separate DAG, but I read in the comments below the question that you are looking for a solution other than separating it into another DAG. Also, have a look at airflow.readthedocs.io/en/1.10.2/concepts.html#branching, which explains branching in more detail. And it would show in the UI if your task is skipped. I have updated my answer as well. – kaxil, Mar 25 at 12:43

I have about 750k tasks with different start dates. In this case, would you still have them as separate DAGs? – Newskooler, Mar 25 at 13:28

If you have 750k tasks, I am sure you are generating them dynamically; if so, use BranchPythonOperator. I would have separated them into different DAGs based on logically-dependent groups. If that wasn't possible, I would have used BranchPythonOperator so that I can see when a task was skipped or ran. – kaxil, Mar 25 at 19:37
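As a hypothetical illustration of the dynamically generated, per-task gating that the last comment suggests (the mapping, dag_id, task ids, and callables are all made up):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.operators.python_operator import BranchPythonOperator, PythonOperator

    # In practice this mapping would be generated (e.g. loaded from a file or a
    # database) rather than written out by hand.
    TASK_START_DATES = {
        "A": "2013-01-01",
        "B": "2015-01-01",
    }


    def make_chooser(task_id, first_date):
        # Bind the per-task values now to avoid late-binding issues in the loop below.
        def choose(ds, **kwargs):
            return task_id if ds >= first_date else "skip_" + task_id
        return choose


    with DAG(dag_id="many_tasks",
             start_date=datetime(2013, 1, 1),
             schedule_interval="@daily") as dag:

        for name, first_date in TASK_START_DATES.items():
            gate = BranchPythonOperator(
                task_id="gate_" + name,
                python_callable=make_chooser(name, first_date),
                provide_context=True,
            )
            run = PythonOperator(task_id=name, python_callable=lambda: None)
            skip = DummyOperator(task_id="skip_" + name)
            gate >> [run, skip]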