How to manage a million Airflow tasks with different start dates?


I have about one million Airflow tasks that use the same Python function. Each needs to run with a different start date and different parameters.



Earlier I asked a question on how to run two such tasks under one DAG. However, when the tasks become many, the answers there do not scale (see the link and the notes below).



Question



How can I run a million (or any large number of) tasks in a scalable fashion on Airflow, where each task stems from the same Python function but has a different start date and different arguments?



Notes



The tasks don't need to run on a PythonOperator (they merely stem from a Python function). In reality, they would most likely run in a distributed fashion on a Kubernetes cluster (so with a KubernetesExecutor or KubernetesPodOperator). Either way, the architectural problem behind the construction of the DAG(s) remains.
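For reference, a single such task could look roughly like this with the KubernetesPodOperator (a sketch only; the container image and its arguments are hypothetical):

    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

    # One pod per task; the image wrapping the Python function and the
    # CLI arguments it accepts are hypothetical placeholders.
    download_alice = KubernetesPodOperator(
        task_id="download_alice",
        name="download-alice",
        namespace="default",
        image="example/user-downloader:latest",  # hypothetical image
        arguments=["--user", "alice", "--date", "{{ ds }}"],
        get_logs=True,
        dag=dag,  # assumes a DAG object defined as usual
    )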



Solution ideas



One solution I was thinking of: under one DAG, dynamically construct all the tasks and pass each task's start date into the Python function that gets executed. On the outside, Airflow will execute every task every day, but inside the function, if the execution_date is earlier than that task's start_date, the function will simply return 0 and do nothing.
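For illustration, here is a minimal sketch of that idea (the user list and the download step are hypothetical placeholders; this shows the shape of the workaround, not a vetted implementation):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    # Hypothetical user registry; in reality this would be loaded from a
    # database or file, not hard-coded.
    USERS = [
        {"name": "alice", "start_date": "2019-01-01"},
        {"name": "bob", "start_date": "2019-03-15"},
    ]

    def download_activity(user_name, user_start_date, **context):
        # context["ds"] is the execution date as a "YYYY-MM-DD" string,
        # so lexicographic comparison works for ISO dates.
        if context["ds"] < user_start_date:
            return 0  # user had not joined yet; nothing to download
        # ... download the user's daily .json activity file here ...
        return 1

    dag = DAG(
        dag_id="per_user_downloads",
        start_date=datetime(2019, 1, 1),  # earliest start date of any user
        schedule_interval="@daily",
    )

    # One task per user, all stemming from the same Python function.
    for user in USERS:
        PythonOperator(
            task_id="download_{}".format(user["name"]),
            python_callable=download_activity,
            op_kwargs={"user_name": user["name"],
                       "user_start_date": user["start_date"]},
            provide_context=True,  # passes execution context (incl. ds)
            dag=dag,
        )

Note that the scheduler still has to parse and track every one of those tasks on every run, which is exactly where this approach stops scaling.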










Tags: airflow

asked Mar 25 at 14:08, edited Mar 25 at 14:23 – Newskooler
  • Can you provide a bit more detail? This sounds like a lot of work; are you prepared to throw an army of machines at it so that it finishes before the heat death of the universe? – Robert Harvey, Mar 25 at 14:10
  • Sure, let me know what kind of information would be useful to add. I have limited it to one clear question now. – Newskooler, Mar 25 at 14:15
  • I'm a bit confused. I once worked for a company that had a similar workflow arrangement. I can't imagine the number of tasks being in the thousands, let alone the millions, so I think I'm missing something here (by at least three orders of magnitude). – Robert Harvey, Mar 25 at 14:30
  • Here is an example: say I have a million users. Each of them joined my network on a different date (so each has a different start date), and each has their activity saved in daily .json files. If I want to download this data to work with, I need a task for each user. They would all have different start dates, and the function I use to download would take a different argument (e.g. the user name). Your comment suggests that I may be thinking about the issue in the wrong way, I guess. – Newskooler, Mar 25 at 14:54
  • You are thinking about it the wrong way. Airflow can be used with millions of dynamic tasks, but it should not be. Airflow DAGs are supposed to be fairly constant. I suggest you use other tools for this problem. You can still use Airflow, for example, to process the whole batch of users and use this information in your ETL process later. – vurmux, Mar 26 at 9:53

















1 Answer
After our conversation in the comments, I think I can give an answer:



Airflow can be used with millions of dynamic tasks, but it should not be. Airflow DAGs are supposed to be fairly constant. You can still use Airflow, for example, to process the whole batch of users (obtained from somewhere) and use this information in your ETL process later.



I recommend building your task system on top of the Celery library (not to be confused with the CeleryExecutor in Airflow; Airflow itself can run on top of Celery). It is a task queue focused on millions of real-time tasks:




Celery is used in production systems to process millions of tasks a day.




Celery is written in Python, is production-ready, stable, and incredibly scalable. I think it is the best tool to solve your problem.
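As a minimal sketch of what "Celery standalone" means here (assuming a Redis broker on localhost and code living in a hypothetical downloads.py; the task body is a placeholder):

    from celery import Celery

    # Assumes a Redis broker running locally; any broker Celery supports
    # (RabbitMQ, Redis, ...) would do.
    app = Celery("downloads", broker="redis://localhost:6379/0")

    @app.task
    def download_user_data(user_name, day):
        # ... fetch and store the user's daily .json activity file ...
        return user_name, day

    # Producer side: enqueue one task per (user, day) pair. Workers on any
    # number of machines pull from the queue and run them in parallel, e.g.:
    #
    #     for user in users:
    #         for day in days_since(user["start_date"]):
    #             download_user_data.delay(user["name"], day)

A fleet of workers started with `celery -A downloads worker` then drains the queue, and a single ordinary Airflow task can kick off the producer and collect the results.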






answered Mar 26 at 13:42, edited Mar 26 at 15:14 – vurmux
  • I am already running Airflow with a CeleryExecutor. However, when I construct a DAG for, say, 20130102, it will have 120k tasks; on the next day it will have 150k tasks, and a week later 100k tasks. How does the fact that I am using Celery help here? I thought it's good to keep the number of tasks constant in a DAG? – Newskooler, Mar 26 at 13:47
  • Apache Airflow can work on top of Celery (that is why Airflow is so scalable). I recommend using Celery itself, not inside Airflow. You can write a script that will run 100k tasks with Celery, get their results, and send them to some Airflow task, which will work with them. – vurmux, Mar 26 at 13:53
  • So when you say working with Celery, you don't mean the CeleryExecutor, but Celery standalone, integrated into my code? – Newskooler, Mar 26 at 14:41
  • Yes, exactly. I meant that you can build your code on top of the Celery library. – vurmux, Mar 26 at 15:10
  • Perfect, thanks. Maybe add this to your reply, so that there is no confusion over the Celery Executor. – Newskooler, Mar 26 at 15:12









