Micro-batching through NiFi



I have a scenario where my Kafka messages (from the same topic) flow through a single enrichment pipeline and are written at the end to both HDFS and MongoDB. My Kafka consumer for HDFS will run on an hourly basis (for micro-batching). So I need to know the best possible way to route FlowFiles to PutHDFS and PutMongo based on which consumer they came from (the consumer for HDFS or the consumer for MongoDB).

Or please suggest any other way to achieve micro-batching through NiFi.

Thanks










      apache-kafka apache-nifi






asked Mar 22 at 11:01 by Isha (edited Mar 22 at 11:08)

1 Answer




















You could configure NiFi to use a scheduling strategy on the processors that upload the data.



And I would think you want the Kafka consumers to always read data, building a backlog of FlowFiles in NiFi, and then have the puts run on a less frequent basis.
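For example, a minimal sketch of that scheduling setup (the hourly schedule is illustrative, not from the original thread): in the Scheduling tab of the PutHDFS processor you could switch from the default timer-driven strategy to CRON-driven, so the put only fires once an hour while the consumers keep filling its input queue.

    PutHDFS -> Configure -> Scheduling
      Scheduling Strategy: CRON driven
      Run Schedule:        0 0 * * * ?    # Quartz syntax: top of every hour

The upstream connection then acts as the micro-batch buffer, so its back-pressure thresholds would need to be sized to hold roughly an hour of FlowFiles.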




This is similar to how Kafka Connect runs its HDFS connector.
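For comparison, the Confluent HDFS sink connector batches in a similar spirit; a hedged sketch of such a config (the topic name and HDFS URL are placeholders):

    name=hdfs-sink
    connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
    topics=my-topic                   # placeholder topic
    hdfs.url=hdfs://namenode:8020     # placeholder cluster
    flush.size=10000                  # write a file every N records...
    rotate.interval.ms=3600000        # ...or rotate at least hourly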






answered Mar 22 at 19:16 by cricket_007























• Yes, so I have two Kafka consumer processors in NiFi: one serves the speed layer, which saves data to MongoDB, and the other serves the batch layer, which saves to HDFS. The second processor is scheduled on an hourly basis, but messages from both processors go through a single enrichment pipeline before being written to their respective databases. So my question is really how I am going to differentiate between messages and route them to the correct databases.

  – Isha
  Mar 24 at 6:00











• It's been a while since I used NiFi. I don't think you can have a single pipeline, not without copying the FlowFiles somehow before they are sent downstream. That being said, it might be better to use two separate Kafka consumer processors with different group ids.

  – cricket_007
  Mar 25 at 19:14
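One common NiFi pattern for exactly this differentiation (an illustrative sketch, not from the original thread; the attribute name, property names, and group ids are made up): tag each consumer's FlowFiles with UpdateAttribute right after its ConsumeKafka processor, let both streams share the enrichment pipeline, and fan out at the end with RouteOnAttribute.

    ConsumeKafka (Group ID: mongo-consumer) -> UpdateAttribute (sink = mongo) -> shared enrichment
    ConsumeKafka (Group ID: hdfs-consumer)  -> UpdateAttribute (sink = hdfs)  -> shared enrichment

    shared enrichment -> RouteOnAttribute (Routing Strategy: Route to Property name)
      to_mongo: ${sink:equals('mongo')}    # relationship wired to PutMongo
      to_hdfs:  ${sink:equals('hdfs')}     # relationship wired to PutHDFS

Because the attribute travels with each FlowFile, the shared pipeline needs no other changes; each dynamic property on RouteOnAttribute becomes a relationship you connect to the matching put processor.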











