Mapper processing different number of lines


I noticed some odd behavior in my MapReduce code today. I spent 3 hours trying to figure it out and still have nothing.



I am trying to answer 3 simple questions based on this dataset:



The Academy Awards, 1927-2015



  • How many awards were given in a particular year?

  • Which actor/actress has received the most awards overall?

  • Which film has received the most awards in a ceremony?

I wrote my MapReduce code and noticed that the mapper processes a different number of lines for each question.



For Q1 - 3251 lines, Q2 - 3251 lines, and Q3 - 33 lines!



I don't understand why this is happening.



Driver



    public static void main(String[] args) throws Exception {

        String inputFilePath = "./database.csv";
        String outputFilePath = "./<BASED_ON_QUESTION>";

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "<BASED_ON_QUESTION>");
        job.setJarByClass(YearlyAwards.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(inputFilePath));

        try {
            File f = new File(outputFilePath);
            FileUtils.forceDelete(f);
        } catch (Exception e) {

        }

        FileOutputFormat.setOutputPath(job, new Path(outputFilePath));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }



Mapper



Q1



    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        Integer count = 0;

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

            String[] quoteLessVal = value.toString().split("\"");
            value = new Text(String.join("", quoteLessVal));
            String[] values = value.toString().split(",");
            String year = values[0];
            String win = values[3];
            count += 1;

            if (!win.equals("")) {
                context.write(new Text(year), new IntWritable(new Integer(win)));
            }
        }

        @Override
        protected void cleanup(Mapper<Object, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            super.cleanup(context);
            System.out.println(count);
        }
    }




Q1 Output
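The mappers strip every double quote and then split on commas. If a quoted field in the CSV itself contains a comma, that approach shifts the later columns by one. The sketch below is a minimal quote-aware splitter, not a full CSV parser: the class name, the method name, and the sample row are all hypothetical, and escaped quotes inside fields are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the sample row is invented and is not taken from database.csv.
class QuotedCsvDemo {

    // Splits one CSV line on commas that are outside double quotes.
    // Simplification: escaped quotes ("") and multi-line fields are not supported.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;            // toggle quoted state, drop the quote itself
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());  // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());          // last field, kept even when empty
        return fields;
    }

    public static void main(String[] args) {
        String row = "2010,83,Best Picture,1,\"Producer A, Producer B\",Some Film";

        // Approach from the question: remove quotes, then split on every comma.
        System.out.println(row.replace("\"", "").split(",").length); // prints 7

        // Quote-aware split keeps the six columns aligned.
        System.out.println(parseLine(row).size());                   // prints 6
    }
}
```

With the naive split, index 5 of this made-up row would hold the second half of the name rather than the film.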



Q2



    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        Integer count = 0;

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

            count += 1;

            String[] quoteLessVal = value.toString().split("\"");
            value = new Text(String.join("", quoteLessVal));
            String[] values = value.toString().split(",");
            String name = values[4];
            String win = values[3];

            if (!win.equals("")) {
                context.write(new Text(name), new IntWritable(new Integer(win)));
            }
        }

        @Override
        protected void cleanup(Mapper<Object, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            super.cleanup(context);
            System.out.println(count);
        }
    }




Q2 Output



Q3



    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        Integer count = 0;

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

            count += 1;

            String[] quoteLessVal = value.toString().split("\"");
            value = new Text(String.join("", quoteLessVal));
            String[] values = value.toString().split(",");
            String name = values[5];
            String win = values[3];
            count += 1;

            if (!win.equals("")) {
                context.write(new Text(name), new IntWritable(new Integer(win)));
            }
        }

        @Override
        protected void cleanup(Mapper<Object, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            super.cleanup(context);
            System.out.println(count);
        }
    }




Q3 Output
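One observation about the Q3 count: that mapper increments count twice per record, so an odd total such as 33 suggests that map stopped partway through the 17th record, somewhere between the two increments. A candidate cause, and this is an assumption because the failing row is not shown, is that `values[5]` does not exist for a row whose last column is empty, since `String.split(",")` drops trailing empty strings. Whether cleanup still prints after such a failure depends on how the Hadoop version in use implements `Mapper.run`, so the task logs are worth checking for an exception. The sketch below uses a made-up row and hypothetical class and method names.

```java
// Hypothetical demo; the row is invented for illustration and is not from database.csv.
class TrailingFieldsDemo {

    // Same call as in the mappers: String.split drops trailing empty strings.
    static String[] splitDefault(String line) {
        return line.split(",");
    }

    // A negative limit keeps trailing empty strings, so every row has the same length.
    static String[] splitKeepEmpty(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String row = "1927,1,Actor,1,,"; // six columns, the last two empty

        System.out.println(splitDefault(row).length);   // prints 4
        System.out.println(splitKeepEmpty(row).length); // prints 6

        try {
            String film = splitDefault(row)[5];
            System.out.println(film);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("values[5] is out of bounds for this row");
        }
    }
}
```

If this turns out to be the cause, Q1 and Q2 would be less affected because they only read indexes 0, 3 and 4.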



Reducer



(Fairly standard for all Qs)



    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            Integer count = 0;

            for (IntWritable val : values) {
                count += 1;
            }

            System.out.println(key + " > " + count);

            context.write(key, new IntWritable(count));
        }
    }




I think the code's execution is halting for some reason, as I am not getting any output file (part-r-00000) in the output folder!
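A separate point to check, independent of the missing output file: the driver registers IntSumReducer as the combiner, but that class counts its input values instead of summing them. Hadoop may apply a combiner zero or more times, so a counting combiner followed by a counting reducer can yield the number of partial results rather than the total. The plain-Java sketch below simulates this with made-up values. The class and method names are hypothetical and no Hadoop API is involved.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation; no Hadoop classes are used.
class CombinerDemo {

    // Mirrors the reducer in the question: counts how many values arrive.
    static int countValues(List<Integer> values) {
        return values.size();
    }

    // Alternative: adds the values, which stays correct when applied in several stages.
    static int sumValues(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two made-up map outputs for the same key, each value being a 1.
        List<Integer> firstSplit = Arrays.asList(1, 1, 1);
        List<Integer> secondSplit = Arrays.asList(1, 1);

        // The combiner runs once per split, then the reducer sees the partial results.
        int counted = countValues(Arrays.asList(countValues(firstSplit), countValues(secondSplit)));
        int summed = sumValues(Arrays.asList(sumValues(firstSplit), sumValues(secondSplit)));

        System.out.println(counted); // prints 2
        System.out.println(summed);  // prints 5
    }
}
```

Summing gives the same answer whether or not the combiner runs, which is why a sum is the usual choice when one class serves as both combiner and reducer.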










      java hadoop mapreduce bigdata kaggle






      asked Mar 22 at 17:27 by Parth Tamane
