Mapper processing different number of lines


I noticed some odd behavior in my MapReduce code today. I spent 3 hours trying to figure it out and still have nothing.

I am trying to answer 3 simple questions based on this dataset:

The Academy Awards, 1927-2015

  • How many awards were given in a particular year?
  • Which actor/actress has received the most awards overall?
  • Which film has received the most awards in a ceremony?

I wrote a MapReduce job for each question and noticed that the mapper processes a different number of lines in each job:

For Q1 - 3251 lines, Q2 - 3251 lines, and Q3 - 33 lines!

I don't understand why this is happening.



Driver



    public static void main(String[] args) throws Exception {
        String inputFilePath = "./database.csv";
        String outputFilePath = "./<BASED_ON_QUESTION>";

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "<BASED_ON_QUESTION>");
        job.setJarByClass(YearlyAwards.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(inputFilePath));

        try {
            File f = new File(outputFilePath);
            FileUtils.forceDelete(f);
        } catch (Exception e) {
        }

        FileOutputFormat.setOutputPath(job, new Path(outputFilePath));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }



Mapper



Q1



    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        Integer count = 0;

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String[] quoteLessVal = value.toString().split("\"");
            value = new Text(String.join("", quoteLessVal));
            String[] values = value.toString().split(",");
            String year = values[0];
            String win = values[3];
            count += 1;

            if (!win.equals("")) {
                context.write(new Text(year), new IntWritable(new Integer(win)));
            }
        }

        @Override
        protected void cleanup(Mapper<Object, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            super.cleanup(context);
            System.out.println(count);
        }
    }




Q1 Output



Q2



    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        Integer count = 0;

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            count += 1;

            String[] quoteLessVal = value.toString().split("\"");
            value = new Text(String.join("", quoteLessVal));
            String[] values = value.toString().split(",");
            String name = values[4];
            String win = values[3];

            if (!win.equals("")) {
                context.write(new Text(name), new IntWritable(new Integer(win)));
            }
        }

        @Override
        protected void cleanup(Mapper<Object, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            super.cleanup(context);
            System.out.println(count);
        }
    }




Q2 Output



Q3



    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        Integer count = 0;

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            count += 1;

            String[] quoteLessVal = value.toString().split("\"");
            value = new Text(String.join("", quoteLessVal));
            String[] values = value.toString().split(",");
            String name = values[5];
            String win = values[3];
            count += 1;

            if (!win.equals("")) {
                context.write(new Text(name), new IntWritable(new Integer(win)));
            }
        }

        @Override
        protected void cleanup(Mapper<Object, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            super.cleanup(context);
            System.out.println(count);
        }
    }




Q3 Output



Reducer



(Fairly standard for all Qs)



    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            Integer count = 0;

            for (IntWritable val : values) {
                count += 1;
            }

            System.out.println(key + " > " + count);

            context.write(key, new IntWritable(count));
        }
    }
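A side observation that may be worth checking: the driver registers `IntSumReducer` as the combiner, and this reducer counts its input values rather than adding them up. The sketch below is plain Java with no Hadoop dependency. The class `CombinerDemo`, its helper methods, and the five sample values are all hypothetical; they only illustrate how a counting function and a summing function differ when each is applied once per input split and then once more on the partial results. It is an illustration under those assumptions, not a diagnosis of this job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CombinerDemo {
    // Counts the values for a key, like the reduce() method above.
    static int countValues(List<Integer> values) {
        int count = 0;
        for (Integer ignored : values) {
            count += 1;
        }
        return count;
    }

    // Adds the values for a key, like a conventional sum reducer.
    static int sumValues(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Applies countValues to each split (combiner step), then to the partial results (reduce step).
    static int countThenCount(List<List<Integer>> splits) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> split : splits) {
            partials.add(countValues(split));
        }
        return countValues(partials);
    }

    // Applies sumValues to each split (combiner step), then to the partial results (reduce step).
    static int sumThenSum(List<List<Integer>> splits) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> split : splits) {
            partials.add(sumValues(split));
        }
        return sumValues(partials);
    }

    public static void main(String[] args) {
        // Hypothetical data: five records with value 1 for the same key, spread over two splits.
        List<List<Integer>> splits = Arrays.asList(
                Arrays.asList(1, 1, 1),
                Arrays.asList(1, 1));

        System.out.println("count then count: " + countThenCount(splits)); // prints 2
        System.out.println("sum then sum: " + sumThenSum(splits));         // prints 5
    }
}
```

Hadoop may run a combiner zero, one, or several times, so a function used as a combiner generally needs to give the same final result however often it is applied. Summing has that property; counting values does not.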




I think the code's execution is halting for some reason, as I am not getting any output file (part-r-00000) in the output folder!
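Since the dataset rows are not shown here, the following is only a guess at something to rule out. In the Q3 mapper `count` is incremented twice per record, so an odd total such as 33 suggests that one `map()` call stopped between the two increments, which is where `values[5]` is read. `String.split(",")` discards trailing empty strings, so a row whose last column is empty produces a shorter array, and reading index 5 would then throw `ArrayIndexOutOfBoundsException`. The sketch below is plain Java; the class `CsvSplitDemo`, its method names, and the sample row are made up for illustration, and `replace("\"", "")` stands in for the split-and-join quote stripping used in the mappers.

```java
class CsvSplitDemo {
    // Strips double quotes, then splits on commas; trailing empty fields are dropped.
    static String[] fieldsDroppingTrailing(String line) {
        return line.replace("\"", "").split(",");
    }

    // Same, but a negative limit tells split() to keep trailing empty fields.
    static String[] fieldsKeepingTrailing(String line) {
        return line.replace("\"", "").split(",", -1);
    }

    public static void main(String[] args) {
        // Hypothetical six-column row whose last column is empty.
        String row = "a,b,c,1,e,";

        System.out.println(fieldsDroppingTrailing(row).length); // prints 5
        System.out.println(fieldsKeepingTrailing(row).length);  // prints 6
    }
}
```

If this turns out to be the cause, checking `values.length` before indexing, or splitting with a limit of `-1`, would avoid the exception. Note that stripping the quotes before splitting also means a comma inside a quoted field shifts every later column by one.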










      java hadoop mapreduce bigdata kaggle
















      asked Mar 22 at 17:27 by Parth Tamane





















