Mapper processing different number of lines
I noticed some odd behavior in my MapReduce code today and spent three hours trying to figure it out, still with nothing to show for it.
I am trying to answer three simple questions based on this dataset: The Academy Awards, 1927-2015.
- How many awards were given in a particular year?
- Which actor/actress has received the most awards overall?
- Which film has received the most awards in a single ceremony?
I wrote a separate MapReduce job for each question and noticed that the mapper processes a different number of lines for each one: 3,251 lines for Q1, 3,251 lines for Q2, but only 33 lines for Q3!
I don't understand why this is happening.
Driver
public static void main(String[] args) throws Exception {
    String inputFilePath = "./database.csv";
    String outputFilePath = "./<BASED_ON_QUESTION>";

    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "<BASED_ON_QUESTION>");
    job.setJarByClass(YearlyAwards.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(inputFilePath));

    // Delete any previous output directory so the job can run again.
    try {
        File f = new File(outputFilePath);
        FileUtils.forceDelete(f);
    } catch (Exception e) {
        // Ignored: the output directory may not exist yet.
    }

    FileOutputFormat.setOutputPath(job, new Path(outputFilePath));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Mapper
Q1
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    Integer count = 0;

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Strip double quotes, then split the CSV line on commas.
        String[] quoteLessVal = value.toString().split("\"");
        value = new Text(String.join("", quoteLessVal));
        String[] values = value.toString().split(",");
        String year = values[0];
        String win = values[3];
        count += 1;
        if (!win.equals("")) {
            context.write(new Text(year), new IntWritable(new Integer(win)));
        }
    }

    @Override
    protected void cleanup(Mapper<Object, Text, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        super.cleanup(context);
        System.out.println(count);
    }
}
Q2
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    Integer count = 0;

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        count += 1;
        // Strip double quotes, then split the CSV line on commas.
        String[] quoteLessVal = value.toString().split("\"");
        value = new Text(String.join("", quoteLessVal));
        String[] values = value.toString().split(",");
        String name = values[4];
        String win = values[3];
        if (!win.equals("")) {
            context.write(new Text(name), new IntWritable(new Integer(win)));
        }
    }

    @Override
    protected void cleanup(Mapper<Object, Text, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        super.cleanup(context);
        System.out.println(count);
    }
}
Q3
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    Integer count = 0;

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        count += 1;
        // Strip double quotes, then split the CSV line on commas.
        String[] quoteLessVal = value.toString().split("\"");
        value = new Text(String.join("", quoteLessVal));
        String[] values = value.toString().split(",");
        String name = values[5];
        String win = values[3];
        count += 1;
        if (!win.equals("")) {
            context.write(new Text(name), new IntWritable(new Integer(win)));
        }
    }

    @Override
    protected void cleanup(Mapper<Object, Text, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        super.cleanup(context);
        System.out.println(count);
    }
}
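(Aside: the quote-stripping in the mappers above can misparse rows. Here is a minimal, self-contained sketch of that parsing logic in isolation; the sample row is hypothetical, just in the dataset's general shape, and shows how a comma inside a quoted film title shifts the field indices once the quotes are removed:)

```java
public class SplitDemo {
    // Same quote-stripping + comma-splitting as the mappers above.
    static String[] parse(String line) {
        String noQuotes = String.join("", line.split("\""));
        return noQuotes.split(",");
    }

    public static void main(String[] args) {
        // Hypothetical row in the dataset's general shape:
        // Year,Ceremony,Award,Win,Name,Film
        String row = "1927,1,\"Actor\",1,Emil Jannings,\"The Last Command, The Way of All Flesh\"";
        String[] fields = parse(row);
        // The comma inside the quoted film title survives quote-stripping,
        // so the row splits into 7 fields instead of 6 and values[5] no
        // longer points at the whole film name.
        System.out.println(fields.length); // prints 7
    }
}
```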
Reducer
(Fairly standard for all Qs)
public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        Integer count = 0;
        for (IntWritable val : values) {
            count += 1;
        }
        System.out.println(key + " > " + count);
        context.write(key, new IntWritable(count));
    }
}
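(One thing worth flagging about the reducer above: because the same class is also registered as the combiner in the driver, the reducer may receive pre-aggregated partial counts rather than one value per record, and counting the values instead of summing them then undercounts. A minimal sketch of the difference, using plain Java lists in place of Hadoop's Iterable<IntWritable>:)

```java
import java.util.Arrays;
import java.util.List;

public class CombinerDemo {
    // What the reducer above does: count how many values arrive.
    static int countValues(List<Integer> vals) {
        return vals.size();
    }

    // A combiner-safe alternative: sum the values themselves.
    static int sumValues(List<Integer> vals) {
        int sum = 0;
        for (int v : vals) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        // Five wins for one key, pre-aggregated by combiners on two
        // mappers into the partial counts 3 and 2.
        List<Integer> afterCombiner = Arrays.asList(3, 2);
        System.out.println(countValues(afterCombiner)); // prints 2 (undercount)
        System.out.println(sumValues(afterCombiner));   // prints 5 (correct total)
    }
}
```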
I also think the job is halting partway through, because no output file (part-r-00000) shows up in the output folder!
java hadoop mapreduce bigdata kaggle
asked Mar 22 at 17:27 by Parth Tamane