Run Your MapReduce Program in Standalone Mode and Pseudo-Distributed Mode

Posted By Md Qasim Siddiqui | 28-Dec-2014

In this blog, you will read how to run your MapReduce program in pseudo-distributed mode. First of all, configure the following Hadoop configuration files.

  • core-site.xml

    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://hostname:9000</value>
        <description>The name of the default file system. A URI whose
        scheme and authority determine the FileSystem implementation.</description>
      </property>
    </configuration>


  • mapred-site.xml

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>hostname:8021</value>
      </property>
    </configuration>

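With both files in place, format the NameNode (first run only) and start the HDFS and MapReduce daemons. The script names below come from the Hadoop 1.x distribution's bin/ directory; adjust the paths to match your installation.

    # Format the NameNode once, before the very first start (this erases HDFS metadata)
    hadoop namenode -format

    # Start HDFS: NameNode, DataNode, SecondaryNameNode
    start-dfs.sh

    # Start MapReduce: JobTracker, TaskTracker
    start-mapred.sh

    # List the running Java daemons to verify everything came up
    jps
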
Now write the WordCount MapReduce program.

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      // Emit (word, 1) for every token in the input line.
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      // Sum all the counts emitted for this word.
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // For pseudo-distributed mode, add the conf.set(...) lines shown later in this post here.

    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path("/home/user/input/"));
    FileOutputFormat.setOutputPath(job, new Path("/home/user/output/"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
 

This program runs in standalone mode: it reads its input from your local machine ("/home/user/input/") and writes its output to "/home/user/output/".
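Independent of Hadoop, the tokenize-and-count logic that the mapper and reducer implement can be sketched in plain Java. This is only an illustration of what the job computes, not part of the job itself; the class and method names here are hypothetical.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.StringTokenizer;

    public class WordCountSketch {
      // The "map" step tokenizes each line; merge() plays the role of the
      // reducer's per-word sum.
      static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
          StringTokenizer itr = new StringTokenizer(line);
          while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
          }
        }
        return counts;
      }

      public static void main(String[] args) {
        Map<String, Integer> counts =
            count(new String[] {"hello world", "hello hadoop"});
        System.out.println(counts.get("hello")); // prints 2
        System.out.println(counts.get("world")); // prints 1
      }
    }
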

To run this program in pseudo-distributed mode, just make one small change: add these lines to your program, right after the Configuration object is created.

 

  conf.set("fs.default.name", "hdfs://hostname:9000");
  conf.set("mapred.job.tracker", "hostname:8021");
 

After adding these lines, your MapReduce program will run in pseudo-distributed mode: it will read its input from an HDFS directory and write its output to HDFS as well.

So, before supplying the input and output paths in the program above, create the input directory in HDFS and copy the file you want to process into it.
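The HDFS directory can be created and the job launched with the Hadoop fs shell. The jar name wordcount.jar and the local file path are assumptions for illustration; use whatever name you packaged the WordCount class into.

    # Create the input directory in HDFS and upload a local file into it
    hadoop fs -mkdir /home/user/input
    hadoop fs -put /path/to/localfile.txt /home/user/input/

    # Run the job (wordcount.jar is an assumed name for your packaged class)
    hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount

    # Inspect the result
    hadoop fs -cat /home/user/output/part-r-00000
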

Hope this solution helps you run your program in pseudo-distributed mode.
