Run your MapReduce program in Standalone Mode and Pseudo-Distributed Mode

Posted By : Md Qasim Siddiqui | 28-Dec-2014

In this blog, you will read how to run your MapReduce program in standalone mode and in pseudo-distributed mode. First of all, configure the following Hadoop configuration files.

  • core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hostname:9000</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation.</description>
  </property>
</configuration>

  • mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hostname:8021</value>
  </property>
</configuration>
Now write the WordCount MapReduce program.

 package org.apache.hadoop.examples;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

public static class TokenizerMapper 
     extends Mapper<Object, Text, Text, IntWritable> {

private final static IntWritable one = new IntWritable(1);
private Text word = new Text();

public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {
  StringTokenizer itr = new StringTokenizer(value.toString());
  while (itr.hasMoreTokens()) {
    word.set(itr.nextToken());
    context.write(word, one);
    }
  }
}

public static class IntSumReducer 
     extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable result = new IntWritable();

public void reduce(Text key, Iterable<IntWritable> values, 
                   Context context
                   ) throws IOException, InterruptedException {
  int sum = 0;
  for (IntWritable val : values) {
    sum += val.get();
  }
  result.set(sum);
  context.write(key, result);
  }
}

public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();

// To run in pseudo-distributed mode, add the two conf.set(...) lines shown later in this post here

Job job = new Job(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path("/home/user/input/"));
FileOutputFormat.setOutputPath(job, new Path("/home/user/output/"));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
 

This program will run in standalone mode: it takes its input from your local machine ("/home/user/input/") and creates its output in "/home/user/output/".

To run this program in pseudo-distributed mode, just make a small change to the above program by adding these lines:

 

 conf.set("fs.default.name","hdfs://hostname:9000");
    conf.set("mapred.job.tracker","hostname:8021");
 

Well, after adding these lines, your MapReduce program will run in pseudo-distributed mode. The program will read its input from an HDFS directory and write its output to HDFS as well.
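For clarity, this is roughly how the beginning of main() looks once those two lines are in place; "hostname" and the ports are placeholders, as in the configuration files above:

public static void main(String[] args) throws Exception {
  Configuration conf = new Configuration();

  // Point the job at the pseudo-distributed cluster instead of the local filesystem.
  // Use the same hostname and ports as in core-site.xml and mapred-site.xml.
  conf.set("fs.default.name", "hdfs://hostname:9000");
  conf.set("mapred.job.tracker", "hostname:8021");

  Job job = new Job(conf, "word count");
  // ... the rest of the driver stays unchanged ...
}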

So, before setting the input and output paths in the above program, create the input directory in HDFS and add the file you want to process into it.
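You can do this with the hadoop fs -mkdir and hadoop fs -put shell commands, or programmatically. Below is a minimal sketch using the FileSystem API; the class name PrepareInput and the local file path are just illustrative, and the hostname and HDFS paths are the example values used above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrepareInput {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://hostname:9000");

    // Connect to HDFS, create the input directory and copy a local file into it.
    FileSystem fs = FileSystem.get(conf);
    fs.mkdirs(new Path("/home/user/input"));
    fs.copyFromLocalFile(new Path("/home/user/localfile.txt"),   // hypothetical local file
                         new Path("/home/user/input/"));
    fs.close();
  }
}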

Hope this post helps you run your program in pseudo-distributed mode.

About Author

Md Qasim Siddiqui

Qasim is an experienced web app developer with expertise in Groovy and Grails, Hadoop, Hive, Mahout, AngularJS, and the Spring framework. He likes to listen to music in his idle time and plays Counter-Strike.
