• Prerequisite:
        Install Hadoop.

    Hadoop installation
        1) Download the Hadoop tar file from Apache, extract it, and set the Hadoop path in .bashrc.

    NOTE: This post uses Hadoop 2.6.0 and the MongoDB connector r1.4.0-rc0.

    If you use Maven to build your project, follow these steps to process MongoDB data with Hadoop.

    step 1 - Download the MongoDB connector jars. They are added as a dependency in pom.xml (step 3) and are needed again later to run the MapReduce program from the command line.

    step 2 - Create a Maven-based Java project named 'HadoopWithMongo'.

    step 3 - Add the mongo-hadoop-core-1.4-rc0 dependency to the pom.xml file.

    step 4 - Add the Hadoop libraries to your project classpath.

                 NOTE: The location of the Hadoop lib folder varies with the Hadoop version.
                 In Hadoop 2.6.0 use the path "hadoop/share/hadoop/common/lib", not the "hadoop/lib" directory.

    step 5 - Create the Java class MongoConnector.

    step 6 - Write a MapReduce program:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.util.ToolRunner;
    import org.bson.BSONObject;
    import com.mongodb.hadoop.MongoConfig;
    import com.mongodb.hadoop.MongoInputFormat;
    import com.mongodb.hadoop.MongoOutputFormat;
    import com.mongodb.hadoop.util.MapredMongoConfigUtil;
    import com.mongodb.hadoop.util.MongoConfigUtil;
    import com.mongodb.hadoop.util.MongoTool;

    public class MongoConnector extends MongoTool {
        // Mapper: receives each MongoDB document as a BSONObject.
        public static class Map extends Mapper<Object, BSONObject, Text, IntWritable> {
            @Override
            public void map(final Object key, final BSONObject value, final Context context)
                    throws IOException, InterruptedException {
                // write your mapper logic
                context.write(new Text(), new IntWritable(1));
            }
        }

        // Reducer: receives all values emitted for one key.
        public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            public void reduce(final Text key, final Iterable<IntWritable> values, final Context context)
                    throws IOException, InterruptedException {
                // write your reducer logic
                context.write(new Text(), new IntWritable(1));
            }
        }

        public MongoConnector() {
            Configuration conf = new Configuration();
            // Register the configuration first; otherwise getConf() returns null.
            setConf(conf);
            MongoConfig mongoConfig = new MongoConfig(conf);
            if (MongoTool.isMapRedV1()) {
                MapredMongoConfigUtil.setInputFormat(getConf(), com.mongodb.hadoop.mapred.MongoInputFormat.class);
                MapredMongoConfigUtil.setOutputFormat(getConf(), com.mongodb.hadoop.mapred.MongoOutputFormat.class);
            } else {
                MongoConfigUtil.setInputFormat(getConf(), MongoInputFormat.class);
                MongoConfigUtil.setOutputFormat(getConf(), MongoOutputFormat.class);
            }
            mongoConfig.setMapper(Map.class);
            // Added so the Reduce class is used and the key/value types match what is written above.
            mongoConfig.setReducer(Reduce.class);
            mongoConfig.setMapperOutputKey(Text.class);
            mongoConfig.setMapperOutputValue(IntWritable.class);
            mongoConfig.setOutputKey(Text.class);
            mongoConfig.setOutputValue(IntWritable.class);
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new MongoConnector(), args));
        }
    }
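
    To make the "write your mapper logic" and "write your reducer logic" placeholders more concrete, here is a minimal plain-Java sketch of what the two phases do. It has no Hadoop or MongoDB dependency, so it is only an illustration of the idea, not connector code. The class name MapReduceSketch, the "city" field and the sample documents are all hypothetical, and documents are modelled as java.util.Map instead of BSONObject.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone illustration; it does not use Hadoop or the MongoDB connector.
class MapReduceSketch {

    // "Map" phase: emit (value of the given field, 1) for every document that has the field.
    static List<Map.Entry<String, Integer>> mapPhase(List<Map<String, Object>> docs, String field) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object v = doc.get(field);
            if (v != null) {
                emitted.add(Map.entry(v.toString(), 1));
            }
        }
        return emitted;
    }

    // "Reduce" phase: sum the emitted counts per key.
    // In a real job the framework groups the values by key before calling the reducer.
    static Map<String, Integer> reducePhase(List<Map.Entry<String, Integer>> emitted) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> e : emitted) {
            totals.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample documents; the last one lacks the "city" field and is skipped.
        List<Map<String, Object>> docs = List.of(
                Map.of("city", "Delhi"),
                Map.of("city", "Pune"),
                Map.of("city", "Delhi"),
                Map.of("name", "no city here"));
        System.out.println(reducePhase(mapPhase(docs, "city")));
    }
}
```

    In the real job the same idea would go into map() (read a field from the BSONObject and write it with a count of 1) and reduce() (add up the values for each key).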

    NOTE: Your MongoDB instance must be running.
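
    One quick way to check this before submitting the job is to test whether anything accepts TCP connections on the MongoDB port. The sketch below is only a rough check under assumptions: the class name MongoPortCheck is hypothetical, and it assumes MongoDB listens on localhost at the default port 27017. It confirms that the port is open, not that the server behind it is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper; it only checks that something accepts TCP connections on host:port.
class MongoPortCheck {

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout or unknown host: treat all as "not listening".
            return false;
        }
    }

    public static void main(String[] args) {
        // 27017 is MongoDB's default port; adjust it if your mongod listens elsewhere.
        boolean up = isListening("localhost", 27017, 2000);
        System.out.println(up ? "MongoDB port is open" : "Nothing is listening on localhost:27017");
    }
}
```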

    Your connection is now set up. If you want to run the MapReduce program from a jar, follow these steps:

    step 1 - Put the MongoDB connector jars downloaded in the first step into the Hadoop lib directory.
    step 2 - Start the Hadoop services.
    step 3 - Create a jar file of the above Java project.
    step 4 - Run this command: hadoop jar HadoopWithMongo.jar MongoConnector

        This will start your MapReduce program.

    Hope this blog helps you establish a connection between Hadoop and MongoDB!

Tags: hadoop , mongo , mongoDBConnector

