Connecting Hadoop to MongoDB using mongoDBConnector

Posted By Md Qasim Siddiqui | 25-Jun-2015

Prerequisite:
    Install Hadoop.

Hadoop installation
                  1) Download the Hadoop tar file from Apache and set the Hadoop path in .bashrc.


NOTE: This post uses Hadoop 2.6.0 and mongodbConnector r1.4.0-rc0.


If you are using Maven to build your project, follow these steps to process MongoDB data with Hadoop.

step 1 - Download the mongodbConnector jars. They will be required later to run the MapReduce program from the command line (the Maven dependency itself is added in step 3).
             Download the jars from https://github.com/mongodb/mongo-hadoop/releases

step 2 - Create a Maven-based Java project named 'HadoopWithMongo'

step 3 - Add the mongo-hadoop-core-1.4-rc0 dependency to the pom.xml file

step 4 - Add the Hadoop libraries to your project classpath

             NOTE: The location of the Hadoop lib folder varies with the Hadoop version.
             In Hadoop 2.6.0, use the path "hadoop/share/hadoop/common/lib" and ignore the "hadoop/lib" directory.

step 5 - Create the Java class MongoConnector

step 6 - Write a MapReduce program

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.util.ToolRunner;
import org.bson.BSONObject;

import com.mongodb.hadoop.MongoConfig;
import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.util.MapredMongoConfigUtil;
import com.mongodb.hadoop.util.MongoConfigUtil;
import com.mongodb.hadoop.util.MongoTool;


public class MongoConnector extends MongoTool{
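	// Type parameters are required: with a raw Mapper, map(Object, BSONObject, Context)
	// does not override Mapper.map and would never be called.
	public static class Map extends Mapper<Object, BSONObject, Text, IntWritable>{
		@Override
		public void map(final Object key, final BSONObject value, final Context context) throws IOException, InterruptedException{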
			System.out.println(value);
			/**
			 * write your mapper logic
			 */
			context.write(new Text(), new IntWritable(1));	
		}
	}
	
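	// Input types must match the mapper's output types (Text, IntWritable).
	public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable>{
		@Override
		public void reduce(Text key,Iterable<IntWritable> values,Context context) throws IOException, InterruptedException{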
			/**
			 * write your reducer logic
			 */
			context.write(new Text(), new IntWritable(1));
		}
	}
	
	public MongoConnector(){
		Configuration conf = new Configuration();
		MongoConfig mongoConfig = new MongoConfig(conf);
		setConf(conf);
		if (MongoTool.isMapRedV1()) {
			MapredMongoConfigUtil.setInputFormat(getConf(), com.mongodb.hadoop.mapred.MongoInputFormat.class);
			MapredMongoConfigUtil.setOutputFormat(getConf(), com.mongodb.hadoop.mapred.MongoOutputFormat.class);
		} else {
			MongoConfigUtil.setInputFormat(getConf(), MongoInputFormat.class);
			MongoConfigUtil.setOutputFormat(getConf(), MongoOutputFormat.class);
		}
		mongoConfig.setInputFormat(MongoInputFormat.class);
		mongoConfig.setInputURI("mongodb://localhost:27017/dbName.collectionName");
		mongoConfig.setMapper((Class) Map.class);
		mongoConfig.setReducer(Reduce.class);
		mongoConfig.setMapperOutputKey(Text.class);
		mongoConfig.setMapperOutputValue(IntWritable.class);
		mongoConfig.setOutputKey(Text.class);
		mongoConfig.setOutputValue(IntWritable.class);
		mongoConfig.setOutputURI("mongodb://localhost:27017/dbName.outputCollectionName");
		mongoConfig.setOutputFormat(MongoOutputFormat.class);
	}
	
	public static void main(String[] args) throws Exception {
		 System.exit(ToolRunner.run(new MongoConnector(), args));
	}
}
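
The mapper and reducer above leave their logic as placeholders. As an illustration only, the sketch below simulates in plain Java (no Hadoop or MongoDB classes) what a simple count-per-field job might do: the map phase emits a (key, 1) pair for each document, and the reduce phase sums the values for each key. The class name CountByFieldSketch, the "city" field, and the sample documents are all hypothetical and do not come from the connector.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of map -> group by key -> reduce. Not a Hadoop job.
final class CountByFieldSketch {

    // Builds a one-field stand-in for a MongoDB document (hypothetical sample data).
    static Map<String, Object> doc(String field, Object value) {
        Map<String, Object> d = new HashMap<>();
        d.put(field, value);
        return d;
    }

    // "Map" phase: emit one (fieldValue, 1) pair for each document that has the field.
    static List<Map.Entry<String, Integer>> mapPhase(List<Map<String, Object>> docs, String field) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (Map<String, Object> d : docs) {
            Object value = d.get(field);
            if (value != null) {
                pairs.add(new SimpleEntry<>(value.toString(), 1));
            }
        }
        return pairs;
    }

    // "Reduce" phase: sum the emitted values for each key (sorted by key).
    static Map<String, Integer> reducePhase(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc("city", "Delhi"));
        docs.add(doc("city", "Noida"));
        docs.add(doc("city", "Delhi"));
        System.out.println(reducePhase(mapPhase(docs, "city"))); // prints {Delhi=2, Noida=1}
    }
}
```

In the real job, the same idea would go inside map() and reduce(): read the field from the BSONObject with value.get("city"), write it as the Text key, and sum the IntWritable values in the reducer.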

NOTE: Your MongoDB instance must be running before you start the job.
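
One rough way to confirm this before submitting the job is to check that something is listening on the host and port named in the input URI. The sketch below is an assumption-laden helper, not part of Hadoop or the connector: the class name MongoPortCheck is hypothetical, it only handles single-host URIs of the form used above, and an open TCP port does not prove that the listener is actually MongoDB.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;

// Hypothetical pre-flight check; not part of Hadoop or the connector.
final class MongoPortCheck {

    // MongoDB's default port, used when the URI does not name one.
    static final int DEFAULT_MONGO_PORT = 27017;

    // Extracts the host from a single-host URI such as
    // mongodb://localhost:27017/dbName.collectionName.
    // Multi-host (replica set) URIs are not handled by this sketch.
    static String hostOf(String mongoUri) {
        return URI.create(mongoUri).getHost();
    }

    // Extracts the port, falling back to the default when none is given.
    static int portOf(String mongoUri) {
        int port = URI.create(mongoUri).getPort();
        return port == -1 ? DEFAULT_MONGO_PORT : port;
    }

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String uri = "mongodb://localhost:27017/dbName.collectionName";
        boolean up = isReachable(hostOf(uri), portOf(uri), 2000);
        System.out.println(up
                ? "Something is listening on " + hostOf(uri) + ":" + portOf(uri)
                : "Nothing reachable on " + hostOf(uri) + ":" + portOf(uri));
    }
}
```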

Your connection is now set up. To run the MapReduce program from a jar, follow these steps:

step 1 - Put the MongoDB connector jars downloaded in the first step into the Hadoop lib directory
step 2 - Start the Hadoop services
step 3 - Create a jar file of the above Java project
step 4 - Run this command - hadoop jar HadoopWithMongo.jar MongoConnector

    This will start your MapReduce program.
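
Step 1 of the list above is a plain file copy, and any copy tool will do. If you prefer to script it in Java, the following is a minimal sketch under stated assumptions: the class name ConnectorJarInstaller is hypothetical, and both directories (where you downloaded the connector jars, and your Hadoop lib directory) are passed in by you rather than detected.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for step 1; not part of Hadoop or the connector.
final class ConnectorJarInstaller {

    // Copies every *.jar file in sourceDir into libDir, overwriting files
    // with the same name, and returns how many jars were copied.
    static int copyJars(Path sourceDir, Path libDir) throws IOException {
        int copied = 0;
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(sourceDir, "*.jar")) {
            for (Path jar : jars) {
                Files.copy(jar, libDir.resolve(jar.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ConnectorJarInstaller <download-dir> <hadoop-lib-dir>");
            return;
        }
        int copied = copyJars(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied " + copied + " jar(s)");
    }
}
```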


Hope this blog helps you establish a connection between Hadoop and MongoDB!
