Blog

  • Prerequisite:
        Install Hadoop.

    Hadoop installation
                      1) Download the Hadoop tar file from Apache and set the Hadoop path in .bashrc.


    NOTE: This post uses Hadoop 2.6.0 and the MongoDB connector (mongo-hadoop) r1.4.0-rc0.


    If you use Maven to build your project, follow these steps to process MongoDB data with Hadoop.

    step 1 - Add the connector dependency to your pom.xml file, and download the connector jars, which are needed later to run the MapReduce programme from the command line.
                 Download the MongoDB connector jars from https://github.com/mongodb/mongo-hadoop/releases

    step 2 - Create a Maven-based Java project 'HadoopWithMongo'

    step 3 - Add the mongo-hadoop-core-1.4-rc0 dependency to the pom.xml file

    step 4 - Add the Hadoop libraries to your project classpath


                 NOTE: The Hadoop lib folder location varies with the Hadoop version.
                 In Hadoop 2.6.0 use the path "hadoop/share/hadoop/common/lib", not the "hadoop/lib" directory.

    step 5 - Create a Java class MongoConnector

    step 6 - Write a MapReduce programme

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.util.ToolRunner;
    import org.bson.BSONObject;
    
    import com.mongodb.hadoop.MongoConfig;
    import com.mongodb.hadoop.MongoInputFormat;
    import com.mongodb.hadoop.MongoOutputFormat;
    import com.mongodb.hadoop.util.MapredMongoConfigUtil;
    import com.mongodb.hadoop.util.MongoConfigUtil;
    import com.mongodb.hadoop.util.MongoTool;
    
    
    public class MongoConnector extends MongoTool{
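    	// Type parameters: input key, input value (one MongoDB document), output key, output value.
    	// Without them, map() below does not override Mapper.map() and is never called.
    	public static class Map extends Mapper<Object, BSONObject, Text, IntWritable>{
    		@Override
    		public void map(final Object key, final BSONObject value, final Context context) throws IOException, InterruptedException{
    			System.out.println(value);
    			/**
    			 * write your mapper logic
    			 */
    			context.write(new Text(), new IntWritable(1));
    		}
    	}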
    	}
    	
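    	// Input types must match the mapper's output types (Text, IntWritable).
    	public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable>{
    		@Override
    		public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException{
    			/**
    			 * write your reducer logic
    			 */
    			context.write(new Text(), new IntWritable(1));
    		}
    	}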
    	
    	public MongoConnector(){
    		Configuration conf = new Configuration();
    		MongoConfig mongoConfig = new MongoConfig(conf);
    		setConf(conf);
    		if (MongoTool.isMapRedV1()) {
    			MapredMongoConfigUtil.setInputFormat(getConf(), com.mongodb.hadoop.mapred.MongoInputFormat.class);
    			MapredMongoConfigUtil.setOutputFormat(getConf(), com.mongodb.hadoop.mapred.MongoOutputFormat.class);
    		} else {
    			MongoConfigUtil.setInputFormat(getConf(), MongoInputFormat.class);
    			MongoConfigUtil.setOutputFormat(getConf(), MongoOutputFormat.class);
    		}
    		mongoConfig.setInputFormat(MongoInputFormat.class);
    		mongoConfig.setInputURI("mongodb://localhost:27017/dbName.collectionName");
    		mongoConfig.setMapper((Class) Map.class);
    		mongoConfig.setReducer(Reduce.class);
    		mongoConfig.setMapperOutputKey(Text.class);
    		mongoConfig.setMapperOutputValue(IntWritable.class);
    		mongoConfig.setOutputKey(Text.class);
    		mongoConfig.setOutputValue(IntWritable.class);
    		mongoConfig.setOutputURI("mongodb://localhost:27017/dbName.outputCollectionName");
    		mongoConfig.setOutputFormat(MongoOutputFormat.class);
    	}
    	
    	public static void main(String[] args) throws Exception {
    		 System.exit(ToolRunner.run(new MongoConnector(), args));
    	}
    }
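
    The "write your mapper logic" and "write your reducer logic" placeholders above can be filled in many ways. As a rough sketch of one common pattern (counting documents per value of a field), here is a plain-Java simulation that runs without Hadoop or MongoDB. The class name CountByFieldSketch, the field name "city" and the sample documents are made up for illustration. In the real job the map step would read the field from the BSONObject and call context.write(...), and the reduce step would sum the IntWritable values.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the Map/Reduce classes; documents are plain java.util.Map objects.
class CountByFieldSketch {

    // "Map" phase: emit (fieldValue, 1) for every document that has the field.
    static List<Map.Entry<String, Integer>> mapPhase(List<Map<String, Object>> docs, String field) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object fieldValue = doc.get(field);
            if (fieldValue != null) {
                emitted.add(new SimpleEntry<>(fieldValue.toString(), 1));
            }
        }
        return emitted;
    }

    // "Reduce" phase: sum the emitted values per key (TreeMap keeps keys sorted).
    static Map<String, Integer> reducePhase(List<Map.Entry<String, Integer>> emitted) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : emitted) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample documents standing in for a MongoDB collection.
        List<Map<String, Object>> docs = Arrays.asList(
                Collections.<String, Object>singletonMap("city", "Delhi"),
                Collections.<String, Object>singletonMap("city", "Noida"),
                Collections.<String, Object>singletonMap("city", "Delhi"));

        System.out.println(reducePhase(mapPhase(docs, "city"))); // prints {Delhi=2, Noida=1}
    }
}
```

    Keeping the logic in small static methods like this makes it easy to try out on a few sample documents before moving it into the Mapper and Reducer.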

    NOTE: Your MongoDB instance must be running before you start the job.
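
    A quick way to confirm this before submitting the job is to try a TCP connection to the host and port from the input URI. The sketch below uses only the JDK. The class name MongoReachability and the 2-second timeout are arbitrary choices, and a successful connect only shows that something is listening on the port, not that it is MongoDB.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;

// Hypothetical helper, not part of the mongo-hadoop connector.
class MongoReachability {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Same URI as passed to setInputURI() in MongoConnector.
        URI uri = URI.create("mongodb://localhost:27017/dbName.collectionName");
        // Fall back to MongoDB's default port when the URI omits it.
        int port = uri.getPort() == -1 ? 27017 : uri.getPort();

        boolean up = isListening(uri.getHost(), port, 2000);
        System.out.println(uri.getHost() + ":" + port
                + (up ? " is accepting connections" : " is not reachable"));
    }
}
```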

    Your connection is now set up. To run the MapReduce programme from a jar, follow these steps:

    step 1 - Put the MongoDB connector jars downloaded in the first step into the Hadoop lib directory
    step 2 - Start the Hadoop services
    step 3 - Create a jar file of the above Java project
    step 4 - Run this command: hadoop jar HadoopWithMongo.jar MongoConnector

        This will start your mapreduce programme.
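
    If you would rather trigger the job from another Java program than type the command by hand, the same command line can be started with ProcessBuilder. This is only a sketch: the class name RunMongoJob is made up, and it assumes the hadoop executable is on your PATH and that HadoopWithMongo.jar is in the current directory.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher; it simply shells out to the command from step 4.
class RunMongoJob {

    // Builds: hadoop jar <jarPath> <mainClass>
    static List<String> buildCommand(String jarPath, String mainClass) {
        return Arrays.asList("hadoop", "jar", jarPath, mainClass);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> command = buildCommand("HadoopWithMongo.jar", "MongoConnector");

        Process process = new ProcessBuilder(command)
                .inheritIO() // show the job's console output in this terminal
                .start();

        int exitCode = process.waitFor();
        System.out.println("hadoop exited with code " + exitCode);
        System.exit(exitCode);
    }
}
```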


    Hope this blog helps you establish a connection between Hadoop and MongoDB!

Tags: hadoop , mongo , mongoDBConnector
