Mongo-Hadoop Connector

Posted By : Nishtha Singh | 01-Apr-2014

When to use Hadoop and MongoDB together

 

MongoDB, being a document-oriented database, has its own aggregation functionality, which helps with data analysis. But when complex data analysis is needed, complex data aggregation is required, and that is what Hadoop provides. This is referred to as batch aggregation, one of the most useful features Hadoop offers.
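To make the distinction concrete, the sketch below evaluates a simple `$group`/`$sum` aggregation stage in plain Python over illustrative documents. The collection and field names are hypothetical; the commented pipeline shows how the same aggregation would be expressed to MongoDB itself.

```python
from collections import defaultdict

# Sample documents as they might appear in a MongoDB collection
# (collection name and fields are illustrative, not from the article).
orders = [
    {"_id": 1, "category": "books", "amount": 10},
    {"_id": 2, "category": "books", "amount": 5},
    {"_id": 3, "category": "music", "amount": 7},
]

# The same aggregation expressed as a MongoDB pipeline would be:
#   db.orders.aggregate([{"$group": {"_id": "$category",
#                                    "total": {"$sum": "$amount"}}}])
# Here the $group/$sum stage is evaluated in plain Python to show
# what the server computes.
totals = defaultdict(int)
for doc in orders:
    totals[doc["category"]] += doc["amount"]

result = [{"_id": cat, "total": tot} for cat, tot in sorted(totals.items())]
print(result)  # [{'_id': 'books', 'total': 15}, {'_id': 'music', 'total': 7}]
```

Aggregations of this shape are well within MongoDB's built-in capabilities; it is the multi-stage, heavyweight transformations beyond this that call for Hadoop.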

In batch aggregation, data is fetched from MongoDB, processed via Hadoop, and the results are written back to MongoDB. But if the data is not too large or complex, it is better to avoid Hadoop: HDFS is not native to MongoDB, and MongoDB has its own way of scaling and working with data stored across multiple machines.
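The fetch, process, write-back cycle above can be sketched in miniature. This is a conceptual simulation only: plain Python lists stand in for the MongoDB collections, and a word-count map/reduce stands in for the Hadoop job (in a real deployment the connector handles all three stages).

```python
from collections import Counter

# Stand-ins for MongoDB collections (illustrative data, not a real driver).
source_collection = [
    {"_id": 1, "text": "big data"},
    {"_id": 2, "text": "big analysis"},
]
results_collection = []  # stands in for the output collection

# "Map" phase: emit (word, 1) pairs from each fetched document.
pairs = [(word, 1)
         for doc in source_collection
         for word in doc["text"].split()]

# "Reduce" phase: sum the counts per word.
counts = Counter()
for word, n in pairs:
    counts[word] += n

# Write-back phase: store aggregated results as documents.
for word, total in sorted(counts.items()):
    results_collection.append({"_id": word, "count": total})

print(results_collection)
```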

 

Also, consider a production environment where an application's data is stored in multiple datastores, each with its own functionality and query language. Hadoop can solve this problem by acting as a centralized repository, and MapReduce jobs can be used to load data from MongoDB into Hadoop for processing.

What is Mongo-Hadoop connector

The Mongo-Hadoop connector is an open-source plugin for Hadoop that allows MongoDB to be used, instead of HDFS, as a source and sink of data. Using this connector, the user can specify a query, and the connector breaks the results of that query into input splits for Hadoop. Results are written back to MongoDB by the Hadoop reducer; HDFS plays no role in either of these operations.
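The idea of breaking query results into input splits can be sketched as follows. This is a hypothetical illustration of splitting by `_id` range, similar in spirit to what the connector does internally; the function name and split size are illustrative and not part of the connector's API.

```python
def make_splits(min_id, max_id, docs_per_split):
    """Return (start, end) _id ranges that together cover [min_id, max_id].

    Each range would become one Hadoop input split, processed by one
    map task independently of the others.
    """
    splits = []
    start = min_id
    while start <= max_id:
        end = min(start + docs_per_split - 1, max_id)
        splits.append((start, end))
        start = end + 1
    return splits

# 100 documents with integer _ids 0..99, at most 40 documents per split:
print(make_splits(0, 99, 40))  # [(0, 39), (40, 79), (80, 99)]
```

Because each split is an independent range of the query result, Hadoop can schedule the map tasks in parallel without any coordination through HDFS.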

Why use Mongo-Hadoop connector

There are two alternative approaches to using the Mongo-Hadoop connector:

either running MapReduce in MongoDB directly, or performing a three-stage operation, i.e. loading the data from MongoDB into HDFS, running Hadoop MapReduce, and importing the output back into MongoDB. Both of these approaches have drawbacks for complex operations on large data sets. The problems with the Mongo-MapReduce approach are:

 

(1) the language for MapReduce scripts is JavaScript, which is slow and has poor analytics libraries, and

(2) the SpiderMonkey JavaScript implementation used by MongoDB is not thread-safe, so only one MapReduce job can run at a time.

 

Also, the three-stage approach is inconvenient and requires a large amount of database and HDFS I/O. The Mongo-Hadoop connector, which allows the user to leave the input data in the database, is thus an attractive option to explore. The connector can optionally write the output to HDFS instead, which allows for different combinations of read and write resources.

Steps to use Mongo-Hadoop connector:

1) Set up MongoDB version 2.4.9.

2) Setup Hadoop on your system from one of the following versions:

  • 0.20/0.20.x

  • 1.0/1.0.x

  • 0.21/0.21.x

  • CDH3

  • CDH4

and follow the link Install and setup Hadoop to set up Hadoop on your system.

 

3) The next step is to build the Mongo-Hadoop adapter. The prerequisite is that Hadoop should be up and running. Git and JDK 1.6 should also be installed.

 

4) The link below is a step-by-step guide to running MapReduce on some of the examples from Git using the Mongo-Hadoop connector. Just follow the steps and you will be able to run MapReduce using the Mongo-Hadoop connector.

 

Mongo-Hadoop Connector

About Author

Nishtha Singh

Nishtha is a bright Groovy and Grails developer and has worked on the development of various SaaS applications using Grails technologies. Nishtha's hobbies are poetry and glass painting.
