Run a simple MapReduce job in a Hadoop pseudo-distributed setup
Posted By : Rohan Jain | 17-Nov-2014
"In this blog I will describe, how you can run a simple map reduce job in a single-node Hadoop cluster. for this i am going to use a WordCountexample which reads text files and counts how often words occur. The input is text files and the output is text files, each line of which contains a word and the count of how often it occured, separated by a tab. - if you don't have hadoop setup, Read -
The first step in running a MapReduce job is to get some example input data (a large text file).
1. You can download one using the following links (choose the plain-text format):
http://www.gutenberg.org/cache/epub/20417/pg20417.txt
http://www.gutenberg.org/cache/epub/5000/pg5000.txt
Alternatively, you can supply your own input text file.
Store the file in a local directory of your choice (e.g. /home/rohan/hadoopinput, which is used in the commands below).
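For example, assuming wget is available, you could fetch the first file into that directory like this:

mkdir -p /home/rohan/hadoopinput
cd /home/rohan/hadoopinput
wget http://www.gutenberg.org/cache/epub/20417/pg20417.txt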
2. Log in as hduser and change to the directory /usr/local/hadoop/bin.
Start your Hadoop cluster if it’s not running already.
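If it is not, it can typically be started with the scripts in Hadoop's sbin directory (paths relative to /usr/local/hadoop/bin as above); jps should then list NameNode, DataNode, ResourceManager and NodeManager among the running Java processes:

hduser@rohan-Vostro-3446:/usr/local/hadoop/bin$ ../sbin/start-dfs.sh
hduser@rohan-Vostro-3446:/usr/local/hadoop/bin$ ../sbin/start-yarn.sh
hduser@rohan-Vostro-3446:/usr/local/hadoop/bin$ jps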
3. Create a directory for the input in HDFS:
hduser@rohan-Vostro-3446:/usr/local/hadoop/bin$ hadoop fs -mkdir /hadoopinput
Check for your newly created directory:
hduser@rohan-Vostro-3446:/usr/local/hadoop/bin$ hadoop fs -ls /
4. Copy the local example data to HDFS. Before we run the actual MapReduce job, we first have to copy the file from our local file system to Hadoop's HDFS:
hduser@rohan-Vostro-3446:/usr/local/hadoop/bin$ hadoop fs -copyFromLocal /home/rohan/hadoopinput/pg20417.txt /hadoopinput
Check your HDFS directory for the input file:
hduser@rohan-Vostro-3446:/usr/local/hadoop/bin$ hadoop fs -ls /hadoopinput
5. Run the MapReduce job:
hduser@rohan-Vostro-3446:/usr/local/hadoop/bin$ hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /hadoopinput /hdfsoutput
This command reads all the files in the HDFS directory /hadoopinput, processes them, and stores the result in the HDFS output directory you specified (/hdfsoutput).
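Note that the output directory must not already exist, otherwise the job will fail. If you want to re-run the job, remove it first:

hduser@rohan-Vostro-3446:/usr/local/hadoop/bin$ hadoop fs -rm -r /hdfsoutput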
Output of the previous command:
14/11/17 13:43:17 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/11/17 13:43:18 INFO input.FileInputFormat: Total input paths to process : 1
14/11/17 13:43:18 INFO mapreduce.JobSubmitter: number of splits:1
14/11/17 13:43:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1416199619656_0004
14/11/17 13:43:19 INFO impl.YarnClientImpl: Submitted application application_1416199619656_0004
14/11/17 13:43:19 INFO mapreduce.Job: The url to track the job: http://rohan-Vostro-3446:8088/proxy/application_1416199619656_0004/
14/11/17 13:43:19 INFO mapreduce.Job: Running job: job_1416199619656_0004
14/11/17 13:43:26 INFO mapreduce.Job: Job job_1416199619656_0004 running in uber mode : false
14/11/17 13:43:26 INFO mapreduce.Job:  map 0% reduce 0%
14/11/17 13:43:34 INFO mapreduce.Job:  map 100% reduce 0%
14/11/17 13:43:42 INFO mapreduce.Job:  map 100% reduce 100%
14/11/17 13:43:43 INFO mapreduce.Job: Job job_1416199619656_0004 completed successfully
14/11/17 13:43:43 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=267013
        FILE: Number of bytes written=728217
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=661905
        HDFS: Number of bytes written=196183
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=6012
        Total time spent by all reduces in occupied slots (ms)=5116
        Total time spent by all map tasks (ms)=6012
        Total time spent by all reduce tasks (ms)=5116
        Total vcore-seconds taken by all map tasks=6012
        Total vcore-seconds taken by all reduce tasks=5116
        Total megabyte-seconds taken by all map tasks=6156288
        Total megabyte-seconds taken by all reduce tasks=5238784
    Map-Reduce Framework
        Map input records=12760
        Map output records=109844
        Map output bytes=1086544
        Map output materialized bytes=267013
        Input split bytes=98
        Combine input records=109844
        Combine output records=18039
        Reduce input groups=18039
        Reduce shuffle bytes=267013
        Reduce input records=18039
        Reduce output records=18039
        Spilled Records=36078
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=92
        CPU time spent (ms)=7090
        Physical memory (bytes) snapshot=439283712
        Virtual memory (bytes) snapshot=1424547840
        Total committed heap usage (bytes)=355467264
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=661807
    File Output Format Counters
        Bytes Written=196183
Check your HDFS output directory:
hduser@rohan-Vostro-3446:/usr/local/hadoop/bin$ hadoop fs -ls /hdfsoutput
Found 2 items
-rw-r--r--   1 hduser supergroup          0 2014-11-06 23:21 /hdfsoutput/_SUCCESS
-rw-r--r--   1 hduser supergroup      40923 2014-11-06 23:21 /hdfsoutput/part-r-00000
You can see the result, which is stored in the HDFS directory /hdfsoutput. The empty _SUCCESS file is just a marker indicating that the job completed, while part-r-00000 contains the actual word counts written by the single reducer:
hduser@rohan-Vostro-3446:/usr/local/hadoop/bin$ hadoop fs -cat /hdfsoutput/part-r-00000
I used http://www.gutenberg.org/cache/epub/20417/pg20417.txt as the input file.
The output of the previous command would be as follows (only a part of the output is shown):
works--the	1
works.	6
works;	1
world	45
world!	1
world's	1
world,	11
world--even	1
world--weighs	1
world-cloud	1
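If you want the result on the local file system as a single file, you can merge and copy it out of HDFS; the local destination path below is just an example:

hduser@rohan-Vostro-3446:/usr/local/hadoop/bin$ hadoop fs -getmerge /hdfsoutput /home/rohan/wordcount-result.txt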