Don't get discouraged by the length of this post: the whole procedure literally takes only a few minutes! There are only three steps to install Hadoop on a single machine. First, make sure Java (version 6 or later) is installed and that Hadoop knows where to find it. Second, set up your machine to accept SSH logins (this is needed for Hadoop's pseudo-distributed mode). Third, configure Hadoop. We will walk through each of these steps, with separate instructions where they differ between UNIX operating systems.
Java
As mentioned, we need to install a Java SDK and have the environment variable JAVA_HOME point to a suitable Java installation. This variable is usually set in a shell startup file, such as ~/.bash_profile or ~/.bashrc (or ~/.zshrc if you use zsh as your shell). We will use ~/.bashrc in this tutorial. The location of the Java home varies from system to system; in most cases the folder should contain a subfolder named include with a file jni.h in it.
Mac OS X
Mac OS X comes with the Java 6 SDK by default, so it is enough to set JAVA_HOME in ~/.bash_profile or ~/.bashrc. From a terminal, run the following two commands:
echo "export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home" >> ~/.bashrc
source ~/.bashrc
Ubuntu/Debian
Run the following commands to install the Java 6 OpenJDK:
sudo apt-get update
sudo apt-get -y install openjdk-6-jdk openjdk-6-jre
The JAVA_HOME variable can be set as follows:
JAVA_BIN=$(update-alternatives --list java | head -n 1)   # path to the java binary, e.g. .../jre/bin/java
echo "export JAVA_HOME=$(dirname $(dirname $(dirname $JAVA_BIN)))" >> ~/.bashrc   # strip the trailing /jre/bin/java
source ~/.bashrc
CentOS/Red Hat
Run the following to install the Java 6 OpenJDK:
sudo yum -y install java-1.6.0-openjdk java-1.6.0-openjdk-devel
This should install Java in a subdirectory of /usr/lib/jvm, with /usr/lib/jvm/java pointing at it; the latter is going to be our JAVA_HOME:
echo "export JAVA_HOME=/usr/lib/jvm/java" >> ~/.bashrc
source ~/.bashrc
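Whatever your system, you can quickly sanity-check the value before moving on. The jni.h test below is just the heuristic mentioned earlier and may not apply to every JDK layout:
echo $JAVA_HOME
ls $JAVA_HOME/include/jni.h   # should exist on most JDK installations
java -version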
SSH
Hadoop does not distinguish between fully-distributed mode (i.e. when deployed on a cluster) and pseudo-distributed mode (i.e. when installed on a single machine). It simply starts the required daemons on the machine(s) listed in $HADOOP_INSTALL/conf/slaves, by logging into these machines and starting the processes. By default, the slaves file contains only localhost (i.e. by default Hadoop is configured for single-machine mode), so we need to enable SSH login to our own machine.
Mac OS X
Go into System Preferences -> Sharing and enable Remote Login for (at least) the current user. Then go to the section SSH password-less login.
Ubuntu/Debian
Install SSH with the following command, then go to the section SSH password-less login.
sudo apt-get install -y ssh
CentOS/Red Hat
Install SSH with the following commands (on CentOS/Red Hat the packages are named openssh-server and openssh-clients, not ssh), then go to the section SSH password-less login.
sudo yum -y install openssh-server openssh-clients
sudo service sshd start   # start the SSH daemon if it is not already running
SSH password-less login
First of all, don't worry :) Password-less login does not mean that everybody can log into your machine without a password. It simply means that we will set up your machine to log into itself without a password. To enable password-less login, generate a new SSH key with an empty passphrase, and add it to the authorized keys and known hosts:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh-keyscan -H localhost >> ~/.ssh/known_hosts
ssh-keyscan -H 0.0.0.0 >> ~/.ssh/known_hosts
Test this with:
ssh localhost
If everything was set up correctly, you should have logged in without having to type any password.
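If you are still prompted for a password, a common culprit is overly permissive file modes on the SSH files; tightening them usually helps (a generic SSH fix, not specific to Hadoop):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys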
Hadoop
Download a release of Hadoop here. The common question is "Which version?". In this post we assume we want to run MapReduce v1 (i.e. no YARN, as it is not production-ready yet). I would suggest going either with version 0.20.2 (the last legacy version with all usable features) or with the latest 1.x version (1.2.1 at the time of writing). Despite the naming, the 1.x versions are stable continuations of the 0.20 branch; in fact 1.0 is a simple renaming of 0.20.205. Once you have decided on the release x.y.z to go with, navigate to the corresponding folder, download the file hadoop-x.y.z.tar.gz, and unpack it somewhere in your filesystem:
tar xzf hadoop-x.y.z.tar.gz
It is useful to have an environment variable HADOOP_INSTALL pointing to the Hadoop installation folder, and to add the Hadoop binary subfolder bin to the command-line path. The following commands assume Hadoop is in your home folder; remember to replace x.y.z with your version.
echo "export HADOOP_INSTALL=~/hadoop-x.y.z" >> ~/.bashrc
echo "export PATH=\$PATH:\$HADOOP_INSTALL/bin" >> ~/.bashrc
source ~/.bashrc
At this point you should be able to run Hadoop. Test this with:
hadoop version
This next step may be redundant, but it usually solves the "JAVA_HOME is not set" issue: set JAVA_HOME also in the file $HADOOP_INSTALL/conf/hadoop-env.sh. You can either edit the file yourself, looking for JAVA_HOME and setting it to the right value, or run the following:
echo "export JAVA_HOME=$JAVA_HOME" >> $HADOOP_INSTALL/conf/hadoop-env.sh
Let's now move on to configuring Hadoop for pseudo-distributed mode. By default, Hadoop is configured for standalone (sometimes called local) mode. In standalone mode, a submitted job is not actually executed by Hadoop's daemons, but rather by a MapReduce simulator (everything runs in a single JVM). While this can be useful for basic debugging, this mode does not exercise some other important Hadoop aspects that you may want to debug, such as multiple reducers or serialization between map and reduce. In pseudo-distributed mode, everything runs as in fully-distributed mode, except that the cluster consists of only one machine.
The files we are going to change are $HADOOP_INSTALL/conf/{mapred, core, hdfs}-site.xml. It is convenient to save the default files if you later want to switch back to standalone mode.
mkdir $HADOOP_INSTALL/conf/standalone
cp $HADOOP_INSTALL/conf/*-site.xml $HADOOP_INSTALL/conf/standalone
MapReduce framework
We start by modifying the file mapred-site.xml to instruct Hadoop to launch a JobTracker daemon, the component that implements the MapReduce framework. The file needs a mapred.job.tracker property telling Hadoop where the JobTracker runs, as in the sketch below.
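A minimal mapred-site.xml for pseudo-distributed mode could look like this; the port 9001 is a conventional choice, any free local port works:
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value> <!-- 9001 is just a conventional choice -->
  </property>
</configuration>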
This is actually enough to run jobs in pseudo-distributed mode. The remaining question is whether you also want to use HDFS rather than your local filesystem (since you are running Hadoop on only one machine, the local filesystem works fine). If you want to stick with your local filesystem, you can skip the following section and go directly to the section Running the daemons.
HDFS
If you want to run HDFS as well, first set HDFS as Hadoop's default filesystem in core-site.xml, and set a block replication factor of 1 in hdfs-site.xml; minimal sketches of both files follow.
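A minimal core-site.xml; the NameNode address hdfs://localhost:9000 is a common choice, but any free local port works:
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value> <!-- assumed NameNode address; 8020 is another common default -->
  </property>
</configuration>
And a minimal hdfs-site.xml setting the replication factor to 1 (we only have one DataNode):
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>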
Finally, we initialize (format) HDFS:
hadoop namenode -format
You can see that a folder /tmp/hadoop-${user.name}/dfs has been created. If you want to change the location where HDFS stores metadata and data, set the properties dfs.name.dir and dfs.data.dir (called dfs.namenode.name.dir and dfs.datanode.data.dir in Hadoop 2.x) in hdfs-site.xml and re-format.
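For example, to keep HDFS metadata and data under your home folder instead of /tmp, you could add something like the following inside the <configuration> element of hdfs-site.xml and re-run the format command (the paths below are just placeholders):
  <property>
    <name>dfs.name.dir</name>
    <value>/home/youruser/hdfs/name</value> <!-- placeholder path -->
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/youruser/hdfs/data</value> <!-- placeholder path -->
  </property>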
Running the daemons
If you decided to run Hadoop without HDFS, you can start the MapReduce daemons (JobTracker and TaskTracker) with the following command:
start-mapred.sh
You can access the JobTracker UI at http://localhost:50030.
If instead you decided to also use HDFS, you can start the HDFS and MapReduce daemons (NameNode, DataNode, JobTracker, TaskTracker) as follows:
start-dfs.sh
start-mapred.sh
You can access the NameNode UI at http://localhost:50070.
To stop the daemons run the corresponding stop-mapred.sh and stop-dfs.sh.
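To check which daemons are actually running, the jps tool that ships with the JDK lists the Java processes; the exact set depends on whether you started HDFS:
jps
# typical output in pseudo-distributed mode with HDFS running (PIDs will differ):
# 12345 NameNode
# 12346 DataNode
# 12347 SecondaryNameNode
# 12348 JobTracker
# 12349 TaskTracker
# 12350 Jps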
A quick test
We will run a quick test that counts the word occurrences in a file. The Hadoop examples jar contains several example programs, among which the classic word count. Running the following command:
hadoop jar ${HADOOP_INSTALL}/hadoop-*examples*.jar
returns as output:
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
dbcount: An example job that count the pageview counts from a database.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using monte-carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sleep: A job that sleeps at each map and reduce task.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
As input, we will use a plain-text version of "Moby-Dick" by Herman Melville, downloadable from Project Gutenberg.
wget http://www.gutenberg.org/cache/epub/2489/pg2489.txt
If you have decided to use HDFS, copy the file to HDFS with:
hadoop fs -copyFromLocal pg2489.txt .
Now launch the word count job with the following command. You can check its progress on the JobTracker UI.
hadoop jar ${HADOOP_INSTALL}/hadoop-*examples*.jar wordcount pg2489.txt out
The output is going to be in the files part-r-0000* inside the out folder. If you're not using HDFS, that folder has been created in the folder from which you launched the command.
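You can list the output folder to see the part files; use the command matching your setup:
ls out             # local filesystem
hadoop fs -ls out  # HDFS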
Note that there's only one file part-r-00000; this is because by default Hadoop uses a single reducer. If you want to use multiple reducers (say 2), you can modify the previous command to:
hadoop jar ${HADOOP_INSTALL}/hadoop-*examples*.jar wordcount -D mapred.reduce.tasks=2 pg2489.txt out
If you are not using HDFS, you can print the output content as follows:
cat out/part-r-00000
If you are using HDFS, you can use the following command:
hadoop fs -cat out/part-r-00000
Alternatively, you can copy the output folder to your local file-system:
hadoop fs -copyToLocal out .
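As a last quick check on the local copy of the output, you can print the ten most frequent words; the output lines are tab-separated word/count pairs, so a numeric sort on the second field works:
sort -k2,2nr out/part-r-* | head -n 10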
References
- A nice history of Hadoop releases by the Cloudera folks
- Our intro to MapReduce and Hadoop.
From the comments
Thanks for your post, it was a great write-up, especially for so many different platforms. I had some problems getting this to work on Mac OS 10.9; here are the steps I had to change to get it to work:
- in hadoop-env.sh, changed to export JAVA_HOME=$(/usr/libexec/java_home)
- in hadoop-env.sh, set export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk -Djava.net.preferIPv4Stack=true"
This got rid of the errors and warnings and I was able to run the sample examples. I also noted that many have increased the heap; I did this just to be safe in hadoop-env.sh:
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=2000