Step 1: Extracting the Hadoop tarball
We first create a user home for the Hadoop installation. Here we use ‘/home/hduser/Utilities’ as the user home. Extract the tarball in this location and change the ownership recursively on the extracted directory.
Here we are using hadoop-1.0.3.tar.gz.
mkdir -p /home/hduser/Utilities
cd /home/hduser/Utilities
sudo tar -xzvf hadoop-1.0.3.tar.gz
sudo chown -R hduser:hadoop hadoop-1.0.3
Step 2: Configuring Hadoop environment variables
We add HADOOP_HOME as an environment variable in the /etc/bash.bashrc file, so that the Hadoop commands are available to every user.
sudo nano /etc/bash.bashrc
Append the following lines to add HADOOP_HOME to PATH.
#set HADOOP_HOME
export HADOOP_HOME=/home/hduser/Utilities/hadoop-1.0.3
export PATH=$HADOOP_HOME/bin:$PATH
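The effect of the two lines is to prepend the Hadoop bin directory to the search path. A small sketch of that PATH change, using the paths from this guide:

```shell
# Same assignments as appended to /etc/bash.bashrc (paths from this guide).
HADOOP_HOME=/home/hduser/Utilities/hadoop-1.0.3
PATH=$HADOOP_HOME/bin:$PATH

# The Hadoop bin directory should now be the first PATH entry, so commands
# like 'hadoop' and 'start-all.sh' resolve without typing the full path.
FIRST_ENTRY=$(echo "$PATH" | cut -d: -f1)
echo "$FIRST_ENTRY"
```

After editing the real file, open a new terminal (or run `source /etc/bash.bashrc`) and verify with `echo $HADOOP_HOME`.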
Step 3: Configuring Java for Hadoop
sudo nano /home/hduser/Utilities/hadoop-1.0.3/conf/hadoop-env.sh
JAVA_HOME is commented out by default. Uncomment the line and set JAVA_HOME to your Java installation path. Note that the path must point to the Java installation directory itself and should not include the trailing bin folder.
# The Java implementation to use.
export JAVA_HOME=<absolute path to java directory>
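If you are unsure of the installation path, one way to find it is to resolve the real java binary and strip the trailing /bin/java. A sketch (the JVM path below is only an example; yours will differ):

```shell
# On a real machine you would resolve the binary with: readlink -f "$(which java)"
# Suppose it prints the path below (an example JVM location, not necessarily yours):
JAVA_BIN=/usr/lib/jvm/java-6-openjdk-amd64/bin/java

# JAVA_HOME must be the directory ABOVE bin, so strip the /bin/java suffix:
JAVA_HOME_CANDIDATE="${JAVA_BIN%/bin/*}"
echo "$JAVA_HOME_CANDIDATE"
```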
Step 4: Configuring Hadoop properties
In Hadoop, we have three configuration files, core-site.xml, mapred-site.xml and hdfs-site.xml, present in the HADOOP_HOME/conf directory.
Editing the Configuration files
1. core-site.xml
The directory specified by the ‘hadoop.tmp.dir’ property is used to store file system metadata by the namenode and block data by the datanode. By default, the name and data directories will be created under this tmp directory. We need to ensure that ‘hduser’ has sufficient permissions on the newly provided ‘hadoop.tmp.dir’. We are configuring it to ‘/home/hduser/Utilities/app/hadoop/tmp’.
The property ‘fs.default.name’ provides the hostname and port of the namenode. Create the directory and change its ownership and permissions for ‘hduser’:
cd /home/hduser/Utilities
sudo mkdir -p app/hadoop/tmp
sudo chown -R hduser:hadoop app
sudo chmod 755 app/hadoop/tmp
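The effect of the chmod can be checked with stat. A sketch on a throwaway directory (swap in the real hadoop.tmp.dir path on your node; GNU stat is assumed):

```shell
# Throwaway stand-in for the real hadoop.tmp.dir (no sudo needed for the demo).
DEMO_TMP=$(mktemp -d)/app/hadoop/tmp
mkdir -p "$DEMO_TMP"
chmod 755 "$DEMO_TMP"

# 755 = owner rwx, group/others r-x; 'stat -c %a' prints the octal mode.
MODE=$(stat -c '%a' "$DEMO_TMP")
echo "$MODE"
```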
Setting the ownership and permissions is very important. If you skip this, you will run into exceptions while formatting the namenode.
Open core-site.xml; you will see empty configuration tags. Add the following lines between the configuration tags.
sudo nano /home/hduser/Utilities/hadoop-1.0.3/conf/core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hduser/Utilities/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<hostname/IP address of the system where namenode is installed>:54310</value>
    <description>The name of the default file system</description>
  </property>
</configuration>
2. hdfs-site.xml
It is used for file system and storage settings. In hdfs-site.xml, add the following property between the configuration tags.
sudo nano /home/hduser/Utilities/hadoop-1.0.3/conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication</description>
  </property>
</configuration>
3. mapred-site.xml
This is used for MapReduce processing. In mapred-site.xml, we need to provide the hostname and port of the jobtracker, since the tasktrackers use this for their communication.
sudo nano /home/hduser/Utilities/hadoop-1.0.3/conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value><hostname/IP address of the system where jobtracker is installed>:54311</value>
    <description>The host and port that the MapReduce jobtracker runs at</description>
  </property>
</configuration>
Step 5: Formatting Namenode
Before starting the HDFS daemons such as the namenode for the first time, it is mandatory that you format the namenode. This is only for the first run: formatting again on subsequent runs will erase all data in HDFS. Be careful not to format an already running cluster, even if you need to restart the namenode daemon.
The namenode can be formatted as follows:
/home/hduser/Utilities/hadoop-1.0.3/bin/hadoop namenode -format
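Because a second format wipes HDFS, it can help to guard the command. The snippet below is a sketch that simulates the guard on a throwaway directory; it assumes the Hadoop 1.x layout where formatted namenode metadata lives under ‘dfs/name/current’ inside hadoop.tmp.dir. On a real node you would test the real hadoop.tmp.dir path and run the real format command in the else branch.

```shell
# Simulate a node whose namenode was already formatted:
DEMO_TMP=$(mktemp -d)
mkdir -p "$DEMO_TMP/dfs/name/current"

if [ -d "$DEMO_TMP/dfs/name/current" ]; then
    ACTION="skip"    # metadata exists: do NOT run 'hadoop namenode -format'
else
    ACTION="format"  # safe: run bin/hadoop namenode -format here
fi
echo "$ACTION"
```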
Step 6: Starting Hadoop Daemons
/home/hduser/Utilities/hadoop-1.0.3/bin/start-all.sh
This will run all the Hadoop daemons: namenode, datanode, secondarynamenode, jobtracker and tasktracker.
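A quick way to confirm the daemons came up is the JDK's jps tool. The snippet below checks a sample jps listing for the five daemon names; the PIDs and ordering are illustrative only, and the output on your machine will differ.

```shell
# Illustrative jps output after a successful start-all.sh (PIDs are made up):
SAMPLE_JPS="12031 NameNode
12187 DataNode
12344 SecondaryNameNode
12502 JobTracker
12660 TaskTracker
12801 Jps"

# Count the lines naming a Hadoop daemon; all five should be present.
# (SecondaryNameNode also matches the NameNode pattern, on its own line.)
DAEMONS=$(echo "$SAMPLE_JPS" | grep -cE 'NameNode|DataNode|JobTracker|TaskTracker')
echo "$DAEMONS"
```

On a running node, simply run jps and confirm the same five daemon names appear.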
To stop the Hadoop daemons, use the command
/home/hduser/Utilities/hadoop-1.0.3/bin/stop-all.sh
This stops all the Hadoop daemons. Once they have stopped, running jps should list only the Jps process itself.