
Saturday, August 18, 2018

Hadoop 1.2.1 Installation and Configuration on Single Node

I have experienced difficulties in installing and configuring Hadoop, so I want to provide one easy guide to the installation and configuration of Hadoop 1.x. I am assuming that readers have knowledge of basic Linux commands; hence, I am not going to explain those commands in detail.
I have used Hadoop 1.2.1, JDK 7, and Ubuntu (Linux) in this setup.
Install SSH:
  • We require SSH for remote login to the different machines, so that MapReduce tasks can run on the Hadoop cluster
Commands:
  • sudo apt-get update
    • updates the list of available packages
  • sudo apt-get install openssh-server
    • installs the OpenSSH server

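As a quick sanity check (a sketch, assuming Ubuntu's service manager), you can confirm that the SSH daemon is running before moving on:

  sudo service ssh status    # should report that the ssh service is running
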
Generate Keys:
  • Hadoop logs in to remote machines many times while running a MapReduce task. Therefore, we need to set up passwordless entry for Hadoop to all the nodes in our cluster.
Commands:
  • ssh CSE-DBMS-04
    • instead of CSE-DBMS-04, write your system's hostname or localhost. It will ask for a password


  • ssh-keygen
    • generates SSH Keys
  • ENTER FILE NAME:
    • no need to write anything; simply press Enter, as we want to use the default settings
  • ENTER PASSPHRASE:
    • no need to write anything; simply press Enter, as we want to use the default settings
  • cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    • append the id_rsa.pub key to authorized_keys to make passwordless entry for the user
  • ssh CSE-DBMS-04
    • now it should not ask for a password (if it still does, see the permissions check below)
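
If ssh still prompts for a password, the usual culprit is file permissions: OpenSSH refuses to use authorized_keys when the key file or the .ssh directory is writable by anyone other than the owner. A minimal fix, assuming the default key locations:

  chmod 700 ~/.ssh                    # directory accessible only by the owner
  chmod 600 ~/.ssh/authorized_keys    # key file not writable by others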

Install Java
  • I prefer offline Java installation, so I have already downloaded the Java tarball and placed it in my Downloads directory
Commands:
  • sudo mkdir -p /usr/lib/jvm/
    • create a directory for Java

  • sudo tar -xvf ~/Downloads/jdk-7u67-linux-x64.tar.gz -C /usr/lib/jvm
    • extract and copy the content of Java tarball to Java directory

  • cd /usr/lib/jvm
    • go to Java directory

  • sudo ln -s jdk1.7.0_67 java-1.7.0-sun-amd64
    • generate a symbolic link to jdk directory which will be used in Hadoop configuration

  • sudo update-alternatives --config java
    • checking and setting Java alternatives

  • sudo nano $HOME/.bashrc
    • sets the Java path. Add the following two lines at the end of this file:
      • export JAVA_HOME="/usr/lib/jvm/jdk1.7.0_67"
      • export PATH="$PATH:$JAVA_HOME/bin"

  • exec bash
    • restarts bash (the terminal) so that the new environment variables take effect

  • java
    • it should not show a "command not found" error
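
As a further check, a quick sketch (the expected strings assume the jdk-7u67 tarball and the JAVA_HOME value set above):

  echo $JAVA_HOME    # should print /usr/lib/jvm/jdk1.7.0_67
  java -version      # should report java version "1.7.0_67"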

Install Hadoop:
  • First, we need to download the required Hadoop tarball.
Commands:
  • sudo mkdir -p /usr/local/hadoop/
    • create Hadoop directory

  • cd ~/Downloads
    • go to the directory where the Hadoop tarball was downloaded
  • sudo tar -xvf hadoop-1.2.1-bin.tar.gz
  • sudo cp -r hadoop-1.2.1/* /usr/local/hadoop/
    • extract the tarball and copy the Hadoop files to the Hadoop directory

  • sudo nano $HOME/.bashrc
    • sets the Hadoop path. Add the following lines at the end of this file:
      • export HADOOP_PREFIX=/usr/local/hadoop
      • export PATH=$PATH:$HADOOP_PREFIX/bin

  • exec bash
    • restarts bash (the terminal) so that the new environment variables take effect

  • hadoop
    • it should not show a "command not found" error
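
Another quick check: the Hadoop 1.x CLI can report its own release, so the following should print the version just installed:

  hadoop version    # the first line should read Hadoop 1.2.1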

Configuration of Hadoop
  • We set some environment variables and change some configuration files according to our cluster setup.
Commands:
  • cd /usr/local/hadoop/conf
    • go to configuration directory of Hadoop

  • sudo nano hadoop-env.sh
    • open the environment variable file and add the following two lines in their respective places. Similar lines are already in the file with different values; keep those as they are and add these lines after them (see the sketch below)
      • export JAVA_HOME=/usr/lib/jvm/java-1.7.0-sun-amd64
      • export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
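
For orientation, the relevant part of hadoop-env.sh ends up looking roughly like this (the commented defaults shown are illustrative; the exact stock lines in your copy may differ):

  # The java implementation to use.  Required.
  # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
  export JAVA_HOME=/usr/lib/jvm/java-1.7.0-sun-amd64

  # Extra Java runtime options.  Empty by default.
  export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true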

  • sudo nano core-site.xml
    • open the core configuration file and set the name node address and tmp directory values as shown below. You have to use your hostname instead of "CSE-DBMS-04". Both properties go inside the <configuration> tags:

      <configuration>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://CSE-DBMS-04:10001</value>
        </property>
        <property>
          <name>hadoop.tmp.dir</name>
          <value>/usr/local/hadoop/tmp</value>
        </property>
      </configuration>

  • sudo nano mapred-site.xml
    • open the MapReduce configuration file and set the job tracker value as shown below. You have to write your hostname instead of "CSE-DBMS-04":

      <configuration>
        <property>
          <name>mapred.job.tracker</name>
          <value>CSE-DBMS-04:10002</value>
        </property>
      </configuration>

  • sudo mkdir /usr/local/hadoop/tmp
    • create the tmp directory to store all HDFS files on the node

  • sudo chown hadoop /usr/local/hadoop/tmp
    • change the owner of the directory to avoid access control issues. Write your username instead of "hadoop"

  • sudo chown hadoop /usr/local/hadoop
    • change the owner of the directory to avoid access control issues. Write your username instead of "hadoop" (see the ownership check below)
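
A quick way to confirm that the ownership changes took effect (the recursive variant is an optional extra, useful if files copied with sudo remain owned by root):

  ls -ld /usr/local/hadoop /usr/local/hadoop/tmp    # both should list your username as owner
  # sudo chown -R hadoop /usr/local/hadoop          # optional recursive variant; write your username instead of "hadoop"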

Format DFS (skip this step if you are going for Multi-Node setup):
  • Now we are ready to format our Distributed File System (DFS)
Command:
  • hadoop namenode -format
    • check the output for the "successfully formatted" message (see below)
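
With hadoop.tmp.dir set as above, the tail of the output should contain a line roughly like the following (the exact wording varies slightly between releases):

  ... Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.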

Start all processes:
  • We are ready to start our Hadoop cluster (though it is only a single node)
Commands:
  • start-all.sh
    • to start all services (name node, secondary name node, data node, job tracker, task tracker)

  • jps
    • to check whether all the services (i.e. name node, secondary name node, data node, job tracker, task tracker) started or not; expected output is sketched below
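
The jps listing should show all five daemons plus Jps itself, roughly like this (the process IDs are illustrative):

  4851 NameNode
  4962 DataNode
  5063 SecondaryNameNode
  5107 JobTracker
  5219 TaskTracker
  5324 Jps

If any daemon is missing, check its log file under /usr/local/hadoop/logs.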

To check cluster details on the web interface:
  • Open any browser and go to the following addresses:
    • http://CSE-DBMS-04:50070/dfshealth.jsp
      • for DFS (name node) details. Write your hostname instead of "CSE-DBMS-04"

    • http://CSE-DBMS-04:50030/jobtracker.jsp
      • for MapReduce (job tracker) details. Write your hostname instead of "CSE-DBMS-04"
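
As an optional smoke test, you can run one of the example jobs bundled with Hadoop. This is a sketch assuming the examples jar from the Hadoop 1.2.1 tarball was copied to /usr/local/hadoop along with the rest of the files:

  hadoop jar /usr/local/hadoop/hadoop-examples-1.2.1.jar pi 2 10
  # estimates pi with 2 map tasks of 10 samples each; a successful run confirms both HDFS and MapReduce are working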

Stop all processes:
  • If you want to stop (shut down) all of your Hadoop cluster services
Command:
  • stop-all.sh
    • stops all the services started above

For any queries, you can write a comment or mail me at "brijeshbmehta@gmail.com".

Courtesy: Mr. Anand Kumar, NIT, Trichy

Originally posted on 14/12/2016 at https://brijeshbmehta.wordpress.com/2016/12/14/hadoop-1x-installation-and-configuration-on-single-node/