There are only a few changes needed to go from a single-node setup to a multi-node setup. First, complete the single-node setup up to the DFS formatting step.
There are mainly five steps to follow when moving from a single-node to a multi-node setup:
STEPS:
- SSH COPY ID to all nodes
- Configure masters and slaves
- Configure CORE-SITE.XML and MAPRED-SITE.XML
- Format DFS
- START-ALL.SH
Now I am going to explain these steps in detail:
Step-1 SSH COPY ID to all nodes:
From the NAME NODE, we need to generate an SSH KEY and distribute it to all the SLAVE NODES and also to the SECONDARY NAME NODE (if any).
Command:
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@datanode1
here "hadoop" is an user name and "datanode1" is a system name, which you need to change according to your setup.
COPY FINGERPRINT : GIVE YES
Do the same for all DATA NODES and for SECONDARY NAME NODE (if any)
Check whether the key was copied successfully:
ssh datanode1
It should not ask for a password.
eg.
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@datanode1
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@datanode2
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@datanode3
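If the SSH key does not exist yet on the NAME NODE (it is normally created during the single-node setup), a minimal sketch of generating it and copying it to every node could look like this (the host names are the ones used in this guide):
ssh-keygen -t rsa -P "" -f $HOME/.ssh/id_rsa
for host in datanode1 datanode2 datanode3; do
  ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@$host
done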
Step-2 Configure masters and slaves:
This must be done on the NAME NODE alone (not on the DATA NODES or the SECONDARY NAME NODE).
Go to the NAME NODE.
Command:
cd /usr/local/hadoop/conf
Find the two files: masters and slaves.
masters is for the NAME NODE and SECONDARY NAME NODE.
slaves is for the DATA NODES.
Command:
sudo nano /usr/local/hadoop/conf/masters
By default it contains 'localhost'; change it to the name of the NAME NODE (i.e. namenode in my case).
Ctrl + o to save
Enter
Ctrl + x to exit
sudo nano /usr/local/hadoop/conf/slaves
By default it contains 'localhost'; change it to contain the names of all the DATA NODES, one per line. In my case:
datanode1
datanode2
datanode3
Ctrl + o to save
Enter
Ctrl + x to exit
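After these edits, the two files on the NAME NODE contain, in my case:
masters:
namenode
slaves:
datanode1
datanode2
datanode3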
Step-3 Configure CORE-SITE.XML and MAPRED-SITE.XML
Go to the SLAVES and the SECONDARY NAME NODE; we need to make them point to the master.
Command:
sudo nano /usr/local/hadoop/conf/core-site.xml
Check whether 'fs.default.name' is pointing to the NAME NODE (i.e. namenode in my case). If it is pointing to localhost:10001, replace localhost with namenode.
Ctrl + o to save
Enter
Ctrl + x to exit
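As a sketch, assuming the single-node setup used port 10001 as described here, the property in core-site.xml should end up roughly like this (the hdfs:// prefix may or may not be present in your existing file; only the host part changes from localhost to namenode):
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode:10001</value>
</property>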
Do the same for MAPRED-SITE.XML.
Command:
sudo nano /usr/local/hadoop/conf/mapred-site.xml
Check whether it is pointing to the JOB TRACKER / NAME NODE (i.e. namenode in my case).
If it is 'localhost:10002', update it to 'namenode:10002'.
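The corresponding property in mapred-site.xml should then look roughly like this:
<property>
  <name>mapred.job.tracker</name>
  <value>namenode:10002</value>
</property>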
Remove the localhost entries from the /etc/hosts file.
Command:
sudo nano /etc/hosts
Remove localhost and the entries for 127.0.0.1.
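For the host names used above to resolve, every node's /etc/hosts should map each host name to its IP address. A sketch with hypothetical addresses (replace them with the real IPs of your machines):
192.168.1.10   namenode
192.168.1.11   datanode1
192.168.1.12   datanode2
192.168.1.13   datanode3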
Step-4 Format DFS:
If you are converting an existing single-node installation, you must delete /usr/local/hadoop/tmp and recreate it on all the nodes, and then format HDFS from the NAME NODE alone. If you have not formatted HDFS during the single-node setup, you can skip straight to the formatting step.
Command:
To remove the directory:
sudo rm -r /usr/local/hadoop/tmp
Create the tmp directory:
sudo mkdir /usr/local/hadoop/tmp
Change ownership of the tmp directory as well as the hadoop directory:
sudo chown hadoop /usr/local/hadoop/tmp
sudo chown hadoop /usr/local/hadoop
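Since the tmp directory has to be recreated on every node, one way to do it from the NAME NODE is a small loop over the slaves (a sketch using the host names from this guide; it assumes the hadoop user can run sudo on those machines, otherwise log in to each node and run the three commands above manually):
for host in datanode1 datanode2 datanode3; do
  ssh -t hadoop@$host "sudo rm -rf /usr/local/hadoop/tmp && sudo mkdir /usr/local/hadoop/tmp && sudo chown hadoop /usr/local/hadoop/tmp"
done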
Format the NAME NODE:
hadoop namenode -format
Check for the 'successfully formatted' message.
Step-5 START-ALL.SH
To start the Hadoop cluster in multi-node mode, we have to run this command from the NAME NODE, and it starts the respective services on all the NODES.
Command:
start-all.sh
jps
Check each system separately with jps to see which JVMs are running on it.
Check the number of live nodes in the web GUI (it may take a few minutes).
stop-all.sh
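On a Hadoop 1.x cluster like this, jps on the NAME NODE should normally list NameNode, JobTracker and (if it runs here) SecondaryNameNode, while each DATA NODE should list DataNode and TaskTracker; the exact set depends on your configuration. The NAME NODE web GUI is usually at http://namenode:50070 and the JOB TRACKER GUI at http://namenode:50030.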
Courtesy: Mr. Anand Kumar, NIT, Trichy