Hadoop Installation for Windows..
Steps :
A. Prequisites
1. Install Putty. http://www.chiark.greenend.org.uk/~sgtatham/putty/
2.Get Ec2 Instance.
3.Go to EC2 instance and check the public DNS name.
4.Get Private key(.pem)
B.Convert .pem to .ppk
1. Start PuTTYgen.
2.Under Type of key to generate, select SSH-2 RSA.
3.Click Load. By default, PuTTYgen displays only files with the extension .ppk. To locate your .pem file, select the option to display files of all types. Generate PPK from Pem file you obtained from ec2.
4a. Click Load if already saved Config
sudo apt-get install openjdk-7-jdk
6.Install OpenSSH for SSH
sudo apt-get install openssh-server
7.Create Hadoop user. So that you can run your
program
sudo addgroup hadoop // creating Hadoop Group
sudo adduser --ingroup hadoop hduser // adding user
to the group
sudo adduser hduser sudo
8. Generating Keys
$ ssh-keygen -t rsa -P ''
...
Your identification has been saved in
/home/hduser/.ssh/id_rsa.
...
$ cat ~/.ssh/id_rsa.pub >>
~/.ssh/authorized_keys
$ ssh localhost
9.Disabling IPV6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
9. Installing Hadoop
Cd /usr/local/
sudo wget http://archive.apache.org/dist/hadoop/core/hadoop-0.20.203.0/hadoop-0.20.203.0rc1.tar.gz
//Unzip the tar
sudo tar xzf hadoop-0.20.203.0rc1.tar.gz
//move
sudo mv hadoop-0.20.203.0rc1 hadoop
//set ownership
sudo chown -R hduser:hadoop hadoop
10.Update .bashrc with the proper path
cd ~
add java_home
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
11.cd /usr/local/hadoop/etc/hadoop
vi hadoop-env.sh
JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
12.Create tmp for hdfs
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
# ...and if you want to tighten up security, chmod
from 755 to 750...
$ sudo chmod 750 /app/hadoop/tmp
13. Configure Hadoop - edi core-site.xml
cd /usr/local/hadoop/etc/hadoop
or
cd vi /usr/local/hadoop/cong/core-site.xml
or
cd vi /usr/local/hadoop/cong/core-site.xml
vi core-site.xml
For version <2
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and
authority determine the FileSystem implementation. The
uri's
scheme determines the config property (fs.SCHEME.impl) naming
the
FileSystem implementation class. The
uri's authority is used to
determine
the host, port, etc. for a filesystem.</description>
</property>
For version >2
Bcz of Yarn
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property
13. update for version < 2
mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run
in-process as a single map
and reduce
task.
</description>
</property>
for version >2
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
14. Set Replication Factor
file conf/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual
number of replications can be specified when the file is created.
The default
is used if replication is not specified in create time.
</description>
</property>
15. Format nAmenode
hadoop namenode -format
16.Start-all.sh
depecated for v >2 but will work Invokes both
dfs and mapred-site.xml
----- Till here single node ---
For multinode..
17.Everything is working fine..
18.Create few more instances of EC2 clusters in
same way
Update vi etc/hosts from home directory
add IP address and name which you will use
ex...
10.23.45.67 master10.11.11.11 slave1
10.12.12.12 slave2
19.Copy pub key of master to the list of authorized
key of client
vi /home/hduser/.ssh/id_rsa.pub
add the content to
vi /home/hduser/.ssh/id_rsa/authorized_keys of
slave1
20.you can ssh to both master and slave
21.once you type ssh slave you will be taken to
slave machine
exit it from maser
22.On master edit conf/master
master
23. On master edit conf/slaves
add the following
master
slave1
slave2
23. Update core-site.xml ALL MACHINE
<property>
<name>fs.default.name</name>
<value>hdfs://IPADDRESS OF MASTER HERE:54310</value>
<description>The
name of the default file system. A URI
whose
scheme and
authority determine the FileSystem implementation. The
uri's
scheme determines the config property (fs.SCHEME.impl) naming
the
FileSystem implementation class. The
uri's authority is used to
determine
the host, port, etc. for a filesystem.</description>
</property>
23. update mapred-site.xml for all machine
<property>
<name>mapred.job.tracker</name>
<value>masterIP ADDRESS:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run
in-process as a single map
and reduce
task.
</description>
</property>
24.set
replication update hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default
block replication.
The actual
number of replications can be specified when the file is created.
The default
is used if replication is not specified in create time.
</description>
</property>
25.format name node
26.start-all.sh
This is what i got from reading from various
blogs.. I have installed Hadoop many times
and feel like I should write the steps
Its the mix of both single and multinode cluster
I used
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#java-io-ioexception-incompatible-namespaceids
but modified in many ways..