Setting up a single-node Hadoop 2.2.0 cluster on Ubuntu Server 12.04 LTS in VirtualBox.
What I'm working with:
MacBook Air, 1.7 GHz Intel Core i7, 8 GB 1600 MHz DDR3, OS X 10.8.5
VirtualBox 4.3.6
Prerequisites:
Build a VirtualBox machine to run Ubuntu 12.04 LTS 64-bit
Install Ubuntu Server 12.04 LTS
From the newly installed server:
Create hadoop user and group:
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop
sudo adduser hadoop sudo
Create ssh keys for the hadoop user (must be done as the hadoop user):
ssh-keygen -t rsa -P ""
cp .ssh/id_rsa.pub .ssh/authorized_keys
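The Hadoop start scripts use ssh to localhost, so it's worth confirming that passwordless login works now (the first connection will prompt to accept the host key):
ssh localhost
exit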
Update the package lists and install the prerequisites. ssh and rsync should already be installed.
sudo apt-get update
sudo apt-get install python-software-properties
sudo apt-get install gcc
sudo apt-get install g++
wget http://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
tar -xvzf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
sudo ./configure
sudo make
sudo make check
sudo make install
sudo ldconfig
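If the build and install succeeded, protoc should now be on the path; a quick check:
protoc --version
This should report libprotoc 2.5.0.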
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
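The webupd8team package installs Oracle Java 7 under /usr/lib/jvm/java-7-oracle (the path used for JAVA_HOME below). To confirm the install:
java -version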
Download hadoop from an apache mirror:
wget http://www.eng.lsu.edu/mirrors/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
Unpack the hadoop tarball into /usr/local/:
sudo tar -xvzf hadoop-2.2.0.tar.gz -C /usr/local
Change the name of the hadoop directory in /usr/local/:
sudo mv /usr/local/hadoop-2.2.0 /usr/local/hadoop
Give the hadoop user and group ownership of the directory /usr/local/hadoop/:
sudo chown -R hadoop:hadoop /usr/local/hadoop
Edit the hadoop user's .bashrc and add the following to the end of the file:
# Hadoop Environment
export JAVA_HOME=/usr/lib/jvm/java-7-oracle/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
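The new variables won't take effect in the current shell until the file is reloaded (or you log out and back in):
source ~/.bashrc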
Edit /usr/local/hadoop/etc/hadoop/hadoop-env.sh to set the java path:
export JAVA_HOME=/usr/lib/jvm/java-7-oracle/
Log in as the hadoop user and verify that hadoop is installed:
hadoop version
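If the environment is set up correctly, the output should begin with a line reading Hadoop 2.2.0.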
Hadoop is now installed and needs to be configured.
Edit /usr/local/hadoop/etc/hadoop/core-site.xml and add the following inside the <configuration> tags:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
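A side note: fs.default.name still works in 2.2.0, but it is a deprecated alias for fs.defaultFS in Hadoop 2.x, so the newer name avoids a deprecation warning in the logs:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>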
Edit /usr/local/hadoop/etc/hadoop/yarn-site.xml and add the following inside the <configuration> tags:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
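These two properties register the MapReduce shuffle as a NodeManager auxiliary service, which is what lets reduce tasks fetch map output from each node over HTTP.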
Copy the mapreduce template to the site xml:
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
Edit mapred-site.xml and add the following inside the <configuration> tags:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
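Without this property, MapReduce jobs default to local mode instead of being submitted to YARN.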
Create temporary data directories in the hadoop user's home directory for testing:
mkdir -p datadir/hdfs/namenode
mkdir -p datadir/hdfs/datanode
Edit /usr/local/hadoop/etc/hadoop/hdfs-site.xml and add the following inside the <configuration> tags:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/datadir/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/datadir/hdfs/datanode</value>
</property>
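dfs.replication is set to 1 because this is a single-node cluster; the HDFS default of 3 copies per block can never be satisfied by a lone datanode.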
Format the namenode as the hadoop user:
hdfs namenode -format
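The format step writes a fresh filesystem image into the namenode directory created above; the output should end with a note that the storage directory has been successfully formatted.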
Start the hadoop services:
start-dfs.sh
start-yarn.sh
Show the running hadoop services:
jps
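On a healthy single-node setup, jps should list something close to the following (process IDs omitted here, and they will differ per run):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps
As a further check, the namenode web UI should answer at http://localhost:50070 and the YARN resource manager at http://localhost:8088.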
Run an example job from /usr/local/hadoop as the hadoop user to confirm everything works:
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
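The two arguments to the pi example are the number of maps and the number of samples per map, so 2 and 5 keep the run short; the job should end by printing an estimated value of Pi. When you're finished testing, stop the services with the companion scripts:
stop-yarn.sh
stop-dfs.sh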