Deploy Hadoop Cluster on Windows OS

1) Install the Java JDK

1.1. Download the Java JDK (e.g., JDK 8) from Oracle's download page

1.2. Set environment variables.

Do: My Computer -> Properties -> Advanced system settings -> Advanced -> Environment Variables

User variable:

Variable: JAVA_HOME

Value: C:\Progra~1\Java\jdk1.8.0_121 (the 8.3 short form of C:\Program Files\Java\jdk1.8.0_121; Hadoop's scripts do not handle spaces in JAVA_HOME)

System variable:

Variable: PATH

Value: C:\Progra~1\Java\jdk1.8.0_121\bin (append this entry to the existing PATH)
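
The same variables can also be set from cmd instead of the dialog. A minimal sketch (setx writes user variables; add /M from an elevated prompt for system-wide ones, and note that setx truncates values longer than 1024 characters, so the dialog is safer for a long PATH):

setx JAVA_HOME "C:\Progra~1\Java\jdk1.8.0_121"
setx PATH "%PATH%;C:\Progra~1\Java\jdk1.8.0_121\bin"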

1.3. Check in cmd (see below): java -version
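
If the variables are set correctly, the command prints the installed version, similar to this (build numbers will vary with your JDK):

java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)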

2) Install Apache Hadoop

  • Download Apache Hadoop from the official Apache site

  • Extract the Hadoop archive to the C drive: C:\hadoop

  • Set environmental variables for HADOOP:

User variables:

Variable: HADOOP_HOME

Value: C:\hadoop

System variables:

Variable: PATH

Value: C:\hadoop\bin
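
As in step 1.2, the same can be done from cmd (the same setx caveats apply):

setx HADOOP_HOME "C:\hadoop"
setx PATH "%PATH%;C:\hadoop\bin"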

  • Check in cmd: hadoop version

3) Fill in Hadoop's configuration files

  • Go to C:\hadoop\etc\hadoop and edit the following files: core-site.xml, hdfs-site.xml, master, slaves:

A) master and slaves files

Edit the master file on both the master and the slave machines as follows:

  • put the IP address of the master machine

Edit the slaves file on the master machine as follows:

  • put the IP addresses of the master and all slave nodes

Edit the slaves file on each slave machine as follows:

  • put the IP address of the slave node itself
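
For example, with the master at 10.8.41.146 (the address used in core-site.xml below) and two hypothetical slaves at 10.8.41.147 and 10.8.41.148, the files on the master machine would contain one IP per line:

master:
10.8.41.146

slaves:
10.8.41.146
10.8.41.147
10.8.41.148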

B) core-site.xml file

Edit the core-site.xml file on both the master and the slave machines as follows:

  • put the IP address of the master node with the default NameNode port 9000:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.8.41.146:9000</value>
  </property>
</configuration>
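
On Hadoop 2.x, fs.default.name still works but is a deprecated alias; the modern equivalent is:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://10.8.41.146:9000</value>
  </property>
</configuration>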

C) hdfs-site.xml file

  • Edit the hdfs-site.xml file on the master machine as follows:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/hadoop/data/dfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/hadoop/data/dfs/datanode</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <!-- bind to all interfaces so slaves can reach http://masterIP:50070 in step 5 -->
        <value>0.0.0.0:50070</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
        <description>to enable webhdfs</description>
        <final>true</final>
    </property>
</configuration>
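
On Windows, drive-less paths like /hadoop/data/dfs/namenode typically resolve against the drive Hadoop is started from (C: here). Creating the folders up front avoids startup errors; assuming the paths from the config above:

mkdir C:\hadoop\data\dfs\namenode
mkdir C:\hadoop\data\dfs\datanode
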
  • Edit the hdfs-site.xml file on the slave machines as follows:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/hadoop/data/dfs/datanode</value>
    </property>
</configuration>
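
Before the first launch, the NameNode must be formatted once, on the master machine only (this initializes dfs.namenode.name.dir and erases any existing HDFS metadata):

hdfs namenode -format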

4) Launch Hadoop Daemons

Start Hadoop HDFS by launching the Hadoop daemons on each node in the cluster:

  • Go to the location: “C:\hadoop\sbin” and run “start-dfs.cmd” as administrator

  • You should see two windows: namenode and datanode
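
To double-check, run jps (it ships with the JDK) in a new cmd window; on the master it should list a NameNode and a DataNode process (the process IDs will differ):

jps
4820 NameNode
5232 DataNode
6120 Jps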

5) Check Hadoop HDFS state via Web UI

Go to http://localhost:50070 (on the master node) or http://masterIP:50070 (from the slave nodes) in a web browser to check the HDFS state.
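
As a final smoke test, create and list a directory in HDFS from any node (the /test name is just an example):

hadoop fs -mkdir /test
hadoop fs -ls /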
