Deploy Hadoop Cluster on Windows OS

1) Install the Java JDK

1.1. Download the Java JDK (e.g., JDK 8) from Oracle's download page

1.2. Set environment variables.

Do: My Computer -> Properties -> Advanced system settings -> Advanced -> Environment Variables

User variable:

Variable: JAVA_HOME

Value: C:\Progra~1\Java\jdk1.8.0_121 (the 8.3 short form of C:\Program Files\Java\jdk1.8.0_121; Hadoop's scripts do not handle spaces in JAVA_HOME)

System variable:

Variable: PATH

Value: C:\Progra~1\Java\jdk1.8.0_121\bin (append this entry to the existing PATH)
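
The same variables can also be set from cmd instead of the dialog. A minimal sketch (setx writes user variables; add /M from an elevated prompt for system-wide ones, and note that setx truncates values longer than 1024 characters, so the dialog is safer for a long PATH):

setx JAVA_HOME "C:\Progra~1\Java\jdk1.8.0_121"
setx PATH "%PATH%;C:\Progra~1\Java\jdk1.8.0_121\bin"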

1.3. Check in cmd (see below): java -version
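
If the variables are set correctly, the command prints the installed version, similar to this (build numbers will vary with your JDK):

java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)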

2) Install Apache Hadoop

  • Download Apache Hadoop from the official Apache site

  • Extract the Hadoop archive to the C drive: C:\hadoop

  • Set environmental variables for HADOOP:

User variables:

Variable: HADOOP_HOME

Value: C:\hadoop

System variables:

Variable: PATH

Value: C:\hadoop\bin
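
As in step 1.2, the same can be done from cmd (the same setx caveats apply):

setx HADOOP_HOME "C:\hadoop"
setx PATH "%PATH%;C:\hadoop\bin"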

  • Check in cmd: hadoop version

3) Fill in Hadoop's configuration files

  • Go to C:\hadoop\etc\hadoop and edit the following files: core-site.xml, hdfs-site.xml, master, slaves:

A) master and slaves files

Edit the master file on both the master and the slave machines as follows:

  • put the IP address of the master machine

Edit the slaves file on the master machine as follows:

  • put the IP addresses of the master and all slave nodes

Edit the slaves file on each slave machine as follows:

  • put the IP address of the slave node itself
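
For example, with the master at 10.8.41.146 (the address used in core-site.xml below) and two hypothetical slaves at 10.8.41.147 and 10.8.41.148, the files on the master machine would contain one IP per line:

master:
10.8.41.146

slaves:
10.8.41.146
10.8.41.147
10.8.41.148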

B) core-site.xml file

Edit the core-site.xml file on both the master and the slave machines as follows:

  • put the IP address of the master node with the default NameNode port 9000:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.8.41.146:9000</value>
  </property>
</configuration>
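
On Hadoop 2.x, fs.default.name still works but is a deprecated alias; the modern equivalent is:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://10.8.41.146:9000</value>
  </property>
</configuration>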

C) hdfs-site.xml file

  • Edit the hdfs-site.xml file on the master machine as follows:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/hadoop/data/dfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/hadoop/data/dfs/datanode</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <!-- bind to all interfaces so slaves can reach http://masterIP:50070 in step 5 -->
        <value>0.0.0.0:50070</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
        <description>to enable webhdfs</description>
        <final>true</final>
    </property>
</configuration>
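
On Windows, drive-less paths like /hadoop/data/dfs/namenode typically resolve against the drive Hadoop is started from (C: here). Creating the folders up front avoids startup errors; assuming the paths from the config above:

mkdir C:\hadoop\data\dfs\namenode
mkdir C:\hadoop\data\dfs\datanode
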
  • Edit the hdfs-site.xml file on the slave machines as follows:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/hadoop/data/dfs/datanode</value>
    </property>
</configuration>
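
Before the first launch, the NameNode must be formatted once, on the master machine only (this initializes dfs.namenode.name.dir and erases any existing HDFS metadata):

hdfs namenode -format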

4) Launch Hadoop Daemons

Start Hadoop HDFS by launching the Hadoop daemons on each node in the cluster:

  • Go to the location: “C:\hadoop\sbin” and run “start-dfs.cmd” as administrator

  • You should see two windows: namenode and datanode
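
To double-check, run jps (it ships with the JDK) in a new cmd window; on the master it should list a NameNode and a DataNode process (the process IDs will differ):

jps
4820 NameNode
5232 DataNode
6120 Jps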

5) Check Hadoop HDFS state via Web UI

Go to http://localhost:50070 (on the master node) or http://masterIP:50070 (from the slave nodes) in a web browser to check the HDFS state.
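
As a final smoke test, create and list a directory in HDFS from any node (the /test name is just an example):

hadoop fs -mkdir /test
hadoop fs -ls /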
