DevOps: “Configuration of a Hadoop Cluster on Top of RedHat Using Ansible”

Deepak Sharma
5 min read · Jan 25, 2021


What is Hadoop

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.

What is Ansible

Ansible is a radically simple IT automation engine that automates cloud provisioning, configuration management, application deployment, intra-service orchestration, and many other IT needs.

Designed for multi-tier deployments since day one, Ansible models your IT infrastructure by describing how all of your systems inter-relate, rather than just managing one system at a time.

Now we start the implementation of the Hadoop cluster.

Prerequisites

  1. An Ansible controller node.
  2. Two other systems: one for the Hadoop NameNode and another for the Hadoop DataNode. Both will be used as Ansible managed nodes.
  • Hadoop NameNode: 192.168.43.48
  • Hadoop DataNode: 192.168.43.157

We can confirm each node's IP address by running the ifconfig command on it.

Configuration of Control Node

In this section, we follow these steps:

Creating an inventory file:

First, we create an inventory file in which we write the IP address, username, and password of each managed node.

In this file, we also assign a group name to each IP — NN for the NameNode and DN for the DataNode — which is helpful when writing playbooks.

To create an inventory file use the following command

vim /root/ip.txt
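The inventory might look like the following sketch. The username, password, and connection settings are assumptions — substitute the credentials of your own managed nodes:

```ini
# /root/ip.txt — Ansible inventory; group names NN and DN are used by the playbooks.
# ansible_user / ansible_ssh_pass values here are placeholders.
[NN]
192.168.43.48 ansible_user=root ansible_ssh_pass=redhat ansible_connection=ssh

[DN]
192.168.43.157 ansible_user=root ansible_ssh_pass=redhat ansible_connection=ssh
```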

Creating ansible.cfg file

To create this file we use the following command

mkdir /etc/ansible
vim /etc/ansible/ansible.cfg
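A minimal ansible.cfg for this setup might look like the sketch below. It points Ansible at the inventory file created above; disabling host key checking is an assumption that keeps the first SSH connection from prompting interactively:

```ini
# /etc/ansible/ansible.cfg — minimal configuration (a sketch; adjust paths to your setup)
[defaults]
inventory = /root/ip.txt
host_key_checking = False
```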

After creating both files successfully, we check the connectivity between the controller node and the managed nodes, and confirm how many managed nodes are connected.

We can list the hosts using the following command:

ansible all --list-hosts

To check connectivity, we use the following command:

ansible all -m ping

Till now we have configured the controller node successfully. Next, we create the Ansible playbooks.

We’ll write a separate playbook for each role, namely namenode.yml and datanode.yml. So let’s create namenode.yml first.

Creating Ansible-Playbook for NameNode

To create the playbook for the NameNode, we follow these steps:

Step 1: Create two variables: “NN_DIR”, the directory for the NameNode, and “port_no”, the port assigned to the NameNode.

To prompt for these variables at run time we use the “vars_prompt” playbook keyword.

Step 2: Copy the software to the managed hosts. To copy a file from source to destination we use the “copy” module.

Step 3: Install the software on the managed hosts. To install software on a managed node we use the “package” module.

Step 4: Create a directory for the NameNode. To create a file or directory we use the “file” module.

Step 5: Configure the Hadoop files, namely hdfs-site.xml and core-site.xml, using the “blockinfile” module.

Step 6: Format the NameNode directory and then start the NameNode service.

To format the directory we run hadoop namenode -format, and to start the service we run hadoop-daemon.sh start namenode.
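Putting the steps above together, namenode.yml might look like the following sketch. The software file names, config paths, and XML properties are assumptions based on a typical Hadoop 1.x RPM install — adjust them to your environment:

```yaml
# namenode.yml — a sketch of the NameNode playbook (file names and paths are assumptions)
- hosts: NN
  vars_prompt:
    - name: NN_DIR
      prompt: "Enter the NameNode directory (e.g. /nn)"
      private: no
    - name: port_no
      prompt: "Enter the NameNode port (e.g. 9001)"
      private: no
  tasks:
    - name: Copy the Hadoop and JDK software to the managed node
      copy:
        src: "{{ item }}"
        dest: /root/
      loop:
        - hadoop-1.2.1-1.x86_64.rpm
        - jdk-8u171-linux-x64.rpm

    - name: Install the software
      package:
        name:
          - /root/jdk-8u171-linux-x64.rpm
          - /root/hadoop-1.2.1-1.x86_64.rpm
        state: present

    - name: Create the NameNode directory
      file:
        path: "{{ NN_DIR }}"
        state: directory

    - name: Configure hdfs-site.xml with the NameNode directory
      blockinfile:
        path: /etc/hadoop/hdfs-site.xml
        insertafter: "<configuration>"
        block: |
          <property>
            <name>dfs.name.dir</name>
            <value>{{ NN_DIR }}</value>
          </property>

    - name: Configure core-site.xml with the NameNode port
      blockinfile:
        path: /etc/hadoop/core-site.xml
        insertafter: "<configuration>"
        block: |
          <property>
            <name>fs.default.name</name>
            <value>hdfs://0.0.0.0:{{ port_no }}</value>
          </property>

    - name: Format the NameNode directory
      command: hadoop namenode -format -force

    - name: Start the NameNode service
      command: hadoop-daemon.sh start namenode
```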

Creating Ansible-Playbook for DataNode

To create the playbook for the DataNode, we follow these steps:

Step 1: Create three variables: “DN_DIR”, the directory for the DataNode; “port_no”, the port assigned to the NameNode; and “NN_IP”, the NameNode’s IP address.

To prompt for these variables at run time we use the “vars_prompt” playbook keyword.

Step 2: Copy the software to the managed hosts. To copy a file from source to destination we use the “copy” module.

Step 3: Install the software on the managed hosts. To install software on a managed node we use the “package” module.

Step 4: Create a directory for the DataNode. To create a file or directory we use the “file” module.

Step 5: Configure the Hadoop files, namely hdfs-site.xml and core-site.xml, using the “blockinfile” module.

Step 6: After that, we only need to start the DataNode service.

To start the service we run hadoop-daemon.sh start datanode.
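The DataNode playbook follows the same pattern as the NameNode one; a sketch of datanode.yml is below. As before, file names, config paths, and XML properties are assumptions for a typical Hadoop 1.x RPM install:

```yaml
# datanode.yml — a sketch of the DataNode playbook (file names and paths are assumptions)
- hosts: DN
  vars_prompt:
    - name: DN_DIR
      prompt: "Enter the DataNode directory (e.g. /dn)"
      private: no
    - name: port_no
      prompt: "Enter the NameNode port (e.g. 9001)"
      private: no
    - name: NN_IP
      prompt: "Enter the NameNode IP (e.g. 192.168.43.48)"
      private: no
  tasks:
    - name: Copy the Hadoop and JDK software to the managed node
      copy:
        src: "{{ item }}"
        dest: /root/
      loop:
        - hadoop-1.2.1-1.x86_64.rpm
        - jdk-8u171-linux-x64.rpm

    - name: Install the software
      package:
        name:
          - /root/jdk-8u171-linux-x64.rpm
          - /root/hadoop-1.2.1-1.x86_64.rpm
        state: present

    - name: Create the DataNode directory
      file:
        path: "{{ DN_DIR }}"
        state: directory

    - name: Configure hdfs-site.xml with the DataNode directory
      blockinfile:
        path: /etc/hadoop/hdfs-site.xml
        insertafter: "<configuration>"
        block: |
          <property>
            <name>dfs.data.dir</name>
            <value>{{ DN_DIR }}</value>
          </property>

    - name: Configure core-site.xml to point at the NameNode
      blockinfile:
        path: /etc/hadoop/core-site.xml
        insertafter: "<configuration>"
        block: |
          <property>
            <name>fs.default.name</name>
            <value>hdfs://{{ NN_IP }}:{{ port_no }}</value>
          </property>

    - name: Start the DataNode service
      command: hadoop-daemon.sh start datanode
```

After both playbooks run, hadoop dfsadmin -report on the NameNode should show the DataNode registered with the cluster.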

Thanks for reading!
