How to Integrate LVM with Hadoop and Provide Elasticity to DataNode Storage
What is Hadoop?
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs.
What is LVM?
Logical volume management provides a higher-level view of the disk storage on a computer system than the traditional view of disks and partitions. This gives the system administrator much more flexibility in allocating storage to applications and users.
The logical volume manager also allows management of storage volumes in user-defined groups, allowing the system administrator to deal with sensibly named volume groups such as “development” and “sales” rather than physical disk names such as “sda” and “sdb”.
So, let's look at what we need and get started…
As we know, a Hadoop distributed cluster consists of a NameNode (which coordinates all the processes of the cluster), DataNodes (each responsible for contributing its storage to the NameNode), and a Client.
In simple terms, a DataNode is used as a storage node. But the DataNode's storage is static: its filesystem cannot grow its storage on its own when the storage limit is exceeded.
Requirements:
- RHEL8
- Installed LVM2
- Created Hadoop Cluster
Attaching Hard Disks to the DataNode System:
We have one DataNode connected to the NameNode. Our goal is to make this DataNode elastic, so that it can increase its storage on the fly, without any data loss, when the storage limit is exceeded.
To perform this practical I am using RHEL8 and attaching two more hard disks to this OS. We need to follow the steps below to attach the two hard disks.
First, we go to Oracle VirtualBox and select the VM in which we configured the DataNode.
- Click on the Datanode
- Select the settings option.
- After that select storage.
- At last, attach two hard disks.
So far we have attached two hard disks to our operating system. To check the available hard disks in the operating system, we can run the fdisk -l command.
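For example (assuming the two new disks appear as /dev/sdb and /dev/sdc, as they do later in this article):

```bash
# List all disks and partitions; the two newly attached
# 10 GB disks should show up here
fdisk -l
```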
Implementation of LVM
To provide elasticity to the DataNode, we need to understand the following commands before using them:
- pvcreate — This command converts an attached or external disk into an LVM physical volume.
- vgcreate — This command creates the volume groups for all the specified physical volumes.
- vgdisplay — This command displays all the available or created volume groups.
- lvcreate — This command creates the logical volume from the volume group.
- lvdisplay — This command displays all the available or created logical volumes from the volume groups.
- lvextend — This command extends the current logical volume size from the available space of the volume group.
- resize2fs — This command extends the filesystem into the newly allocated space of the partition without reformatting it.
Converting Disk into Physical Volume
First of all, we need to check the disk names. To list the available block devices in Linux, we run the following command.
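lsblk is one way to do this (the fdisk -l check from the previous step works too):

```bash
# List all block devices; the new disks should appear
# without any partitions or mount points yet
lsblk
```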
In the output, we can see the two disks that were attached in the previous steps.
Then, we need to convert them into physical volumes using the “pvcreate” command.
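Something like this, with both disks passed in one go:

```bash
# Initialize both attached disks as LVM physical volumes
pvcreate /dev/sdb /dev/sdc
```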
In this command, /dev/sdb and /dev/sdc are the disk names. Both are 10 GB in size.
Now, run the following command to confirm whether the physical volumes were created or not.
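pvdisplay is the usual check here (pvs gives a shorter summary):

```bash
# Show details of all LVM physical volumes
pvdisplay
```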
Creating Volume Group of Physical Volumes
To create a volume group of physical volumes, we need to run the following command.
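A sketch of the command, using the names from this article:

```bash
# Create a volume group named "taskseven" spanning both disks
vgcreate taskseven /dev/sdb /dev/sdc
```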
In this command, “taskseven” is the desired volume group name, and /dev/sdb and /dev/sdc are the physical volumes.
We can also verify whether it is created or not by running the following command.
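Using the vgdisplay command from the list above:

```bash
# Display the volume group details, including its total size
vgdisplay taskseven
```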
That’s all! Now the volume group “taskseven” has a size of 19.99 GiB.
Creating Logical Volume from Volume Group
Now, the next step is to create a logical volume from the volume group that was created in the previous step. To do this, we need to run the following command.
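A sketch with the size and names used in this article:

```bash
# Carve a 10 GB logical volume named "taskksevenlv1"
# out of the "taskseven" volume group
lvcreate --size 10G --name taskksevenlv1 taskseven
```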
In this command, I’ve used a size of 10 GB, “taskksevenlv1” is the desired name for the logical volume, and “taskseven” is the volume group from which the logical volume will be created.
We can verify it by running the following command.
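Using the lvdisplay command from the list above:

```bash
# Show details of all created logical volumes
lvdisplay
```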
This command shows all the information about the created and available logical volumes.
Formatting the Logical Volume
To format the logical volume, we need to run the following command.
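A sketch, assuming an ext4 filesystem (consistent with the resize2fs step later in this article):

```bash
# Create an ext4 filesystem on the new logical volume
mkfs.ext4 /dev/mapper/taskseven-taskksevenlv1
```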
When the command completes without errors, the logical volume has been formatted successfully.
Mounting the Logical Volume to the Hadoop Data Node Directory
This is a very important section. Before doing any operation, remember that every volume and disk has its own path or directory, much like a folder, and the Hadoop DataNode also uses a folder as its Hadoop File System storage. The conclusion: we can simply mount the logical volume onto the Hadoop DataNode directory, which in our case is /dn1.
Before mounting, we’ll check the status of our NameNode. To do this, we run the following command and check whether the NameNode service has started or not.
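jps lists the running Hadoop daemons (the same check this article uses for the DataNode later):

```bash
# List running Java processes; a running NameNode
# appears as "NameNode" in the output
jps
```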
Oops, it is not started…
To start the NameNode, we run the following command.
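Presumably the classic daemon script, as used in older Hadoop versions:

```bash
# Start the NameNode daemon
hadoop-daemon.sh start namenode
```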
Now we check again whether the NameNode service has started or not.
Yeah, now the NameNode service has started successfully.
After successfully starting the NameNode, we will check with the jps command whether the DataNode service has started or not.
Oops, it is also stopped.
To start the DataNode service, we will run the following command.
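Again assuming the classic daemon script:

```bash
# Start the DataNode daemon
hadoop-daemon.sh start datanode
```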
Now we check the DataNode service again with the jps command.
Yeah, now we can see the DataNode service has started successfully.
Now we check how many DataNodes are connected to the NameNode with the following command. It also shows how much storage is shared before mounting the LV.
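The cluster report covers both, assuming the classic hadoop CLI (newer versions use hdfs dfsadmin -report):

```bash
# Show cluster capacity and the list of live DataNodes
hadoop dfsadmin -report
```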
Amazing! So far we have one DataNode connected, sharing approximately 47 GB.
Now we mount the logical volume onto the Hadoop DataNode directory.
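Using the device and directory from this article:

```bash
# Mount the logical volume on the DataNode's storage directory
mount /dev/mapper/taskseven-taskksevenlv1 /dn1
```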
To verify whether it is mounted successfully or not, we need to run this command.
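Most likely a df -h check:

```bash
# List mounted filesystems and their sizes; /dn1 should
# now show the ~10 GB logical volume
df -h
```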
Great! The device “/dev/mapper/taskseven-taskksevenlv1” is successfully mounted to the Hadoop DataNode folder “/dn1”.
We can check the Hadoop report again to verify how much storage is being shared and how many DataNodes are connected to the NameNode.
Providing Elasticity to Hadoop DataNode using LVM “on the fly”
So far the DataNode has been sharing the 10 GB logical volume, while the “taskseven” volume group behind it holds about 20 GB in total. That 10 GB can fill up at any time, but with just two commands we can easily extend the size of the LV partition on the fly.
To do this, we need to first run the following command.
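A sketch that grows the LV by the remaining 5 GB:

```bash
# Extend the logical volume by 5 GB on the fly
lvextend --size +5G /dev/mapper/taskseven-taskksevenlv1
```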
This will extend the “taskksevenlv1” logical volume from 10 GB to 15 GB by adding the unallocated or remaining 5 GB from the “taskseven” volume group.
Now we resize the filesystem over the extended space.
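Using the resize2fs command from the list above:

```bash
# Grow the ext4 filesystem to fill the extended logical volume,
# while it stays mounted and in use
resize2fs /dev/mapper/taskseven-taskksevenlv1
```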
This command automatically grows the filesystem into the unallocated space, extending the inode tables rather than reformatting, so the existing data stays intact.
Great! Now we can verify it again with the following command.
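Again, most likely df -h:

```bash
# Confirm the mounted volume now shows roughly 15 GB
df -h /dn1
```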
Finally, it is now 15 GB instead of 10 GB, grown on the fly without stopping Hadoop or any other service, and without reformatting the partition.
Thanks for reading…