Elasticity Using LVM (Logical Volume Management)

Aayushi Shah
13 min read · Nov 16, 2020

--

In this blog, we will perform the following steps:

🔅Integrating LVM with Hadoop and providing Elasticity to DataNode Storage

🔅Increase or Decrease the Size of Static Partition in Linux.

🔅Automating LVM Partition using Python-Script.

The screenshots below give you detailed information on configuring the namenode.

  1. Check whether the jdk and hadoop packages are installed on the namenode using the commands

rpm -q jdk

rpm -q hadoop

2. Go to the /root folder

3. List the .rpm files in the /root folder using ls

4. Install the jdk and hadoop packages using rpm -ivh jdk***.rpm and rpm -ivh hadoop***.rpm (rpm -ivh installs the downloaded .rpm files)

Both .rpm packages installed

5. Verify that both jdk and hadoop are installed, and check their versions, using:

for jdk: java -version

for hadoop: hadoop version

6. Create a directory at any location you want using mkdir (for example, I created a directory named nn in /root)

NOTE: you should know the exact location of the directory you created.

7. Go to /etc/hadoop/

8. List all the files using the ls command

9. Open hdfs-site.xml using the command vim hdfs-site.xml

hdfs-site.xml content

10. Open core-site.xml using vim

core-site.xml content
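The file contents themselves appear only in the screenshots; for reference, a minimal Hadoop 1.x-style configuration matching this article's setup (metadata folder /nn, port 9001) might look like the following. Treat the exact values as illustrative assumptions drawn from the article's description, not the author's exact files.

```xml
<!-- hdfs-site.xml : where the namenode stores its metadata -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>
```

```xml
<!-- core-site.xml : where clients and datanodes reach the namenode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
```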

11. Use hadoop namenode -format to format the folder given in hdfs-site.xml (e.g. the nn folder I specified in my hdfs-site.xml file) to store the metadata

Format done !

Use netstat -tnlp to check whether the hadoop service has started on port 9001.

But as shown in the picture below, the hadoop service has not started yet.

12. Stop the firewall using the command systemctl stop firewalld

13. Use hadoop-daemon.sh start namenode command to start the namenode

14. Use the jps command to check whether the namenode has started

Use the netstat -tnlp command again to check that the hadoop service has started.

As shown in the picture below, the hadoop service has started on port 9001.

The screenshots below give you detailed information on configuring the datanode.

  1. Check whether the jdk and hadoop packages are installed on the datanode using the commands

rpm -q jdk

rpm -q hadoop

2. Go to the /root folder

3. List the .rpm files in the /root folder using ls

4. Install the jdk and hadoop packages using rpm -ivh jdk***.rpm and rpm -ivh hadoop***.rpm

Both .rpm packages installed

5. Verify that both jdk and hadoop are installed, and check their versions, using:

for jdk: java -version

for hadoop: hadoop version

6. Using mkdir, create a directory at any location you want, whose storage you want to contribute to the namenode, i.e. the master (I have created a directory named dn1 in /root)

NOTE: you should know the exact location of the directory you created.

7. Go to /etc/hadoop/

8. List all the files using the ls command

9. Open hdfs-site.xml using the command vim hdfs-site.xml

10. Open core-site.xml using vim

core-site.xml file content

11. Use the hadoop-daemon.sh start datanode command to start the datanode

12. Use the jps command to check whether the datanode has started

After running this command, my datanode started.

Starting Connectivity

On Namenode Side

  1. Use the hadoop-daemon.sh start namenode command to start the namenode
  2. Use the jps command to check whether the namenode has started
  3. Use netstat -tnlp to check whether the hadoop server has started

In my case, both the namenode and the hadoop server started on the namenode side.

On Datanode Side

  1. Use the hadoop-daemon.sh start datanode command to start the datanode
  2. Use the jps command to check whether the datanode has started

In my case my datanode started

3. I created /dn1 in the root folder, which has 8.3 GB of available space. Since dn1 lives in the root folder, it can use all the storage available there, so on connecting to the namenode it will contribute 8.3 GB.

On Namenode Side

Use hadoop dfsadmin -report to list all the datanodes connected and the total storage being contributed by them.

In my case, I configured only one datanode and started it to connect to the namenode. My datanode is contributing 8.3 GB to the namenode.

As shown in the picture below:

  1. Only one datanode connected

2. The namenode is getting an 8.3 GB contribution from the datanode

STEP 1: Integrating LVM with Hadoop and providing Elasticity to DataNode Storage

Now we will provide the datanode storage through LVM, so that it can be increased later.

On Datanode Side

  1. Use the fdisk -l command to list all the hard disks attached to this VM.

As shown in the picture, I have 3 hard disks attached to my VM: /dev/sda, /dev/sdb, and /dev/sdc

2. We first want to contribute 4 GB to the namenode and then increase the contribution to 8 GB. For this we need to implement LVM.

To implement LVM

  1. Convert the virtual hard disks into physical volumes using the command pvcreate disk_name

I converted both hard disks, of size 3 GB and 7 GB, into physical volumes.

2. To view the details of a physical volume, use the command pvdisplay disk_name (for whichever physical volume's details you want to see)

Since I converted both disks into PVs, I displayed the details of both.

3. Create the volume group to combine the storage space of the different disks into a single storage pool.

Use the command vgcreate vg_name disk_names

Give a vg name of your choice.

4. To display the created volume group's details, use the command vgdisplay vg_name

As shown in the picture below, I created a volume group named vg1 of size 10 GB, combining the storage of the two disks of 7 GB and 3 GB into a single 10 GB pool in vg1.

The vgdisplay vg1 output shows that right now vg1 has no space partitioned or allocated:

the whole 10 GB is available as free storage space.

5. Create a logical volume

A vg is like hard disk space, and creating an lv is like carving a partition of our desired size out of the whole vg space.

So here, I am creating an lv of 4 GB out of the 10 GB.

The command to create an lv is lvcreate --size size_in_GB/MB/KB --name lv_name vg_name

6. Use lvdisplay vg_name/lv_name to list the details of the logical volume: lv size, lv name, vg name, etc.

After creating lv1 of 4 GB in vg1 (10 GB),

list the vg details again: it will now show that 4 GB out of 10 GB is allocated and 6 GB is free space.

7. Format the logical volume created

A logical volume is like a partition, and to use a partition as storage we need to format it. The same holds for a logical volume. We want to contribute the 4 GB of our logical volume lv1 as storage space from the datanode to the namenode.

To format the lv, use the command mkfs.ext4 /dev/vg_name/lv_name

8. Create a folder with any name at any desired location

NOTE: you should know the correct path of that folder

I created a folder named lvm1 in the root folder.

9. Mount the logical volume on the folder created

Use the command mount /dev/vg_name/lv_name folder_path

10. To confirm whether the folder is mounted on our desired volume,

use the command df -h

My /dev/vg1/lv1 is correctly mounted on /lvm1.

/lvm1 also shows a size of about 4 GB, the same as lv1.

Continuing

3. Edit the content of the hdfs-site.xml file.

Now I want to contribute only 4 GB from the datanode to the namenode.

The edited hdfs-site.xml content is given below
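The edited file appears only as a screenshot; in Hadoop 1.x the datanode's storage directory is set via the dfs.data.dir property, so the edit would look roughly like this (the /lvm1 value matches the mount point used in this article; treat the snippet as an illustrative reconstruction):

```xml
<!-- hdfs-site.xml on the datanode: contribute the mounted /lvm1 (the 4 GB lv1) -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/lvm1</value>
  </property>
</configuration>
```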

Setting Connectivity

On Namenode Side

  1. Use the hadoop-daemon.sh stop namenode command to stop the namenode
  2. Use the hadoop-daemon.sh start namenode command to start it again
  3. Use the jps command to check whether the namenode has started
  4. Use netstat -tnlp to check whether the hadoop server has started

In my case, both the namenode and the hadoop server started on the namenode side.

On Datanode Side

  1. Use the hadoop-daemon.sh stop datanode command to stop the datanode
  2. Use the jps command to check whether the datanode has stopped. In my case, it stopped.
  3. Use the hadoop-daemon.sh start datanode command to start it again
  4. Use the jps command to check whether the datanode has started

In my case my datanode started

5. I created /lvm1 in the root folder and mounted it on /dev/vg1/lv1, which has 3.7 GB of available space. From my datanode side I am providing the /lvm1 space, so on connecting to the namenode it will contribute about 3.** GB.

On Namenode Side

Use hadoop dfsadmin -report to list all the datanodes connected and the total storage being contributed by them.

In my case, I configured only one datanode and started it to connect to the namenode. My datanode is now contributing 3.** GB of /lvm1 to the namenode.

As shown in the picture below:

  1. Only one datanode connected

2. The namenode is getting a 3.87 GB contribution from the datanode

STEP 2: Increase or Decrease the Size of a Static Partition in Linux

Extending lvm

On Datanode Side

  1. Initially the datanode is contributing the /lvm1 space (4 GB, effectively about 3.64 GB) to the namenode. Now we want to extend the space contributed by /lvm1.
  2. To extend/increase the size of the lv, use the command lvextend --size +increase_size /dev/vg_name/lv_name

3. Use df -h to check whether the size of /lvm1 has been extended, since it is mounted on lv1.

But the size has not extended, because df -h only shows space that has been formatted and mounted; the newly added 4 GB is not yet formatted.

So we need to format the newly added 4 GB of lv1.

4. To extend the filesystem over the newly added 4 GB (resize2fs grows the existing ext4 filesystem rather than reformatting it, so the data already stored is preserved),

use the command resize2fs /dev/vg_name/lv_name

5. Use df -h again to check whether the size of /lvm1 has extended.

The size of lv1 has now extended to 8 GB, and since /lvm1 is mounted on lv1, /lvm1 is indirectly around 8 GB in size and contributes roughly 7.8 GB to the namenode from the datanode.

Setting Connectivity

On Namenode Side

  1. Use the hadoop-daemon.sh stop namenode command to stop the namenode
  2. Use the hadoop-daemon.sh start namenode command to start it again
  3. Use the jps command to check whether the namenode has started
  4. Use netstat -tnlp to check whether the hadoop server has started

In my case, both the namenode and the hadoop server started on the namenode side.

On Datanode Side

  1. Use the hadoop-daemon.sh stop datanode command to stop the datanode
  2. Use the jps command to check whether the datanode has stopped. In my case, it stopped.
  3. Use the hadoop-daemon.sh start datanode command to start it again
  4. Use the jps command to check whether the datanode has started

In my case my datanode started

On Namenode Side

Use hadoop dfsadmin -report to list all the datanodes connected and the total storage being contributed by them.

In my case, I configured only one datanode and started it to connect to the namenode. My datanode is now contributing 7.** GB of /lvm1 to the namenode.

As shown in the picture below:

  1. Only one datanode connected

2. The namenode is getting a 7.81 GB contribution from the datanode

STEP 3: Automating LVM Partition using Python-Script

This is a Python script to automate the LVM steps.
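The original script appears only as a screenshot, so here is a minimal sketch of what such an automation script might look like. The function names and the dry-run design are my own illustration; the commands themselves are exactly the pvcreate/vgcreate/lvcreate/mkfs.ext4/mount/lvextend/resize2fs sequence used above.

```python
# Sketch of a script automating the LVM steps from this article.
# Each helper builds one command as a list; run() prints or executes them.
import subprocess

def pvcreate(disk):           return ["pvcreate", disk]
def vgcreate(vg, disks):      return ["vgcreate", vg, *disks]
def lvcreate(vg, lv, size):   return ["lvcreate", "--size", size, "--name", lv, vg]
def mkfs_ext4(vg, lv):        return ["mkfs.ext4", f"/dev/{vg}/{lv}"]
def mount_lv(vg, lv, folder): return ["mount", f"/dev/{vg}/{lv}", folder]
def lvextend(vg, lv, extra):  return ["lvextend", "--size", f"+{extra}", f"/dev/{vg}/{lv}"]
def resize2fs(vg, lv):        return ["resize2fs", f"/dev/{vg}/{lv}"]

def provision(disks, vg, lv, size, folder):
    """All commands, in order, to create and mount an lv (STEP 1)."""
    cmds = [pvcreate(d) for d in disks]
    cmds += [vgcreate(vg, disks), lvcreate(vg, lv, size),
             mkfs_ext4(vg, lv), mount_lv(vg, lv, folder)]
    return cmds

def extend(vg, lv, extra):
    """Commands to grow the lv and its filesystem online (STEP 2)."""
    return [lvextend(vg, lv, extra), resize2fs(vg, lv)]

def run(cmds, dry_run=True):
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:  # flip only on a real datanode; needs root
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Dry run reproducing this article's values: two disks -> vg1 ->
    # a 4 GB lv1 mounted on /lvm1, then extended by another 4 GB.
    run(provision(["/dev/sdb", "/dev/sdc"], "vg1", "lv1", "4G", "/lvm1"))
    run(extend("vg1", "lv1", "4G"))
```

By default the script only prints the commands, so you can inspect the sequence before running it with dry_run=False on an actual datanode.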
