Elasticity Using LVM (Logical Volume Management)
In this blog we will perform the following steps:
🔅Integrating LVM with Hadoop and providing Elasticity to DataNode Storage
🔅Increase or Decrease the Size of Static Partition in Linux.
🔅Automating LVM Partition using Python-Script.
The screenshots below give detailed information for configuring the NameNode.
1. Check whether the jdk and hadoop packages are installed on the NameNode using:
rpm -q jdk
rpm -q hadoop
2. Go to the /root folder.
3. List the .rpm files in /root using ls.
4. Install the jdk rpm using rpm -ivh jdk***.rpm
jdk***.rpm installed
5. Install the hadoop rpm using rpm -ivh hadoop***.rpm
hadoop***.rpm installed
6. Check whether both jdk and hadoop are installed, and their versions, using:
for jdk: java -version
for hadoop: hadoop version
7. Create a directory at any location you want (I created a directory named nn in /root) using mkdir.
NOTE: you should know the exact path of the directory you created.
8. Go to /etc/hadoop/.
9. List all the files using the ls command.
10. Open hdfs-site.xml using the command vim hdfs-site.xml
hdfs-site.xml content
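The file itself appeared as a screenshot; as a sketch, a minimal Hadoop 1.x hdfs-site.xml pointing the NameNode at the /nn directory created above might look like this (adjust the path to your own directory):

```xml
<configuration>
  <!-- Directory where the NameNode stores its metadata (the nn folder created earlier) -->
  <property>
    <name>dfs.name.dir</name>
    <value>/nn</value>
  </property>
</configuration>
```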
11. Open core-site.xml using vim.
core-site.xml content
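Likewise, the core-site.xml content was a screenshot; a minimal Hadoop 1.x core-site.xml that makes the NameNode listen on port 9001 (the port checked with netstat below) would look roughly like:

```xml
<configuration>
  <!-- Hadoop 1.x name for the filesystem URI; 0.0.0.0 listens on all interfaces -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:9001</value>
  </property>
</configuration>
```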
12. Use hadoop namenode -format to format the directory given in hdfs-site.xml (e.g. the nn folder I set in my hdfs-site.xml) so it can store metadata.
Format done!
Use netstat -tnlp to check whether the Hadoop service has started on port 9001.
But as shown in the picture below, the Hadoop service has not started yet.
13. Stop the firewall using the command systemctl stop firewalld
14. Use the hadoop-daemon.sh start namenode command to start the NameNode.
15. Use the jps command to check whether the NameNode started.
Use the netstat -tnlp command again to check whether the Hadoop service started.
As shown in the picture below, the Hadoop service is now running on port 9001.
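As an alternative to netstat -tnlp, the same check can be scripted. This is a small hypothetical helper (not part of Hadoop) that tests whether anything is accepting TCP connections on the NameNode port:

```python
import socket

def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# Check whether anything is listening on the NameNode port
print(is_listening("127.0.0.1", 9001))
```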
The screenshots below give detailed information for configuring the DataNode.
1. Check whether the jdk and hadoop packages are installed on the DataNode using:
rpm -q jdk
rpm -q hadoop
2. Go to the /root folder.
3. List the .rpm files in /root using ls.
4. Install the jdk rpm using rpm -ivh jdk***.rpm
jdk***.rpm installed
5. Install the hadoop rpm using rpm -ivh hadoop***.rpm
hadoop***.rpm installed
6. Check whether both jdk and hadoop are installed, and their versions, using:
for jdk: java -version
for hadoop: hadoop version
7. Create a directory at any location you want, whose storage you want to contribute to the NameNode, i.e. the master (I created a directory named dn1 in /root) using mkdir.
NOTE: you should know the exact path of the directory you created.
8. Go to /etc/hadoop/.
9. List all the files using the ls command.
10. Open hdfs-site.xml using the command vim hdfs-site.xml
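The DataNode's hdfs-site.xml was shown as a screenshot; as a sketch, a minimal Hadoop 1.x version pointing the DataNode at the /dn1 directory created above:

```xml
<configuration>
  <!-- Directory whose storage the DataNode contributes (the dn1 folder created earlier) -->
  <property>
    <name>dfs.data.dir</name>
    <value>/dn1</value>
  </property>
</configuration>
```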
11. Open core-site.xml using vim.
core-site.xml file content
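The core-site.xml on the DataNode points at the NameNode. A minimal sketch, using a placeholder IP (replace it with your master's address):

```xml
<configuration>
  <!-- Replace 192.168.1.10 with your NameNode's IP; 9001 is the port configured on the NameNode -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.10:9001</value>
  </property>
</configuration>
```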
12. Use the hadoop-daemon.sh start datanode command to start the DataNode.
13. Use the jps command to check whether the DataNode started.
After running this command, my DataNode started.
Starting Connectivity
On Namenode Side
- Use the hadoop-daemon.sh start namenode command to start the NameNode.
- Use the jps command to check whether the NameNode started.
- Use netstat -tnlp to check whether the Hadoop server started.
In my case, both the NameNode and the Hadoop server started on the NameNode side.
On Datanode Side
- Use the hadoop-daemon.sh start datanode command to start the DataNode.
- Use the jps command to check whether the DataNode started.
In my case, my DataNode started.
I created /dn1 in the root folder, and the root filesystem has 8.3 GB of available space. Since the dn1 folder sits on the root filesystem, it can use all the storage available there, which means that on connecting to the NameNode it will contribute 8.3 GB.
On Namenode Side
Use hadoop dfsadmin -report to list all the connected DataNodes and how much total storage is being received from them.
In my case I configured only one DataNode and started it to connect to the NameNode. My DataNode is contributing 8.3 GB to the NameNode.
As shown in the picture below:
1. Only one DataNode is connected.
2. The NameNode is getting an 8.3 GB contribution from the DataNode.
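If you want to grab the contributed capacity programmatically instead of reading the report by eye, the numeric byte counts can be pulled out of the hadoop dfsadmin -report text. The sample output below is illustrative, not copied from my cluster:

```python
import re

def configured_capacities(report: str) -> list[int]:
    """Extract every 'Configured Capacity' value (in bytes) from a dfsadmin report."""
    return [int(m) for m in re.findall(r"Configured Capacity:\s*(\d+)", report)]

# Illustrative fragment of `hadoop dfsadmin -report` output
sample = """\
Datanodes available: 1 (1 total, 0 dead)
Name: 192.168.56.102:50010
Configured Capacity: 8911718400 (8.3 GB)
"""
print(configured_capacities(sample))  # one entry per reporting node, in bytes
```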
STEP 1: Integrating LVM with Hadoop and providing Elasticity to DataNode Storage
Now providing the DataNode storage through LVM.
On Datanode Side
- Use the fdisk -l command to list all the hard disks attached to this VM.
As shown in the picture, I have 3 hard disks attached to my VM: /dev/sda, /dev/sdb, and /dev/sdc.
2. Now we want to contribute 4 GB first, and later increase the contribution to 8 GB. For that we need to implement LVM.
To implement LVM
- Convert the virtual hard disks into physical volumes using the command pvcreate disk_name
I converted both hard disks, of size 3 GB and 7 GB, into physical volumes.
2. To view the details of a physical volume, use the command pvdisplay disk_name (for whichever physical volume you want to inspect).
I converted both disks into PVs, so I displayed the details of both.
3. Create a volume group to combine the storage space of the different disks into a single pool.
Use the command vgcreate vg_name disk_names
Give the VG any name you like.
4. To display the details of the created volume group, use the command vgdisplay vg_name
As shown in the picture below, I created a volume group named vg1 of size 10 GB, combining the storage of the two disks of size 7 GB and 3 GB into a single 10 GB pool.
From the vgdisplay vg1 output we can conclude that vg1 currently has no space partitioned or allocated:
the whole 10 GB is available as free space.
5. Create a logical volume.
A VG is like hard disk space, and creating an LV is like creating a partition of the desired size out of the whole VG space.
So here I am creating an LV of 4 GB out of the 10 GB.
The command to create an LV is lvcreate --size <size> --name lv_name vg_name
6. Use lvdisplay vg_name/lv_name to list the details of the logical volume, such as the LV size, LV name, VG name, etc.
After creating lv1 of 4 GB inside the 10 GB vg1,
list the VG details again: they now show that 4 GB out of 10 GB is allocated and 6 GB is free.
7. Format the logical volume you created.
A logical volume is like a partition, and to use a partition as storage we need to format it. The same applies to a logical volume. We want to contribute the 4 GB of our logical volume lv1 as storage space from the DataNode to the NameNode.
To format the LV, use the command mkfs.ext4 /dev/vg_name/lv_name
8. Create a folder with any name at any desired location.
NOTE: you should know the correct path of that folder.
I created a folder named lvm1 in the root folder.
9. Mount the logical volume on the created folder.
Use the command mount /dev/vg_name/lv_name folder_path
10. To confirm whether the folder is mounted on the desired volume,
use the command df -h
My /dev/vg1/lv1 is correctly mounted on /lvm1.
/lvm1 also shows a size of 4 GB, the same as the size of lv1.
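The df -h check can also be done from Python with the standard library. A small sketch, assuming /lvm1 is the mount point from the previous step:

```python
import os
import shutil

def size_gib(path: str) -> float:
    """Total size, in GiB, of the filesystem containing `path` (like the Size column of df -h)."""
    return shutil.disk_usage(path).total / (1024 ** 3)

# Only meaningful once the logical volume is actually mounted there
if os.path.ismount("/lvm1"):
    print(round(size_gib("/lvm1"), 2))  # close to 4 GiB for a 4 GB lv1
```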
Continuing
3. Edit the content of the hdfs-site.xml file.
Now I want to contribute only 4 GB from the DataNode to the NameNode.
The edited hdfs-site.xml content is given below.
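The edited file itself appeared as a screenshot; the change is just swapping the DataNode directory from /dn1 to the mounted /lvm1, so the sketch becomes:

```xml
<configuration>
  <!-- Now contribute the LVM-backed mount point instead of /dn1 -->
  <property>
    <name>dfs.data.dir</name>
    <value>/lvm1</value>
  </property>
</configuration>
```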
Setting Connectivity
On Namenode Side
- Use the hadoop-daemon.sh stop namenode command to stop the NameNode.
- Use the hadoop-daemon.sh start namenode command to start the NameNode again.
- Use the jps command to check whether the NameNode started.
- Use netstat -tnlp to check whether the Hadoop server started.
In my case, both the NameNode and the Hadoop server started on the NameNode side.
On Datanode Side
- Use the hadoop-daemon.sh stop datanode command to stop the DataNode.
- Use the jps command to check whether the DataNode stopped. In my case, my DataNode stopped.
- Use the hadoop-daemon.sh start datanode command to start the DataNode again.
- Use the jps command to check whether the DataNode started.
In my case, my DataNode started.
5. I created /lvm1 in the root folder and mounted it on /dev/vg1/lv1, which has 3.7 GB of available space. From the DataNode side I am providing the /lvm1 space, which means that on connecting to the NameNode it will contribute 3.** GB.
On Namenode Side
Use hadoop dfsadmin -report to list all the connected DataNodes and how much total storage is being received from them.
In my case I configured only one DataNode and started it to connect to the NameNode. Now my DataNode is contributing 3.** GB of lvm1 to the NameNode.
As shown in the picture below:
1. Only one DataNode is connected.
2. The NameNode is getting a 3.87 GB contribution from the DataNode.
STEP 2: Increase or Decrease the Size of a Static Partition in Linux
Extending the logical volume
On Datanode Side
- First, the DataNode is contributing the /lvm1 space (4 GB, effectively 3.64 GB) to the NameNode. Now we want to extend the space contributed through lvm1.
- To extend/increase the size of the LV, use the command lvextend --size +<increase_size> /dev/vg_name/lv_name
3. Use df -h to check whether the size of /lvm1, which is mounted on lv1, has been extended.
But the size is not extended yet, because df -h only reports space covered by the filesystem, and the newly added 4 GB has no filesystem on it yet.
So we need to extend the filesystem over the 4 GB newly added to lv1.
4. To extend the filesystem over the newly added 4 GB,
use the command resize2fs /dev/vg_name/lv_name
5. Use df -h again to check whether the size of /lvm1 has been extended.
So the size of lv1 is extended to 8 GB, and lv1 is mounted on /lvm1, so /lvm1 is now around 8 GB in size and contributes about 7.99 or 7.80 GB from the DataNode to the NameNode.
Setting Connectivity
On Namenode Side
- Use the hadoop-daemon.sh stop namenode command to stop the NameNode.
- Use the hadoop-daemon.sh start namenode command to start the NameNode again.
- Use the jps command to check whether the NameNode started.
- Use netstat -tnlp to check whether the Hadoop server started.
In my case, both the NameNode and the Hadoop server started on the NameNode side.
On Datanode Side
- Use the hadoop-daemon.sh stop datanode command to stop the DataNode.
- Use the jps command to check whether the DataNode stopped. In my case, my DataNode stopped.
- Use the hadoop-daemon.sh start datanode command to start the DataNode again.
- Use the jps command to check whether the DataNode started.
In my case, my DataNode started.
On Namenode Side
Use hadoop dfsadmin -report to list all the connected DataNodes and how much total storage is being received from them.
In my case I configured only one DataNode and started it to connect to the NameNode. Now my DataNode is contributing 7.** GB of lvm1 to the NameNode.
As shown in the picture below:
1. Only one DataNode is connected.
2. The NameNode is getting a 7.81 GB contribution from the DataNode.
STEP 3: Automating LVM Partitioning Using a Python Script
Below is a Python script to automate LVM.
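The script itself was shown as a screenshot. Here is a minimal sketch of the same idea, assuming the disks, VG/LV names, and sizes are supplied by the caller; the command lines mirror the manual pvcreate, vgcreate, lvcreate, mkfs.ext4, and mount sequence above, plus the lvextend/resize2fs pair from STEP 2. The function and variable names are my own, not from the original script, and the commands must be run as root on real disks:

```python
import subprocess

def pvcreate_cmd(disk):            # e.g. disk = "/dev/sdb"
    return ["pvcreate", disk]

def vgcreate_cmd(vg, disks):       # e.g. vg = "vg1", disks = ["/dev/sdb", "/dev/sdc"]
    return ["vgcreate", vg] + disks

def lvcreate_cmd(size, lv, vg):    # e.g. size = "4G"
    return ["lvcreate", "--size", size, "--name", lv, vg]

def mkfs_cmd(vg, lv):
    return ["mkfs.ext4", f"/dev/{vg}/{lv}"]

def mount_cmd(vg, lv, mountpoint):
    return ["mount", f"/dev/{vg}/{lv}", mountpoint]

def lvextend_cmds(extra, vg, lv):  # lvextend followed by resize2fs, as in STEP 2
    dev = f"/dev/{vg}/{lv}"
    return [["lvextend", "--size", f"+{extra}", dev], ["resize2fs", dev]]

def run(cmd):
    """Print and execute one command, stopping on failure."""
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)

def provision(disks, vg, lv, size, mountpoint):
    """Full pipeline: PVs -> VG -> LV -> filesystem -> mount."""
    for d in disks:
        run(pvcreate_cmd(d))
    run(vgcreate_cmd(vg, disks))
    run(lvcreate_cmd(size, lv, vg))
    run(mkfs_cmd(vg, lv))
    run(mount_cmd(vg, lv, mountpoint))

# Example (requires root and real spare disks):
# provision(["/dev/sdb", "/dev/sdc"], "vg1", "lv1", "4G", "/lvm1")
# for cmd in lvextend_cmds("4G", "vg1", "lv1"):
#     run(cmd)
```

Building each command line in a small pure function keeps the script easy to inspect, and to dry-run, before anything destructive is executed.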