Contribute a limited/specific amount of storage as a slave to the Hadoop cluster
What is Hadoop?
Hadoop is an open-source framework used to store and process large amounts of data efficiently. Nowadays, data is what helps a company build a good product, but to benefit from it, industries collect extremely large amounts of data. Storing and processing all of that data is a genuinely hard task, and this is where Hadoop comes into play.
Instead of collecting all the data on a single computation unit, Hadoop splits the data into parts and stores them across multiple computation units. This group of computational units is known as a Hadoop cluster.
In a Hadoop cluster there is one unit called the namenode; it acts as the master node and keeps track of all the storage shared by the different computational units. A computational unit that shares its storage is known as a datanode. There is one more unit, known as the client node, which helps in uploading the data and getting it back.
CREATION OF HADOOP CLUSTER
Hadoop Distributed File System (HDFS): HDFS is the distributed filesystem used by Hadoop to store files across the Hadoop cluster, allowing many files to be stored and accessed in parallel at high speed.
Name/Master Node: The name node is the main centerpiece of HDFS; it manages the metadata of the data stored in the cluster. It does not store the data itself, but it keeps track of which data nodes hold which blocks. The name node is also a single point of failure in the Hadoop filesystem.
Data/Slave Node: A data node stores the actual HDFS data, stays in constant contact with the name node, and serves read and write requests when required. Blocks are replicated across the data nodes in the cluster, so if one node goes down the filesystem does not fail.
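As a quick illustration of this layout, once a cluster is running you can ask HDFS where the blocks and their replicas actually live. This is only a hedged example; the exact output format varies between Hadoop versions:

# Show every file's blocks, their replication factor, and which data nodes hold the replicas
hdfs fsck / -files -blocks -locations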
Problem Statement:
In a Hadoop cluster, how can a slave (data node) contribute only a limited/specific amount of storage to the cluster?
Prerequisite:
- A configured Hadoop cluster.
Steps to follow
- Add an additional volume to the datanode.
- Make a partition.
- Format and mount it.
How to add an additional volume
If you are working on AWS, EBS is the service that lets you create an extra volume and attach it to your instance. For the CLI you can take help from this link.
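For reference, a rough sketch of the AWS CLI route is below; the availability zone, volume size, and the volume/instance IDs are placeholders you would replace with your own values:

# Create a small EBS volume in the same availability zone as the instance
aws ec2 create-volume --size 1 --volume-type gp2 --availability-zone us-east-1a

# Attach it to the instance (use the volume ID returned above and your own instance ID)
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/sdf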
In my case, I have RHEL8 virtualized on VirtualBox, so to add an extra hard disk I open VirtualBox and select the virtual machine to which I want to add the disk.
Click on Settings and then on Storage.
Simply click on the add-hard-disk icon and follow the process shown in these screenshots.
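If you prefer the command line over the GUI, VirtualBox's VBoxManage tool can do the same thing. The VM name, disk file path, size, and controller name below are my assumptions; on some setups the controller is called "SATA Controller" rather than "SATA":

# Create a new virtual disk (size is in MB, so 8192 = 8 GB)
VBoxManage createmedium disk --filename ~/VirtualBox\ VMs/rhel8-dn1/extra.vdi --size 8192

# Attach it to the VM's SATA controller on a free port
VBoxManage storageattach "rhel8-dn1" --storagectl "SATA" --port 1 --device 0 --type hdd --medium ~/VirtualBox\ VMs/rhel8-dn1/extra.vdi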
Once the hard disk is attached, come to the next step.
Make a partition
Once the hard disk is attached, we have to create a partition on it before we can use it. So I boot up my virtual machine, and to check whether the hard disk is connected successfully I use the command below:
fdisk -l
You will get a list of all the hard disks. Note the name of the new one; in my case it is /dev/sdb.
fdisk /dev/sdb
Now you are inside the fdisk prompt for that disk. For a new partition, n is the command. Use p for a primary partition. The partition number is 1. You are then asked for the starting and ending sectors.
For the ending sector you can also provide a size using +; in my case I used +1G, which gives me a 1 GB partition to use. At the end, w writes the partition table and exits.
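Put together, the whole dialogue looks like the sketch below. The piped version is just a compact, non-interactive way of sending the same keystrokes and assumes /dev/sdb is the new, empty disk:

# Keystrokes inside fdisk: n (new), p (primary), 1 (number), <Enter> (default start), +1G (size), w (write)
printf 'n\np\n1\n\n+1G\nw\n' | fdisk /dev/sdb

# Re-read the partition table so /dev/sdb1 shows up without a reboot
partprobe /dev/sdb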
Format and Mount
To format the partition I used the command below:
mkfs.ext4 /dev/sdb1
In my case, I am sharing my /dn1 directory with the Hadoop cluster (it is the directory my datanode is configured to use), and therefore that is the directory I mount the partition on.
mount /dev/sdb1 /dn1
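In full, the mount step looks roughly like this. The fstab entry and the datanode restart are optional extras, and /dn1 plus the hdfs-site.xml snippet are assumptions based on my setup, where dfs.datanode.data.dir points at /dn1; on newer Hadoop releases the restart command is hdfs --daemon stop/start datanode instead:

# Create the mount point and mount the new 1 GB partition on it
mkdir -p /dn1
mount /dev/sdb1 /dn1
df -h /dn1                      # confirm the size that will be offered to the cluster

# Optionally make the mount survive reboots
echo '/dev/sdb1  /dn1  ext4  defaults  0 0' >> /etc/fstab

# /dn1 is assumed to be the datanode's storage directory in hdfs-site.xml:
#   <property>
#     <name>dfs.datanode.data.dir</name>
#     <value>/dn1</value>
#   </property>
# Restart the datanode so it reports the newly mounted space
hadoop-daemon.sh stop datanode && hadoop-daemon.sh start datanode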
Hurrah! The practical part is done. To check whether it works, run:
hadoop dfsadmin -report
Here you can clearly see that our Hadoop cluster now has around 1 GB of space contributed by this data node.