Monday, 15 August 2016

BIG DATA IMPORTANT POINTS

What is HDFS?
The Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications.
HDFS is modeled on Google's own distributed file system, GFS (the Google File System).

What is a block and block scanner in HDFS?
Block - A block is the unit of data into which HDFS splits a file before storing it. In Hadoop v1 the default block size is 64 MB, i.e. if we store a file of size 640 MB in HDFS it will be split into 10 blocks of 64 MB each, and those blocks are then distributed across the cluster.
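The split arithmetic above can be sketched in plain Python (illustrative only, not actual Hadoop code):

```python
import math

BLOCK_SIZE_MB = 64  # Hadoop v1 default block size


def num_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of HDFS blocks needed to store a file of the given size.

    The last block may be smaller than the block size; HDFS does not
    pad it out to a full block on disk.
    """
    return math.ceil(file_size_mb / block_size_mb)


print(num_blocks(640))  # 640 MB file -> 10 blocks of 64 MB
print(num_blocks(200))  # 200 MB file -> 4 blocks (last one only 8 MB)
```

Note that a file smaller than one block still occupies only its actual size on disk; the block size is an upper bound per block, not an allocation unit.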

Block Scanner - The Block Scanner maintains the list of blocks present on a DataNode and periodically verifies each block against its stored checksum to detect data corruption.
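The idea behind the block scanner can be sketched as a checksum-verification loop. This is a toy model, not Hadoop's implementation; the block IDs and data are made up:

```python
import zlib

# Toy model: each block stores its data plus a checksum recorded at write time.
blocks = {
    "blk_001": {"data": b"hello hdfs", "checksum": zlib.crc32(b"hello hdfs")},
    "blk_002": {"data": b"corrupted!", "checksum": zlib.crc32(b"original data")},
}


def scan_blocks(blocks):
    """Recompute each block's checksum and return the IDs that fail verification."""
    return [
        blk_id
        for blk_id, blk in blocks.items()
        if zlib.crc32(blk["data"]) != blk["checksum"]
    ]


print(scan_blocks(blocks))  # ['blk_002'] - its data no longer matches its checksum
```

In real HDFS, blocks that fail verification are reported to the NameNode, which re-replicates them from a healthy replica.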

What are the port numbers for the NameNode and Job Tracker?
These are the default web UI (HTTP) ports in Hadoop v1:
NameNode - 50070
JobTracker - 50030
WHAT IS METADATA?
Metadata is the most important part of any Hadoop cluster. The metadata is stored on the NameNode and records the file system namespace: which blocks make up each file, and which DataNodes hold a replica of each block. If the metadata is lost, the blocks on the DataNodes can no longer be mapped back to files, and the cluster's data is effectively unreachable.
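A simplified, hypothetical picture of what that metadata contains can be sketched with two mappings: a namespace (file path to block list) and a block map (block to replica locations). The paths, block IDs, and node names here are made up for illustration:

```python
# Namespace: which blocks make up each file.
namespace = {
    "/logs/app.log": ["blk_001", "blk_002"],
}

# Block map: which DataNodes hold a replica of each block (replication factor 3).
block_map = {
    "blk_001": ["datanode1", "datanode2", "datanode3"],
    "blk_002": ["datanode2", "datanode3", "datanode4"],
}


def locate(path):
    """Resolve a file path to the DataNodes holding each of its blocks."""
    return {blk: block_map[blk] for blk in namespace[path]}


print(locate("/logs/app.log"))
```

If these two mappings are lost, the raw blocks still sit on the DataNodes, but nothing can reassemble them into files, which is exactly why losing the metadata compromises the cluster.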
WHAT IS THE SINGLE POINT OF FAILURE IN APACHE HADOOP?
The NAMENODE is the single point of failure in Hadoop v1, because if the NameNode is lost, the metadata (and with it the ability to locate any block) is gone and there is no way to get the data back. To mitigate this, measures such as checkpointing via the Secondary NameNode are used to keep a recent, recoverable copy of the metadata.
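The checkpointing idea can be sketched in miniature: the Secondary NameNode periodically merges the on-disk namespace snapshot (the fsimage) with the edit log of recent operations to produce a fresh snapshot. The data structures and operation names below are simplified assumptions, not Hadoop's actual on-disk formats:

```python
# Current snapshot of the namespace (fsimage): file path -> block list.
fsimage = {"/a.txt": ["blk_1"]}

# Edit log: operations applied since the last checkpoint.
edit_log = [
    ("create", "/b.txt", ["blk_2"]),
    ("delete", "/a.txt", None),
]


def checkpoint(fsimage, edit_log):
    """Apply each logged operation to a copy of the snapshot; return the new fsimage."""
    image = dict(fsimage)
    for op, path, blks in edit_log:
        if op == "create":
            image[path] = blks
        elif op == "delete":
            image.pop(path, None)
    return image


new_image = checkpoint(fsimage, edit_log)
print(new_image)  # {'/b.txt': ['blk_2']}
```

After a checkpoint, the edit log can be truncated, so a restarted NameNode has far less log to replay; note the Secondary NameNode is a checkpointer, not a hot standby.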