Saturday, 17 September 2016

Getting Started with HBase

You are probably already familiar with Big Data and its processing framework, Hadoop.

Limitations of Hadoop:

Hadoop can perform only batch processing, and data is accessed only in a sequential
manner. That means one has to scan the entire dataset even for the simplest of jobs.
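The difference is easy to see in a toy sketch. The code below is not Hadoop or HBase code; the record names are made up purely to contrast a sequential scan with a keyed, random-access lookup:

```python
# Conceptual sketch: sequential batch access vs. keyed random access.
# The keys and values below are invented for illustration.

records = [("row%05d" % i, i * 10) for i in range(100_000)]

def sequential_lookup(key):
    """HDFS-style access: walk the records in order until the key is found."""
    for k, v in records:
        if k == key:
            return v
    return None

# A random-access store keeps an index, so one row costs one lookup.
index = dict(records)

assert sequential_lookup("row09999") == index["row09999"] == 99990
```

Both calls return the same value, but the sequential version touches (in the worst case) every record, which is exactly the limitation described above.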

Now let's talk about the solution:


Hadoop Random Access Databases

Applications such as HBase, Cassandra, CouchDB, Dynamo, and MongoDB are some of the databases that store huge amounts of data and access the data in a random manner.


HBase:

HBase is a distributed column-oriented database built on top of the Hadoop file system. It is an open-source project and is horizontally scalable.


Column Oriented and Row Oriented
Column-oriented databases are those that store data tables as sections of columns of data, rather than as rows of data. In short, they group related columns into column families.

Row-Oriented Database                                | Column-Oriented Database
It is suitable for Online Transaction Processing (OLTP). | It is suitable for Online Analytical Processing (OLAP).
Such databases are designed for a small number of rows and columns. | Column-oriented databases are designed for huge tables.

The following image shows column families in a column-oriented database:
[Image: a table laid out as column families]
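The layout can also be sketched as a nested map. The table, row keys, and family names below are invented for illustration; this is not HBase's internal format, just the logical model:

```python
# Illustrative sketch of the column-family data model:
# each row key maps to {column_family: {qualifier: value}}.
table = {
    "emp001": {
        "personal": {"name": "Ravi", "city": "Hyderabad"},
        "professional": {"designation": "Manager", "salary": "50000"},
    },
    "emp002": {
        "personal": {"name": "Raju", "city": "Chennai"},
        "professional": {"designation": "Engineer", "salary": "30000"},
    },
}

# Reading one family touches only that family's data, not the whole row.
def get_family(row_key, family):
    return table[row_key][family]

assert get_family("emp001", "personal")["city"] == "Hyderabad"
```

Note that the two rows do not need to share the same qualifiers inside a family; only the family names are fixed, which is what "schema-less" means below.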


HBase and RDBMS
HBase                                                | RDBMS
HBase is schema-less: there is no fixed column schema; only column families are defined. | An RDBMS is governed by its schema, which describes the whole structure of its tables.
It is built for wide tables and is horizontally scalable. | It is thin, built for small tables, and hard to scale.
HBase has no multi-row transactions.                 | An RDBMS is transactional.
It holds de-normalized data.                         | It holds normalized data.
It is good for semi-structured as well as structured data. | It is good for structured data.
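The schema-less model can be seen directly in the HBase shell (which requires a running HBase instance). The table name and column families below are made up for illustration:

```
hbase(main):001:0> create 'emp', 'personal', 'professional'
hbase(main):002:0> put 'emp', '1', 'personal:name', 'Ravi'
hbase(main):003:0> put 'emp', '1', 'professional:salary', '50000'
hbase(main):004:0> get 'emp', '1'
```

Notice that `create` takes only column-family names, never column names: a new qualifier such as `personal:name` is simply introduced at write time, with no schema change.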

HBase Architecture:


HBase has three major components: the client library, a master server, and region servers. Region servers can be added or removed as per requirement.

Master Server

The master server:

1. Assigns regions to the region servers, taking the help of Apache ZooKeeper for this task.

2. Handles load balancing of the regions across region servers: it unloads busy servers and shifts regions to less-occupied ones.

3. Maintains the state of the cluster by negotiating the load balancing.

4. Is responsible for schema changes and other metadata operations, such as the creation of tables and column families.

Regions

Regions are contiguous ranges of a table's rows: each table is split into regions, and the regions are spread across the region servers.

Region Server

A region server:

   Communicates with the client and handles data-related operations.
   Handles read and write requests for all the regions under it.
   Decides the size of the regions by following the region size thresholds.

ZooKeeper

    ZooKeeper is an open-source project that provides services like maintaining configuration information, naming, providing distributed synchronization, etc.

    ZooKeeper has ephemeral nodes representing the different region servers. Master servers use these nodes to discover available servers.

    In addition to availability, the nodes are also used to track server failures and network partitions.

    Clients consult ZooKeeper to locate region servers, and then communicate with the region servers directly.

    In pseudo-distributed and standalone modes, HBase itself takes care of ZooKeeper.
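The ephemeral-node mechanism can be modeled in a few lines. This is a toy simulation, not the real ZooKeeper API; the paths and session ids are invented:

```python
# Toy model of ZooKeeper-style ephemeral nodes: each live region server
# holds one node, and the node disappears when the server's session ends.

ephemeral_nodes = {}  # znode path -> session id

def register(server, session_id):
    ephemeral_nodes["/hbase/rs/" + server] = session_id

def session_expired(server):
    # ZooKeeper deletes ephemeral nodes when their owning session dies.
    ephemeral_nodes.pop("/hbase/rs/" + server, None)

def live_servers():
    # The master lists this path to discover available region servers.
    return sorted(path.rsplit("/", 1)[1] for path in ephemeral_nodes)

register("rs1", 101)
register("rs2", 102)
session_expired("rs1")          # rs1 crashes or is partitioned away
assert live_servers() == ["rs2"]
```

The key point the sketch captures: no server ever has to announce its own failure; the disappearance of its node is the failure signal.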


 
Note: Like Hadoop, HBase can be configured in three modes:

1.) Standalone Mode
2.) Pseudo-Distributed Mode
3.) Fully Distributed Mode
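The mode is selected in hbase-site.xml. A minimal sketch for switching from standalone to pseudo-distributed mode, assuming a local HDFS NameNode on port 9000 (your Hadoop setup may use a different URL):

```
<!-- hbase-site.xml: pseudo-distributed mode on a local HDFS -->
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
</configuration>
```

In standalone mode neither property is needed: HBase defaults to a single JVM storing data on the local filesystem.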

If anyone wants to do Big Data Hadoop training, please visit http://www.bigdatahadoop.info/
