Monday, 6 March 2017

Summer Internship – Why, Where and How?



Summer – the best time for internship:
Summer is the most suitable time for an internship because it is when students have free time on their hands. Instead of wasting the vacation on idle activities, it is ideal to join an internship and use the break in the most productive way. Also, most companies and institutes run special courses and internship programs during the summer.

Which internship program to join?
There is a handful of options when it comes to selecting the ideal internship program. An internship sits outside the school/college curriculum, so students are free to choose a program that matches their interests and taste. One thing to keep in mind while selecting any internship program is that it must enhance your overall skills and also help you achieve your career goals. For instance, you may join a painting course, but it will not help your career unless you are actually serious about making a career in painting.

There are innumerable schools, companies and institutes in Jaipur that offer summer internships. Some of these institutes charge a modest fee for the program, while some companies allow you to pursue your internship for free. To find the best organisation, you may need to do some market research: ask your friends, visit institutes in person or explore the internet. You can easily find many such organisations.

Create a profile to get hired:
Once you have completed your summer training in Jaipur, don't forget to include it in your profile and CV. This is why I said the internship must be career oriented. If you have pursued an internship in a course that is in demand in the market, it may give you an extra edge over others. Companies always look for individuals who have practical knowledge of the work. A summer internship gives you that experience and confidence.

The bottom line is that a summer internship is an essential part of every student's career. It may not add extra marks to your mark sheet, but it will surely earn you some extra marks in your job interview. Pick the right course, join LinuxWorld Informatics Pvt. Ltd. for your summer internship, and learn as much as you can.

Sunday, 5 February 2017

Big Data Hadoop Summer Training


Hadoop is an open-source framework that allows you to store and process Big Data in a distributed environment across clusters of computers using simple programming models. The core components of Hadoop are HDFS and MapReduce: HDFS is used to store large data sets, and MapReduce is used to process them.
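The division of labour between the map and reduce phases can be illustrated with a toy, single-machine word-count sketch in Python. This is only an illustration of the MapReduce programming model, not the Hadoop API itself:

```python
from collections import defaultdict

# Toy sketch of the MapReduce model: the map phase emits (key, value)
# pairs, the framework groups them by key (the "shuffle"), and the
# reduce phase combines the values for each key.

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)   # emit (word, 1) for every word

def reduce_phase(pairs):
    counts = defaultdict(int)
    for word, n in pairs:             # group-by-key and sum in one step
        counts[word] += n
    return dict(counts)

lines = ["big data big insights", "data drives decisions"]
print(reduce_phase(map_phase(lines)))
# {'big': 2, 'data': 2, 'insights': 1, 'drives': 1, 'decisions': 1}
```

On a real cluster the map and reduce functions run on many nodes in parallel, close to where the HDFS blocks are stored; the model stays the same.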

Hadoop can be run in one of three modes:
  • Standalone (or local) mode
  • Pseudo-distributed mode
  • Fully distributed mode

A lot of companies, such as Cloudera, EMC, MapR, Hortonworks, Amazon, Facebook, eBay, Twitter, Google and so on, use Big Data Hadoop technology to capture, store, process, retrieve and analyze huge data sets. Facebook generates 500+ terabytes of data per day, the NYSE (New York Stock Exchange) generates about 1 terabyte of new trade data per day, and a jet airliner collects 10 terabytes of sensor data for every 30 minutes of flying time. All of these are everyday examples of Big Data Hadoop.


According to IBM, the three characteristics of Big Data are:
  • Volume: Facebook generates 500+ terabytes of data per day.
  • Velocity: analyzing 2 million records each day to identify the reasons for losses.
  • Variety: images, audio, video, sensor data, log files, etc.

 
Advantages of Big Data Hadoop technology:
  • Capable of storing enormous amounts of data of any kind
  • Highly scalable
  • Fault tolerant
  • Flexible
  • Cost effective
  • Better operational efficiency
LinuxWorld Informatics Pvt. Ltd. is starting Summer Training on Big Data Hadoop for all computer science students. Big Data Hadoop certification is a good option for both freshers and experienced professionals looking for excellent career prospects. We provide the Big Data Hadoop certification training course in Jaipur. The main advantage of the training program for C.S.E. trainees is that they learn and work on a live project under the guidance of senior developers.




Wednesday, 1 February 2017

Summer Internship 2017 on Big Data Hadoop



Big Data is nothing but an assortment of data so huge and complex that it becomes very tedious to capture, store, process, retrieve and analyze it with the help of on-hand database management tools or traditional data processing techniques; Hadoop is the framework built to handle it.

The Hadoop framework is written in Java. It is designed to solve problems that involve analyzing large data sets (e.g. petabytes). The programming model is based on Google's MapReduce, and the storage infrastructure is based on Google's distributed file system (GFS). Hadoop handles large files with high throughput and supports data-intensive distributed applications. Hadoop is scalable, as more nodes can easily be added to it.

Career Opportunities in Big Data Hadoop:

  Hadoop Developer - A Hadoop Developer is responsible for coding and developing all Hadoop-related applications. He/She possesses knowledge of Core Java, databases and scripting languages.

  Hadoop Architect - A Hadoop Architect is in charge of the complete planning and designing of big data system architectures. Such professionals handle the development of Hadoop applications along with their deployment.

  Hadoop Tester - The role of a Hadoop Tester is to create a number of scenarios, gauge the effectiveness of the application, and look for any bugs that might hinder its proper functioning.

  Data Scientist - A Data Scientist possesses the technical skills of a software programmer and the analytical mind of an applied scientist, which help him or her analyze humongous quantities of data and make intelligent decisions.

  Hadoop Administrator - A Hadoop Administrator is essentially a system administrator in the world of Hadoop. Responsibilities of a Hadoop Administrator include the setting up of Hadoop clusters as well as their maintenance, back-up and recovery.


Hadoop is a fruitful career path – the plain fact is that Hadoop training opens up a number of career opportunities for software professionals and can act as a good platform to start from.

LinuxWorld Informatics Pvt. Ltd. is an ISO 9001:2008 certified research and development organisation. Our trainers are experienced software developers who are also associated with industry, because only they know the real requirements of industry. We also have well-equipped labs and assisting staff. We offer Summer Training for all Computer Science and I.T. students.










Monday, 26 December 2016

What is HDFS

Hadoop comes with a distributed file system called HDFS. In HDFS, data is distributed over several machines and replicated to ensure durability against failure and high availability to parallel applications.
It is cost effective because it uses commodity hardware. It involves the concepts of blocks, data nodes and the name node.

Where to use HDFS

  • Very Large Files: files of hundreds of megabytes, gigabytes or more.
  • Streaming Data Access: the time to read the whole data set matters more than the latency of reading the first record. HDFS is built on a write-once, read-many-times pattern.
  • Commodity Hardware: it works on low-cost hardware.

Where not to use HDFS

  • Low Latency Data Access: applications that need very fast access to the first record should not use HDFS, as it gives importance to the whole data set rather than to the time taken to fetch the first record.
  • Lots of Small Files: the name node holds the metadata of all files in memory, and if the files are small in size this consumes a great deal of the name node's memory, which is not feasible.
  • Multiple Writes: HDFS should not be used when we have to write to a file multiple times.
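The small-files point can be made concrete with a back-of-the-envelope calculation. The 150-bytes-per-object figure below is a commonly cited rule of thumb for name node memory (for example in "Hadoop: The Definitive Guide"); the exact cost varies by Hadoop version, so treat these numbers as illustrative assumptions only:

```python
# Rough sketch of why many small files strain the name node.
# Assumption: each file object and each block costs ~150 bytes of
# name node memory (a widely quoted rule of thumb, not an exact figure).

BYTES_PER_OBJECT = 150
BLOCK_SIZE_MB = 128

def namenode_bytes(num_files, file_size_mb):
    blocks_per_file = max(1, -(-file_size_mb // BLOCK_SIZE_MB))  # ceiling division
    objects = num_files * (1 + blocks_per_file)  # one file object + its blocks
    return objects * BYTES_PER_OBJECT

# The same ~10 TB stored as 10 million 1 MB files vs. ~78k files of 128 MB:
small = namenode_bytes(10_000_000, 1)
large = namenode_bytes(78_125, 128)
print(small // (1024 * 1024), "MB vs", large // (1024 * 1024), "MB")
# 2861 MB vs 22 MB
```

Same data, roughly a hundred times more name node memory when it arrives as small files: that is the "lots of small files" problem in a nutshell.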

HDFS Concepts

  1. Blocks: a block is the minimum amount of data that HDFS can read or write. HDFS blocks are 128 MB by default, and this is configurable. Files in HDFS are broken into block-sized chunks, which are stored as independent units. Unlike an ordinary file system, if a file in HDFS is smaller than the block size it does not occupy a full block's worth of space; i.e. a 5 MB file stored in HDFS with a 128 MB block size takes only 5 MB of space. The HDFS block size is large in order to minimize the cost of seeks.
  2. Name Node: HDFS works in a master-worker pattern where the name node acts as the master. The name node is the controller and manager of HDFS, as it knows the status and the metadata of all the files in HDFS; the metadata comprises file permissions, names and the location of each block. The metadata is small, so it is stored in the memory of the name node, allowing faster access to data. Moreover, since the HDFS cluster is accessed by multiple clients concurrently, all this information is handled by a single machine. File system operations like opening, closing and renaming are executed by it.
  3. Data Node: data nodes store and retrieve blocks when told to, by the client or the name node. They report back to the name node periodically with the list of blocks they are storing. Being commodity hardware, the data node also does the work of block creation, deletion and replication as directed by the name node.
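The block arithmetic above can be sketched in a few lines of Python; the 128 MB default and the 5 MB example follow the figures given above:

```python
import math

BLOCK_SIZE_MB = 128  # HDFS default block size (configurable)

def hdfs_blocks(file_size_mb):
    """Number of block-sized chunks a file is split into, and the size of the last chunk."""
    n = max(1, math.ceil(file_size_mb / BLOCK_SIZE_MB))
    last = file_size_mb - (n - 1) * BLOCK_SIZE_MB
    return n, last

print(hdfs_blocks(5))    # (1, 5): a 5 MB file occupies only 5 MB, not a full block
print(hdfs_blocks(300))  # (3, 44): stored as 128 MB + 128 MB + 44 MB chunks
```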
[Figure: HDFS DataNode and NameNode architecture]
[Figure: HDFS read path]
[Figure: HDFS write path]
Since all the metadata is stored in the name node, it is very important. If it fails, the file system cannot be used, as there would be no way of knowing how to reconstruct the files from the blocks present on the data nodes. To overcome this, the concept of the secondary name node arises.

Secondary Name Node: it is a separate physical machine which acts as a helper to the name node. It performs periodic checkpoints: it communicates with the name node and takes snapshots of the metadata, which helps minimize downtime and loss of data.

Friday, 11 November 2016

When to use Pig Latin versus Hive SQL?



Once your big data is loaded into Hadoop, what's the best way to use it?  You'll need some way to filter and aggregate the data, and then apply the results to something useful.  Collecting terabytes and petabytes of web traffic data is not useful until you have a way to extract meaningful insights from it.

That’s where MapReduce comes in.  MapReduce lets you filter and aggregate data from HDFS so that you can gain insights from the big data.  However, writing MapReduce code in plain Java can mean laboriously writing many lines of code, with additional time needed for code review and QA.

So instead of writing plain Java code for MapReduce, you now have the option of using either the Pig Latin or Hive SQL languages to construct MapReduce programs.  (There’s also the option of using the Hadoop Streaming protocol, with STDIN and STDOUT, in any language such as Python or even a BASH shell script, but we’ll explore that option another day.)  The benefit is that you need to write far fewer lines of code, reducing overall development and testing time.  The rule of thumb is that writing Pig scripts takes 5% of the time of writing the equivalent MapReduce program in Java, at a cost of perhaps 50% in runtime performance.  Although Pig and Hive scripts generally don’t run as fast as native Java MapReduce programs, they are vastly superior in boosting productivity for data engineers and analysts.
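As a small taste of the Hadoop Streaming contract mentioned above, here is a word-count sketch in Python. The mapper and reducer are ordinary line-oriented programs that read STDIN and write tab-separated key/value lines to STDOUT; here the `mapper | sort | reducer` pipeline that Hadoop would run across the cluster is simulated in-process for illustration:

```python
import io
import itertools

# Streaming-style mapper: one "word\t1" line per word on input.
def mapper(stdin):
    for line in stdin:
        for word in line.split():
            yield f"{word.lower()}\t1"

# Streaming-style reducer: input arrives sorted by key, so consecutive
# lines with the same word can be grouped and their counts summed.
def reducer(stdin):
    pairs = (line.split("\t") for line in stdin)
    for word, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(v) for _, v in group)}"

stdin = io.StringIO("big data\ndata data\n")
shuffled = sorted(mapper(stdin))      # stands in for Hadoop's sort/shuffle
print("\n".join(reducer(shuffled)))   # big\t1, then data\t3
```

In a real job each function would live in its own script and Hadoop would feed the mapper output, sorted by key, to the reducer; the per-line protocol is exactly what is sketched here.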

When should you use Pig Latin and when should you use Hive?

Depending on where you work, you may need to simply use whatever standards your company has established.

For example, Hive is commonly used at Facebook for analytical purposes.  Facebook promotes the Hive language and their employees frequently speak about Hive at Big Data and Hadoop conferences.
However, Yahoo! is a big advocate for Pig Latin.  Yahoo! has one of the biggest Hadoop clusters in the world.  Their data engineers use Pig for data processing on their Hadoop clusters.

Alternatively, you may have a choice of Pig or Hive at your organization, especially if no standards have yet been established, or perhaps multiple standards have been set up.

If you know SQL, then Hive will be very familiar to you.  Since Hive uses SQL, you will feel at home with all the familiar select, where, group by, and order by clauses similar to SQL for relational databases.  You do, however, lose some ability to optimize the query, by relying on the Hive optimizer.  This seems to be the case for any implementation of SQL on any platform, Hadoop or traditional RDBMS, where hints are sometimes ironically needed to teach the automatic optimizer how to optimize properly.

However, compared to Hive, Pig needs some mental adjustment for SQL users to learn.  Pig Latin has many of the usual data processing concepts that SQL has, such as filtering, selecting, grouping, and ordering, but the syntax is a little different from SQL (particularly the group by and flatten statements!).  Pig requires more verbose coding, although it’s still a fraction of what straight Java MapReduce programs require.  Pig also gives you more control and optimization over the flow of the data than Hive does.

Personally, I use both Pig Latin and Hive, although for different purposes.  I learned Pig Latin first, and have used it to construct dataflows, where I typically have a scheduled job to periodically crunch the massive data from HDFS and to transfer the summarized data into a relational database for reporting, dashboarding, and ad-hoc analyses.  I also use Hive for some simple ad-hoc analytical queries into the data in HDFS, as Hive queries are a lot faster to write for those types of queries.  However, I don’t use Hive for the automated batch jobs that move data between HDFS and other systems.  I find that I can tune the dataflow process better using Pig than with Hive.  Additionally, some of the datasets that I need in Hadoop have not yet been structured with metadata schemas for use with Hive.  In those cases, Pig is much more flexible in reading those datasets than Hive is.

Hadoop expert Alan Gates has an excellent write-up comparing the differences between Pig Latin and Hive and when to use each of them.

If you are a data engineer, then you’ll likely feel like you’ll have better control over the dataflow (ETL) processes when you use Pig Latin, especially if you come from a procedural language background.  If you are a data analyst, however, you will likely find that you can ramp up on Hadoop faster by using Hive, especially if your previous experience was more with SQL than with a procedural programming language.  If you really want to become a Hadoop expert, then you should learn both Pig Latin and Hive for the ultimate flexibility.