Gigantic amount of data — Most critical problem
Introduction
Big data burst upon the scene in the first decade of the 21st century, and the first organizations to embrace it were online and startup firms. Arguably, firms like Google, eBay, LinkedIn, and Facebook were built around big data from the beginning. They didn’t have to reconcile or integrate big data with more traditional sources of data and the analytics performed upon them, because they didn’t have those traditional forms. They didn’t have to merge big data technologies with their traditional IT infrastructures because those infrastructures didn’t exist. Big data could stand alone, big data analytics could be the only focus of analytics, and big data technology architectures could be the only architecture.
HOW?
Big data may be new for startups and for online firms, but many large firms view it as something they have been wrestling with for a while. Some managers appreciate the innovative nature of big data, but more find it “business as usual” or part of a continuing evolution toward more data.
When these managers in large firms are impressed by big data, it’s not the “bigness” that impresses them. Instead it’s one of three other aspects of big data: the lack of structure, the opportunities presented, and the low cost of the technologies involved.
It’s About Variety, Not Volume: The survey indicates that companies are focused on the variety of data, not its volume, both today and three years from now. The most important goal and potential reward of big data initiatives is the ability to analyze diverse data sources and new data types, not to manage very large data sets.
Companies like Google, Yahoo, Facebook, and Microsoft began working on big data as far back as 2003, and today many data problems of this kind are tackled using Hadoop and the insights it makes possible.
Earlier, most companies stored their data in an RDBMS, whose read/write model does not hold up when huge amounts of data have to be read and processed. So Doug Cutting came up with a solution, Hadoop, which is designed to work on huge data sets in a distributed and parallel fashion. Hadoop follows a write-once, read-many model: HDFS (the Hadoop Distributed File System) stores the data across a cluster (a group of nodes), and the data can then be processed according to the company’s needs using MapReduce. Sorting, filtering, partitioning, and bucketing (if needed) are applied to the data to compute the result. This is the modern, practical way for companies to put their data to work.
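To make the MapReduce step concrete, below is a minimal sketch of the canonical word-count job written against Hadoop’s Java MapReduce API (essentially the example from the official Hadoop tutorial). The mapper emits a (word, 1) pair for every token, the framework sorts and partitions the intermediate pairs by key, and the reducer sums the counts for each word. The input and output paths come from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: runs in parallel on blocks of the input across the cluster.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one); // emit (word, 1)
      }
    }
  }

  // Reduce phase: receives all counts for one word, after the framework
  // has sorted and partitioned the intermediate pairs by key.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result); // emit (word, total count)
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. an HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

A job like this would typically be packaged into a jar and submitted with something like `hadoop jar wordcount.jar WordCount /input /output` (jar and path names here are illustrative).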
In simpler terms, we store such huge amounts of data using a distributed system, which means we have one master node and one or more slave nodes.
The master node is the one that coordinates the cluster and keeps track of where the data is stored (in HDFS this is the NameNode), whereas the slave nodes (the DataNodes) are the ones that actually store the data behind the scenes for the master node.
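As a small illustration of how a client works with this master/slave layout, here is a sketch using Hadoop’s Java FileSystem API: the client asks the master (the NameNode) where a file’s blocks live, but the bytes themselves are written to and read from the slave DataNodes. The path /data/sample.txt is a hypothetical example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    // Cluster settings (e.g. the NameNode address) are picked up from
    // core-site.xml / hdfs-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/data/sample.txt"); // hypothetical path

    // Write once: the NameNode (master) records the file's metadata,
    // while the actual blocks are streamed to the DataNodes (slaves).
    try (FSDataOutputStream out = fs.create(file)) {
      out.writeUTF("hello hdfs");
    }

    // Read many: the client asks the NameNode where the blocks live,
    // then reads them directly from the DataNodes.
    try (FSDataInputStream in = fs.open(file)) {
      System.out.println(in.readUTF());
    }
  }
}
```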