BIG DATA

Gursimar Singh
Nov 11, 2020

How do big MNCs store, manage and manipulate thousands of terabytes of data with high speed and efficiency on a daily basis?

Big Data has gained much attention from academia and the IT industry. In the digital and computing world, information is generated and collected at a rate that rapidly exceeds the limits of what conventional systems can handle.

What is Big Data? Why is it in such demand?

Over the past 20 years, data stream processing has been one of the important research areas. Numerous technological innovations are driving the dramatic increase in data and data gathering. The term big data is tossed around in the business and tech world pretty frequently.

Well, Big Data is not a technique or any kind of software. It is a problem that every big industry faces, and it arises because huge amounts of data in different forms have to be handled, processed and analyzed properly.

The concept of big data gained momentum in the early 2000s when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three V’s:

  1. Volume: Organizations collect data from a variety of sources, including business transactions, smart (IoT) devices, industrial equipment, videos, social media and more. In the past, storing it would have been a problem, but cheaper storage on platforms like data lakes and Hadoop has eased the burden. The size of the data plays a crucial role in determining the value that can be extracted from it.
  2. Velocity: With the growth of the Internet of Things, data streams into businesses at an unprecedented speed and must be handled in a timely manner. RFID tags, sensors and smart meters are driving the need to deal with these torrents of data in near-real time. Velocity refers to the speed at which data is generated.
  3. Variety: Data comes in all types of formats: from structured, numeric data in traditional databases to unstructured text documents, emails, videos, audio, stock ticker data and financial transactions.

WHERE IS BIG DATA USED?

  • Big data is used in industries that have a high volume of unstructured data; big companies such as Facebook, Amazon, Microsoft and IBM all use it
  • It can also be used by smaller companies, as much of the software is open source and can be installed on commodity hardware

WHEN IS BIG DATA USED?

  • When there is a high volume of unstructured data, big data technologies are used in almost every case
  • When there are large amounts of structured or semi-structured data, big data helps derive insights through analytics models
  • Big data tools also help structure data and answer questions through queries, so they are used for querying data as well

WHO USES BIG DATA?

  • Industry segments from social media to health services are using it
  • Hospitality / Hotel / Travel: applications and websites use it to understand customer needs and set their pricing models and travel packages accordingly
  • Health Industry: the health industry uses big data for everything from predicting ailments to prescribing medication, designing health kits and health insurance packages, and providing necessary health care
  • Retail: businesses like Amazon, Walmart and many FMCG companies use big data to understand customer behavior and build suitable offers that increase their sales
  • Banking and Financial Services: banks use it to understand customer and transaction patterns when offering loans and credit cards, and to predict fraudulent transactions and block them in real time
  • Government: with Aadhaar now providing a huge population database, the government also uses big data for census calculations, subsidies and planning government schemes

Some of the Big Data Tools Used by Industry:

  1. Apache Hadoop
  2. Apache Spark
  3. Apache Flink
  4. Apache Storm
  5. Apache Cassandra
  6. MongoDB
  7. Apache Kafka
  8. Tableau

How do big MNCs like Google, Facebook and Instagram store, manage and manipulate thousands of terabytes of data with high speed and high efficiency?

Developing Big Data applications has become increasingly important in the last few years. In fact, organizations from many different sectors depend increasingly on knowledge extracted from huge volumes of data. However, in a Big Data context, traditional data techniques and platforms are less efficient: they respond slowly and lack scalability, performance and accuracy. Much work has been carried out to face these complex Big Data challenges, and as a result, various types of distributions and technologies have been developed.

Q) How much data do various big tech companies handle in a day?

1. GOOGLE:-

A data center normally holds petabytes to exabytes of data. Google currently processes over 20 petabytes of data per day through an average of 100,000 MapReduce jobs spread across its massive computing clusters.
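A MapReduce job splits work into a map step, which turns each input record into key/value pairs, a shuffle step, which groups those pairs by key, and a reduce step, which combines each group into a result. As a minimal sketch of that model (not Google's implementation; the function names are hypothetical), the Python below counts words across documents in a single process, whereas a real cluster would spread the map and reduce calls over many machines:

```python
from collections import defaultdict
from collections.abc import Iterable
from itertools import chain


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every emitted value under its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # On a cluster, map and reduce calls would run in parallel on many machines;
    # here they run one after another in a single process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = ["big data is big", "data moves fast"]
print(word_count(docs))  # {'big': 2, 'data': 2, 'is': 1, 'moves': 1, 'fast': 1}
```

Because each map call looks at only one document and each reduce call at only one key, both steps can be parallelized independently, which is what lets the model scale out across large clusters.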
How much data does Google handle?
This is one of those questions whose answer can never be exact. On a funnier note, asking “how much data does Google handle?” is a bit like a child asking which came first, the chicken or the egg.
A typical PC holds about 1 TB of storage and a smartphone about 64 GB, and newer PCs and smartphones keep arriving with even more. We all know Google can answer almost any kind of question! It can seem as if Google knows everything. So you must be wondering how much data Google handles to answer all these questions!
Google now processes over 40,000 search queries every second on average, which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide.
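The per-day and per-year figures follow from the per-second rate. A quick check, using the 40,000 queries-per-second figure quoted above and assuming a 365-day year:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
DAYS_PER_YEAR = 365

queries_per_second = 40_000  # lower bound quoted in the text ("over 40,000")

queries_per_day = queries_per_second * SECONDS_PER_DAY
queries_per_year = queries_per_day * DAYS_PER_YEAR

print(f"{queries_per_day:,} searches per day")    # 3,456,000,000 searches per day
print(f"{queries_per_year:,} searches per year")  # 1,261,440,000,000 searches per year
```

The result, about 3.46 billion per day and 1.26 trillion per year, is in line with the quoted figures, given that the quoted rate is a lower bound.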

2. YOUTUBE:-

We live in a content-hungry society. It seems every other person is creating and posting videos online, whether for fun or for profit, and we are all devouring that content. Over 1 billion hours of YouTube are watched globally per day.

Whether you’re watching because your favorite vlogger dropped a new video, or you’re stuck on the side of the road learning how to change a tire — those videos are costing you data. Read on to find out just how much data YouTube uses.

How much data YouTube will use depends on the quality of your video playback. Watching a YouTube video at the standard 480p uses around 260MB per hour, while Full HD viewing can chew through 1.65GB. 4K video playback on YouTube will use as much as 2.7GB of data every hour.
With over 1 billion hours watched every day, that adds up to around 440,000 terabytes of YouTube data used by the global community daily. And that doesn’t even include uploading videos.
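Dividing the daily total by the hours watched shows what average data rate the 440,000 TB figure implies. The sketch uses only the figures quoted above, decimal units (1 TB = 1,000,000 MB), and, purely for illustration, the simplifying assumption that all viewing is either 480p or 1080p:

```python
TB_PER_DAY = 440_000                   # daily YouTube data quoted in the text
HOURS_WATCHED_PER_DAY = 1_000_000_000  # "over 1 billion hours" per day
MB_PER_TB = 1_000_000                  # decimal units

# Per-hour data use by playback quality, as quoted in the text.
MB_PER_HOUR = {"480p": 260, "1080p": 1650, "4K": 2700}

avg_mb_per_hour = TB_PER_DAY * MB_PER_TB / HOURS_WATCHED_PER_DAY
print(f"Implied average: {avg_mb_per_hour:.0f} MB per hour")  # Implied average: 440 MB per hour

# Illustrative assumption: viewing is a mix of only 480p and 1080p.
low, high = MB_PER_HOUR["480p"], MB_PER_HOUR["1080p"]
share_1080p = (avg_mb_per_hour - low) / (high - low)
print(f"Implied 1080p share: {share_1080p:.1%}")  # Implied 1080p share: 12.9%
```

An average of about 440 MB per hour sits much closer to the 480p rate than to Full HD, so the total implicitly assumes that most viewing happens at lower resolutions.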

3. FACEBOOK:-

Facebook revealed some big, big stats on big data to a few reporters at its HQ, including that its system processed 2.5 billion pieces of content and 500+ terabytes of data each day. It was pulling in 2.7 billion Like actions and 300 million photos per day, and it scanned roughly 105 terabytes of data each half hour. It also gave the first details of its then-new “Project Prism”.
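The half-hourly scan rate can be scaled to a full day for comparison with the daily ingest figure. This assumes, as a simplification, that the 105 TB rate holds around the clock:

```python
TB_SCANNED_PER_HALF_HOUR = 105  # rate quoted in the text
TB_INGESTED_PER_DAY = 500       # "500+ terabytes" quoted in the text (a lower bound)

HALF_HOURS_PER_DAY = 24 * 2
tb_scanned_per_day = TB_SCANNED_PER_HALF_HOUR * HALF_HOURS_PER_DAY

print(f"Scanned per day: {tb_scanned_per_day:,} TB")  # Scanned per day: 5,040 TB
# Because ingest is "500+", this ratio is an upper bound.
print(f"Scanned vs. ingested: {tb_scanned_per_day / TB_INGESTED_PER_DAY:.1f}x")  # 10.1x
```

Roughly 5 PB scanned against about 0.5 PB of new data per day means most of what gets scanned must be previously stored data, or data read more than once, rather than only the day's new arrivals.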

VP of Engineering Jay Parikh explained why this is so important to Facebook: “Big data really is about having insights and making an impact on your business. If you aren’t taking advantage of the data you’re collecting, then you just have a pile of data, you don’t have big data.” By processing data within minutes, Facebook can roll out new products, understand user reactions, and modify designs in near real-time.

4. FLIPKART:-

Flipkart gets 10 terabytes of user data each day from browsing, searching, buying or not buying, as well as behavior and location. This jumps to 50 terabytes on Big Billion Day sales days. There’s also order data, shipping data, and other forms of data captured by different systems.

Here is a list of industry-specific big data challenges and how they are being overcome:

1. Banking and Securities

Industry-specific Big Data Challenges

A study of 16 projects in 10 top investment and retail banks shows that the challenges in this industry include: securities fraud early warning, tick analytics, card fraud detection, archival of audit trails, enterprise credit risk reporting, trade visibility, customer data transformation, social analytics for trading, IT operations analytics, and IT policy compliance analytics, among others.

Applications of Big Data in the Banking and Securities Industry

The Securities and Exchange Commission (SEC) is using Big Data to monitor financial market activity. It is currently using network analytics and natural language processing to catch illegal trading activity in the financial markets.

Retail traders, big banks, hedge funds, and other so-called ‘big boys’ in the financial markets use Big Data for trade analytics in high-frequency trading, pre-trade decision-support analytics, sentiment measurement, predictive analytics, etc.

This industry also relies heavily on Big Data for risk analytics, including anti-money laundering, enterprise risk management, “Know Your Customer,” and fraud mitigation.

Big Data providers specific to this industry include 1010data, Panopticon Software, Streambase Systems, Nice Actimize, and Quartet FS.

2. Communications, Media and Entertainment

Industry-specific Big Data Challenges

Since consumers expect rich media on demand, in different formats and on a variety of devices, some Big Data challenges in the communications, media, and entertainment industry include:

  • Collecting, analyzing, and utilizing consumer insights
  • Leveraging mobile and social media content
  • Understanding patterns of real-time media content usage

Applications of Big Data in the Communications, Media and Entertainment Industry

Organizations in this industry simultaneously analyze customer data along with behavioral data to create detailed customer profiles that can be used to:

  • Create content for different target audiences
  • Recommend content on demand
  • Measure content performance

A case in point is the Wimbledon Championships, which use Big Data to deliver detailed sentiment analysis on the tennis matches to TV, mobile, and web users in real time.

Spotify, an on-demand music service, uses Hadoop Big Data analytics to collect data from its millions of users worldwide and then uses the analyzed data to give informed music recommendations to individual users.
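The text does not say how Spotify's recommendations work. As an illustration of one common, simple technique (item-to-item co-occurrence counting, not necessarily Spotify's method), the sketch below recommends tracks that often appear in the same listening histories as tracks a user has already played. The users, tracks and function names are hypothetical:

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(histories: dict[str, set[str]]) -> dict[str, Counter]:
    """Count how often each pair of tracks appears in the same user's history."""
    co: dict[str, Counter] = defaultdict(Counter)
    for tracks in histories.values():
        for a, b in combinations(sorted(tracks), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(user: str, histories: dict[str, set[str]],
              co: dict[str, Counter], k: int = 3) -> list[str]:
    """Score unheard tracks by how often they co-occur with the user's tracks."""
    heard = histories[user]
    scores: Counter = Counter()
    for track in heard:
        for other, count in co.get(track, {}).items():
            if other not in heard:
                scores[other] += count
    # Highest score first; ties broken alphabetically so results are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [track for track, _ in ranked[:k]]


# Hypothetical listening histories.
histories = {
    "alice": {"song_a", "song_b", "song_c"},
    "bob": {"song_a", "song_b", "song_d"},
    "carol": {"song_b", "song_d", "song_e"},
    "dave": {"song_a"},
}
co = build_cooccurrence(histories)
print(recommend("dave", histories, co))  # ['song_b', 'song_c', 'song_d']
```

At Hadoop scale, the pair counting in `build_cooccurrence` is the expensive part, and it fits naturally into a MapReduce job in which each user's history emits track pairs and the reducers sum them.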

Amazon Prime, which is driven to provide a great customer experience by offering video, music, and Kindle books in a one-stop-shop, also heavily utilizes Big Data.

Big Data Providers in this industry include Infochimps, Splunk, Pervasive Software, and Visible Measures.

3. Healthcare Providers

Industry-specific Big Data Challenges

The healthcare sector has access to huge amounts of data but has been plagued by failures to use that data to curb rising healthcare costs, and by inefficient systems that stifle faster and better healthcare benefits across the board.

This is mainly because electronic data is unavailable, inadequate, or unusable. Additionally, the healthcare databases that hold health-related information have made it difficult to link data that can show patterns useful in the medical field.

Other challenges related to Big Data include the exclusion of patients from the decision-making process and the use of data from different readily available sensors.

Applications of Big Data in the Healthcare Sector

Some hospitals, like Beth Israel, are using data collected from a cell phone app, from millions of patients, to allow doctors to practice evidence-based medicine as opposed to administering several medical/lab tests to every patient who comes to the hospital. A battery of tests can be thorough, but it is also expensive and usually ineffective.

Free public health data and Google Maps have been used by the University of Florida to create visual data that allows for faster identification and efficient analysis of healthcare information, used in tracking the spread of chronic disease.

Obamacare has also utilized Big Data in a variety of ways.

Big Data Providers in this industry include Recombinant Data, Humedica, Explorys, and Cerner.

4. Education

Industry-specific Big Data Challenges

From a technical point of view, a significant challenge in the education industry is to incorporate Big Data from different sources and vendors and to utilize it on platforms that were not designed for such varied data or to work with one another.

From a practical point of view, staff and institutions have to learn new data management and analysis tools.

Politically, issues of privacy and personal data protection associated with Big Data used for educational purposes are a challenge.

Applications of Big Data in Education

Big data is used quite significantly in higher education. For example, the University of Tasmania, an Australian university with over 26,000 students, has deployed a Learning and Management System that tracks, among other things, when a student logs onto the system, how much time is spent on different pages in the system, and the overall progress of a student over time.

In a different use case, Big Data is used to measure teachers’ effectiveness to ensure a pleasant experience for both students and teachers. Teachers’ performance can be fine-tuned and measured against student numbers, subject matter, student demographics, student aspirations, behavioral classification, and several other variables.

On a governmental level, the Office of Educational Technology in the U.S. Department of Education is using Big Data to develop analytics that help course-correct students who are going astray while taking online Big Data courses. Click patterns are also being used to detect boredom.

Big Data Providers in this industry include Knewton, Carnegie Learning, and MyFit/Naviance.

5. Manufacturing and Natural Resources

Industry-specific Big Data Challenges

Increasing demand for natural resources, including oil, agricultural products, minerals, gas, metals, and so on, has led to an increase in the volume, complexity, and velocity of data that is a challenge to handle.

Similarly, large volumes of data from the manufacturing industry go untapped. Underutilizing this information holds back improvements in product quality, energy efficiency, reliability, and profit margins.

Applications of Big Data in Manufacturing and Natural Resources

In the natural resources industry, Big Data enables predictive modeling to support decision making, and it has been used to ingest and integrate large amounts of geospatial, graphical, text, and temporal data. Areas where this has been applied include seismic interpretation and reservoir characterization.

Big data has also been used in solving today’s manufacturing challenges and to gain a competitive advantage, among other benefits.

In the graphic below, a study by Deloitte shows the use of supply chain capabilities from Big Data currently in use and their expected use in the future.

6. Government

Industry-specific Big Data Challenges

In governments, the most significant challenges are the integration and interoperability of Big Data across different government departments and affiliated organizations.

Applications of Big Data in Government

In public services, Big Data has an extensive range of applications, including energy exploration, financial market analysis, fraud detection, health-related research, and environmental protection.

Some more specific examples are as follows:

  • Big data is being used to analyze the large number of social disability claims made to the Social Security Administration (SSA), which arrive as unstructured data. The analytics are used to process medical information rapidly and efficiently for faster decision making and to detect suspicious or fraudulent claims.
  • The Food and Drug Administration (FDA) is using Big Data to detect and study patterns of food-related illnesses and diseases. This allows for a faster response, which has led to more rapid treatment and fewer deaths.
  • The Department of Homeland Security uses Big Data for several different use cases. Data from various government agencies is analyzed and used to protect the country.

Big Data Providers in this industry include Digital Reasoning, Socrata, and HP.

7. Insurance

Industry-specific Big Data Challenges

Lack of personalized services, lack of personalized pricing, and the lack of targeted services to new segments and specific market segments are some of the main challenges.

In a survey conducted by Marketforce, challenges identified by professionals in the insurance industry included underutilization of data gathered by loss adjusters and a hunger for better insight.

Applications of Big Data in the Insurance Industry

Big data has been used in the industry to provide customer insights for transparent and simpler products, by analyzing and predicting customer behavior through data derived from social media, GPS-enabled devices, and CCTV footage. Big Data also helps insurance companies improve customer retention.

When it comes to claims management, predictive analytics from Big Data has been used to offer faster service, since massive amounts of data can be analyzed, mainly in the underwriting stage. Fraud detection has also been enhanced.

Through massive data from digital channels and social media, real-time monitoring of claims throughout the claims cycle has been used to provide insights.

Big Data Providers in this industry include Sprint, Qualcomm, Octo Telematics, and The Climate Corp.

8. Retail and Wholesale trade

Industry-specific Big Data Challenges

From traditional brick and mortar retailers and wholesalers to current-day e-commerce traders, the industry has gathered a lot of data over time. This data, derived from customer loyalty cards, POS scanners, RFID, etc., is not being used enough to improve customer experiences on the whole. Any changes and improvements made have been quite slow.

Applications of Big Data in the Retail and Wholesale Industry

Big Data from customer loyalty programs, POS systems, store inventory, and local demographics continues to be gathered by retail and wholesale stores.

At New York’s Big Show retail trade conference in 2014, companies like Microsoft, Cisco, and IBM pitched the need for the retail industry to utilize Big Data for analytics and other uses, including:

  • Optimized staffing through data from shopping patterns, local events, and so on
  • Reduced fraud
  • Timely analysis of inventory

Social media also has a lot of potential and continues to be adopted, slowly but surely, especially by brick and mortar stores. Social media is used for customer prospecting, customer retention, promotion of products, and more.

Big Data Providers in this industry include First Retail, First Insight, Fujitsu, Infor, Epicor, and Vistex.

9. Transportation

Industry-specific Big Data Challenges

In recent times, huge amounts of data from location-based social networks and high-speed data from telecoms have affected travel behavior. Regrettably, research to understand travel behavior has not progressed as quickly.

In most places, transport demand models are still based on poorly understood new social media structures.

Applications of Big Data in the Transportation Industry

Some applications of Big Data by governments, private organizations, and individuals include:

  • Government use of Big Data: traffic control, route planning, intelligent transport systems, congestion management (by predicting traffic conditions)
  • Private-sector use of Big Data in transport: revenue management, technological enhancements, logistics and for competitive advantage (by consolidating shipments and optimizing freight movement)
  • Individual use of Big Data includes route planning to save on fuel and time, for travel arrangements in tourism, etc.

10. Energy and Utilities

Industry-specific Big Data Challenges

The image below shows some of the main challenges in the energy and utility industry.

Applications of Big Data in the Energy and Utility Industry

Smart meters allow data to be collected as often as every 15 minutes, as opposed to once a day with old-style meter readings. This granular data is being used to analyze utility consumption better, which allows for improved customer feedback and better control of utility use.
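Moving from one reading a day to one every 15 minutes means 96 readings per meter per day, which reveals when energy is used, not just how much. The sketch below, with made-up readings and hypothetical function names, sums 15-minute kWh readings into hourly totals and finds the peak hour, something a single daily reading cannot show:

```python
from collections import defaultdict
from datetime import datetime, timedelta

READINGS_PER_DAY = 24 * 60 // 15  # 96 readings, versus 1 with a once-a-day meter


def hourly_totals(readings: list[tuple[datetime, float]]) -> dict[datetime, float]:
    """Sum 15-minute kWh readings into hourly consumption buckets."""
    totals: dict[datetime, float] = defaultdict(float)
    for timestamp, kwh in readings:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        totals[hour] += kwh
    return dict(totals)


def peak_hour(readings: list[tuple[datetime, float]]) -> datetime:
    """Return the start of the hour with the highest total consumption."""
    totals = hourly_totals(readings)
    return max(totals, key=totals.__getitem__)


# Made-up data for one day: 0.1 kWh per interval, rising to 0.5 kWh from 18:00 to 19:00.
start = datetime(2020, 11, 11)
readings = []
for i in range(READINGS_PER_DAY):
    ts = start + timedelta(minutes=15 * i)
    readings.append((ts, 0.5 if ts.hour == 18 else 0.1))

print(peak_hour(readings))  # 2020-11-11 18:00:00
print(round(sum(kwh for _, kwh in readings), 2))  # 11.2, the only figure an old meter would report
```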

In utility companies, the use of Big Data also allows for better asset and workforce management, which is useful for recognizing errors and correcting them as soon as possible before complete failure is experienced.

Big Data Providers in this industry include Alstom, Siemens, ABB, and Cloudera.

Written by Gursimar Singh

Google Developers Educator | Speaker | Consultant | Author @ freeCodeCamp | DevOps | Cloud Computing | Data Science and more
