Nowadays we
are drowning in data but lacking knowledge. Data means distinct pieces of information.
Today's world is also technology-based, and a huge amount of data passes
through electronic media. For example, Facebook processes more than 500 terabytes of data daily, and Google processes more than 20 petabytes of data a day (1 PB = 1024 TB).

There are different types of data; to understand big data, we first need to understand them.


Structured data: Raw data that resides in a fixed field, such as a
record or a file, is structured data.

 For example:
Relational data.

Semi-structured data: Data that is formatted in some way is called semi-structured data.

For example: XML data.

Unstructured data: Data that is not formatted in any
way is called unstructured data.

For example: Audio files, Videos, Photos, Sensor
data, Web data, Mobile data, GPS data.

Multi-structured data: Data derived from interactions between people
and machines is multi-structured data. It is a combination of the three
types above.

For example: web applications, social networks.
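The three basic forms can be made concrete with a short Python sketch (the sample values are illustrative, not from any real dataset):

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed fields, like a row in a relational table.
structured = io.StringIO("id,name,balance\n1,Alice,100\n2,Bob,250\n")
rows = list(csv.DictReader(structured))

# Semi-structured: tagged but flexible, e.g. XML.
semi = ET.fromstring("<user><name>Alice</name><city>Dhaka</city></user>")

# Unstructured: no fixed schema at all, e.g. free text.
unstructured = "Met Alice downtown; she mentioned the new data centre."

print(rows[0]["name"])          # field access by schema
print(semi.find("name").text)   # access by tag path
print("Alice" in unstructured)  # only search is possible, no schema
```

The point of the sketch is that structured and semi-structured data can be addressed by field or tag, while unstructured data can only be scanned.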

Why Big data?

The term Big Data was first used in the 1950s, and the concept of
big data has been around for years. Everywhere you go today, people are looking down at an
electronic device. They are online, browsing, collaborating, shopping for goods
and services and transacting business. These devices are also used
extensively in business-to-business transactions. Organizations now understand
that if they capture all the data streaming into their businesses, they can
apply analytics to it and extract significant value very rapidly. This
thinking gave rise to the concept of big data.

Big Data: Big data deals with collecting large amounts of data,
both structured and unstructured, from traditional and digital sources in
order to extract information and analysis. Big data is everywhere:

1. Web data

2. Department/grocery store transactions

3. Bank/credit card transactions

4. Social networks

Put another way, big data is a set of techniques for solving
data problems that are not solvable using traditional databases and tools.

It is a technique to store, process, manage, analyze and
report on a huge amount of varied data at the required speed, and within the
required time, to allow real-time analysis and reaction.


Uses: Big data is usable
in various sectors. Some of its uses are given below:

1. Aggregation and statistics: data warehousing and OLAP

2. Indexing, searching and querying: keyword-based search

3. Knowledge discovery: data mining and statistical modeling



Cost reduction: By identifying more efficient ways of doing business
and storing large amounts of data, big data brings a significant cost advantage.

Faster, better decision making: Decisions become easier to make when all the
relevant data is stored together.

New products and services: With big data analytics, more companies are
creating new products to meet customers' needs.

There are three dimensions to big data: volume, velocity and variety.
Volume: Volume means a very large amount of data. Now that data is generated by machines, networks and human
interaction with systems, organizations collect data from a variety of sources, including
business transactions, social media, and sensor or
machine-to-machine data.

For example, Google handles 15-20 exabytes of data (1 EB = 1024 PB), eBay has 6.5 PB of
user data plus 50 TB more per day, and CERN's Large Hadron Collider (LHC) generates 15 PB a
year. 2.7 zettabytes of
data exist in the digital universe today. More than 5 billion people are calling,
texting, tweeting and browsing on mobile phones worldwide. YouTube users
upload 48 hours of new video every minute of the day; 571 new websites are
created every minute; brands and organizations on Facebook receive
34,722 likes every minute; and 100 terabytes of data are uploaded daily to
Facebook. According to Twitter's own research in early 2012, it sees roughly
175 million tweets every day and has more than 465 million accounts.

IDC predicts that the
amount of data will reach 20 ZB by 2020, and that business transactions on the
internet (business-to-business and business-to-consumer) will reach 450 billion
per day.

Velocity: Velocity means data is produced at a very fast rate. Electronic
media have to handle a tsunami of data every day; for example,
Twitter stores 250 million tweets a day.

Variety: Variety means different forms of data: structured
numeric data, unstructured data, text documents, email, video, audio, stock
ticker data, financial transactions and so on.

Data types and sources: Big data involves data produced by many different
devices and applications. Some examples are given below:

Black Box Data: It includes voices of
the flight crew, recordings from microphones and earphones, and performance
information of the aircraft.

Social Media Data: Activity on social media
(Facebook, Twitter) such as posting, uploading photos and videos, likes,
comments, etc.

Stock Exchange Data: Information
about the daily 'buy' and 'sell' decisions made on the shares of different
companies.

Transport Data: Information on the
model, capacity, price, distance and availability of a vehicle.

Search Engine Data: Search engines
retrieve a lot of data from different databases, including reverse-search data,
keyword monitoring, search results, advertisement history, advertisement
spending statistics, etc.

Sensor Data: Data from smart technologies such as the Global
Positioning System (GPS).

Relational Data: Tables, transactions, legacy data.

Text Data: Web pages, email messages.

Graph Data: Social networks, the Semantic Web.

Big Data Technologies

Operational Big Data: It provides real-time operational
capabilities, where data is primarily recorded and stored.

Analytical Big Data: It includes massively
parallel processing (MPP) database systems, and follows the "MapReduce" algorithm.
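The MapReduce idea behind these analytical systems can be sketched in plain Python. This is an in-memory toy, not Hadoop's actual API: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase combines each group.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into a final count.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big ideas", "data at scale"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(l) for l in lines)))
print(counts["big"], counts["data"])  # 2 2
```

In a real cluster the map and reduce phases run in parallel on many machines, and the shuffle moves data between them over the network; the structure of the computation is the same.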

Storage: Nowadays big data deals with very large unstructured data sets and depends on rapid
analytics, with answers provided in seconds. Examples of this are Facebook,
Google and Twitter.

Designing storage systems that can handle the requirements of big
data applications can
be divided into six parts. These are given below:

1. Evaluating big data app requirements

There are two general
categories: large-capacity applications that need to hold hundreds of
terabytes of data, and performance-intensive big data analytics.

2. Big data needs big protection

3. Making the storage decision

There are two ways to handle the requirements associated with big data
architectures: object-based systems and scale-out file systems.

4. Gain new insights with big data analytics

Big data analytics and data warehousing are both methods of
processing large amounts of data.

5. Extending the uses of Hadoop in the data center

IT administrators are
looking for ways to implement Hadoop systems for multiple applications.

6. Getting around common Hadoop issues


Storage products: There are multiple big data storage products on the
market. Some of them are given below:

1. Hitachi

Hitachi offers several routes to big data storage.

2. DDN

DataDirect Networks
(DDN) has a collection of solutions for big data storage.

3. Spectra BlackPearl

Tools/Frameworks: Everyone
is processing big data,
and it turns out that this processing can be abstracted to a degree that can be
handled by all sorts of big data processing frameworks. A few of these
frameworks are very well known.


First up is the all-time classic, Hadoop, one of the top frameworks
in use today. It is an open-source
software framework for distributed storage of very large datasets on computer
clusters. Hadoop (and its underlying MapReduce)
processes often-complex big data, and has a slew of tools that follow it around
like an entourage.


MongoDB is the modern, start-up approach to databases.
It is good for managing data that changes frequently, or data that is
unstructured or semi-structured.
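The flexible-schema idea can be illustrated with a tiny in-memory stand-in, written as plain Python dicts rather than MongoDB's real driver API: documents in the same collection do not have to share a schema.

```python
# A minimal in-memory stand-in for a document store: each record is a dict,
# and records in the same collection need not share a schema.
collection = []

def insert(doc):
    collection.append(doc)

def find(**criteria):
    # Return every document whose fields match all the given criteria.
    return [d for d in collection
            if all(d.get(k) == v for k, v in criteria.items())]

insert({"name": "Alice", "tags": ["admin"]})          # one shape
insert({"name": "Bob", "age": 31, "city": "Dhaka"})   # a different shape
print(len(find(name="Bob")))  # 1
```

Adding a field to new documents requires no migration of the old ones, which is exactly what makes this model suit frequently changing or semi-structured data.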


Spark works in memory, speeding up processing time.


Flink aims to provide facilities for distributed
computation over streams of data; it is effectively both a batch and a real-time
processing framework. Flink provides a number of APIs, and it also has its own
machine learning and graph processing libraries.
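The core stream-processing idea (emit results as windows of the stream close, without ever holding the whole stream) can be sketched in plain Python. This is a tumbling-window aggregation in the style of Flink, not Flink's actual API; the window size and input values are illustrative.

```python
def windowed_sums(events, size):
    # Tumbling window: aggregate each consecutive batch of `size` events,
    # emitting a result as soon as a window closes, so an unbounded stream
    # can be processed incrementally.
    window, out = [], []
    for event in events:
        window.append(event)
        if len(window) == size:
            out.append(sum(window))
            window = []
    return out

# A real source would be a socket or message queue; an iterator stands in here.
stream = iter([3, 1, 4, 1, 5, 9, 2, 6])
print(windowed_sums(stream, 4))  # [9, 22]
```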


 If one don’t know how to analyze and use big
data, it’s worthless. Teradata provide end-to-end solutions and services in
data warehousing, big data and analytics and marketing applications.

Security: Big data security means
keeping out unauthorized users and intrusions with firewalls, strong user
authentication, end-user training, intrusion protection systems and intrusion
detection systems. There are five pillars of security. These are:

Authentication

Authorization

Audit

Data Protection

Centralized Administration


Big Data Security Technologies

Big data security tools differ mainly in their scalability and in their
ability to secure multiple types of data at different stages.

Encryption: It aims to
secure data both in transit and at rest, and it needs to do so across massive data
volumes. It has to operate on different types of data, different analytical toolsets and
their output data.

Centralized Key Management:

It includes policy-driven automation, logging, on-demand key
delivery and abstracting key management from key usage.

Access Control:

User access control is the most basic network security tool. Multiple administrator
settings protect the big data platform against inside attacks.
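A minimal sketch of role-based access control in Python (the role names and operations are illustrative, not from any particular big data platform):

```python
# Role-based access control: each role maps to the set of operations it may
# perform; any operation not granted to a role is denied by default.
PERMISSIONS = {
    "analyst": {"read"},
    "admin": {"read", "write", "configure"},
}

def is_allowed(role, operation):
    # Unknown roles get an empty permission set, so they are denied everything.
    return operation in PERMISSIONS.get(role, set())

print(is_allowed("analyst", "read"))       # True
print(is_allowed("analyst", "configure"))  # False
```

Denying by default (the empty set for unknown roles) is the key design choice: an unconfigured or mistyped role can never gain access by accident.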

Intrusion Detection and Prevention:

Intrusion detection and prevention systems (IDS/IPS) are security
workhorses. Big data's value and distributed architecture make it a target for
intrusion attempts; an IPS enables security admins to protect the big data
platform from intrusion.

Physical Security:

Physical security is
also important. It can be provided with video surveillance, security logs, etc.

Big Data Security Tools:




HDFS Encryption

Wire Encryption


Big Data Security Companies

Nowadays big data owners are willing and able to spend
money to secure their valuable deployments, and vendors are responding. Some representative big data security
companies are named below:

Thales (Vormetric)






Data Recovery:

Big data is typically not mission-critical and is mostly historical and
static. Its applications tend to have massive data storage capacity, and the
backup processes only need to run once on each snapshot. The most common backup
methods are given below:

Data replication: Data replication is a common backup approach. As data
is loaded into the data warehouse or big data store, it is simultaneously shipped
to a backup process that loads a backup copy.
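The dual-write pattern behind replication can be sketched in a few lines of Python (the in-memory lists stand in for a warehouse and its backup target; a real system would ship records over a network):

```python
primary, backup = [], []

def load(record, targets):
    # Replication: every incoming record is written to each target in the
    # same ingest step, so the backup never lags behind the warehouse load.
    for target in targets:
        target.append(record)

for record in ({"id": 1}, {"id": 2}):
    load(record, [primary, backup])

print(primary == backup)  # True: the copies stay in lock-step
```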

Virtual snapshot: The hardware managing the storage subsystem takes
internal copies of files while database writes are disabled for a short
period. This copying completes within seconds; once it is done,
the database management system is allowed to resume write operations. It is thus a
hardware solution that allows the media to create a virtual backup.
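The pause-copy-resume sequence can be sketched in Python with a lock standing in for the write freeze (the dict stands in for the storage subsystem; real snapshots happen at the hardware or filesystem level):

```python
import threading

write_lock = threading.Lock()
datafile = {"rows": [1, 2, 3]}

def snapshot(store):
    # Hold the write lock only for the quick internal copy, mirroring the
    # brief period during which database writes are disabled.
    with write_lock:
        return dict(store, rows=list(store["rows"]))

copy = snapshot(datafile)
datafile["rows"].append(4)             # resumed writes do not disturb the snapshot
print(copy["rows"], datafile["rows"])  # [1, 2, 3] [1, 2, 3, 4]
```

The copy must be deep enough (here, a new list) that writes after the freeze is lifted cannot leak into the snapshot.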

Local and remote copies: Local and remote copies are a classic method,
consisting of disk and tape backups. It works on either physical disk drives or
the database: database administrators execute vendor utilities that access the
data, usually stored in a compressed, proprietary format. This type
of backup executes quickly and can be reloaded quickly.

Recovery Automation and Testing: This means automating the recovery
using standard processes or even vendor tools. A smart database administrator
automates as much as possible in order to minimize relatively slow human intervention.

Recovery Tools:

Software like DDI Utilities can
provide data protection for backup and recovery solutions.

Datos IO has developed an industry-leading
software product, RecoverX, which is purpose-built to meet the data protection