Nowadays we are drowning in data but lacking knowledge.
Data means distinct pieces of information. Today's world is technology-based, and a huge amount of data passes through electronic media. For example, Facebook processes more than 500 terabytes of data daily, and Google processes more than 20 petabytes of data a day (1 PB = 1024 TB). There are different types of data, and to understand big data we need to understand them.
Structured data: Raw data that resides in fixed fields, such as a record or file. Example: relational data.
Semi-structured data: Data that is formatted in some way. Example: XML data.
Unstructured data: Data that is not formatted in any way. Examples: audio files, videos, photos, sensor data, web data, mobile data, GPS data.
Multi-structured data: Data derived from interactions between people and machines; it is a combination of the above three types. Examples: web applications, social networks.

Why big data?
The term "big data" was first uttered in the 1950s, and the concept has been around for years.
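The structured / semi-structured / unstructured distinction drawn above can be made concrete with a short Python sketch (the sample record, XML snippet, and log line are invented for illustration):

```python
import xml.etree.ElementTree as ET

# Structured: fixed fields, like a row in a relational table
record = {"id": 101, "name": "Alice", "balance": 250.75}

# Semi-structured: XML tags describe the data, but the schema is flexible
xml_doc = "<user><name>Alice</name><city>Dhaka</city></user>"
root = ET.fromstring(xml_doc)
print(root.find("name").text)   # fields are located by tag, not by position

# Unstructured: free text with no fields or tags to parse
log_line = "User Alice logged in from an unknown device at 09:14"
print("Alice" in log_line)      # only generic text search is possible
```

Structured data can be queried by field, semi-structured data by tag, while unstructured data supports only generic search and analysis.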
Everywhere you go today, people are looking down at an electronic device. They are online: browsing, collaborating, shopping for goods and services, and transacting business. These devices are also being used extensively in business-to-business transactions. Organizations now understand that if they capture all the data that streams into their businesses, they can apply analytics and get significant value from it very rapidly. Out of this thinking the big data concept was born.
Big data: Big data deals with the collection of large amounts of data from traditional and digital sources in order to extract information and perform analysis. The data that big data deals with can be both structured and unstructured. Big data is everywhere:
1. Web data, e-commerce
2. Department/grocery stores
3. Bank/credit card transactions
4. Social networks
Put another way, big data is a set of techniques for solving data problems that are not solvable using traditional databases and tools: techniques to store, process, manage, analyze, and report on a huge amount of varied data at the required speed and within the required time to allow real-time analysis and reaction.
Uses: Big data is usable in various sectors. Some of them are given below:
1. Aggregation and statistics: data warehouses and OLAP
2. Indexing, searching, and querying: keyword-based search, pattern matching
3. Knowledge discovery: data mining

Advantages:
Cost reduction: By identifying more efficient ways of doing business and of storing large amounts of data, big data brings a significant cost advantage.
Faster, better decision making: Decisions become easier to make when all the relevant data is stored together.
New products and services: With big data analytics, more companies are creating new products to meet customers' needs.

There are three dimensions to big data:
1. Volume
2. Variety
3. Velocity

Volume: Volume means a very large amount of data. Now that data is generated by machines, networks, and human interaction with systems, organizations collect data from a variety of sources, including business transactions, social media, and sensor or machine-to-machine data. For example, Google handles 15-20 exabytes of data (1 EB = 1024 PB), eBay has 6.5 PB of user data plus 50 TB/day, and CERN's Large Hadron Collider (LHC) generates 15 PB a year.
2.7 zettabytes of data exist in the digital universe today. More than 5 billion people are calling, texting, tweeting, and browsing on mobile phones worldwide. YouTube users upload 48 hours of new video every minute of the day, and 571 new websites are created every minute. Brands and organizations on Facebook receive 34,722 likes every minute of the day, and 100 terabytes of data are uploaded to Facebook daily. According to Twitter's own research in early 2012, it sees roughly 175 million tweets every day and has more than 465 million accounts.
IDC predicts that the amount of data will reach 20 ZB by 2020, and that business transactions on the internet (business-to-business and business-to-consumer) will reach 450 billion per day.
Velocity: Velocity means data is produced at a very fast rate; electronic media have to handle a tsunami of data every day. For example, Twitter stores 250 million tweets a day.
Variety: Variety means different forms of data: structured numeric data, unstructured data, text documents, email, video, audio, stock ticker data, financial transactions, etc.
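The binary units quoted above (1 PB = 1024 TB, 1 EB = 1024 PB) are simple powers of 1024, which a short script can make concrete (the figures are the ones from the text, not fresh measurements):

```python
# Binary storage units: each step up is a factor of 1024
TB = 1024**4       # bytes in a terabyte
PB = 1024 * TB     # 1 PB = 1024 TB, as stated above
EB = 1024 * PB     # 1 EB = 1024 PB

# Google's roughly 20 PB/day, expressed in terabytes
google_daily_tb = 20 * PB // TB
print(google_daily_tb)  # 20480 TB per day
```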
Data types and sources: Big data involves data produced by different devices and applications. Some examples are given below:
• Black box data: voices of the flight crew, recordings from microphones and earphones, and performance information of the aircraft.
• Social media data: social media activity (Facebook, Twitter) such as posting, uploading photos and video, likes, comments, etc.
• Stock exchange data: information about daily exchanges, such as 'buy' and 'sell' decisions on different companies made by customers.
• Transport data: information on the model, capacity, price, distance, and availability of a vehicle.
• Search engine data: search engines retrieve lots of data from different databases, including reverse searching, keyword monitoring, search results, advertisement history, advertisement spending statistics, etc.
• Sensor data: data from smart technologies such as the Global Positioning System.
• Relational data: tables, transactions, legacy data.
• Text data: the web.
• Graph data.

Big data technologies:
Operational big data: provides operational capabilities for real-time workloads where data is primarily recorded and stored (e.g., MongoDB).
Analytical big data: includes massively parallel processing database systems and follows the MapReduce model.
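The MapReduce model just mentioned can be sketched in pure Python with the classic word-count example: a map step emits (word, 1) pairs, a shuffle step groups the pairs by key, and a reduce step sums each group. This is a single-process illustration of the idea, not a distributed implementation:

```python
from collections import defaultdict

def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Group the emitted pairs by key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data is big", "data about data"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["data"])  # 3
```

In Hadoop, the map and reduce functions run on many machines in parallel, and the framework itself handles the shuffle between them.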
Storage: Nowadays big data deals with very large unstructured data sets and depends on rapid analytics, with answers provided in seconds; Facebook, Google, and Twitter are examples. Designing storage systems that can handle the requirements of big data applications can be divided into six parts:
1. Evaluating big data app requirements: There are two general categories: large-capacity applications that need to hold hundreds of terabytes of data, and performance-intensive big data analytics apps.
2. Big data needs big protection.
3. Making the storage decision: There are two ways to handle the requirements associated with big data architectures: object-based systems and scale-out file systems.
4. Gaining new insights with big data analytics: Big data analytics and data warehousing are both methods of processing large amounts of data.
5. Extending the uses for Hadoop in the data center: IT administrators are looking for ways to implement Hadoop systems for multiple applications.
6. Working around common Hadoop issues.

Storage products: There are multiple big data storage products on the market. Some of them are given below:
1. Hitachi: Hitachi offers several routes to big data storage.
2. DDN: DataDirect Networks (DDN) has a collection of solutions concerning big data storage.
3. Spectra BlackPearl

Tools/frameworks: Everyone is processing big data, and it turns out that this processing can be abstracted to a degree that it can be handled by all sorts of big data processing frameworks. A few of these frameworks are very well known:
1. Hadoop: First up is the all-time classic, and one of the top frameworks in use today. It is an open-source software framework for distributed storage and processing of very large datasets on computer clusters. Hadoop processes often-complex big data and has a slew of tools that follow it around like an entourage, but Hadoop (and its underlying MapReduce) is actually quite simple.
2. MongoDB: MongoDB is the modern, start-up approach to databases. It is good for managing data that changes frequently or that is unstructured or semi-structured.
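To see why a document store like MongoDB suits semi-structured data, consider a minimal in-memory sketch (plain Python, not the real MongoDB API): documents in one collection need not share a schema, and queries simply match on whatever fields are present:

```python
# A "collection" is just a list of schema-less documents (dicts)
collection = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "tags": ["admin"]},           # different fields are fine
    {"name": "Carol", "age": 25, "city": "Dhaka"},
]

def find(coll, query):
    """Return documents whose fields match every (key, value) in the query."""
    return [doc for doc in coll
            if all(doc.get(k) == v for k, v in query.items())]

print(find(collection, {"age": 25}))  # matches only Carol's document
```

A relational table would force every row into the same columns; here each document carries only the fields it needs.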
3. Spark: Spark works in memory, speeding up processing time.
4. Flink: Flink aims to provide facilities for distributed computation over streams of data; it is effectively both a batch and a real-time processing framework. Flink provides a number of APIs and also has its own machine learning and graph processing libraries.
5. Teradata: If one does not know how to analyze and use big data, it is worthless. Teradata provides end-to-end solutions and services in data warehousing, big data and analytics, and marketing applications.
Security: Big data security aims to keep out unauthorized users and intrusions with firewalls, strong user authentication, end-user training, intrusion protection systems, and intrusion detection systems. There are five pillars of security:
1. Authentication
2. Authorization
3. Audit
4. Data protection
5. Centralized administration

Big data security technologies:
Big data security tools differ mainly in their scalability and their ability to secure multiple types of data in different stages.
1. Encryption: Encryption tools aim to secure data in transit and at rest, and they need to do it across massive data volumes. They must operate on different types of data, different analytical toolsets, and their output data.
2. Centralized key management: This includes policy-driven automation, logging, on-demand key delivery, and abstracting key management from key usage.
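A centralized key manager of the kind just described can be sketched with the standard library (the class, method names, and rotation policy are illustrative assumptions, not a real product's API): keys are generated centrally, handed out on demand, and every delivery is logged so usage can be audited separately from storage:

```python
import secrets

class KeyManager:
    """Toy centralized key manager: generation, on-demand delivery, logging."""

    def __init__(self):
        self._keys = {}   # key_id -> key bytes, kept away from key users
        self.log = []     # audit trail of every delivery

    def create_key(self, key_id):
        self._keys[key_id] = secrets.token_bytes(32)  # 256-bit random key

    def get_key(self, key_id, requester):
        self.log.append((requester, key_id))  # on-demand delivery, logged
        return self._keys[key_id]

    def rotate_key(self, key_id):
        # Policy-driven rotation: key material changes, the id stays stable
        self._keys[key_id] = secrets.token_bytes(32)

km = KeyManager()
km.create_key("sales-db")
old = km.get_key("sales-db", requester="etl-job")
km.rotate_key("sales-db")
new = km.get_key("sales-db", requester="etl-job")
print(old != new, len(km.log))
```

Abstracting key management from key usage means the ETL job above never stores the key itself; it always asks the manager, so rotation is invisible to it.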
3. User access control: User access control is the most basic network security tool. Multiple administrator settings protect the big data platform against inside attack.
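User access control as described can be sketched as a small role-based check (the roles, users, and permissions here are hypothetical):

```python
# Role-based access control: users get roles, roles get permissions
ROLE_PERMISSIONS = {
    "admin":   {"read", "write", "configure"},
    "analyst": {"read"},
}
USER_ROLES = {"alice": "admin", "bob": "analyst"}

def is_allowed(user, action):
    """Allow an action only if the user's role grants that permission."""
    role = USER_ROLES.get(user)
    return action in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("bob", "write"))    # False: analysts may only read
print(is_allowed("alice", "write"))  # True: admins may write
```

Splitting permissions across multiple roles is what lets "multiple administrator settings" limit the damage a single insider can do.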
4. Intrusion detection and prevention: Intrusion detection and prevention systems are security workhorses. Big data's value and distributed architecture make it a target for intrusion attempts; an IPS enables security admins to protect the big data platform from intrusion.
Physical security: Physical security is also important; it can be provided with video surveillance, security logs, etc.

Big data security tools:
1. Knox
2. Kerberos
3. Ranger
4. HDFS encryption
5. Wire encryption

Big data security companies:
Nowadays big data owners are willing and able to spend money to secure valuable deployments, and vendors are responding. Some representative big data security companies are given below:
1. Thales (Vormetric)
2. Cloudwick
3. IBM
4. Logtrust
5. Gemalto

Data recovery: Big data is not mission-critical; it is mostly historical and static, and its applications tend to have massive data storage capacity.
The backup processes only need to run once on each snapshot. The most common backup methods are given below:
Data replication: Data replication is a common backup approach. As data is loaded into the data warehouse or big data store, it is simultaneously shipped to a backup process that loads a backup copy.
Virtual snapshot: The hardware managing the storage subsystem takes internal copies of files while database writes are disabled for a short period; the copying completes within seconds. Once it is complete, the database management system is allowed to resume write operations. It is thus a hardware solution that allows the media to create a virtual backup.
Local and remote copies: Local and remote copies are a classic method consisting of disk and tape backups. It works on either physical disk drives or the database: database administrators execute vendor utilities to access the data, which is usually stored in a compressed, proprietary format. This type of backup executes quickly and can be loaded quickly.
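The local-copy idea, plus the verification step that recovery testing calls for, can be sketched with the standard library (the file names and contents are stand-ins): copy the file, then compare checksums to confirm the backup is restorable:

```python
import hashlib, os, shutil, tempfile

def checksum(path):
    """SHA-256 digest of a file, used to verify a copy matches the original."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "warehouse.dat")
backup = os.path.join(workdir, "warehouse.bak")

with open(original, "wb") as f:
    f.write(b"row1,row2,row3")        # stand-in for warehouse data

shutil.copyfile(original, backup)     # the local-copy backup step
verified = checksum(original) == checksum(backup)
print(verified)  # True: the backup is byte-identical and restorable
```

Automating exactly this kind of check is what recovery automation and testing, discussed next, is about.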
Recovery automation and testing: This means automating recovery using standard processes or even vendor tools. The smart database administrator automates as much as possible in order to minimize relatively slow human intervention.
Recovery tools:
1. Software like DDI Utilities can provide data protection for backup and recovery solutions.
2. Datos IO has developed an industry-leading software product, RecoverX, that is purpose-built to meet data protection requirements.