A Survey on Big Data Concepts and Tools

Size: px
Start display at page:

Download "A Survey on Big Data Concepts and Tools"

Transcription

1 A Survey on Big Data Concepts and Tools D. Rajasekar 1, C. Dhanamani 2, S. K. Sandhya 3 1,3 PG Scholar, 2 Assistant Professor, Department of Computer Science and Engineering, Sri Krishna College of Engineering and Technology, Coimbatore, Tamilnadu, India, Abstract The term Big Data refers to the massive amount of structured, unstructured and semi-structured data that are so large and unwieldy the regular database management tools or statistics tools have difficulty among capturing, analysis, storing, sharing, visualize and managing the information. Big Data includes messages, snaps, business deals, surveillance video recordings, posts of social medias, mobile phone GPS signals and RFID readers, microphones, cameras, sensors and activity logs. That s why, the data that exceeds the processing capacity of conventional database systems. That data is very big, moves very fast or doesn t fit the structure of your traditional database architecture. This paper presents an overview of Big Data concept, challenges and its Tools. Keywords Big Data, variety, volume, value, veracity, velocity I. INTRODUCTION In earlier days, internet is used for only the purpose of sending s messages, browsing queries. But recently internets are used for more purpose. So the internet generates very huge amount of data at every seconds and it will increased day by day. So such type of data is difficult to process because it contains the billons records of million people. Information that includes the web sale, social networks, online transactions, s, audios, videos, images, sensor data, science research, health records, search queries, click streams, posts, cloud computing, mobile phones and their applications. It cannot be deals with traditional database like RDBMS, DBMS, and ORDBMS. The advantages of MapReduce over traditional systems are shown in Table 1. II. HOW BIG IS BIG DATA In the last twenty years, the data is increasing day by day across the world. There are more than 2 billion internet users in the world today and 4.6 billion mobile phones in Interestingly, approximately 80% of these data are unstructured, 20% of data are only a structured with this massive quantity of data, business need fast, authentic, deeper data insight. TABLE I RDBMS VS MAPREDUCE Traditional RDBMS MapReduce Data Size Gigabytes Petabytes Access Interactive and Batch Batch Updates Read and Write many times Write once, Read many times Structure Static Schema Dynamic Schema Integrity High Low Scaling Nonlinear Linear Some facts about the data are, there are more than 250,000 tweets every minute, more than 2 million searchings on Google every minute, more than 70 hours of videos are uploaded to YouTube, There are more than 100 million s are sent, more than 400 GB of data is processing in facebook and more than 570 websites are created every minute on internet. Facebook has more than 1 billion people active accounts from which 751 million using facebook from a mobile. During 2012, 2.5 quintillion bytes of data were created every day. Big data and its analysis are the center of recent science and business areas. Large amount of information is generated from the varied sources either in structured or unstructured kind. Such type of data stored in databases and then it became difficult to extract, transform and load. IBM indicates that 2.5 exabytes data is formed every day that is extremely tough to investigate. The estimation about the generated data is that till 2003 it was represented about 5 EB of data, then until 2012 is 2.7 ZB (Unit name and Sizes of data are show in Table 2) of data and till 2015 it is expected to increase 3 times. TABLE 2 UNIT NAME AND SIZES Unit Name Sizes in Bytes Kilobyte (KB) 1,000 Megabyte (MB) 1,000,000 Gigabytes (GB) 1,000,000,000 Terabyte (TB) 1,000,000,000,000 Petabyte (PB) 1,000,000,000,000,000 Exabyte (EB) 1,000,000,000,000,000,000 Zettabyte (ZB) 1,000,000,000,000,000,000,000 Yottabyte (YB) 1,000,000,000,000,000,000,000,000 80

2 III. CHALLENGES IN BIG DATA The need of big data generated from the large companies like YouTube, Facebook, Google, yahoo, etc for the purpose of analysis of enormous amount of data which is in structured form or even in unstructured form. Google contains the large amount of information. So; there is the need of Big Data Analytics that is the processing of the complex and massive datasets This data is different from structured data (which is stored in relational database systems) in terms of five parameters variety, volume, value, veracity and velocity (5V s) are the challenges of big data management are: 1. Volume Data is ever-growing day by day of all types ever Kilo Byte, Mega Byte, Peta Byte, Yotta Byte, Zetta Byte, Tera Byte of information. The data results into massive files. Excessive volume of information is main issues of storage. This main issue is resolved by reducing storage value. Data volumes are expected to grow more than 50 times by Variety Data sources (even in the same field or in distinct) are extremely heterogeneous. The files comes in various formats and of any type, it may be unstructured or structured such as text, audio, log files, videos and more. The varieties are endless, and the data enters the network without having been quantified or qualified in any way. 3. Velocity The data comes at high speed. Sometimes one minute is too late so big data is time sensitive. Most organisations data velocity is main challenge. The credit card transactions and social media messages done in millisecond and data generated by this putting in to databases. 4. Value Which addresses the requirement for valuation of enterprise data? It is a most important V in big data. Value is main buzz for big data because it is important for IT infrastructure system, businesses to store large amount of values in database. 5. Veracity The increase in the range of values typical of a large data set. When we dealing with high volume, velocity and variety of data, the all of data are not going 100% correct, there will be dirty data. Big data and analytics technologies work with these types of data. FIGURE 1 BIG DATA PARAMETERS IV. TOOLS AND TECHNIQUES Hadoop Hadoop is an open source project of the Apache Foundation that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is a framework written in java originally developed by Doug Cutting who named it after his son s toy elephant. Hadoop uses Google s Map Reduce and Google File System technologies as its foundation. It is optimised to handle massive quantities of data which could be structured, unstructured or semi structured, using commodity hardware, that is, relatively inexpensive computers. This massive parallel processing is done with great performance. However, it is a batch operation handling massive quantities of data, so the response time is not immediate. Hadoop replicates its data across different computers, so that if one goes down, the data are processed on one of the replicated computers. Hadoop is a framework that can run applications on systems with thousands of nodes and terabytes. It distributes the file among the nodes and allows to system continue work in case of a node failure. This approach reduces the risk of catastrophic system failure. In which application is broken into smaller parts (fragments or blocks).apache Hadoop consists of the Hadoop kernel, Hadoop Distributed File System (HDFS), map reduce and related projects are zookeeper, Hbase, Apache Hive. 81

3 Hadoop Distributed File System (HDFS) consists of three Components: the Name Node, Secondary Name Node and Data Node. The Multilevel Secure (MLS) environmental problems of Hadoop by using Security Enhanced Linux (SE Linux) protocol. In which multiple sources of Hadoop applications run at different levels. This protocol is a denotation of Hadoop Distributed File System (HDFS). Hadoop is commonly used for distributed batch index building; it is desirable to optimize the index capability in close continuous. Hadoop provides components for storage and analysis for large scale processing. Now a day s Hadoop used by hundreds of companies. 1. HADOOP DISTRIBUTED FILE SYSTEM HDFS is a block-structured distributed file system that holds the large amount of Big Data. In the HDFS the data is stored in blocks that are known as chunks. FIGURE 3 HDFS ARCHITECTURE FIGURE 2 HADOOP DISTRIBUTED FILE SYSTEM The advantage of Hadoop is Distributed storage & Computational capabilities, extremely scalable, optimized for high throughput, large block sizes, tolerant of software and hardware failure. The disadvantage of Hadoop is that it is master processes are single points of failure. Hadoop does not offer storage or network level encryption, inefficient for handling small files. Hadoop is not suitable for OnLine Transaction Processing workloads where data are randomly accessed on structured data like a relational database. Also it is not suitable for OnLine Analytical Processing or Decision Support System workloads where data are sequentially accessed on structured data like a relational database, to generate reports that provide business intelligence. Hadoop is used for Big Data. It complements OnLine Transaction Processing and OnLine Analytical Processing. It is NOT a replacement for a relational database system. Hadoop mainly consists of following two components: 1. Hadoop Distributed File System (HDFS) 2. Map Reduce Hadoop thus provides: a reliable shared storage and analysis system. The storage is provided by HDFS and analysis by Map Reduce. HDFS is client-server architecture comprises of NameNode and many DataNodes.The name node stores the metadata for the NameNode.NameNodes keeps track of the state of the DataNodes. NameNode is also responsible for the file system operations etc. When Name Node fails the Hadoop doesn t support automatic recovery, but the configuration of secondary nod is possible. 2. Map Reduce MapReduce is a programming framework for distributed computing which was created by Google using the divide and conquer method to break down complex big data problems into small units of work and process them in parallel. MapReduce can be divided into two stages: FIGURE 4 MAP REDUCE ARCHITECTURE 82

4 a) Map Step The master node data is chopped up into many smaller sub problems. A worker node processes some subset of the smaller problems under the control of the JobTracker node and stores the result in the local file system where a reducer is able to access it. b) Reduce Step This step analyzes and merges input data from the map steps. There can be multiple reduce tasks to parallelize the aggregation, and these tasks are executed on the worker nodes under the control of the JobTracker. Components of Hadoop: HDFS: A highly fault tolerant distributed file system that is responsible for storing data on the clusters. MapReduce: A powerful parallel programming technique for distributed processing on clusters. HBase: A scalable, distributed database for random read/write access. Pig: A high level data processing system for analyzing data sets that occurs a high level language. Hive: A data warehousing application that provides a SQL like interface and relational model. Sqoop: A project for transferring data between relational databases and Hadoop. Avro: A system of data serialization. Oozie: A workflow for dependent Hadoop jobs. Chukwa: A Hadoop subproject as data accumulation system for monitoring distributed systems. Flume: A reliable and distributed streaming log collection. ZooKeeper: A centralized service for providing distributed synchronization and group services along with maintenance of the configuration information and records. V. APPLICATIONS McKinsey Global Institute specified the potential of big data in five main topics: Healthcare: clinical decision support systems, individual analytics applied for patient profile, personalized medicine, performance based pricing for personnel, analyze disease patterns, improve public health Public sector: creating transparency by accessible related data, discover needs, improve performance, customize actions for suitable products and services, decision making with automated systems to decrease risks, innovating new products and services Retail: in store behavior analysis, variety and price optimization, product placement design, improve performance, labor inputs optimization, distribution and logistics optimization, web based markets Manufacturing: improved demand forecasting, supply chain planning, sales support, developed production operations, web search based applications Personal location data: smart routing, geo targeted advertising or emergency response, urban planning, new business models Web provides kind of opportunities for big data too. For example; social network analysis such as understanding user intelligence for more targeted advertising, marketing campaigns and capacity planning, customer behavior and buying patterns also sentiment analytics. According to these inference firms optimization their content and recommendation engine. Some companies such as Amazon and Google publishing articles related to their work. Inspired by the writings published, developers are developing similar technologies as open source software such as Lucene, Solr, Hadoop and HBase. Social Media such as Facebook, Twitter and LinkedIn are going a step further thereby publishing open source projects for big data like Cassandra, Hive, Pig, Voldemort, Storm, IndexTank. In addition, predictive analytics on traffic flows or identify guilties and threats from different video, audio and data feeds are advantages of big data again. In 2012, Obama government announced big data initiatives of more than $200 million in research and development investments for National Science Foundation, National Institutes of Health, Department of Defense, Department of Energy and United States Geological Survey. The investments were launched to take a step forward instruments and methods for access, organize and collect findings from vast volumes of digital data. VI. CONCLUSION In this article, an overview of big data's concept, tools, techniques, applications, advantages and challenges have been reviewed. The results have given away that regardless of the fact that accessible information, tools and techniques available in the literature, there are numerous focuses to be viewed as, discussed, analyzed, developed, improved, and so on. 83

5 Although this paper obviously has not resolved the complete subject about this substantial topic, emphatically it has provided some useful discussion and we can conclude that Hadoop is possibly one of the best solutions to maintain the Big Data. REFERENCES [1] Sagiroglu, Sinanc, "Big Data: A Review", /13 IEEE. [2] Sabia, Arora, "Technologies to Handle Big Data: A Survey". [3] Shilpa, Manjit Kaur, "Big Data and Methodology A review", Volume 3, Issue 10, October [4] (last access - 10/11/2014). [5] K. Bakshi, "Considerations for Big Data: Architecture and Approach", Aerospace Conference IEEE, Big Sky Montana, March [6] D. Garlasu, V. Sandulescu, I. Halcu, G.Neculoiu, "A Big Data implementation based on Grid Computing", Grid Computing, January [7] R.D. Schneider, "Hadoop for Dummies Special Edition", John Wiley&Sons Canada, , [8] Yuri Demchenko, "The Big Data Architecture Framework (BDAF)", Outcome of the Brainstorming Session at the University of Amsterdam, 17 July [9] Aditya B. Patel, Manashvi Birla, Ushma Nair, "Addressing Big Data Problem Using Hadoop and Map Reduce", Aug, [10] how much data is on the internet and generated online every minute. [11] [12] (last access 4/11/2014). [13] -introduction-to-mapreduce/ (last access 29/10/2014). [14] Definitions of Big Data - open tracker, [15] White paper BigData-as-a service, A Market & Technology Perspective-EMC Solution group [16] [17] [18] Rathod, Chauhan, "A Survey on Big Data Analysis Techniques" IJSRD - International Journal for Scientific Research & Development Vol. 1, Issue 9, 2013 ISSN (online): [19] [20] [21] 84

BIG DATA CHALLENGES AND PERSPECTIVES

BIG DATA CHALLENGES AND PERSPECTIVES BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,

More information

A Brief Outline on Bigdata Hadoop

A Brief Outline on Bigdata Hadoop A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Technologies to Handle Big Data: A Survey

Technologies to Handle Big Data: A Survey Sabia 1 and Love Arora 2 1,2 Department of Computer Science & Engineering, Guru Nanak Dev University, Regional Campus, Jalandhar, India E-mail: 1 sabiajal@gmail.com, 2 aroralove7@gmail.com Abstract Big

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

Large scale processing using Hadoop. Ján Vaňo

Large scale processing using Hadoop. Ján Vaňo Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics February 11, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords From A to Z By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords Big data is one of the, well, biggest trends in IT today, and it has spawned a whole new generation

More information

Big Data: Tools and Technologies in Big Data

Big Data: Tools and Technologies in Big Data Big Data: Tools and Technologies in Big Data Jaskaran Singh Student Lovely Professional University, Punjab Varun Singla Assistant Professor Lovely Professional University, Punjab ABSTRACT Big data can

More information

Big Data: Study in Structured and Unstructured Data

Big Data: Study in Structured and Unstructured Data Big Data: Study in Structured and Unstructured Data Motashim Rasool 1, Wasim Khan 2 mail2motashim@gmail.com, khanwasim051@gmail.com Abstract With the overlay of digital world, Information is available

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics April 10, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

Hadoop Big Data for Processing Data and Performing Workload

Hadoop Big Data for Processing Data and Performing Workload Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Hadoop Architecture. Part 1

Hadoop Architecture. Part 1 Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi Getting Started with Hadoop Raanan Dagan Paul Tibaldi What is Apache Hadoop? Hadoop is a platform for data storage and processing that is Scalable Fault tolerant Open source CORE HADOOP COMPONENTS Hadoop

More information

UNDERSTANDING THE BIG DATA PROBLEMS AND THEIR SOLUTIONS USING HADOOP AND MAP-REDUCE

UNDERSTANDING THE BIG DATA PROBLEMS AND THEIR SOLUTIONS USING HADOOP AND MAP-REDUCE UNDERSTANDING THE BIG DATA PROBLEMS AND THEIR SOLUTIONS USING HADOOP AND MAP-REDUCE Mr. Swapnil A. Kale 1, Prof. Sangram S.Dandge 2 1 ME (CSE), First Year, Department of CSE, Prof. Ram Meghe Institute

More information

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools

More information

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Dr. Liangxiu Han Future Networks and Distributed Systems Group (FUNDS) School of Computing, Mathematics and Digital Technology,

More information

Data-Intensive Computing with Map-Reduce and Hadoop

Data-Intensive Computing with Map-Reduce and Hadoop Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan humbetov@gmail.com Abstract Every day, we create 2.5 quintillion

More information

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics

More information

So What s the Big Deal?

So What s the Big Deal? So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data

More information

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social

More information

Age of Big data. Presented by: Mohammad Iqbal BCM -2014

Age of Big data. Presented by: Mohammad Iqbal BCM -2014 Age of Presented by: Mohammad Iqbal BCM -2014 Agenda Big? Big evolution from Big? Name Symbol Value Kilobyte KB 10^3 BIG DATA Megabyte MB 10^6 Gigabyte GB 10^9 Terabyte TB 10^12 Petabyte PB 10^15 So large

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

Introduction to the Mathematics of Big Data. Philippe B. Laval

Introduction to the Mathematics of Big Data. Philippe B. Laval Introduction to the Mathematics of Big Data Philippe B. Laval Fall 2015 Introduction In recent years, Big Data has become more than just a buzz word. Every major field of science, engineering, business,

More information

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Prepared By : Manoj Kumar Joshi & Vikas Sawhney Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks

More information

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage If every image made and every word written from the earliest stirring of civilization

More information

MapReduce with Apache Hadoop Analysing Big Data

MapReduce with Apache Hadoop Analysing Big Data MapReduce with Apache Hadoop Analysing Big Data April 2010 Gavin Heavyside gavin.heavyside@journeydynamics.com About Journey Dynamics Founded in 2006 to develop software technology to address the issues

More information

Microsoft SQL Server 2012 with Hadoop

Microsoft SQL Server 2012 with Hadoop Microsoft SQL Server 2012 with Hadoop Debarchan Sarkar Chapter No. 1 "Introduction to Big Data and Hadoop" In this package, you will find: A Biography of the author of the book A preview chapter from the

More information

White Paper: What You Need To Know About Hadoop

White Paper: What You Need To Know About Hadoop CTOlabs.com White Paper: What You Need To Know About Hadoop June 2011 A White Paper providing succinct information for the enterprise technologist. Inside: What is Hadoop, really? Issues the Hadoop stack

More information

Introduction to Predictive Analytics. Dr. Ronen Meiri ronen@dmway.com

Introduction to Predictive Analytics. Dr. Ronen Meiri ronen@dmway.com Introduction to Predictive Analytics Dr. Ronen Meiri Outline From big data to predictive analytics Predictive Analytics vs. BI Intelligent platforms What can we do with it. The modeling process. Example

More information

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image

More information

Manifest for Big Data Pig, Hive & Jaql

Manifest for Big Data Pig, Hive & Jaql Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Open source Google-style large scale data analysis with Hadoop

Open source Google-style large scale data analysis with Hadoop Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical

More information

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com Hadoop, Why? Need to process huge datasets on large clusters of computers

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

Apache Hadoop FileSystem and its Usage in Facebook

Apache Hadoop FileSystem and its Usage in Facebook Apache Hadoop FileSystem and its Usage in Facebook Dhruba Borthakur Project Lead, Apache Hadoop Distributed File System dhruba@apache.org Presented at Indian Institute of Technology November, 2010 http://www.facebook.com/hadoopfs

More information

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this

More information

Architectures for massive data management

Architectures for massive data management Architectures for massive data management Apache Kafka, Samza, Storm Albert Bifet albert.bifet@telecom-paristech.fr October 20, 2015 Stream Engine Motivation Digital Universe EMC Digital Universe with

More information

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org June 3 rd, 2008

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org June 3 rd, 2008 Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org June 3 rd, 2008 Who Am I? Hadoop Developer Core contributor since Hadoop s infancy Focussed

More information

Keywords: Big Data, Hadoop, cluster, heterogeneous, HDFS, MapReduce

Keywords: Big Data, Hadoop, cluster, heterogeneous, HDFS, MapReduce Volume 5, Issue 9, September 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Study of

More information

Entering the Zettabyte Age Jeffrey Krone

Entering the Zettabyte Age Jeffrey Krone Entering the Zettabyte Age Jeffrey Krone 1 Kilobyte 1,000 bits/byte. 1 megabyte 1,000,000 1 gigabyte 1,000,000,000 1 terabyte 1,000,000,000,000 1 petabyte 1,000,000,000,000,000 1 exabyte 1,000,000,000,000,000,000

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics

Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics Dharmendra Agawane 1, Rohit Pawar 2, Pavankumar Purohit 3, Gangadhar Agre 4 Guide: Prof. P B Jawade 2

More information

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science

Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science A Seminar report On Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science SUBMITTED TO: www.studymafia.org SUBMITTED BY: www.studymafia.org

More information

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after

More information

While a number of technologies fall under the Big Data label, Hadoop is the Big Data mascot.

While a number of technologies fall under the Big Data label, Hadoop is the Big Data mascot. While a number of technologies fall under the Big Data label, Hadoop is the Big Data mascot. Remember it stands front and center in the discussion of how to implement a big data strategy. Early adopters

More information

Internals of Hadoop Application Framework and Distributed File System

Internals of Hadoop Application Framework and Distributed File System International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop

More information

Doing Multidisciplinary Research in Data Science

Doing Multidisciplinary Research in Data Science Doing Multidisciplinary Research in Data Science Assoc.Prof. Abzetdin ADAMOV CeDAWI - Center for Data Analytics and Web Insights Qafqaz University aadamov@qu.edu.az http://ce.qu.edu.az/~aadamov 16 May

More information

REVIEW PAPER ON BIG DATA USING HADOOP

REVIEW PAPER ON BIG DATA USING HADOOP International Journal of Computer Engineering & Technology (IJCET) Volume 6, Issue 12, Dec 2015, pp. 65-71, Article ID: IJCET_06_12_008 Available online at http://www.iaeme.com/ijcet/issues.asp?jtype=ijcet&vtype=6&itype=12

More information

CSC590: Selected Topics BIG DATA & DATA MINING. Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait

CSC590: Selected Topics BIG DATA & DATA MINING. Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait CSC590: Selected Topics BIG DATA & DATA MINING Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait Agenda Introduction What is Big Data Why Big Data? Characteristics of Big Data Applications of Big Data Problems

More information

Hadoop & its Usage at Facebook

Hadoop & its Usage at Facebook Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System dhruba@apache.org Presented at the Storage Developer Conference, Santa Clara September 15, 2009 Outline Introduction

More information

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Transitioning

More information

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer timo.aaltonen@tut.fi Assistants: Henri Terho and Antti

More information

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)

More information

Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014

Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014 White Paper Big Data Executive Overview WP-BD-10312014-01 By Jafar Shunnar & Dan Raver Page 1 Last Updated 11-10-2014 Table of Contents Section 01 Big Data Facts Page 3-4 Section 02 What is Big Data? Page

More information

Application Development. A Paradigm Shift

Application Development. A Paradigm Shift Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the

More information

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components of Hadoop. We will see what types of nodes can exist in a Hadoop

More information

Suresh Lakavath csir urdip Pune, India lsureshit@gmail.com.

Suresh Lakavath csir urdip Pune, India lsureshit@gmail.com. A Big Data Hadoop Architecture for Online Analysis. Suresh Lakavath csir urdip Pune, India lsureshit@gmail.com. Ramlal Naik L Acme Tele Power LTD Haryana, India ramlalnaik@gmail.com. Abstract Big Data

More information

Certified Big Data and Apache Hadoop Developer VS-1221

Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer Certification Code VS-1221 Vskills certification for Big Data and Apache Hadoop Developer Certification

More information

Improving Data Processing Speed in Big Data Analytics Using. HDFS Method

Improving Data Processing Speed in Big Data Analytics Using. HDFS Method Improving Data Processing Speed in Big Data Analytics Using HDFS Method M.R.Sundarakumar Assistant Professor, Department Of Computer Science and Engineering, R.V College of Engineering, Bangalore, India

More information

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and

More information

Addressing Open Source Big Data, Hadoop, and MapReduce limitations

Addressing Open Source Big Data, Hadoop, and MapReduce limitations Addressing Open Source Big Data, Hadoop, and MapReduce limitations 1 Agenda What is Big Data / Hadoop? Limitations of the existing hadoop distributions Going enterprise with Hadoop 2 How Big are Data?

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance. Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Big Data and Hadoop for the Executive A Reference Guide

Big Data and Hadoop for the Executive A Reference Guide Big Data and Hadoop for the Executive A Reference Guide Overview The amount of information being collected by companies today is incredible. Wal- Mart has 460 terabytes of data, which, according to the

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A COMPREHENSIVE VIEW OF HADOOP ER. AMRINDER KAUR Assistant Professor, Department

More information

The Big Deal about Big Data. Mike Skinner, CPA CISA CITP HORNE LLP

The Big Deal about Big Data. Mike Skinner, CPA CISA CITP HORNE LLP The Big Deal about Big Data Mike Skinner, CPA CISA CITP HORNE LLP Mike Skinner, CPA CISA CITP Senior Manager, IT Assurance & Risk Services HORNE LLP Focus areas: IT security & risk assessment IT governance,

More information

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the

More information

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current

More information

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015 Lecture 2 (08/31, 09/02, 09/09): Hadoop Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015 K. Zhang BUDT 758 What we ll cover Overview Architecture o Hadoop

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information

EXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics

EXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics BIG DATA WITH HADOOP EXPERIMENTATION HARRISON CARRANZA Marist College APARICIO CARRANZA NYC College of Technology CUNY ECC Conference 2016 Poughkeepsie, NY, June 12-14, 2016 Marist College AGENDA Contents

More information

Google Bing Daytona Microsoft Research

Google Bing Daytona Microsoft Research Google Bing Daytona Microsoft Research Raise your hand Great, you can help answer questions ;-) Sit with these people during lunch... An increased number and variety of data sources that generate large

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

White Paper: Hadoop for Intelligence Analysis

White Paper: Hadoop for Intelligence Analysis CTOlabs.com White Paper: Hadoop for Intelligence Analysis July 2011 A White Paper providing context, tips and use cases on the topic of analysis over large quantities of data. Inside: Apache Hadoop and

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON BIG DATA MANAGEMENT AND ITS SECURITY PRUTHVIKA S. KADU 1, DR. H. R.

More information

Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software. 22 nd October 2013 10:00 Sesión B - DB2 LUW

Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software. 22 nd October 2013 10:00 Sesión B - DB2 LUW Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software 22 nd October 2013 10:00 Sesión B - DB2 LUW 1 Agenda Big Data The Technical Challenges Architecture of Hadoop

More information

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to

More information

Data processing goes big

Data processing goes big Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

More information

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763 International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 A Discussion on Testing Hadoop Applications Sevuga Perumal Chidambaram ABSTRACT The purpose of analysing

More information

marlabs driving digital agility WHITEPAPER Big Data and Hadoop

marlabs driving digital agility WHITEPAPER Big Data and Hadoop marlabs driving digital agility WHITEPAPER Big Data and Hadoop Abstract This paper explains the significance of Hadoop, an emerging yet rapidly growing technology. The prime goal of this paper is to unveil

More information

Big Data Explained. An introduction to Big Data Science.

Big Data Explained. An introduction to Big Data Science. Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of

More information

Design and Evolution of the Apache Hadoop File System(HDFS)

Design and Evolution of the Apache Hadoop File System(HDFS) Design and Evolution of the Apache Hadoop File System(HDFS) Dhruba Borthakur Engineer@Facebook Committer@Apache HDFS SDC, Sept 19 2011 Outline Introduction Yet another file-system, why? Goals of Hadoop

More information