Age of Big data. Presented by: Mohammad Iqbal BCM -2014


1 Age of Big data. Presented by: Mohammad Iqbal, BCM -2014

2 Agenda Big? Big evolution from

3 BIG DATA: data so large that it becomes difficult to process using traditional systems.

Name        Symbol  Value (bytes)
Kilobyte    KB      10^3
Megabyte    MB      10^6
Gigabyte    GB      10^9
Terabyte    TB      10^12
Petabyte    PB      10^15
Exabyte     EB      10^18
Zettabyte   ZB      10^21
Yottabyte   YB      10^24
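The decimal prefixes in the table can be applied mechanically. The following is a minimal Python sketch, assuming the power-of-ten units listed on the slide; the function name `human_size` is an illustrative choice and does not come from the presentation.

```python
# Decimal byte units from the slide: each step up is a factor of 10^3.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_size(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    index = 0
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {UNITS[index]}"


print(human_size(10**15))        # one petabyte
print(human_size(500 * 10**12))  # the 500 TB per day figure used on slide 5
```

Whether a given result counts as "big" still depends on the system that has to process it, as the next slides argue.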

4 Difficult to process by traditional systems: unable to send a 100 MB document; unable to view a 100 GB document; unable to edit a 100 TB document. It depends on the capability of the system.

5 Organization/context specific: 500 TB of text, audio, and video data per day is Big data for Company A but NOT Big data for Company B. It depends on the capabilities of the organization.

6 Areas of challenge: capture, search, curation, sharing, storage, transfer, analysis, visualization.

7 Big data (V^3): large and growing files (VOLUME), arriving at high speed (VELOCITY), in various formats (VARIETY). Data that comes at high speed results in large files, and these files come in various formats.

8 Structured / unstructured: unstructured data is 90% and is mostly wasted; structured data is 10% and is used in decision making. Challenge/opportunity: to analyze the data and extract meaningful information.

9 Users, applications, systems, and sensors generate large and growing files ( files).

10 Generation points, examples: mobile devices, machine sensors, microphones, cameras, readers/scanners, social media, science facilities, software/programs.

11 Sample events generating data: Every day, we create 2.5 exabytes of data, i.e. 2.5 billion GB, so much that 90% of the data in the world today has been created in the last few years alone. The CERN atomic facility generates 40 TB of data per second. Twitter generates 12 TB of data every day. An Airbus A380 generates 10 TB every 30 minutes of flight, about 650 TB in one flight. In 2009 the total data in the world was estimated to be 1 ZB; by 2020 it is estimated to be 35 ZB. (Source: IBM.com)
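Two of the quoted figures can be checked with simple arithmetic. The sketch below, in Python, uses only numbers stated on the slide and assumes decimal units (1 EB = 10^18 bytes); the variable names are illustrative.

```python
# Figures quoted on the slide (attributed there to IBM.com).
EB = 10**18  # bytes in an exabyte (decimal)
GB = 10**9   # bytes in a gigabyte (decimal)
ZB_2009 = 1   # estimated world data in 2009, in zettabytes
ZB_2020 = 35  # estimated world data in 2020, in zettabytes

# 2.5 EB per day expressed in gigabytes.
daily_gb = 2.5 * EB / GB
print(f"2.5 EB per day = {daily_gb:.1e} GB")  # 2.5e+09, i.e. 2.5 billion GB

# Compound annual growth rate implied by 1 ZB (2009) -> 35 ZB (2020).
years = 2020 - 2009
cagr = (ZB_2020 / ZB_2009) ** (1 / years) - 1
print(f"Implied annual growth: {cagr:.0%}")  # about 38% per year
```

The growth rate is only what the two endpoint estimates imply; the slide itself does not state a yearly figure.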

12 Collect, analyze, understand.

13 Applications: Companies are gaining an edge by collecting, analyzing, and understanding information. Governments are forecasting events and taking proactive action.

14 Over time: traditional systems (e.g. RDBMS, SQL) are not able to handle big data; tools (e.g. NoSQL) were created to handle big data.

15 Traditional enterprise approach: a single powerful computer with a processing limit, so only so much data could be processed.

16 Modern approach: the work is split into several computations that run separately, and their outputs are merged into one combined result.
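The split-then-combine idea on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the presenter's implementation: threads on one machine stand in for separate computers, word counting stands in for the computation, and all function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """One independent computation: count words in its own chunk of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(items, n_chunks):
    """Divide the input into roughly equal chunks, one per worker."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def distributed_word_count(lines, n_workers=4):
    """Run the partial computations in parallel, then combine the results."""
    chunks = split_into_chunks(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    combined = Counter()
    for partial in partial_counts:
        combined += partial
    return combined


lines = ["big data is big", "data comes at high speed", "big files", "various formats"]
print(distributed_word_count(lines).most_common(2))
```

Because each chunk is counted independently, adding workers raises the amount of data that can be handled, which is the contrast with the single powerful computer on slide 15.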

17 Hadoop ecosystem projects: Hive, MapReduce, HBase, Mahout, Pig, Oozie, Flume, Sqoop; file system: HDFS. (Source: hortonworks/hadoop/hdfs/.com/)

18 Master: Job Tracker and Task Tracker (application layer); Name Node and Data Node (data layer). Slaves: four machines, each running a Task Tracker and a Data Node.

19 Master: Job Tracker and Task Tracker (application layer); Name Node and Data Node (data layer). The Name Node knows where the data resides; data can be taken directly. Slaves: four machines, each running a Task Tracker and a Data Node.
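The division of labour on this slide, where one node only knows where data lives and other nodes hold it, can be illustrated with a toy Python model. This is not the Hadoop API: the classes `ToyNameNode` and `ToyDataNode`, the function `read_block`, and the block and node names are all hypothetical, chosen only to show the lookup-then-direct-read pattern.

```python
class ToyNameNode:
    """Keeps only metadata: which nodes hold which block (no file contents)."""

    def __init__(self):
        self.block_locations = {}

    def register(self, block_id, node_name):
        self.block_locations.setdefault(block_id, []).append(node_name)

    def locate(self, block_id):
        return self.block_locations[block_id]


class ToyDataNode:
    """Stores the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


def read_block(name_node, data_nodes, block_id):
    """Ask the name node where the block is, then read it from that node directly."""
    node_name = name_node.locate(block_id)[0]
    return data_nodes[node_name].read(block_id)


name_node = ToyNameNode()
data_nodes = {name: ToyDataNode(name) for name in ("node1", "node2")}
data_nodes["node2"].store("block-7", b"sensor readings")
name_node.register("block-7", "node2")
print(read_block(name_node, data_nodes, "block-7"))
```

In this toy model the name node never touches the block contents; it only answers the question of where to look.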

20 HDFS vs GFS: similarity with the Google File System (GFS) and MapReduce. Back in the 1990s, search was supported by engines such as Excite, AltaVista, Lycos, and Infoseek.

21 Victory 1995 Excite 2000 Altavista Lycos

22 Hadoop evolution from Google: GFS paper released by Google; Google released a paper on MapReduce; Hadoop created by Doug Cutting & Cafarella at Yahoo! (Nutch search engine); Yahoo donated the project to Apache. (Source: Google & Nutch white papers)

23 is here!!

24 Data scientists with just two years' experience can earn between $200,000 and $300,000 a year (Wall Street Journal). Anyone with "data science" in his or her job title on a LinkedIn page is going to get "100 recruiter emails a day" (Wall Street Journal). Hadoop is a super hot up-and-coming "big data" technology (Business Insider). Many other data scientists, especially at data-driven companies such as Google, Amazon, Microsoft, Walmart, eBay, LinkedIn, and Twitter, have added to and are looking to develop the tool kit (Harvard Business Review). "People are slapping buzzwords such as Hadoop on résumés and looking to get 50 or 100 percent more, and they're getting it," said Scott Gnau, president of Teradata Labs.

25 References
Jeffrey Dean & Sanjay Ghemawat (2004): MapReduce: Simplified Data Processing on Large Clusters, google.com
Doug Cutting (2005): Nutch: A Flexible and Scalable Open-Source Web Search Engine, yahoo.com
Sanjay Ghemawat & Howard Gobioff (2003): The Google File System, google.com [Accessed 27 Nov 2014]
salary ?op=1 [Accessed 27 Nov 2014]
Big Data's High-Priests of Algorithms [Accessed 27 Nov 2014]

26 Thank you for your attention Q/A

Introduction to Predictive Analytics. Dr. Ronen Meiri ronen@dmway.com

Introduction to Predictive Analytics. Dr. Ronen Meiri ronen@dmway.com Introduction to Predictive Analytics Dr. Ronen Meiri Outline From big data to predictive analytics Predictive Analytics vs. BI Intelligent platforms What can we do with it. The modeling process. Example

More information

A Survey on Big Data Concepts and Tools

A Survey on Big Data Concepts and Tools A Survey on Big Data Concepts and Tools D. Rajasekar 1, C. Dhanamani 2, S. K. Sandhya 3 1,3 PG Scholar, 2 Assistant Professor, Department of Computer Science and Engineering, Sri Krishna College of Engineering

More information

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics

More information

Doing Multidisciplinary Research in Data Science

Doing Multidisciplinary Research in Data Science Doing Multidisciplinary Research in Data Science Assoc.Prof. Abzetdin ADAMOV CeDAWI - Center for Data Analytics and Web Insights Qafqaz University aadamov@qu.edu.az http://ce.qu.edu.az/~aadamov 16 May

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords From A to Z By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords Big data is one of the, well, biggest trends in IT today, and it has spawned a whole new generation

More information

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Dr. Liangxiu Han Future Networks and Distributed Systems Group (FUNDS) School of Computing, Mathematics and Digital Technology,

More information

Big Data Streams. Analytics Challenges, Analysis, and Applications. Adel M. Alimi

Big Data Streams. Analytics Challenges, Analysis, and Applications. Adel M. Alimi Big Data Streams 1 Analytics Challenges, Analysis, and Applications Adel M. Alimi REGIM-Lab., University of Sfax, Tunisia http://adel.alimi.regim.org adel.alimi@ieee.org 2 Evolution of Technology 3 Nano,

More information

Oracle Big Data for Dummies

Oracle Big Data for Dummies Oracle Big Data for Dummies Sai Janakiram Penumuru WW Product Expert Cloud Platforms Hewlett-Packard, India The Father of Microbiology first microbiologist Antonie Philips van Leeuwenhoek 2 Sai Janakiram

More information

Big Data Explained. An introduction to Big Data Science.

Big Data Explained. An introduction to Big Data Science. Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of

More information

The Big Deal about Big Data. Mike Skinner, CPA CISA CITP HORNE LLP

The Big Deal about Big Data. Mike Skinner, CPA CISA CITP HORNE LLP The Big Deal about Big Data Mike Skinner, CPA CISA CITP HORNE LLP Mike Skinner, CPA CISA CITP Senior Manager, IT Assurance & Risk Services HORNE LLP Focus areas: IT security & risk assessment IT governance,

More information

Big Data Big Data/Data Analytics & Software Development

Big Data Big Data/Data Analytics & Software Development Big Data Big Data/Data Analytics & Software Development Danairat T. danairat@gmail.com, 081-559-1446 1 Agenda Big Data Overview Business Cases and Benefits Hadoop Technology Architecture Big Data Development

More information

A Brief Outline on Bigdata Hadoop

A Brief Outline on Bigdata Hadoop A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is

More information

Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software. 22 nd October 2013 10:00 Sesión B - DB2 LUW

Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software. 22 nd October 2013 10:00 Sesión B - DB2 LUW Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software 22 nd October 2013 10:00 Sesión B - DB2 LUW 1 Agenda Big Data The Technical Challenges Architecture of Hadoop

More information

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this

More information

Large scale processing using Hadoop. Ján Vaňo

Large scale processing using Hadoop. Ján Vaňo Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine

More information

Application Development. A Paradigm Shift

Application Development. A Paradigm Shift Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the

More information

Introduction to the Mathematics of Big Data. Philippe B. Laval

Introduction to the Mathematics of Big Data. Philippe B. Laval Introduction to the Mathematics of Big Data Philippe B. Laval Fall 2015 Introduction In recent years, Big Data has become more than just a buzz word. Every major field of science, engineering, business,

More information

BIG DATA CHALLENGES AND PERSPECTIVES

BIG DATA CHALLENGES AND PERSPECTIVES BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Oracle Big Data for Dummies

Oracle Big Data for Dummies Oracle Big Data for Dummies Sai Janakiram Penumuru WW Product Expert Cloud Platforms The Father of Microbiology First Microbiologist Antonie Philips van Leeuwenhoek 2 Sai Janakiram Penumuru o o o o o o

More information

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to

More information

Taming the Beast of Big Data

Taming the Beast of Big Data Taming the Beast of Big Data Jeff Zakrzewski Vice President Sogeti USA Local Touch, Global Reach 1 Agenda What is Big Data? Some Sources of Big Data Approaches to Big Data The Hadoop Buzz Vertical Perspective

More information

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools

More information

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14 Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 14 Big Data Management IV: Big-data Infrastructures (Background, IO, From NFS to HFDS) Chapter 14-15: Abideboul

More information

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate

More information

Data-Intensive Computing with Map-Reduce and Hadoop

Data-Intensive Computing with Map-Reduce and Hadoop Data-Intensive Computing with Map-Reduce and Hadoop Shamil Humbetov Department of Computer Engineering Qafqaz University Baku, Azerbaijan humbetov@gmail.com Abstract Every day, we create 2.5 quintillion

More information

Hadoop Big Data for Processing Data and Performing Workload

Hadoop Big Data for Processing Data and Performing Workload Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer

More information

DIGITAL MARKETING STRATEGIES Leveraging The Back-End Tools

DIGITAL MARKETING STRATEGIES Leveraging The Back-End Tools DIGITAL MARKETING STRATEGIES Leveraging The Back-End Tools Professional Background RACING INDUSTRY EXPERIENCE: First Job Out of Undergrad: - Arlington Park, Assistant to the VP of Marketing - Sponsorship

More information

Distributed Computing and Hadoop in Statistics

Distributed Computing and Hadoop in Statistics Distributed Computing and Hadoop in Statistics Xiaoling Lu and Bing Zheng Center For Applied Statistics, Renmin University of China, Beijing, China Corresponding author: Xiaoling Lu, e-mail: xiaolinglu@ruc.edu.cn

More information

So Just What Is Big Data? James E. Tcheng, MD, FACC, FSCAI

So Just What Is Big Data? James E. Tcheng, MD, FACC, FSCAI So Just What Is Big Data? James E. Tcheng, MD, FACC, FSCAI Disclosures James E. Tcheng, MD, FACC, FSCAI Affiliations / Financial Relationships / Other RWI ACC Chair, Informatics and Health IT Task Force

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop?

More information

LARGE, DISTRIBUTED COMPUTING INFRASTRUCTURES OPPORTUNITIES & CHALLENGES. Dominique A. Heger Ph.D. DHTechnologies, Data Nubes Austin, TX, USA

LARGE, DISTRIBUTED COMPUTING INFRASTRUCTURES OPPORTUNITIES & CHALLENGES. Dominique A. Heger Ph.D. DHTechnologies, Data Nubes Austin, TX, USA LARGE, DISTRIBUTED COMPUTING INFRASTRUCTURES OPPORTUNITIES & CHALLENGES Dominique A. Heger Ph.D. DHTechnologies, Data Nubes Austin, TX, USA Performance & Capacity Studies Availability & Reliability Studies

More information

Entering the Zettabyte Age Jeffrey Krone

Entering the Zettabyte Age Jeffrey Krone Entering the Zettabyte Age Jeffrey Krone 1 Kilobyte 1,000 bits/byte. 1 megabyte 1,000,000 1 gigabyte 1,000,000,000 1 terabyte 1,000,000,000,000 1 petabyte 1,000,000,000,000,000 1 exabyte 1,000,000,000,000,000,000

More information

HDP Enabling the Modern Data Architecture

HDP Enabling the Modern Data Architecture HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,

More information

Architectures for massive data management

Architectures for massive data management Architectures for massive data management Apache Kafka, Samza, Storm Albert Bifet albert.bifet@telecom-paristech.fr October 20, 2015 Stream Engine Motivation Digital Universe EMC Digital Universe with

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Testing 3Vs (Volume, Variety and Velocity) of Big Data

Testing 3Vs (Volume, Variety and Velocity) of Big Data Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used

More information

Big Data: Opportunities for the Dental Benefits Industry

Big Data: Opportunities for the Dental Benefits Industry Big Data: Opportunities for the Dental Benefits Industry Joel Reichert - VP, Data Strategy Herschel Reich - VP, Payer Consulting September 16, 2014 Big Data: Opportunities for the Dental Benefits Industry

More information

Big Data Drupal. Commercial Open Source Big Data Tool Chain

Big Data Drupal. Commercial Open Source Big Data Tool Chain Big Data Drupal Commercial Open Source Big Data Tool Chain How did I prepare? MapReduce Field Work About Me Nicholas Roberts 10+ years web Webmaster, Project & Product Manager Australian Sonoma County

More information

Majed Al-Ghandour, PhD, PE, CPM Division of Planning and Programming NCDOT 2016 NCAMPO Conference- Greensboro, NC May 12, 2016

Majed Al-Ghandour, PhD, PE, CPM Division of Planning and Programming NCDOT 2016 NCAMPO Conference- Greensboro, NC May 12, 2016 Big Data! Majed Al-Ghandour, PhD, PE, CPM Division of Planning and Programming NCDOT 2016 NCAMPO Conference- Greensboro, NC May 12, 2016 Big Data: Data Analytical Tools for Decision Support 2 Outline Introduce

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Changing the face of Business Intelligence & Information Management

Changing the face of Business Intelligence & Information Management 1300 530 335 info@c3businessolutions.com www.c3businesssolutions.com GPO Box 589 Melbourne VIC 3001 Australia ABN 35 122 885 465 White Paper Big Data Changing the face of Business Intelligence & Information

More information

HP Vertica at MIT Sloan Sports Analytics Conference March 1, 2013 Will Cairns, Senior Data Scientist, HP Vertica

HP Vertica at MIT Sloan Sports Analytics Conference March 1, 2013 Will Cairns, Senior Data Scientist, HP Vertica HP Vertica at MIT Sloan Sports Analytics Conference March 1, 2013 Will Cairns, Senior Data Scientist, HP Vertica So What s the market s definition of Big Data? Datasets whose volume, velocity, variety

More information

Big Data System and Architecture

Big Data System and Architecture CHANGE, a 2012 DAC workshop 2nd International Workshop on Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments Moscone Center, San Francisco, California, June 3, 2012 Big Data System and

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Large-Scale Data Processing

Large-Scale Data Processing Large-Scale Data Processing Eiko Yoneki eiko.yoneki@cl.cam.ac.uk http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase

More information

Copyright (c) 2012, Meta Business Systems. Mario Bojilov Meta Business Systems 20 February 2013

Copyright (c) 2012, Meta Business Systems. Mario Bojilov Meta Business Systems 20 February 2013 Mario Bojilov Meta Business Systems 20 February 2013 What is Big Data Volume 90% of data in the world was created in the last 2 years What is Big Data Volume 90% of data in the world was created in the

More information

Hadoop Introduction. 2012 coreservlets.com and Dima May. 2012 coreservlets.com and Dima May

Hadoop Introduction. 2012 coreservlets.com and Dima May. 2012 coreservlets.com and Dima May 2012 coreservlets.com and Dima May Hadoop Introduction Originals of slides and source code for examples: http://www.coreservlets.com/hadoop-tutorial/ Also see the customized Hadoop training courses (onsite

More information

BIG DATA: ARE YOU READY? Andy Kyiet Demand Flow Intelligence May, 2013

BIG DATA: ARE YOU READY? Andy Kyiet Demand Flow Intelligence May, 2013 BIG DATA: ARE YOU READY? Andy Kyiet Demand Flow Intelligence May, 2013 PERSONAL BACKGROUND Founder of the first specialist Service Management & Helpdesk System provider in Europe Past President of AFSMI

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Tap into Hadoop and Other No SQL Sources

Tap into Hadoop and Other No SQL Sources Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data

More information

Big Data a threat or a chance?

Big Data a threat or a chance? Big Data a threat or a chance? Helwig Hauser University of Bergen, Dept. of Informatics Big Data What is Big Data? well, lots of data, right? we come back to this in a moment. certainly, a buzz-word but

More information

Hadoop: Distributed Data Processing. Amr Awadallah Founder/CTO, Cloudera, Inc. ACM Data Mining SIG Thursday, January 25 th, 2010

Hadoop: Distributed Data Processing. Amr Awadallah Founder/CTO, Cloudera, Inc. ACM Data Mining SIG Thursday, January 25 th, 2010 Hadoop: Distributed Data Processing Amr Awadallah Founder/CTO, Cloudera, Inc. ACM Data Mining SIG Thursday, January 25 th, 2010 Outline Scaling for Large Data Processing What is Hadoop? HDFS and MapReduce

More information

Big Data and Apache Hadoop s MapReduce

Big Data and Apache Hadoop s MapReduce Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

Big Data and Hadoop. Sreedhar C, Dr. D. Kavitha, K. Asha Rani

Big Data and Hadoop. Sreedhar C, Dr. D. Kavitha, K. Asha Rani Big Data and Hadoop Sreedhar C, Dr. D. Kavitha, K. Asha Rani Abstract Big data has become a buzzword in the recent years. Big data is used to describe a massive volume of both structured and unstructured

More information

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools

More information

Big Data: Tools and Technologies in Big Data

Big Data: Tools and Technologies in Big Data Big Data: Tools and Technologies in Big Data Jaskaran Singh Student Lovely Professional University, Punjab Varun Singla Assistant Professor Lovely Professional University, Punjab ABSTRACT Big data can

More information

Big Data. Lyle Ungar, University of Pennsylvania

Big Data. Lyle Ungar, University of Pennsylvania Big Data Big data will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. McKinsey Data Scientist: The Sexiest Job of the 21st Century -

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Big Data Zurich, November 23. September 2011

Big Data Zurich, November 23. September 2011 Institute of Technology Management Big Data Projektskizze «Competence Center Automotive Intelligence» Zurich, November 11th 23. September 2011 Felix Wortmann Assistant Professor Technology Management,

More information

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case

More information

SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS

SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS Sean Lee Solution Architect, SDI, IBM Systems SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS Agenda Converging Technology Forces New Generation Applications Data Management Challenges

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

HOW TO LIVE WITH THE ELEPHANT IN THE SERVER ROOM APACHE HADOOP WORKSHOP

HOW TO LIVE WITH THE ELEPHANT IN THE SERVER ROOM APACHE HADOOP WORKSHOP HOW TO LIVE WITH THE ELEPHANT IN THE SERVER ROOM APACHE HADOOP WORKSHOP AGENDA Introduction What is Hadoop and the rationale behind it Hadoop Distributed File System (HDFS) and MapReduce Common Hadoop

More information

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)

More information

Big Data Technologies

Big Data Technologies Big Data Technologies Hadoop and its Ecosystem Hala El-Ali EMIS/CSE 8331 SMU helali@smu.edu Agenda Introduction Hadoop Core Demo Hadoop Ecosystem Demo QA Big Data Big data is the term for a collection

More information

Big Data: Study in Structured and Unstructured Data

Big Data: Study in Structured and Unstructured Data Big Data: Study in Structured and Unstructured Data Motashim Rasool 1, Wasim Khan 2 mail2motashim@gmail.com, khanwasim051@gmail.com Abstract With the overlay of digital world, Information is available

More information

Cloud beyond the obvious, an approach for innovation

Cloud beyond the obvious, an approach for innovation Cloud beyond the obvious, an approach for innovation Christian Verstraete Chief Technologist Cloud Strategy Our World is Changing Living in the age of tectonic shifts, and welcome to the new style of IT

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information

MySQL and Hadoop. Percona Live 2014 Chris Schneider

MySQL and Hadoop. Percona Live 2014 Chris Schneider MySQL and Hadoop Percona Live 2014 Chris Schneider About Me Chris Schneider, Database Architect @ Groupon Spent the last 10 years building MySQL architecture for multiple companies Worked with Hadoop for

More information

Real Time Big Data Processing

Real Time Big Data Processing Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

Big Data Realities Hadoop in the Enterprise Architecture

Big Data Realities Hadoop in the Enterprise Architecture Big Data Realities Hadoop in the Enterprise Architecture Paul Phillips Director, EMEA, Hortonworks pphillips@hortonworks.com +44 (0)777 444 3857 Hortonworks Inc. 2012 Page 1 Agenda The Growth of Enterprise

More information

UNDERSTANDING THE BIG DATA PROBLEMS AND THEIR SOLUTIONS USING HADOOP AND MAP-REDUCE

UNDERSTANDING THE BIG DATA PROBLEMS AND THEIR SOLUTIONS USING HADOOP AND MAP-REDUCE UNDERSTANDING THE BIG DATA PROBLEMS AND THEIR SOLUTIONS USING HADOOP AND MAP-REDUCE Mr. Swapnil A. Kale 1, Prof. Sangram S.Dandge 2 1 ME (CSE), First Year, Department of CSE, Prof. Ram Meghe Institute

More information
