Deciphering Big Data Analytics - A Review Of Technology & Applications


Ramesh Mahadik, M.Tech. (Computer Sc. & Engg.), I.I.T. Mumbai
Director MCA, Institute of Management and Computer Studies - MCA Institute, Mumbai, Maharashtra, India
ramesh.imcost@yahoo.com; rameshgm.iitb@gmail.com

ABSTRACT: There is great excitement surrounding big data analytics in the world today. Organizations now understand the importance of data-driven decision making, and enthusiasm for the idea of big data is growing as a result. Data becomes big data when its volume, velocity, or variety exceeds the ability of conventional IT systems to ingest, store, analyze, and process it. Several organizations have the IT systems and expertise to handle large quantities of structured data, but with the increasing volume and faster flows of data, they lack the ability to mine it and derive actionable intelligence in a timely way. Big data analytics addresses this need for evolved data processing and analytics that can handle fast-growing, high-volume data of multiple types (structured, semi-structured and unstructured), generated at high speed. This paper explores the technology framework and application areas of big data analytics.

Keywords: Big Data, Big Data Analytics, Hadoop, MapReduce, Decision Support Systems, Data Mining, Business Intelligence

1. Introduction

We are flooded with data today. A wide spectrum of application areas collect data at an enormous scale. Decisions that previously were based on guesswork, or on crudely constructed models, are now based on the data itself. Big data analytics now drives nearly every aspect of our modern society, including retail, manufacturing, financial services, mobile services, social media and healthcare, to name a few. Clearly, big data means business opportunities, but at the same time it also poses major research challenges.
According to McKinsey & Co., big data is the next frontier for innovation, competition and productivity. Big data not only offers individual companies a huge potential for competition and growth; used well, it can also increase productivity, innovation, and competitiveness for entire sectors and economies.

2. Research Methodology

An exhaustive study of various texts, research articles and materials pertaining to big data analytics was carried out, with the aim of understanding its technology framework and application areas.

INCON X

3. What Is Big Data?

Big data relates to rapidly growing, structured and unstructured datasets with sizes beyond the ability of conventional database tools to store, manage and analyze them. It is characterized primarily by the 3Vs: volume, variety and velocity. A fourth characteristic that is often considered is veracity.

Volume: Large quantity of data, which may be enterprise-specific, general or public.
Variety: Diverse set of data, created by social networking feeds, video, audio, sensor data, etc.
Velocity: Speed of data inflow, as well as the rate at which this fast-moving data needs to be stored.

[Figure 1: What Is Big Data? The figure plots size of data, from GB (10^9 bytes) through TB (10^12), PB (10^15) and EB (10^18) to ZB (10^21), against speed, accuracy and complexity of intelligence. Its four quadrants are: small data sets with traditional analytics, small data sets with advanced analytics, big data with traditional analytics, and big data with big data analytics.]

Veracity: Unlike carefully governed internal data, most big data comes from sources outside our control and therefore suffers from significant correctness or accuracy problems. Veracity represents both the credibility of the data source and the suitability of the data for the target audience.

4. Understanding Big Data Analytics

Fundamentally, big data analytics is the process of examining large data sets containing a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. Let us try to understand what big data analytics is and what it isn't.

4.1 What Big Data Analytics Is:

i) A technology-enabled strategy for gaining richer, deeper insights into customers, partners, and the business, and ultimately gaining competitive advantage.
ii) Working with data sets whose size and variety are beyond the ability of conventional database software to capture, store, manage, and analyze.
iii) Processing a steady stream of real-time data in order to make time-sensitive decisions faster than ever before.
iv) Distributed in nature: analytics processing goes to where the data is, for greater speed and efficiency.

v) A new paradigm in which IT collaborates with business users and data scientists to identify and implement analytics that will increase operational efficiency and solve new business problems.

4.2 What Big Data Analytics Isn't:

i) Just about technology. At the business level, it's about how to exploit the vastly enhanced sources of data to gain insight.
ii) Only about volume. It's also about variety and velocity. But perhaps most important, it's about the value derived from the data.
iii) Generated or used only by huge online companies like Google or Amazon. While internet companies may have pioneered the use of big data at web scale, applications now touch every industry.
iv) About one-size-fits-all traditional relational databases built on a shared disk and memory architecture. Big data analytics uses a grid of computing resources for massively parallel processing (MPP).
v) Meant to replace relational databases or the data warehouse. Structured data continues to be critically important to companies. However, traditional systems may not be suitable for the new sources and contexts of big data.

5. Architecture For Big Data

In the world of big data, the data volumes we need to work with on a day-to-day basis have outgrown the storage and processing capabilities of a single host. Big data brings with it two fundamental challenges: (i) how to store and process voluminous data sizes, and, more important, (ii) how to understand data and turn it into a competitive advantage.

Apache Hadoop fills this gap by efficiently storing, and providing computational capabilities over, substantial amounts of data. It is a distributed system made up of a distributed filesystem, and it offers a way to parallelize and execute programs on a cluster of machines. It has already been adopted by technology giants like Yahoo!, Facebook, and Twitter to address their big data needs.
From an architectural perspective, Hadoop, as shown in Figure 2, is a distributed master-slave architecture that consists of the Hadoop Distributed File System (HDFS) for storage and MapReduce for computational capabilities.
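As a rough illustration of the storage half of this architecture, the toy sketch below shows a master splitting a file into fixed-size blocks and recording which workers hold a copy of each block. It is not Hadoop's actual implementation: the function names, the round-robin placement and the tiny block size are assumptions chosen for illustration, whereas real HDFS uses far larger blocks and a rack-aware placement policy.

```python
from typing import Dict, List, Set


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file's bytes into consecutive fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, workers: List[str], replication: int) -> Dict[int, List[str]]:
    """Master-side bookkeeping: assign each block to `replication` distinct workers.

    Round-robin placement is a simplifying assumption, not the HDFS policy.
    """
    if replication > len(workers):
        raise ValueError("replication factor exceeds number of workers")
    return {
        block_id: [workers[(block_id + r) % len(workers)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_blocks(placement: Dict[int, List[str]], failed: Set[str]) -> Set[int]:
    """Blocks that still have at least one replica on a worker that has not failed."""
    return {
        block_id
        for block_id, holders in placement.items()
        if any(worker not in failed for worker in holders)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    placement = place_replicas(len(blocks), ["w1", "w2", "w3", "w4"], replication=3)
    print(placement)
    print(readable_blocks(placement, failed={"w1", "w2"}))
```

With three replicas spread over four workers, every block in this toy setup survives the loss of any two workers, which is the intuition behind replication as a source of fault tolerance.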

[Figure 2: High-Level Hadoop Architecture]

5.1 Core Hadoop Components

To understand Hadoop's architecture, we'll start by looking at the basics of HDFS and MapReduce.

HDFS (Hadoop Distributed File System)

HDFS is the storage component of Hadoop. It's a distributed filesystem that's modeled after the Google File System (GFS) paper. HDFS is optimized for high throughput and works best when reading and writing large files (gigabytes and larger). To support this throughput, HDFS leverages unusually large (for a filesystem) block sizes and data locality optimizations to reduce network input/output (I/O). Scalability and availability are also key traits of HDFS, achieved in part through data replication and fault tolerance. HDFS replicates each file a configured number of times, is tolerant of both software and hardware failure, and automatically re-replicates data blocks that were stored on nodes that have failed. Figure 3 shows a logical representation of the components in HDFS.

[Figure 3: HDFS Architecture]

MapReduce

MapReduce is a batch-based, distributed computing framework modeled after Google's paper on MapReduce. It allows you to parallelize work over a large amount of raw data, such as combining web logs with relational data from an OLTP database to model how users interact with your website. This type of work, which could take days or longer using conventional serial programming techniques, can be reduced to minutes using MapReduce on a Hadoop cluster. It's a convenient way to partition tasks across lots of machines, and it can handle machine failure. It works across different application types, like search and ads. It allows pre-computation of useful data, finding word counts, sorting terabytes of data, etc.

MapReduce decomposes work submitted by a client into small parallelized map and reduce tasks. The role of the programmer is to define map and reduce functions, where the map function outputs key/value tuples, which are processed by reduce functions to produce the final output. The power of MapReduce lies between the map output and the reduce input, in the shuffle and sort phases, which group the map output by key before it reaches the reduce functions.

Hadoop's MapReduce architecture is similar to the master-slave model in HDFS. The main components of MapReduce are illustrated in its logical architecture, as shown in Figure 4.

[Figure 4: MapReduce Logical Architecture]

5.2 The Hadoop Ecosystem
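Before turning to the individual ecosystem components, the map, shuffle/sort and reduce phases described in Section 5.1 can be made concrete with the word-count task mentioned there. The following is a minimal single-process sketch in plain Python, assuming hypothetical helper names (`map_fn`, `shuffle`, `reduce_fn`, `run_job`); it does not use the Hadoop API, and on a real cluster the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) key/value tuple for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle and sort phase: group all values by key, with keys in sorted order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases sequentially (an assumption made for readability)."""
    mapped = (pair for line in lines for pair in map_fn(line))
    grouped = shuffle(mapped)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "Big Data analytics"]))
```

The programmer supplies only `map_fn` and `reduce_fn`; the grouping step in between is what a framework such as Hadoop MapReduce performs on the programmer's behalf across the cluster.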

[Figure 5: The Hadoop Ecosystem]

Following are the components of the Hadoop ecosystem:

HDFS: A distributed, fault-tolerant file system.
MapReduce: A framework for writing and executing distributed, fault-tolerant algorithms.
Avro™: A data serialization system.
Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
Pig™: A high-level data-flow language and execution framework for parallel computation.
Sqoop™: A package for moving data between HDFS and relational database systems.
HBase™: A scalable, distributed database that supports structured data storage for large tables.
ZooKeeper™: A high-performance coordination service for distributed applications.

6. An Architecture For Big Data Analytics

The big data analytics architecture described below utilizes the massively parallel, distributed storage and processing framework provided by Hadoop HDFS and MapReduce.

[Figure 6: An Architecture For Big Data Analytics]

Structured data are captured through various data sources, including OLTP systems, legacy systems and external systems. They go through the ETL process from the source systems to the target data warehouse. Traditional business intelligence (BI) batched analytical processing tools, such as online analytical processing (OLAP), data mining, and query and reporting, can be used to create the business intelligence that enhances business operations and decision processes.

Unstructured and semi-structured big data sources come in a wide variety that includes data from social media, mobile devices, sensors, documents and reports, web logs, call records, scientific research, satellites, and geospatial devices. They are loaded into the Hadoop Distributed File System cluster. Hadoop MapReduce provides the fault-tolerant distributed processing framework across the Hadoop cluster, where batched analytics can be performed. Actionable insight resulting from Hadoop MapReduce analytics and business intelligence analytics can be consumed by operational and analytical applications.

Geospatial intelligence is described as using data about space and time to improve the quality of predictive analysis. For example, real-time recommendations of places of interest can be based on a user's real-time smartphone location. This real-time information can be combined with batched analytics to improve the quality of the predictions. Real-time NoSQL databases such as HBase can be used in conjunction with Hadoop to provide real-time read/write access to Hadoop data. Real-time insight created by real-time analytics can be consumed by real-time operations and decision processes.

7. Applications Of Big Data Analytics

Eventually, every aspect of our lives will be affected by big data analytics. Following are some areas where big data analytics is already in widespread use and is delivering the highest benefits today:

7.1 Understanding And Targeting Customers

This is one of the biggest and most publicized areas of big data use today.
Here, big data is used to better understand customers and their behaviors and preferences. Companies are keen to expand their traditional data sets with social media data, browser logs, text analytics and sensor data to get a more complete picture of their customers. The main objective, in many cases, is to create predictive models. Some examples: telecom companies using big data can now better predict customer churn; Wal-Mart can predict what products will sell; and car insurance companies can understand how well their customers actually drive.

7.2 Understanding And Optimizing Business Processes

Big data is also increasingly used to optimize business processes. Retailers are able to optimize their stock based on predictions generated from social media data, web search trends and weather forecasts. One particular business process that is seeing a lot of big data analytics is supply chain or delivery route optimization. Here, geographic positioning and radio frequency identification (RFID) sensors are used to track goods or delivery vehicles and to optimize routes by integrating live traffic data.

7.3 Personal Quantification And Performance Optimization

Big data is not just for companies and governments but also for all of us individually. We can now benefit from the data generated by wearable devices such as smart watches or smart bracelets. Take the Up band from Jawbone as an example: the armband collects data on our calorie consumption, activity levels, and sleep patterns. While it gives individuals rich insights, the real value is in analyzing the collective data. In Jawbone's case, the company now collects 60 years' worth of sleep data every night. Analyzing such volumes of data will bring entirely new insights that it can feed back to individual users.

7.4 Improving Healthcare And Public Health

The computing power of big data analytics enables us to decode entire DNA strings in minutes and will allow us to find new cures and better understand and predict disease patterns. Big data techniques are already being used to monitor babies in a specialist premature and sick baby unit. By recording and analyzing every heartbeat and breathing pattern of every baby, the unit was able to develop algorithms that can now predict infections 24 hours before any physical symptoms appear. That way, the team can intervene early and save fragile babies in an environment where every hour counts. Integrating data from medical records with social media analytics enables us to monitor flu outbreaks in real time, simply by listening to what people are saying, e.g. "Feeling rubbish today - in bed with a cold".

7.5 Improving Sports Performance

Most elite sports have now embraced big data analytics. Examples include the IBM SlamTracker tool for tennis tournaments, video analytics that track the performance of every player in a football or baseball game, and sensor technology in sports equipment such as basketballs or golf clubs, which gives us feedback (via smartphones and cloud servers) on our game and how to improve it.
Many elite sports teams also track athletes outside of the sporting environment, using smart technology to track nutrition and sleep, as well as social media conversations to monitor emotional wellbeing.

7.6 Improving Science And Research

Science and research are currently being transformed by the new possibilities big data brings. Take, for example, CERN, the European particle physics laboratory near Geneva, with its Large Hadron Collider, the world's largest and most powerful particle accelerator. Experiments to unlock the secrets of our universe - how it started and how it works - generate huge amounts of data. The CERN data center has 65,000 processors to analyze its 30 petabytes of data. In addition, it uses the computing power of thousands of computers distributed across 150 data centers worldwide to analyze the data.

7.7 Optimizing Machine And Device Performance

Big data analytics helps machines and devices become smarter and more autonomous. For example, big data tools are used to operate Google's self-driving car. The car, a Toyota Prius, is fitted with cameras, GPS, powerful computers and sensors to drive safely on the road without human intervention.

7.8 Improving Security And Law Enforcement

Big data is applied heavily in improving security and enabling law enforcement. The National Security Agency (NSA) in the U.S. uses big data analytics to foil terrorist plots. Others use big data techniques to detect and prevent cyber attacks. Police forces use big data tools to catch criminals and even predict criminal activity, and credit card companies use big data to detect fraudulent transactions.

7.9 Improving And Optimizing Cities And Countries

Big data is used to improve many aspects of our cities and countries. For example, it allows cities to optimize traffic flows based on real-time traffic information as well as social media and weather data. For instance, a bus could wait for a delayed train, and traffic signals could predict traffic volumes and operate to minimize jams.

7.10 Financial Trading

The final category of big data application comes from financial trading. High-frequency trading (HFT) is an area where big data finds a lot of use today. Here, big data algorithms are used to make trading decisions. Today, the majority of equity trading takes place via data algorithms that increasingly take into account signals from social media networks and news websites to make buy and sell decisions in split seconds.

8. Conclusion

We have already entered the era of big data analytics. Through better analysis of the large volumes of data that are becoming available, there is the potential for making faster advances in many scientific disciplines and improving the profitability and success of many enterprises. This paper explains the characteristics of big data and reviews an architecture for big data analytics.
It also presents some of the most widespread application areas of big data analytics. However, we need to address many technical challenges before the potential of big data analytics can be fully realized. These challenges include not just the obvious issues of scale, but also heterogeneity, lack of structure, security, privacy, timeliness and visualization, at all stages of the analysis pipeline from data acquisition to result interpretation.

9. References

1. Joseph O. Chan, Roosevelt University, USA: An Architecture for Big Data Analytics, Communications of the IIMA, Issue 2
2. Rob Peglar: Introduction to Analytics & Big Data - Hadoop, Storage Networking Industry Association, 2012
3. Apache Software Foundation (2013a): Welcome to Apache Hadoop
4. Apache Software Foundation (2013b): Welcome to Apache HBase, retrieved from apache.org/
5. Apache Software Foundation (2013c): Architecture Overview - What Is the Difference Between HBase and Hadoop/HDFS?, retrieved from html#arch.overview
6. NASSCOM, New Delhi: Big Data - The Next Big Thing, 2012
7. Community paper by US researchers (Philip Bernstein, Microsoft; Elisa Bertino, Purdue Univ.; Umeshwar Dayal, HP; Michael Franklin, UC Berkeley; Johannes Gehrke, Cornell Univ.): Challenges & Opportunities with Big Data
8. NESSI White Paper: Big Data - A New World of Opportunities
9. An Oracle White Paper: Big Data Analytics - Advanced Analytics in Oracle Database
10. Intel IT Center: Getting Started with Big Data - Steps IT Managers Can Take to Move Forward with Apache Hadoop Software, 2013
11. Alex Holmes: Hadoop in Practice, 2012
12. Bernard Marr, keynote speaker and consultant in strategy, performance management, analytics, KPIs and big data: The Awesome Ways Big Data Is Used Today to Change Our World, November 2013

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

L1: Introduction to Hadoop

L1: Introduction to Hadoop L1: Introduction to Hadoop Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: December 1, 2014 Today we are going to learn... 1 General

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics February 11, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

Application Development. A Paradigm Shift

Application Development. A Paradigm Shift Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the

More information

A Brief Outline on Bigdata Hadoop

A Brief Outline on Bigdata Hadoop A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics April 10, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

The 4 Pillars of Technosoft s Big Data Practice

The 4 Pillars of Technosoft s Big Data Practice beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed

More information

Large scale processing using Hadoop. Ján Vaňo

Large scale processing using Hadoop. Ján Vaňo Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools

More information

BIG DATA CHALLENGES AND PERSPECTIVES

BIG DATA CHALLENGES AND PERSPECTIVES BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

Sunnie Chung. Cleveland State University

Sunnie Chung. Cleveland State University Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:

More information

Real Time Big Data Processing

Real Time Big Data Processing Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure

More information

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform... Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

Big Data Explained. An introduction to Big Data Science.

Big Data Explained. An introduction to Big Data Science. Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of

More information

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of

More information

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance. Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics

More information

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this

More information

ANALYTICS BUILT FOR INTERNET OF THINGS

ANALYTICS BUILT FOR INTERNET OF THINGS ANALYTICS BUILT FOR INTERNET OF THINGS Big Data Reporting is Out, Actionable Insights are In In recent years, it has become clear that data in itself has little relevance, it is the analysis of it that

More information

An Oracle White Paper October 2011. Oracle: Big Data for the Enterprise

An Oracle White Paper October 2011. Oracle: Big Data for the Enterprise An Oracle White Paper October 2011 Oracle: Big Data for the Enterprise Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

Dr. Liangxiu Han, Future Networks and Distributed Systems Group (FUNDS), School of Computing, Mathematics and Digital Technology,

The 3 questions to ask yourself about BIG DATA

Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.

Hadoop Ecosystem B Y R A H I M A.

History of Hadoop: Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

Big Data Analytics. Copyright 2011 EMC Corporation. All rights reserved.

Priority Discussion Topics: What are the most compelling business drivers behind big data analytics? Do you have or expect to have data scientists on your staff, and what will be their

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

AGENDA: Introduction to Big Data, Introduction to Hadoop, HDFS file system, Map/Reduce framework, Hadoop utilities, Summary. BIG DATA FACTS: In what timeframe

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

Dr. Wlodek Zadrozny (Most slides come from Prof. Akella's class in 2014) 2015-2025. Reproduction or usage prohibited without permission of

Advanced Big Data Analytics with R and Hadoop

REVOLUTION ANALYTICS WHITE PAPER. 'Big Data' Analytics as a Competitive Advantage: Big Analytics delivers competitive advantage in two ways compared to the traditional

Addressing Open Source Big Data, Hadoop, and MapReduce limitations

Agenda: What is Big Data / Hadoop? Limitations of the existing Hadoop distributions. Going enterprise with Hadoop. How Big are Data?

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

So what is it? When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze

Big Data Big Data/Data Analytics & Software Development

Danairat T., danairat@gmail.com, 081-559-1446. Agenda: Big Data Overview, Business Cases and Benefits, Hadoop Technology Architecture, Big Data Development

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Kanchan A. Khedikar, Department of Computer Science & Engineering, Walchand Institute of Technology, Solapur, Maharashtra,

CSE-E5430 Scalable Cloud Computing Lecture 2

Keijo Heljanko, Department of Computer Science, School of Science, Aalto University. keijo.heljanko@aalto.fi 14.9-2015. Google MapReduce: A scalable batch processing

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their self-explanatory images on the internet. Thanks

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank

Agenda: Overview; What is Big Data?; Accelerates advances in computer & technologies; Revolutionizes data measurement

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Siva Ravada, Senior Director of Development, Oracle Spatial and MapViewer. Evolving Technology Platforms

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Big Data transforms Business. Data created every minute. Source: http://mashable.com/2012/06/22/data-created-every-minute/

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Wayne W. Eckerson, Director of Research, TechTarget; Founder, BI Leadership Forum. Business Analytics

White Paper: Hadoop for Intelligence Analysis

CTOlabs.com, July 2011. A White Paper providing context, tips and use cases on the topic of analysis over large quantities of data. Inside: Apache Hadoop and

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Agenda: Introduce Hadoop projects to prepare you for your group work. Intimate detail will be provided in future

IoT and Big Data- The Current and Future Technologies: A Review

Available Online at www.ijcsmc.com. International Journal of Computer Science and Mobile Computing: A Monthly Journal of Computer Science and Information Technology. IJCSMC, Vol. 5, Issue. 1, January 2016,

Data Mining in the Swamp

WHITE PAPER. Taming Unruly Data with Cloud Computing. By John Brothers. Business Intelligence is all about making better decisions from the data you have. However, all

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Volume 4, Issue 1, January 2014. ISSN: 2277 128X. International Journal of Advanced Research in Computer Science and Software Engineering. Research Paper. Available online at: www.ijarcsse.com Transitioning

Manifest for Big Data Pig, Hive & Jaql

Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande. Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India. Faculty, Computer Engineering,

Oracle Big Data SQL Technical Update

Jean-Pierre Dijcks, Oracle, Redwood City, CA, USA. Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance. Introduction: This technical

Open source Google-style large scale data analysis with Hadoop

Ioannis Konstantinou. Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory, School of Electrical

MapReduce with Apache Hadoop Analysing Big Data

April 2010. Gavin Heavyside, gavin.heavyside@journeydynamics.com. About Journey Dynamics: Founded in 2006 to develop software technology to address the issues

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Volume 6, Issue 3, March 2016. ISSN: 2277 128X. International Journal of Advanced Research in Computer Science and Software Engineering. Research Paper. Available online at: www.ijarcsse.com Special Issue

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera. Bernard Doering, Regional Director, Central EMEA, Cloudera. Cloudera: Your Hadoop Experts. Founded 2008, by former employees of Employees

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

About me: Head of Unit Web and BI Technologies, IT Directorate of Istat. Project manager and technical coordinator of Web

Data Refinery with Big Data Aspects

International Journal of Information and Computation Technology. ISSN 0974-2239, Volume 3, Number 7 (2013), pp. 655-662. International Research Publications House, http://www.irphouse.com/ijict.htm Data

Introduction to Analytics and Big Data - Hadoop. Rob Peglar EMC Isilon

SNIA Legal Notice: The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual members may use

How to make BIG DATA work for you. Faster results with Microsoft SQL Server PDW

Roger Breu, PDW Solution Specialist, Microsoft Western Europe. Marcus Gullberg, PDW Partner Account Manager, Microsoft Sweden

Big Data and Market Surveillance. April 28, 2014

Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part

BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand?

The Big Data Buzz: big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

ANSHUL MITTAL. Topics: The goal of this presentation is to give

WHITE PAPER. Building your Big Data analytics strategy: Block-by-Block! Abstract

In this white paper, Impetus discusses how you can handle Big Data problems. It talks about how analytics on Big

Big Data in Enterprise challenges & opportunities. Yuanhao Sun 孙元浩 yuanhao.sun@intel.com Software and Service Group

Big Data Phenomenon: 1.8ZB in 2011; 2 Days > the dawn of civilization to 2003; 750M Photos

Here comes the flood Tools for Big Data analytics. Guy Chesnot -June, 2012

Agenda: Data flood, Implementations, Hadoop, Not Hadoop. Forecast Data Growth

Hadoop IST 734 SS CHUNG

Introduction: What is Big Data? Bulk Amount, Unstructured. Lots of Applications which need to handle huge amounts of data (in terms of 500+ TB per day). If a regular machine needs to

NoSQL and Hadoop Technologies On Oracle Cloud

Vatika Sharma [1], Meenu Dave [2]. [1] M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India. [2] Assistant Professor, Department of CSE, Jagan Nath

Applications for Big Data Analytics

Smarter Healthcare, Multi-channel sales, Finance, Log Analysis, Homeland Security, Traffic Control, Telecom, Search Quality, Manufacturing, Trading Analytics, Fraud and Risk, Retail:

WHAT IS BIG DATA? David Bechtold

Agenda: 1. Introduction 2. What is Big Data? 3. Big Data a perspective 4. Characteristic of Big Data Three Vs 5. A Fourth V..? 6. Examples 7. How did we get here?... A historical

Sources: Summary Data is exploding in volume, variety and velocity timely

Sources: The Guardian, May 2010; IDC Digital Universe, 2010; IBM Institute for Business Value, 2009; IBM CIO Study 2010; TDWI: Next Generation Data Warehouse Platforms Q4 2009. Summary: Data is exploding

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

Collection, storage and extraction of business value from data generated from a variety of sources are

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Prerita Gupta, Research Scholar, DAV College, Chandigarh. Dr. Harmunish Taneja, Department of Computer Science and

Big Data: Opportunities & Challenges, Myths & Truths (source: course materials of Prof. 廖世偉, National Taiwan University)

A 13-year-old US student uses Big Data to find bullying hotspots: Puri set up the website Bullyvention, which analyzes Twitter to find words related to bullying, combined with geographic location

Big Data and Hadoop. Sreedhar C, Dr. D. Kavitha, K. Asha Rani

Abstract: Big data has become a buzzword in recent years. Big data is used to describe a massive volume of both structured and unstructured

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme

Information Systems and Machine Learning Lab (ISMLL), Institute of Computer Science, University of Hildesheim, Germany. 33rd meeting of the Arbeitskreis Informationstechnologie (Information Technology working group),

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Big Data and the Data Warehouse Potential All

Accelerating Hadoop MapReduce Using an In-Memory Data Grid

By David L. Brinker and William L. Bain, ScaleOut Software, Inc. 2013 ScaleOut Software, Inc. 12/27/2012. Hadoop has been widely embraced for

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

A PATH FOR HORIZING YOUR INNOVATIVE WORK. A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS. MS.

Keywords: Big Data, Hadoop, cluster, heterogeneous, HDFS, MapReduce

Volume 5, Issue 9, September 2015. ISSN: 2277 128X. International Journal of Advanced Research in Computer Science and Software Engineering. Research Paper. Available online at: www.ijarcsse.com A Study of

Big Data and Apache Hadoop's MapReduce

Michael Hahsler, Computer Science and Engineering, Southern Methodist University. January 23, 2012

Big Data Mining: Challenges and Opportunities to Forecast Future Scenario

Poonam G. Sawant, Dr. B.L.Desai. Assist. Professor, Dept. of MCA, SIMCA, Savitribai Phule Pune University, Pune, Maharashtra, India

Apache Hadoop: The Big Data Refinery

Architecting the Future of Big Data. Whitepaper. Introduction: Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

VIEWPOINT. High Performance Analytics. Industry Context and Trends

In the digital age of social media and connected devices, enterprises have a plethora of data that they can mine to discover hidden correlations

TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP

Pythian White Paper. ABSTRACT: As companies increasingly rely on big data to steer decisions, they also find themselves looking for ways to simplify

Big Data and Analytics: Challenges and Opportunities

Dr. Amin Beheshti, Lecturer and Senior Research Associate, University of New South Wales, Australia (Service Oriented Computing Group, CSE). Talk: Sharif

Testing Big data is one of the biggest

Infosys Labs Briefings, VOL 11 NO 1, 2013. Big Data: Testing Approach to Overcome Quality Challenges. By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja. Validate data quality by employing

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn

Presented by: Ishank Kumar Aakash Patel Vishnu Dev Yadav. CONTENT: Abstract, Introduction, Related work, The Ecosystem, Ingress

Chapter 6. Foundations of Business Intelligence: Databases and Information Management

VIDEO CASES. Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City. Case 1b:

NoSQL for SQL Professionals William McKnight

Session Code BD03. About your Speaker, William McKnight: President, McKnight Consulting Group. Frequent keynote speaker and trainer internationally. Consulted to

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

INSIGHT. Oracle's All-Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages. Carl W. Olofson. IDC OPINION. Global Headquarters: 5 Speen Street, Framingham, MA

COMP9321 Web Application Engineering

Semester 2, 2015. Dr. Amin Beheshti, Service Oriented Computing Group, CSE, UNSW Australia. Week 11 (Part II). http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411