Deciphering Big Data Analytics - A Review Of Technology & Applications
Ramesh Mahadik, M.Tech. (Computer Sc. & Engg.), I.I.T, Mumbai
Director MCA, Institute Of Management And Computer Studies - MCA Institute, Mumbai, Maharashtra, India
Ramesh.Imcost@Yahoo.Com; Rameshgm.Iitb@Gmail.Com

ABSTRACT: There is great excitement surrounding Big Data Analytics today. Organizations now understand the importance of data-driven decision making, which is fueling growing enthusiasm for the idea of Big Data. Data becomes Big Data when its volume, velocity, or variety exceeds the ability of conventional IT systems to ingest, store, analyze, and process it. Many organizations have the IT systems and expertise to handle large quantities of structured data, but with increasing volumes and faster flows of data, they lack the ability to mine it and derive actionable intelligence in a timely way. Big Data Analytics addresses this need for evolved data processing and analytics that can handle fast-growing, high-volume data of multiple types (structured, semi-structured, and unstructured) generated at high speed. This paper explores the technology framework and application areas of Big Data Analytics.

Keywords: Big Data, Big Data Analytics, Hadoop, MapReduce, Decision Support Systems, Data Mining, Business Intelligence

1. Introduction
We are flooded with data today. A wide spectrum of application areas collect data at an enormous scale. Decisions that were previously based on guesswork, or on crudely constructed models, are now based on the data itself. Big Data Analytics now drives nearly every aspect of modern society, including retail, manufacturing, financial services, mobile services, social media, and healthcare, to name a few. Clearly, Big Data means business opportunities, but it also poses major research challenges.
According to McKinsey & Co., Big Data is the next frontier for innovation, competition, and productivity. Big Data not only offers individual companies huge potential for competition and growth; used well, it can also increase productivity, innovation, and competitiveness for entire sectors and economies.

2. Research Methodology
An exhaustive study of texts, research articles, and other materials pertaining to Big Data Analytics was carried out, with the aim of understanding its technology framework and application areas.

INCON X
3. What Is Big Data?
Big Data refers to rapidly growing structured and unstructured datasets whose sizes are beyond the ability of conventional database tools to store, manage, and analyze. It is characterized primarily by the 3Vs: volume, variety, and velocity. A fourth characteristic considered is veracity.

Figure 1: What Is Big Data? (Chart of speed, accuracy, and complexity of intelligence against size of data, from GB (10^9), TB (10^12), PB (10^15), and EB (10^18) to ZB (10^21), contrasting small data sets and Big Data under traditional analytics, advanced analytics, and Big Data Analytics.)

Volume: Large quantity of data, which may be enterprise-specific, general, or public.
Variety: Diverse set of data, created by social networking feeds, video, audio, sensor data, etc.
Velocity: Speed of data inflow, as well as the rate at which this fast-moving data needs to be stored.
Veracity: Unlike carefully governed internal data, most Big Data comes from sources outside our control and therefore suffers from significant correctness or accuracy problems. Veracity represents both the credibility of the data source and the suitability of the data for the target audience.

4. Understanding Big Data Analytics
Fundamentally, Big Data Analytics is the process of examining large data sets containing a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information. Let's look at what Big Data Analytics is and what it isn't.

4.1 What Big Data Analytics Is:
i) A technology-enabled strategy for gaining richer, deeper insights into customers, partners, and the business, and ultimately gaining competitive advantage.
ii) Working with data sets whose size and variety are beyond the ability of conventional database software to capture, store, manage, and analyze.
iii) Processing a steady stream of real-time data in order to make time-sensitive decisions faster than ever before.
iv) Distributed in nature: analytics processing goes to where the data is, for greater speed and efficiency.
v) A new paradigm in which IT collaborates with business users and data scientists to identify and implement analytics that will increase operational efficiency and solve new business problems.

4.2 What Big Data Analytics Isn't:
i) Just about technology. At the business level, it's about how to exploit the vastly enhanced sources of data to gain insight.
ii) Only about volume. It's also about variety and velocity. But perhaps most important, it's about value derived from the data.
iii) Generated or used only by huge online companies like Google or Amazon anymore. While Internet companies may have pioneered the use of Big Data at web scale, applications touch every industry.
iv) About one-size-fits-all traditional relational databases built on shared disk and memory architecture. Big Data Analytics uses a grid of computing resources for massively parallel processing (MPP).
v) Meant to replace relational databases or the data warehouse. Structured data continues to be critically important to companies. However, traditional systems may not be suitable for the new sources and contexts of Big Data.

5. Architecture For Big Data
In the world of Big Data, the data volumes we need to work with on a day-to-day basis have outgrown the storage and processing capabilities of a single host. Big Data brings with it two fundamental challenges: (i) how to store and process voluminous data, and, more important, (ii) how to understand data and turn it into a competitive advantage.

Apache Hadoop fills this gap by effectively storing and providing computational capabilities over substantial amounts of data. It's a distributed system made up of a distributed filesystem, and it offers a way to parallelize and execute programs on a cluster of machines. It has already been adopted by technology giants like Yahoo!, Facebook, and Twitter to address their Big Data needs.
From an architectural perspective, Hadoop, as shown in Figure 2, is a distributed master-slave architecture that consists of the Hadoop Distributed File System (HDFS) for storage and MapReduce for computational capabilities.
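As a rough illustration of the master-slave idea on the storage side, the sketch below models a master that only records which worker holds each block of a file, while the workers hold the blocks themselves. It is a toy in-memory model and not Hadoop code: the names `Master`, `put_file`, and `read_file`, the tiny block size, and the replication factor are all assumptions chosen for illustration.

```python
from itertools import cycle


class Master:
    """Toy master node: keeps metadata only (which workers hold which block)."""

    def __init__(self, workers, block_size=4, replication=2):
        # block_size and replication are illustrative values, not Hadoop settings.
        self.workers = list(workers)
        self.block_size = block_size
        self.replication = replication
        self.block_map = {}  # (filename, block index) -> list of worker names

    def put_file(self, name, data, storage):
        """Split data into blocks and place each block on `replication` workers."""
        ring = cycle(range(len(self.workers)))
        for index, start in enumerate(range(0, len(data), self.block_size)):
            block = data[start:start + self.block_size]
            first = next(ring)
            holders = [self.workers[(first + k) % len(self.workers)]
                       for k in range(self.replication)]
            for worker in holders:
                storage[worker][(name, index)] = block
            self.block_map[(name, index)] = holders

    def read_file(self, name, storage, failed=()):
        """Reassemble a file, skipping any workers listed as failed."""
        parts = []
        index = 0
        while (name, index) in self.block_map:
            alive = [w for w in self.block_map[(name, index)] if w not in failed]
            if not alive:
                raise IOError(f"block {index} of {name} is unavailable")
            parts.append(storage[alive[0]][(name, index)])
            index += 1
        return "".join(parts)


workers = ["worker-1", "worker-2", "worker-3"]
storage = {w: {} for w in workers}   # each worker's local block store
master = Master(workers)
master.put_file("log.txt", "abcdefghij", storage)

# The file is still readable when one worker is treated as failed,
# because every block also lives on a second worker.
recovered = master.read_file("log.txt", storage, failed={"worker-2"})
print(recovered)
```

The point of the design is that the master never touches file contents; it answers "where is block N?" and the workers do the storing, which is what lets storage grow by adding machines.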
Figure 2: High-Level Hadoop Architecture

5.1 Core Hadoop Components
To understand Hadoop's architecture, we'll start by looking at the basics of HDFS and MapReduce.

HDFS (Hadoop Distributed File System)
HDFS is the storage component of Hadoop. It's a distributed filesystem modeled after the Google File System (GFS) paper. HDFS is optimized for high throughput and works best when reading and writing large files (gigabytes and larger). To support this throughput, HDFS leverages unusually large (for a filesystem) block sizes and data locality optimizations to reduce network input/output (I/O). Scalability and availability are also key traits of HDFS, achieved in part through data replication and fault tolerance. HDFS replicates files a configured number of times, is tolerant of both software and hardware failure, and automatically re-replicates data blocks that were stored on nodes that have failed. Figure 3 shows a logical representation of the components in HDFS.

Figure 3: HDFS Architecture

MapReduce
MapReduce is a batch-based, distributed computing framework modeled after Google's paper on MapReduce. It allows you to parallelize work over a large amount of raw data, such as combining web logs with relational data from an OLTP database to model how users interact with your website. This type of work, which could take days or longer using conventional serial programming techniques, can be reduced to minutes using MapReduce on a Hadoop cluster. It's a convenient way to partition tasks across many machines, and it can handle machine failure. It works
across different application types, like search and ads. It allows pre-computation of useful data, finding word counts, sorting terabytes of data, and so on. MapReduce decomposes work submitted by a client into small parallelized map and reduce workers. The role of the programmer is to define map and reduce functions, where the map function outputs key/value tuples, which are processed by reduce functions to produce the final output. The power of MapReduce lies between the map output and the reduce input, in the shuffle and sort phases. Hadoop's MapReduce architecture is similar to the master-slave model in HDFS. The main components of MapReduce are illustrated in its logical architecture, as shown in Figure 4.

Figure 4: MapReduce Logical Architecture
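The map, shuffle/sort, and reduce phases described above can be illustrated with the word-count task the text mentions. The following is a minimal single-process Python sketch, not Hadoop code; the function names (`map_words`, `shuffle_and_sort`, `reduce_counts`, `word_count`) and the sample input are assumptions made for illustration.

```python
from collections import defaultdict


def map_words(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle/sort: group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_counts(key, values):
    """Reduce: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    # On a real cluster each line (or file split) would be mapped on a
    # different machine; here the phases simply run one after another.
    mapped = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle_and_sort(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped)


lines = ["big data is big", "data drives decisions"]
counts = word_count(lines)
print(counts)
```

Because each call to `map_words` looks at one line only and each call to `reduce_counts` looks at one key only, both steps could run on many machines at once; the grouping step in between is the only place where data from different inputs has to meet.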
5.2 The Hadoop Ecosystem

Figure 5: The Hadoop Ecosystem

Following are the components of the Hadoop ecosystem:
HDFS: A distributed, fault-tolerant file system.
MapReduce: A framework for writing and executing distributed, fault-tolerant algorithms.
Avro™: A data serialization system.
Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
Pig™: A high-level data-flow language and execution framework for parallel computation.
Sqoop™: A package for moving data between HDFS and relational database systems.
HBase™: A scalable, distributed database that supports structured data storage for large tables.
ZooKeeper™: A high-performance coordination service for distributed applications.

6. An Architecture For Big Data Analytics
The Big Data Analytics architecture described below utilizes the massively parallel, distributed storage and processing framework provided by Hadoop HDFS and MapReduce.

Figure 6: An Architecture For Big Data Analytics

Structured data are captured through various data sources, including OLTP systems, legacy systems, and external systems. They go through the ETL process
from the source systems to the target data warehouse. Traditional business intelligence (BI) batch analytical processing tools, such as online analytical processing (OLAP), data mining, and query and reporting, can be used to create the business intelligence that enhances business operations and decision processes.

Unstructured and semi-structured Big Data sources can be of a wide variety, including data from social media, mobile devices, sensors, documents and reports, web logs, call records, scientific research, satellites, and geospatial devices. They are loaded into the Hadoop Distributed File System cluster. Hadoop MapReduce provides the fault-tolerant distributed processing framework across the Hadoop cluster, where batch analytics can be performed. Actionable insight resulting from Hadoop MapReduce analytics and business intelligence analytics can be consumed by operational and analytical applications.

Geospatial intelligence is described as using data about space and time to improve the quality of predictive analysis. For example, real-time recommendations of places of interest can be based on a smartphone's real-time location. This real-time information can be combined with batch analytics to improve the quality of the predictions. Real-time NoSQL databases such as HBase can be used in conjunction with Hadoop to provide real-time read/write access to Hadoop data. Real-time insight created by real-time analytics can be consumed by real-time operations and decision processes.

7. Applications Of Big Data Analytics
Eventually, every aspect of our lives will be affected by Big Data Analytics. The following are some areas where Big Data Analytics is already in widespread use and delivering the highest benefits:

7.1 Understanding And Targeting Customers
This is one of the biggest and most publicized areas of Big Data use today.
Here, Big Data is used to better understand customers and their behaviors and preferences. Companies are keen to expand their traditional data sets with social media data, browser logs, text analytics, and sensor data to get a more complete picture of their customers. The main objective, in many cases, is to create predictive models. Some examples: telecom companies using Big Data can now better predict customer churn; Wal-Mart can predict what products will sell; and car insurance companies can understand how well their customers actually drive.

7.2 Understanding And Optimizing Business Processes
Big Data is also increasingly used to optimize business processes. Retailers are able to optimize their stock based on predictions generated from social media data, web search trends, and weather forecasts. One business process that is seeing a lot of Big Data Analytics is supply chain or delivery route optimization. Here, geographic positioning and radio frequency identification sensors are used to track goods or delivery vehicles, and routes are optimized by integrating live traffic data and similar inputs.

7.3 Personal Quantification And Performance Optimization
Big Data is not just for companies and governments but also for each of us individually. We can now benefit from the data generated by wearable devices such as smart watches or smart bracelets. Take the Up band from Jawbone as an example: the armband collects data on our calorie consumption, activity levels, and sleep patterns. While it gives individuals rich insights, the real value is in analyzing the collective data. In Jawbone's case, the company now collects 60 years' worth of sleep data every night. Analyzing such volumes of data will bring entirely new insights that it can feed back to individual users.

7.4 Improving Healthcare And Public Health
The computing power of Big Data Analytics enables us to decode entire DNA strings in minutes and will allow us to find new cures and better understand and predict disease patterns. Big Data techniques are already being used to monitor babies in a specialist premature and sick baby unit. By recording and analyzing every heartbeat and breathing pattern of every baby, the unit was able to develop algorithms that can now predict infections 24 hours before any physical symptoms appear. That way, the team can intervene early and save fragile babies in an environment where every hour counts. Integrating data from medical records with social media analytics enables us to monitor flu outbreaks in real time, simply by listening to what people are saying, e.g. "Feeling rubbish today - in bed with a cold".

7.5 Improving Sports Performance
Most elite sports have now embraced Big Data Analytics. Examples include the IBM SlamTracker tool for tennis tournaments, video analytics that track the performance of every player in a football or baseball game, and sensor technology in sports equipment such as basketballs or golf clubs that gives us feedback (via smartphones and cloud servers) on our game and how to improve it.
Many elite sports teams also track athletes outside of the sporting environment, using smart technology to track nutrition and sleep, as well as social media conversations to monitor emotional wellbeing.

7.6 Improving Science And Research
Science and research are currently being transformed by the new possibilities Big Data brings. Take, for example, CERN, the Swiss nuclear physics lab with its Large Hadron Collider, the world's largest and most powerful particle accelerator. Experiments to unlock the secrets of our universe - how it started and how it works - generate huge amounts of data. The CERN data center has 65,000 processors to analyze its 30 petabytes of data. However, it also uses the computing power of thousands of computers distributed across 150 data centers worldwide to analyze the data.

7.7 Optimizing Machine And Device Performance
Big Data Analytics helps machines and devices become smarter and more autonomous. For example, Big Data tools are used to operate Google's self-driving car. The Toyota Prius is fitted with cameras, GPS, powerful computers, and sensors to drive safely on the road without human intervention.

7.8 Improving Security And Law Enforcement
Big Data is applied heavily in improving security and enabling law enforcement. The National Security Agency (NSA) in the U.S. uses Big Data Analytics to foil terrorist plots. Others use Big Data techniques to detect and prevent cyber attacks. Police forces use Big Data tools to catch criminals and even predict criminal activity, and credit card companies use Big Data to detect fraudulent transactions.

7.9 Improving And Optimizing Cities And Countries
Big Data is used to improve many aspects of our cities and countries. For example, it allows cities to optimize traffic flows based on real-time traffic information as well as social media and weather data. In such a city, a bus would wait for a delayed train, and traffic signals would predict traffic volumes and operate to minimize jams.

7.10 Financial Trading
The final category of Big Data application comes from financial trading. High-frequency trading (HFT) is an area where Big Data finds a lot of use today. Here, Big Data algorithms are used to make trading decisions. Today, the majority of equity trading takes place via data algorithms that increasingly take into account signals from social media networks and news websites to make buy and sell decisions in split seconds.

8. Conclusion
We have already entered the era of Big Data Analytics. Through better analysis of the large volumes of data that are becoming available, there is the potential for making faster advances in many scientific disciplines and improving the profitability and success of many enterprises. This paper explains the characteristics of Big Data and reviews an architecture for Big Data Analytics.
It also presents some of the most widespread application areas of Big Data Analytics. However, many technical challenges must be addressed before the potential of Big Data Analytics can be fully realized. These challenges include not just the obvious issues of scale, but also heterogeneity, lack of structure, security, privacy, timeliness, and visualization, at all stages of the analysis pipeline from data acquisition to result interpretation.
More informationForecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014
Forecast of Big Data Trends Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Big Data transforms Business 2 Data created every minute Source http://mashable.com/2012/06/22/data-created-every-minute/
More informationArchitecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics