Big Data System and Architecture
1 CHANGE, a 2012 DAC workshop: 2nd International Workshop on Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments. Moscone Center, San Francisco, California, June 3, 2012. Big Data System and Architecture. Jian Li, IBM Research in Austin, jianli@us.ibm.com. © 2011 IBM Corporation
2 Agenda
- Big Data requirements
- Industry view
- Big Data application scenarios
- Case study: Watson
- Big Data platform
- Case study: A real world deployment
- Big Data research items
- Case studies: Performance
3 IBM Disclaimer Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. More info at:
4 Big Data: Big/Deep Insights to enterprise and society
- 44x as much data and content over the coming decade (chart labels: … zettabytes; …,000 petabytes)
- 1 in 3: business leaders frequently make decisions based on information they don't trust, or don't have
- 1 in 2: business leaders say they don't have access to the information they need to do their jobs
- 80% of the world's data is unstructured
- 83% of CIOs cited "Business intelligence and analytics" as part of their visionary plans to enhance competitiveness
- 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
- Units: Kilobyte (kB) = 1,000 Bytes; Megabyte (MB) = 1,000 Kilobytes; Gigabyte (GB) = 1,000 Megabytes; Terabyte (TB) = 1,000 Gigabytes; Petabyte (PB) = 1,000 Terabytes; Exabyte (EB) = 1,000 Petabytes; Zettabyte (ZB) = 1,000 Exabytes
- The resulting explosion of information (plus intermediate data) creates a need for a new kind of intelligence
5 Big Data Presents Big Opportunities
Extract insight from a high volume, variety, velocity and veracity of data in a timely and cost-effective manner.
- Variety: Manage and benefit from diverse data types and data structures
- Velocity: Analyze streaming data (data in motion) and large volumes of persistent data (data at rest)
- Volume: Scale from terabytes to zettabytes
- Veracity: Establish confidence in data, information and solutions, e.g. Watson
6 Categories of Analytics
Ordered from highest to lowest degree of complexity / competitive advantage:
- (Self?) learning system, expert system, e.g. Watson
- Stochastic Optimization: How can we achieve the best outcome including the effects of variability?
- Optimization: How can we achieve the best outcome?
- Predictive modeling: What will happen next if?
- Forecasting: What if these trends continue?
- Simulation: What could happen?
- Alerts: What actions are needed?
- Query/drill down: What exactly is the problem?
- Ad hoc reporting: How many, how often, where?
- Standard Reporting: What happened?
Groupings: Prescriptive (e.g., ILOG); Predictive (e.g., SPSS, WebSphere Business Modeler); Descriptive (e.g., Cognos). Based on: TLE 2010 in CA.
7 Case Study: IBM Watson IBM Watson is a breakthrough in analytic innovation, but it is only successful because of the quality of the information from which it is working.
8 Big Data and Watson
- Big Data technology is used to build Watson's knowledge base!
- Watson technology offers great potential for advanced business analytics!
- "Watson used the Apache Hadoop open framework to distribute the workload for loading information into memory."
- Approx. 200M pages of text (to compete on Jeopardy!)
- Diagram labels: POS Data, CRM Data, Social Media; InfoSphere BigInsights; Distilled Insight (spending habits, social relationships, buying trends)
- 10 racks of P750s, 2870 processor cores; Watson's Memory (15TB); advanced search and analysis
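The slide says only that Hadoop was used to distribute the loading workload; it does not show any code. As a rough illustration of the map, shuffle and reduce pattern that Hadoop provides, here is a minimal single-process sketch in Python. The function names and the term-counting task are assumptions chosen for illustration, not Watson's actual pipeline.

```python
from collections import defaultdict

def map_phase(doc):
    """Map: emit a (term, 1) pair for every whitespace-separated term."""
    for term in doc.lower().split():
        yield term, 1

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)

def run_job(docs):
    """Run the three phases in one process; Hadoop would spread them across nodes."""
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical two-document corpus
counts = run_job(["big data big insights", "data at rest"])
print(counts)
```

In a real cluster each map task would read one input split and each reduce task would own a partition of the keys, which is what lets the work scale out over many nodes.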
9 IBM Big Data Platform Vision
- IBM Big Data Solutions; Client and Partner Solutions
- Big Data Operators and Accelerators: Text, Statistics, Financial, Geospatial, Acoustic, Image/Video, Mining, Time Series, Mathematical, Connectors, Graph Analysis
- Big Data Enterprise Engines: InfoSphere Streams, InfoSphere BigInsights
- Productivity Tools and Optimization: Workload Management and Optimization; Consumability and Management Tools
- Open Source Foundation Components: Eclipse, Oozie, Hadoop, HBase, Pig, Lucene, Jaql
- Platforms: Linux, POWER (GA June 2012), x86
10 BigInsights: Analytics for Data at Rest
- BigInsights Enterprise Edition: Adaptive MapReduce, SystemML, Unstructured Analytics (SystemT), Metatracker, GPFS-SNC
- BigInsights Core: Install & Configuration; Monitoring; Jaql; Management console; Streams, DB and Warehouse integration; Pig, Hive, Flume, Sqoop, etc.
- Layers: Applications & Solutions; Enabling Infrastructure
- BigSheets (included in BigInsights)
- Applications / Solutions / Partners / Community: Cognos Consumer Insights; SPSS and R; Next Generation Credit Risk Analytics; Custom applications
- IBM Distribution of Apache Hadoop: passed IBM legal and IP review, safe to use
11 Streams: Analytics for Data in Motion
- Volume: Terabytes per second; Petabytes per day
- Variety: All kinds of data; All kinds of analytics
- Velocity: Insights in microseconds; Millions of events per second; Microsecond latency
- Agility: Dynamically responsive; Rapid application development
- Real time delivery; Powerful analytics; Traditional / non-traditional data sources
- Example uses: ICU Monitoring, Algo Trading, Cyber Security, Government / Law enforcement, Environment Monitoring, Smart Grid, Telco churn predict
12 Streams and BigInsights: Hybrid and Integrated Analytics on Data in Motion & Data at Rest
- Visualization of real-time and historical insights
- InfoSphere Streams: data ingest, preparation, online analysis, model validation
- InfoSphere BigInsights, Database & Warehouse: data integration, data mining, machine learning, statistical modeling
- Diagram steps: 1. Data Ingest; 2. Bootstrap/Enrich; 3. Adaptive Analytics Model; plus a control flow between the components
13 Case Study: A Real World Hybrid Big Data Deployment
- Tiers: Client/Internet; Presentation/Web; Application/Object; RDBMS; Message Queues; Message Center
- Network: 12 x 10GbE
- Sample services: Sales & Marketing; Recommendation Systems; Credit management and Fraud Detection
- Offline analytics: Hadoop + Mahout + Hive + Pig + DFS, 500+ nodes
- Online analytics: HBase + Solr (Lucene) + Yahoo! S4 + DFS, 80+ nodes
14 Example Research Issues
- Data generation, benchmarks, metrics and workload characterization for analytics and data-intensive computing, e.g. how accurate/fast/efficient is necessary and tradeoffs
- Accelerators in heterogeneous and hybrid systems for analytics and data-intensive computing
- Scalable system and network designs for capturing large numbers of concurrent data streams or high bandwidth data streaming
- Data management for vast amounts of unstructured data
- OS, distributed systems and system management support for very large-scale analytics
- Debugging and performance analysis tools for analytics and data-intensive computing
- Programming systems and language support for deep analytics
- MapReduce and other processing paradigms for analytics
- Processor, memory and system architectures for data analytics
- Implications of data analytics to cloud computing
- Implications of data analytics to mobile, embedded and autonomous systems
- Energy-efficiency and energy-efficient designs for analytics
- Availability, fault tolerance and data recovery in large-scale data-oriented environments
- Self learning and cognitive system?
15 Performance Case Study: Sort 1 TB of Data on PowerLinux
Lab Results
- Initial results demonstrate 1 terabyte of data sorted in 14 minutes on 10 nodes running PowerLinux (April 2012)
- The fastest industry result demonstrated to date is 24 minutes: one result used 10 x86 Linux nodes and another used 11 x86 Linux nodes*, both with newer/faster Hadoop versions and patches
- Lab continuing tuning to fully exploit PowerLinux benefits
- Continuing hardware-software co-optimization and architecture innovation of IBM Big Data systems
Test Hardware
- 10 Power 730 Express servers
- 120 cores (2 sockets per server, 6 cores per socket, 3.72 GHz)
- 640 GB DRAM (64 GB each)
- 144 TB SAS drives (24 x 600 GB DAS each)
- Nodes interconnected with a 10GbE switch
Test Software
- Early code (pre-GA): BigInsights v1.3 on PowerLinux
- PowerLinux: SLES11sp1
*MapR used 1 master node and 10 slave nodes for its Terasort results (@Hadoop Summit 2011). Cloudera used 10 nodes (@Hadoop World 2011).
16 13 minutes 48 seconds Terasort by exploring PowerLinux. Results as of April 2012; further integrated optimization underway.
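The figures on slides 15 and 16 can be sanity-checked with a little arithmetic. The sketch below assumes a decimal terabyte (10^12 bytes, matching the 1,000-based units on slide 4) and treats "rate" as input size divided by wall-clock time, which is a coarse average and not a measurement of disk or network bandwidth. The helper name `sort_rate_mb_s` is made up for this example.

```python
TB = 10**12  # decimal terabyte (assumption: 1,000-based units, as on slide 4)
MB = 10**6

def sort_rate_mb_s(data_bytes, seconds, nodes=1):
    """Average sorted-data rate in MB/s, optionally divided per node."""
    return data_bytes / seconds / nodes / MB

NODES = 10
powerlinux_s = 13 * 60 + 48   # slide 16: 13 minutes 48 seconds
reference_s = 24 * 60         # slide 15: 24 minutes on x86 clusters

cluster_rate = sort_rate_mb_s(TB, powerlinux_s)
node_rate = sort_rate_mb_s(TB, powerlinux_s, NODES)
time_ratio = reference_s / powerlinux_s

# Cross-check the hardware totals quoted on slide 15
cores = NODES * 2 * 6            # 2 sockets x 6 cores per server
dram_gb = NODES * 64             # 64 GB per server
disk_tb = NODES * 24 * 600 / 1000  # 24 x 600 GB drives per server

print(f"cluster: {cluster_rate:.0f} MB/s, per node: {node_rate:.1f} MB/s")
print(f"time ratio vs 24-minute result: {time_ratio:.2f}x")
print(f"cores={cores}, DRAM={dram_gb} GB, disk={disk_tb:.0f} TB")
```

The hardware totals computed this way agree with the 120 cores, 640 GB and 144 TB stated on slide 15. Note that the 24-minute reference results ran on different node counts and Hadoop versions, so the time ratio is not a like-for-like comparison.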
17 Technology Example: GPFS-SNC for better Performance, Availability, Integrity and Manageability
Performance
- Query languages like Pig and JAQL need good random I/O performance
- Sort requires better sequential throughput
- GPFS delivers twice the performance of HDFS for both of the above
- For document index lookups, client-side caching is a big win: 17x throughput speedup
GPFS-SNC key technology
- Locality awareness
- Write affinity
- Metablocks
- Pipelined replication
- Distributed recovery
Workload isolation
- HDFS (diagram labels: Hadoop Indexing (HDFS), Copy, Database Upload (ext3), Fetch, Web Service Layer): extra copy overhead and network fetch, separate clusters for analytics and database
- GPFS (diagram labels: Hadoop Indexing + Database Upload (GPFS), Cache, Web Service Layer): single cluster for analytics and database, no copying required, caching for web layer
Proven data integrity
- Replicated metadata services
- Yahoo keeps 3 copies of 3 versions of HDFS because of unknown data integrity [1]
- Quantcast deletes files once HDFS is 50% full [2]
[1] Care and Feeding of Hadoop Clusters, Marc Nicosia, Usenix 2009
[2] The Kosmos Distributed File System, Sriram Rao, Quantcast Inc.
18 Technology Example: OpenCL
- OpenCL (Open Computing Language) is an open standard for cross-platform, parallel programming of modern processors found in personal computers, servers and handheld/embedded devices
- Highly flexible: supports computation on CPUs, GPUs, accelerators (SIMD, FPGAs, DSPs)
Research contributions
- 2.8x acceleration factor for the sparse coding phase of NMF (considering the best timings for each implementation)
- 2.0x overall algorithm improvement factor, including the preprocessing costs of NMF
- Charts: Java Cluster CPU Usage; JOCL Cluster CPU Usage
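One way to relate a per-phase speedup to an overall speedup is Amdahl's law. The sketch below is only an illustration under a simplifying assumption: that the accelerated phase is the sole part that changes and no extra overhead is added. The slide states that its 2.0x figure includes preprocessing costs, so the fraction computed here is a back-of-the-envelope reading, not a number reported by the authors.

```python
def overall_speedup(fraction, phase_speedup):
    """Amdahl's law: overall speedup when `fraction` of the runtime is sped up."""
    return 1.0 / ((1.0 - fraction) + fraction / phase_speedup)

def implied_fraction(overall, phase_speedup):
    """Invert Amdahl's law: runtime fraction that yields the given overall speedup."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / phase_speedup)

# Slide figures: 2.8x on the sparse coding phase, 2.0x overall
p = implied_fraction(overall=2.0, phase_speedup=2.8)
print(f"implied accelerated fraction under the simple model: {p:.2f}")
print(f"check: overall speedup = {overall_speedup(p, 2.8):.2f}x")
```

Under this simple model, the accelerated phase would have to account for roughly three quarters of the original runtime; any offload or preprocessing overhead would push the required fraction higher.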
19 Technology Example: Reconfigurable FPGA Acceleration
- Diagram labels: FPGAs and CPUs in a Host Server (POWER), attached to the network
- Bandwidth reduction (and capacity increase) through (de)compression: world's fastest gzip (research contributions)
- Bandwidth reduction through filtering: Big Data dictionary and regexp based filtering
- Net result: significant increase in capacity and throughput in place (computing where the data is)
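The two bandwidth-reduction ideas on this slide, filtering and compression, can be illustrated in software. The sketch below is a plain Python stand-in for what the slide describes being done on an FPGA; the keyword dictionary, the regular expression and the sample records are all hypothetical.

```python
import gzip
import re

DICTIONARY = {"error", "fraud"}            # hypothetical keyword dictionary
PATTERN = re.compile(r"\b\d{3}-\d{4}\b")   # hypothetical regexp (phone-like token)

def keep(record):
    """Keep a record if it contains a dictionary word or matches the regexp."""
    words = set(re.findall(r"[a-z]+", record.lower()))
    return bool(words & DICTIONARY) or PATTERN.search(record) is not None

def filter_and_compress(records):
    """Drop uninteresting records near the data, then gzip what remains."""
    kept = [r for r in records if keep(r)]
    payload = gzip.compress("\n".join(kept).encode("utf-8"))
    return kept, payload

records = [
    "INFO heartbeat ok",
    "ERROR disk failure on node 7",
    "call 555-0199 flagged",
    "INFO heartbeat ok",
]
kept, payload = filter_and_compress(records)
print(f"kept {len(kept)} of {len(records)} records, payload {len(payload)} bytes")
```

Filtering reduces how many records cross the network at all, and compression shrinks those that do. On inputs this small the gzip header can outweigh the savings, so the benefit only shows on realistic data volumes.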
20 Recap: Example Research Issues
- Data generation, benchmarks, metrics and workload characterization for analytics and data-intensive computing, e.g. how accurate/fast/efficient is necessary and tradeoffs
- Accelerators in heterogeneous and hybrid systems for analytics and data-intensive computing
- Scalable system and network designs for capturing large numbers of concurrent data streams or high bandwidth data streaming
- Data management for vast amounts of unstructured data
- OS, distributed systems and system management support for very large-scale analytics
- Debugging and performance analysis tools for analytics and data-intensive computing
- Programming systems and language support for deep analytics
- MapReduce and other processing paradigms for analytics
- Processor, memory and system architectures for data analytics
- Implications of data analytics to cloud computing
- Implications of data analytics to mobile, embedded and autonomous systems
- Energy-efficiency and energy-efficient designs for analytics
- Availability, fault tolerance and data recovery in large-scale data-oriented environments
- Self learning and cognitive system?
21 Conclusions
- Big Data, Big Data, Big Data!!
- Open to research collaboration
- You are welcome to stop by the 2nd ASBD (Architecture and Systems for Big Data) workshop at ISCA next Saturday, 06/09, in Portland!
IBM Data Warehousing and Analytics Portfolio Summary
IBM Information Management IBM Data Warehousing and Analytics Portfolio Summary Information Management Mike McCarthy IBM Corporation mmccart1@us.ibm.com IBM Information Management Portfolio Current Data
More informationIBM Big Data Platform
Mike Winer IBM Information Management IBM Big Data Platform The big data opportunity Extracting insight from an immense volume, variety and velocity of data, in a timely and cost-effective manner. Variety:
More informationExploiting Data at Rest and Data in Motion with a Big Data Platform
Exploiting Data at Rest and Data in Motion with a Big Data Platform Sarah Brader, sarah_brader@uk.ibm.com What is Big Data? Where does it come from? 12+ TBs of tweet data every day 30 billion RFID tags
More informationCSE-E5430 Scalable Cloud Computing Lecture 2
CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing
More informationInfomatics. Big-Data and Hadoop Developer Training with Oracle WDP
Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools
More informationBig Data and Trusted Information
Dr. Oliver Adamczak Big Data and Trusted Information CAS Single Point of Truth 7. Mai 2012 The Hype Big Data: The next frontier for innovation, competition and productivity McKinsey Global Institute 2012
More informationHur hanterar vi utmaningar inom området - Big Data. Jan Östling Enterprise Technologies Intel Corporation, NER
Hur hanterar vi utmaningar inom området - Big Data Jan Östling Enterprise Technologies Intel Corporation, NER Legal Disclaimers All products, computer systems, dates, and figures specified are preliminary
More informationIBM Big Data Platform
IBM Big Data Platform Turning big data into smarter decisions Stefan Söderlund. IBM kundarkitekt, Försvarsmakten Sesam vår-seminarie Big Data, Bigga byte kräver Pigga Hertz! May 16, 2013 By 2015, 80% of
More informationAddressing Open Source Big Data, Hadoop, and MapReduce limitations
Addressing Open Source Big Data, Hadoop, and MapReduce limitations 1 Agenda What is Big Data / Hadoop? Limitations of the existing hadoop distributions Going enterprise with Hadoop 2 How Big are Data?
More informationHow In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time
SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first
More informationDriving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA
WHITE PAPER April 2014 Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA Executive Summary...1 Background...2 File Systems Architecture...2 Network Architecture...3 IBM BigInsights...5
More informationHadoop: Embracing future hardware
Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop
More informationMassive Scale Analytics for a Smarter Planet
David Konopnicki - Haifa Research Lab Massive Scale Analytics for a Smarter Planet The Big Data Challenge Manage and benefit from massive and growing amounts of data 44x growth in coming decade from 800,000
More informationMaximizing Hadoop Performance and Storage Capacity with AltraHD TM
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created
More informationAn Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
More informationBIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
More informationGraySort and MinuteSort at Yahoo on Hadoop 0.23
GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters
More informationArchitecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
More informationLarge scale processing using Hadoop. Ján Vaňo
Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine
More informationKlarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance
Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice
More informationManifest for Big Data Pig, Hive & Jaql
Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,
More informationReal Time Big Data Processing
Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
More informationBeyond Watson: The Business Implications of Big Data
Beyond Watson: The Business Implications of Big Data Shankar Venkataraman IBM Program Director, STSM, Big Data August 10, 2011 The World is Changing and Becoming More INSTRUMENTED INTERCONNECTED INTELLIGENT
More informationAccelerating and Simplifying Apache
Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly
More informationEntering the Zettabyte Age Jeffrey Krone
Entering the Zettabyte Age Jeffrey Krone 1 Kilobyte 1,000 bits/byte. 1 megabyte 1,000,000 1 gigabyte 1,000,000,000 1 terabyte 1,000,000,000,000 1 petabyte 1,000,000,000,000,000 1 exabyte 1,000,000,000,000,000,000
More informationNews and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren
News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business
More informationRaul F. Chong Senior program manager Big data, DB2, and Cloud IM Cloud Computing Center of Competence - IBM Toronto Lab, Canada
What is big data? Raul F. Chong Senior program manager Big data, DB2, and Cloud IM Cloud Computing Center of Competence - IBM Toronto Lab, Canada 1 2011 IBM Corporation Agenda The world is changing What
More informationAre You Ready for Big Data?
Are You Ready for Big Data? Jim Gallo National Director, Business Analytics February 11, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?
More informationArchitectures for massive data management
Architectures for massive data management Apache Kafka, Samza, Storm Albert Bifet albert.bifet@telecom-paristech.fr October 20, 2015 Stream Engine Motivation Digital Universe EMC Digital Universe with
More informationSurfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics
Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Dr. Liangxiu Han Future Networks and Distributed Systems Group (FUNDS) School of Computing, Mathematics and Digital Technology,
More informationW H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
More informationIBM System x reference architecture solutions for big data
IBM System x reference architecture solutions for big data Easy-to-implement hardware, software and services for analyzing data at rest and data in motion Highlights Accelerates time-to-value with scalable,
More informationInformation Architecture
The Bloor Group Actian and The Big Data Information Architecture WHITE PAPER The Actian Big Data Information Architecture Actian and The Big Data Information Architecture Originally founded in 2005 to
More informationIBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS!
The Bloor Group IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS VENDOR PROFILE The IBM Big Data Landscape IBM can legitimately claim to have been involved in Big Data and to have a much broader
More informationData Refinery with Big Data Aspects
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data
More informationLecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop
Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social
More informationInfrastructure Matters: POWER8 vs. Xeon x86
Advisory Infrastructure Matters: POWER8 vs. Xeon x86 Executive Summary This report compares IBM s new POWER8-based scale-out Power System to Intel E5 v2 x86- based scale-out systems. A follow-on report
More informationPetabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013
Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics
More informationIBM InfoSphere BigInsights Enterprise Edition
IBM InfoSphere BigInsights Enterprise Edition Efficiently manage and mine big data for valuable insights Highlights Advanced analytics for structured, semi-structured and unstructured data Professional-grade
More informationHadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
More informationHadoop Cluster Applications
Hadoop Overview Data analytics has become a key element of the business decision process over the last decade. Classic reporting on a dataset stored in a database was sufficient until recently, but yesterday
More informationBig Data and Data Quality - Mutually Exclusive?
Session 11929 Big Data and Data Quality - Mutually Exclusive? Tom Deutsch tdeutsch@us.ibm.com Program Director, Big Data August 9, 2012 Abstract It is popular to think that Big Data technologies are so
More informationModernizing Your Data Warehouse for Hadoop
Modernizing Your Data Warehouse for Hadoop Big data. Small data. All data. Audie Wright, DW & Big Data Specialist Audie.Wright@Microsoft.com O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking
More informationOverview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics
Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)
More informationA REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information
More informationA Survey on Big Data Concepts and Tools
A Survey on Big Data Concepts and Tools D. Rajasekar 1, C. Dhanamani 2, S. K. Sandhya 3 1,3 PG Scholar, 2 Assistant Professor, Department of Computer Science and Engineering, Sri Krishna College of Engineering
More informationBig Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies
Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies 1 Copyright 2011, Oracle and/or its affiliates. All rights Big Data, Advanced Analytics:
More informationTurbo-Charging Open Source Hadoop for Faster, more Meaningful Insights
Turbo-Charging Open Source Hadoop for Faster, more Meaningful Insights Gord Sissons Senior Manager, Technical Marketing IM Platform Computing gsissons@ca.ibm.com Agenda Some Context IM Platform Computing
More informationDATA MINING WITH HADOOP AND HIVE Introduction to Architecture
DATA MINING WITH HADOOP AND HIVE Introduction to Architecture Dr. Wlodek Zadrozny (Most slides come from Prof. Akella s class in 2014) 2015-2025. Reproduction or usage prohibited without permission of
More informationHadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
More informationAre You Ready for Big Data?
Are You Ready for Big Data? Jim Gallo National Director, Business Analytics April 10, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationVirtualizing Apache Hadoop. June, 2012
June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING
More informationDeIC Watson Agreement - hvad betyder den for DeIC medlemmerne
DeIC Watson Agreement - hvad betyder den for DeIC medlemmerne Preben Jacobsen Solution Architect Nordic Lead, Software Defined Infrastructure Group IBM Danmark 2014 IBM Corporation Link: https://www.youtube.com/watch?v=_xcmh1lqb9i
More informationDell In-Memory Appliance for Cloudera Enterprise
Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert Armando_Acosta@Dell.com/
More informationUsing Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM
Using Big Data for Smarter Decision Making Colin White, BI Research July 2011 Sponsored by IBM USING BIG DATA FOR SMARTER DECISION MAKING To increase competitiveness, 83% of CIOs have visionary plans that
More informationHow To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory)
WHITE PAPER Oracle NoSQL Database and SanDisk Offer Cost-Effective Extreme Performance for Big Data 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Abstract... 3 What Is Big Data?...
More informationChapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
More informationRole of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,
More informationOracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>
s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline
More informationHADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW
HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW 757 Maleta Lane, Suite 201 Castle Rock, CO 80108 Brett Weninger, Managing Director brett.weninger@adurant.com Dave Smelker, Managing Principal dave.smelker@adurant.com
More informationDell Reference Configuration for Hortonworks Data Platform
Dell Reference Configuration for Hortonworks Data Platform A Quick Reference Configuration Guide Armando Acosta Hadoop Product Manager Dell Revolutionary Cloud and Big Data Group Kris Applegate Solution