Big Data Technology ดร.ช ชาต หฤไชยะศ กด. Choochart Haruechaiyasak, Ph.D.
|
|
- Winifred Fields
- 8 years ago
- Views:
Transcription
1 Big Data Technology ดร.ช ชาต หฤไชยะศ กด Choochart Haruechaiyasak, Ph.D. Speech and Audio Technology Laboratory (SPT) National Electronics and Computer Technology Center (NECTEC) National Science and Technology Development Agency (NSTDA)
2 Presentation outline Big Data Technology Overview, definition, motivation and properties BI vs. Data science Applications Big Data Tools Hadoop NoSQL MongoDB
3 What is Big Data?
4
5 Big Data: Motivation Source:
6 Structured vs. Unstructured Data Source:
7 Structured vs. Unstructured Data Source:
8 Big Data vs. Traditional Data Source:
9 Big Data vs. Traditional Data Source:
10 Big Data vs. Traditional Data Source:
11 Big Data vs. Traditional Data Source:
12 7 Key Drivers for the Big Data Market Source:
13
14
15 3Vs of Big Data
16 3Vs of Big Data
17
18 4Vs of Big Data
19 4Vs of Big Data
20 4Vs of Big Data
21 4Vs of Big Data
22 4Vs of Big Data
23 Where does Big Data come from? Source:
24 Gartner s Hype Cycle for Big Data chart
25 Big Data Business Model Maturity Chart Source:
26 Big Data Business Model Maturity Chart Source:
27 Big Data Business Model Maturity Chart Source: Bill Schmarzo, Big Data: Understanding How Data Powers Big Business
28 Difference between BI and Data Science Source: Bill Schmarzo, CTO, EMC Consulting, Enterprise Information Management & Analytics,
29 Difference between BI and Data Science Source: Bill Schmarzo, CTO, EMC Consulting, Enterprise Information Management & Analytics,
30 Difference between BI and Data Science Source: Bill Schmarzo, CTO, EMC Consulting, Enterprise Information Management & Analytics,
31 Data Scientist Source: Source:
32 Source:
33 Source:
34
35
36
37 What is Hadoop? Source: The content from this section is summarized from
38 Data is growing fast Three reasons why we are generating data faster than ever: (1) Processes are increasingly automated; (2) Systems are increasingly interconnected; (3) People are increasingly living online.
39 Data and system evolution
40 Real-time data analytics The continuous challenge in Web 2.0 is how to improve site relevance, performance, understand user behavior, and predictive insight to influence decisions. Industries - Travel, Retail, Financial Services, Digital Media, Search etc. that are consumer oriented are all facing similar real-time information dynamics.
41 What is Hadoop? Hadoop is a scalable fault-tolerant distributed system for data storage and processing (Apache license). Core Hadoop has two main systems: (1) Hadoop Distributed File System (HDFS): self-healing high-bandwidth clustered storage. (2) MapReduce: distributed faulttolerant resource management and scheduling coupled with a scalable data programming abstraction.
42 Hadoop: MapReduce
43 Hadoop's benefts Flexibility Store any data (structured or not), Run any analysis. Scalability Start at 1TB/3-nodes grow to PB/1000s of nodes. Economics Cost per TB at a fraction of traditional options.
44 Traditional DBMS Traditional relational databases and data warehouse products excel at OLAP and OLTP workloads over structured data. These form the underpinnings of most IT applications. Use relational databases when dealing with (1) Interactive OLAP Analytics; (2) Multistep ACID Transactions (3) 100% SQL Compliance. It is becoming increasingly more diffcult for classic techniques to support the wide range of use cases and workloads that power the next wave of digital business
45 Hadoop's approach Hadoop is designed to solve a different problem: the fast, reliable analysis of both structured, unstructured and complex data. Hadoop and related software are designed for 3V s: (1) Volume Commodity hardware and open source software lowers cost and increases capacity; (2) Velocity Data ingest speed aided by append-only and schema-on-read design; (3) Variety Multiple tools to structure, process, and access data.
46 Scenarios for Using Hadoop When a user types a query, it isn t practical to exhaustively scan millions of items. Instead it makes sense to create an index and use it to rank items and fnd the best matches. Hadoop provides a distributed indexing capability. Hadoop runs on a collection/cluster of commodity, shared-nothing x86 servers. You can add or remove servers in a Hadoop cluster (sizes from 50, 100 to even nodes) at will; the system detects and compensates for hardware or system problems on any server. Hadoop is self-healing and fault tolerant. It can deliver data and can run large-scale, high-performance processing batch jobs in spite of system changes or failures.
47 Scenarios for Using Hadoop (cont'd) 1) Hadoop as an ETL and Filtering Platform One of the biggest challenges with high volume data sources is extracting valuable signal from lot of noise. Hadoop platforms can read in the raw data, apply appropriate flters and logic, and output a structured summary or refned data set. This output (e.g., hourly index refreshes) can be further analyzed or serve as an input to a more traditional analytic environment like SAS. Typically a small % of a raw data feed is required for any business problem.
48 Scenarios for Using Hadoop (cont'd) 2) Hadoop as an exploration engine Once the data is in the MapReduce cluster, using tools to analyze data where it sits makes sense. As the refned output is in a Hadoop cluster, new data can be added to the existing pile without having to reindex all over again. In other words, new data can be added to existing data summaries. Once the data is distilled, it can be loaded into corporate systems so users have wider access to it.
49 Scenarios for Using Hadoop (cont'd) 3) Hadoop as an Archive Historical data is usually archived by tape or disk to secondary storage or sent offsite. When this data is needed for analysis, it s painful and costly to retrieve it and load it back up. With cheap storage in a distributed cluster, lot s of data can be kept active for continuous analysis. Hadoop is effcient it allows better utilization of hardware by allowing the generation of different index types in one cluster.
50 The Hadoop Stack Cloudera s Distribution for Hadoop (CDH)
51
52
53
54 Hadoop's Case Study:
55
56 Inverted index Source: searchkit_basics/searchkit_basics.html
57 Source: This slide is from Stanford course: CS 276 / LING 286: Information Retrieval and Web Search,
58 Source: This slide is from Stanford course: CS 276 / LING 286: Information Retrieval and Web Search,
59 Source: This slide is from Stanford course: CS 276 / LING 286: Information Retrieval and Web Search,
60 Source: This slide is from Stanford course: CS 276 / LING 286: Information Retrieval and Web Search,
61 Source: This slide is from Stanford course: CS 276 / LING 286: Information Retrieval and Web Search,
62 Source: This slide is from Stanford course: CS 276 / LING 286: Information Retrieval and Web Search,
63 Source: This slide is from Stanford course: CS 276 / LING 286: Information Retrieval and Web Search,
64 Source: This slide is from Stanford course: CS 276 / LING 286: Information Retrieval and Web Search,
65 Source: This slide is from Stanford course: CS 276 / LING 286: Information Retrieval and Web Search,
66 Source: This slide is from Stanford course: CS 276 / LING 286: Information Retrieval and Web Search,
67 Source: This slide is from Stanford course: CS 276 / LING 286: Information Retrieval and Web Search,
68 Source: This slide is from Stanford course: CS 276 / LING 286: Information Retrieval and Web Search,
69 Source: This slide is from Stanford course: CS 276 / LING 286: Information Retrieval and Web Search,
70 Hadoop: MapReduce
71 MapReduce: Google Distributed Indexing
72 Source: This slide is from Stanford course: CS 276 / LING 286: Information Retrieval and Web Search,
73 Handling Big Data with MongoDB
74 What is NoSQL?
75 RDBMS VS. NoSQL Database transaction properties: Atomic: Everything in a transaction succeeds or the entire transaction is rolled back. Consistent: A transaction cannot leave the database in an inconsistent state. Isolated: Transactions cannot interfere with each other. Durable: Completed transactions persist, even when servers restart etc. BASE Basically Available Soft-state services with Eventual-consistency.
76 Why is NoSQL becoming popular?
77 RDMBS VS NoSQL: an example
78 RDMBS VS NoSQL: an example
79 Source:
80
81
82
83
84
85
86 Google Trends
87
88
89 MongoDB: Key idea
90
91
92 MongoDB: Auto-sharding
93 RDBMS vs. MongoDB
94
95
96
97
98 Example 1. Create a Java Project 2. Get Mongo Java Driver
99 Example
100 Example
101 Example
102 Example
103 Example
104
105 Thank You Q&A
Apache Hadoop: Past, Present, and Future
The 4 th China Cloud Computing Conference May 25 th, 2012. Apache Hadoop: Past, Present, and Future Dr. Amr Awadallah Founder, Chief Technical Officer aaa@cloudera.com, twitter: @awadallah Hadoop Past
More informationBIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
More informationHADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics
HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop
More informationWell packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances
INSIGHT Oracle's All- Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA
More informationTap into Hadoop and Other No SQL Sources
Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data
More informationManaging Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
More informationBig + Fast + Safe + Simple = Lowest Technical Risk
Big + Fast + Safe + Simple = Lowest Technical Risk The Synergy of Greenplum and Isilon Architecture in HP Environments Steffen Thuemmel (Isilon) Andreas Scherbaum (Greenplum) 1 Our problem 2 What is Big
More informationTHE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS
THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS WHITE PAPER Successfully writing Fast Data applications to manage data generated from mobile, smart devices and social interactions, and the
More informationBig Data: Are You Ready? Kevin Lancaster
Big Data: Are You Ready? Kevin Lancaster Director, Engineered Systems Oracle Europe, Middle East & Africa 1 A Data Explosion... Traditional Data Sources Billing engines Custom developed New, Non-Traditional
More informationExecutive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...
Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data
More informationAnalytics in the Cloud. Peter Sirota, GM Elastic MapReduce
Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of
More informationHadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
More informationESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
More informationCan the Elephants Handle the NoSQL Onslaught?
Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented
More informationINTRODUCTION TO CASSANDRA
INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open
More informationThe Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
More informationApache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah
Apache Hadoop: The Pla/orm for Big Data Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah 1 The Problems with Current Data Systems BI Reports + Interac7ve Apps RDBMS (aggregated
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
More informationBig Data and Apache Hadoop Adoption:
Expert Reference Series of White Papers Big Data and Apache Hadoop Adoption: Key Challenges and Rewards 1-800-COURSES www.globalknowledge.com Big Data and Apache Hadoop Adoption: Key Challenges and Rewards
More informationNoSQL for SQL Professionals William McKnight
NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to
More informationOracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>
s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline
More informationApache Hadoop in the Enterprise. Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com
Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com Cloudera The Leader in Big Data Management Powered by Apache Hadoop The Leading Open Source Distribution of Apache
More informationHadoop: Distributed Data Processing. Amr Awadallah Founder/CTO, Cloudera, Inc. ACM Data Mining SIG Thursday, January 25 th, 2010
Hadoop: Distributed Data Processing Amr Awadallah Founder/CTO, Cloudera, Inc. ACM Data Mining SIG Thursday, January 25 th, 2010 Outline Scaling for Large Data Processing What is Hadoop? HDFS and MapReduce
More informationBig Data Open Source Stack vs. Traditional Stack for BI and Analytics
Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at spoozhikala@stratapps.com.
More informationBig Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
More informationTapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru
Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop?
More informationHow To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI
More informationBIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
More informationAn Integrated Big Data & Analytics Infrastructure June 14, 2012 Robert Stackowiak, VP Oracle ESG Data Systems Architecture
An Integrated Big Data & Analytics Infrastructure June 14, 2012 Robert Stackowiak, VP ESG Data Systems Architecture Big Data & Analytics as a Service Components Unstructured Data / Sparse Data of Value
More informationThe Inside Scoop on Hadoop
The Inside Scoop on Hadoop Orion Gebremedhin National Solutions Director BI & Big Data, Neudesic LLC. VTSP Microsoft Corp. Orion.Gebremedhin@Neudesic.COM B-orgebr@Microsoft.com @OrionGM The Inside Scoop
More informationBig Data on Microsoft Platform
Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4
More informationHow Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns
How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns Table of Contents Abstract... 3 Introduction... 3 Definition... 3 The Expanding Digitization
More information#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld
Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case
More informationBig Data: Beyond the Hype
Big Data: Beyond the Hype Why Big Data Matters to You WHITE PAPER By DataStax Corporation March 2012 Contents Introduction... 3 Big Data and You... 5 Big Data Is More Prevalent Than You Think... 5 Big
More informationProtecting Big Data Data Protection Solutions for the Business Data Lake
White Paper Protecting Big Data Data Protection Solutions for the Business Data Lake Abstract Big Data use cases are maturing and customers are using Big Data to improve top and bottom line revenues. With
More informationHow To Use Big Data For Telco (For A Telco)
ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call
More informationMaking Sense of Big Data in Insurance
Making Sense of Big Data in Insurance Amir Halfon, CTO, Financial Services, MarkLogic Corporation BIG DATA?.. SLIDE: 2 The Evolution of Data Management For your application data! Application- and hardware-specific
More informationThe Enterprise Data Hub and The Modern Information Architecture
The Enterprise Data Hub and The Modern Information Architecture Dr. Amr Awadallah CTO & Co-Founder, Cloudera Twitter: @awadallah 1 2013 Cloudera, Inc. All rights reserved. Cloudera Overview The Leader
More informationForecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014
Forecast of Big Data Trends Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Big Data transforms Business 2 Data created every minute Source http://mashable.com/2012/06/22/data-created-every-minute/
More informationIntroduction to Hadoop. New York Oracle User Group Vikas Sawhney
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
More informationProact whitepaper on Big Data
Proact whitepaper on Big Data Summary Big Data is not a definite term. Even if it sounds like just another buzz word, it manifests some interesting opportunities for organisations with the skill, resources
More informationThe 3 questions to ask yourself about BIG DATA
The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.
More informationX4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released
General announcements In-Memory is available next month http://www.oracle.com/us/corporate/events/dbim/index.html X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released
More informationArchitecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
More informationData Warehouse design
Data Warehouse design Design of Enterprise Systems University of Pavia 10/12/2013 2h for the first; 2h for hadoop - 1- Table of Contents Big Data Overview Big Data DW & BI Big Data Market Hadoop & Mahout
More informationBig Data: Beyond the Hype
Big Data: Beyond the Hype Why Big Data Matters to You WHITE PAPER Big Data: Beyond the Hype Why Big Data Matters to You By DataStax Corporation October 2011 Table of Contents Introduction...4 Big Data
More informationChukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics
More informationBusiness Intelligence for Big Data
Business Intelligence for Big Data Will Gorman, Vice President, Engineering May, 2011 2010, Pentaho. All Rights Reserved. www.pentaho.com. What is BI? Business Intelligence = reports, dashboards, analysis,
More informationBig Data Big Data/Data Analytics & Software Development
Big Data Big Data/Data Analytics & Software Development Danairat T. danairat@gmail.com, 081-559-1446 1 Agenda Big Data Overview Business Cases and Benefits Hadoop Technology Architecture Big Data Development
More informationHow To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
More informationAn Approach to Implement Map Reduce with NoSQL Databases
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh
More informationImplement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
More informationAn Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics
An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,
More informationArchitectures for Big Data Analytics A database perspective
Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum
More informationSimplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!
Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!! Streaming Analy.cs S S S Scale- up Database Data And Compute Grid
More informationDell In-Memory Appliance for Cloudera Enterprise
Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert Armando_Acosta@Dell.com/
More informationOne Size Doesn t Fit All Choosing which big data, NoSQL or database technology to use
One Size Doesn t Fit All Choosing which big data, NoSQL or database technology to use March 14, 2012 Mark R. Madsen http://thirdnature.net The problem of big is three problems of volume Computations! Amount
More informationREAL-TIME BIG DATA ANALYTICS
www.leanxcale.com info@leanxcale.com REAL-TIME BIG DATA ANALYTICS Blending Transactional and Analytical Processing Delivers Real-Time Big Data Analytics 2 ULTRA-SCALABLE FULL ACID FULL SQL DATABASE LeanXcale
More informationAGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW
AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this
More informationHigh Performance IT Insights. Building the Foundation for Big Data
High Performance IT Insights Building the Foundation for Big Data Page 2 For years, companies have been contending with a rapidly rising tide of data that needs to be captured, stored and used by the business.
More informationBig Data. Lyle Ungar, University of Pennsylvania
Big Data Big data will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. McKinsey Data Scientist: The Sexiest Job of the 21st Century -
More informationINTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
More informationBeyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations
Beyond Lambda - how to get from logical to physical Artur Borycki, Director International Technology & Innovations Simplification & Efficiency Teradata believe in the principles of self-service, automation
More informationBig Data Explained. An introduction to Big Data Science.
Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of
More informationBig Fast Data Hadoop acceleration with Flash. June 2013
Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional
More informationBig Data Are You Ready? Thomas Kyte http://asktom.oracle.com
Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated
More informationHadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard
Hadoop and Relational base The Best of Both Worlds for Analytics Greg Battas Hewlett Packard The Evolution of Analytics Mainframe EDW Proprietary MPP Unix SMP MPP Appliance Hadoop? Questions Is Hadoop
More informationBIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data
More informationW H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
More informationCIO Guide How to Use Hadoop with Your SAP Software Landscape
SAP Solutions CIO Guide How to Use with Your SAP Software Landscape February 2013 Table of Contents 3 Executive Summary 4 Introduction and Scope 6 Big Data: A Definition A Conventional Disk-Based RDBMs
More informationAn Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise
An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise Solutions Group The following is intended to outline our
More informationA REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information
More informationSplice Machine: SQL-on-Hadoop Evaluation Guide www.splicemachine.com
REPORT Splice Machine: SQL-on-Hadoop Evaluation Guide www.splicemachine.com The content of this evaluation guide, including the ideas and concepts contained within, are the property of Splice Machine,
More informationBig Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014
White Paper Big Data Executive Overview WP-BD-10312014-01 By Jafar Shunnar & Dan Raver Page 1 Last Updated 11-10-2014 Table of Contents Section 01 Big Data Facts Page 3-4 Section 02 What is Big Data? Page
More informationA Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel
A Next-Generation Analytics Ecosystem for Big Data Colin White, BI Research September 2012 Sponsored by ParAccel BIG DATA IS BIG NEWS The value of big data lies in the business analytics that can be generated
More informationBig Data Challenges. Alexandru Adrian TOLE Romanian American University, Bucharest, Romania adrian.tole@yahoo.com
Database Systems Journal vol. IV, no. 3/2013 31 Big Data Challenges Alexandru Adrian TOLE Romanian American University, Bucharest, Romania adrian.tole@yahoo.com The amount of data that is traveling across
More informationHow to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
More informationBIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
More informationIntroduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
More informationAssociate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
More informationHadoop implementation of MapReduce computational model. Ján Vaňo
Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed
More informationNavigating the Big Data infrastructure layer Helena Schwenk
mwd a d v i s o r s Navigating the Big Data infrastructure layer Helena Schwenk A special report prepared for Actuate May 2013 This report is the second in a series of four and focuses principally on explaining
More informationSQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford
SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems
More informationDeploying an Operational Data Store Designed for Big Data
Deploying an Operational Data Store Designed for Big Data A fast, secure, and scalable data staging environment with no data volume or variety constraints Sponsored by: Version: 102 Table of Contents Introduction
More informationBig Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum
Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All
More informationConjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect
Matteo Migliavacca (mm53@kent) School of Computing Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect Simple past - Traditional
More informationAn Oracle White Paper June 2013. Oracle: Big Data for the Enterprise
An Oracle White Paper June 2013 Oracle: Big Data for the Enterprise Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure
More informationBeyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc.
Beyond Web Application Log Analysis using Apache TM Hadoop A Whitepaper by Orzota, Inc. 1 Web Applications As more and more software moves to a Software as a Service (SaaS) model, the web application has
More informationOpen source large scale distributed data management with Google s MapReduce and Bigtable
Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory
More informationLambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
More informationReport Data Management in the Cloud: Limitations and Opportunities
Report Data Management in the Cloud: Limitations and Opportunities Article by Daniel J. Abadi [1] Report by Lukas Probst January 4, 2013 In this report I want to summarize Daniel J. Abadi's article [1]
More informationEvaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing
Evaluating NoSQL for Enterprise Applications Dirk Bartels VP Strategy & Marketing Agenda The Real Time Enterprise The Data Gold Rush Managing The Data Tsunami Analytics and Data Case Studies Where to go
More informationBig Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
More informationBig Data: Beyond the Hype. Why Big Data Matters to You. White Paper
Big Data: Beyond the Hype Why Big Data Matters to You White Paper BY DATASTAX CORPORATION October 2013 Table of Contents Abstract 3 Introduction 3 Big Data and You 5 Big Data Is More Prevalent Than You
More informationOracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
More informationLarge scale processing using Hadoop. Ján Vaňo
Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine
More informationSOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce
More informationHadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
More informationUbuntu and Hadoop: the perfect match
WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely
More informationThe Future of Data Management with Hadoop and the Enterprise Data Hub
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees
More information