Time-Series Databases and Machine Learning
|
|
|
- Collin Hubbard
- 9 years ago
- Views:
Transcription
1 Time-Series Databases and Machine Learning Jimmy Bates November
2 Top-Ranked Hadoop Read Write File System World Record Performance High Availability Enterprise-grade Security Distribution Ease of Data Ingestion Complete Data Protection Real Multi-tenancy Unbiased Open Source
3 Top-Ranked Hadoop Distribution Top-Ranked NoSQL 3
4 Top-Ranked NoSQL Real-Time Operational Analytics Performance Security Unified Data Access Scalability and Reliability Architectural and Administrative Simplicity
5 Top-Ranked Hadoop Distribution Top-Ranked NoSQL Top-Ranked SQL-on-Hadoop Solution 5
6 Top-Ranked SQL-on-Hadoop 1 The Most SQL Options on Hadoop Solution Including our work with Apache Drill 2 6
7 Agenda What is Time Series The Data Science User Resources for Getting Started A Basic Workflow A Quick Look at OpenTSDB 2 4 7
8 Time Series Data What is Time Series Data? Any set of data-points that have time associated with it. Examples of time series data include weather, web clickstream, product sale, stock trade, machine logs, fleets, sensors, devices, network traffic, system login 8
9 Value of Time Series Analytics Forecasting Exec Dashboards Seasonality Correlation Use Cases Cut Across Different Industries Trend Analysis IOT Sensor Data Operational Behavioral Analysis Preventative Maintenance Environmental Monitoring (Climate) Cell Tower Data Analysis Stock Trade Data Analysis 9
10 Typical Time Series Analytics Flow Data Sources Data Store Extract Prep Aggregate Reports Batch ETL Environment: IT teams/application Development groups/analytics groups Persona: Enterprise Architect, NoSQL Developer, Database developer, Analyst Task: Conduct analyses or provide reports to management 10
11 Typical Time Series Analytics Flow - Challenges Data Sources Data Store Extract Prep Aggregate Reports Batch Processing Larger Volumes of data from newer sources Expensive to store Takes a lot of time to process information and do simple aggregations Not Real-time 11
12 Challenges with Status Quo Existing systems aren t well suited to store high volumes of timeseries data Building aggregations and statistical computations from time-series data is currently done in batch Need: Cost effective and reliable way to store and analyze large amounts of time series data from various sources. Ability to deploy real-time dash-boarding and monitoring capabilities on aggregated data 12
13 Time Series Solution Example MapR Data Platform 13
14 MapR Data Exploration Advantages for IoT Self-Service Data Exploration Single SQL Interface for Structured and Semi-Structured Data Data Agility with Less IT Required 14
15 Squaring the Circle Enter Apache Drill Drill is SQL compliant Uses standard syntax and semantics Drill extends SQL First class treatment of objects, lists Full support for destructuring, flattening Full power of relational model can be applied to complex data 15
16 Drill Provides Scalable and Extended SQL 16
17 Introduction to Open TSDB HBase or MapR-DB 17
18 Speeding up OpenTSDB Why can t it be faster? 20,000 data points per second per node in the cluster 18
19 Speeding up OpenTSDB: open source MapR extensions Available on Github: 19
20 Speeding up OpenTSDB: open source MapR extensions Samples Message queue Collector MapR table Web service queries database and collector Buffering data for 1 hour in collector allows >1000x decrease in insertion rate Log Web service Users Logging latest hour of data allows clean restart of collector (lambda + epsilon architecture) 20
21 Direct Blob Loading for Testing Server Collector Server Collector Grafana OpenTSDB UI TSD TSD Scripts Blob Loader Blob Loader HBase or MapR DB 21
22 A first example: Time-series data 22
23 Column names as data When column names are not pre-defined, they can convey information Examples Time offsets within a window for time series Top-level domains for web crawlers Vendor id s for customer purchase profiles Predefined schema is impossible for this idiom 23
24 Relational Model for Time-series Time series ID Sample time value :51:03 15:52:07 15:52: Row key Data values 24
25 Table Design: Point-by-Point 25
26 Table Design: Hybrid Point-by-Point + Sub-table After close of window, data in row is restated as column-oriented tabular value in different column family. 26
27 Compression Results Samples are 64b time, 16 bit sample Sample time at 10kHz Sample time jitter makes it important to keep original time-stamp How much overhead to retain time-stamp? 27
28 Insertion Speeds Inserting pre-bundled data allows humongous data rates >100 M s/s on 4 nodes >200 M s/s on 8 nodes Linear scaling if enough data sources 28
29 How fast is OpenTSDB? OpenTSDB can scale to writing millions of data points per 'second' on commodity servers with regular spinning hard drives What if we needed x faster writing speed? 29
30 Problem: How Do Load Test Data at Large Scale? Testing at large scale is a more realistic measure of performance than on a small sample But with high velocity data, how do you set up test? For a sample equivalent to long term data, you need ingest rates of 100 to 1000x faster than normal production - OR- - You must wait years to load up test data. What s the solution? 30
31 The need for rapid data loading Let s say you have 1M samples / second Suppose you want to test your system Perhaps with a year of data And you want to load that data in << 1 year 100x real-time = 100M samples / second 31
32 32
33 Short Books by Ted Dunning & Ellen Friedman Published by O Reilly in 2014 and 2015 For sale from Amazon or O Reilly Free e-books currently available courtesy of MapR recommendationebook
34 Thank maprtech mapr-technologies MapRTechnologies maprtech 34
Information Builders Mission & Value Proposition
Value 10/06/2015 2015 MapR Technologies 2015 MapR Technologies 1 Information Builders Mission & Value Proposition Economies of Scale & Increasing Returns (Note: Not to be confused with diminishing returns
The Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
SQL on NoSQL (and all of the data) With Apache Drill
SQL on NoSQL (and all of the data) With Apache Drill Richard Shaw Solutions Architect @aggress Who What Where NoSQL DB Very Nice People Open Source Distributed Storage & Compute Platform (up to 1000s of
Datenverwaltung im Wandel - Building an Enterprise Data Hub with
Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees
BIG DATA FOR MEDIA SIGMA DATA SCIENCE GROUP MARCH 2ND, OSLO
BIG DATA FOR MEDIA SIGMA DATA SCIENCE GROUP MARCH 2ND, OSLO ANTHONY A. KALINDE SIGMA DATA SCIENCE GROUP ASSOCIATE "REALTIME BEHAVIOURAL DATA COLLECTION CLICKSTREAM EXAMPLE" WHAT IS CLICKSTREAM ANALYTICS?
MapR: Best Solution for Customer Success
2015 MapR Technologies 2015 MapR Technologies 1 MapR: Best Solution for Customer Success Best Product High Growth 700+ Customers Premier Investors Apache Open Source 2X 2X Growth In Direct Customers Growth
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
Big Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
HDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
Oracle Database 12c Plug In. Switch On. Get SMART.
Oracle Database 12c Plug In. Switch On. Get SMART. Duncan Harvey Head of Core Technology, Oracle EMEA March 2015 Safe Harbor Statement The following is intended to outline our general product direction.
Self-service BI for big data applications using Apache Drill
Self-service BI for big data applications using Apache Drill 2015 MapR Technologies 2015 MapR Technologies 1 Management - MCS MapR Data Platform for Hadoop and NoSQL APACHE HADOOP AND OSS ECOSYSTEM Batch
Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1
Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI
The Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM
QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QlikView Technical Case Study Series Big Data June 2012 qlikview.com Introduction This QlikView technical case study focuses on the QlikView deployment
Apache Hadoop in the Enterprise. Dr. Amr Awadallah, CTO/Founder @awadallah, [email protected]
Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, [email protected] Cloudera The Leader in Big Data Management Powered by Apache Hadoop The Leading Open Source Distribution of Apache
Integrating a Big Data Platform into Government:
Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government
INTRODUCTION TO CASSANDRA
INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open
SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON
SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON 2 The V of Big Data Velocity means both how fast data is being produced and how fast the data must be processed to meet demand. Gartner The emergence
Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA
Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,
From Spark to Ignition:
From Spark to Ignition: Fueling Your Business on Real-Time Analytics Eric Frenkiel, MemSQL CEO June 29, 2015 San Francisco, CA What s in Store For This Presentation? 1. MemSQL: A real-time database for
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data
Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru
Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop?
A Scalable Data Transformation Framework using the Hadoop Ecosystem
A Scalable Data Transformation Framework using the Hadoop Ecosystem Raj Nair Director Data Platform Kiru Pakkirisamy CTO AGENDA About Penton and Serendio Inc Data Processing at Penton PoC Use Case Functional
Data processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
Hadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
Real Time Big Data Processing
Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
2015 Analyst and Advisor Summit. Advanced Data Analytics Dr. Rod Fontecilla Vice President, Application Services, Chief Data Scientist
2015 Analyst and Advisor Summit Advanced Data Analytics Dr. Rod Fontecilla Vice President, Application Services, Chief Data Scientist Agenda Key Facts Offerings and Capabilities Case Studies When to Engage
The Inside Scoop on Hadoop
The Inside Scoop on Hadoop Orion Gebremedhin National Solutions Director BI & Big Data, Neudesic LLC. VTSP Microsoft Corp. [email protected] [email protected] @OrionGM The Inside Scoop
Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
Self-service BI for big data applications using Apache Drill
Self-service BI for big data applications using Apache Drill 2015 MapR Technologies 2015 MapR Technologies 1 Data Is Doubling Every Two Years Unstructured data will account for more than 80% of the data
Getting Real Real Time Data Integration Patterns and Architectures
Getting Real Real Time Data Integration Patterns and Architectures Nelson Petracek Senior Director, Enterprise Technology Architecture Informatica Digital Government Institute s Enterprise Architecture
Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014
Forecast of Big Data Trends Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Big Data transforms Business 2 Data created every minute Source http://mashable.com/2012/06/22/data-created-every-minute/
Saving Millions through Data Warehouse Offloading to Hadoop. Jack Norris, CMO MapR Technologies. MapR Technologies. All rights reserved.
Saving Millions through Data Warehouse Offloading to Hadoop Jack Norris, CMO MapR Technologies MapR Technologies. All rights reserved. MapR Technologies Overview Open, enterprise-grade distribution for
Tap into Hadoop and Other No SQL Sources
Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data
Towards Smart and Intelligent SDN Controller
Towards Smart and Intelligent SDN Controller - Through the Generic, Extensible, and Elastic Time Series Data Repository (TSDR) YuLing Chen, Dell Inc. Rajesh Narayanan, Dell Inc. Sharon Aicler, Cisco Systems
SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM
David Chappell SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM A PERSPECTIVE FOR SYSTEMS INTEGRATORS Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Business
Fast Innovation requires Fast IT
Fast Innovation requires Fast IT 2014 Cisco and/or its affiliates. All rights reserved. 2 2014 Cisco and/or its affiliates. All rights reserved. 3 IoT World Forum Architecture Committee 2013 Cisco and/or
OnX Big Data Reference Architecture
OnX Big Data Reference Architecture Knowledge is Power when it comes to Business Strategy The business landscape of decision-making is converging during a period in which: > Data is considered by most
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
Architecting for the Internet of Things & Big Data
Architecting for the Internet of Things & Big Data Robert Stackowiak, Oracle North America, VP Information Architecture & Big Data September 29, 2014 Safe Harbor Statement The following is intended to
Ganzheitliches Datenmanagement
Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist
BIG DATA ANALYTICS For REAL TIME SYSTEM
BIG DATA ANALYTICS For REAL TIME SYSTEM Where does big data come from? Big Data is often boiled down to three main varieties: Transactional data these include data from invoices, payment orders, storage
Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach
www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach Nic Caine NoSQL Matters, April 2013 Overview The Problem Current Big Data Analytics Relationship Analytics Leveraging
Tungsten Replicator, more open than ever!
Tungsten Replicator, more open than ever! MC Brown, Senior Product Line Manager September, 2015 2014 VMware Inc. All rights reserved. We Face An Age Old Problem BRS/Search 2 It s Gotten Worse 3 Much Worse
CitusDB Architecture for Real-Time Big Data
CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing
Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges
Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges James Campbell Corporate Systems Engineer HP Vertica [email protected] Big
Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer
Automated Data Ingestion Bernhard Disselhoff Enterprise Sales Engineer Agenda Pentaho Overview Templated dynamic ETL workflows Pentaho Data Integration (PDI) Use Cases Pentaho Overview Overview What we
BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata
BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING
Testing Big data is one of the biggest
Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing
Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world
Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3
Why Spark on Hadoop Matters
Why Spark on Hadoop Matters MC Srivas, CTO and Founder, MapR Technologies Apache Spark Summit - July 1, 2014 1 MapR Overview Top Ranked Exponential Growth 500+ Customers Cloud Leaders 3X bookings Q1 13
Trafodion Operational SQL-on-Hadoop
Trafodion Operational SQL-on-Hadoop SophiaConf 2015 Pierre Baudelle, HP EMEA TSC July 6 th, 2015 Hadoop workload profiles Operational Interactive Non-interactive Batch Real-time analytics Operational SQL
#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld
Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case
Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.
Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new
End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ
End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,
Big Data Management and Security
Big Data Management and Security Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value
Big Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
How To Use A Data Center With A Data Farm On A Microsoft Server On A Linux Server On An Ipad Or Ipad (Ortero) On A Cheap Computer (Orropera) On An Uniden (Orran)
Day with Development Master Class Big Data Management System DW & Big Data Global Leaders Program Jean-Pierre Dijcks Big Data Product Management Server Technologies Part 1 Part 2 Foundation and Architecture
Using distributed technologies to analyze Big Data
Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/
Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce
Microsoft Big Data Solutions. Anar Taghiyev P-TSP E-mail: [email protected];
Microsoft Big Data Solutions Anar Taghiyev P-TSP E-mail: [email protected]; Why/What is Big Data and Why Microsoft? Options of storage and big data processing in Microsoft Azure. Real Impact of Big
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
Sentimental Analysis using Hadoop Phase 2: Week 2
Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
BIG DATA TECHNOLOGY. Hadoop Ecosystem
BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big
Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013
Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache
Comparison of the Frontier Distributed Database Caching System with NoSQL Databases
Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Dave Dykstra [email protected] Fermilab is operated by the Fermi Research Alliance, LLC under contract No. DE-AC02-07CH11359
Big Data and Analytics in Government
Big Data and Analytics in Government Nov 29, 2012 Mark Johnson Director, Engineered Systems Program 2 Agenda What Big Data Is Government Big Data Use Cases Building a Complete Information Solution Conclusion
Big Data at Cloud Scale
Big Data at Cloud Scale Pushing the limits of flexible & powerful analytics Copyright 2015 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For
HDP Enabling the Modern Data Architecture
HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,
Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges
Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and
Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel
Big Data and Analytics: Getting Started with ArcGIS Mike Park Erik Hoel Agenda Overview of big data Distributed computation User experience Data management Big data What is it? Big Data is a loosely defined
HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics
HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015
Pulsar Realtime Analytics At Scale Tony Ng April 14, 2015 Big Data Trends Bigger data volumes More data sources DBs, logs, behavioral & business event streams, sensors Faster analysis Next day to hours
Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
The Top 10 7 Hadoop Patterns and Anti-patterns. Alex Holmes @
The Top 10 7 Hadoop Patterns and Anti-patterns Alex Holmes @ whoami Alex Holmes Software engineer Working on distributed systems for many years Hadoop since 2008 @grep_alex grepalex.com what s hadoop...
Time series IoT data ingestion into Cassandra using Kaa
Time series IoT data ingestion into Cassandra using Kaa Andrew Shvayka [email protected] Agenda Data ingestion challenges Why Kaa? Why Cassandra? Reference architecture overview Hands-on Sandbox
Implement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper
Offload Enterprise Data Warehouse (EDW) to Big Data Lake Oracle Exadata, Teradata, Netezza and SQL Server Ample White Paper EDW (Enterprise Data Warehouse) Offloads The EDW (Enterprise Data Warehouse)
How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5
Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark
The Future of Data Management with Hadoop and the Enterprise Data Hub
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees
Data Governance in the Hadoop Data Lake. Michael Lang May 2015
Data Governance in the Hadoop Data Lake Michael Lang May 2015 Introduction Product Manager for Teradata Loom Joined Teradata as part of acquisition of Revelytix, original developer of Loom VP of Sales
The Enterprise Data Hub and The Modern Information Architecture
The Enterprise Data Hub and The Modern Information Architecture Dr. Amr Awadallah CTO & Co-Founder, Cloudera Twitter: @awadallah 1 2013 Cloudera, Inc. All rights reserved. Cloudera Overview The Leader
Lustre Monitoring with OpenTSDB
Lustre Monitoring with OpenTSDB 2015/9/22 DataDirect Networks Japan, Inc. Shuichi Ihara 2 Lustre Monitoring Background Lustre is a black box Users and Administrators want to know what s going on? Find
Apache HBase. Crazy dances on the elephant back
Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage
Lecture 10: HBase! Claudia Hauff (Web Information Systems)! [email protected]
Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! [email protected] 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce
Analyzing Big Data with AWS
Analyzing Big Data with AWS Peter Sirota, General Manager, Amazon Elastic MapReduce @petersirota What is Big Data? Computer generated data Application server logs (web sites, games) Sensor data (weather,
Big Data With Hadoop
With Saurabh Singh [email protected] The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
Three Open Blueprints For Big Data Success
White Paper: Three Open Blueprints For Big Data Success Featuring Pentaho s Open Data Integration Platform Inside: Leverage open framework and open source Kickstart your efforts with repeatable blueprints
How to Enhance Traditional BI Architecture to Leverage Big Data
B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...
MADOCA II Data Logging System Using NoSQL Database for SPring-8
MADOCA II Data Logging System Using NoSQL Database for SPring-8 A.Yamashita and M.Kago SPring-8/JASRI, Japan NoSQL WED3O03 OR: How I Learned to Stop Worrying and Love Cassandra Outline SPring-8 logging
