A SMART ELEPHANT FOR A SMART-GRID: (ELECTRICAL) TIME-SERIES STORAGE AND ANALYTICS. EDF R&D SIGMA Project Marie-Luce Picard
|
|
|
- Rosamund Dalton
- 10 years ago
- Views:
Transcription
1 A SMART ELEPHANT FOR A SMART-GRID: (ELECTRICAL) TIME-SERIES STORAGE AND ANALYTICS WITHIN HADOOP EDF R&D SIGMA Project Marie-Luce Picard Forum TERATEC June 26th 2013
2 OUTLINE 1. CONTEXT 2. A PROOF OF CONCEPT USING HADOOP 3. ADVANCED ANALYTICS 4. CONCLUSION AND PERSPECTIVES 2
3 OUTLINE 1. CONTEXT 2. A PROOF OF CONCEPT USING HADOOP 3. ADVANCED ANALYTICS 4. CONCLUSION AND PERSPECTIVES 3
4 SMART GRIDS SMART METERS SMART DATA Motivated by economic or regulatory constraints, or environmental needs, smart-grid projects are spreading. New perspectives emerge for energy management as electric vehicles, and renewable energy generation will expand. A large amount of smart-meters and sensors will be deployed and they will provoke a data deluge utilities will have to face. 4
5 SMART METERING: A DATA DELUGE! French project: 35+ millions of Smart Meters, billions of meteringdata Currently, pilot program with300k smart meters deployed 5
6 SMART METERING DATA: WHAT DOES A LOAD CURVE LOOK LIKE? 6
7 SMART METERING DATA: WHAT DOES A LOAD CURVE LOOK LIKE? 7
8 SMART METERING DATA: WHAT DOES A LOAD CURVE LOOK LIKE? Individual load curves: -Left: same customer, two different days -Up: same day, two different customers 8
9 MASSIVE DATA MANAGEMENT IN THE ENERGY DOMAIN Challenges: More complexity in the electric power system (distributed generation, demand response ) Fastercomplexity of customer indoor equipment (smart meters and devices, Internet Of Things ) Core business will involve more IT and data management 9
10 OUTLINE 1. CONTEXT 2. A PROOF OF CONCEPT USING HADOOP 3. ADVANCED ANALYTICS 4. CONCLUSION AND PERSPECTIVES 10
11 MASSIVE TIME-SERIES STORAGE Objective: Proof Of Concept for running a large number of queries (variable levels of complexity with variable scopes and frequencies, variable acceptable latencies) on a huge number of load curves Data: individual load curves, weather data, contractual information, network data 1 measure every 10 mnfor 35 million customers Annual data volume: 1800 billion records, 120 TB of raw data Operational DataWareHouse able to: Supply a large volume of data Ingest new incoming data (pre-processing) Allow concurrent and simultaneous queries: tactical queries, analytical queries, adhoc queries Evaluation criteria: competition and performances (QoS, SLA), convergence and agility Traditional VLDB as well as Hadoopas a solution 11
12 USING HADOOP FOR STORING MASSIVE TIME-SERIES HADOOP: Native distributed file system(hdfs) Distributed processing using the MapReduce paradigm A large and fast evolving ecosystem Use cases: mainly on unstructured data Divide and Conquer 12
13 POC HADOOP: THE DATA MODEL Compressed data Volume on HDFS : 10 TB (x3) 13
14 DATA GENERATOR COURBOGEN Courbogen can generate massive datasets (not too statistical realistic) Generates load curves and associated data Customizable tool: interval, duration, data quality, noise on the curves Distributed architecture (NodeJS, Redis) Output as a data stream One week data for 35M curves 14
15 DESIGN R&D 20 nodes cluster, 336 cores All data stored in HDFS Use of HIVE for analytical queries (11 months of data) Use of HBASE at the forefront of data access (tactical queries) (3 months of data) 15
16 POC HADOOP: DATA UPLOAD Daily data Raw data: 327 Go; HDFS data: 50 Go, Hive/HBase: Go Initial upload : 15 minutes Hadoop component Representationmodels Total volume of data (compressed) Time for uploading and preparing the data HIVE Tuplemode 10*3=30To 2h30,35 foruploading HBASE Array(dailydata) 3*3 =9To 8 minutes (only upload, aggregated data is computed in Hive). TOTAL 39 To 2H 38 16
17 POC HADOOP: RESULTS Overall GOOD results; for some types of queries, competitivewith traditional approaches Tactical queries are successful handled by HBase, offeringlow latencies under a high concurrent load Analytical queries are processed by Hive Analytical queries on HIVE 11 months of data, 35 millions of curves, 10 minutes interval Partitioning: day/tariff/power Tuple representation Query Execution time (1 day) Execution time (7 days) GlobalFranceaggregatedcurve 10mn 1 min, 56 sec 19 min, 24 sec Aggregatedcurveby tariffandpower 2 min, 21 sec 24 min, 33 sec Average consomption by building type (ad-hoc) 1 heure, 18 sec 3 heures, 16 min, 50 sec 17
18 POC HADOOP: RESULTS Tactical queries using HBase 3 months of data, 35 millions of curves, 10 minutes interval 500 concurrent queries, SLAs asked to be respected for 90% of queries Array representation Durée Pas de temps NB requêtes simultanées demandées / obtenu SLA demandé / obtenu par requête en sec + taux de réussite obtenu Ecart typeen sec Nombre de requêtes effectuées / sec Jour 10min < 1 0,57 86,53 1,77 210,98 30min < 3 0,68 93,54 1,98 73,83 Semaine 10min < 2 2,11 88,36 9,16 45,76 horaire < 7 2,89 92,24 11,43 13,83 Mois horaire < 7 2,45 94,09 11,06 13,09 jour 8 8 < 10 1,68 94,67 7,35 4,79 Trimestre jour 8 8 < 10 2,13 94,33 9,12 3,77 hebdo 2 2 < 15 2,13 94,92 8,98 0,95 18
19 POC HADOOP: TIME SERIES REPRESENTATION MODELS 19
20 POC HADOOP: VISUALIZATION Hadoop / Tableau Software : Synchrones par RE 20
21 OUTLINE 1. CONTEXT 2. A PROOF OF CONCEPT USING HADOOP 3. ADVANCED ANALYTICS 4. CONCLUSION AND PERSPECTIVES 21
22 ADVANCED ANALYTICS WITH HADOOP Strong Needs for advanced analytics on smart metering data: Segmentation based on load curves, Forecasting on local areas, Scoring for Non Technical Losses, pattern recognition within load curves, predictive modeling, Implementing classical data-mining tasks vs home-made methods Use of existing librairiesand toolkits vsdeveloping HadoopOn Time-series toolkit 22
23 EXISTING ANALYTICS TOOLKIT Rhadoop 3 R packages for analyzing data stored within Hadoopfrom R (rmr2,rhdfs,rhbase) You need to re-implement methods using the rmr2 package Small users community Linked to Revolution Mahout A Java toolkit with data analysis methods implemented using MapReduce paradigm Apache foundation ( Large users community Evolving (and growing) very fast 23
24 POC HADOOP: DATA CLUSTERING Examples of tests runon a data sets of 35 millions of curves, one measurementby curve (daily average consumption) Script Rhadoop(execution time: 3.12 heures) > tb_input_kmeans = mapreduce('/tmp/tb_kmeans_r.csv', input.format = make.input.format('csv', sep=','), structured = T, vectorized = T, map = function(k, v) keyval( v$v1 %% 35000, v, vectorized = T), reduce = function(k,vv) keyval(k, vv, vectorized = F), backend.parameters = list(hadoop = list(d ="mapred.reduce.tasks=200",d="mapred.map.tasks=200")), verbose=t ) > kmeans(tb_input_kmeans, ncenters = 20, iterations = 15, fast = T) Script Mahout (execution time: 17 minutes) $ mahout org.apache.mahout.clustering.conversion.inputdriver --input /user/hive/warehouse/sigma.db/tb_input_kmeans --output /user/sigma/outputvector/ $ mahout kmeans --input /user/sigma/outputvector/ -c clusters -k 20 --output /user/sigma/output -dm org.apache.mahout.common.distance.euclideandistancemeasure --maxiter 15 --overwrite --clustering 24
25 FROST : A LIBRARY FOR TIME SERIES TRANSFORMATION WITHIN HIVE Analytics and Data Mining on time series : curse of dimensionnality FROST is a set of user defined functions in HIVE enabling simplification of time series FROST.JAR (First Release Of time Series Toolkit) Example : ADD JAR FROST.JAR; SELECT ID,SAX(POWER,8,3) FROM BIG.DATA GROUP BY DAY; Other functions in FROST : PAA : Piecewise Aggregate Approximation DFT : Discret Fourier Transform DWT : DiscretWavelet Transform and other «home made» methods SAX principle (Symbolic Aggregate approximation) 25
26 SEARCHING TIME SERIES WITH HADOOP Objective: searching similar time-series within a huge sets of series Top-K or range queries based on a similarity measure Jumping or sliding windows Brutal force of distributed computations in the Hadoopenvironment: use of UDF Hive functions 26
27 SEARCHING TIME SERIES WITH HADOOP Data: 35 millions curves, 30 days Top-5 (or top-500), with UDAFfunctions on jumping windows ~4 mn45s 27
28 OUTLINE 1. CONTEXT 2. A PROOF OF CONCEPT USING HADOOP 3. ADVANCED ANALYTICS 4. CONCLUSION AND PERSPECTIVES 28
29 POC HADOOP : CONCLUSIONS Hadoopas an alternativeapproach for storing and analyzing massive time-series data (good results, some of them competitive for querying, ability to implement advanced analytics) Is Hadoopenterprise-ready? (+) : costs, commodity hardware, flexible, scalability, structured and non structured data, interoperability (-): skills (high complexity), maturity (security, operating, SQL-compliant) Recommendation: should be limited to specific (non critical) usages such as Big ETL, unstructured data, BI subset, Big sand-box and analytical warehouse 29
30 POC HADOOP : PERSPECTIVES Hadoop for unstructured data Text-mining applications Using Mahout library (more efficient, updatable) to implement text-mining functions (Lucent) Needs tuning (format, stop words, ) and integration (text + clustering, results visualization) Currently tested on tweets Hadoopand HPC Impact on the location of data (Hadooprecommendations: data nodes should be close to the processing nodes) Installation of Hadoop ecosystem on an HPC cluster Impact on performances (r/w) NetCDFfiles (coming from simulation codes) ingested by Hadoop: SciHadoop: Hadoop plug-in for NetCdf data sets Work in progress 30
31 REFERENCES A proof of concept with Hadoop: storage and analytics of electrical time-series. Marie-Luce Picard, Bruno Jacquin, Hadoop Summit 2012, Californie, USA, présentation : vidéo: Massive Smart Meter Data Storage and Processing on top of Hadoop. Leeley D. P. dos Santos, Alzennyr G. da Silva, Bruno Jacquin, Marie-Luce Picard, David Worms,Charles Bernard. Workshop Big Data 2012, Conférence VLDB (Very Large Data Bases), Istanbul, Turquie, Smart Metering x Hadoop x Frost: A Smart Elephant Enabling Massive Time Series Analysis. Benoît Grossin, Marie-Luce Picard, Hadoop Summit Europe 2013, Amsterdam, Mars Acknowledgments - Work achieved with: Alice Bérard, Charles Bernard, Leeley Daio-Pires-Dos-Santos, AlzennyrGomes DaSilva, BenoîtGrossin, Georges Hébrail, Bruno Jacquin, JiannanLiu, Vincent Nicolas, David Worms 31
Smart Grid Data Management Challenges
Smart Grid Data Management Challenges Marie-Luce PICARD EDF R&D [email protected] 16 Novembre 2010 Outline 1. Smart grids : What? Where? What for? 2. A road map for Smart Grids functionalities and
Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services
Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the
INTEGRATING R AND HADOOP FOR BIG DATA ANALYSIS
INTEGRATING R AND HADOOP FOR BIG DATA ANALYSIS Bogdan Oancea "Nicolae Titulescu" University of Bucharest Raluca Mariana Dragoescu The Bucharest University of Economic Studies, BIG DATA The term big data
Advanced Big Data Analytics with R and Hadoop
REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional
How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges
Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges James Campbell Corporate Systems Engineer HP Vertica [email protected] Big
Using distributed technologies to analyze Big Data
Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/
Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software?
Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software? 可 以 跟 資 料 庫 結 合 嘛? Can Hadoop work with Databases? 開 發 者 們 有 聽 到
BIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to
Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013
Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the
SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse
SQL Server 2012 PDW Ryan Simpson Technical Solution Professional PDW Microsoft Microsoft SQL Server 2012 Parallel Data Warehouse Massively Parallel Processing Platform Delivers Big Data HDFS Delivers Scale
Integrating a Big Data Platform into Government:
Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government
Cisco IT Hadoop Journey
Cisco IT Hadoop Journey Srini Desikan, Program Manager IT 2015 MapR Technologies 1 Agenda Hadoop Platform Timeline Key Decisions / Lessons Learnt Data Lake Hadoop s place in IT Data Platforms Use Cases
BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014
BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 Ralph Kimball Associates 2014 The Data Warehouse Mission Identify all possible enterprise data assets Select those assets
Big Data Analytics - Accelerated. stream-horizon.com
Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based
Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
Are You Ready for Big Data?
Are You Ready for Big Data? Jim Gallo National Director, Business Analytics February 11, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?
Workshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
HPC technology and future architecture
HPC technology and future architecture Visual Analysis for Extremely Large-Scale Scientific Computing KGT2 Internal Meeting INRIA France Benoit Lange [email protected] Toàn Nguyên [email protected]
Agile Business Intelligence Data Lake Architecture
Agile Business Intelligence Data Lake Architecture TABLE OF CONTENTS Introduction... 2 Data Lake Architecture... 2 Step 1 Extract From Source Data... 5 Step 2 Register And Catalogue Data Sets... 5 Step
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data
Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce
Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of
BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP
BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP Business Analytics for All Amsterdam - 2015 Value of Big Data is Being Recognized Executives beginning to see the path from data insights to revenue
You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.
What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees
How Companies are! Using Spark
How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made
W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
I/O Considerations in Big Data Analytics
Library of Congress I/O Considerations in Big Data Analytics 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division 1 Paradigms in Big Data Structured (relational) data Very
Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum
Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
Hadoop: Distributed Data Processing. Amr Awadallah Founder/CTO, Cloudera, Inc. ACM Data Mining SIG Thursday, January 25 th, 2010
Hadoop: Distributed Data Processing Amr Awadallah Founder/CTO, Cloudera, Inc. ACM Data Mining SIG Thursday, January 25 th, 2010 Outline Scaling for Large Data Processing What is Hadoop? HDFS and MapReduce
Oracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
Cisco IT Hadoop Journey
Cisco IT Hadoop Journey Alex Garbarini, IT Engineer, Cisco 2015 MapR Technologies 1 Agenda Hadoop Platform Timeline Key Decisions / Lessons Learnt Data Lake Hadoop s place in IT Data Platforms Use Cases
Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
ITG Software Engineering
Introduction to Apache Hadoop Course ID: Page 1 Last Updated 12/15/2014 Introduction to Apache Hadoop Course Overview: This 5 day course introduces the student to the Hadoop architecture, file system,
The Next Wave of Data Management. Is Big Data The New Normal?
The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management
Hadoop Big Data for Processing Data and Performing Workload
Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer
Big Data Big Data/Data Analytics & Software Development
Big Data Big Data/Data Analytics & Software Development Danairat T. [email protected], 081-559-1446 1 Agenda Big Data Overview Business Cases and Benefits Hadoop Technology Architecture Big Data Development
Real Time Big Data Processing
Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
Performance Management in Big Data Applica6ons. Michael Kopp, Technology Strategist @mikopp
Performance Management in Big Data Applica6ons Michael Kopp, Technology Strategist NoSQL: High Volume/Low Latency DBs Web Java Key Challenges 1) Even Distribu6on 2) Correct Schema and Access paperns 3)
A very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect
A very short talk about Apache Kylin Business Intelligence meets Big Data Fabian Wilckens EMEA Solutions Architect 1 The challenge today 2 Very quickly: OLAP Online Analytical Processing How many beers
Modern Data Architecture for Predictive Analytics
Modern Data Architecture for Predictive Analytics David Smith VP Marketing and Community - Revolution Analytics John Kreisa VP Strategic Marketing- Hortonworks Hortonworks Inc. 2013 Page 1 Your Presenters
Big Data on Microsoft Platform
Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce
Apache Kylin Introduction Dec 8, 2014 @ApacheKylin
Apache Kylin Introduction Dec 8, 2014 @ApacheKylin Luke Han Sr. Product Manager [email protected] @lukehq Yang Li Architect & Tech Leader [email protected] Agenda What s Apache Kylin? Tech Highlights Performance
Big Data Open Source Stack vs. Traditional Stack for BI and Analytics
Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at [email protected].
Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP
Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools
Where is... How do I get to...
Big Data, Fast Data, Spatial Data Making Sense of Location Data in a Smart City Hans Viehmann Product Manager EMEA ORACLE Corporation August 19, 2015 Copyright 2014, Oracle and/or its affiliates. All rights
Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp
Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp Agenda Hadoop and storage Alternative storage architecture for Hadoop Use cases and customer examples
Big Data Explained. An introduction to Big Data Science.
Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of
Hadoop in Social Network Analysis - overview on tools and some best practices - Headline Goes Here
Hadoop in Social Network Analysis - overview on tools and some best practices - Headline Goes Here Speaker Name or Subhead Goes Here GridKa School 2013, Karlsruhe 2013-08-27 Mirko Kämpf [email protected]
Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.
Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new
How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns
How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns Table of Contents Abstract... 3 Introduction... 3 Definition... 3 The Expanding Digitization
ANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
Implement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
Datenverwaltung im Wandel - Building an Enterprise Data Hub with
Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees
Business Intelligence for Big Data
Business Intelligence for Big Data Will Gorman, Vice President, Engineering May, 2011 2010, Pentaho. All Rights Reserved. www.pentaho.com. What is BI? Business Intelligence = reports, dashboards, analysis,
Ganzheitliches Datenmanagement
Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist
A Brief Introduction to Apache Tez
A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value
Are You Ready for Big Data?
Are You Ready for Big Data? Jim Gallo National Director, Business Analytics April 10, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?
BIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project
European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project Janet Delve, University of Portsmouth Kuldar Aas, National Archives of Estonia Rainer Schmidt, Austrian Institute
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON
SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON 2 The V of Big Data Velocity means both how fast data is being produced and how fast the data must be processed to meet demand. Gartner The emergence
Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,[email protected]
Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,[email protected] Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI
Advanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
Transforming the Telecoms Business using Big Data and Analytics
Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe
Hadoop and its Usage at Facebook. Dhruba Borthakur [email protected], June 22 rd, 2009
Hadoop and its Usage at Facebook Dhruba Borthakur [email protected], June 22 rd, 2009 Who Am I? Hadoop Developer Core contributor since Hadoop s infancy Focussed on Hadoop Distributed File System Facebook
Using Data Mining and Machine Learning in Retail
Using Data Mining and Machine Learning in Retail Omeid Seide Senior Manager, Big Data Solutions Sears Holdings Bharat Prasad Big Data Solution Architect Sears Holdings Over a Century of Innovation A Fortune
Delivering Value from Big Data with Revolution R Enterprise and Hadoop
Executive White Paper Delivering Value from Big Data with Revolution R Enterprise and Hadoop Bill Jacobs, Director of Product Marketing Thomas W. Dinsmore, Director of Product Management October 2013 Abstract
Bringing Big Data to People
Bringing Big Data to People Microsoft s modern data platform SQL Server 2014 Analytics Platform System Microsoft Azure HDInsight Data Platform Everyone should have access to the data they need. Process
WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley
WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley Disclaimer: This material is protected under copyright act AnalytixLabs, 2011. Unauthorized use and/ or duplication of this material or
The Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
Industry 4.0 and Big Data
Industry 4.0 and Big Data Marek Obitko, [email protected] Senior Research Engineer 03/25/2015 PUBLIC PUBLIC - 5058-CO900H 2 Background Joint work with Czech Institute of Informatics, Robotics and
The Power of Pentaho and Hadoop in Action. Demonstrating MapReduce Performance at Scale
The Power of Pentaho and Hadoop in Action Demonstrating MapReduce Performance at Scale Introduction Over the last few years, Big Data has gone from a tech buzzword to a value generator for many organizations.
Manifest for Big Data Pig, Hive & Jaql
Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,
Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard
Hadoop and Relational base The Best of Both Worlds for Analytics Greg Battas Hewlett Packard The Evolution of Analytics Mainframe EDW Proprietary MPP Unix SMP MPP Appliance Hadoop? Questions Is Hadoop
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy
Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics
Luncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
Unprecedented Performance and Scalability Demonstrated For Meter Data Management:
Unprecedented Performance and Scalability Demonstrated For Meter Data Management: Ten Million Meters Scalable to One Hundred Million Meters For Five Billion Daily Meter Readings Performance testing results
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,
Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging
Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging
BIG DATA TECHNOLOGY. Hadoop Ecosystem
BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big
Big Data Spatial Analytics An Introduction
2013 Esri International User Conference July 8 12, 2013 San Diego, California Technical Workshop Big Data Spatial Analytics An Introduction Marwa Mabrouk Mansour Raad Esri iu UC2013. Technical Workshop
Extend your analytic capabilities with SAP Predictive Analysis
September 9 11, 2013 Anaheim, California Extend your analytic capabilities with SAP Predictive Analysis Charles Gadalla Learning Points Advanced analytics strategy at SAP Simplifying predictive analytics
Testing 3Vs (Volume, Variety and Velocity) of Big Data
Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used
Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel
Big Data and Analytics: Getting Started with ArcGIS Mike Park Erik Hoel Agenda Overview of big data Distributed computation User experience Data management Big data What is it? Big Data is a loosely defined
Bringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks [email protected] 2015 The MathWorks, Inc. 1 Data is the sword of the
