DoneDeal - Data Platform April 2016 Martin Peters DoneDeal Analytics Team Manager


1 DoneDeal - Data Platform April 2016 Martin Peters (martin@donedeal.ie) DoneDeal Analytics Team Manager

2 If you don't understand the details of your business you are going to fail. If we can keep our competitors focused on us while we stay focused on the customer, ultimately we'll turn out all right. - Jeff Bezos, Amazon

3 What do these companies have in common?

4 Data is one of our biggest assets. With the right set of information, you can make business decisions with higher levels of confidence, as you can audit and attribute the data you used for the decision-making process. - Krish Krishnan, 2014

5 Business Intelligence 101 For small companies the gap is often filled with custom ad hoc solutions with limited and rather static reporting capability.

6 What and why BI? As a company grows, the Availability, Accuracy and Accessibility requirements of its data increase.

7 Some terminology: ETL process
Extraction: Extracts data from homogeneous or heterogeneous data sources.
Transformation: Processes, blends, merges and conforms the data.
Loading: Stores the data in the proper format or structure for the purposes of querying and analysis.
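The three ETL stages can be sketched as three small functions chained together. This is only an illustration in plain Python: the CSV content, the column names (`ad_id`, `price`, `county`) and the in-memory `warehouse` list are assumptions made for the example, not part of the DoneDeal platform.

```python
import csv
import io

# Hypothetical raw export; the column names are illustrative only.
RAW_CSV = """ad_id,price,county
1,100,Dublin
2,,cork
3,250,Galway
"""

def extract(source_text):
    """Extraction: read rows from a source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(source_text)))

def transform(rows):
    """Transformation: drop rows with no price, conform types and casing."""
    cleaned = []
    for row in rows:
        if not row["price"]:
            continue  # skip incomplete records
        cleaned.append({
            "ad_id": int(row["ad_id"]),
            "price": float(row["price"]),
            "county": row["county"].title(),
        })
    return cleaned

def load(rows, target):
    """Loading: append the conformed rows to the target store (here, a list)."""
    target.extend(rows)
    return len(rows)

warehouse = []
loaded = load(transform(extract(RAW_CSV)), warehouse)
```

Keeping each stage as a separate function makes it easier to rerun or replace one stage without touching the others.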

8 April April 2016

9 Timeline: Silo'd Data; Manual/Error-Prone Blending; Value of BI/Data not understood; Platform Design; Implementation; Storage Layer; Batch Layer; Traditional BI; Serving Layer; Speed Layer; Real-Time Analytics

10 Business Goals & Objectives
1. Build a future-proof data analytics platform that will scale with the company over the next 5 years.
2. Take ownership of our data. Collect more data.
3. Replace existing reporting tool.
4. Provide a holistic view of our users (buyers and sellers), ads, products.
5. Use our data in a smarter manner and provide recommendations in a timely fashion.

11 Apollo Team
Data Engineer, Data Analyst, Architect, DevOps, BI Consultants, Solution Architect
Analytics Platform that includes Event Streaming, Data Consolidation, Cleansing & Warehousing, Data Visualisation, Business Intelligence and Data Product Delivery.
Apollo brings agility and flexibility to our data model; data ownership is key, and it allows us to blend data more conveniently.

12 Apollo Principles
Project Principles:
1. System must scale but costs grow more slowly
2. Occam's Razor
3. Analytics and core platforms are independent
4. Monitoring of platform is key
5. Low maintenance
Data Principles:
1. Accurate, Available, Accessible
2. Ownership - Business & Technical
3. Standardised across teams
4. Integrity
5. Identifiable - primary source and globally unique identifier

13 Apollo Architectural Principles
Decoupled data bus
Use the right tool/service for the job: data structure, latency, throughput, access patterns
Use Lambda architecture ideas: immutable (append-only), batch, [speed, serving] layers
Leverage AWS Managed Services: scalable/elastic, available, reliable, secure, no/low admin
Big data != big cost
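The Lambda architecture idea mentioned above can be illustrated with a toy example: an append-only log, a batch view recomputed from scratch, a speed view covering only recent events, and a serving step that merges the two. The event names and the hour-based cutoff are hypothetical; this is a sketch of the general pattern, not of Apollo's implementation.

```python
from collections import Counter

# Immutable, append-only master dataset: events are only ever added.
# Event names and hour-based timestamps are hypothetical.
master_log = [
    {"hour": 1, "event": "ad_view"},
    {"hour": 1, "event": "ad_view"},
    {"hour": 2, "event": "search"},
    {"hour": 3, "event": "ad_view"},
]

def batch_view(log, up_to_hour):
    """Batch layer: recompute counts from scratch up to a cutoff."""
    return Counter(e["event"] for e in log if e["hour"] <= up_to_hour)

def speed_view(log, after_hour):
    """Speed layer: count only recent events the batch run has not seen."""
    return Counter(e["event"] for e in log if e["hour"] > after_hour)

def serve(log, batch_cutoff):
    """Serving layer: merge batch and speed views to answer a query."""
    return batch_view(log, batch_cutoff) + speed_view(log, batch_cutoff)

counts = serve(master_log, batch_cutoff=2)
```

Because the log is never updated in place, the batch view can always be rebuilt, which is what makes mistakes in the speed layer recoverable.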

14 Tools/Services in Production Business Users Data Science

15 ETL Architecture: Custom Build Pipeline [Diagram: Extract (E), Transform (T) and Load (L) stages, each with a Summary]

16 ETL: Control over complex dependencies
Allows control of ETL pipelines with complex dependencies
Easy plug-in of new datasource
Orchestration with Data Pipeline and common status or summary files
Idempotent pipeline
Historical data extracted as simulated stream
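One way to read "idempotent pipeline" together with "summary files" is that each step writes a small summary when it finishes, and a rerun for the same day finds that summary and does no further work. The sketch below assumes that reading; the file naming, the JSON fields and the `fake_extract` helper are hypothetical.

```python
import json
import os
import tempfile

def run_step(step_name, day, work_dir, process):
    """Run process(day) once per (step, day); reruns reuse the summary file."""
    summary_path = os.path.join(work_dir, f"{step_name}_{day}.summary.json")
    if os.path.exists(summary_path):
        # Step already completed for this day: return the recorded summary.
        with open(summary_path) as fh:
            return json.load(fh)
    summary = {"step": step_name, "day": day,
               "rows": process(day), "status": "ok"}
    with open(summary_path, "w") as fh:
        json.dump(summary, fh)
    return summary

calls = []

def fake_extract(day):
    """Hypothetical extraction that records each invocation."""
    calls.append(day)
    return 42

work_dir = tempfile.mkdtemp()
first = run_step("extract", "2016-04-01", work_dir, fake_extract)
second = run_step("extract", "2016-04-01", work_dir, fake_extract)
```

A downstream step could check for the upstream summary file before starting, which gives a simple form of dependency control.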

17 ETL: By the numbers
Extraction: days processed - 7 different data sources - 14 domains - 13 event types
Orchestration: processing days - 4GB/day
Data Lake: 11M events streamed/day - 3 million files - 3 TB of data stored over 7 buckets
Redshift: 7B records in production - 6 schemas (core and aggregate) - 86 tables in core schema - 3 environments - 15 data pipelines

18 Kinesis Streams
1 stream with 4 shards
Data retention of 24hrs
KCL on EC2 writes data to S3 ready for Spark
Max size of 1MB per data blob
1,000 records/sec per shard write
5 transactions/sec read or 2MB/sec
PutRecords requests
Server-side API logging from 7 application servers using Log4JAppender
Event buffering at source [in progress]
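The shard limits above can be set against the event volume quoted in the "By the numbers" slide (11M events streamed per day) with some rough arithmetic. The calculation below uses only figures from the slides and assumes events are spread evenly over the day, which real traffic will not be, so the headroom figure is an upper bound.

```python
SHARDS = 4
WRITE_RECORDS_PER_SEC_PER_SHARD = 1_000  # per-shard write limit from the slide
READ_MB_PER_SEC_PER_SHARD = 2            # per-shard read limit from the slide
EVENTS_PER_DAY = 11_000_000              # from the "By the numbers" slide
SECONDS_PER_DAY = 24 * 60 * 60

# Aggregate stream capacity across all shards.
max_write_rate = SHARDS * WRITE_RECORDS_PER_SEC_PER_SHARD
max_read_mb = SHARDS * READ_MB_PER_SEC_PER_SHARD

# Average load, assuming a perfectly even spread over the day.
average_event_rate = EVENTS_PER_DAY / SECONDS_PER_DAY
headroom = max_write_rate / average_event_rate

print(f"write capacity: {max_write_rate} records/sec")
print(f"average load:   {average_event_rate:.0f} records/sec")
print(f"headroom:       {headroom:.0f}x")
```

On these numbers the stream has roughly 30 times more write capacity than the average load, which leaves room for daily peaks.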

19 S3
Simple Storage Service provides secure, highly scalable, durable cloud storage
Native support for Spark, Hive

20 S3
A strongly defined naming convention
YYYY/MM/DD prefix used
Avro format used for OLTP data, JSON otherwise - probably the right choice (schema evolution), although we haven't taken advantage of it yet.
Allows easy retrieval of data from a particular time period
Easy to maintain and browse
Handling of summaries from E, T & L steps
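A date-based key prefix of this kind is straightforward to generate, and it is what makes "all data for a given day or month" a simple prefix listing. The sketch below shows one possible helper; the file name and the function names are hypothetical, and only the YYYY/MM/DD layout comes from the slide.

```python
from datetime import date

def day_prefix(day):
    """Prefix covering every object stored for one day: YYYY/MM/DD/."""
    return f"{day:%Y/%m/%d}/"

def month_prefix(day):
    """Prefix covering every object stored for one month: YYYY/MM/."""
    return f"{day:%Y/%m}/"

def s3_key(day, filename):
    """Full object key for a file belonging to a given day."""
    return day_prefix(day) + filename

key = s3_key(date(2016, 4, 5), "part-0001.json")
print(key)
```

Zero-padded components keep the keys in chronological order when listed lexicographically.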

21 Spark on EMR
EMR: AWS's managed Hadoop framework that can interact with data from S3, DynamoDB, etc.
Apache Spark: fast, general-purpose engine for large-scale in-memory data processing. Runs on Hadoop/EMR and can read from S3.
PySpark + SparkSQL was the focus in Apollo. Streaming and ML will be the focus in the months ahead.

22 Spark on EMR
Spark is easy; performant Spark code is hard and time consuming
DataFrame API used exclusively
Developing Spark applications in a local environment with a limited-size dataset differs significantly from running Spark on EMR (e.g. joins, unions etc.)
Don't pre-optimize
Naive joins to be avoided
Spark UI is invaluable for testing performance (both locally and on EMR) and for understanding the underlying mechanisms of Spark
Some scaling of Spark on EMR; settled on memory-optimised instances r3.2xlarge (8 vCPUs, 61GB RAM).

23 Data Pipeline + Simple Notification Service
Data Pipeline is a service to reliably process and move data between AWS applications (e.g. S3, EMR, DynamoDB)
Pipelines run on schedule and alarms are issued with Simple Notification Service (SNS)
EMR/Spark used for compute and EC2 used for loading data into Redshift
Debugging can be a challenge

24 Redshift
Dense Compute or Dense Storage?
- Single ds2.xlarge instance
- Right balance between storage/memory/compute and cost/hr
Kimball Star Schema: conformed dimensions across all data sources
        Test    Dev     Prod
Core    cmtest  cmdev   cmprod
Agg     agtest  agdev   agprod
read permissions
Strict ETL: no transformation is carried out in the DW; an append-only strategy
- Leverage power and scalability of EMR and insert speed of Redshift
- No updates in DW; drop and recreate
Tuning is a time-consuming task and requires rigorous testing. Define sort, distribution and interleaved keys as early as possible.
Reserved Nodes will be used in future
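The "drop and recreate" strategy, combined with declaring distribution and sort keys up front, could be scripted along these lines. This is a sketch that only builds the SQL text: the table name `fact_ad_view` and its columns are hypothetical, `cmdev` is the core dev schema named above, and the statement uses the `DISTKEY` and `SORTKEY` clauses of Redshift's `CREATE TABLE`.

```python
def recreate_table_sql(schema, table, columns, dist_key, sort_keys):
    """Build DROP + CREATE statements with distribution and sort keys."""
    cols = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    sort = ", ".join(sort_keys)
    return (
        f"DROP TABLE IF EXISTS {schema}.{table};\n"
        f"CREATE TABLE {schema}.{table} (\n    {cols}\n)\n"
        f"DISTKEY ({dist_key})\n"
        f"SORTKEY ({sort});"
    )

# Hypothetical fact table in the core dev schema.
sql = recreate_table_sql(
    schema="cmdev",
    table="fact_ad_view",
    columns=[("ad_id", "BIGINT"), ("view_date", "DATE"), ("views", "INTEGER")],
    dist_key="ad_id",
    sort_keys=["view_date"],
)
print(sql)
```

Generating the DDL from one definition keeps the test, dev and prod schemas consistent when tables are dropped and rebuilt.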

25 Tableau on EC2
Tableau Server runs on EC2 (c3.2xlarge) inside the AWS environment.
Tableau Desktop used to develop dashboards that are published to the server.
Connection to Redshift Data Warehouse - JDBC/ODBC connector.
Maps support is poor for countries outside the US

26 Up Next?
Increase number of data streams/remove dependence on OLTP
Traditional BI/Reporting - more dashboards
Trials of Kinesis Firehose, Kinesis Analytics, QuickSight
Improved code deployment with CodePipeline and CodeCommit [In progress]
Data products with Spark ML/Amazon ML, DynamoDB, Lambda & API Gateway

27 DoneDeal Image Service Upgrade
Image storage & transforming moved to AWS
Over 4.5M images migrated to S3
ECS + ELB used for image resizing
Autoscaling group enables adding new image sizes
We now run Docker in production thanks to ECS
Investigating uses for AWS Lambda and image processing
For more

28 DoneDeal Dynamic Test Environments
QA can now run any feature branch of DoneDeal directly from our CI server
Uses Jenkins / Docker (Machine + Compose) / EC2 & Route 53
Enables rapid testing without server contention
Also used by the mobile team to develop against & test new APIs
For more

29 Q&A Session Martin Peters BI Manager at DoneDeal Nigel Creighton CTO at DNM


AtScale Intelligence Platform

AtScale Intelligence Platform AtScale Intelligence Platform PUT THE POWER OF HADOOP IN THE HANDS OF BUSINESS USERS. Connect your BI tools directly to Hadoop without compromising scale, performance, or control. TURN HADOOP INTO A HIGH-PERFORMANCE

More information

Implementing a Data Warehouse with Microsoft SQL Server 2012

Implementing a Data Warehouse with Microsoft SQL Server 2012 Course 10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012 Length: Audience(s): 5 Days Level: 200 IT Professionals Technology: Microsoft SQL Server 2012 Type: Delivery Method: Course Instructor-led

More information

HADOOP BIG DATA DEVELOPER TRAINING AGENDA

HADOOP BIG DATA DEVELOPER TRAINING AGENDA HADOOP BIG DATA DEVELOPER TRAINING AGENDA About the Course This course is the most advanced course available to Software professionals This has been suitably designed to help Big Data Developers and experts

More information

Ganzheitliches Datenmanagement

Ganzheitliches Datenmanagement Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist

More information

More Data in Less Time

More Data in Less Time More Data in Less Time Leveraging Cloudera CDH as an Operational Data Store Daniel Tydecks, Systems Engineering DACH & CE Goals of an Operational Data Store Load Data Sources Traditional Architecture Operational

More information

Data Virtualization for Agile Business Intelligence Systems and Virtual MDM. To View This Presentation as a Video Click Here

Data Virtualization for Agile Business Intelligence Systems and Virtual MDM. To View This Presentation as a Video Click Here Data Virtualization for Agile Business Intelligence Systems and Virtual MDM To View This Presentation as a Video Click Here Agenda Data Virtualization New Capabilities New Challenges in Data Integration

More information

Copyright 2014 Splunk Inc.

Copyright 2014 Splunk Inc. Copyright 2014 Splunk Inc. Extend Splunk by Visualizing Data using Tableau and the ODBC driver Sharad Kylasam Sr. Product Manager, Splunk Ashley Jaschke Product Manager, Tableau Joe Specht Sr. Director

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth MAKING BIG DATA COME ALIVE Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth Steve Gonzales, Principal Manager steve.gonzales@thinkbiganalytics.com

More information

SAS BI Course Content; Introduction to DWH / BI Concepts

SAS BI Course Content; Introduction to DWH / BI Concepts SAS BI Course Content; Introduction to DWH / BI Concepts SAS Web Report Studio 4.2 SAS EG 4.2 SAS Information Delivery Portal 4.2 SAS Data Integration Studio 4.2 SAS BI Dashboard 4.2 SAS Management Console

More information

A very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect

A very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect A very short talk about Apache Kylin Business Intelligence meets Big Data Fabian Wilckens EMEA Solutions Architect 1 The challenge today 2 Very quickly: OLAP Online Analytical Processing How many beers

More information

Dashboard Engine for Hadoop

Dashboard Engine for Hadoop Matt McDevitt Sr. Project Manager Pavan Challa Sr. Data Engineer June 2015 Dashboard Engine for Hadoop Think Big Start Smart Scale Fast Agenda Think Big Overview Engagement Model Solution Offerings Dashboard

More information

SQL Server 2005 Features Comparison

SQL Server 2005 Features Comparison Page 1 of 10 Quick Links Home Worldwide Search Microsoft.com for: Go : Home Product Information How to Buy Editions Learning Downloads Support Partners Technologies Solutions Community Previous Versions

More information

Azure Data Lake Analytics

Azure Data Lake Analytics Azure Data Lake Analytics Compose and orchestrate data services at scale Fully managed service to support orchestration of data movement and processing Connect to relational or non-relational data

More information

Implementing a Data Warehouse with Microsoft SQL Server 2012

Implementing a Data Warehouse with Microsoft SQL Server 2012 Course 10777 : Implementing a Data Warehouse with Microsoft SQL Server 2012 Page 1 of 8 Implementing a Data Warehouse with Microsoft SQL Server 2012 Course 10777: 4 days; Instructor-Led Introduction Data

More information

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment

More information

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,

More information

Cost-Effective Business Intelligence with Red Hat and Open Source

Cost-Effective Business Intelligence with Red Hat and Open Source Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,

More information

Exploring the Synergistic Relationships Between BPC, BW and HANA

Exploring the Synergistic Relationships Between BPC, BW and HANA September 9 11, 2013 Anaheim, California Exploring the Synergistic Relationships Between, BW and HANA Sheldon Edelstein SAP Database and Solution Management Learning Points SAP Business Planning and Consolidation

More information

MicroStrategy Course Catalog

MicroStrategy Course Catalog MicroStrategy Course Catalog 1 microstrategy.com/education 3 MicroStrategy course matrix 4 MicroStrategy 9 8 MicroStrategy 10 table of contents MicroStrategy course matrix MICROSTRATEGY 9 MICROSTRATEGY

More information

Logz.io See the logz that matter

Logz.io See the logz that matter See the logz that matter How Logz.io Secures Customer Log Data White Paper A certain amount of confidence is needed when relying on third party vendors to manage and handle your online data and log files

More information

BPO. Accerela*ng Revenue Enhancements Through Sales Support Services

BPO. Accerela*ng Revenue Enhancements Through Sales Support Services BPO Accerela*ng Revenue Enhancements Through Sales Support Services What is BPO? Business Process Outsorcing (BPO) is the process of outsourcing specific business func6ons to a third- party service provider

More information

Best Practices for Deploying Managed Self-Service Analytics and Why Tableau and QlikView Fall Short

Best Practices for Deploying Managed Self-Service Analytics and Why Tableau and QlikView Fall Short Best Practices for Deploying Managed Self-Service Analytics and Why Tableau and QlikView Fall Short Vijay Anand, Director, Product Marketing Agenda 1. Managed self-service» The need of managed self-service»

More information

Mobile Big Data AnalyEcs

Mobile Big Data AnalyEcs Copyright 2014 Splunk Inc. Mobile Big Data AnalyEcs Marc Courtemanche, Sr. Director Alain Brunet, Sr. Lead Developer Vantrix CorporaEon Disclaimer During the course of this presentaeon, we may make forward-

More information

Cyber Security With Big Data

Cyber Security With Big Data Cyber Security With Big Data Fast. Complete. Cost-Effec1ve. Harry J Foxwell, PhD Principal Consultant Oracle Public Sector Oct 2015 Safe Harbor Statement The following is intended to outline our general

More information

Beta: Implementing a Data Warehouse with Microsoft SQL Server 2012

Beta: Implementing a Data Warehouse with Microsoft SQL Server 2012 CÔNG TY CỔ PHẦN TRƯỜNG CNTT TÂN ĐỨC TAN DUC INFORMATION TECHNOLOGY SCHOOL JSC LEARN MORE WITH LESS! Course 10777: Beta: Implementing a Data Warehouse with Microsoft SQL Server 2012 Length: 5 Days Audience:

More information