Monetizing Millions of Mobile Users with Cloud Business Analytics

1 Monetizing Millions of Mobile Users with Cloud Business Analytics
MicroStrategy World 2013
David Abercrombie, Data Analytics Engineer

2 Agenda
Tapjoy
Big Data Architecture
MicroStrategy Cloud Implementation
MicroStrategy Cloud vs. Amazon EC2
Vertica

3 Tapjoy

4 Tapjoy by the Numbers
5,300+ Active Apps
1.3MM Daily Ad Conversions
339MM Global Reach

5 A Few of Our Advertisers
CPG, RETAIL, ALCOHOL, QSR, INSURANCE, TELCO, FINANCIAL, ENTERTAINMENT, HEALTH & BEAUTY, AUTOMOTIVE, TRAVEL, TECHNOLOGY

6 A Few of Our Brand Advertisers

7 Recent Recognition

8 Big Data Architecture

9 Operational Data Systems
Ruby application servers in Amazon EC2
Memcached
Amazon SimpleDB => Riak in data center
Amazon RDS => Heroku PostgreSQL
Hadoop on EC2 (HDFS, Hive, Pig, etc.)
Vertica on EC2 => data center
R, Mahout, Mallet, etc.
Syslog-ng => RabbitMQ
Amazon S3

10 Analytic Data Volume
About one billion analytic rows per day
10 to 15+ thousand per second
Even more in transactional databases
Several dozen terabytes online
Compression and parallelism: Vertica
Aggregate 100 billion rows in a second!

11 (Architecture diagram)
Diagram labels: Transactional System; Ruby Apps; web_requests Gzip Files; Other Data Files; Was Syslog NG, Will be Rabbit MQ; ETL 1; Amazon S3 Storage Area; ETL 2; Hadoop (Main, Backup); DW Vertica (BI, Prod); MSTR; ETL 4; RDS MySQL; ETL 3
ETLs:
1) Hadoop
2) Load Vertica
3) Load Vertica
4) Aggregation

12 MicroStrategy Cloud Implementation

13 MicroStrategy Uses
Executive dashboards
Canned reports
Ad hoc analysis
External partner dashboards
Mobile
Mobile, web and iframe
Replace Tableau, etc.

14 Why Cloud?
Obvious:
No installation
No maintenance
No need for special admin skills
Other:
We have had no data center!
Start-up, we are all stretched thin

15 Cloud challenges
User and group management (MSTR LDAP)
Project-level setup
Mobile URL configs
Deployment (dev, QA, prod)
Training, docs, and consulting
Password reset
No end-to-end metrics
Work your Cloud reps; they are great!

16 Schema Design
25 dimensions
100 metrics
80 tables
Partly denormalized snowflake
Aggregated facts, one-hour granularity
Custom ELT (within Vertica)

17 MicroStrategy Design Challenges
Non-additive metrics
Type 2 Slowly Changing Dimensions
Tech Note
Row-level security for external partners
Tech Note
Ensuring parallel SQL for Vertica

18 MicroStrategy Cloud vs. Amazon EC2

19 EC2 response time anomalies

20 EC2 response time anomalies
Statistic            Good (prod)   Bad (bi)
Minimum              120 ms        150 ms
First Quartile       132 ms        344 ms
Median               153 ms        2,660 ms
Mean                 432 ms        10,180 ms
Third Quartile       219 ms        11,230 ms
Maximum              9,652 ms      211,500 ms
Standard Deviation   976 ms        18,080 ms

21 EC2 load anomalies (response time)
top - 18:53:52 up 16 days, 1:09, 4 users, load average: 82.65, 86.79,
Tasks: 147 total, 1 running, 144 sleeping, 2 stopped, 0 zombie
Cpu(s): 0.1%us, 2.0%sy, 11.9%ni, 81.9%id, 3.9%wa, 0.0%hi, 0.0%si, 0.2%st
Mem: k total, k used, 44780k free, 1136k buffers
Swap: 0k total, 0k used, 0k free, k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1666 dbadmin g 5.5g 76m S :02 vertica
175 root S :12.49 kswapd
spread S :03.80 spread
root S :08.50 pdflush
1 root S :02.01 init
2 root RT S :01.48 migration/0
3 root S :00.50 ksoftirqd/0
4 root RT S :00.01 watchdog/0
5 root S :00.95 events/0
6 root S :01.44 khelper

22 Vertica

23 Clustered Database
"Move the computation to the data, never move the data to the computation" (Dr. Michael Stonebraker)
"Moving Computation is Cheaper than Moving Data" (Hadoop documentation)

24 Vertica/MicroStrategy Resources
Track 2 Session 8, 4:45 today in Lafleur 1: HP Vertica: A Step-by-Step Plan for Implementing Mobile Business Intelligence
MicroStrategy Technical Note 42360

25 Vertica Magic
MPP shared nothing
Parallel SQL
Column store
Compression
Run Length Encoding

26 Vertica Parallel SQL
Client connects to any node
That node becomes the initiator node
Initiator creates execution plan
Initiator sends plans to other nodes
Nodes execute plans on their local data
Nodes return results to initiator
Initiator combines result subsets from nodes
Initiator sends answer to client
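
The initiator flow on the slide above can be sketched as a scatter-gather aggregation. This is only an illustration in plain Python, not Vertica's implementation: the `NODES` dictionary, the sample rows, and the function names `execute_local_plan` and `initiator_query` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a cluster: each "node" holds only its own rows
# of (platform, conversions).
NODES = {
    "node1": [("ios", 3), ("android", 5)],
    "node2": [("ios", 2), ("android", 1)],
    "node3": [("ios", 4)],
}


def execute_local_plan(rows):
    """Run the plan on one node: sum conversions per platform over local rows."""
    partial = {}
    for platform, conversions in rows:
        partial[platform] = partial.get(platform, 0) + conversions
    return partial


def initiator_query(nodes):
    """Send the plan to every node, then combine the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(execute_local_plan, nodes.values()))
    combined = {}
    for partial in partials:
        for platform, total in partial.items():
            combined[platform] = combined.get(platform, 0) + total
    return combined


result = initiator_query(NODES)
print(result)
```

Only the small per-node partial sums travel back to the initiator; the raw rows never leave the node that stores them.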

27 Vertica Column Store
Pioneered at C-Store project at MIT, etc.
Read only the columns you need
Reduces IO
Each IO is more relevant
Enables Run Length Encoding

28 Vertica Projections
Disk data structures
Like Oracle materialized views
Like index-only access path
Can have many per table
Optimize for different queries
Watch load time and disk
Disk files each contain only one column

29 Run Length Encoding
Sorted: identical values together in same bucket
Each bucket maps to a set of rows in subsequent columns
(Diagram: columns Gender, Class, Pass, Name. Gender has buckets F and M; each Gender bucket splits into Class buckets Fresh, Junior, Senior, Soph; each Class bucket splits into Pass buckets F and T.)
SELECT Name
FROM Students
WHERE Class = 'Junior' AND Gender = 'M' AND Pass = 'F'
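
A minimal sketch of why sorting matters for run length encoding, in plain Python. The `gender` sample data and the helper names `rle_encode` and `rle_decode` are made up for illustration; this shows the general technique, not Vertica's storage format.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive identical values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical low-cardinality column, in arrival order.
gender = ["M", "F", "F", "M", "F", "M", "M", "F"]

unsorted_runs = rle_encode(gender)        # many short runs
sorted_runs = rle_encode(sorted(gender))  # one run ("bucket") per distinct value

print(len(unsorted_runs), unsorted_runs)
print(len(sorted_runs), sorted_runs)
```

Once the column is sorted, the number of runs equals the number of distinct values, regardless of how many rows the table holds.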

30 308 million rows, 28 columns, 8 GB
Column Name                  Bytes        Rows per Byte
device_size_code             5,290        58,100
month_key                    5,460        56,300
day_key                      34,000       9,030
hour_key                     571,
platform_key                 9,160,
conversions                  308,000,
currency_key                 609,000,
publisher_partner_key        722,000,
advertiser_partner_key       729,000,
offer_key                    742,000,
publisher_app_key            784,000,
tapjoy_revenue               903,000,
virtual_currency_rewarded    1,110,000,
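
The "Rows per Byte" column appears to be the row count divided by the column's size on disk. The short check below uses only the three rows whose figures are complete in the table, and assumes the rounded figure of 308 million rows, so the results agree with the slide only approximately.

```python
# Rounded row count from the slide title (assumption: the exact count is slightly lower).
ROWS = 308_000_000

# Compressed size in bytes for the columns with complete figures.
column_bytes = {
    "device_size_code": 5_290,
    "month_key": 5_460,
    "day_key": 34_000,
}

rows_per_byte = {name: ROWS / size for name, size in column_bytes.items()}

for name, value in rows_per_byte.items():
    print(f"{name}: about {value:,.0f} rows per byte")
```

Each result lands within one percent of the slide's figure, which suggests that sorted, low-cardinality columns take up almost no space.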

31 Denormalize!
Column Name                  Bytes        Rows per Byte
device_size_code             5,290        58,100
month_key                    5,460        56,300
day_key                      34,000       9,030
hour_key                     571,
platform_key                 9,160,
Very little extra disk and IO to denormalize
A few MB on an 8 GB table
Moderately denormalized snowflake schema

32 Vertica Segmentation (sharding)
Distribute data evenly among nodes
For parallel query processing
Typically hash() a high-cardinality column
But not date or timestamp
More than one segmentation allowed
Multiple Vertica projections
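
A rough sketch of hash segmentation in plain Python. The MD5-based `node_for` function, the four-node cluster, and the sample rows are assumptions for illustration; Vertica uses its own hash function. The sketch also shows one plausible reason for the slide's advice against date columns: rows that share a date all hash to the same node.

```python
import hashlib
from collections import Counter

NUM_NODES = 4  # hypothetical cluster size


def node_for(value, num_nodes=NUM_NODES):
    """Map a segmentation key to a node number using a stable hash."""
    digest = hashlib.md5(str(value).encode()).hexdigest()
    return int(digest, 16) % num_nodes


# 10,000 hypothetical fact rows loaded on a single day.
rows = [{"offer_key": i, "day_key": 20130101} for i in range(10_000)]

by_offer = Counter(node_for(row["offer_key"]) for row in rows)
by_day = Counter(node_for(row["day_key"]) for row in rows)

print("segmented by offer_key:", dict(by_offer))  # spread across all nodes
print("segmented by day_key:  ", dict(by_day))    # everything on one node
```

With the high-cardinality key, every node gets roughly a quarter of the rows. With the date key, a query for that day would run on a single node while the others sit idle.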

33 Parallel Joins
Local joins require identical segmentation
Must include equality on segmentation columns
Replicate dimensions to all nodes
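
The sketch below illustrates why identical segmentation allows local joins. When both tables are split on the join key with the same function, matching rows always sit on the same node, so the per-node joins add up to the full join. The tables, the modulo-based `segment` function, and the three-node cluster are hypothetical simplifications.

```python
NUM_NODES = 3  # hypothetical cluster size


def segment(rows, key, num_nodes=NUM_NODES):
    """Split rows across nodes by the segmentation key (modulo stands in for a hash)."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[row[key] % num_nodes].append(row)
    return nodes


def local_join(left, right, key):
    """Equality hash join of two row lists on one key column."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


facts = [
    {"offer_key": k, "conversions": c}
    for k, c in [(1, 10), (2, 20), (3, 30), (4, 40), (1, 5)]
]
offers = [{"offer_key": k, "offer_name": f"offer-{k}"} for k in range(1, 5)]

# Both tables are segmented identically on the join column.
fact_nodes = segment(facts, "offer_key")
offer_nodes = segment(offers, "offer_key")

# Each node joins only its own slices; no rows move between nodes.
per_node = [local_join(f, o, "offer_key") for f, o in zip(fact_nodes, offer_nodes)]
distributed = [row for node in per_node for row in node]

# Reference result: the same join computed over the whole tables.
global_join = local_join(facts, offers, "offer_key")


def sort_key(row):
    return (row["offer_key"], row["conversions"])


print(sorted(distributed, key=sort_key) == sorted(global_join, key=sort_key))
```

Replicating a small dimension table to every node has the same effect: each node can find every match locally without any segmentation constraint.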

34 Predicate Pushdown and Fact Levels
MicroStrategy facts must be at same level to be combined into a single metric (921 Advanced project design p 128)
Vertica aggregation views can change a fact level
Reduce dimensionality
Vertica will push a filter predicate down into a view before aggregating, helping performance

35 Changing fact level with SQL view
CREATE VIEW fact_table_1_ab (
    dimension_a,
    dimension_b,
    high_cardinality_id,
    fact_1
) AS
SELECT dimension_a,
       dimension_b,
       high_cardinality_id,
       sum(metric_1)
FROM fact_table_1
GROUP BY dimension_a,
         dimension_b,
         high_cardinality_id;

36 Predicate Pushdown and Fact Levels
EXPLAIN SELECT * FROM fact_table_1_ab WHERE dimension_a = 'A';
Access Path:
+-GROUPBY HASH (SORT OUTPUT) (PATH ID: 3)
  Aggregates: sum(fact_table_1.metric_1)
  Group By: fact_table_1.dimension_a, fact_table_1.dimension_b, fact_table_1.high_cardinality_id
  Execute on: All Nodes
  +---> STORAGE ACCESS for fact_table_1 (PATH ID: 4)
        Projection: mstr.fact_table_1_super_b0
        Materialize: fact_table_1.dimension_a, fact_table_1.dimension_b, fact_table_1.high_cardinality_id, fact_table_1.metric_1
        Filter: (fact_table_1.dimension_a = 'A')
        Execute on: All Nodes
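
The plan above shows the filter applied at storage access, below the GROUP BY. A small sketch can confirm that this is logically safe: filtering on a grouping column before aggregating gives the same rows as filtering the view afterwards. SQLite (through Python's `sqlite3` module) stands in for Vertica here, so this demonstrates the equivalence only and says nothing about Vertica's optimizer. The sample rows and the extra column `dimension_c` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- dimension_c is a hypothetical extra dimension that the view aggregates away.
CREATE TABLE fact_table_1 (
    dimension_a TEXT,
    dimension_b TEXT,
    dimension_c TEXT,
    high_cardinality_id INTEGER,
    metric_1 INTEGER
);
INSERT INTO fact_table_1 VALUES
    ('A', 'x', 'p', 1, 10),
    ('A', 'x', 'q', 1, 5),
    ('A', 'y', 'p', 2, 7),
    ('B', 'x', 'p', 1, 100);
CREATE VIEW fact_table_1_ab (dimension_a, dimension_b, high_cardinality_id, fact_1) AS
SELECT dimension_a, dimension_b, high_cardinality_id, sum(metric_1)
FROM fact_table_1
GROUP BY dimension_a, dimension_b, high_cardinality_id;
""")

# Filter written against the view, as a reporting tool would issue it.
via_view = conn.execute(
    "SELECT * FROM fact_table_1_ab WHERE dimension_a = 'A' ORDER BY dimension_b"
).fetchall()

# The same query with the predicate pushed below the aggregation by hand.
pushed_down = conn.execute(
    """
    SELECT dimension_a, dimension_b, high_cardinality_id, sum(metric_1)
    FROM fact_table_1
    WHERE dimension_a = 'A'
    GROUP BY dimension_a, dimension_b, high_cardinality_id
    ORDER BY dimension_b
    """
).fetchall()

print(via_view)
print(via_view == pushed_down)
```

The pushed-down form never reads or aggregates the `'B'` rows, which is where the performance benefit comes from on a large fact table.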

37 Vertica Challenges
Data structure design is key
Must match SQL and business needs
Parallel joins
Poor update/delete and referential integrity
Complicates ELT
BTW, bulk loads are fast!

38 Agenda
Tapjoy
Big Data Architecture
MicroStrategy Cloud Implementation
MicroStrategy Cloud vs. Amazon EC2
Vertica

39 Tapjoy, Inc. All Rights Reserved. Tapjoy, Inc. Confidential and Proprietary - Please Do Not Copy or Distribute Without Tapjoy, Inc.'s Prior Written Consent. The data provided herein is for information purposes only and while all efforts are made to ensure accuracy, errors may arise. Tapjoy and the Tapjoy logo are trademarks or registered trademarks of Tapjoy, Inc. All third party logos and trademarks mentioned are the property of their respective owners.


A Comparison of Approaches to Large-Scale Data Analysis

A Comparison of Approaches to Large-Scale Data Analysis A Comparison of Approaches to Large-Scale Data Analysis Sam Madden MIT CSAIL with Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, and Michael Stonebraker In SIGMOD 2009 MapReduce

More information

Using Hadoop to Expand Data Warehousing

Using Hadoop to Expand Data Warehousing Using Hadoop to Expand Data Warehousing Mike Peterson VP of Platforms and Data Architecture, Neustar Feb 28, 2013 1 Copyright Think Big Analytics and Neustar Inc. Why do this? Transforming to an Information

More information

Bringing Big Data Modelling into the Hands of Domain Experts

Bringing Big Data Modelling into the Hands of Domain Experts Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the

More information

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS Managing and analyzing data in the cloud is just as important as it is anywhere else. To let you do this, Windows Azure provides a range of technologies

More information

Open source Google-style large scale data analysis with Hadoop

Open source Google-style large scale data analysis with Hadoop Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical

More information

Data Warehouse and Business Intelligence Testing: Challenges, Best Practices & the Solution

Data Warehouse and Business Intelligence Testing: Challenges, Best Practices & the Solution Warehouse and Business Intelligence : Challenges, Best Practices & the Solution Prepared by datagaps http://www.datagaps.com http://www.youtube.com/datagaps http://www.twitter.com/datagaps Contact contact@datagaps.com

More information

Microsoft Analytics Platform System. Solution Brief

Microsoft Analytics Platform System. Solution Brief Microsoft Analytics Platform System Solution Brief Contents 4 Introduction 4 Microsoft Analytics Platform System 5 Enterprise-ready Big Data 7 Next-generation performance at scale 10 Engineered for optimal

More information

Integrating Apache Spark with an Enterprise Data Warehouse

Integrating Apache Spark with an Enterprise Data Warehouse Integrating Apache Spark with an Enterprise Warehouse Dr. Michael Wurst, IBM Corporation Architect Spark/R/Python base Integration, In-base Analytics Dr. Toni Bollinger, IBM Corporation Senior Software

More information

Jun Liu, Senior Software Engineer Bianny Bian, Engineering Manager SSG/STO/PAC

Jun Liu, Senior Software Engineer Bianny Bian, Engineering Manager SSG/STO/PAC Jun Liu, Senior Software Engineer Bianny Bian, Engineering Manager SSG/STO/PAC Agenda Quick Overview of Impala Design Challenges of an Impala Deployment Case Study: Use Simulation-Based Approach to Design

More information

Tuning Microsoft SQL Server for SharePoint. Daniel Glenn

Tuning Microsoft SQL Server for SharePoint. Daniel Glenn Tuning Microsoft SQL Server for SharePoint Daniel Glenn Daniel Glenn @DanielGlenn http://knowsp.com SharePoint and Collaboration Practice Leader @ InfoWorks, Inc. www.infoworks-tn.com PASS Nashville Business

More information

Data Warehouse in the Cloud Marketing or Reality? Alexei Khalyako Sr. Program Manager Windows Azure Customer Advisory Team

Data Warehouse in the Cloud Marketing or Reality? Alexei Khalyako Sr. Program Manager Windows Azure Customer Advisory Team Data Warehouse in the Cloud Marketing or Reality? Alexei Khalyako Sr. Program Manager Windows Azure Customer Advisory Team Data Warehouse we used to know High-End workload High-End hardware Special know-how

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores

More information

Open Source Technologies on Microsoft Azure

Open Source Technologies on Microsoft Azure Open Source Technologies on Microsoft Azure A Survey @DChappellAssoc Copyright 2014 Chappell & Associates The Main Idea i Open source technologies are a fundamental part of Microsoft Azure The Big Questions

More information

Hadoop & its Usage at Facebook

Hadoop & its Usage at Facebook Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System dhruba@apache.org Presented at the The Israeli Association of Grid Technologies July 15, 2009 Outline Architecture

More information

Open source large scale distributed data management with Google s MapReduce and Bigtable

Open source large scale distributed data management with Google s MapReduce and Bigtable Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory

More information

Bringing Big Data into the Enterprise

Bringing Big Data into the Enterprise Bringing Big Data into the Enterprise Overview When evaluating Big Data applications in enterprise computing, one often-asked question is how does Big Data compare to the Enterprise Data Warehouse (EDW)?

More information

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens Realtime Apache Hadoop at Facebook Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens Agenda 1 Why Apache Hadoop and HBase? 2 Quick Introduction to Apache HBase 3 Applications of HBase at

More information

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc. Oracle BI EE Implementation on Netezza Prepared by SureShot Strategies, Inc. The goal of this paper is to give an insight to Netezza architecture and implementation experience to strategize Oracle BI EE

More information

How to Leverage Cloud to Quickly Build Scalable Applications

How to Leverage Cloud to Quickly Build Scalable Applications How to Leverage Cloud to Quickly Build Scalable Applications Chris Keyser Principal Solution Architect David Polley Senior Director Cloud Product Management Cloud Growth Recent IDC cloud research shows

More information

Data processing goes big

Data processing goes big Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

More information

Increasing XenServer s VM density

Increasing XenServer s VM density Increasing XenServer s VM density Jonathan Davies, XenServer System Performance Lead XenServer Engineering, Citrix Cambridge, UK 24 Oct 2013 Jonathan Davies (Citrix) Increasing XenServer s VM density 24

More information

Replicating to everything

Replicating to everything Replicating to everything Featuring Tungsten Replicator A Giuseppe Maxia, QA Architect Vmware About me Giuseppe Maxia, a.k.a. "The Data Charmer" QA Architect at VMware Previously at AB / Sun / 3 times

More information

Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016

Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016 Big Data Approaches Making Sense of Big Data Ian Crosland Jan 2016 Accelerate Big Data ROI Even firms that are investing in Big Data are still struggling to get the most from it. Make Big Data Accessible

More information

Google Analytics and Google Analytics Premium: limits and quotas

Google Analytics and Google Analytics Premium: limits and quotas Table Of Contents Data collection & Processing limits Accounts and Profiles Reports Admin Area Google Analytics data fields Lengths Google Analytics API Data collection & Processing limits 10 million hits

More information

Testing Big data is one of the biggest

Testing Big data is one of the biggest Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing

More information

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct

More information

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014 Ralph Kimball Associates 2014 The Data Warehouse Mission Identify all possible enterprise data assets Select those assets

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

Inge Os Sales Consulting Manager Oracle Norway

Inge Os Sales Consulting Manager Oracle Norway Inge Os Sales Consulting Manager Oracle Norway Agenda Oracle Fusion Middelware Oracle Database 11GR2 Oracle Database Machine Oracle & Sun Agenda Oracle Fusion Middelware Oracle Database 11GR2 Oracle Database

More information

College of Engineering, Technology, and Computer Science

College of Engineering, Technology, and Computer Science College of Engineering, Technology, and Computer Science Design and Implementation of Cloud-based Data Warehousing In partial fulfillment of the requirements for the Degree of Master of Science in Technology

More information