BIG DATA ANALYTICS. Vishy Venugopalan


+ AGENDA
- Introduction: The Age of Big Data
- The Analytics Adoption Curve
- The New Data Stack
- Opportunities in the Big Data Analytics Market
- Investment Candidates

+ WE LIVE IN THE AGE OF BIG DATA
- IDC: Worldwide Big Data market (excluding infrastructure and storage) projected to be a $16.5bn market in 2015, growing at a 40% CAGR between 2010-15.
- McKinsey: 15 out of 17 sectors in the US economy have more data stored per company than the US Library of Congress.
Source: The Economist, "Data, Data Everywhere", Feb 2010
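The IDC projection above can be sanity-checked with a little compound-growth arithmetic. The sketch below is only illustrative: it assumes the 40% CAGR compounds over five annual periods (2010 to 2015), and the function names `implied_base` and `project` are hypothetical helpers, not anything from the source. The 2010 figure it derives (roughly $3.1bn) is an implied value, not one stated on the slide.

```python
def implied_base(final_value: float, cagr: float, years: int) -> float:
    """Starting value implied by a final value and a compound annual growth rate."""
    return final_value / (1.0 + cagr) ** years


def project(base: float, cagr: float, years: int) -> list[float]:
    """Year-by-year values, starting from the base year (index 0)."""
    return [base * (1.0 + cagr) ** t for t in range(years + 1)]


# Figures from the slide: $16.5bn in 2015, 40% CAGR over 2010-15.
# Assumption: five annual compounding periods.
base_2010 = implied_base(16.5, 0.40, 5)
path = project(base_2010, 0.40, 5)

for year, value in zip(range(2010, 2016), path):
    print(f"{year}: ${value:.2f}bn")
```

At a 40% CAGR the market multiplies by about 5.4x over five years, which is why the implied 2010 base is so much smaller than the 2015 projection.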

+ THE TRADITIONAL DATA STACK
Applications: Transactions and analytics (OLTP/OLAP); Business intelligence; Data management
Infrastructure: Row storage; Columnar storage
Hardware: Disk; Solid-state devices

+ THE ANALYTICS ADOPTION CURVE
Stage | EARLY | GROWTH | MATURE
Who drives data analyses? | Engineers | Technically oriented business analysts | All business analysts
Type of analyses conducted | Custom-built, high-touch | Simple, self-service | Complex, self-service, ad hoc
Analysis tools used | Programming languages | Query languages | Visual, drag-and-drop tools

+ THE TRADITIONAL DATA STACK IS FACING CHALLENGES
- Not built for petabyte scale, for semi-structured data or real-time data
- Relational databases are being complemented by NoSQL databases and alternative storage technologies
- Hadoop: open source community + commercial innovation is building a parallel data stack that overcomes these limitations
- Pioneered at Internet-scale companies (Google, Yahoo, Amazon)

+ THE NEW DATA STACK
Applications
Infrastructure: Query+Analytics (Hive, Pig); DB management (Zookeeper); Distributed file system (HDFS); Full-text search capabilities (Solr); NoSQL/alt. storage
Hardware: Distributed storage; Solid-state devices

+ OPPORTUNITIES IN THE BIG DATA ANALYTICS MARKET
Stage | EARLY | GROWTH | MATURE
Who drives data analyses? | Engineers | Technically oriented business analysts | All business analysts
Toughest challenges | Workflow and coordination | Analysis tools for standalone data | Integrating disparate data sources
Startups to watch:
- Short-term: startups offering platforms that address the workflow, coordination and handoff problems
- Medium-to-long term: startups that provide effective tools for self-service analyses and integration with the traditional data stack

+ INVESTMENT CANDIDATES
Seed/Bootstrapped | Seeking Series A | Post-Series A

+ THE BIG PLAYERS ARE UNSURE OF THE WAY FORWARD
- The data and analytics stack is undergoing a generational shift. Big Data represents a new kind of product (petabyte-scale) running on a new kind of infrastructure (cloud-scale).
- For now, major data players (IBM, Oracle, Microsoft) are making partnerships and formulating strategies for the world of Big Data.
- From a product perspective, changes are akin to platform shifts from mainframe to PC, or more recently, the systems management shift from physical servers to virtual servers.

+ CONCLUSION
- We are in the age of Big Data, where the amount of data generated by businesses and consumers is unprecedented.
- The mainstream data stack today, particularly the Business Intelligence subsegment, is built for datasets of the '90s and is ripe for change.
- Internet-scale companies were the first to notice this problem. Their efforts seeded a new data stack.
- In the short term, startups that solve the workflow and coordination problems are attractive investment candidates; in the longer term, tools and data integration will produce winners.

+ APPENDIX
Detailed individual summaries of companies

+ MORTAR DATA (Boston, MA)
VALUE PROPOSITION: Easy browser-based environment to run Hadoop jobs on data that lives in the cloud.
USER & CUSTOMER PROFILE: Business analyst at an SMB.
USE CASES: Finding patterns in clickstream data, log data. Requires Hadoop jobs written in a consumable manner by developers.
COMPETITION: Amazon Elastic MapReduce at the basic level; Hapyrus; StackIQ.
STATUS: Founded Aug 2011. Seed stage. Raising $450K ($110 committed). Just started TechStars Boston.
TEAM: 3 employees. All technical. Met at university. Worked at Wireless Generation together.

+ DEMYST.DATA (New York, NY)
VALUE PROPOSITION: A way to predict consumer credit risk profiles from unstructured data available all over the Web.
USER & CUSTOMER PROFILE: Credit underwriter at a prepaid card issuer, check casher, payday loans provider.
USE CASES: An alternative to FICO scores, our algorithm picks 2-3 attributes of an individual's online presence relevant to their credit risk.
COMPETITION: TransUnion, Equifax etc.; limited functional overlap (but no customer overlap) with Palantir.
STATUS: F&F funded. Series A raise in Q4 '12.
TEAM: Two Columbia MBA grads. One of them is ex-LexisNexis.

+ HADAPT (Cambridge, MA)
VALUE PROPOSITION: An analytic database that enables SQL queries against Hadoop data.
USER & CUSTOMER PROFILE: Anyone who uses a data warehouse.
USE CASES: Finding patterns in unstructured data that lives in a database (e.g. BLOBs); eventually, integrating unstructured and structured data in one warehouse.
COMPETITION: Apache Hive; Vertica (only for structured data queries).
STATUS: Late beta. 10/11: $9.5m Series A (Norwest, BVP); Series B in early '13.
TEAM: CTO worked at MIT on C-Store, which later became Vertica. Now a Yale professor. Management comes from Endeca, Aster Data etc.

+ MAPR TECHNOLOGIES (San Jose, CA)
VALUE PROPOSITION: Enterprise-class Hadoop distribution with proprietary extensions.
USER & CUSTOMER PROFILE: Data scientists and software engineers at large and small organizations.
USE CASES: Processing semi-structured data using Hadoop. Integrates easily with existing enterprise storage (NAS clusters etc.). Allows stream-based processing.
COMPETITION: Cloudera, Hortonworks, Apache Hadoop.
STATUS: 8/2011: Series B $20m (Redpoint, Lightspeed, NEA).
TEAM: CTO headed up Google BigTable group. Founded fast clustered NAS startup before.

+ ZETTASET (Mountain View, CA)
VALUE PROPOSITION: Enterprise-class Hadoop management and deployment tools.
USER & CUSTOMER PROFILE: Data scientists and software engineers at large and small organizations.
USE CASES: Processing semi-structured data using Hadoop. Has a particular risk management and information governance focus.
COMPETITION: MapR, Cloudera, Hortonworks, Apache Hadoop.
STATUS: 4/2011: Series A $3m (DFJ, Epic Ventures).
TEAM: 15 employees (12 technical). Founder founded SPI Dynamics, web application security software (acq. by HP).

+ HSTREAMING (Chicago, IL)
VALUE PROPOSITION: Stream-based processing for Hadoop, similar to Complex Event Processing.
USER & CUSTOMER PROFILE: Data scientists and software engineers at Fortune 500 organizations. Over 20 customers at this time.
USE CASES: Processing semi-structured data using Hadoop. Has a particular risk management and information governance focus.
COMPETITION: IBM InfoSphere, Microsoft StreamInsight, StreamBase, S4, Storm.
STATUS: Self-funded so far. Raising Series A.
TEAM: 3 employees (2 technical). One of the founders worked on a similar product at IBM.

+ RADOOP (Budapest, Hungary)
VALUE PROPOSITION: Graphical data mining interface (a la RapidMiner) for Hadoop data stores.
USER & CUSTOMER PROFILE: Code-free product, used by business analysts at large and small Hadoop shops.
USE CASES: Requires a Hadoop cluster at the moment. However, GA product will provide them log reduction and analytics tools without exposing Hadoop.
COMPETITION: Datameer, Karmasphere, Splunk, RapidMiner.
STATUS: Private beta. Self-funded. 1000 beta users.
TEAM: 6 engineers (all technical). Recent PhD candidates in computer science from Hungary.

+ HAPYRUS (Palo Alto, CA)
VALUE PROPOSITION: Browser-based solution that enables more effective collaboration and workflow between engineers and data analysts analyzing large datasets.
USER & CUSTOMER PROFILE: Companies with rapidly growing data that already lives in the cloud. Ideally uses S3 and Elastic MapReduce.
USE CASES: Engineers can write templated Hadoop jobs in which business analysts can change parameters and perform
COMPETITION: Datameer, Apache Hive.
STATUS: $700K from 500 Startups and Japanese angel investors. Next round in 2013.
TEAM: 3 employees (2 technical).

+ COGNIER (Santa Clara, CA)
VALUE PROPOSITION: Solution for analyzing timestamped, semi-structured data.
USER & CUSTOMER PROFILE: Business analysts with limited technical backgrounds, who want to graphically visualize analyses.
USE CASES: Analyze unusual variations in the data, particularly over time. E-commerce, SaaS and mobile app customers are most common.
COMPETITION: Web analytics (Google Analytics, TeaLeaf); BI companies (Cognos, Business Objects); Splunk.
STATUS: Bootstrapped. 3 months from GA. Looking for Series A in late 2012.
TEAM: 3 employees, ex-Stratify (e-discovery startup acq. by Autonomy).

+ KAGGLE (San Francisco, CA)
VALUE PROPOSITION: Statistical outsourcing platform for modeling and prediction competitions. Turns data science into a sport.
USER & CUSTOMER PROFILE: Organizations of all sizes with limited resources (talent or infrastructure) to analyze large datasets internally.
USE CASES: Anyone can post a competition on Kaggle with a well-defined objective and a prize for the IP behind the solution.
COMPETITION: (indirect) Crowdflower, Innocentive, TekScout.
STATUS: Nov 2011: $11m Series A by Index Ventures and Khosla Ventures. Max Levchin, Hal Varian are also investors.
TEAM: Under 10 employees. Founded by Australian data scientists.

+ TRESATA (Charlotte, NC)
VALUE PROPOSITION: Bringing the power of Hadoop to financial industry data (structured and unstructured). Available on-premise or on the cloud.
USER & CUSTOMER PROFILE: Retail and institutional financial services customers.
USE CASES: Massively parallel analytics correlating own data and public data: financial and non-financial (e.g. social). Note: certain data can be provided by Tresata's own partners.
COMPETITION: Datameer, Palantir.
STATUS: $1.5m in seed and angel financing.
TEAM: <10 employees. Founders are ex-Bank of America.

+ PLATFORA (San Mateo, CA)
VALUE PROPOSITION: Platform offering interactive business intelligence reports that are translated on the fly into scalable, parallel Hadoop jobs. Visualization in the form of dashboards and reports.
USER & CUSTOMER PROFILE: Business-facing data analysts at companies with large datasets: Internet/e-commerce, telecom, logistics, finance.
USE CASES: Any currently fulfilled by traditional data warehouses, BI and ETL tools.
COMPETITION: Datameer, Apache Hive.
STATUS: Series A $7.2m by Andreessen Horowitz, Sutter Hill Ventures, In-Q-Tel.
TEAM: 10 employees. Founder/CEO is ex-Greenplum.