Analytics. For Anyone. Be Heroic Turn Data into Action
Progressive businesses must accelerate time-to-value not only to thrive, but to survive. Analytics on big data is no longer just a competitive advantage. It's a Business Requirement.
Built by data scientists for data scientists, business analysts, and developers, RapidMiner is the industry's easiest-to-use Modern Analytics Platform, significantly accelerating productivity from data prep to predictive action. Unlike traditional analytics providers, RapidMiner enables anyone to make the most of all data in all environments, creating a powerful advantage from the wisdom of over 250,000 users.
The Analytics Spectrum, from Business Intelligence to Advanced Analytics:
SQL Analytics: count, mean
Descriptive Statistics: univariate distribution, central tendency, dispersion
Data Mining: association rules, clustering, feature extraction
Predictive Analytics: classification, regression, time series, text, spatial, machine learning
Simulation: Monte Carlo, agent-based modeling, discrete event modeling
Optimization: linear optimization, non-linear optimization
Key Trends & Drivers in Modern Analytics
Market Forces: Internet of Things, consumerization, mass personalization
Modern Business: accelerate time-to-value, maximize business value, simplify getting to value
Technology: big data, new compute engines, cloud
Evolving Advanced Analytics Market
Traditional (limitations): limited handling of a variety of data sources; legacy compute engines; on-premises, if not offline
Modern (limitless): big data; new compute engines; cloud
Advanced Analytics Market Maturity
Traditional: lagging innovation
Modern: high-velocity innovation
Traditional vs. Modern Analytics Market
[Figure: Gartner Magic Quadrant for Advanced Analytics Platforms, February 2014. Axes: ability to execute and completeness of vision. Quadrants: Challengers, Leaders, Niche players, Visionaries. Vendors shown: StatSoft, SAS, RapidMiner, IBM, Angoss, SAP, Knime, Oracle, Megaputer, FICO, Revolution Analytics, Microsoft, InfoCentricity, Alpine Data Labs, Alteryx, Actuate.]
Skills Gap in the Modern Analytics Market: Searching for Unicorns
McKinsey projects that by 2018 there will be a shortage of 1.7M professionals with analytics expertise in the U.S. alone (Data Science Skills Gap in 2018, McKinsey 2012).
[Figure: skills diagram. Labels: Computer Science; Domain Expertise; Math; Statistician; Actuarial; Quant; Data Scientist.]
Unlocking Value with Modern Analytics
[Figure: skills diagram. Labels: Computer Science; Domain Expertise; Math; Statistician; Actuarial; Quant; Data Scientist; Business Analysts; Next Generation Data Scientist (aka: the hero you are looking for).]
Enter RapidMiner. Analytics. For Anyone.
Accelerate: pre-built models, one-click deployments
Connect: all data, all environments
Simplify: code free, wisdom of crowds
Wisdom of Crowds: How do we create data science heroes?
1. Anonymously collect analytic models from analysts across the enterprise.
2. Store them in a knowledge base of analytic best practices.
3. Use machine learning algorithms to recommend and empower any user at any skill level to become a data science hero.
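The collect, store, and recommend loop on the Wisdom of Crowds slide can be illustrated with a small Python sketch. It is not RapidMiner's recommender: simple "which operator most often comes next" counting stands in for the machine learning algorithms the slide mentions, and the function names, process lists, and operator names are illustrative assumptions.

```python
from collections import Counter, defaultdict

def build_knowledge_base(processes):
    """Steps 1-2: count which operator follows which across collected processes."""
    followers = defaultdict(Counter)
    for operators in processes:
        for current, following in zip(operators, operators[1:]):
            followers[current][following] += 1
    return followers

def recommend_next(knowledge_base, current_operator, top_n=3):
    """Step 3: suggest the operators that most often followed current_operator."""
    ranked = knowledge_base.get(current_operator, Counter()).most_common(top_n)
    return [name for name, _ in ranked]

# Hypothetical, anonymized processes: each is an ordered list of operator names.
collected = [
    ["Read CSV", "Replace Missing Values", "Naive Bayes", "Apply Model"],
    ["Read CSV", "Replace Missing Values", "Decision Tree", "Apply Model"],
    ["Read CSV", "Normalize", "Naive Bayes", "Apply Model"],
]

kb = build_knowledge_base(collected)
print(recommend_next(kb, "Read CSV"))
```

A user who has just placed "Read CSV" would be offered the steps other analysts chose most often at that point, which is the sense in which less experienced users benefit from the crowd.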
RapidMiner Modern Analytics Platform
RapidMiner Studio: code-free design of your analytics using 1500+ operators.
RapidMiner Server: enterprise analytics environment for integration with business processes.
RapidMiner Radoop: push down computations to where your data lives.
RapidMiner Streams: analyze streaming data while in motion.
RapidMiner Cloud: elastic compute environment for high performance analytics.
[Figure: platform diagram with the layers Design, Orchestrate, Consume, and Compute. Labels: Studio (code free GUI), Server (Web Services API), Radoop, Streams, Cloud, each with an Engine; users: business analysts, data scientists, business users; consumers: web app, biz app, BI, machine, custom app, viz; compute: in-memory, in-Hadoop, in-database, in-stream.]
RapidMiner Radoop Architecture
Visual development: code-free design in RapidMiner (Studio, Server) with 70+ operators. Radoop covers data integration, data discovery, data prep, model building, model validation, and model scoring.
One-click push down to the Hadoop environment.
Programming code: optimized distributed execution in the Hadoop environment. Components shown: Hive (SQL), Pig (scripting), MapReduce, Mahout (machine learning), Impala (in-memory SQL), Spark (MLlib), HDFS, YARN.
RapidMiner Streams Architecture
Visual development: code-free design in RapidMiner (Studio, Server) leveraging 1500+ operators. Streams covers data integration, data prep, and model scoring.
One-click push down to the Storm environment: the process is deployed as a Storm topology of spouts and bolts on an Apache Storm cluster for distributed execution, and is monitored and managed from there.
Applications push to, or pull from, a message broker (Apache Kafka or Amazon SQS).
Stores: Cassandra, MongoDB, Redis, SOLR.
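To make the spout and bolt vocabulary on the Streams slide concrete, here is a minimal Python sketch of the data flow only. It does not use the Apache Storm API or RapidMiner Streams: an in-memory queue stands in for the message broker, a plain dict stands in for the store, and the record layout and the scoring rule are hypothetical.

```python
from collections import deque

def spout(broker):
    """Spout: pull raw messages from the broker until it is empty."""
    while broker:
        yield broker.popleft()

def prep_bolt(messages):
    """Data prep bolt: parse 'customer_id,minutes_of_usage' into a record."""
    for raw in messages:
        customer_id, minutes = raw.split(",")
        yield {"customer": customer_id, "minutes": float(minutes)}

def scoring_bolt(records, threshold=30.0):
    """Scoring bolt: a hypothetical rule used in place of a trained model."""
    for record in records:
        record["at_risk"] = record["minutes"] < threshold
        yield record

def store_bolt(records, store):
    """Store bolt: write each scored record to the store."""
    for record in records:
        store[record["customer"]] = record["at_risk"]

# Stand-ins for the message broker and the result store.
broker = deque(["c1,12.5", "c2,95", "c3,29.9"])
store = {}

# Wire the topology: spout -> prep -> scoring -> store.
store_bolt(scoring_bolt(prep_bolt(spout(broker))), store)
print(store)
```

In a real Storm cluster each stage would run in parallel across nodes; the generator chain here only shows how records move from one stage to the next.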
RapidMiner Server Architecture
Components: Web Services API, App Designer, Shared Repository, User Management, Scheduler, Engine.
Web Services API: integrate analytic models into any type of application (web app, machine, biz app, BI, custom app, viz) through a RESTful API.
App Designer: build web-based predictive reports; create ad hoc reports; build predictive apps without coding; embed apps into business processes.
Shared Repository: collaborate on analytic processes; share data, processes and models.
User Management: create and manage users, roles and access rights.
Scheduler: execute analytics at certain times or automate repeating execution patterns.
Engine: high-performance compute engine for distributed and remote work.
RapidMiner Modern Analytics Flow
Data Sources: work with any data, from any source.
Compute Engines: work in any environment, at any time.
Model Building & Scoring: data integration, discovery & preparation; model building, validation & scoring.
Model Deployment: deploy models any way you want.
Model Consumption: embed your insights and take action.
One Platform To Rule Them All
Model Building and Scoring (1500+ operators, 200+ community contributed operators): Data Integration, Data Discovery, Data Preparation, Model Building, Model Validation, Model Scoring.
50+ data connectors with access to 100+ sources including 40+ file types. Any data type: structured, semi-structured, unstructured, binary.
700+ data parsing, data blending, data cleansing, transformations, aggregations, set operations, rotations, filtering, outlier detection, value type transformations, feature creation, window functions, feature extraction. 20+ process control structures.
25+ interactive data visualizations including: data tables, scatter matrices, bubble charts, parallel coordinates, deviation plots, 3-D scatter plots, density plots, histograms, survey plots, Andrews curve, quartile, Pareto charts, network & tree visualizations. 30+ image format exports.
45+ feature selection (automatic & manual). 20+ missing value replacement & imputation. 96+ feature creation (automatic & manual). 20+ anomaly & outlier detection. 60+ dimension reduction / feature selection. 20+ segmentation & clustering. 80+ processing & feature extraction from unstructured data.
25+ statistical. 250+ machine learning. 200+ association mining, frequent item set, similarity computation, feature weighting. 10+ ensemble and hierarchical models. 10+ model and parameter optimization. Automatic model fitting. Integration of 3rd party analytics, optimization solvers or simulation tools.
10+ cross validation. 20+ visual evaluation. 30+ numerical / nominal / categorical model performance criteria. 10+ significance tests. 5+ optimal threshold cutoff for binomial classes. 5+ cluster performance measures.
Model scoring for all applicable model building.
Add'l Analytics: 50+ text analytics, 15+ web mining, 30+ image / audio / video mining, 85+ time series, 30+ financial & economics.
Work with Any Data, from Any Source
Data Sources (access to 450+ data sources): flat files; text files; databases via JDBC or ODBC; database & cube queries (MDX); Hadoop sources; NoSQL databases; cloud data sources; web services (web pages, web services); mail services (POP3, IMAP).
Logos & icons represent a partial list. Full list available upon request.
Work in Any Environment, at Any Time
Compute Engines: in-memory, in-SQL & in-database, in-cluster, in-Hadoop, in-cloud, in-stream.
Deploy Models Any Way You Want
Model Scheduling: scheduled model execution.
Model Publishing: publishing model results via web services API into web services, web applications, business applications (ERP, CRM, Marketing Automation, etc.), machines, streaming applications, rule engines, complex event processing, business intelligence, data visualization, cloud applications, and custom applications.
Model Embedding: embedding of the model via Java API into any application; callable from any application.
Model Export: PMML export.
Embed Your Insights and Take Immediate Action
Model Consumption: business intelligence; CRM; marketing automation; cloud applications (connecting to 300+ cloud services); ERP; custom (custom web applications, web portals).
Get to Meaningful Business Value in a Snap
Accelerate: drop development time from days to minutes.
Connect: automate data integration.
Simplify: make data science accessible to all.
Design Your Analytics. Coding Not Required.
Supercharge your results with 1500+ analytic operators.
Liberate your business analysts with a code free environment.
Leverage the wisdom of over 250,000 users worldwide.
Boost your data science knowledge with interactive help.
Machine Learning on Hadoop: How do we become big data heroes?
Pushing data prep and machine learning into Hadoop clusters is complex and requires coding. Not a viable option!
Instead, push computations into Hadoop clusters from a code free environment. Heroes use RapidMiner!
Use Case Example: Churn Prevention with Hadoop
Task: Separate loyal customers from customers who are likely to churn.
Solution with Hadoop + Mahout + (a lot of) custom coding (timeline markers: Day 1, Day 3, Day 12, Day 18):
1. Define a schema and create tables for customer data, past transactions, service usage log files, and so on. Manually list columns and types, define separator characters, etc.
2. Write HiveQL queries (or Pig scripts or other code) to aggregate transactions and service logs for each customer and calculate attributes describing them.
3. Implement and execute a custom MapReduce job to convert data to Mahout's input format.
4. Run the Mahout Naïve Bayes algorithm with proper parameters from the command line.
5. Repeat each step for the customers you want to apply the model on.
6. Implement and execute a custom MapReduce job to convert predictions back into a delimited format.
7. Export the result from HDFS.
8. Import the result into an RDBMS.
TIME: 3 WEEKS
Disconnected individuals get bogged down in endless process, coding and queries. In the meantime, your competition beats you to the punch.
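For readers unfamiliar with the modeling core of this workflow, the sketch below walks through steps 2, 4 and 5 (aggregate per customer, train Naive Bayes, apply it to new customers) in plain single-machine Python. It is not Mahout's or RapidMiner's implementation: a Gaussian Naive Bayes variant is swapped in for illustration, and the customer IDs, amounts, labels, and function names are all made up.

```python
import math
from collections import defaultdict

def aggregate(transactions):
    """Step 2: roll (customer, amount) rows up into (count, total spend)."""
    totals = defaultdict(lambda: [0, 0.0])
    for customer, amount in transactions:
        totals[customer][0] += 1
        totals[customer][1] += amount
    return {c: (count, spend) for c, (count, spend) in totals.items()}

def train(features, labels):
    """Step 4: fit per-class priors plus a mean and variance per attribute."""
    by_class = defaultdict(list)
    for customer, label in labels.items():
        by_class[label].append(features[customer])
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / len(column) + 1e-6
            stats.append((mean, var))
        model[label] = (len(rows) / len(labels), stats)
    return model

def predict(model, row):
    """Step 5: return the class with the highest log-posterior."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Hypothetical transaction log: (customer id, purchase amount).
transactions = [
    ("a", 30.0), ("a", 45.0), ("a", 25.0), ("a", 50.0),
    ("b", 60.0), ("b", 20.0), ("b", 35.0), ("b", 40.0), ("b", 30.0),
    ("c", 10.0),
    ("d", 15.0), ("d", 5.0),
    ("e", 55.0), ("e", 40.0), ("e", 35.0), ("e", 30.0),
    ("f", 8.0),
]
features = aggregate(transactions)
model = train(features, {"a": "loyal", "b": "loyal", "c": "churn", "d": "churn"})

for customer in ("e", "f"):
    print(customer, predict(model, features[customer]))
```

The remaining steps in the list (schema definition, format conversion, export and import) are the plumbing that the slide argues dominates the three-week timeline.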
Use Case Example: Churn Prevention with Hadoop
Task: Separate loyal customers from customers who are likely to churn.
Solution with RapidMiner:
1. Combine data from Hadoop and any traditional source.
2. Train model in distributed Hadoop cluster.
3. Apply model in RapidMiner and integrate seamlessly.
TIME: 10 MINUTES
Your team designs the process in collaboration with each other, just like they would on a whiteboard. And then you press play. That's it.
RapidMiner Radoop consistently delivers performance increases of up to 4,000% compared to pure scripting approaches.*
* RapidMiner results compared against traditional Hadoop approaches including data integration, data prep, modeling, deployment and maintenance.
RapidMiner Fills In The Skills Gap
[Figure: skills diagram. Labels: Computer Science; Domain Expertise; Math; Statistician; Actuarial; Quant; Data Scientist; Business Analysts; Next Generation Data Scientists (this is the realm of heroes).]
Companies Around The World Use RapidMiner Technology Pharma & Healthcare Oil & Gas, Chemicals Government & Defense Consulting Manufacturing Aerospace Consumer Products Business Services Software & Analytics Financial Services Entertainment Academia Retail 30
Signature Customers
Process Customer Feedback In Multiple Languages To Increase Retention Rates
Challenge: Applying basic voice-of-the-customer concepts and text analytics to customer feedback in over 60 countries worldwide.
Solution: Use RapidMiner's Platform to detect churn and identify customer service issues regardless of time, location or language.
150,000 customer comments and tweets in almost every language processed on RapidMiner.
Data Science Hero Spotlight: "Business executives, who hold the power to allocate text analytics resources, are beginning to see and realize the benefits to help better focus and solve business problems." -- Han-Sheong Lai, Director of Operational Excellence & Customer Advocacy
Accelerate: process massive amounts of text at high speed.
Connect: analyze multiple silos of global customer data.
Simplify: automatically determine intent-to-churn.
Quickly Prototype Analytics Models for Under Armour Challenge Wearables Data
Challenge: Quickly prototype analytics processes for Under Armour wearable data, for the Under Armour39 Challenge.
Solution: Use RapidMiner's code free, drag and drop GUI to quickly design 11 analytics processes, iterate them for optimization, and win the challenge.
1.8M data points analyzed, per hour, by the Under Armour39 wearable.
Data Science Hero Spotlight: "RapidMiner is extremely powerful, has the best operators, and can handle Big Data from wearables. It also allows us to rapidly prototype sophisticated analytics, machine learning and classification applications, saving time and money." -- Kevin Logan, CEO
Accelerate: prototype multiple analytics processes quickly and easily.
Connect: analyze Big Data from wearable devices.
Simplify: use code free, drag and drop GUI for analytics.
Track Data from Millions of Companies to Identify Critical Economic Drivers
Challenge: Monitor corporate performance data in real time, and identify correlations, outliers, and economic drivers.
Solution: Use RapidMiner's algorithms for rapid prototyping, its visualizations for correlations, and its tools to identify outlying, unusual data.
4.5M subject matter experts content analyzed in the United Kingdom.
Data Science Hero Spotlight: "We benefit from the public availability of extensions and the RapidMiner Marketplace. We can easily search for what others have designed in RapidMiner, and use the extensions that are a fit for us." -- Tom Gatten, CEO
Accelerate: prototype analytics and visualizations quickly.
Connect: analyze data from the digital footprint of UK businesses.
Simplify: RapidMiner Marketplace public extensions.
Search Millions of Patents Online and Automatically Mine Image Data
Challenge: Search millions of patents online and automatically mine image data for applicable information.
Solution: Use RapidMiner text and image mining to quickly and easily identify several thousand images of interest.
1M+ detailed patent records mined online, including images.
Data Science Hero Spotlight: "Some years ago (the patent team) had tried a dedicated patent classification tool that didn't work - RapidMiner does. It provides a framework for substantially reducing the time it takes us to find interesting patents." -- Thomas Hartmann, Business Engineer
Accelerate: automatically mine millions of online patent images.
Connect: search through a wide variety of online data sources.
Simplify: no programming required to connect insights to action.
Television Broadcasters Project: Drive Broadcast Revenues and Customer Retention with Streaming, Real-Time Analytics
Challenge: Better understand TV viewing habits to prevent churn and optimize advertising.
Solution: Process streaming Big Data from three million TV viewers, in real time, to make program content recommendations and target advertising.
Less than 5 seconds to generate high value activities based on predictive analytics.
Data Science Hero Spotlight: "RapidMiner allows us to leverage Big Data, in real-time, for the TV industry." -- Avi Bernstein, Professor at the University of Zurich, Department of Informatics
Accelerate: personalized recommendations in less than five seconds.
Connect: stream and analyze from set-top boxes, mobile devices and PCs.
Simplify: code free design of streaming analytics.
Don't Take It From Us
"RapidMiner was most frequently selected based on ease of use, license cost, and speed of model development/ability to build large numbers of models. A number of templates guide users on the most common set of predictive use cases. Customer references cite high levels of satisfaction with the data access, data filtering and manipulation, predictive analytics and further advanced analytics components of the product." -- Gareth Herschel, Research Director
"Radoop also makes an eponymous product, focused on Hadoop analytics functionality, that is also visually-oriented and is 'powered by' RapidMiner itself, making the union quite logical." -- Andrew Brust, Research Director
"RapidMiner is an excellent data mining and statistics platform with a large following. With version 6 the product and company became much more commercial, and the recent acquisition of Radoop puts it in the big data league." -- Martin Butler, Research Director
Recognized Leader in Advanced Analytics
"Customer references cite high levels of satisfaction with the data access, data filtering and manipulation, predictive analytics and further advanced analytics components of the product."
[Figure: Gartner Magic Quadrant for Advanced Analytics Platforms, as of February 2014. Axes: ability to execute and completeness of vision. Quadrants: Challengers, Leaders, Niche players, Visionaries. Vendors shown: StatSoft, Angoss, SAP, SAS, RapidMiner, Knime, IBM, Oracle, Megaputer, FICO, Revolution Analytics, Microsoft, InfoCentricity, Alpine Data Labs, Alteryx, Actuate.]
www.rapidminer.com/gartner2014
Our History
RapidMiner was born from a data science project at the University of Dortmund, Germany, by Ingo Mierswa, Ralf Klinkenberg and Simon Fischer. Known initially as YALE (2001), the product led to Rapid-I, a company founded by Ingo and Ralf in 2007. Later, the company was renamed RapidMiner, and in 2012 its global headquarters were established in Cambridge, Massachusetts, USA.
Our Milestones and Global Users
2007: Open Source, 5,000 users
2010: Open Core, 30,000 users
2013: Business Source, 150,000 users
2014: Big Data & Cloud, 250,000 users
Customers: 600+ worldwide
Corporate Locations: North America, EMEA
Industries: Manufacturing, Retail/CPG, Financial, Utilities/Energy, Government, Automotive, Life Science, Telecom
Investors: Earlybird Venture Capital, Open Ocean Capital
Activating the data science hero in every business analyst! www.rapidminer.com