The most powerful open source data science technologies in your browser. Yves Hilpisch




I. The Market and The Problem
II. How We Solve The Problem
III. Market Size and Facts
IV. Strategic Opportunities

Mega Trends
Mega trends that influence data science
- Today's standard is open source, even for key technologies.
- More and more data sets are open and free.
- Complex analytics workflows are coded in the browser.
- Dynamic communities shape the way knowledge is transmitted.
- Individuals and institutions store more and more data in the cloud.
- Infrastructure is a standardized commodity, billed by the hour.

Open Source Software Revolution
OSS revolutionizes data science in both the front and back end
- FRONT END: In the front end, OSS revolutionizes how data scientists and developers work on a daily basis.
- BACK END: In the back end, OSS revolutionizes how analytics workflows and data applications are deployed and scaled.
DigitalOcean is a simple and fast cloud hosting provider built for developers. Customers can create a cloud server in 55 seconds, and pricing plans start at only $5 per month for 512MB of RAM, 20GB SSD, 1 CPU, and 1TB Transfer.

The Problem
Obstacles to using OSS for data science
- Open Source: fast-changing environment
- Vendors & Partners: almost no vendors that provide help & support
- Libraries: huge number of libraries to manage
- Tools: multitude of useful standalone tools
- Deployment: complex, lengthy, costly, risky
- Maintenance: how to update and maintain infrastructure?
- Diverse End Users: computer & data scientists as well as domain experts
- Training: how to train and re-train people?
- Start: where and how to start, who to talk to?

II. How We Solve The Problem

datapark.io
Open source data science technologies in your browser
Tools and technologies data scientists know and love.

Browser-based Data Science
datapark capitalizes on new Web technologies and tools
- 1st Generation: Move Data Around. Data analytics started by moving data from one place to another, analyzing it locally, and moving the results back to the remote data source.
- 2nd Generation: Move Code Around. Moving tons of data is costly and time-consuming; moving small code sets is faster and less costly.
- 3rd Generation: Don't Move Anything. The browser and Web technologies allow users to work directly and in real time on the infrastructure where data and code are stored (replacing, e.g., remote SSH access).
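The cost argument behind moving code instead of data can be made concrete with a rough transfer-time estimate. The sketch below is illustrative only: the data set size, script size, and link bandwidth are assumed values, not figures from the presentation, and the model ignores latency and protocol overhead.

```python
def transfer_seconds(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Idealized time to move a payload over a link (no latency, no overhead)."""
    return size_bytes * 8 / bandwidth_bits_per_s


# Assumed, hypothetical figures for illustration only.
DATA_SET_BYTES = 50 * 10**9     # a 50 GB data set
CODE_BYTES = 20 * 10**3         # a 20 kB analysis script
LINK_BITS_PER_S = 100 * 10**6   # a 100 Mbit/s link

data_time = transfer_seconds(DATA_SET_BYTES, LINK_BITS_PER_S)
code_time = transfer_seconds(CODE_BYTES, LINK_BITS_PER_S)

print(f"move data: {data_time / 3600:.2f} h")
print(f"move code: {code_time * 1000:.2f} ms")
```

Under these assumptions the data set takes on the order of an hour to move while the script takes milliseconds. Working in the browser on the remote infrastructure avoids both transfers.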

Feature Rich
datapark is essentially a data scientist's wish list

The Platform
Bringing the best of Open Source together in the browser
"Absorb what is useful, discard what is not, and add what is uniquely your own." Bruce Lee

Natural Evolution
From Python for Finance to Open Source for Data Analytics
[Diagram; labels: PRIMARY USE, PYTHON, DATA ANALYTICS, PYTHON QUANT PLATFORM, QUANT PLATFORM, TECHNOLOGY]

III. Market Size and Facts

Data Scientists and Engineers
There are about 10 million people in technical computing
[Chart; values in millions of people. Source: various Web resources]

Data Analytics
Data analytics is a top priority of almost any organisation
- Companies will spend an average of $7.4M on data-related initiatives over the next twelve months, with enterprises investing $13.8M and small & medium businesses (SMBs) investing $1.6M.
- 80% of enterprises and 63% of SMBs have already deployed or are planning to deploy big data projects in the next twelve months.
- 83% of organizations are prioritizing structured data initiatives as critical or high priority in 2015, and 36% are planning to increase their budgets for data-driven initiatives in 2015.
Source: http://www.forbes.com

Open Source Data Science
Open source (OS) languages dominate data science these days
[Chart; annotation: fastest growing]
Poll data from August 2014. Source: http://www.kdnuggets.com

Open Source Data Science
R, Python and SQL dominate OS data science
[Chart; usage in %]
Poll data from August 2014. Source: http://www.kdnuggets.com

Vendor Criteria in Data Analytics
Integration, security, ease of use & scalability important
[Chart; annotations: open in all directions; decades of Linux; only standards; Docker, Cloud]
Source: 2015 Big Data Analytics Survey (Summary Slides)

Platform
Competitors trying to solve the platform problem for data scientists
- sense.io: Proprietary Notebook solution, closed platform.
- modeanalytics.com: SQL focus, closed platform.
- wakari.io: Python focus, cloud version not maintained.

Major Competitor
The major competitor is Jupyter deployed in the cloud
- How to maintain, how to ensure security, how to share, how to control?
- DigitalOcean droplet for 5 USD per month, Jupyter with Python 3.4, deployed via Docker for 20+ users
- The MVP: http://jupyter.quant-platform.com
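As a rough worked example of the economics on this slide, the sketch below divides the quoted droplet price across the quoted user count. It assumes the 5 USD droplet is the entry plan with 512MB of RAM mentioned earlier in the deck, and that resources are split evenly across users. Since the slide says "20+ users", 20 is a lower bound on the user count, so the per-user figures are upper bounds.

```python
# Figures quoted in the presentation.
DROPLET_USD_PER_MONTH = 5.0
DROPLET_RAM_MB = 512   # assumption: the 5 USD plan is the 512MB entry plan
USERS = 20             # lower bound; the slide says "20+ users"

# Assumption: cost and memory are shared evenly across users.
cost_per_user = DROPLET_USD_PER_MONTH / USERS
ram_per_user = DROPLET_RAM_MB / USERS

print(f"cost per user: {cost_per_user:.2f} USD per month")
print(f"RAM per user:  {ram_per_user:.1f} MB")
```

At most 0.25 USD and about 25.6MB of RAM per user per month is what makes a self-hosted Jupyter deployment a serious competitor on price, while leaving the maintenance, security, sharing, and control questions open.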

IV. Strategic Opportunities

Use Cases for datapark.io
From teaching to data science to a social app store
- Teaching Programming & Data Science
- Data Science Platform in Institutions and Corporations
- Analytics-as-a-Service for Open Source Projects and Proprietary Data and Code
- Market Place for Ideas, Projects, Apps etc. ("Social Data Science")

Establishing a Standard
Building critical mass & social components, improving scalability
- Goal of 100,000 users to learn from
- Building out Social Components
- Improving Deployment and Scalability
- Becoming the GitHub for Data Science

How do we want to reach our goals?
Making usage as simple as possible, based on standards
- Sign-up: Two fields only. 30 seconds, immediate full-fledged functionality
- Infrastructure: Well-established components. Ubuntu, Anaconda, Docker
- Tools: All that you know & love. IPython Notebook, ACE, Shell (Git, Vim)
- Open: Using standards only. IPYNBs, Linux FS, Dropbox, Drive (easy in/out)
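Part of what makes IPYNB a convenient standard for easy in/out is that a notebook is a plain JSON document. The sketch below builds a minimal notebook in the nbformat 4 layout using only the Python standard library and reads it back. The file name and cell contents are hypothetical, and this is an illustration of the file format, not a description of how datapark.io handles notebooks.

```python
import json
import os
import tempfile

# A minimal notebook in the nbformat 4 layout: one markdown cell, one code cell.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Hello from the browser"],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # the cell has not been run yet
            "metadata": {},
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 0,
}

# Hypothetical file name; written to the system temp directory.
path = os.path.join(tempfile.gettempdir(), "example.ipynb")
with open(path, "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)

# Any tool that reads JSON can inspect the notebook again.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Because the format is plain JSON on an ordinary file system, notebooks can be synced through services such as Dropbox or Drive without any platform-specific export step.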

Just try it: http://datapark.io
Give us feedback: team@datapark.io

Dr. Yves J. Hilpisch
datapark.io, team@datapark.io, @dataparkio
The Python Quants GmbH