IBM Machine Learning and Data Analytics Collaboration Opportunities. Graham Mackintosh IBM Emerging Technology Project Executive 28 Sept 2016

1 IBM Machine Learning and Data Analytics Collaboration Opportunities Graham Mackintosh IBM Emerging Technology Project Executive 28 Sept 2016

2 Topics
- IBM Emerging Technology: quick introduction
- Workshop context: Machine Learning and Deep Learning; Apache Spark; open CERN (openlab, opendata, SWAN)
- A few ideas to kick things off

3 IBM jstart
IBM Emerging Technology jstart is the IBM Emerging Technologies client engagement team (ibm.com/jstart). It builds solutions for global customers using open and emerging technologies. Two examples of our active projects:
- Spark machine learning for signal classification (NASA, SETI, Stanford)
- Predictive analytics and real-time streaming with the US Cycling Team
Knowledge and experience from customer engagements are transferred to IBM organizations and products.

4 jstart Projects and PoC Process
- Requirements driven: start with a simple use case and iterate
- Low-friction PoC process to explore options and ideas (in-kind contribution)
- Every jstart engagement has an assigned jstart Project Manager and an experienced Architect with ML experience
- Development labs and cloud-based PoC environments
- Experience with a variety of ML technologies (scikit-learn, MLlib, Keras, etc.)
- Leverage third-party and open source packages (e.g. HEP_ML for high energy physics)
The jstart engagement process runs through Solution Drivers & Boundaries, Requirements & Solution Scope, Detailed Design, Iterative Development, and Deployment & Skills Transfer, with constant feedback on business and technology throughout.

5 Workshop Context
1. CERN openlab is promoting the use of Machine Learning at CERN in collaboration with external companies and research institutions
   - CERN openlab Machine Learning and Data Analytics workshop, April 2016
2. Apache Spark enables interesting analytic capabilities and is well accepted by the global data science community
   - SWAN: open service for interactive analysis in the cloud
   - CERN evaluation of Spark to predict CMS data set popularity
   - Interest in MLlib, scikit-learn, Keras, distributed deep learning, etc.
3. CERN is increasingly open to external citizen scientists
   - Collaboration with LAL for the Higgs ML Challenge in 2014
   - opendata portal access, with controls for data embargoes

6 Workshop Context, with IBM activity
The three context points of slide 5 are annotated with IBM activity. The points are: (1) CERN openlab is promoting Machine Learning with external partners; (2) Apache Spark is well accepted by the data science community; (3) CERN is increasingly open to external citizen scientists, through opendata portal access that respects data embargoes.
- IBM Watson: $1B investment in deep learning and cognitive computing
- IBM DataWorks launched (Watson, Spark, Data Science Experience)
- IBM SystemML, now an open Apache incubator project
- IBM is a core contributor to MLlib
- IBM Cognitive Compute Cluster for Deep Learning

7 Workshop Context: IBM and Apache Spark
The three context points of slide 5 are annotated with IBM's Spark announcements:
- IBM will open source its breakthrough IBM SystemML machine learning technology and collaborate with Databricks to advance Spark's machine learning capabilities.
- IBM will commit more than 3,500 researchers and developers to work on Spark-related projects at more than a dozen labs worldwide. It will also open a Spark Technology Center in San Francisco for the data science and developer community, to foster design-led innovation in intelligent applications.
- IBM will educate more than 1 million data scientists and data engineers on Spark through extensive partnerships with AMPLab, DataCamp, MetiStream, Galvanize and the Big Data University MOOC.
Status:
- IBM has announced a strategic investment in Spark and is now the #2 contributor to Spark open source
- IBM Spark Technology Center opened in the heart of Silicon Valley
- Spark is linked to hundreds of other cloud services on IBM BlueMix
- Multiple Spark deployments and active PoCs

8 Workshop Context, with IBM data services
The three context points of slide 5 are annotated with:
- IBM Data Science Experience
- IBM Data Exchange
- Example: IBM collaboration with the NASA Advanced Supercomputing Division to create training/test sets for ML models

9 For example: SETI Institute Backgrounder
- Headquartered in Mountain View, CA. Founded …, with … scientists, researchers and staff.
- The mission of the SETI Institute is to explore the potential for extra-terrestrial life.
- It searches for narrowband radio signals in the 1 GHz to 10 GHz range that could be evidence of intelligence outside our solar system.
The Allen Telescope Array (ATA)
- 42 receiving dishes, each 6 m in diameter
- Phased-array synthetic dish, 3 beams
- Frequency range 1 GHz to 10 GHz
- 4.5 TB of data every hour; only the data with detected signals is saved for later analysis
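The 4.5 TB per hour quoted for the ATA explains why only data containing detected signals is kept. The conversion below is a worked example. It assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes). The function name `ata_rates` is purely illustrative.

```python
# Back-of-the-envelope rates for the ATA volume quoted on the slide (4.5 TB per hour).
# Assumption: decimal units, 1 TB = 10**12 bytes and 1 GB = 10**9 bytes.
BYTES_PER_TB = 10**12
SECONDS_PER_HOUR = 3600


def ata_rates(tb_per_hour: float) -> dict:
    """Convert an hourly data volume in TB into per-second and per-day figures."""
    bytes_per_second = tb_per_hour * BYTES_PER_TB / SECONDS_PER_HOUR
    return {
        "GB_per_second": bytes_per_second / 10**9,
        "Gbit_per_second": bytes_per_second * 8 / 10**9,
        "TB_per_day": tb_per_hour * 24,
    }


for name, value in ata_rates(4.5).items():
    print(f"{name}: {value:.2f}")
```

This works out to about 1.25 GB/s (10 Gbit/s) sustained, or 108 TB per day. That is far more than could reasonably be archived in full.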

10 jstart project in collaboration with NASA and the SETI Institute
IBM Apache Spark Services allow large volumes of radio signal data to be analyzed in new ways:
- Deep data mining of the SETI 10-year data archives
- Spark-enabled analysis of long-duration observations (~5 TB each)
- Intelligent signal classification with deep learning (Cognitive Compute Cluster)
- An open environment that allows other institutions and world experts to participate
Collaborators:
- NASA Space Science Division
- Stanford University: multiple concurrent research teams
- Swinburne University, Australia: wide-band signal detection experts
- IBM Research Johannesburg: Square Kilometer Array research team
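Spark-enabled analysis of a ~5 TB observation relies on splitting the data into partitions, summarizing each independently, and merging the partial results. The sketch below shows that map/reduce pattern in plain Python on a toy list of samples. It is not the SETI or jstart code, and all names are hypothetical. In PySpark the same shape would roughly be `rdd.mapPartitions(...)` followed by `reduce(...)`, with Spark handling the partitioning across executors.

```python
from functools import reduce
from typing import Iterator, List, NamedTuple


class PartialStats(NamedTuple):
    """Mergeable summary of one chunk of samples."""
    n: int
    total: float
    total_sq: float
    peak: float


def summarize(chunk: List[float]) -> PartialStats:
    # "Map" step: each chunk is summarized independently, like one Spark task per partition.
    return PartialStats(len(chunk), sum(chunk), sum(x * x for x in chunk), max(chunk))


def merge(a: PartialStats, b: PartialStats) -> PartialStats:
    # "Reduce" step: associative and commutative, so partial results can combine in any order.
    return PartialStats(a.n + b.n, a.total + b.total, a.total_sq + b.total_sq, max(a.peak, b.peak))


def chunks(samples: List[float], size: int) -> Iterator[List[float]]:
    """Split the sample list into consecutive chunks of at most `size` items."""
    for start in range(0, len(samples), size):
        yield samples[start:start + size]


def observation_stats(samples: List[float], chunk_size: int) -> dict:
    """Mean, population variance and peak of a non-empty sample list, computed chunk by chunk."""
    s = reduce(merge, map(summarize, chunks(samples, chunk_size)))
    mean = s.total / s.n
    # E[x^2] - E[x]^2 is fine for a sketch but can lose precision on real data.
    variance = s.total_sq / s.n - mean * mean
    return {"n": s.n, "mean": mean, "variance": variance, "peak": s.peak}


print(observation_stats([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], chunk_size=3))
```

Because `merge` is associative, the result does not depend on how the data is partitioned. That property is what lets a cluster process the chunks in parallel. For real signal data, a numerically stabler merge (e.g. Chan's pairwise variance update) would be preferable.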

11 Collaboration environment and data
- IBM GitHub repository: Python Jupyter notebooks, Python code, install packages, standard GitHub collaboration functions
- Import of signal data from the SETI radio telescope data archives (~10 years) via SWIFT into IBM Object Storage
- Shared repository of SETI data in Object Store:
  - 200M rows of signal event data
  - 360,000 raw recordings of signals of interest
  - Large long-duration observations (~5 TB each)
  - ~20 TB of accessible data in storage
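A typical first notebook step on the 200M-row signal event table is to filter events and count them by type. The sketch below does this with the standard library on a tiny inline sample. The column names (`snr`, `sig_type`, etc.) and the type labels are hypothetical, not the actual SETI schema.

```python
import csv
import io
from collections import Counter

# Hypothetical extract of a signal-event table; column names and values are illustrative only.
SAMPLE_EVENTS = """event_id,freq_mhz,drift_hz_s,snr,sig_type
1,1420.40,0.12,25.3,cw
2,1420.41,-0.05,8.1,cw
3,2250.00,0.00,40.2,pulsed
4,1612.23,0.30,12.7,cw
"""


def count_strong_events(text: str, min_snr: float) -> Counter:
    """Count events per signal type whose SNR is at least `min_snr`."""
    reader = csv.DictReader(io.StringIO(text))
    return Counter(row["sig_type"] for row in reader if float(row["snr"]) >= min_snr)


print(count_strong_events(SAMPLE_EVENTS, min_snr=10.0))
```

At the scale quoted on the slide, the same logic would be expressed as a Spark DataFrame query, e.g. `df.filter(df.snr >= 10).groupBy("sig_type").count()`, assuming a DataFrame with these hypothetical columns.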

12 Example Notebook
A Jupyter notebook showing complex radio signals being classified based on morphology and other features. The neural net model was developed on the IBM Cognitive Compute Cluster (GPU-enhanced) and ported to IBM Spark on the cloud for use by other researchers.
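The notebook on this slide classifies signals with a neural network. As a much simpler stand-in that shows the same "features in, class label out" workflow, here is a nearest-centroid classifier. This is deliberately a different technique from the slide's model. The three features (drift rate, bandwidth, a "wander" score) and the class names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical features per signal: (drift rate, bandwidth in Hz, wander score).
TRAINING: List[Tuple[Tuple[float, float, float], str]] = [
    ((0.1, 2.0, 0.05), "narrowband"),
    ((0.2, 3.0, 0.10), "narrowband"),
    ((0.0, 500.0, 0.20), "broadband"),
    ((0.1, 450.0, 0.30), "broadband"),
    ((0.0, 5.0, 4.0), "squiggle"),
    ((0.3, 6.0, 5.0), "squiggle"),
]


def fit_centroids(samples) -> Dict[str, List[float]]:
    """Average the feature vectors belonging to each class label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Return the label whose centroid is nearest in Euclidean distance (math.dist, Python 3.8+)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


centroids = fit_centroids(TRAINING)
print(predict(centroids, (0.1, 2.5, 0.1)))
print(predict(centroids, (0.2, 5.0, 4.2)))
```

With unscaled features, bandwidth dominates the distance. A real pipeline would standardize each feature first, or learn the weighting, as a neural network effectively does.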

13 Technical pathfinder
- Multi-terabyte data sources: hundreds of millions of records; millions of binary files ranging from 5 MB to 5 TB
- Hardened SWIFT connectivity from Spark to Object Store
- CPU-intensive algorithms for multivariate data processing
- Hardened Spark services for multi-day wall-time workloads
- Multi-terabyte ground-to-cloud uploads (IBM TS2270 tapes, SoftLayer Data Transfer Services, etc.)
- Advanced data visualization and notebook distribution
- Integration with the IBM Cognitive Compute Cluster, to leverage deep learning models for real-time signal triage
- Cluster availability monitoring and support
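The mention of tapes and bulk data-transfer services for multi-terabyte uploads follows from simple arithmetic. The sketch below estimates network transfer time for the ~20 TB archive quoted on slide 11. The link speeds and the `efficiency` factor are assumptions chosen for illustration, not figures from the project.

```python
def transfer_hours(terabytes: float, link_gbit_s: float, efficiency: float = 1.0) -> float:
    """Hours to move `terabytes` (decimal TB) over a link of `link_gbit_s` Gbit/s.

    `efficiency` is the assumed fraction of the nominal bandwidth actually sustained.
    """
    bits = terabytes * 10**12 * 8
    seconds = bits / (link_gbit_s * 10**9 * efficiency)
    return seconds / 3600


# ~20 TB is the accessible archive size quoted on slide 11; the link speeds are assumptions.
for gbit in (0.1, 1.0, 10.0):
    hours = transfer_hours(20, gbit)
    print(f"20 TB at {gbit:>4} Gbit/s: {hours:7.1f} h ({hours / 24:.1f} days)")
```

Even at a fully sustained 1 Gbit/s, 20 TB takes about 44 hours. At 100 Mbit/s it takes over 18 days, which is why shipping physical media can beat the network for an initial bulk load.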

14 Open invitation for external researchers and citizen scientists to analyze ATA signal data
- Gallery of greatest hits and a GitHub repository of notebooks for collaborative outcomes
- Analytic challenges and hackathons
- Review of results for potential use by the SETI Institute on the internal Spark environment

15 Stanford University Signal classification based on morphology and selected scalar metrics

16 Stanford University: signal classification based on morphology and selected scalar metrics
Example: randomly (?) modulated signals that are occasionally detected. A signal of interest, or faulty equipment?
Approaches: scale-invariant feature transform (SIFT), Fisher vector, "squiggle" fingerprint
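The slide lists SIFT and Fisher vectors. The sketch below does not implement either. It only illustrates the idea of "selected scalar metrics" with two hypothetical metrics computed from a toy waterfall (time × frequency power):

- a least-squares drift rate of the peak-frequency track;
- a "wander" score, the RMS scatter about that straight line.

A steadily drifting carrier has near-zero wander, while a squiggle-like signal does not. The function names and the toy data are illustrative assumptions.

```python
from statistics import mean
from typing import List, Tuple


def peak_track(waterfall: List[List[float]]) -> List[int]:
    """Index of the strongest frequency bin in each time step (one row per time step)."""
    return [max(range(len(row)), key=row.__getitem__) for row in waterfall]


def drift_and_wander(track: List[int]) -> Tuple[float, float]:
    """Least-squares drift (bins per time step) and RMS scatter of the track about that line.

    Requires at least two time steps.
    """
    t = list(range(len(track)))
    t_bar, f_bar = mean(t), mean(track)
    slope = (sum((ti - t_bar) * (fi - f_bar) for ti, fi in zip(t, track))
             / sum((ti - t_bar) ** 2 for ti in t))
    intercept = f_bar - slope * t_bar
    residuals = [fi - (intercept + slope * ti) for ti, fi in zip(t, track)]
    wander = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
    return slope, wander


def toy_waterfall(peaks: List[int], width: int = 8) -> List[List[float]]:
    """Hypothetical noiseless waterfall with a single bright bin per time step."""
    return [[1.0 if i == p else 0.0 for i in range(width)] for p in peaks]


steady = drift_and_wander(peak_track(toy_waterfall([0, 1, 2, 3, 4])))
squiggly = drift_and_wander(peak_track(toy_waterfall([3, 5, 2, 6, 1])))
print(steady, squiggly)
```

Scalar metrics like these are cheap to compute per event and could feed a classifier alongside morphology features. Real waterfalls are noisy, so the peak track would need smoothing or thresholding first.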

17 Getting back to the context of this workshop
IBM's experience is that these three are tightly linked:
1. CERN openlab is promoting the use of Machine Learning at CERN in collaboration with external companies and research institutions.
2. Apache Spark enables interesting analytic capabilities and is well accepted by the global data science community.
3. CERN is increasingly open to external citizen scientists.
- IBM is investing strategically in both Spark and DL
- The Spark community is a hotbed of ML and DL activity
- IBM DSX and Spark Services are ideally suited to support public-facing initiatives
- Externally contributed innovations can be leveraged for internal use (which is often the motivator)
This convergence is the basis for proposing that CERN and IBM collaborate in these areas.

18 Ideas: two parallel work streams
1. PoC for an internal use case
   - Many possibilities from the April workshop
   - jstart collaboration: no-charge exploration of the potential, iterative development and demos, start of knowledge transfer
   - Leverage of the IBM Cognitive Compute Cluster and access to IBM Spark, DSX, SoftLayer, Object Store and BlueMix services
2. PoC for a public-facing use case
   - IBM Data Science Experience
   - Fully supported IBM cloud infrastructure, 24x7
   - Expand and extend the reach of SWAN
   - Controlled access to CMS data
   - Hackathons and ML challenges

19 Thank you

20 Supporting Material

21 The jstart Engagement Process
Solution Drivers & Boundaries
- Clear understanding of the business problem to be solved
- Business and technical management commitment
- Funding in place
- Right skills identified and committed to the project
- Decision-making context
Requirements & Solution Scope
- Solution definition; small team
- Define scope; map business needs and technology
- Deliverables: use cases, preliminary design, tentative schedule, initial sizing
Detailed Design
- Detailed schedule; finalize scope; final technology selections
- Deliverables: design documents, project schedule
Iterative Development
- Early prototyping; regular code drops
- Testing throughout the cycle; constant feedback from users
- Modifications via change request
Deployment & Skills Transfer
- Solution deployment
- Customer self-sufficiency
- Reusable assets
- Other business areas or technology

22 jstart and Apache Spark: ideal for rapid-results PoCs
- Apache Software Foundation open source project
- In-memory compute engine that works with data; not a data store
- Enables highly iterative analysis on large volumes of data at scale
- Unified rapid development environment for developers and data engineers
- Greatly simplifies the development of intelligent apps fueled by data

23 Thank You!


MASTER OF SCIENCE IN Computing & Data Analytics. (M.Sc. CDA) MASTER OF SCIENCE IN Computing & Data Analytics (M.Sc. CDA) Admissions and Fee Application deadline: June 1 Admission requirements Learn. Generate. Innovate. 4-yr BSc in Computing Science (or equivalent),

More information

Step by Step: Big Data Technology. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015

Step by Step: Big Data Technology. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015 Step by Step: Big Data Technology Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015 Data Sources IT Infrastructure Analytics 2 B y 2015, 20% of Global 1000 organizations

More information

IBM Analytics The fluid data layer: The future of data management

IBM Analytics The fluid data layer: The future of data management IBM Analytics The fluid data layer: The future of data management Why flexibility and adaptability are crucial in the hybrid cloud world 1 2 3 4 5 6 The new world vision for data architects Why the fluid

More information

MASTER OF SCIENCE IN Computing & Data Analytics. (M.Sc. CDA)

MASTER OF SCIENCE IN Computing & Data Analytics. (M.Sc. CDA) MASTER OF SCIENCE IN Computing & Data Analytics (M.Sc. CDA) Learn. Generate. Innovate. Expand Your Skills to Meet the Demands of Big Data Saint Mary s new Master of Science in Computing & Data Analytics

More information

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

More information

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved. Mike Maxey Senior Director Product Marketing Greenplum A Division of EMC 1 Greenplum Becomes the Foundation of EMC s Big Data Analytics (July 2010) E M C A C Q U I R E S G R E E N P L U M For three years,

More information

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms

More information

NIH Commons Overview, Framework & Pilots - Version 1. The NIH Commons

NIH Commons Overview, Framework & Pilots - Version 1. The NIH Commons The NIH Commons Summary The Commons is a shared virtual space where scientists can work with the digital objects of biomedical research, i.e. it is a system that will allow investigators to find, manage,

More information

Innovate with the Cloud built for Cognitive Business - IBM Cloud.

Innovate with the Cloud built for Cognitive Business - IBM Cloud. Innovate with the Cloud built for Cognitive Business - IBM Cloud. Nancy Pearson Vice President Corporate Marketing Cognitive Business Sandy Carter General Manager IBM Cloud Ecosystem and Developers What

More information

A Sumo Logic White Paper. Harnessing Continuous Intelligence to Enable the Modern DevOps Team

A Sumo Logic White Paper. Harnessing Continuous Intelligence to Enable the Modern DevOps Team A Sumo Logic White Paper Harnessing Continuous Intelligence to Enable the Modern DevOps Team As organizations embrace the DevOps approach to application development they face new challenges that can t

More information

Architecture & Experience

Architecture & Experience Architecture & Experience Data Mining - Combination from SAP HANA, R & Hadoop Markus Severin, Solution Principal Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein

More information

TIBCO Live Datamart: Push-Based Real-Time Analytics

TIBCO Live Datamart: Push-Based Real-Time Analytics TIBCO Live Datamart: Push-Based Real-Time Analytics ABSTRACT TIBCO Live Datamart is a new approach to real-time analytics and data warehousing for environments where large volumes of data require a management

More information

Welkom! Copyright 2014 Oracle and/or its affiliates. All rights reserved.

Welkom! Copyright 2014 Oracle and/or its affiliates. All rights reserved. Welkom! WIE? Bestuurslid OGh met BI / WA ervaring Bepalen activiteiten van de vereniging Deelname in organisatie commite van 1 of meerdere events Faciliteren van de SIG s Redactie van OGh-Visie Onderhouden

More information

STREAM ANALYTIX. Industry s only Multi-Engine Streaming Analytics Platform

STREAM ANALYTIX. Industry s only Multi-Engine Streaming Analytics Platform STREAM ANALYTIX Industry s only Multi-Engine Streaming Analytics Platform One Platform for All Create real-time streaming data analytics applications in minutes with a powerful visual editor Get a wide

More information

How to Run a Successful Big Data POC in 6 Weeks

How to Run a Successful Big Data POC in 6 Weeks Executive Summary How to Run a Successful Big Data POC in 6 Weeks A Practical Workbook to Deploy Your First Proof of Concept and Avoid Early Failure Executive Summary As big data technologies move into

More information

Linux A first-class citizen in Windows Azure. Bruno Terkaly bterkaly@microsoft.com Principal Software Engineer Mobile/Cloud/Startup/Enterprise

Linux A first-class citizen in Windows Azure. Bruno Terkaly bterkaly@microsoft.com Principal Software Engineer Mobile/Cloud/Startup/Enterprise Linux A first-class citizen in Windows Azure Bruno Terkaly bterkaly@microsoft.com Principal Software Engineer Mobile/Cloud/Startup/Enterprise 1 First, I am software developer (C/C++, ASM, C#, Java, Node.js,

More information

Next-Generation Mobile App Design and the Rise of Contextual Apps

Next-Generation Mobile App Design and the Rise of Contextual Apps Next-Generation Mobile App Design and the Rise of Contextual Apps Dustin Amrhein damrhei@us.ibm.com @damrhein IBM MobileFirst, North America GameCo wants to tap into the digital economy GameCo wants to

More information

NVIDIA GPUs in the Cloud

NVIDIA GPUs in the Cloud NVIDIA GPUs in the Cloud 4 EVOLVING CLOUD REQUIREMENTS On premises Off premises Hybrid Cloud Connecting clouds New workloads Components to disrupt 5 GLOBAL CLOUD PLATFORM Unified architecture enabled by

More information

BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS

BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS WHAT IS BIG DATA? describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information

More information

A Hurwitz white paper. Inventing the Future. Judith Hurwitz President and CEO. Sponsored by Hitachi

A Hurwitz white paper. Inventing the Future. Judith Hurwitz President and CEO. Sponsored by Hitachi Judith Hurwitz President and CEO Sponsored by Hitachi Introduction Only a few years ago, the greatest concern for businesses was being able to link traditional IT with the requirements of business units.

More information

Applications of Deep Learning to the GEOINT mission. June 2015

Applications of Deep Learning to the GEOINT mission. June 2015 Applications of Deep Learning to the GEOINT mission June 2015 Overview Motivation Deep Learning Recap GEOINT applications: Imagery exploitation OSINT exploitation Geospatial and activity based analytics

More information

Software challenges in the implementation of large surveys: the case of J-PAS

Software challenges in the implementation of large surveys: the case of J-PAS Software challenges in the implementation of large surveys: the case of J-PAS 1/21 Paulo Penteado - IAG/USP pp.penteado@gmail.com http://www.ppenteado.net/ast/pp_lsst_201204.pdf (K. Taylor) (A. Fernández-Soto)

More information

SOCIAL MEDIA LISTENING AND ANALYSIS Spring 2014

SOCIAL MEDIA LISTENING AND ANALYSIS Spring 2014 SOCIAL MEDIA LISTENING AND ANALYSIS Spring 2014 EXECUTIVE SUMMARY In this digital age, social media has quickly become one of the most important communication channels. The shift to online conversation

More information

Using and Choosing a Cloud Solution for Data Warehousing

Using and Choosing a Cloud Solution for Data Warehousing TDWI RESEARCH TDWI CHECKLIST REPORT Using and Choosing a Cloud Solution for Data Warehousing By Colin White Sponsored by: tdwi.org JULY 2015 TDWI CHECKLIST REPORT Using and Choosing a Cloud Solution for

More information

Architectures for massive data management

Architectures for massive data management Architectures for massive data management Apache Spark Albert Bifet albert.bifet@telecom-paristech.fr October 20, 2015 Spark Motivation Apache Spark Figure: IBM and Apache Spark What is Apache Spark Apache

More information

The 4 Pillars of Technosoft s Big Data Practice

The 4 Pillars of Technosoft s Big Data Practice beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed

More information

An Overview of Predictive Analytics for Practitioners. Dean Abbott, Abbott Analytics

An Overview of Predictive Analytics for Practitioners. Dean Abbott, Abbott Analytics An Overview of Predictive Analytics for Practitioners Dean Abbott, Abbott Analytics Thank You Sponsors Empower users with new insights through familiar tools while balancing the need for IT to monitor

More information

How To Test The Performance Of An Ass 9.4 And Sas 7.4 On A Test On A Powerpoint Powerpoint 9.2 (Powerpoint) On A Microsoft Powerpoint 8.4 (Powerprobe) (

How To Test The Performance Of An Ass 9.4 And Sas 7.4 On A Test On A Powerpoint Powerpoint 9.2 (Powerpoint) On A Microsoft Powerpoint 8.4 (Powerprobe) ( White Paper Revolution R Enterprise: Faster Than SAS Benchmarking Results by Thomas W. Dinsmore and Derek McCrae Norton In analytics, speed matters. How much? We asked the director of analytics from a

More information

Scientific Computing Meets Big Data Technology: An Astronomy Use Case

Scientific Computing Meets Big Data Technology: An Astronomy Use Case Scientific Computing Meets Big Data Technology: An Astronomy Use Case Zhao Zhang AMPLab and BIDS UC Berkeley zhaozhang@cs.berkeley.edu In collaboration with Kyle Barbary, Frank Nothaft, Evan Sparks, Oliver

More information

The Rise of Industrial Big Data. Brian Courtney General Manager Industrial Data Intelligence

The Rise of Industrial Big Data. Brian Courtney General Manager Industrial Data Intelligence The Rise of Industrial Big Data Brian Courtney General Manager Industrial Data Intelligence Agenda Introduction Big Data for the industrial sector Case in point: Big data saves millions at GE Energy Seeking

More information

Next-Gen Big Data Analytics using the Spark stack

Next-Gen Big Data Analytics using the Spark stack Next-Gen Big Data Analytics using the Spark stack Jason Dai Chief Architect of Big Data Technologies Software and Services Group, Intel Agenda Overview Apache Spark stack Next-gen big data analytics Our

More information

Advanced analytics at your hands

Advanced analytics at your hands 2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously

More information