Machine Learning for Data-Driven Discovery
|
|
|
- Evangeline Marshall
- 10 years ago
- Views:
Transcription
1 Machine Learning for Data-Driven Discovery Thoughts on the Past, Present and Future Sreenivas Rangan Sukumar, PhD Data Scientist and R&D Staff Computaional Data Analytics Group Computational Sciences and Engineering Division Oak Ridge National Laboratory Phone: Tomorrow: Experience with Data Parallel Frameworks Food-for-thought towards the exascale data analysis supercomputer ORNL is managed by UT-Battelle for the US Department of Energy
2 Today s Outline Scalable Machine Learning Recent Advances and Trends State of the Practice Philosophy, Engineering, Process, Paradigms Are we there yet? If yes, how so? If not, why not? Concluding Future Thoughts Offline Debate and Discussion 2
3 Rows Machine Learning Given examples of a function (x, f(x)), Predict function f(x) for new examples x Columns f 1 f 2 f d Labels c 1 c 1 c 3 N c 2 c k c k 3
4 Machine Learning in the Big Data Era Just in case you missed Assumption Complexity of data N ~ O(10 2 ), d ~ O(10 1 ) k ~ O(1) s 2010-Present Insight A model exists Better data will reveal the beautiful model (Knowing why is important) ( eg IRIS data) A model may not exist, but find a model anyway ( Why is not as important) N ~ O(10 6 ) d ~ O(10 4 ) (eg ImageNet) k ~ O(10 4 ) Dilemma: Better data or better algorithms Volume, Velocity, Variety and Veracity have all increased several orders of magnitude Data Model Relationship Model abstracts data Data is the model Models aggregated data It is not anymore about the 2 N ˆ 1 ( X ) 1 1 x x j ji i P ( X C c ) exp f ( x ) average It is about every j i G ( ) ji ji N i 1 h h i i individual data point Model Parameter Complexity (eg Size of Neural Network) Accuracy, Precision, Recall eg Face Recognition Visual Scene Recognition Computing Capability Personal Computing High Performance Computing O(10 3 ) O(10 10 ) O(10 8 ) to O(10 10 ) in months ~ 70% was accepted Not possible 1 core, 256MB RAM, 8GB disk 1000 cores, 1 teraflops ~95% is the norm ~10% is the best result to date 16 cores,64 GB RAM,2TB disk 3 million cores, 34 petaflops 10-billion parameter network learned to recognize cats from videos Big Data also means Big Expectations Commercial tools are keeping pace with the PC market and not HPC market Number of Dwarves! 7 13 Big Data Magic: Dwarves are doubling 4
5 Today s Talk Compute is scaling up commensurate the data Is machine learning keeping pace with the data and compute scale-up? If Yes : How so? If Not : Why not? 5
6 Scalable Machine Learning: Philosophy The Lifecycle of Data-Intensive Discovery Better Data Collection Querying and Retrieval eg Google, Databases Predictive Modeling eg Climate Change Prediction Association Interrogation Modeling, Simulation, & Validation Data-fusion eg Robotic Vision Off-the-shelf Parallel Hardware Custom ICs eg FPGAs, Adapteva, Rasberry Pi) Customized Processing Eg Nvidia GPGPUs, YarcData Urika Multi-core HPC eg ( Cray XK, Cray XC, IBM Blue Gene) Virtual clusters / Cloud computing eg Amazon AWS, SAS (PaaS, + SaaS) 6
7 Scalable Machine Learning: Philosophy The Lifecycle of Data-Intensive Discovery Querying and Retrieval eg Google, Databases Business Intelligence Relationship analytics Interrogation Better Data Collection Association Data-fusion eg Robotic Vision Predictive Modeling eg Climate Change Prediction Modeling, Simulation, & Validation Predictive modeling appliance Simulation 7
8 Scalable Machine Learning: Discovery Process The Lifecycle of Data-Intensive Discovery Data-Driven Discovery Process Better Data Collection Querying and Retrieval eg Google, Databases Predictive Modeling eg Climate Change Prediction Association Interrogation Data-fusion eg Robotic Vision Modeling, Simulation, & Validation Descriptive Analytics Diagnostic Analytics Predictive Analytics Prescriptive Analytics What happened? Hindsight Why did it happen? Insight Concept adapted from Gartner s Webinar on Big Data What will happen? Foresight How can we make it happen? 8
9 Scalable Machine Learning: System Engineering Data-Driven Discovery Process Descriptive Analytics Diagnostic Analytics Predictive Analytics Prescriptive Analytics What happened? Hindsight Why did it happen? Insight Concept adapted from Gartner s Webinar on Big Data What will happen? Foresight How can we make it happen? Staging for Predictive Modeling Extract, Transform, Load Data Pre-processing Feature Engineering Predictive Modeling Rule-base extraction Pairwise-similarity (Distance Computation) Model-parameter estimation Cross validation Inference/ Model Deployment Data is model? Model is data? Adaptive model? Reinforcement? 9
10 Scalable Machine Learning: Production Staging for Predictive Modeling Extract, Transform, Load Data Pre-processing Feature Engineering Predictive Modeling Rule-base extraction Pairwise-similarity (Distance Computation) Model-parameter estimation Inference/ Model Deployment Cross-validation Data is model? Model is data? Adaptive model? Reinforcement? Disk Intensive File processing and repeated retrieval best done in massively parallel file systems or databases Disk, Memory and Compute Intensive Typically computing an aggregate measure, vector product, a kernel function etc Memory + Compute Intensive Real-time requirements 10
11 Scalable Machine Learning: Bleeding Edge The Berkeley Data Analysis Stack In-database processing In-memory processing In-disk processing Storage scale-up This is tremendous progress 11
12 But Is machine learning keeping pace with the data and compute scale-up? If Yes : How so? If Not : Why not? 12
13 The Big Data Problem (Volume, Velocity, Veracity) Multi-class scaling (Number, Hierarchies) The 5 Challenges of Scalable Machine Learning Given examples of a function (x, f(x)), Predict function f(x) for new examples x Dimensionality (Variety) f 1 f 2 f d Labels 1 c 1 2 c 1 3 N Science of Data (Quality, Structure, Content, Signal-to-noise) Data Science (Infrastructure, Hardware, Software, Algorithms) c 3 c 2 c k c k 13
14 Challenge #1: Data Science Systems Data Compute Analysis Infrastructure Design Operations Management Architecture Design Operations Databases SQL NoSQL Graph Management Quality Privacy Provenance Governance Structure Matrix/ Table Text, Image, Video Graphs Sequences Spatiotemporal Schema HPC TITAN CADES Cloud Urika Hadoop Programming OpenMP/MPI CUDA/ OpenML RDF/SPARQL SQL Map-Reduce Algorithms In-database In-memory In-situ Design Scalability V&V Viz HCI Interfaces Viz-Analytics Theory Performance of algorithm dependent on architecture Most data scientists/algorithm specialists are used to in-memory tools such as R, MATLAB etc Existing cloud-based solutions are designed for high performance storage and not high-performance compute or in-memory operations Steep learning curve towards programming new innovative algorithms Too many options without guiding benchmarks 14
15 Challenge #2: Science of Data Data-science is not the same as science of data Is the process of understanding characteristics of data before applying/designing a machine-learning algorithm SNR > 1 SNR >>> 1 SNR << 1 SNR >> 1 Data characterization (Avoid using machine learning as a black box) Signal-noise-ratio, bound on noise iid sampling assumptions stationarity, randomness, ergodicity, periodicity Generating models behind data 15
16 Challenge #3: The N-d-k problem The Big Data Problem The future is unstructured Text, images, videos, sequences Algorithms and infrastructure expected to handle Big Data ie, increasing N, d and k Feature engineering and requires automation Self-feature extracting methodologies encouraged Traditional (pain staking) pipeline of SMEs creating features from the data will fail or transform into a collaborative-parallel effort Increasing N does not imply increasing information content (Samples can still be good if not better than all of the data statistically) There can be hierarchies within the N-d-k dimensions 16
17 Challenge #4: The N-d-k problem (d) Traditional algorithms assume N >> d and d > k Most tools available today scale well for increasing N Time-complexity analysis Data characteristics [Chu et al, NIPS 2007] Not so much for increasing d or k [Donoho, 2000] The curse and blessings of dimensionality Methods are emerging : Multi-task learning, Spectral Hashing etc 17
18 Challenge #5: The N-d-k problem (k) Hays et al, SUN Attribute Database: Discovering, Annotating, and Recognizing Scene Attributes, CVPR 2012 Hasegawa et al, Online Incremental Attribute-based Zeroshot Learning, CVPR 2012 Berg et al, What Does Classifying More Than 10, 000 Image categories Tell Us?, ECCV 2010 What happens when K= K + 1? (adding a new class) Engineered features may not be good enough Trained model has to relearn from the entire feature set without guarantees on accuracy 18
19 Concluding Thoughts What can be scaled up? Compute Storage Memory Cores What aspect of data that needs scale up? What aspect of algorithm that needs scale up? Dimensions of Big Data Software Volume Hadoop, MPP, Spider Archival Reports Discovery Velocity Analytical Requirements Algorithms Programming Data-Parallel Complexity of Algorithms Compute on Data MapReduce, MPI, Threads Task-parallel Linear Iterative > O(N 2 ) Streaming Batch Retrieval Machine Learning I/O? Network? Variety SQL NoSQL Graph Speed of Execution Real-time Feasibility Future We need benchmarks before me make big investments (Fox et al, 2014) 19
20 Concluding Thoughts Storage/Memory and Memory/Compute Ratios that are critical for machine learning are smaller than Storage/Compute Ratio Associative memory and cognitively-inspired architectures may prove better than the Von-Neuman store-fetch-execute paradigm May be time to redesign from scratch The machine learning algorithms that scale all use either data-parallelism or the dwarves of parallel computing in some form Encouraging because gives us an intuition to build custom hardware for learning algorithms We have done well so far by treating Analysis as a retrieval problem We can do better 20
21 Thank You Questions? 21
Machine Learning in the Big Data Era: Are We There Yet?
Machine Learning in the Big Data Era: Are We There Yet? Sreenivas R. Sukumar Computational Sciences and Engineering Division Oak Ridge National Laboratory 1 Bethel Valley Road, Oak Ridge, TN, 37831, USA
Big Data Analytics. Chances and Challenges. Volker Markl
Volker Markl Professor and Chair Database Systems and Information Management (DIMA), Technische Universität Berlin www.dima.tu-berlin.de Big Data Analytics Chances and Challenges Volker Markl DIMA BDOD
Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems
Towards a Thriving Data Economy: Open Data, Big Data, and Data Ecosystems Volker Markl [email protected] dima.tu-berlin.de dfki.de/web/research/iam/ bbdc.berlin Based on my 2014 Vision Paper On
BIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved
Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment
Cray: Enabling Real-Time Discovery in Big Data
Cray: Enabling Real-Time Discovery in Big Data Discovery is the process of gaining valuable insights into the world around us by recognizing previously unknown relationships between occurrences, objects
Large-Scale Data Processing
Large-Scale Data Processing Eiko Yoneki [email protected] http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase
Challenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.
Mike Maxey Senior Director Product Marketing Greenplum A Division of EMC 1 Greenplum Becomes the Foundation of EMC s Big Data Analytics (July 2010) E M C A C Q U I R E S G R E E N P L U M For three years,
Big Data Technologies Compared June 2014
Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development
Oracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
Supercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy?
HPC2012 Workshop Cetraro, Italy Supercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy? Bill Blake CTO Cray, Inc. The Big Data Challenge Supercomputing minimizes data
Understanding the Value of In-Memory in the IT Landscape
February 2012 Understing the Value of In-Memory in Sponsored by QlikView Contents The Many Faces of In-Memory 1 The Meaning of In-Memory 2 The Data Analysis Value Chain Your Goals 3 Mapping Vendors to
How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum
Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms
Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata
Up Your R Game James Taylor, Decision Management Solutions Bill Franks, Teradata Today s Speakers James Taylor Bill Franks CEO Chief Analytics Officer Decision Management Solutions Teradata 7/28/14 3 Polling
SAS and Oracle: Big Data and Cloud Partnering Innovation Targets the Third Platform
SAS and Oracle: Big Data and Cloud Partnering Innovation Targets the Third Platform David Lawler, Oracle Senior Vice President, Product Management and Strategy Paul Kent, SAS Vice President, Big Data What
International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop
ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: [email protected]
Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics
Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Dr. Liangxiu Han Future Networks and Distributed Systems Group (FUNDS) School of Computing, Mathematics and Digital Technology,
Introduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014
Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions
The 4 Pillars of Technosoft s Big Data Practice
beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
High-Performance Analytics
High-Performance Analytics David Pope January 2012 Principal Solutions Architect High Performance Analytics Practice Saturday, April 21, 2012 Agenda Who Is SAS / SAS Technology Evolution Current Trends
Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics
BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are
ANALYTICS IN BIG DATA ERA
ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut
ANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
Data Centric Systems (DCS)
Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems
How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time
SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren
News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
Information Processing, Big Data, and the Cloud
Information Processing, Big Data, and the Cloud James Horey Computational Sciences & Engineering Oak Ridge National Laboratory Fall Creek Falls 2010 Information Processing Systems Model Parameters Data-intensive
Archiving and Sharing Big Data Digital Repositories, Libraries, Cloud Storage
Archiving and Sharing Big Data Digital Repositories, Libraries, Cloud Storage Cyrus Shahabi, Ph.D. Professor of Computer Science & Electrical Engineering Director, Integrated Media Systems Center (IMSC)
Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.
Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!
Advanced Big Data Analytics with R and Hadoop
REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional
BIG DATA-AS-A-SERVICE
White Paper BIG DATA-AS-A-SERVICE What Big Data is about What service providers can do with Big Data What EMC can do to help EMC Solutions Group Abstract This white paper looks at what service providers
Find the Hidden Signal in Market Data Noise
Find the Hidden Signal in Market Data Noise Revolution Analytics Webinar, 13 March 2013 Andrie de Vries Business Services Director (Europe) @RevoAndrie [email protected] Agenda Find the Hidden
Luncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
The 3 questions to ask yourself about BIG DATA
The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.
Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics
Big Data Challenges in Bioinformatics
Big Data Challenges in Bioinformatics BARCELONA SUPERCOMPUTING CENTER COMPUTER SCIENCE DEPARTMENT Autonomic Systems and ebusiness Pla?orms Jordi Torres [email protected] Talk outline! We talk about Petabyte?
Steven C.H. Hoi School of Information Systems Singapore Management University Email: [email protected]
Steven C.H. Hoi School of Information Systems Singapore Management University Email: [email protected] Introduction http://stevenhoi.org/ Finance Recommender Systems Cyber Security Machine Learning Visual
A Novel Cloud Based Elastic Framework for Big Data Preprocessing
School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview
Hur hanterar vi utmaningar inom området - Big Data. Jan Östling Enterprise Technologies Intel Corporation, NER
Hur hanterar vi utmaningar inom området - Big Data Jan Östling Enterprise Technologies Intel Corporation, NER Legal Disclaimers All products, computer systems, dates, and figures specified are preliminary
Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011
Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis
Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data
INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are
Parallel Data Warehouse
MICROSOFT S ANALYTICS SOLUTIONS WITH PARALLEL DATA WAREHOUSE Parallel Data Warehouse Stefan Cronjaeger Microsoft May 2013 AGENDA PDW overview Columnstore and Big Data Business Intellignece Project Ability
Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012
Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Outline Big Data How to extract information? Data clustering
Hybrid Software Architectures for Big Data. [email protected] @hurence http://www.hurence.com
Hybrid Software Architectures for Big Data [email protected] @hurence http://www.hurence.com Headquarters : Grenoble Pure player Expert level consulting Training R&D Big Data X-data hot-line
Sense Making in an IOT World: Sensor Data Analysis with Deep Learning
Sense Making in an IOT World: Sensor Data Analysis with Deep Learning Natalia Vassilieva, PhD Senior Research Manager GTC 2016 Deep learning proof points as of today Vision Speech Text Other Search & information
Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013
Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation
Chapter 6. Foundations of Business Intelligence: Databases and Information Management
Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data
CS535 Big Data W1.A.1 CS535 BIG DATA W1.A.2 Let the data speak to you Medication Adherence Score How likely people are to take their medication, based on: How long people have lived at the same address
Bringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks [email protected] 2015 The MathWorks, Inc. 1 Data is the sword of the
High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances
High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances Highlights IBM Netezza and SAS together provide appliances and analytic software solutions that help organizations improve
Big Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
Spatio-Temporal Networks:
Spatio-Temporal Networks: Analyzing Change Across Time and Place WHITE PAPER By: Jeremy Peters, Principal Consultant, Digital Commerce Professional Services, Pitney Bowes ABSTRACT ORGANIZATIONS ARE GENERATING
How To Learn To Use Big Data
Information Technologies Programs Big Data Specialized Studies Accelerate Your Career extension.uci.edu/bigdata Offered in partnership with University of California, Irvine Extension s professional certificate
The? Data: Introduction and Future
The? Data: Introduction and Future Husnu Sensoy Global Maksimum Data & Information Technologies Global Maksimum Data & Information Technologies The Data Company Massive Data Unstructured Data Insight Information
Industry 4.0 and Big Data
Industry 4.0 and Big Data Marek Obitko, [email protected] Senior Research Engineer 03/25/2015 PUBLIC PUBLIC - 5058-CO900H 2 Background Joint work with Czech Institute of Informatics, Robotics and
The Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
High Performance Predictive Analytics in R and Hadoop:
High Performance Predictive Analytics in R and Hadoop: Achieving Big Data Big Analytics Presented by: Mario E. Inchiosa, Ph.D. US Chief Scientist August 27, 2013 1 Polling Questions 1 & 2 2 Agenda Revolution
Internet of Things. Opportunity Challenges Solutions
Internet of Things Opportunity Challenges Solutions Copyright 2014 Boeing. All rights reserved. GPDIS_2015.ppt 1 ANALYZING INTERNET OF THINGS USING BIG DATA ECOSYSTEM Internet of Things matter for... Industrial
Reference Architecture, Requirements, Gaps, Roles
Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture
Big Data. Lyle Ungar, University of Pennsylvania
Big Data Big data will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. McKinsey Data Scientist: The Sexiest Job of the 21st Century -
Big-Data Computing with Smart Clouds and IoT Sensing
A New Book from Wiley Publisher to appear in late 2016 or early 2017 Big-Data Computing with Smart Clouds and IoT Sensing Kai Hwang, University of Southern California, USA Min Chen, Huazhong University
Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies
Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08
Data management challenges in todays Healthcare and Life Sciences ecosystems
Data management challenges in todays Healthcare and Life Sciences ecosystems Jose L. Alvarez Principal Engineer, WW Director Life Sciences [email protected] Evolution of Data Sets in Healthcare
Oracle Big Data Building A Big Data Management System
Oracle Big Building A Big Management System Copyright 2015, Oracle and/or its affiliates. All rights reserved. Effi Psychogiou ECEMEA Big Product Director May, 2015 Safe Harbor Statement The following
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
How To Use Hp Vertica Ondemand
Data sheet HP Vertica OnDemand Enterprise-class Big Data analytics in the cloud Enterprise-class Big Data analytics for any size organization Vertica OnDemand Organizations today are experiencing a greater
Teradata s Big Data Technology Strategy & Roadmap
Teradata s Big Data Technology Strategy & Roadmap Artur Borycki, Director International Solutions Marketing 18 March 2014 Agenda > Introduction and level-set > Enabling the Logical Data Warehouse > Any
Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing
Evaluating NoSQL for Enterprise Applications Dirk Bartels VP Strategy & Marketing Agenda The Real Time Enterprise The Data Gold Rush Managing The Data Tsunami Analytics and Data Case Studies Where to go
Advanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum
Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All
From Spark to Ignition:
From Spark to Ignition: Fueling Your Business on Real-Time Analytics Eric Frenkiel, MemSQL CEO June 29, 2015 San Francisco, CA What s in Store For This Presentation? 1. MemSQL: A real-time database for
From Raw Data to. Actionable Insights with. MATLAB Analytics. Learn more. Develop predictive models. 1Access and explore data
100 001 010 111 From Raw Data to 10011100 Actionable Insights with 00100111 MATLAB Analytics 01011100 11100001 1 Access and Explore Data For scientists the problem is not a lack of available but a deluge.
RevoScaleR Speed and Scalability
EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution
Fast Analytics on Big Data with H20
Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,
Harnessing the power of advanced analytics with IBM Netezza
IBM Software Information Management White Paper Harnessing the power of advanced analytics with IBM Netezza How an appliance approach simplifies the use of advanced analytics Harnessing the power of advanced
Big Data for Big Value @ Intel
Big Data for Big Value @ Intel Moty Fania, PE Big data Analytics Assaf Araki, Sr. Arch. Big data Analytics Advanced Analytics team @ Intel IT Corporate ownership of advanced analytics Team charter Solve
Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance
Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice
Big Data solutions to support Intelligent Systems and Applications
Big solutions to support Intelligent Systems and Applications Luciana Lima, Filipe Portela, Manuel Filipe Santos, António Abelha and José Machado. Abstract in the last years the number of data available
COMP9321 Web Application Engineering
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411
