Data Scientist: From Mathematics to data management
1 Data Scientist: From Mathematics to Data Management. Frederic Precioso, 06/07/2015. Professor at University of Nice Sophia Antipolis (UNS); Laboratory I3S, Joint Research Unit of CNRS & UNS (UMR 7271); Team Scalable and Pervasive software and Knowledge Systems (SPARKS).
2 Outline: 1. Data scientist: the sexiest job of the 21st century? 2. Data scientist: future profile. 3. Data science in academic research. 4. Future. These slides are partially based on "(Big) Data (Science) Skills" by Oscar Corcho.
3 Data Scientist: The Sexiest Job of the 21st Century? In October 2012, the Harvard Business Review published the article "Data Scientist: The Sexiest Job of the 21st Century" in its issue "Getting Control of Big Data". Since then, much work has led to the conclusion that there is actually more than one data scientist profile.
4 Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work (June 2013)
5 Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work (June 2013). Applying data science algorithms to survey data from several hundred data science professionals, the authors found that data scientists could be clustered into four subgroups, each with a different mix of skill sets: Data Businesspeople, Data Creatives, Data Developers, and Data Researchers.
6 Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work (June 2013). [Figure legend: ML = Machine Learning; OR = Operations Research]
7 Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work (June 2013). From their answers, data scientists see themselves as T-shaped experts: broad knowledge across several areas combined with deep expertise in one.
8 More recently
9 Big Data Species 1. HPC and e-Infrastructure Experts. Background: Computer Science (Systems), system administration. Terms used in their native language: blades, InfiniBand, OpenMPI, racks, HDF, TBs, GFlops. Their daily life: check system logs, make sure that queues are active, install a new rack. What's Big Data for them? A commercial term for something they have been doing for a long time. They really know how to configure and monitor a Hadoop cluster. They would love to see those who talk about Big Data running fluid dynamics simulations. [source: Oscar Corcho]
10 Big Data Species 2. Data Storage and Access Experts. Background: Computer Science, database administration. Terms used in their native language: SQL, NoSQL, column store, transactions, Hive, TBs/PBs, TPS (transactions per second). Their daily life: optimize several queries, run a new benchmark, design an optimizer/physical operator. What's Big Data for them? A new opportunity to work on optimization algorithms. They know how to configure a database. They often laugh at those who deploy a NoSQL solution for a problem that can be solved with a relational database. [source: Oscar Corcho]
11 Big Data Species 3. Machine Learning Experts. Background: Mathematics, Statistics, Physics, Computer Science. Terms used in their native language: complexity, algorithm, p-value, convergence, precision, recall, ROC curves, Bayesian networks, R. Their daily life: read about a new problem, write down a few formulae on the whiteboard (or even the blackboard), prove that the algorithm terminates. What's Big Data for them? The same problems applied to larger data, with new challenges. Problems are not solved just by using Hadoop or a powerful NoSQL DB. They are astonished by those who still mix up correlation and causality. [source: Oscar Corcho]
12 Big Data Species 4. Slow-Data Experts. Background: Computer Science, Statistics, Library Sciences, Linguistics. Terms used in their native language: information model, vocabulary, ontology, data quality, curation. Their daily life: receive a database schema, talk to data producers and (re)users, obtain consensus and transform data. What's Big Data for them? The difficulty lies in the variety of data formats and structures. We may integrate data from varied sources, although this is not always possible. When you manage to integrate heterogeneous data, you can achieve better results. [source: Oscar Corcho]
13 Big Data Species 5. (Big Data) Consultants. Background: Computer Science, Economics, … Terms used in their native language: business model, business opportunity, Big Data, data value chain, Hadoop, Spark, R, TBs, GFlops. Their daily life: read a Gartner Big Data report, talk to potential customers, pass needs on to technicians. What's Big Data for them? It's the 4Vs, plus a few more. "I have a PPT presentation with a Big Data infrastructure, architecture, and previous projects, which I will use to sell a project to my customers." [source: Oscar Corcho]
14 Big Data Ecosystem. Visualization: dashboards (Kibana, Datameer); maps (InstantAtlas, Leaflet, CartoDB, …); charts (Google Charts, Chart.js, …); D3.js, Tableau, Flame. Analysis: machine learning (scikit-learn, Mahout, Spark); search/retrieval (Elasticsearch, Solr). Storage / Access / Exploitation: file systems (HDFS, GGFS, Cassandra, …); access (Hadoop, Spark, or both; Sqoop); databases/indexing (SQL, NoSQL, or both; MongoDB, HBase, Infinispan); exploitation (Logstash, Flume, …). Infrastructures: grid computing/HPC; cloud/virtualization.
15 Intermediate Conclusions. We all know that there are big opportunities in Big Data, but we need to be more productive. For that we need to: understand that simply using Hadoop, Spark or R does not necessarily mean we are doing Big Data, just as coding in Java does not necessarily mean we understand object-oriented programming; understand that we have to interpret results adequately, from a scientific point of view; understand the importance of homogenizing datasets in order to facilitate their integration (slow data); create truly multidisciplinary teams. [source: Oscar Corcho]
16 Outline: 1. Data scientist: the sexiest job of the 21st century? 2. Data scientist: future profile. 3. Data science in academic research. 4. Future.
17 Future profile: multidisciplinary. Alex Szalay's T-shaped vs. Pi-shaped profiles; Drew Conway's Data Science Venn Diagram; Jim Gray's idea of the "Fourth Paradigm" of scientific discovery; Volker Markl: "Data Scientist: Jack of All Trades!"
18 Future profile: multidisciplinary. A recent report (in French)* leads to the same conclusion: «The consensus nowadays is to define the data scientist at the intersection of three areas of expertise: (i) Computer Science, (ii) Statistics and Mathematics, and (iii) Business knowledge. (…) Depending on the training program, one will most probably receive training with a major in either Computer Science, Statistics, or Business knowledge.» * Serge Abiteboul, François Bancilhon, François Bourdoncle, Stephan Clémençon, Colin De La Higuera, et al. L'émergence d'une nouvelle filière de formation : « data scientists ». [Internal report] INRIA Saclay. <hal >
19 Outline: 1. Data scientist: the sexiest job of the 21st century? 2. Data scientist: future profile. 3. Data science in academic research. 4. Future.
20 Big Data Academic Research [Figure: the ecosystem layers (Visualization; Analysis; Storage / Access / Exploitation; Infrastructures) annotated with "R" markers]
21 What does Big Data academic research mean? Pushing the limits of existing approaches, or designing new ones, even if it is risky or (very) difficult. Demonstrating that contributions are theoretically sound. Comparing with others by participating in challenges, or at least on Big Data benchmarks. Complexity and scalability claims are always stronger when they can be proven.
22 Two success stories of machine learning, among many. Classification: how to separate the data? $E_{\text{real}}(\text{Algorithm}) \le E_{\text{empirical}}(\text{Algorithm}) + \text{Capacity}(\text{Algorithm})$
23 Two success stories of machine learning, among many. Classification: how to separate the data? $E_{\text{real}}(\text{Algorithm}) \le E_{\text{empirical}}(\text{Algorithm}) + \text{Capacity}(\text{Algorithm})$. Boosting; Random Forests.
24 Ideas of boosting: football bets. If Varane and Sakho play together, the French football team wins. If Ntep is not injured, the French football team wins. If Benzema is substituted before the end, the French football team loses. If Pogba is happy, the French football team wins. From Antoine Cornuéjols' lecture slides.
25 How to win? Ask professional gamblers. Let's assume that professional gamblers cannot provide a single decision rule that is both simple and relevant, but that, faced with several games, they can always provide decision rules a little better than random. Can we become rich? From Antoine Cornuéjols' lecture slides.
26 Idea: ask the expert for heuristics; gather a set of cases on which these heuristics fail (difficult cases); ask the expert again for heuristics for the difficult cases; and so on. Combine these heuristics. "Expert" stands for weak learner. From Antoine Cornuéjols' lecture slides.
27 Questions: How do we choose the games (i.e., learning examples) at each step? Focus on the most difficult games (examples), the ones on which the previous heuristics are least relevant. How do we merge the heuristics (decision rules) into a single decision rule? Take a weighted vote of all the decision rules. From Antoine Cornuéjols' lecture slides.
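A minimal sketch of the weighted vote, reusing the four football rules of thumb. Two assumptions not in the slides: each rule predicts the opposite outcome when its condition is false, and the vote weights below are hypothetical values chosen only to show that a weighted vote can overrule a plain majority.

```python
# Hypothetical rules of thumb from the football example: +1 = win, -1 = loss.
# Assumption: each rule predicts the opposite outcome when its condition is false.
RULES = [
    lambda m: +1 if m["varane_and_sakho"] else -1,
    lambda m: +1 if not m["ntep_injured"] else -1,
    lambda m: -1 if m["benzema_substituted"] else +1,
    lambda m: +1 if m["pogba_happy"] else -1,
]
# Hypothetical vote weights: the more reliable a rule, the larger its weight.
WEIGHTS = [1.5, 0.4, 0.6, 0.3]

def weighted_vote(rules, weights, match):
    """Combine the rules into one decision: the sign of the weighted sum of votes."""
    score = sum(w * rule(match) for rule, w in zip(rules, weights))
    return 1 if score >= 0 else -1

MATCH = {"varane_and_sakho": True, "ntep_injured": True,
         "benzema_substituted": True, "pogba_happy": False}
print(weighted_vote(RULES, WEIGHTS, MATCH))       # weighted vote
print(weighted_vote(RULES, [1, 1, 1, 1], MATCH))  # unweighted majority
```

Here three of the four rules predict a loss, so the unweighted majority says -1, while the heavily weighted first rule tips the weighted vote to +1.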
28 Boosting: a general method to convert several poor decision rules into one very powerful decision rule. More precisely: given a weak learner that can always provide a decision rule (even only slightly) better than random, a boosting algorithm can (theoretically) build a global decision rule H with an error rate as low as desired. A theorem of Schapire on the strength of weak learnability proves that H achieves higher relevance than a global decision rule that would have been learnt directly on all training examples. From Antoine Cornuéjols' lecture slides.
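The claim "an error rate as low as desired" can be made concrete for AdaBoost (a specific boosting algorithm, not Schapire's original construction): Freund and Schapire showed that if the t-th weak rule has weighted error $\varepsilon_t = 1/2 - \gamma_t$, the training error of the combined rule is at most $\exp(-2\sum_t \gamma_t^2)$. A small sketch of that bound; note it concerns training error, not generalization error, and the function names are hypothetical.

```python
import math

def training_error_bound(gammas):
    """AdaBoost training-error bound exp(-2 * sum(gamma_t^2)), gamma_t = 1/2 - eps_t."""
    return math.exp(-2 * sum(g * g for g in gammas))

def rounds_needed(gamma, target):
    """Smallest T with exp(-2 * T * gamma^2) <= target, for a constant edge gamma."""
    return math.ceil(math.log(1 / target) / (2 * gamma ** 2))

# Weak rules with 40% weighted error (edge gamma = 0.1 over random guessing):
print(training_error_bound([0.1] * 250))  # about exp(-5)
print(rounds_needed(0.1, 0.01))           # rounds to push the bound below 1%
```

Even a rule only 10 points better than a coin flip drives the bound down exponentially in the number of rounds.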
29 Probabilistic boosting: AdaBoost. The standard algorithm is AdaBoost (Adaptive Boosting). Three main ideas generalize towards probabilistic boosting: 1. a set of specialized experts is asked to vote to take a decision; 2. votes are weighted adaptively by multiplicative updates; 3. the distribution of examples used to train each expert is modified, iteratively increasing the weights of the examples misclassified at the previous iteration. From Antoine Cornuéjols' lecture slides.
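A small worked example of the multiplicative update, assuming the standard AdaBoost rule $D_{t+1}(i) = D_t(i)\,e^{-\alpha_t y_i h_t(x_i)}/Z_t$ with vote weight $\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}$ (the slide names the idea but not the formula; the five-example setting is illustrative):

```latex
\begin{aligned}
&m = 5,\quad D_t(i) = 0.2 \ \text{for all } i,\quad h_t \text{ misclassifies one example}
  \;\Rightarrow\; \varepsilon_t = 0.2\\
&\alpha_t = \tfrac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t} = \tfrac{1}{2}\ln 4 = \ln 2 \approx 0.693\\
&\text{misclassified: } 0.2\,e^{\alpha_t} = 0.4, \qquad \text{each correct one: } 0.2\,e^{-\alpha_t} = 0.1\\
&Z_t = 0.4 + 4 \times 0.1 = 0.8 \;\Rightarrow\; D_{t+1} = (0.5,\ 0.125,\ 0.125,\ 0.125,\ 0.125)
\end{aligned}
```

After the update the single misclassified example carries half of the total weight. In general $h_t$ has weighted error exactly $1/2$ under $D_{t+1}$, which forces the next weak learner to focus on what $h_t$ got wrong.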
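30 AdaBoost: the algorithm. A training set $S = \{(x_1, y_1), \dots, (x_m, y_m)\}$, where $y_i \in \{-1, +1\}$ is the label (annotation) of example $x_i \in S$. A set of weak learners $\{h_t\}$. For $t = 0, \dots, T$: give a weight to every sample in $\{1, \dots, m\}$ according to how difficult it was for $h_{t-1}$ to classify it correctly (uniform weights for the first iteration): $D_t$. Find the weak decision ("heuristic") $h_t : S \to \{-1, +1\}$ with the smallest error $\varepsilon_t$ on $D_t$: $\varepsilon_t = \Pr_{D_t}[h_t(x_i) \neq y_i] = \sum_{i : h_t(x_i) \neq y_i} D_t(i)$. Compute the influence (impact) of $h_t$: $\alpha_t = \frac{1}{2}\ln\frac{1 - \varepsilon_t}{\varepsilon_t}$. Final decision: $H_{\text{final}}$ is a weighted majority vote of all the $h_t$, i.e. $H_{\text{final}}(x) = \operatorname{sign}\left(\sum_t \alpha_t h_t(x)\right)$.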
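The loop above can be sketched in a few lines of standard-library Python. This is a minimal sketch under assumptions of our own: the data are one-dimensional, and the weak learners are threshold "decision stumps" (the slide does not fix a weak-learner family). All function names are hypothetical.

```python
import math

def stump_predict(threshold, polarity, x):
    """Decision stump: predict `polarity` above the threshold, `-polarity` otherwise."""
    return polarity if x > threshold else -polarity

def train_stump(xs, ys, weights):
    """Return (weighted error eps_t, threshold, polarity) of the best stump on D_t."""
    values = sorted(set(xs))
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for thr in thresholds:
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(thr, pol, x) != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best

def adaboost(xs, ys, T):
    m = len(xs)
    D = [1.0 / m] * m                          # uniform weights for the first iteration
    ensemble = []
    for _ in range(T):
        eps, thr, pol = train_stump(xs, ys, D)
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # guard against log(0) and division by 0
        alpha = 0.5 * math.log((1 - eps) / eps)  # influence of h_t
        ensemble.append((alpha, thr, pol))
        # Multiplicative update: examples with y * h_t(x) = -1 gain weight.
        D = [d * math.exp(-alpha * y * stump_predict(thr, pol, x))
             for d, x, y in zip(D, xs, ys)]
        Z = sum(D)
        D = [d / Z for d in D]                 # renormalize into a distribution
    return ensemble

def predict(ensemble, x):
    """Weighted majority vote: sign of sum_t alpha_t * h_t(x)."""
    score = sum(alpha * stump_predict(thr, pol, x) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy data: positives sit in the middle, so no single threshold separates them.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [-1, -1, 1, 1, 1, 1, -1, -1]
for T in (1, 3):
    model = adaboost(xs, ys, T)
    errors = sum(predict(model, x) != y for x, y in zip(xs, ys))
    print(f"T={T}: {errors} training errors")
```

A single stump misclassifies two points here; after three rounds the weighted vote of three stumps (two thresholds plus a constant "bias" stump) separates the interval exactly.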
31 Generalization error of AdaBoost. The generalization error of $H_T$ can be bounded by $E_{\text{real}}(H_T) \le E_{\text{empirical}}(H_T) + O\left(\sqrt{\frac{T \cdot d}{m}}\right)$, where T is the number of boosting iterations, m the number of training examples, and d the dimension of the $H_T$ space (weak learner complexity). [Plot: error vs. iterations]
32 The task of face detection. Many slides adapted from P. Viola.
33 Basic idea: slide a window across the image and evaluate a face model at every location.
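A minimal single-scale sketch of the sliding-window scan. The 24×24 window size matches the training resolution mentioned later in the deck; the step of 4 pixels and the `face_model` callback are hypothetical placeholders (a real detector also rescans the image at several scales).

```python
def sliding_windows(img_w, img_h, win=24, step=4):
    """Yield the top-left corner (x, y) of every win x win window that fits."""
    for y in range(0, img_h - win + 1, step):
        for x in range(0, img_w - win + 1, step):
            yield x, y

def detect(image, face_model, win=24, step=4):
    """Return the corners of windows for which the (hypothetical) face_model says 'face'.

    `image` is a list of rows of pixel values; `face_model` takes a win x win patch.
    """
    h, w = len(image), len(image[0])
    hits = []
    for x, y in sliding_windows(w, h, win, step):
        patch = [row[x:x + win] for row in image[y:y + win]]
        if face_model(patch):
            hits.append((x, y))
    return hits

print(len(list(sliding_windows(100, 80))))  # windows evaluated on a 100x80 image
```

Even this small image needs hundreds of model evaluations at a single scale, which is why the model must be extremely cheap to evaluate on the vast majority of windows.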
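34 Image features. Feature value $= \sum_{\text{white area}} \text{pixels} - \sum_{\text{black area}} \text{pixels}$. Example weak classifiers on three features $f_1, f_2, f_3$: $h_1(x) = 1$ if $f_1(x) < 29$, 0 otherwise; $h_2(x) = 1$ if $f_2(x) < 26$, 0 otherwise; $h_3(x) = 1$ if $f_3(x) > 11$, 0 otherwise.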
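The feature values above are differences of rectangle sums. Viola and Jones compute them with an integral image, so that any rectangle sum costs four lookups regardless of its size. A minimal sketch, assuming a hypothetical two-rectangle layout (white = left half, black = right half); the threshold-plus-polarity form covers both the "<" and ">" cases of $h_1$ to $h_3$.

```python
def integral_image(img):
    """ii[y][x] = sum of img[r][c] for all r < y and c < x (zero-padded first row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y), in four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Hypothetical two-rectangle feature: white (left half) minus black (right half)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

def weak_classifier(value, threshold, polarity=1):
    """1 if polarity * value < polarity * threshold, else 0 (polarity -1 means '>')."""
    return 1 if polarity * value < polarity * threshold else 0

img = [[10, 10, 0, 0] for _ in range(4)]  # bright left half, dark right half
ii = integral_image(img)
f = two_rect_feature(ii, 0, 0, 4, 4)
print(f, weak_classifier(f, 29), weak_classifier(f, 11, polarity=-1))
```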
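35 AdaBoost cascade principle. Each window goes through a chain of AdaBoost classifiers (stages 1, 2, …, N); a window rejected by any stage is discarded as a non-face. After stage 1: faces ×99% and non-faces ×30% pass, non-faces ×70% are rejected. After stage 2: faces ×98% and non-faces ×9% pass, a further ×21% of non-faces are rejected. After stage N: faces ×90% and non-faces × …% remain.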
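The percentages on the slide compose multiplicatively: if each stage keeps a fraction d of the faces and f of the non-faces that reach it, N stages keep $d^N$ and $f^N$. The per-stage rates 0.99 and 0.30 are read off the slide; N = 10 is our assumption, chosen because $0.99^{10} \approx 0.90$ matches the slide's final face rate. `run_cascade` is a hypothetical helper showing the early rejection.

```python
def cascade_rates(stage_detection, stage_false_positive, n_stages):
    """Overall (detection, false-positive) rates of n stages with identical per-stage rates.

    Each per-stage rate is measured on the windows that reached that stage.
    """
    return stage_detection ** n_stages, stage_false_positive ** n_stages

def run_cascade(stages, window):
    """Accept a window only if every stage accepts it; stop at the first rejection."""
    for stage in stages:
        if not stage(window):
            return False  # rejected early: later (costlier) stages never run
    return True

for n in (1, 2, 10):
    D, F = cascade_rates(0.99, 0.30, n)
    print(f"{n:2d} stages: faces kept {D:.2%}, non-faces kept {F:.6%}")
```

Because most windows are non-faces and are rejected by the first cheap stages, the average cost per window stays low even though the full cascade is large.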
36 The implemented system. Training data: 5,000 faces, all frontal, rescaled to 24×24 pixels; 300 million non-face sub-windows from 9,500 non-face images. Faces are normalized (scale, translation). Many variations: across individuals, illumination, pose.
37 Results [Figures: fixed images and a video sequence; frontal, left-profile and right-profile faces]
38 Extensions: fast and robust; other descriptors; other cascades (rotation, …); eye detection, hand detection, body detection.
39 Two success stories of machine learning, among many. Classification: how to separate the data? $E_{\text{real}}(\text{Algorithm}) \le E_{\text{empirical}}(\text{Algorithm}) + \text{Capacity}(\text{Algorithm})$. Boosting; Random Forests.
40 Decision tree to decide whether or not to play tennis. Objective: two classes, yes and no, i.e., predict whether or not a game will be played. Temperature can easily be converted into a numerical attribute. I. H. Witten and E. Frank, Data Mining, Morgan Kaufmann Pub.,
41 Decision tree to decide whether or not to play tennis [Figure: partial tree with leaves labeled Class: NO, Class: YES, Class: YES]
42 Final decision tree
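How the root of such a tree is chosen can be sketched with information gain (the ID3-style criterion). We assume the nominal version of the "weather" data from Witten and Frank's book; the slide notes that temperature could be made numeric, but this sketch keeps it nominal.

```python
import math
from collections import Counter

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]
# Nominal weather data: (outlook, temperature, humidity, windy, play)
DATA = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """Entropy of the class minus the weighted entropy after splitting on one attribute."""
    remainder = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder

gains = {name: information_gain(DATA, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
print(best, round(gains[best], 3))  # -> outlook 0.247
```

Outlook gives the largest gain, so it becomes the root; its "overcast" branch is already pure (always yes), and the same computation is repeated recursively inside the other branches.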
43 Decision trees do not converge? Make a forest!
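A deliberately tiny random forest, to show the two sources of randomness: each tree is trained on a bootstrap sample (drawn with replacement) and on a randomly chosen feature, and the forest predicts by majority vote. To stay short, the trees here are depth-1 stumps, which is our simplification; real random forests grow deep trees and draw a random feature subset at every split. The data and all names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """Best threshold stump on one feature: returns (feature, threshold, polarity)."""
    values = sorted({row[feature] for row in X})
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for thr in thresholds:
        for pol in (1, -1):
            errors = sum(1 for row, label in zip(X, y)
                         if (pol if row[feature] > thr else -pol) != label)
            if best is None or errors < best[0]:
                best = (errors, thr, pol)
    return feature, best[1], best[2]

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        feature = rng.randrange(n_features)         # random feature for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Majority vote of the trees (an odd number of trees avoids ties)."""
    votes = Counter(pol if row[f] > thr else -pol for f, thr, pol in forest)
    return votes.most_common(1)[0][0]

# Two informative features (the second one is reversed), three points per class.
X = [[x, 20 - x] for x in (0, 1, 2, 10, 11, 12)]
y = [-1, -1, -1, 1, 1, 1]
forest = fit_forest(X, y)
print([predict_forest(forest, row) for row in X])
```

Averaging many decorrelated trees reduces the variance of a single tree, which is exactly the quantity the bound on the next slide ties to the mean correlation between trees.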
44 Generalization error of random forests. The generalization error of a random forest (RF) can be bounded by $E_{\text{real}}(RF) \le \frac{\bar{\rho}\,(1 - s^2)}{s^2}$, where $\bar{\rho}$ is the mean correlation between two decision trees and s is the quality of prediction (strength) of the set of decision trees.
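Plugging numbers into Breiman's bound makes the trade-off visible: at fixed strength s, halving the mean correlation between trees halves the bound. The values below are illustrative, not measured on any real forest.

```python
def rf_error_bound(mean_correlation, strength):
    """Breiman's bound rho_bar * (1 - s^2) / s^2 on a random forest's generalization error."""
    if not 0 < strength <= 1:
        raise ValueError("strength s must be in (0, 1]")
    return mean_correlation * (1 - strength ** 2) / strength ** 2

for rho in (0.6, 0.3):
    print(rho, rf_error_bound(rho, 0.8))
```

This is why random forests inject randomness (bootstrap samples, random feature subsets): it lowers the correlation between trees while trying to keep each tree's strength.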
45 Success story: Kinect. From "Real-Time Human Pose Recognition in Parts from a Single Depth Image", Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, and Andrew Blake, CVPR, June 2011.
46 Success story: Kinect
47 Other success stories. Support Vector Machines: $E_{\text{real}}(SVM) \le E_{\text{empirical}}(SVM) + \sqrt{\frac{d\left(\ln\frac{2m}{d} + 1\right) - \ln\frac{\alpha}{4}}{m}}$, where m is the number of training examples and d the dimension of the decision space; the bound is valid with probability $1 - \alpha$. Artificial Neural Networks and Deep Learning.
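The capacity term of this bound can be evaluated directly to see how it shrinks as the number of training examples m grows. In Vapnik's formulation d is the VC dimension of the function class; d = 10 and α = 0.05 below are illustrative choices.

```python
import math

def vc_confidence(m, d, alpha):
    """Capacity term sqrt((d * (ln(2m/d) + 1) - ln(alpha/4)) / m), holding with prob. 1 - alpha."""
    return math.sqrt((d * (math.log(2 * m / d) + 1) - math.log(alpha / 4)) / m)

for m in (1_000, 10_000, 100_000):
    print(m, round(vc_confidence(m, d=10, alpha=0.05), 4))
```

The term decreases roughly like $\sqrt{d \ln m / m}$: more data tightens the guarantee, while a richer decision space (larger d) loosens it.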
48 Outline: 1. Data scientist: the sexiest job of the 21st century? 2. Data scientist: future profile. 3. Data science in academic research. 4. Future.
49 Future trainees. Before applying a method or a technology, make sure that its original conditions of validity are verified. When a method is extended beyond its domain of validity, try to prove the mathematical consistency/stability of the new method. Demonstrate, or at least provide insights into, its complexity and scalability. In the coming years, new students will graduate with a more global vision of data science challenges, a deep understanding of the layers involved, and a better knowledge of powerful techniques.
50 I will be glad to answer any questions.
More informationTensor Methods for Machine Learning, Computer Vision, and Computer Graphics
Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University
More informationCOMP9321 Web Application Engineering
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411
More informationBIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
More informationBig Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
More informationThe Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
More informationBIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &
BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research & Innovation 04-08-2011 to the EC 8 th February, Luxembourg Your Atos business Research technologists. and Innovation
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationData Science at U of U
Data Science at U of U Je M. Phillips Assistant Professor, School of Computing Center for Extreme Data Management, Analysis, and Visualization Director, Data Management and Analysis Track University of
More informationLeveraging Big Data Technologies to Support Research in Unstructured Data Analytics
Leveraging Big Data Technologies to Support Research in Unstructured Data Analytics BY FRANÇOYS LABONTÉ GENERAL MANAGER JUNE 16, 2015 Principal partenaire financier WWW.CRIM.CA ABOUT CRIM Applied research
More informationTransforming the Telecoms Business using Big Data and Analytics
Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe
More informationSpark: Cluster Computing with Working Sets
Spark: Cluster Computing with Working Sets Outline Why? Mesos Resilient Distributed Dataset Spark & Scala Examples Uses Why? MapReduce deficiencies: Standard Dataflows are Acyclic Prevents Iterative Jobs
More informationMonday Morning Data Mining
Monday Morning Data Mining Tim Ruhe Statistische Methoden der Datenanalyse Outline: - data mining - IceCube - Data mining in IceCube Computer Scientists are different... Fakultät Physik Fakultät Physik
More informationBig Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
More informationModel Combination. 24 Novembre 2009
Model Combination 24 Novembre 2009 Datamining 1 2009-2010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for mulistrategy
More informationUpcoming Announcements
Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within
More informationOracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
More informationAnalysis of Big Data Survey 2015 on Skills, Training and Capacity Building
Analysis of Big Data Survey 2015 on Skills, Training and Capacity Building D R A F T Version 1.0 12 Oct 2015 By UN Global Working Group on Big Data for Official Statistics Task Team on Skills, Training
More informationArchitectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
More informationHuman Pose Estimation from RGB Input Using Synthetic Training Data
Human Pose Estimation from RGB Input Using Synthetic Training Data Oscar Danielsson and Omid Aghazadeh School of Computer Science and Communication KTH, Stockholm, Sweden {osda02, omida}@kth.se arxiv:1405.1213v2
More informationTraining for Big Data
Training for Big Data Learnings from the CATS Workshop Raghu Ramakrishnan Technical Fellow, Microsoft Head, Big Data Engineering Head, Cloud Information Services Lab Store any kind of data What is Big
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationMACHINE LEARNING BASICS WITH R
MACHINE LEARNING [Hands-on Introduction of Supervised Machine Learning Methods] DURATION 2 DAY The field of machine learning is concerned with the question of how to construct computer programs that automatically
More informationHigh Productivity Data Processing Analytics Methods with Applications
High Productivity Data Processing Analytics Methods with Applications Dr. Ing. Morris Riedel et al. Adjunct Associate Professor School of Engineering and Natural Sciences, University of Iceland Research
More informationCOMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
More informationSense Making in an IOT World: Sensor Data Analysis with Deep Learning
Sense Making in an IOT World: Sensor Data Analysis with Deep Learning Natalia Vassilieva, PhD Senior Research Manager GTC 2016 Deep learning proof points as of today Vision Speech Text Other Search & information
More informationMassive Labeled Solar Image Data Benchmarks for Automated Feature Recognition
Massive Labeled Solar Image Data Benchmarks for Automated Feature Recognition Michael A. Schuh1, Rafal A. Angryk2 1 Montana State University, Bozeman, MT 2 Georgia State University, Atlanta, GA Introduction
More informationIndustry 4.0 and Big Data
Industry 4.0 and Big Data Marek Obitko, mobitko@ra.rockwell.com Senior Research Engineer 03/25/2015 PUBLIC PUBLIC - 5058-CO900H 2 Background Joint work with Czech Institute of Informatics, Robotics and
More informationBig Data Analytics and Optimization
Big Data Analytics and Optimization C e r t i f i c a t e P r o g r a m i n E n g i n e e r i n g E x c e l l e n c e C e r t i f i c a t e P r o g r a m s i n A c c e l e r a t e d E n g i n e e r i n
More informationlocuz.com Big Data Services
locuz.com Big Data Services Big Data At Locuz, we help the enterprise move from being a data-limited to a data-driven one, thereby enabling smarter, faster decisions that result in better business outcome.
More information