SURVEY REPORT DATA SCIENCE SOCIETY 2014

TABLE OF CONTENTS
About the Initiative
Report Summary
Participants Info
Participants Expertise
Suggested Discussion Topics
Selected Responses Summary
Areas of Interest
Contact information

ABOUT THE INITIATIVE
Data Science Society is an initiative that enables faster growth and better performance for education, science and business in the Data Science industry. Our community platform aims to facilitate collaboration, knowledge sharing, innovation and entrepreneurship. Our goal is to stimulate education, knowledge sharing and research. We provide new business opportunities and communication channels and increase public awareness of Data Science. Our next step is to start regular society meetings at which the most interesting topics selected by members will be presented. More information will be available on our website, which will be launched in the next few weeks.

Thank you for your active participation; we look forward to seeing you at the first society meeting!

Data Science Society team
August 25, 2014

REPORT SUMMARY

THE SURVEY
Between 10 June and 15 August 2014, a survey was conducted among volunteers motivated to gain and share knowledge about Data Science. The main aim of the survey was to test the hypothesis that there is existing knowledge and willingness to create a decentralized community that can proactively collaborate and share knowledge and expertise. The questionnaire was designed to gather initial information on: (i) participant information; (ii) expertise in the field; (iii) topics that members are willing to present at regular society meetings; (iv) platforms currently in use; and (v) the interest of business, science and universities.

CONCLUSIONS
There is a good body of knowledge in the field and a strong interest in collaboration from the three groups (business, science and universities). Various types of platforms are in use, with wikis dominating. Thirty volunteers from various companies and universities, with different areas of expertise, participated in the survey. They suggested more than 35 topics in various areas to be presented at the society meetings.

NEXT STEPS
The topic selection process will begin with comments and ratings from the participants. The top 10 topics will be selected by vote and presented by their originators or in discussion panels. The society's understanding is that members are willing to participate and could present topics of interest after a notification period of one to two months. Information on the speakers' expertise and their companies will be provided during the selection process.
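The selection step in NEXT STEPS (participants rate topics, and the top 10 by vote are kept) can be sketched in Python as follows. This is a minimal illustration, not the society's actual procedure: it assumes each participant gives a numeric rating per topic, ranks topics by average rating, and breaks ties by the number of ratings. The function name `select_top_topics` and the sample scores are hypothetical.

```python
from statistics import mean


def select_top_topics(ratings, n=10):
    """Return the ids of the n topics with the highest average rating.

    Hypothetical helper: `ratings` maps a topic id to the list of numeric
    scores it received. Unrated topics are skipped. Ties on the average are
    broken by the number of ratings (more first), then by topic id.
    """
    scored = [
        (mean(scores), len(scores), topic)
        for topic, scores in ratings.items()
        if scores
    ]
    # Highest average first, then most ratings, then lowest topic id.
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [topic for _, _, topic in scored[:n]]


if __name__ == "__main__":
    # Hypothetical scores, keyed by numbers from the suggested topic list.
    sample = {13: [9, 8, 10], 33: [7, 9], 5: [8, 8, 8], 16: []}
    print(select_top_topics(sample, n=2))  # [13, 5]
```

Averaging keeps a topic with few but high ratings from being outranked purely by vote count; a plain tally of votes would be an equally valid reading of "selected by vote".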

PARTICIPANTS INFO
Participants with a variety of occupations, areas of expertise and levels of seniority took part in the survey. The main highlights are provided in this section.

EDUCATION AND SCIENCE
At this stage, only a limited number of universities were contacted. We plan to gradually increase their number and variety.

BUSINESS
Startups
The startup community in Bulgaria is growing rapidly. It is willing to share knowledge and expertise and to collaborate.

Companies
Local and international businesses with diverse scopes, goals and expertise were contacted. Some of them provide consultancy, analytical services, analytical tools and solutions in the field of Data Science, while others are currently only interested in this area.

PARTICIPANTS PROFILES
Participants at different levels took part in the survey: employees working as Subject Matter Experts, mid- and top-level managers, and scientists with academic ranks ranging from PhD to professor.

12  Subject Matter Experts
13  Mid- and top-level managers
 3  PhD/Assistant Professor
 2  Professors and associate professors

PARTICIPANTS EXPERTISE
This section provides information on the expertise of the survey participants. Knowledge levels were self-assessed, so the results may be somewhat biased. The knowledge areas most widely covered among participants are Statistics and Business Analytics.

Participants  Avg. level (1-10)  Area
           3                7.3  Data Engineering
           1                7.0  Data Management
           1                8.0  Data integration
           4                8.3  Data Warehousing Infrastructure

Information retrieval
           4                6.3  Data wrangling

Mathematics
          12                5.8  Statistics

Learning
           4                6.0  Machine Learning
           1                8.0  Neural Networks
           1                5.0  Natural language processing
           4                8.3  Data mining
           1                5.0  Computer Vision
           1                8.0  Complex event processing

Domain expertise
           1                8.0  Cargo transport
           1                5.0  Online poker
           2                6.0  Marketing
           1                5.0  e-government
           1                6.0  Credit risk
           7                7.3  Business Analytics
           1                6.0  Software business analysis
           5                7.6  Business Intelligence
           4                7.0  Visualisation
           3                6.3  Advanced computing

Others
           1                8.0  Business Development
           1                7.0  Computer Science
           1               10.0  Software development related to data science
           1                8.0  Technology adoption by businesses
           1                4.0  Open data
           1                7.0  IT
           1                8.0  Usefulness of R&D from the customer's point of view
           2                8.0  Business process optimization with data science
           1               10.0  Consulting services
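The claim that Statistics and Business Analytics are the most widely covered areas can be checked by ranking the expertise table by number of participants. The Python sketch below copies the table's rows (group labels omitted) and sorts them by participant count, then by average self-assessed level. `EXPERTISE` and `rank_areas` are illustrative names, and the ranking inherits the self-assessment bias noted above.

```python
# (area, number of participants, average self-assessed level on a 1-10 scale),
# copied from the participants' expertise table.
EXPERTISE = [
    ("Data Engineering", 3, 7.3),
    ("Data Management", 1, 7.0),
    ("Data integration", 1, 8.0),
    ("Data Warehousing Infrastructure", 4, 8.3),
    ("Data wrangling", 4, 6.3),
    ("Statistics", 12, 5.8),
    ("Machine Learning", 4, 6.0),
    ("Neural Networks", 1, 8.0),
    ("Natural language processing", 1, 5.0),
    ("Data mining", 4, 8.3),
    ("Computer Vision", 1, 5.0),
    ("Complex event processing", 1, 8.0),
    ("Cargo transport", 1, 8.0),
    ("Online poker", 1, 5.0),
    ("Marketing", 2, 6.0),
    ("e-government", 1, 5.0),
    ("Credit risk", 1, 6.0),
    ("Business Analytics", 7, 7.3),
    ("Software business analysis", 1, 6.0),
    ("Business Intelligence", 5, 7.6),
    ("Visualisation", 4, 7.0),
    ("Advanced computing", 3, 6.3),
    ("Business Development", 1, 8.0),
    ("Computer Science", 1, 7.0),
    ("Software development related to data science", 1, 10.0),
    ("Technology adoption by businesses", 1, 8.0),
    ("Open data", 1, 4.0),
    ("IT", 1, 7.0),
    ("Usefulness of R&D from the customer's point of view", 1, 8.0),
    ("Business process optimization with data science", 2, 8.0),
    ("Consulting services", 1, 10.0),
]


def rank_areas(rows):
    """Sort areas by participant count (desc), then average level (desc), then name."""
    return sorted(rows, key=lambda row: (-row[1], -row[2], row[0]))


if __name__ == "__main__":
    for area, count, level in rank_areas(EXPERTISE)[:3]:
        print(f"{area}: {count} participants, average level {level}")
```

The participant counts add up to 73 against 30 respondents, which indicates that participants could report several areas each.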

SUGGESTED DISCUSSION TOPICS
Participants suggested a range of domain-specific and general discussion topics.

1. Databases: storing, indexing, querying
2. Normalizing data
3. Market system identification: demand model development (the focus could be on automating the model development process)
4. Data processes and tools
5. R language for statistics
6. Data mining platforms
7. How to make a profitable business from scientific research?
8. Machine learning algorithms (SVM, artificial neural networks, random forests and others): strengths and weaknesses, assumptions, optimizations
9. Fraud-predictive analysis based on social network data
10. Statistical data in online poker
11. Impact of incorrectly applying data science, for instance claiming to "do big data" without understanding what it really means
12. Usage of open data
13. What is Big Data?
14. Big data application case study: balancing demand and supply in cargo transport
15. Big Data approaches to Linked / RDF data management
16. Citizen science
17. Parallel and distributed algorithms for inference and optimization. In particular, I am interested in computational frameworks for horizontally scaling iterative algorithms, for which the Hadoop MapReduce framework might not be the best solution.
18. General introduction to statistics
19. Many-core CPUs for high-performance computing
20. What breaks the connection between businesses (the people who should use the results of R&D) and scientists?
21. Using sophisticated machine learning models for credit risk prediction
22. Complex algorithms based on a combination of ML algorithms

23. Health analytics / quantified self / biofeedback
24. What does Hadoop really do better than Oracle?
25. Implementation of e-government
26. What is the role of predictive analytics in the new economy?
27. Big data in digital humanities
28. What are the most common or popular distributed platforms for storing large volumes of data in industry?
29. How to fund joint ventures between labs and businesses?
30. Computer vision: video image recognition
31. Sport analytics
32. How to apply predictive analytics in business (in any industry with "Big Data"), and how can science help in the process?
33. Predictive analytics
34. Data discrepancy mitigation
35. Data quality problems
36. Forecasting methods and trend adjustments
37. Online resources to build data science skills, or "The Open-Source Data Science Masters"
38. Hadleyverse
39. Predictive Analytics for Credit Risk in Bulgarian Financial Institutions: Challenges and Opportunities
40. Forecasting Market Risk in the Context of Basel II Without Relying on External Software Solutions: Is It Possible?

SELECTED RESPONSES SUMMARY

Current collaboration platforms in use (share of respondents):
73%  Wiki
57%  Forum
43%  Social Platform
40%  Stack Exchange
40%  Google docs

Indicative support for Data Science Society (share of respondents):
97%  I will attend
67%  Bring others
57%  Sponsorship
63%  Event fee
27%  Speaker
10%  Venue
10%  Media
10%  Volunteer
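With 30 participants, each chart percentage should correspond to a whole number of respondents. Assuming every participant answered the question (the report does not state this), the Python sketch below recovers the implied head counts for the collaboration-platform chart. The pairing of labels and values follows the order in which the chart lists its categories, and `RESPONDENTS`, `PLATFORMS` and `implied_count` are illustrative names.

```python
RESPONDENTS = 30  # assumption: all 30 survey participants answered

# Share of respondents per platform, as read from the chart (percent).
PLATFORMS = {
    "Wiki": 73,
    "Forum": 57,
    "Social Platform": 43,
    "Stack Exchange": 40,
    "Google docs": 40,
}


def implied_count(percent, respondents=RESPONDENTS):
    """Return the smallest head count whose share rounds to `percent`, or None."""
    for count in range(respondents + 1):
        if round(100 * count / respondents) == percent:
            return count
    return None


if __name__ == "__main__":
    for platform, percent in PLATFORMS.items():
        print(f"{platform}: {percent}% -> {implied_count(percent)} of {RESPONDENTS}")
```

Every value in the support chart also maps to a whole count out of 30 (for example, 97% corresponds to 29 respondents and 27% to 8), which is consistent with the stated number of participants.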

AREAS OF INTEREST
This section summarizes the areas of interest reported by the participants, in alphabetical order.

256-core CPU (www.kalray.com)
Analytics and Predictions
Automated Decisioning
Banking
Bayesian Networks
Behavioral Analytics
Big data
Business Analytics
Business development based on technology
Business Intelligence
Churn Prediction
Communication Internet
Computational frameworks for iterative mathematical algorithms running on big data (i.e. beyond MapReduce)
Computer Vision
Customer Behavioral Segmentation
Data processing
Data scraping
Databases, storing, indexing, querying
Democratization of everything
Digital Marketing Analytics
Domain Expertise
e-government
Electric vehicles
Embedded devices
Financial markets
Fuzzy Sets and Logic
Health Analytics/biofeedback
High-performance computing
Information Analysis
k-means Clustering
Know-how exchange
Large scale cloud platforms
Large scale data mining
Large scale, sparse optimization
Machine Learning Applications
Market Basket Analysis
Market research
Mathematics
New Economy of Internet of things
NLP/text mining
Non-linear system identification
Non-structural data
Numerical methods solutions
Open data
Personalization
Predictive Analysis
Predictive modelling
Process improvement using Data
Programming and Task Automation
Public sources for big data
Real-time / stream data processing
Risk management & evaluation
Semantic analysis
Signal processing
Social causes and initiatives
Social network analysis
Sport Analytics
Startups and entrepreneurship
State space system identification (approach for modelling of multivariable dynamic systems)
Text Mining Tools
Unstructured data analysis
Unsupervised learning
Very large digital libraries
Visualization
Where/how do companies accumulating large volumes of data keep it?

CONTACT INFORMATION
Data Science Society
Sofia, Bulgaria
Email: info@datasciencesociety.net
Website: http://datasciencesociety.net/
Tel: +359 888 400 290