SURVEY REPORT DATA SCIENCE SOCIETY 2014
Contents

About the Initiative
Report Summary
Participants Info
Participants Expertise
Suggested Discussion Topics
Selected Responses Summary
Areas of Interest
Contact information
About the Initiative

Data Science Society is an initiative that enables faster growth and better performance for Education, Science and Business in the Data Science industry. Our community platform is intended to facilitate collaboration, knowledge sharing, innovation and entrepreneurship. Our goal is to stimulate education, knowledge sharing and research. We provide new business opportunities and communication channels, and we increase public awareness of Data Science.

Our next step is to start regular society meetings and present the most interesting topics selected by members. More information will be available on our website, which will be launched in the next few weeks.

Thank you for your active participation, and we look forward to seeing you at the first society meeting!

Data Science Society team
August 25, 2014
Report Summary

THE SURVEY

Between the 10th of June and the 15th of August, a survey was conducted among volunteers motivated to gain and share knowledge about Data Science. The main purpose of the survey was to test the hypothesis that both the knowledge and the willingness exist to create a decentralized community that proactively collaborates and shares knowledge and expertise. The questionnaire was designed to gather initial information on: i) participant info; ii) expertise in the field; iii) topics that members are willing to present at the regular society meetings; iv) platforms in operation; and v) the interest of business, science and universities.

CONCLUSIONS

There is a good body of knowledge in the field and a strong interest in collaborating from the three groups (business, science and universities). Various types of platforms are in use, with wikis dominating. 30 volunteers from various companies and universities, with different expertise, participated in the survey. They suggested more than 35 topics in various areas to be presented at the society meetings.

NEXT STEPS

The topic selection process will start with comments and ratings from the participants. The top 10 topics will be selected by vote and presented by their originators or in discussion panels. The society's understanding is that members are willing to participate and could present their topics of interest given one or two months' notice. Information on the speakers' expertise and their companies will be provided during the selection process.
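The selection step described under NEXT STEPS could be sketched roughly as follows. This is only an illustration under stated assumptions: the report does not specify a rating scale, a scoring rule or a tie-break, so the 1-5 scale, the mean-rating score, the tie-break by number of votes and the function name `select_top_topics` are all hypothetical, as are the example ratings (only the topic titles come from the suggested list).

```python
from statistics import mean


def select_top_topics(ratings, top_n=10):
    """Return up to top_n topic titles ranked by mean rating.

    Assumed rule (not from the report): higher mean rating first,
    then more votes, then alphabetical title as a final tie-break.
    """
    scored = [
        (topic, mean(votes), len(votes))
        for topic, votes in ratings.items()
        if votes  # topics that received no ratings are skipped
    ]
    scored.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return [topic for topic, _, _ in scored[:top_n]]


# Hypothetical ratings on an assumed 1-5 scale.
example_ratings = {
    "R language for statistics": [5, 4, 4],
    "What is Big Data?": [3, 4],
    "Sport analytics": [5, 5],
    "Usage of open data": [],
}

top = select_top_topics(example_ratings, top_n=2)
print(top)
```

With real survey ratings, `top_n` would stay at its default of 10 to match the number of topics the society plans to select.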
Participants Info

Participants with a variety of occupations, expertise and levels of seniority took part in the survey. The main highlights are provided in this section.

EDUCATION AND SCIENCE

At the current stage, a limited number of universities have been contacted. We plan to gradually increase their number and variety.

BUSINESS

Startups

The startup community in Bulgaria is growing rapidly. It is willing to share knowledge and expertise and to collaborate.
Companies

Local and international businesses with diverse scope, goals and expertise were contacted. Some of them provide consultancy, analytical services, analytical tools and solutions in the field of Data Science, while others are currently only interested in this area.
PARTICIPANTS PROFILES

Participants at different levels took part in the survey: employees who are subject matter experts, middle and top-level managers, and scientists holding the academic ranks of PhD, doctor and professor.

Figure: Participants profiles. Categories: Subject Matter Experts; Mid and Top Level Managers; PhD/Assistant Professor; Professor and associate professors. Data labels: 12, 13, 3, 2.
Participants Expertise

This section provides information on the expertise of the survey participants. Because knowledge levels were self-assessed, the results may be somewhat biased. The most widely covered knowledge areas among participants are Statistics and Business Analytics.

Area                                                  Number of participants  Average knowledge level (1-10)
Data Engineering                                      3                       7.3
Data Management                                       1                       7.0
Data integration                                      1                       8.0
Data Warehousing Infrastructure                       4                       8.3
Information retrieval Data wrangling                  4                       6.3
Mathematics
Statistics                                            12                      5.8
Learning
Machine Learning                                      4                       6.0
Neural Networks                                       1                       8.0
Natural language processing                           1                       5.0
Data mining                                           4                       8.3
Computer Vision                                       1                       5.0
Complex event processing                              1                       8.0
Domain expertise
Cargo transport                                       1                       8.0
Online poker                                          1                       5.0
Marketing                                             2                       6.0
e-government                                          1                       5.0
Credit risk                                           1                       6.0
Business Analytics                                    7                       7.3
Software business analysis                            1                       6.0
Business Intelligence                                 5                       7.6
Visualisation                                         4                       7.0
Advanced computing                                    3                       6.3
Others
Business Development                                  1                       8.0
Computer Science                                      1                       7.0
Software development related to data science          1                       10.0
Technology adoptions from businesses                  1                       8.0
Open data                                             1                       4.0
IT                                                    1                       7.0
Usefulness of the R&D from customer's point of view   1                       8.0
Business process optimization with data science       2                       8.0
Consulting services                                   1                       10.0
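The statement that Statistics and Business Analytics are the most widely covered areas can be checked by sorting the table by participant count. The following is a minimal sketch using only a subset of the rows above; the variable names are illustrative, and the full table would need to be entered in the same way to reproduce the complete ranking.

```python
# (area, number of participants, average self-assessed level 1-10),
# copied from a subset of the expertise table.
rows = [
    ("Data mining", 4, 8.3),
    ("Statistics", 12, 5.8),
    ("Visualisation", 4, 7.0),
    ("Business Analytics", 7, 7.3),
    ("Advanced computing", 3, 6.3),
    ("Business Intelligence", 5, 7.6),
]

# Sort by participant count, largest first (sorted() is stable,
# so rows with equal counts keep their original relative order).
by_coverage = sorted(rows, key=lambda row: row[1], reverse=True)
top_two = [area for area, _, _ in by_coverage[:2]]

for area, participants, level in by_coverage:
    print(f"{area}: {participants} participants, average level {level}")
```

Note that coverage and self-assessed level are separate measures: Statistics has the most respondents but a lower average level than, for example, Data mining.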
Suggested Discussion Topics

The participants suggested both domain-specific and general discussion topics.

DESCRIPTIONS

1. Databases, storing, indexing, querying
2. Normalizing data
3. Market system identification - demand model development (the emphasis could be on automating the model development process)
4. Data Processes and Tools
5. R language for statistics
6. Data mining platforms
7. How to make profitable business from scientific research?
8. Machine learning algorithms - SVM, Artificial NN, Random forests and others - strengths and weaknesses, assumptions, optimizations.
9. Fraud-predictive analysis based on Social Networks Data
10. Statistical data in online poker
11. Impact of incorrect application of data science, for instance saying we do big data and not understanding what it really means.
12. Usage of open data
13. What is Big Data?
14. Big data application case study: balancing of demand and supply in cargo transport
15. Big Data approaches to Linked / RDF data management
16. Citizen science
17. Parallel and Distributed Algorithms for Inference and Optimization. In particular I am interested in computational frameworks for horizontally scaling iterative algorithms for which the Hadoop MapReduce framework might not be the best solution.
18. General intro to statistics
19. Many-core CPU for high performance computing
20. What breaks the connection between business, the people that should use the results from the R&D and the scientists?
21. Using sophisticated machine learning models for credit risk prediction.
22. Complex algorithms based on a collaboration of ML algorithms
23. Health analytics/quantified self/bio feedback
24. What does Hadoop really do better than Oracle?
25. Implementation of e-government
26. What is the role of Predictive Analytics in the new Economy?
27. Big data in digital humanities
28. Which are the most common/popular distributed platforms for storing large volumes of data in the industry?
29. How to fund joint ventures between labs and business?
30. Computer vision - video image recognition.
31. Sport analytics
32. How to apply Predictive Analytics in business (in any industry with "Big Data"), and how can science help in the process?
33. Predictive Analytics
34. Data discrepancy mitigation
35. Data Quality problems
36. Forecasting methods and trend adjustments
37. Online resources to build data science skills or "The Open-Source Data Science Masters"
38. Hadleyverse
39. Predictive Analytics for Credit Risk in Bulgarian Financial Institutions - Challenges and Opportunities
40. Forecasting Market Risk in the Context of Basel II Without Relying on External Software Solutions - Is It Possible?
Selected Responses Summary

Figure: Current collaboration platforms in use (percentage of respondents). Categories: Wiki, Forum, Social Platform, Stack Exchange, Google docs. Data labels: 73%, 57%, 43%, 40%, 40%.

Figure: Indicative Support for Data Science Society (percentage of respondents). Categories: I will attend, Bring others, Sponsorship, Event fee, Speaker, Venue, Media, Volunteer. Data labels: 97%, 67%, 57%, 63%, 27%, 10%, 10%, 10%.
Areas of Interest

This section summarizes the areas of interest named by the participants, sorted in alphabetical order.

DESCRIPTIONS

256 core CPU www.kalray.com
Analytic and Predictions
Automated Decisioning
Banking
Bayesian Networks
Behavioral Analytics
Big data
Business Analytics
Business development based on technology
Business Intelligence
Churn Prediction
Communication Internet
Computational frameworks for iterative mathematical algorithms running on big data (i.e. beyond MapReduce)
Computer Vision
Customer Behavioral Segmentation
Data processing
Data scraping
Databases, storing, indexing, querying
Democratization of everything
Digital Marketing Analytics
Domain Expertise
e-government
Electric vehicles
Embedded devices
Financial markets
Fuzzy Sets and Logic
Health Analytics/bio feedback
High-performance computing
Information Analysis
k-means Clustering
Know-how exchange
Large scale cloud platforms
Large scale data mining
Large scale, sparse optimization
Machine Learning Applications
Market Basket Analysis
Market research
Mathematics
New Economy of Internet of things
NLP/text mining
Non linear system identification
Non-structural data
Numerical methods solutions
Open data
Personalization
Predictive Analysis
Predictive modelling
Process improvement using Data
Programming and Task Automation
Public sources for big data
Real-time / stream data processing
Risk management & evaluation
Semantic analysis
Signal processing
Social causes and initiatives
Social network analysis
Sport Analytics
Startups and entrepreneurships
State space system identification (approach for modelling of multivariable dynamic systems)
Text Mining
Tools
Unstructured data analysis
Unsupervised learning
Very large digital libraries
Visualization
Where/how do companies accumulating large volumes of data keep it?
Contact information

Data Science Society
Sofia, Bulgaria
Email: info@datasciencesociety.net
Website: http://datasciencesociety.net/
Tel: +359 888 400 290