Essex Big Data and Analytics Summer School 2015, 24th - 28th August 2015

BD001 Introduction to R, 5 days
Presenter: Leo Schalkwyk

R is an interactive computing environment and programming language designed for statistical analysis and graphics. Extensions to the basic capabilities of R are straightforward to produce and share with others. It is widely and increasingly used in many Big Data fields of research including bioinformatics. Because of its power and flexibility, R is more demanding to learn than traditional statistical packages but rewards some initial effort. This course is based on tested material that we have been using for nearly 10 years to help research students, postdocs and faculty get started in their own data analysis, and is refined each time based on feedback. It is aimed at people who may have little or no programming experience. The course will emphasize the fundamentals of the R language in an intensive format where each student has a computer and 50% of the time is spent on practical exercises, and will include a special module on [...] techniques.

BD002 Big Data Methods in R
Presenter: Szymon Walkowiak, UK Data Service

This course will provide participants with an array of major techniques and essential R programming skills in the data analysis process of large and complex socio-economic datasets. In particular, the participants will be introduced to: the basics of Big Data extraction; the technical requirements for effective Big Data manipulation; methods of Big Data management including sub-setting, data transformations, screening for missing values etc.; R packages supporting Big Data manipulation techniques, e.g. extracting and converting between date and time formats, text mining etc.; descriptive statistics and frequency tables for Big Data; libraries facilitating Big Data statistical computation and modelling; interactive Big Data visualisation techniques; the process of Big Data product development. The course will involve active learning methods with case studies and real socio-economic data.

Prerequisite(s): A working proficiency in R or attendance at the Summer School's Introduction to R course (BD001).

BD003 Science and Big Data
Presenter: Andrew Harrison (TBC)

The history of science has shown on many occasions the benefits of bringing data sets together. It has also shown that deep insights into the Universe lead to theories that provide elegant explanations behind great unifications of knowledge (and hence data). These theories can be, in many cases, described by mathematical concepts, giving clues to how we should best represent the data in order to aid understanding.

I will describe some of the best understood theories and their representations. I will break scientific studies into those of simple, complex and complicated systems. New sources of simple and complex data may offer our best chance of providing new unifications and understanding of the causal structures within nature, whereas complicated sources may offer little hope of inferring causality.

Level and duration entries listed for BD004-BD007: Advanced/intermediate; 8 hours (over 2 days).

BD004 Clustering and Classification with [...] in R
Presenter: Berthold Lausen

This short course gives an introduction to cluster analysis (unsupervised learning) and classification (supervised learning). The concepts of k-means clustering and hierarchical clustering are discussed and applied in R. Linear discriminant analysis, logistic regression, classification and regression trees (CART) and random forests are introduced as examples of statistical learning methods. Cross-validation and bootstrap methods are applied to assess classifiers. Using R, participants analyse example data sets and compute estimates of the misclassification rate and of the area under the receiver operating characteristic (ROC) curve.

Prerequisite(s): Basic skills using R, and basic concepts in statistics such as correlation and linear regression.

BD005 Bayesian Computational Methods with applications (in R)
Presenters: Hongsheng Dai and Saeed Aldahmani

The course will first provide a brief introduction to Bayesian analysis and then cover Markov chain Monte Carlo (MCMC) methods, such as the Metropolis-Hastings algorithm and the Gibbs sampler. [...] on mixture models, change-point problems and regression analysis will also be covered in the lecture. The course includes a 2-hour lab session to help the audience become familiar with implementing MCMC algorithms using R.

Prerequisite(s): Participants should have knowledge of at least first-year statistics, probability and R.

BD006 Actuarial/Financial modelling with applications in R
Presenter: Spyros Vrontos

Modelling claim frequency and claim severity in general insurance, distribution fitting, application of generalised linear models in pricing, ratemaking and bonus-malus systems. Modelling the returns of financial assets. Option pricing in finance and insurance. Monte Carlo methods and their application in option pricing and in pricing life insurance liabilities. Extensive applications in R with real and simulated data sets.

Prerequisite(s): Basic knowledge of statistics and R.

BD007 Introduction to Data Mining
Presenter: Beatriz de la Iglesia, University of East Anglia, ESRC Business and Local Government Data Research Centre

The course will introduce the topic of data mining, and will present a methodology for Knowledge Discovery in Databases (KDD). The tasks of clustering and classification will be explored in some detail. We will look at an open source data mining package for some practical guidance on how to put what has been learned into practice.

Level and duration entries listed for BD008-BD012: 6 or 8 hours; [...]/advanced; 8 hours.

BD008 A (gentle) introduction to reinforcement learning
Presenter: Spyros Samothrakis

Reinforcement learning is concerned with learning how to act optimally in the presence of rewards and punishments. This short course on reinforcement learning will help you understand the basics and provide a solid foundation necessary for advanced topics. It will have both a practical (two hour) and a theoretical (two hour) component. Topics to be addressed are Markov Decision Processes, Monte Carlo methods, SARSA and Q-Learning.

Prerequisite(s): Some mathematical/computer science sophistication (e.g. understanding summation, recursion, means/medians).

BD009 Search in big data
Presenter: Allan Hanbury, Vienna University of Technology

As the amount of text data stored by organisations grows, information retrieval technologies become increasingly important. Effective use of search technologies is essential to ensuring that the key information is available when decisions are made. This course will start by covering the basics of information retrieval, such as indexing and keyword search. It will then cover adapting search to specific domains (such as the technical and health domains), and will finally present how the effectiveness of search technologies is evaluated.

Prerequisite(s): Participants need to be comfortable with basic mathematics, especially linear algebra.

BD010 Practical sentiment analysis
Presenter: Diana Maynard, University of Sheffield

This tutorial will introduce the concept of sentiment analysis from unstructured text. It will cover both rule-based and machine learning techniques, provide some background information on the key underlying NLP and text analysis processes required, and look in detail at some of the major problems and solutions, such as detection of sarcasm, use of informal language, spam opinion detection, trustworthiness of opinion holders, and so on. The techniques will be demonstrated with real applications developed in GATE, an open-source language processing toolkit. Hands-on exercises and relevant materials will be provided for participants to try out the applications, and to experiment with building their own tools, both in GATE and with other common tools.

Prerequisite(s): No prior knowledge of GATE, Java or Natural Language Processing (NLP) is required to attend this tutorial. However, it will include a hands-on element where you will be able to try simple things out in GATE, the tool we use for NLP tasks.

BD011 High performance computing
Presenter: Adrian Clark

This course introduces participants to high performance computing. The first half of the course will cover principles (floating-point computation, speeding up code, compute clusters, using MPI) while, in the second part, participants will have the opportunity to build and use a small cluster.

Prerequisite(s): The course assumes knowledge of programming in Python/C/C++.

BD012 Data Protection and Liability in the Age of Big Data Analytics
Category: Legal and ethical issues
Presenter: Audrey Guinchard

This session aims to introduce the current EU and UK data protection regime and the changes to be brought in by the future General Data Protection Regulation late [...]. Furthermore, the session will present and allow for discussion of the specific challenges big

data bring, especially in light of the reports published by various data protection regulators at both UK and EU levels.

BD013 Managing, curating and publishing data
Category: Curation and management of data
Presenters: Sharon Bolton and Louise Corti, UK Data Service

Big data may come from a range of sources and organisations, which may not be used to the idea of sharing their data with researchers. Therefore, they might not realise what researchers need, and so some of the features traditionally present that make research data easier to use might not be available. This can bring a range of problems, some of which can be addressed by good data curation. The course will start with the legal issues in brokering data. Assessment: issues of trust in the quality of the source. Who is the provider? Also, it will highlight ethical issues in the content and use of personal data. For example, some of the questions we plan to address in the session include: data confidentiality (are people identifiable from the data?); metadata and accompanying documentation (do users have enough information about what the data means and how it can be used?); formats, size and usability (what kind of software, hardware and techniques are needed?); publishing data products or data to support a journal article (what does the supporting data look like for verification?). Run a hands-on exercise publishing a small dataset in a repository, providing the necessary metadata and documentation. Learn what curation is and what is needed.

BD014 Secure access protocols for Big Data
Category: Curation and management of data
Presenters: Libby Bishop and Felix Ritchie, UK Data Service

On aspects of accessing and using confidential sources of Big Data. Topics: the Five Safes of data access; Big Data confidentiality/privacy/ethical considerations: what you need to know; how to be a Safe Person when using confidential sources of Big Data; using Big Data responsibly; designing a Safe Setting for Big Data; disclosure control techniques: to data, to your research outputs. The objective is to prepare people who want to access confidential sources of Big Data. They might be making an application to a data owner, or for funding which has to go through an ethics panel. Or they might be using Big Data but be unaware of some of the confidentiality/privacy/public-perception issues that surround the collection and analysis of Big Data.

BD015 Agent based modelling for business
Presenter: Abhijit Sengupta
Level: Advanced

This course will start by providing students with an overview of the nature of business applications where Agent Based Modelling (ABM) can be useful, relevant and practical. It will then proceed with some real world examples where ABM has been used, particularly in the context of the Fast Moving Consumer Goods (FMCG) sector. A few of these examples, which are in the public domain and have had an academic influence, will be discussed in detail.

The lecture will conclude with some indicators of where the future of this modelling paradigm lies in the context of business applications.

Prerequisite(s): Understanding of complex systems phenomena. Familiarity with social networks and properties of networks. Reasonable knowledge of at least one ABM toolkit such as Repast or NetLogo. All practical examples in this course will be NetLogo based.

Level entry listed for BD016-BD018: [...]/advanced.

BD016 Machine Learning with Mahout (tbc)
Presenter: Richard Skeggs, ESRC Business and Local Government Data Research Centre

This is an introduction to the use of machine learning algorithms supported by the Apache Mahout framework. The class will concentrate on what problems can be solved using Mahout before looking at the common classifiers used by Mahout to achieve those objectives. Finally the class will look at building some simple working examples to see Mahout in practice.

Prerequisite(s): Knowledge of the Java programming language is essential. Some statistical knowledge will be useful but not essential.

BD017 Big Data Finance Analytics
Presenter: Neil Kellard

Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. Given contemporary computing power and potential data collection, many firms, particularly those from the financial sector, wish to use [...]. The challenges include capture, curation, storage, search, sharing, transfer and data visualization. The primary purpose of this course is to provide the participant with an understanding of data analytic approaches in finance. The first part covers high frequency trading and predictive analytics. The second part will concentrate on the application of data in risk modelling, corporate finance, fraud and personal finance.

Prerequisite(s): Some background in statistics/mathematics/econometrics is desirable but not essential.

BD018 Cognitive Computing
Presenters: Detlef Nauck and Martin Spott, British Telecom

While we are successfully addressing the challenges behind storing and managing massive amounts of data through technologies, we are still facing large obstacles in successfully and quickly analysing that data. The view that one analyst uses tools from statistics, machine learning and data mining to find answers in data rapidly becomes outdated in the face of an overwhelming amount and variety of data and an ever increasing demand for evidence based decision making. We now need to look into concepts of collaborative and distributed analytics where analysts work together and combine individual results into an overall answer. We need tools that can deal with uncertainty and can assess the quality of potential answers. We need new human-computer interfaces that allow computers to really help analysts find answers that they could not have come up with themselves. We also need computers to help analysts to illustrate and explain the outcome of analytics to decision makers so they have confidence in the results. Cognitive Computing addresses several of these issues. Cognitive Computing looks at how we get computers to behave and interact the way humans do. Systems like IBM's Watson can deal with huge

volumes of data, identify knowledge and patterns in the data and apply this to the problem the analyst is trying to solve by giving them different alternatives to consider and in particular the underlying evidence that supports those alternatives. This course looks at the challenges modern analytics is facing and explores how ideas from Cognitive Computing can lead to a new era of data analytics.

Prerequisite(s): A basic understanding of what is involved in running a data science project.

Level entry listed for BD019-BD022: [...]/advanced.

BD019 Stream Processing and Data Analytics for Smart City
Presenters: TBC; Sefki Kolozali and Nazli Farajidavar, University of Surrey

In this course we cover some of the background concepts related to the Internet of Things and Web of Things, and Semantic Technologies in the smart city domain, and will describe solutions for processing and information extraction from real world data. Use-cases and examples from the smart city domain will be described. We will also discuss some of the machine learning techniques and data tools and methods that can be used to process and analyse the smart city data.

Prerequisite(s): Familiarity with machine learning techniques and semantic web technologies would be useful but is not compulsory.

BD020 Crowdsourcing and Human Computation
Presenter: Jon Chamberlain

Crowdsourcing has established itself in the mainstream of research methodology in recent years, using a variety of methods to engage many non-expert users to solve problems that computers or limited expert users cannot solve. Whilst the concept of human computation goes some way towards solving problems, it also introduces new challenges of data quality, participant recruitment and incentivisation. This course will introduce 3 common methods of crowdsourcing: peer-production, microworking and games-with-a-purpose, as well as an emerging approach using social networks as a powerful problem solving and monitoring tool. Participants are encouraged to bring examples of data they would like annotated or tasks that need humans to solve for discussion as to which approach might be suitable and how to implement it.

BD021 From Big Data to Big Value
Presenter: Richard Mason, Intel

Learn how Intel is harnessing Big Data to drive operational efficiency and revenue optimisation across the organisation. Discuss trends and how Intel is embracing these trends to gain further insights, adoption and value.

BD022 Introduction to Big Data and Statistics
Presenter: Nathan Cunningham, UK Data Service

This is a short introductory course on understanding Big Data, what it is and what strategies you can adopt to make the most out of it. It would be useful to bring a device for note taking. This course will cover: Putting new knowledge first. What question do you want to answer? Defining metrics for success. What is Big Data? What Big Data solutions are available to me for free? Do you know what your real sample size is?

Testing hypotheses and calling things significant. Managing spurious correlations. Smoothing data to understand significant relationships and spatial/temporal data. Make [...] as small as possible and as quick as possible. Plotting your [...] so you don't miss the obvious. Strategies for improving prediction accuracy by averaging many models together.

Prerequisite(s): Familiarity with using and applying science/research data to answer questions. An understanding of statistics and how databases operate is desirable. A basic overview of computing infrastructure and algorithms will be discussed but at an introductory level assuming no prior knowledge.

Keynote Lectures

Company: Thomson Reuters
Presenter: Jochen Leidner
Title of talk: Small Data and Big Data: Qualitative Differences Resulting from Quantitative Scale
Abstract: While the Big Data topic has received a lot of attention, one may wonder why exactly "more of the same" should constitute a step change; for instance, we haven't declared a new academic field of "Big Plastic" just because we consume and process more plastic than ever. In this talk, I critically assess which, if any, quantitative changes induce qualitative changes, and whether the talk of [...] as a new area is merited. Along the way, we will revisit a couple of past and ongoing efforts that fall into the [...] space and apply these findings.

Company: Intel
Presenter: Mark Woodward
Title of talk: Using Big Data to Generate Real Revenue for Business
Abstract: This talk will describe how Intel takes advantage of large, complex data sources to achieve greater efficiency, cost saving and new revenue opportunities across its business. As part of the talk, examples of initiatives and real world business scenarios in the Technology and Manufacturing world will be discussed.

Company: Fujitsu
Presenter: Joe Duran
Title of talk: Impact of Research on Computing in Society
Abstract: TBC

Company: Citigroup
Presenter: Stuart Jones
Title of talk: Bridging the Gap Between Big Data, Statistics and Business
Abstract: This talk will describe how understanding business objectives including revenue, expense and risk management can be satisfied with statistical analysis of [...]. As part of the talk, examples of initiatives that will assist in detecting and preventing fraudulent money-laundering activity in the financial world will be discussed.

Columns: Day, Time, Activity, Course no, Course title, Speaker, Room

15.45-16.00 Coffee break
16.00-18.00 Parallel session 1: BD022 Introduction to Big Data and Statistics, Nathan Cunningham, CSEE lab 2
Monday 08.00-08.30 Registration
Parallel session 2: BD003 Science and Big Data, Andrew Harrison, TC2.12

More information

Learning outcomes. Knowledge and understanding. Competence and skills

Learning outcomes. Knowledge and understanding. Competence and skills Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376 Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

MACHINE LEARNING BASICS WITH R

MACHINE LEARNING BASICS WITH R MACHINE LEARNING [Hands-on Introduction of Supervised Machine Learning Methods] DURATION 2 DAY The field of machine learning is concerned with the question of how to construct computer programs that automatically

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,

More information

Masters in Information Technology

Masters in Information Technology Computer - Information Technology MSc & MPhil - 2015/6 - July 2015 Masters in Information Technology Programme Requirements Taught Element, and PG Diploma in Information Technology: 120 credits: IS5101

More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

ANALYTICS CENTER LEARNING PROGRAM

ANALYTICS CENTER LEARNING PROGRAM Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals

More information

Information Visualization WS 2013/14 11 Visual Analytics

Information Visualization WS 2013/14 11 Visual Analytics 1 11.1 Definitions and Motivation Lot of research and papers in this emerging field: Visual Analytics: Scope and Challenges of Keim et al. Illuminating the path of Thomas and Cook 2 11.1 Definitions and

More information

Government of Russian Federation. Faculty of Computer Science School of Data Analysis and Artificial Intelligence

Government of Russian Federation. Faculty of Computer Science School of Data Analysis and Artificial Intelligence Government of Russian Federation Federal State Autonomous Educational Institution of High Professional Education National Research University «Higher School of Economics» Faculty of Computer Science School

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

Faculty of Science School of Mathematics and Statistics

Faculty of Science School of Mathematics and Statistics Faculty of Science School of Mathematics and Statistics MATH5836 Data Mining and its Business Applications Semester 1, 2014 CRICOS Provider No: 00098G MATH5836 Course Outline Information about the course

More information

A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS

A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS Stacey Franklin Jones, D.Sc. ProTech Global Solutions Annapolis, MD Abstract The use of Social Media as a resource to characterize

More information

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p.

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p. Introduction p. xvii Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p. 9 State of the Practice in Analytics p. 11 BI Versus

More information

Masters in Computing and Information Technology

Masters in Computing and Information Technology Masters in Computing and Information Technology Programme Requirements Taught Element, and PG Diploma in Computing and Information Technology: 120 credits: IS5101 CS5001 or CS5002 CS5003 up to 30 credits

More information

MSCA 31000 Introduction to Statistical Concepts

MSCA 31000 Introduction to Statistical Concepts MSCA 31000 Introduction to Statistical Concepts This course provides general exposure to basic statistical concepts that are necessary for students to understand the content presented in more advanced

More information

How To Become A Data Scientist

How To Become A Data Scientist Programme Specification Awarding Body/Institution Teaching Institution Queen Mary, University of London Queen Mary, University of London Name of Final Award and Programme Title Master of Science (MSc)

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

More information

not possible or was possible at a high cost for collecting the data.

not possible or was possible at a high cost for collecting the data. Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

More information

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Machine Learning and Data Mining. Fundamentals, robotics, recognition Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,

More information

COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK DEPARTMENT OF INDUSTRIAL ENGINEERING AND OPERATIONS RESEARCH

COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK DEPARTMENT OF INDUSTRIAL ENGINEERING AND OPERATIONS RESEARCH Course: IEOR 4575 Business Analytics for Operations Research Lectures MW 2:40-3:55PM Instructor Prof. Guillermo Gallego Office Hours Tuesdays: 3-4pm Office: CEPSR 822 (8 th floor) Textbooks and Learning

More information

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

Is a Data Scientist the New Quant? Stuart Kozola MathWorks Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by

More information

Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate

Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate Description The Helzberg School of Management has launched two graduate-level certificates: one in Data

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Masters in Human Computer Interaction

Masters in Human Computer Interaction Masters in Human Computer Interaction Programme Requirements Taught Element, and PG Diploma in Human Computer Interaction: 120 credits: IS5101 CS5001 CS5040 CS5041 CS5042 or CS5044 up to 30 credits from

More information

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

Master of Science in Health Information Technology Degree Curriculum

Master of Science in Health Information Technology Degree Curriculum Master of Science in Health Information Technology Degree Curriculum Core courses: 8 courses Total Credit from Core Courses = 24 Core Courses Course Name HRS Pre-Req Choose MIS 525 or CIS 564: 1 MIS 525

More information

Masters in Advanced Computer Science

Masters in Advanced Computer Science Masters in Advanced Computer Science Programme Requirements Taught Element, and PG Diploma in Advanced Computer Science: 120 credits: IS5101 CS5001 up to 30 credits from CS4100 - CS4450, subject to appropriate

More information

MSCA 31000 Introduction to Statistical Concepts

MSCA 31000 Introduction to Statistical Concepts MSCA 31000 Introduction to Statistical Concepts This course provides general exposure to basic statistical concepts that are necessary for students to understand the content presented in more advanced

More information

Masters in Networks and Distributed Systems

Masters in Networks and Distributed Systems Masters in Networks and Distributed Systems Programme Requirements Taught Element, and PG Diploma in Networks and Distributed Systems: 120 credits: IS5101 CS5001 CS5021 CS4103 or CS5023 in total, up to

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Navigating Big Data business analytics

MWD Advisors. Helena Schwenk. A special report prepared for Actuate, May 2013. This report is the third in a series and focuses principally on explaining what

DATA SCIENCE CURRICULUM WEEK 1 ONLINE PRE-WORK INSTALLING PACKAGES COMMAND LINE CODE EDITOR PYTHON STATISTICS PROJECT O5 PROJECT O3 PROJECT O2

Before class even begins, students start an at-home pre-work phase. When they convene in class, students spend the first eight weeks doing iterative, project-centered skill acquisition.

Predictive Modeling in Workers Compensation 2008 CAS Ratemaking Seminar

Prepared by Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc. www.data-mines.com Louise.francis@data-mines.cm

CLUSTER ANALYSIS WITH R

[cluster analysis divides data into groups that are meaningful, useful, or both] LEARNING STAGE ADVANCED DURATION 3 DAY WHAT IS CLUSTER ANALYSIS? Cluster Analysis or Clustering

Information Management course

Università degli Studi di Milano, Master Degree in Computer Science. Teacher: Alberto Ceselli. Lecture 01: 06/10/2015. Practical information: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

An Introduction to Data Mining

An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

Masters in Artificial Intelligence

Programme Requirements Taught Element, and PG Diploma in Artificial Intelligence: 120 credits: IS5101 CS5001 CS5010 CS5011 CS4402 or CS5012 in total, up to 30 credits

Advanced Big Data Analytics with R and Hadoop

REVOLUTION ANALYTICS WHITE PAPER 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

COLLEGE OF SCIENCE. John D. Hromi Center for Quality and Applied Statistics

ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM NEW (or REVISED) COURSE: COS-STAT-747 Principles of Statistical Data Mining

EPSRC Cross-SAT Big Data Workshop: Well Sorted Materials

5th August 2015. Contents: Introduction, Dendrogram, Tree Map, Heat Map, Raw Group Data. For an online, interactive version of the visualisations

Why is Internal Audit so Hard?

Waste, Abuse, Fraud. Waves of Change: 1st Wave: Personal Computers, Electronic Spreadsheets

Data Mining Applications in Higher Education

Executive report. Jing Luan, PhD, Chief Planning and Research Officer, Cabrillo College; Founder, Knowledge Discovery Laboratories. Table of contents: Introduction

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate

Text Analytics. A business guide

February 2014. Contents: The Business Value of Text Analytics; What is Text Analytics?; Text Analytics Methods; Unstructured Meets Structured Data; Business Application

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics

ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM NEW (or REVISED) COURSE (KGCOE-CQAS-747- Principles of

Data Mining Algorithms Part 1. Dejan Sarka

Join the conversation on Twitter: @DevWeek #DW2015. Instructor Bio: Dejan Sarka (dsarka@solidq.com), 30 years of experience, SQL Server MVP, MCT, 13 books, 7+ courses

Auto-Classification for Document Archiving and Records Declaration

Josemina Magdalen, Architect, IBM November 15, 2013 Agenda IBM / ECM/ Content Classification for Document Archiving and Records Management

Data analytics Delivering intelligence in the moment

www.pwc.co.uk January 2014 Our point of view: Extracting insight from an organisation's data and applying it to business decisions has long been a necessary

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions

SMA 50: Statistical Learning and Data Mining in Bioinformatics (also listed as 5.077: Statistical Learning and Data Mining ()) Spring Term (Feb May 200) Faculty: Professor Roy Welsch Wed 0 Feb 7:00-8:0

Statistics Graduate Courses

STAT 7002: Topics in Statistics-Biological/Physical/Mathematics (cr. arr.). Organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

Using Data Mining for Mobile Communication Clustering and Characterization

A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

SAS JOINT DATA MINING CERTIFICATION AT BRYANT UNIVERSITY

Billie Anderson Bryant University, 1150 Douglas Pike, Smithfield, RI 02917 Phone: (401) 232-6089, e-mail: banderson@bryant.edu Phyllis Schumacher

B.Sc. in Computer Information Systems Study Plan

Study Plan University Compulsory Courses Page ( 64 ) University Elective Courses Pages ( 64 & 65 ) Faculty Compulsory Courses 16 C.H 27 C.H 901010 MATH101 CALCULUS( I) 901020 MATH102 CALCULUS (2) 171210

NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju

BIOMETRICS DOCUMENT ANALYSIS PATTERN RECOGNITION 8/24/2015 ICDAR-2015 Towards a Globally Optimal Approach for Learning Deep Unsupervised

The Scientific Data Mining Process

Chapter 4. "When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean, neither more nor less." Lewis Carroll [87, p. 214] In

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Overview Main principles of data mining Definition

Course Descriptions: Undergraduate/Graduate Certificate Program in Data Visualization and Analysis

9/3/2013 Seton Hall University, South Orange, New Jersey http://www.shu.edu/go/dava Visualization and

IT services for analyses of various data samples

Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical

PROGRAM DIRECTOR: Arthur O'Connor Email Contact: URL : THE PROGRAM Careers in Data Analytics Admissions Criteria CURRICULUM Program Requirements

Data Analytics (MS) PROGRAM DIRECTOR: Arthur O'Connor, CUNY School of Professional Studies, 101 West 31st Street, 7th Floor, New York, NY 10001 Email Contact: Arthur O'Connor, arthur.oconnor@cuny.edu URL:

Hexaware E-book on Predictive Analytics

Business Intelligence & Analytics Actionable Intelligence Enabled Published on: Feb 7, 2012 What is Data mining? Data mining,

Machine Learning: Overview

Why Learning? Learning is a core property of being intelligent. Hence machine learning is a core subarea of Artificial Intelligence. There is a need for programs to behave

Certificate Program in Applied Big Data Analytics in Dubai. A Collaborative Program offered by INSOFE and Synergy-BI

Program Overview Today's manager needs to be extremely data savvy. They need to work

Maximizing Return and Minimizing Cost with the Decision Management Systems

KDD 2012: Beijing 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining Rich Holada, Vice President, IBM SPSS Predictive Analytics

Learning is a very general term denoting the way in which agents:

What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);

Turning Data into Actionable Insights: Predictive Analytics with MATLAB WHITE PAPER

Introduction: Knowing Your Risk Financial professionals constantly make decisions that impact future outcomes in the

Teaching Computational Thinking using Cloud Computing: By A/P Tan Tin Wee

Technology in Pedagogy, No. 8, April 2012 Written by Kiruthika Ragupathi (kiruthika@nus.edu.sg) Computational thinking is an emerging

Big Data and Healthcare Payers WHITE PAPER

Knowledgent White Paper Series Summary With the implementation of the Affordable Care Act, the transition to a more member-centric relationship model, and other

AMIS 7640 Data Mining for Business Intelligence

The Ohio State University The Max M. Fisher College of Business Department of Accounting and Management Information Systems Autumn Semester 2013, Session

WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley

Disclaimer: This material is protected under copyright act AnalytixLabs, 2011. Unauthorized use and/or duplication of this material or

CALL FOR APPLICATIONS FOR ADMISSION GRADUATE STUDY PROGRAM "MASTER OF SCIENCE in DATA SCIENCE" Part Time Program 2015-2016

Data Science is the study of data through computational and statistical techniques,

Practical Calculation of Expected and Unexpected Losses in Operational Risk by Simulation Methods

Enrique Navarrete. Abstract: This paper surveys the main difficulties involved with the quantitative measurement

School of Computer Science

Head of School: Professor S Linton. Taught Programmes, M.Sc.: Advanced Computer Science, Artificial Intelligence, Computing and Information Technology, Information Technology, Human

Perspectives on Data Mining

Niall Adams Department of Mathematics, Imperial College London n.adams@imperial.ac.uk April 2009 Objectives Give an introductory overview of data mining (DM) (or Knowledge Discovery

Better planning and forecasting with IBM Predictive Analytics

IBM Software Business Analytics SPSS Predictive Analytics Using IBM Cognos TM1 with IBM SPSS Predictive Analytics to build better plans and

An interdisciplinary model for analytics education

Raffaella Settimi, PhD School of Computing, DePaul University Drew Conway's Data Science Venn Diagram http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Machine Learning. 01 - Introduction

Machine learning course One lecture (Wednesday, 9:30, 346) and one exercise (Monday, 17:15, 203). Oral exam, 20 minutes, 5 credit points. Some basic mathematical knowledge

Social Media Mining. Data Mining Essentials

Introduction Data production rate has increased dramatically (Big Data) and we are able to store much more data than before, e.g., purchase data, social media data, mobile phone data. Businesses and customers

Data Mining Analytics for Business Intelligence and Decision Support

Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing

NOS for Data Analysis (802) September 2014 V1.3

NOS Reference ESKITP802301 ESKITP802401 ESKITP802501 ESKITP802601 NOS Title Assist in Delivering Routine Data Analysis Studies Design and Implement Data

INTELLIGENT SOLUTIONS, INC. DATA MINING IMPLEMENTING THE PARADIGM SHIFT IN ANALYSIS & MODELING OF THE OILFIELD

INTELLIGENT SOLUTIONS, INC. OILFIELD DATA MINING: IMPLEMENTING THE PARADIGM SHIFT IN ANALYSIS & MODELING OF THE OILFIELD. 55 TARA PLACE, MORGANTOWN, WV 26050 USA

Master of Science in Marketing Analytics (MSMA)

COURSE DESCRIPTION The Master of Science in Marketing Analytics program teaches students how to become more engaged with consumers, how to design and deliver

Introduction to Data Mining

Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

IN THE CITY OF NEW YORK Decision Risk and Operations. Advanced Business Analytics Fall 2015

Course Description Business Analytics is about information turning data into action. Its value derives fundamentally from information gaps in the economic choices

Visualization methods for patent data

Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes

University of Manchester Health Data Science Masters Modules

We are taking applications now for Masters CPD modules beginning in February. All modules are 15 credits and cost 750. Timetable is as follows

BIDM Project. Predicting the contract type for IT/ITES outsourcing contracts

Nandini Govindarajan (61210556) The authors believe that data modelling can be used to predict if an

Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010

Ernst van Waning Senior Sales Engineer May 28, 2010 Agenda SPSS, an IBM Company SPSS Statistics User-driven product

Final Project Report

CPSC545 Introduction to Data Mining, Prof. Martin Schultz & Prof. Mark Gerstein. Student Name: Yu Kor Hugo Lam. Student ID: 904907866. Due Date: May 7, 2007. Introduction Pseudogenes

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

Contents: List of Figures; Foreword; Preface; Acknowledgments; Chapter 1 Fraud: Detection, Prevention, and Analytics!; Introduction; Fraud!; Fraud Detection and Prevention; Big Data for

An Ontology Based Text Analytics on Social Media

pp.233-240 http://dx.doi.org/10.14257/ijdta.2015.8.5.20 Pankajdeep Kaur, Pallavi Sharma and Nikhil Vohra GNDU, Regional Campus, GNDU, Regional Campus,

Machine Learning using MapReduce

What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

Big Data & Security. Aljosa Pasic 12/02/2015

Welcome to Madrid!!! Big Data AND security: what is there on our minds? Big Data tools and technologies Big Data T&T chain and security/privacy concern mappings

480093 - TDS - Socio-Environmental Data Science

Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 480 - IS.UPC - University Research Institute for Sustainability Science and Technology 715 - EIO - Department of Statistics and

Analytics in Action. What do Jeopardy, Pampers, and Major League Baseball all have in common? October 24, 2012

October 24, 2012 University of Cincinnati Tangeman University Center Theater Sponsored by LUCRUM, Inc. ABOUT

Email: justinjia@ust.hk Office: LSK 5045 Begin subject: [ISOM3360]...

Business Intelligence and Data Mining ISOM 3360: Spring 2015 Instructor Contact Office Hours Course Schedule and Classroom Course Webpage Jia Jia, ISOM Email: justinjia@ust.hk Office: LSK 5045 Begin subject: