Essex Big Data and Analytics Summer School 2015, 24th-28th August 2015
BD001 Introduction to R; presenter: Leo Schalkwyk; duration: 5 days
BD002 Big Data Methods in R; presenter: Szymon Walkowiak, UK Data Service
BD003 Science and Big Data; presenter: Andrew Harrison (TBC)
BD001: R is an interactive computing environment and programming language designed for statistical analysis and graphics. Extensions to the basic capabilities of R are straightforward to produce and share with others. It is widely and increasingly used in many Big Data fields of research, including bioinformatics. Because of its power and flexibility, R is more demanding to learn than traditional statistical packages, but it rewards some initial effort. This course is based on tested material that we have been using for nearly 10 years to help research students, postdocs and faculty get started in their own data analysis, and it is refined each time based on feedback. It is aimed at people who may have little or no programming experience. The course will emphasize the fundamentals of the R language in an intensive format where each student has a computer and 50% of the time is spent on practical exercises, and it will include a special module on … techniques.
BD002: This course will provide participants with an array of major techniques and essential R programming skills for the data analysis process of large and complex socio-economic datasets. In particular, the participants will be introduced to:
- the basics of Big Data extraction and the technical requirements for effective Big Data manipulation
- methods of Big Data management, including sub-setting, data transformations, screening for missing values, etc.
- R packages supporting Big Data manipulation techniques, e.g. extracting and converting between date and time formats, text mining, etc.
- descriptive statistics and frequency tables for Big Data
- libraries facilitating Big Data statistical computation and modelling
- interactive Big Data visualisation techniques
- the process of Big Data product development.
The course will involve active learning methods with case studies and real socio-economic data.
Prerequisite(s): A working proficiency in R, or attendance at the Summer School's Introduction to R course (BD001).
BD003: The history of science has shown on many occasions the benefits of bringing data sets together. It has also shown that deep insights into the Universe lead to theories that provide elegant explanations behind great unifications of knowledge (and hence data). These theories can, in many cases, be described by mathematical concepts, giving clues to how we should best represent the data in order to aid understanding.
I will describe some of the best understood theories and their representations. I will break scientific studies into those of simple, complex and complicated systems. New sources of simple and complex data may offer our best chance of providing new unifications and understanding of the causal structures within nature, whereas complicated sources may offer little hope of inferring causality.
BD004 Clustering and Classification with … in R; presenter: Berthold Lausen
BD005 Bayesian Computational Methods with applications (in R); presenters: Hongsheng Dai and Saeed Aldahmani
BD006 Actuarial/Financial modelling with applications in R; presenter: Spyros Vrontos
BD007 Introduction to Data Mining; presenter: Beatriz de la Iglesia, University of East Anglia, ESRC Business and Local Government Data Research Centre
Levels/durations listed for BD004-BD007: Advanced/intermediate; 8 hours (over 2 days)
BD004: This short course gives an introduction to cluster analysis (unsupervised learning) and classification (supervised learning). The concepts of k-means clustering and hierarchical clustering are discussed and applied in R. Linear discriminant analysis, logistic regression, classification and regression trees (CART) and random forests are introduced as examples of statistical learning methods. Cross-validation and bootstrap methods are applied to assess classifiers. Using R, participants analyse example data sets and compute estimates of the misclassification rate and of the area under the receiver operating characteristic (ROC) curve.
Prerequisite(s): Basic skills in using R; basic concepts in statistics such as correlation and linear regression.
BD005: The course will first provide a brief introduction to Bayesian analysis and then cover Markov chain Monte Carlo (MCMC) methods, such as the Metropolis-Hastings algorithm and the Gibbs sampler. Applications to mixture models, change-point problems and regression analysis will also be covered in the lecture. The course includes a 2-hour lab session to help the audience become familiar with implementing MCMC algorithms using R.
Prerequisite(s): Participants should have knowledge of at least first-year statistics and probability, and of R.
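BD005 names the Metropolis-Hastings algorithm. The following minimal sketch illustrates it in Python rather than in R, the language the course itself uses. It assumes a one-dimensional target known only up to a normalising constant and a symmetric Gaussian random-walk proposal. The function name, the step size and the standard normal example target are illustrative choices, not course material.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a one-dimensional target.

    log_target: log of an (unnormalised) target density.
    step: standard deviation of the Gaussian proposal (a tuning choice).
    Returns the chain and the fraction of accepted proposals.
    """
    rng = random.Random(seed)
    x, log_px = x0, log_target(x0)
    chain, accepted = [], 0
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the Hastings ratio reduces to p(y) / p(x).
        y = x + rng.gauss(0.0, step)
        log_py = log_target(y)
        # Accept with probability min(1, p(y) / p(x)); exponent capped at 0 to avoid overflow.
        if rng.random() < math.exp(min(0.0, log_py - log_px)):
            x, log_px = y, log_py
            accepted += 1
        chain.append(x)  # on rejection the current state is repeated
    return chain, accepted / n_samples


if __name__ == "__main__":
    # Hypothetical example target: standard normal, log density up to a constant.
    chain, rate = metropolis_hastings(lambda v: -0.5 * v * v, x0=0.0, n_samples=20000)
    kept = chain[1000:]  # discard a burn-in period
    mean = sum(kept) / len(kept)
    var = sum((v - mean) ** 2 for v in kept) / len(kept)
    print(f"mean={mean:.3f} var={var:.3f} acceptance={rate:.2f}")
```

Because the proposal is symmetric, the Hastings correction cancels and only the ratio of target densities is needed. The step size trades the acceptance rate against how far each accepted move travels.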
BD006: Modelling claim frequency and claim severity in general insurance; distribution fitting; application of generalised linear models in pricing, ratemaking and bonus-malus systems. Modelling the returns of financial assets. Option pricing in finance and insurance. Monte Carlo methods and their application in option pricing and in pricing life insurance liabilities. Extensive applications in R with real and simulated data sets.
Prerequisite(s): Basic knowledge of statistics and R.
BD007: The course will introduce the topic of data mining and will present a methodology for Knowledge Discovery in Databases (KDD). The tasks of clustering and classification will be explored in some detail. We will look at an open source data mining package for some practical guidance on how to put what has been learned into practice.
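Both BD004 and BD007 cover clustering. The sketch below gives a rough illustration of k-means as Lloyd's algorithm in plain Python for two-dimensional points. Two choices are assumptions of this sketch only: the deterministic "farthest point" initialisation (start at the first point, then repeatedly add the point farthest from the centroids chosen so far) and the stopping rule. R users would typically call the built-in `kmeans()` function, which starts from randomly chosen rows instead.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def farthest_point_init(points, k):
    """Deterministic start: the first point, then repeatedly the point
    farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm for k-means on a list of (x, y) tuples.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(centroids[j])  # an empty cluster keeps its old centroid
        converged = new_labels == labels and new_centroids == centroids
        centroids, labels = new_centroids, new_labels
        if converged:
            break
    return centroids, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(kmeans(pts, 2))
```

Lloyd's algorithm only guarantees a local optimum. This is why R's `kmeans()` offers an `nstart` argument that tries several random starts and keeps the best solution.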
BD008 A (gentle) introduction to reinforcement learning; presenter: Spyros Samothrakis
BD009 Search in big data; presenter: Allan Hanbury, Vienna University of Technology
BD010 Practical sentiment analysis; presenter: Diana Maynard, University of Sheffield
BD011 High performance computing; presenter: Adrian Clark
BD012 Data Protection and Liability in the Age of Big Data Analytics; category: Legal and ethical issues; presenter: Audrey Guinchard
Levels/durations listed for BD008-BD012: 6 or 8 hours; …/advanced; 8 hours
BD008: Reinforcement learning is concerned with learning how to act optimally in the presence of rewards and punishments. This short course on reinforcement learning will help you understand the basics and provide the solid foundation necessary for advanced topics. It will have both a practical (two-hour) and a theoretical (two-hour) component. Topics to be addressed are Markov Decision Processes, Monte Carlo methods, SARSA and Q-Learning.
Prerequisite(s): Some mathematical/computer science sophistication (e.g. understanding summation, recursion, means/medians).
BD009: As the amount of text data stored by organisations grows, information retrieval technologies become increasingly important. Effective use of search technologies is essential to ensuring that the key information is available when decisions are made. This course will start by covering the basics of information retrieval, such as indexing and keyword search. It will then cover adapting search to specific domains (such as the technical and health domains), and will finally present how the effectiveness of search technologies is evaluated.
Prerequisite(s): Participants need to be comfortable with basic mathematics, especially linear algebra.
BD010: This tutorial will introduce the concept of sentiment analysis from unstructured text. It will cover both rule-based and machine learning techniques, provide some background information on the key underlying NLP and text analysis processes required, and look in detail at some of the major problems and solutions, such as detection of sarcasm, use of informal language, spam opinion detection, trustworthiness of opinion holders, and so on.
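BD010 covers rule-based as well as machine learning techniques. The sketch below is a toy version of the rule-based approach in Python. It is not how GATE works, and the tiny lexicon, the negation window and the function names are all hypothetical. It sums word polarities and flips a word's polarity when a negator appears shortly before it. It also shows why sarcasm is hard for simple rules.

```python
import re

# Hypothetical toy lexicon: word -> polarity weight (not taken from GATE or any real resource).
LEXICON = {"good": 1, "great": 2, "excellent": 2, "love": 2,
           "bad": -1, "poor": -1, "terrible": -2, "hate": -2}
NEGATORS = {"not", "no", "never", "n't"}


def tokenize(text):
    # Lower-case, split off the "n't" clitic (don't -> do n't), keep word tokens only.
    text = text.lower().replace("n't", " n't")
    return re.findall(r"n't|[a-z']+", text)


def sentiment_score(text, window=3):
    """Sum lexicon weights, flipping the sign of a sentiment word when a
    negator occurs within the preceding `window` tokens."""
    tokens = tokenize(text)
    score = 0
    for i, tok in enumerate(tokens):
        weight = LEXICON.get(tok, 0)
        if weight and any(t in NEGATORS for t in tokens[max(0, i - window):i]):
            weight = -weight
        score += weight
    return score


def label(text):
    s = sentiment_score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


if __name__ == "__main__":
    for sentence in ["The course was great", "The food was not good", "Oh great, another delay"]:
        print(sentence, "->", label(sentence))
```

The last example, `label("Oh great, another delay")`, returns "positive". A word-level rule cannot see the sarcasm, which is one of the problems the tutorial lists.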
The techniques will be demonstrated with real applications developed in GATE, an open-source language processing toolkit. Hands-on exercises and relevant materials will be provided for participants to try out the applications and to experiment with building their own tools, both in GATE and with other common tools.
Prerequisite(s): No prior knowledge of GATE, Java or Natural Language Processing (NLP) is required to attend this tutorial. However, it will include a hands-on element where you will be able to try simple things out in GATE, the tool we use for NLP tasks.
BD011: This course introduces participants to high performance computing. The first half of the course will cover principles (floating-point computation, speeding up code, compute clusters, using MPI), while in the second part participants will have the opportunity to build and use a small cluster.
Prerequisite(s): The course assumes knowledge of programming in Python/C/C++.
BD012: This session aims to introduce the current EU and UK data protection regime and the changes to be brought in by the future General Data Protection Regulation (late …). Furthermore, the session will present, and allow for discussion of, the specific challenges big data bring, especially in light of the reports published by various data protection regulators at both UK and EU levels.
BD013 Managing, curating and publishing data; category: Curation and management of data; presenters: Sharon Bolton and Louise Corti, UK Data Service
BD014 Secure access protocols for Big Data; category: Curation and management of data; presenters: Libby Bishop and Felix Ritchie, UK Data Service
BD015 Agent based modelling for business; presenter: Abhijit Sengupta
BD013: Big data may come from a range of sources and organisations, which may not be used to the idea of sharing their data with researchers. Therefore, they might not realise what researchers need, so some of the features traditionally present that make research data easier to use might not be available. This can bring a range of problems, some of which can be addressed by good data curation. The course will start with the legal issues in brokering data and the assessment of …: issues of trust in, and quality of, the source. Who is the provider? It will also highlight ethical issues in the content and use of personal data. For example, some of the questions we plan to address in the session include:
- Data confidentiality: are people identifiable from the data?
- Metadata and accompanying documentation: do users have enough information about what the data means and how it can be used?
- Formats, size and usability: what kind of software, hardware and techniques are needed?
- Publishing data products or data to support a journal article: what does the supporting data look like for verification?
The session includes a hands-on exercise: publishing a small dataset in a repository and providing the necessary metadata and documentation. The aim is to learn what curation is and what is needed.
BD014: On aspects of accessing and using confidential sources of Big Data:
- The Five Safes of data access
- Big Data confidentiality/privacy/ethical considerations: what you need to know
- How to be a Safe Person when using confidential sources of Big Data
- Using Big Data responsibly
- Designing a Safe Setting for Big Data
- Disclosure control techniques: applied to data and to your research outputs.
The objective is to prepare people who want to access confidential sources of Big Data. They might be making an application to a data owner, or for funding which has to go through an ethics panel. Or they might be using Big Data but be unaware of some of the confidentiality/privacy/public-perception issues that surround the collection and analysis of Big Data.
Level: Advanced
BD015: This course will start by providing students with an overview of the nature of business applications where Agent Based Modelling (ABM) can be useful, relevant and practical. It will then proceed with some real world examples where ABM has been used, particularly in the context of the Fast Moving Consumer Goods (FMCG) sector. A few of these examples, which are in the public domain and have had an academic influence, will be discussed in detail.
The lecture will conclude with some indicators of where the future of this modelling paradigm lies in the context of business applications.
Prerequisite(s): Understanding of complex systems phenomena. Familiarity with social networks and properties of networks. Reasonable knowledge of at least one ABM toolkit such as Repast or NetLogo. All practical examples in this course will be NetLogo based.
BD016 Machine Learning with Mahout (tbc); presenter: Richard Skeggs, ESRC Business and Local Government Data Research Centre
BD017 Big Data Finance Analytics; presenter: Neil Kellard
BD018 Cognitive Computing; presenters: Detlef Nauck and Martin Spott, British Telecom
Level listed for BD016-BD018: …/advanced
BD016: This is an introduction to the use of machine learning algorithms supported by the Apache Mahout framework. The class will concentrate on what problems can be solved using Mahout before looking at the common classifiers used by Mahout to achieve those objectives. Finally, the class will look at building some simple working examples to see Mahout in practice.
Prerequisite(s): Knowledge of the Java programming language is essential. Some statistical knowledge will be useful but not essential.
BD017: Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. Given contemporary computing power and potential data collection, many firms, particularly those in the financial sector, wish to use it. The challenges include capture, curation, storage, search, sharing, transfer and data visualization. The primary purpose of this course is to provide the participant with an understanding of data analytic approaches in finance. The first part covers high frequency trading and predictive …; the second part will concentrate on the application of data in risk modelling, corporate finance, fraud and personal finance.
Prerequisite(s): Some background in statistics/mathematics/econometrics is desirable but not essential.
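BD017 mentions the application of data in risk modelling. One widely used risk measure is Value at Risk estimated by historical simulation: the loss threshold that was exceeded in only a small fraction of past periods. It is chosen here purely as an illustration, not taken from the course outline. The minimal Python sketch below assumes a plain list of past returns and one particular empirical-quantile convention. The function name and the example data are hypothetical.

```python
import math


def historical_var(returns, confidence=0.95):
    """One-period Value at Risk by historical simulation.

    returns: past simple returns (e.g. daily), as fractions such as -0.02.
    Returns the empirical `confidence` quantile of the loss distribution,
    reported as a positive loss fraction.
    """
    if not returns:
        raise ValueError("need at least one observed return")
    losses = sorted(-r for r in returns)  # a loss is a negated return
    # Smallest observed loss L such that at least `confidence` of losses are <= L.
    idx = min(len(losses) - 1, max(0, math.ceil(confidence * len(losses)) - 1))
    return losses[idx]


if __name__ == "__main__":
    # Hypothetical daily returns, for illustration only.
    past = [0.004, -0.012, 0.007, -0.025, 0.001, -0.003, 0.010, -0.018, 0.002, -0.006]
    print(f"95% one-day VaR: {historical_var(past):.3f}")
```

Other quantile conventions, such as interpolating between neighbouring order statistics, give slightly different numbers on small samples.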
BD018: While we are successfully addressing the challenges behind storing and managing massive amounts of data through technologies, we are still facing large obstacles in successfully and quickly analysing that data. The view that one analyst uses tools from statistics, machine learning and data mining to find answers in data rapidly becomes outdated in the face of an overwhelming amount and variety of data and an ever increasing demand for evidence based decision making. We now need to look into concepts of collaborative and distributed … where analysts work together and combine individual results into an overall answer. We need tools that can deal with uncertainty and can assess the quality of potential answers. We need new human-computer interfaces that allow computers to really help analysts find answers that they could not have come up with themselves. We also need computers to help analysts illustrate and explain the outcome to decision makers so that they have confidence in the results. Cognitive Computing addresses several of these issues. Cognitive Computing looks at how we get computers to behave and interact the way humans do. Systems like IBM's Watson can deal with huge volumes of data, identify knowledge and patterns in the data, and apply this to the problem the analyst is trying to solve by giving them different alternatives to consider and, in particular, the underlying evidence that supports those alternatives. This course looks at the challenges modern … is facing and explores how ideas from Cognitive Computing can lead to a new era of data.
Prerequisite(s): A basic understanding of what is involved in running a data science project.
BD019 Stream Processing and Data Analytics for Smart City; presenters: Sefki Kolozali and Nazli Farajidavar, University of Surrey
BD020 Crowdsourcing and Human Computation; presenter: Jon Chamberlain
BD021 From Big Data to Big Value; presenter: Richard Mason, Intel
BD022 Introduction to Big Data and Statistics; presenter: Nathan Cunningham, UK Data Service
Also listed for BD019-BD022: TBC; level …/advanced
BD019: In this course we cover some of the background concepts related to the Internet of Things and the Web of Things, and Semantic Technologies in the smart city domain, and will describe solutions for processing and information extraction from real world data. Use-cases and examples from the smart city domain will be described. We will also discuss some of the machine learning techniques, data tools and methods that can be used to process and analyse smart city data.
Prerequisite(s): Familiarity with machine learning techniques and semantic web technologies would be useful but is not compulsory.
BD020: Crowdsourcing has established itself in the mainstream of research methodology in recent years, using a variety of methods to engage many non-expert users to solve problems that computers or limited expert users cannot solve. Whilst the concept of human computation goes some way towards solving problems, it also introduces new challenges of data quality, participant recruitment and incentivisation. This course will introduce 3 common methods of crowdsourcing (peer-production, microworking and games-with-a-purpose) as well as an emerging approach using social networks as a powerful problem solving and monitoring tool.
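BD020 points to data quality as one of the challenges of human computation. A common baseline for combining several workers' answers to the same item is majority voting. The Python sketch below assumes a hypothetical input format: each item id maps to a list of labels. It leaves an item unresolved when there is a tie or the winning label does not have a clear majority. Practical crowdsourcing pipelines often go further and weight workers by their estimated reliability.

```python
from collections import Counter


def aggregate_labels(annotations, min_agreement=0.5):
    """Majority-vote aggregation of crowdsourced labels.

    annotations: dict mapping item id -> list of labels from different workers.
    Returns dict item id -> (winning label or None, agreement fraction).
    An item is left unresolved (None) when there is a tie for first place
    or when the top label's share does not exceed `min_agreement`.
    """
    result = {}
    for item, labels in annotations.items():
        if not labels:
            result[item] = (None, 0.0)
            continue
        counts = Counter(labels).most_common()
        top_label, top_count = counts[0]
        agreement = top_count / len(labels)
        tied = len(counts) > 1 and counts[1][1] == top_count
        if tied or agreement <= min_agreement:
            result[item] = (None, agreement)
        else:
            result[item] = (top_label, agreement)
    return result


if __name__ == "__main__":
    # Hypothetical annotations from three workers per image.
    votes = {"img1": ["cat", "cat", "dog"], "img2": ["cat", "dog", "bird"]}
    print(aggregate_labels(votes))
```

Reporting the agreement fraction alongside each label makes it easy to send low-agreement items back for further annotation.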
Participants are encouraged to bring examples of data they would like annotated, or tasks that need humans to solve, for discussion as to which approach might be suitable and how to implement it.
BD021: Learn how Intel is harnessing Big Data to drive operational efficiency and revenue optimisation across the organisation. Discuss trends and how Intel is embracing these trends to gain further insights, adoption and value.
BD022: This is a short introductory course on understanding Big Data: what it is and what strategies you can adopt to make the most of it. It would be useful to bring a device for note taking. This course will cover:
- Putting new knowledge first. What question do you want to answer?
- Defining metrics for success.
- What is Big Data?
- What Big Data solutions are available to me for free?
- Do you know what your real sample size is?
- Testing hypotheses and calling things significant.
- Managing spurious correlations.
- Smoothing data to understand significant relationships in spatial/temporal data.
- Make … as small as possible and as quick as possible.
- Plotting your data so you don't miss the obvious.
- Strategies for improving prediction accuracy by averaging many models together.
Prerequisite(s): Familiarity with using and applying science/research data to answer questions. An understanding of statistics and of how databases operate is desirable. A basic overview of computing infrastructure and algorithms will be given, but at an introductory level assuming no prior knowledge.
Keynote Lectures
- Thomson Reuters, Jochen Leidner: "Small Data and Big Data: Qualitative Differences Resulting from Quantitative Scale"
- Intel, Mark Woodward: "Using Big Data to Generate Real Revenue for Business"
- Fujitsu, Joe Duran: "Impact of Research on Computing in Society"
- Citigroup, Stuart Jones: "Bridging the Gap Between Big Data, Statistics and Business"
Jochen Leidner (Thomson Reuters): While the Big Data topic has received a lot of attention, one may wonder why exactly "more of the same" should constitute a step change; for instance, we haven't declared a new academic field of "Big Plastic" just because we consume and process more plastic than ever. In this talk, I critically assess which, if any, quantitative changes induce qualitative changes, and whether the talk of Big Data as a new area is merited. Along the way, we will revisit a couple of past and ongoing efforts that fall into this space and apply these findings.
Mark Woodward (Intel): This talk will describe how Intel takes advantage of large, complex data sources to achieve greater efficiency, cost savings and new revenue opportunities across its business. As part of the talk, examples of initiatives and real world business scenarios in the Technology and Manufacturing world will be discussed.
Joe Duran (Fujitsu): TBC
Stuart Jones (Citigroup): This talk will describe how business objectives, including revenue, expense and risk management, can be understood and satisfied through statistical analysis of Big Data. As part of the talk, examples of initiatives that will assist in detecting and preventing fraudulent and money-laundering activity in the financial world will be discussed.
Day; Time; Activity; Course no.; Course title; Speaker; Room
Monday; 08.00-08.30; Registration
15.45-16.00; Coffee break
16.00-18.00; Parallel session 1; BD022; Introduction to Big Data and Statistics; Nathan Cunningham; CSEE lab 2
Parallel session 2; BD003; Science and Big Data; Andrew Harrison; TC2.12
Auto-Classification for Document Archiving and Records Declaration Josemina Magdalen, Architect, IBM November 15, 2013 Agenda IBM / ECM/ Content Classification for Document Archiving and Records Management
More informationData analytics Delivering intelligence in the moment
www.pwc.co.uk Data analytics Delivering intelligence in the moment January 2014 Our point of view Extracting insight from an organisation s data and applying it to business decisions has long been a necessary
More informationLecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions
SMA 50: Statistical Learning and Data Mining in Bioinformatics (also listed as 5.077: Statistical Learning and Data Mining ()) Spring Term (Feb May 200) Faculty: Professor Roy Welsch Wed 0 Feb 7:00-8:0
More informationStatistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
More informationUsing Data Mining for Mobile Communication Clustering and Characterization
Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer
More informationSAS JOINT DATA MINING CERTIFICATION AT BRYANT UNIVERSITY
SAS JOINT DATA MINING CERTIFICATION AT BRYANT UNIVERSITY Billie Anderson Bryant University, 1150 Douglas Pike, Smithfield, RI 02917 Phone: (401) 232-6089, e-mail: banderson@bryant.edu Phyllis Schumacher
More informationB.Sc. in Computer Information Systems Study Plan
195 Study Plan University Compulsory Courses Page ( 64 ) University Elective Courses Pages ( 64 & 65 ) Faculty Compulsory Courses 16 C.H 27 C.H 901010 MATH101 CALCULUS( I) 901020 MATH102 CALCULUS (2) 171210
More informationNAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju
NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE Venu Govindaraju BIOMETRICS DOCUMENT ANALYSIS PATTERN RECOGNITION 8/24/2015 ICDAR- 2015 2 Towards a Globally Optimal Approach for Learning Deep Unsupervised
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationIntroduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing
Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition
More informationCourse Descriptions: Undergraduate/Graduate Certificate Program in Data Visualization and Analysis
9/3/2013 Course Descriptions: Undergraduate/Graduate Certificate Program in Data Visualization and Analysis Seton Hall University, South Orange, New Jersey http://www.shu.edu/go/dava Visualization and
More informationIT services for analyses of various data samples
IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical
More informationPROGRAM DIRECTOR: Arthur O Connor Email Contact: URL : THE PROGRAM Careers in Data Analytics Admissions Criteria CURRICULUM Program Requirements
Data Analytics (MS) PROGRAM DIRECTOR: Arthur O Connor CUNY School of Professional Studies 101 West 31 st Street, 7 th Floor New York, NY 10001 Email Contact: Arthur O Connor, arthur.oconnor@cuny.edu URL:
More informationHexaware E-book on Predictive Analytics
Hexaware E-book on Predictive Analytics Business Intelligence & Analytics Actionable Intelligence Enabled Published on : Feb 7, 2012 Hexaware E-book on Predictive Analytics What is Data mining? Data mining,
More informationMachine Learning: Overview
Machine Learning: Overview Why Learning? Learning is a core of property of being intelligent. Hence Machine learning is a core subarea of Artificial Intelligence. There is a need for programs to behave
More informationCertificate Program in Applied Big Data Analytics in Dubai. A Collaborative Program offered by INSOFE and Synergy-BI
Certificate Program in Applied Big Data Analytics in Dubai A Collaborative Program offered by INSOFE and Synergy-BI Program Overview Today s manager needs to be extremely data savvy. They need to work
More informationMaximizing Return and Minimizing Cost with the Decision Management Systems
KDD 2012: Beijing 18 th ACM SIGKDD Conference on Knowledge Discovery and Data Mining Rich Holada, Vice President, IBM SPSS Predictive Analytics Maximizing Return and Minimizing Cost with the Decision Management
More informationLearning is a very general term denoting the way in which agents:
What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);
More informationTurning Data into Actionable Insights: Predictive Analytics with MATLAB WHITE PAPER
Turning Data into Actionable Insights: Predictive Analytics with MATLAB WHITE PAPER Introduction: Knowing Your Risk Financial professionals constantly make decisions that impact future outcomes in the
More informationTeaching Computational Thinking using Cloud Computing: By A/P Tan Tin Wee
Teaching Computational Thinking using Cloud Computing: By A/P Tan Tin Wee Technology in Pedagogy, No. 8, April 2012 Written by Kiruthika Ragupathi (kiruthika@nus.edu.sg) Computational thinking is an emerging
More informationBig Data and Healthcare Payers WHITE PAPER
Knowledgent White Paper Series Big Data and Healthcare Payers WHITE PAPER Summary With the implementation of the Affordable Care Act, the transition to a more member-centric relationship model, and other
More informationAMIS 7640 Data Mining for Business Intelligence
The Ohio State University The Max M. Fisher College of Business Department of Accounting and Management Information Systems AMIS 7640 Data Mining for Business Intelligence Autumn Semester 2013, Session
More informationWROX Certified Big Data Analyst Program by AnalytixLabs and Wiley
WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley Disclaimer: This material is protected under copyright act AnalytixLabs, 2011. Unauthorized use and/ or duplication of this material or
More informationCALL FOR APPLICATIONS FOR ADMISSION GRADUATE STUDY PROGRAM "MASTER OF SCIENCE in DATA SCIENCE" Part Time Program 2015-2016
CALL FOR APPLICATIONS FOR ADMISSION GRADUATE STUDY PROGRAM "MASTER OF SCIENCE in DATA SCIENCE" Part Time Program 2015-2016 Data Science is the study of data through computational and statistical techniques,
More informationPractical Calculation of Expected and Unexpected Losses in Operational Risk by Simulation Methods
Practical Calculation of Expected and Unexpected Losses in Operational Risk by Simulation Methods Enrique Navarrete 1 Abstract: This paper surveys the main difficulties involved with the quantitative measurement
More informationSchool of Computer Science
School of Computer Science Head of School Professor S Linton Taught Programmes M.Sc. Advanced Computer Science Artificial Intelligence Computing and Information Technology Information Technology Human
More informationPerspectives on Data Mining
Perspectives on Data Mining Niall Adams Department of Mathematics, Imperial College London n.adams@imperial.ac.uk April 2009 Objectives Give an introductory overview of data mining (DM) (or Knowledge Discovery
More informationBetter planning and forecasting with IBM Predictive Analytics
IBM Software Business Analytics SPSS Predictive Analytics Better planning and forecasting with IBM Predictive Analytics Using IBM Cognos TM1 with IBM SPSS Predictive Analytics to build better plans and
More informationAn interdisciplinary model for analytics education
An interdisciplinary model for analytics education Raffaella Settimi, PhD School of Computing, DePaul University Drew Conway s Data Science Venn Diagram http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
More informationMachine Learning. 01 - Introduction
Machine Learning 01 - Introduction Machine learning course One lecture (Wednesday, 9:30, 346) and one exercise (Monday, 17:15, 203). Oral exam, 20 minutes, 5 credit points. Some basic mathematical knowledge
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationData Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
More informationNOS for Data Analysis (802) September 2014 V1.3
NOS for Data Analysis (802) September 2014 V1.3 NOS Reference ESKITP802301 ESKITP802401 ESKITP802501 ESKITP802601 NOS Title Assist in Delivering Routine Data Analysis Studies Design and Implement Data
More informationI N T E L L I G E N T S O L U T I O N S, I N C. DATA MINING IMPLEMENTING THE PARADIGM SHIFT IN ANALYSIS & MODELING OF THE OILFIELD
I N T E L L I G E N T S O L U T I O N S, I N C. OILFIELD DATA MINING IMPLEMENTING THE PARADIGM SHIFT IN ANALYSIS & MODELING OF THE OILFIELD 5 5 T A R A P L A C E M O R G A N T O W N, W V 2 6 0 5 0 USA
More informationMaster of Science in Marketing Analytics (MSMA)
Master of Science in Marketing Analytics (MSMA) COURSE DESCRIPTION The Master of Science in Marketing Analytics program teaches students how to become more engaged with consumers, how to design and deliver
More informationIntroduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
More informationIN THE CITY OF NEW YORK Decision Risk and Operations. Advanced Business Analytics Fall 2015
Advanced Business Analytics Fall 2015 Course Description Business Analytics is about information turning data into action. Its value derives fundamentally from information gaps in the economic choices
More informationVisualization methods for patent data
Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes
More informationUniversity of Manchester Health Data Science Masters Modules
University of Manchester Health Data Science Masters Modules We are taking applications now for Masters CPD modules beginning in February. All modules are 15 credits and cost 750. Timetable is as follows
More informationBIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
More informationDecision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010
Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010 Ernst van Waning Senior Sales Engineer May 28, 2010 Agenda SPSS, an IBM Company SPSS Statistics User-driven product
More informationFinal Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
More informationCOPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments
Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for
More informationAn Ontology Based Text Analytics on Social Media
, pp.233-240 http://dx.doi.org/10.14257/ijdta.2015.8.5.20 An Ontology Based Text Analytics on Social Media Pankajdeep Kaur, Pallavi Sharma and Nikhil Vohra GNDU, Regional Campus, GNDU, Regional Campus,
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More informationBig Data & Security. Aljosa Pasic 12/02/2015
Big Data & Security Aljosa Pasic 12/02/2015 Welcome to Madrid!!! Big Data AND security: what is there on our minds? Big Data tools and technologies Big Data T&T chain and security/privacy concern mappings
More information480093 - TDS - Socio-Environmental Data Science
Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 480 - IS.UPC - University Research Institute for Sustainability Science and Technology 715 - EIO - Department of Statistics and
More informationAnalytics in Action. What do Jeopardy, Pampers, and Major League Baseball all have in common? October 24, 2012
Analytics in Action What do Jeopardy, Pampers, and Major League Baseball all have in common? October 24, 2012 University of Cincinnati Tangeman University Center Theater Sponsored by LUCRUM, Inc. ABOUT
More informationEmail: justinjia@ust.hk Office: LSK 5045 Begin subject: [ISOM3360]...
Business Intelligence and Data Mining ISOM 3360: Spring 2015 Instructor Contact Office Hours Course Schedule and Classroom Course Webpage Jia Jia, ISOM Email: justinjia@ust.hk Office: LSK 5045 Begin subject:
More information