Big Data Analytics Summer School 2015, 24th - 28th August 2015

Columns: Course code and title, Category, Presenter, Level

Courses: BD001 Introduction to R (5 days); BD002 Big Data Methods in R; BD003 Science Big Data
Presenters: Leo Schalkwyk; Szymon Walkowiak, UK Data Service; Andrew Harrison; (TBC)

BD001: R is an interactive computing environment and programming language designed for statistical analysis and graphics. Extensions to the basic capabilities of R are straightforward to produce and share with others. It is widely and increasingly used in many Big Data fields of research, including bioinformatics. Because of its power and flexibility, R is more demanding to learn than traditional statistical packages but rewards some initial effort. This course is based on tested material that we have been using for nearly 10 years to help research students, postdocs and faculty get started in their own data analysis, and is refined each time based on feedback. It is aimed at people who may have little or no programming experience. The course will emphasize the fundamentals of the R language in an intensive format where each student has a computer and 50% of the time is spent on practical exercises, and will include a special module on techniques.

BD002: This course will provide participants with an array of major techniques and essential R programming skills in the data analysis process of large and complex socio-economic datasets. In particular, the participants will be introduced to:
- The basics of Big Data extraction and technical requirements for effective Big Data manipulation
- Methods of Big Data management including sub-setting, data transformations, screening for missing values etc.
- R packages supporting Big Data manipulation techniques, e.g. extracting and converting between date and time formats, text mining etc.
- Descriptive statistics and frequency tables for Big Data
- Libraries facilitating Big Data statistical computation and modelling
- Interactive Big Data visualisation techniques
- The process of Big Data product development
The course will involve active learning methods with case studies and real socio-economic data. Prerequisite(s): A working proficiency in R or attendance at the Summer School's Introduction to R course (BD001).

BD003: The history of science has shown on many occasions the benefits of bringing data sets together. It has also shown that deep insights into the Universe lead to theories that provide elegant explanations behind great unifications of knowledge (and hence data). These theories can be, in many cases, described by mathematical concepts, giving clues to how we should best represent the data in order to aid understanding.
I will describe some of the best understood theories and their representations. I will break scientific studies into those of simple, complex and complicated systems. New sources of simple and complex data may offer our best chance of providing new unifications and understanding of the causal structures within nature, whereas complicated sources may offer little hope of inferring causality.

Courses: BD004 Clustering and Classification with in R; BD005 Bayesian Computational Methods with applications (in R); BD006 Actuarial/Financial modelling with applications in R; BD007 Introduction to Data Mining
Presenters: Berthold Lausen; Hongsheng Dai; Saeed Aldahmani; Spyros Vrontos; Beatriz de la Iglesia, East Anglia, ESRC Business and Local Government Data Research Centre
Level/duration: Advanced/intermediate; 8 hours (over 2 days)

BD004: The short course gives an introduction to cluster analysis (unsupervised learning) and classification (supervised learning). The concepts of k-means clustering and hierarchical clustering are discussed and applied in R. Linear discriminant analysis, logistic regression, classification and regression trees (CART) and random forests are introduced as examples of statistical learning methods. Cross-validation and bootstrap methods are applied to assess classifiers. Using R, participants analyse example data sets and compute estimates of the misclassification rate and of the area under the receiver-operating characteristic (ROC). Prerequisite(s): Basic skills using R, and basic concepts in statistics such as correlation and linear regression.

BD005: The course will first provide a brief introduction to Bayesian analysis and then cover Markov chain Monte Carlo (MCMC) methods, such as the Metropolis-Hastings algorithm and the Gibbs sampler. Mixture models, change-point problems and regression analysis will also be covered in the lecture. The course includes a 2-hour lab session to help the audience become familiar with implementing MCMC algorithms using R. Prerequisite(s): Participants should have knowledge of at least first-year statistics, probability and R.
BD006: Modelling claim frequency and claim severity in general insurance, distribution fitting, application of generalised linear models in pricing, ratemaking and bonus-malus systems. Modelling the returns of financial assets. Option pricing in finance and insurance. Monte Carlo methods and their application in option pricing and in pricing life insurance liabilities. Extensive applications in R with real and simulated data sets. Prerequisite(s): Basic knowledge of statistics and R.

BD007: The course will introduce the topic of data mining, and will present a methodology for Knowledge Discovery in Databases (KDD). The tasks of clustering and classification will be explored in some detail. We will look at an open source data mining package for some practical guidance on how to put what has been learned into practice.
Courses: BD008 A (gentle) introduction to reinforcement learning; BD009 Search in big data; BD010 Practical sentiment analysis; BD011 High performance computing; BD012 Data Protection and Liability in the Age of Big Data Analytics: Legal and ethical issues
Presenters: Spyros Samothrakis; Allan Hanbury, Vienna Technology; Diana Maynard, Sheffield; Adrian Clark; Audrey Guinchard
Level/duration: 6 or 8 hours; /advanced; 8 hour

BD008: Reinforcement learning is concerned with learning how to act optimally in the presence of rewards and punishments. This short course on reinforcement learning will help you understand the basics and provide a solid foundation necessary for advanced topics. It will have both a practical (two hour) and a theoretical (two hour) component. Topics to be addressed are Markov Decision Processes, Monte Carlo methods, SARSA and Q-Learning. Prerequisite(s): Some mathematical/computer science sophistication (e.g. understanding summation, recursion, means/medians).

BD009: As the amount of text data stored by organisations grows, information retrieval technologies become increasingly important. Effective use of search technologies is essential to ensuring that the key information is available when decisions are made. This course will start by covering the basics of information retrieval, such as indexing and keyword search. It will then cover adapting search to specific domains (such as the technical and health domains), and will finally present how the effectiveness of search technologies is evaluated. Prerequisite(s): Participants need to be comfortable with basic mathematics, especially linear algebra.

BD010: This tutorial will introduce the concept of sentiment analysis from unstructured text. It will cover both rule-based and machine learning techniques, provide some background information on the key underlying NLP and text analysis processes required, and look in detail at some of the major problems and solutions, such as detection of sarcasm, use of informal language, spam opinion detection, trustworthiness of opinion holders, and so on. The techniques will be demonstrated with real applications developed in GATE, an open-source language processing toolkit. Hands-on exercises and relevant materials will be provided for participants to try out the applications, and to experiment with building their own tools, both in GATE and with other common tools. Prerequisite(s): No prior knowledge of GATE, Java or Natural Language Processing (NLP) is required to attend this tutorial. However, it will include a hands-on element where you will be able to try simple things out in GATE, the tool we use for NLP tasks.

BD011: This course introduces participants to high performance computing. The first half of the course will cover principles (floating-point computation, speeding up code, compute clusters, using MPI) while, in the second part, participants will have the opportunity to build and use a small cluster. Prerequisite(s): The course assumes knowledge of programming in Python/C/C++.

BD012: This session aims to introduce the current EU and UK data protection regime and the changes to be brought in by the future General Data Protection Regulation late. Furthermore, the session will present and allow for discussion of the specific challenges big data bring, especially in light of the reports published by various data protection regulators at both UK and EU levels.

Courses: BD013 Managing, curating and publishing data; BD014 Secure access protocols for Big Data; BD015 Agent based modelling for business
Category: Curation and management of data
Presenters: Sharon Bolton and Louise Corti, UK Data Service; Libby Bishop and Felix Ritchie, UK Data Service; Abhijit Sengupta

BD013: Big data may come from a range of sources and organisations, which may not be used to the idea of sharing their data with researchers. Therefore, they might not realise what researchers need, and so some of the features traditionally present that make research data easier to use might not be available. This can bring a range of problems, some of which can be addressed by good data curation. The course will start with the legal issues in brokering data. It will then assess issues of trust in, and quality of, the source: who is the provider? It will also highlight ethical issues, content and use of personal data. For example, some of the questions we plan to address in the session include:
- Data confidentiality: are people identifiable from the data?
- Metadata and accompanying documentation: do users have enough information about what the data means and how it can be used?
- Formats, size and usability: what kind of software, hardware and techniques are needed?
- Publishing data products or data to support a journal article: what does the supporting data look like for verification?
The session will run a hands-on exercise, publishing a small dataset in a repository and providing the necessary metadata and documentation, to learn what curation is and what is needed.

BD014: On aspects of accessing and using confidential sources of Big Data.
- The Five Safes of data access
- Big Data confidentiality/privacy/ethical considerations: what you need to know
- How to be a Safe Person when using confidential sources of Big Data
- Using Big Data responsibly
- Designing a Safe Setting for Big Data
- Disclosure control techniques: to data, to your research outputs
The objective is to prepare people who want to access confidential sources of Big Data. They might be making an application to a data owner, or for funding which has to go through an ethics panel. Or they might be using Big Data but be unaware of some of the confidentiality/privacy/public-perception issues that surround the collection and analysis of Big Data.

Level: Advanced

BD015: This course will start by providing students with an overview of the nature of business applications where Agent Based Modelling (ABM) can be useful, relevant and practical. It will then proceed with some real world examples where ABM has been used, particularly in the context of the Fast Moving Consumer Goods (FMCG) sector. A few of these examples, which are in the public domain and have had an academic influence, will be discussed in detail.
The lecture will conclude with some indicators of where the future of this modelling paradigm lies in the context of business applications. Prerequisite(s): Understanding of complex systems phenomena. Familiarity with social networks and properties of networks. Reasonable knowledge of at least one ABM toolkit such as Repast or NetLogo. All practical examples in this course will be NetLogo based.

Courses: BD016 Machine Learning with Mahout (tbc); BD017 Big Data Finance Analytics; BD018 Cognitive Computing
Presenters: Richard Skeggs, ESRC Business and Local Government Data Research Centre; Neil Kellard; Detlef Nauck and Martin Spott, British Telecom
Level: /advanced

BD016: This is an introduction to the use of machine learning algorithms supported by the Apache Mahout framework. The class will concentrate on what problems can be solved using Mahout before looking at the common classifiers used by Mahout to achieve those objectives. Finally, the class will look at building some simple working examples to see Mahout in practice. Prerequisite(s): Knowledge of the Java programming language is essential. Some statistical knowledge will be useful but not essential.

BD017: Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. Given contemporary computing power and potential data collection, many firms, particularly those from the financial sector, wish to use it. The challenges include capture, curation, storage, search, sharing, transfer and data visualization. The primary purpose of this course is to provide the participant with an understanding of data analytic approaches in finance. The first part covers high frequency trading and predictive analytics. The second part will concentrate on the application of data in risk modelling, corporate finance, fraud and personal finance. Prerequisite(s): Some background in statistics/mathematics/econometrics is desirable but not essential.
BD018: While we are successfully addressing the challenges behind storing and managing massive amounts of data through technologies, we are still facing large obstacles in successfully and quickly analysing that data. The view that one analyst uses tools from statistics, machine learning and data mining to find answers in data rapidly becomes outdated in the face of an overwhelming amount and variety of data and an ever increasing demand for evidence based decision making. We now need to look into concepts of collaborative and distributed analytics, where analysts work together and combine individual results into an overall answer. We need tools that can deal with uncertainty and can assess the quality of potential answers. We need new human-computer interfaces that allow computers to really help analysts find answers that they could not have come up with themselves. We also need computers to help analysts illustrate and explain the outcome to decision makers so they have confidence in the results. Cognitive Computing addresses several of these issues. Cognitive Computing looks at how we get computers to behave and interact the way humans do. Systems like IBM's Watson can deal with huge
volumes of data, identify knowledge and patterns in the data and apply this to the problem the analyst is trying to solve by giving them different alternatives to consider and, in particular, the underlying evidence that supports those alternatives. This course looks at the challenges modern analytics is facing and explores how ideas from Cognitive Computing can lead to a new era of data analytics. Prerequisite(s): A basic understanding of what is involved in running a data science project.

Courses: BD019 Stream Processing and Data Analytics for Smart City; BD020 Crowdsourcing and Human Computation; BD021 From Big Data to Big Value; BD022 Introduction to Big Data Statistics
Presenters: TBC; Sefki Kolozali and Nazli Farajidavar, Surrey; Jon Chamberlain; Richard Mason, Intel; Nathan Cunningham, UK Data Service
Level: /advanced

BD019: In this course we cover some of the background concepts related to the Internet of Things and Web of Things, and Semantic Technologies in the smart city domain, and will describe solutions for processing and information extraction from real world data. Use-cases and examples from the smart city domain will be described. We will also discuss some of the machine learning techniques and data tools and methods that can be used to process and analyse the smart city data. Prerequisite(s): Familiarity with machine learning techniques and semantic web technologies would be useful but is not compulsory.

BD020: Crowdsourcing has established itself in the mainstream of research methodology in recent years, using a variety of methods to engage many non-expert users to solve problems that computers or limited expert users cannot solve. Whilst the concept of human computation goes some way towards solving problems, it also introduces new challenges of data quality, participant recruitment and incentivisation. This course will introduce 3 common methods of crowdsourcing: peer-production, microworking and games-with-a-purpose, as well as an emerging approach using social networks as a powerful problem solving and monitoring tool.
Participants are encouraged to bring examples of data they would like annotated or tasks that need humans to solve, for discussion as to which approach might be suitable and how to implement it.

BD021: Learn how Intel is harnessing Big Data to drive operational efficiency and revenue optimisation across the organisation. Discuss trends and how Intel is embracing these trends to gain further insights, adoption and value.

BD022: This is a short introductory course on understanding Big Data, what it is and what strategies you can adopt to make the most out of it. It would be useful to bring a device for note taking. This course will cover:
- Putting new knowledge first. What question do you want to answer?
- Defining metrics for success.
- What is Big Data?
- What Big Data solutions are available to me for free?
- Do you know what your real sample size is?
- Testing hypotheses and calling things significant.
- Managing spurious correlations.
- Smoothing data to understand significant relationships in spatial/temporal data.
- Make as small as possible, as quickly as possible.
- Plotting your so you don't miss the obvious.
- Strategies for improving prediction accuracy by averaging many models together.
Prerequisite(s): Familiarity with using and applying science/research data to answer questions. An understanding of statistics and how databases operate is desirable. A basic overview of computing infrastructure and algorithms will be discussed, but at an introductory level assuming no prior knowledge.

Keynote Lectures

Columns: Company, Presenter, Title of talk, Abstract

Thomson Reuters, Jochen Leidner: Small Data Big Data: Qualitative Differences Resulting from Quantitative Scale
Abstract: While the Big Data topic has received a lot of attention, one may wonder why exactly "more of the same" should constitute a step change; for instance, we haven't declared a new academic field of "Big Plastic" just because we consume and process more plastic than ever. In this talk, I critically assess which, if any, quantitative changes induce qualitative changes, and whether the talk of a new area is merited. Along the way, we will revisit a couple of past and ongoing efforts that fall into the space and apply these findings.

Intel, Mark Woodward: Using Big Data to Generate Real Revenue for Business
Abstract: This talk will describe how Intel takes advantage of large, complex data sources to achieve greater efficiency, cost saving and new revenue opportunities across its business. As part of the talk, examples of initiatives and real world business scenarios in the Technology and Manufacturing world will be discussed.

Fujitsu, Joe Duran: Impact of Research on Computing in Society
Abstract: TBC

Citigroup, Stuart Jones: Bridging the Gap Between Big Data, Statistics and Business
Abstract: This talk will describe how understanding business objectives including revenue, expense and risk management can be satisfied with statistical analysis. As part of the talk, examples of initiatives that will assist in detecting and preventing fraudulent and money-laundering activity in the financial world will be discussed.