CLUSTER ANALYSIS WITH R

[Cluster analysis divides data into groups that are meaningful, useful, or both]

LEARNING STAGE: Advanced
DURATION: 3 days

WHAT IS CLUSTER ANALYSIS?

Cluster Analysis, or Clustering, is the study of methods and algorithms for finding groups in data. It is an enormously important part of data science and a core topic in data mining and machine learning. Clustering methods are used in areas as disparate as customer segmentation, recommender systems, drug compound library design, risk modeling, fraud detection, gene expression studies, field biology, and text mining; the list is nearly endless. Clustering is an essential part of predictive modeling methodology, from data exploration and hypothesis generation about the classes or structure of data to serving as a component of some predictive modeling tasks.

Some applications of Cluster Analysis:

Market researchers and analysts use cluster analysis to partition the general population of consumers into market segments, to better understand the relationships between different groups of potential customers, and to support product positioning and new product development.

Clustering is used to group all the shopping items available on the web into a set of unique products.

In the study of social networks, clustering is used to recognize communities within large groups of people.

Cluster analysis is used to identify areas with a greater incidence of particular types of crime. By identifying these distinct areas, or "hot spots", where similar crimes have happened over a period of time, it is possible to manage law enforcement resources more effectively.

Flickr's map of photos and other map sites use clustering to reduce the number of markers on a map.

A 3-day course for professionals and researchers interested in developing practical skills for implementing clustering algorithms in R.

This course presents a broad overview of Cluster Analysis, a form of unsupervised machine learning used for exploratory data analysis, data summation, ordination, and even predictive modelling. It provides an in-depth review of both clustering theory and application across a large spectrum of disciplines and applied settings, from drug discovery to management science. Clustering topics, such as issues with data types, measures of similarity, and clustering algorithms and their taxonomy, are explored further in hands-on labs using the R programming language. Participants will come away with information and a set of tools that form the basis of an approach to clustering problems in their respective domains.

Cluster Analysis forms an important area of statistical learning theory, both as an independent discipline of unsupervised learning and, at times, as a subdomain within predictive modelling and supervised learning.

"If we can get usable, flexible, dependable machine learning software into the hands of domain experts, benefits to society are bound to follow."
Dr Kiri L. Wagstaff, researcher at NASA JPL

WHAT WILL YOU LEARN?

Participants will learn and explore the following topics by way of simulated and practical examples in R: the general concerns of data and data types used in clustering; measures of similarity (including notions of distance and metric); the theoretical foundation of clustering; data summation; ordination; the connection to data mining and prediction; clustering approaches in the form of an informal taxonomy; algorithm complexity; relevant graph theory; specific clustering algorithms (model-based, hierarchical, partitional, graphical, hybrids, co-clustering, asymmetric clustering, online clustering); visualization of various forms of clustering results; clustering validation; and parallelism.

Participants will learn about data preparation and various clustering algorithms and visualization methods with the help of R and R clustering packages. Examples will include real-world applications in drug discovery, bioinformatics, social media, management science, finance, ecology, and others. Specific applications will include drug compound library design and diversity, gene expression, community detection, customer segmentation, species ordination, and QSAR (Quantitative Structure Activity Relationship), among others. Participants will come away from the course with the tools and applied understanding necessary to approach a large array of clustering problems in their domain.

PREREQUISITES

Participants should have at least a passing familiarity with the following topics: probability theory, statistics, matrix algebra, and programming in R.
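
As a rough flavor of what a lab on similarity measures and hierarchical versus partitional clustering might look like, here is a minimal sketch. It is not taken from the course materials: the simulated data and all variable names are assumptions made for illustration, and it relies only on functions from base R's stats package (dist, hclust, cutree, kmeans).

```r
# Illustrative sketch only; not taken from the course materials.
# Uses only base R (the 'stats' package that ships with R).

set.seed(42)  # make the simulated data and the k-means starts reproducible

# Simulate two well-separated groups of 2-D points (hypothetical data).
group_a <- matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2)
group_b <- matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2)
x <- rbind(group_a, group_b)  # 40 rows: 20 from each group

# Measure of similarity: pairwise Euclidean distances between rows.
d <- dist(x, method = "euclidean")

# Hierarchical clustering: average-linkage agglomeration, cut into 2 groups.
hc <- hclust(d, method = "average")
hc_labels <- cutree(hc, k = 2)

# Partitional clustering: k-means with 2 centers and several random starts.
km <- kmeans(x, centers = 2, nstart = 10)

# Cross-tabulate the two labelings to see how far they agree.
print(table(hierarchical = hc_labels, kmeans = km$cluster))
```

On data this cleanly separated, the two methods should recover the same two groups, although the cluster numbers that k-means assigns are arbitrary. Real data rarely behaves so well, which is where choices of distance measure, linkage, and validation start to matter.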

WHO SHOULD TAKE THIS COURSE?

This course is intended for those currently working as data analysts, programmers, or market researchers with limited exposure to clustering techniques and algorithms, as well as those looking to move into the field.

Data/Market Analysts
Technologists/Programmers
Data Scientists
Quantitative Professionals
Research Analysts

VIRTUOUS CIRCLE OF LEARNING

Learning outcomes combine theory, an overview of concepts and practices, applied examples from the real world, and implementation (hands-on labs). The time allocated to each topic will drive the depth and coverage of that topic.

WHAT SHOULD I BRING?

Along with your laptop and a charger, don't forget to bring loads of curiosity, scepticism, eagerness to participate, and the desire to learn.

COURSE INSTRUCTORS

John MacCuish

John MacCuish is a founder and President of Mesa Analytics & Computing, Inc. and a computer scientist with over 20 years of experience as a researcher, algorithm designer, and data scientist in applied settings. John has published numerous journal articles, books, successful grant applications, patents, and technical reports on graph theory, algorithm animation, scientific visualization, image processing, cheminformatics, bioinformatics, and data mining. He has also written or contributed to many internal and confidential reports on fraud detection, image recognition, precision agriculture, economic modeling, queuing theory models, financial risk modeling, text mining, and drug discovery. He is a recognized expert in cluster analysis, designing algorithms and implementing original software for clustering solutions in the field of early drug discovery. John has a Distinguished Performance Award from Los Alamos National Laboratory for his work on the IRS Fraud Detection Project.

Dr. Norah MacCuish

Dr. Norah MacCuish received her Ph.D. from Cornell University in the field of Theoretical Physical Chemistry. Her twenty years of experience in pharmaceutical and software companies have given her expertise in diversity assessment for compound acquisitions, combinatorial chemistry library design, and the use and design of chemical information systems, in both basic drug discovery research and software development. She was awarded a Bronze Impact award for her collaborative work involving a Smith Kline Pharmaceutical Partnership. Norah has numerous publications and has made scientific presentations in the areas of fluid simulations, chemical diversity analysis, object-relational database systems, and chemical cluster analysis. She was the Principal Investigator for two Phase I NSF SBIR grants, as well as a Phase II NSF SBIR grant titled "Cheminformatics Teaching Tools for the Cheminformatics Virtual Classroom".

"Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data."
Kevin P. Murphy, Research Scientist at Google

"For the best return on your money, pour your purse into your head."
Benjamin Franklin

RETURN ON INVESTMENT (ROI)
CONVINCE YOUR BOSS

The advent of the data-driven, connected era means that analyzing massive-scale, messy, noisy, and unstructured data will increasingly form part of everyone's work. The School of Data Science learning programs provide a unique investment opportunity that pays for itself many times over.

World-class Instructors
Develop Practical Data Science Skills
Real World Industry Use Cases
Short Courses For Time Convenience
Value For Money

Limited seats. We encourage you to register as soon as you can.

For corporate bookings or to organize on-site training, email hello@persontyle.com or call +44 (0)20 3239 3141.

THE SCHOOL OF DATA SCIENCE

The School of Data Science, a project of Persontyle, specializes in designing and delivering structured, relevant, and practical learning experiences that help all of us understand data science in simple human terms.

/school
Follow us on Twitter @schooltds
Like us on Facebook
Get in touch! hello@persontyle.com