CLUSTER ANALYSIS WITH R
[Cluster analysis divides data into groups that are meaningful, useful, or both]
LEARNING STAGE: ADVANCED
DURATION: 3 DAYS
WHAT IS CLUSTER ANALYSIS?
Cluster Analysis, or Clustering, is the study of methods and algorithms for finding groups in data. It is an enormously important part of data science and a topic always treated in data mining and machine learning. Clustering methods can be found in areas as disparate as customer segmentation, recommender systems, drug compound library design, risk modeling, fraud detection, gene expression studies, field biology, and text mining; the list is nearly endless. Clustering is an essential part of predictive modeling methodology, from data exploration and hypothesis generation about the classes or structure of data to actually being part of some predictive modeling tasks.

SOME APPLICATIONS OF CLUSTER ANALYSIS
Market researchers and analysts use cluster analysis to partition the general population of consumers into market segments, to better understand the relationships between different groups of potential customers, and to support product positioning and new product development.
Clustering is used to group all the shopping items available on the web into a set of unique products.
In the study of social networks, clustering is used to recognize communities within large groups of people.
Cluster analysis is used to identify areas with a greater incidence of particular types of crime. By identifying these distinct areas, or "hot spots", where a similar crime has happened over a period of time, it is possible to manage law enforcement resources more effectively.
Flickr's map of photos and other map sites use clustering to reduce the number of markers on a map.
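As a minimal illustration of what finding groups in data can look like in R, the sketch below runs one partitional method (k-means) and one hierarchical method on the same data using only base R. The dataset (R's built-in iris), the choice of three clusters, and the use of average linkage are assumptions made for this example; they are not taken from the course material.

```r
# Minimal sketch: k-means and hierarchical clustering on the built-in iris data.
# Dataset, k = 3, and average linkage are illustrative assumptions.
set.seed(42)                               # k-means uses random starting centers

x <- scale(iris[, 1:4])                    # standardize the four numeric measurements

# Partitional clustering: k-means with several random restarts
km <- kmeans(x, centers = 3, nstart = 25)

# Hierarchical clustering: Euclidean distances, average linkage, cut into 3 groups
d <- dist(x, method = "euclidean")
hc <- hclust(d, method = "average")
hc_groups <- cutree(hc, k = 3)

# Cross-tabulate each partition against the known species labels.
# The labels are used only to inspect the result, never to fit the clusters.
print(table(kmeans = km$cluster, species = iris$Species))
print(table(hclust = hc_groups, species = iris$Species))
```

The two methods will generally not produce identical partitions, which is one reason measures of similarity and clustering validation matter in practice.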
A 3-day course for professionals and researchers interested in developing practical skills in implementing clustering algorithms using R.

This course presents a broad overview of Cluster Analysis, a form of unsupervised machine learning that is used for exploratory data analysis, data summation, ordination, and even predictive modelling. It provides an in-depth review of both clustering theory and application across a large spectrum of disciplines and applied settings, from drug discovery to management science. Clustering topics, such as issues with data types, measures of similarity, and clustering algorithms and their taxonomy, will be explored further in hands-on labs using the R programming language. Participants will come away with information and a set of tools that form the basis of an approach to using Cluster Analysis for clustering problems in their respective domains.

Cluster Analysis forms an important area of statistical learning theory, both as an independent discipline of unsupervised learning and, at times, as a subdomain of predictive modelling and supervised learning.

"If we can get usable, flexible, dependable machine learning software into the hands of domain experts, benefits to society are bound to follow." Dr Kiri L. Wagstaff, researcher at NASA JPL
WHAT WILL YOU LEARN?
By way of simulated and practical examples in R, participants will learn and explore: the general concerns of data and data types used in clustering; measures of similarity (including notions of distance and metric); the theoretical foundation of clustering; data summation; ordination; the connection to data mining and prediction; clustering approaches in the form of an informal taxonomy; algorithm complexity; relevant graph theory; specific clustering algorithms (model-based, hierarchical, partitional, graphical, hybrids, co-clustering, asymmetric clustering, online clustering); visualization of various forms of clustering results; clustering validation; and parallelism.

Participants will learn about data preparation and various clustering algorithms and visualization methods with the help of R and R clustering packages. Examples will include real-world applications in drug discovery, bioinformatics, social media, management science, finance, ecology, and others. Specific applications will include drug compound library design and diversity, gene expression, community detection, customer segmentation, species ordination, and QSAR (Quantitative Structure Activity Relationship), among others. Participants will come away from the course with the tools and applied understanding necessary to approach a large array of clustering problems in their domain.

PREREQUISITES
Participants should have at least a passing familiarity with the following topics: probability theory, statistics, matrix algebra, and programming in R.
WHO SHOULD TAKE THIS COURSE?
This course is intended for those currently working as data analysts, programmers, or market researchers with limited exposure to clustering techniques and algorithms, as well as those looking to move into the field.
DATA/MARKET ANALYSTS
TECHNOLOGISTS/PROGRAMMERS
DATA SCIENTISTS
QUANTITATIVE PROFESSIONALS
RESEARCH ANALYSTS

VIRTUOUS CIRCLE OF LEARNING
Learning outcomes combine theory, an overview of concepts and practices, applied examples from the real world, and implementation (hands-on labs). The time allocated to each topic will drive the depth and coverage of that topic.

WHAT SHOULD I BRING?
Along with your laptop and a charger, don't forget to bring loads of curiosity, scepticism, eagerness to participate, and the desire to learn.
COURSE INSTRUCTORS

John MacCuish
John MacCuish is a founder and President of Mesa Analytics & Computing, Inc. and a computer scientist with over 20 years of experience as a researcher, algorithm designer, and data scientist in applied settings. John has published numerous journal articles, books, successful grant applications, patents, and technical reports on graph theory, algorithm animation, scientific visualization, image processing, cheminformatics, bioinformatics, and data mining. He has also written or contributed to many internal and confidential reports on fraud detection, image recognition, precision agriculture, economic modeling, queuing theory models, financial risk modeling, text mining, and drug discovery. He is a recognized expert in cluster analysis, designing algorithms and implementing original software for clustering solutions in the field of early drug discovery. John has a Distinguished Performance Award from Los Alamos National Laboratory for his work on the IRS Fraud Detection Project.

Dr. Norah MacCuish
Dr. Norah MacCuish received her Ph.D. from Cornell University in the field of Theoretical Physical Chemistry. Her twenty years of experience in pharmaceutical and software companies have given her expertise in diversity assessment for compound acquisitions, combinatorial chemistry library design, and the use and design of chemical information systems, both in basic drug discovery research and in software development. She was awarded a Bronze Impact award for her collaborative work involving a Smith Kline Pharmaceutical Partnership. Norah has numerous publications and has made scientific presentations in the areas of fluid simulations, chemical diversity analysis, object-relational database systems, and chemical cluster analysis. She was the Principal Investigator for the two Phase I NSF SBIR grants, as well as a Phase II NSF SBIR grant titled "Cheminformatics Teaching Tools for the Cheminformatics Virtual Classroom".
"Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data." Kevin P. Murphy, Research Scientist at Google
"For the best return on your money, pour your purse into your head." Benjamin Franklin

RETURN ON INVESTMENT (ROI)
CONVINCE YOUR BOSS
The advent of the data-driven, connected era means that analyzing massive-scale, messy, noisy, and unstructured data will increasingly form part of everyone's work. The School of Data Science learning programs provide a unique investment opportunity that pays for itself many times over.
World-class Instructors
Develop Practical Data Science Skills
Real World Industry Use Cases
Short Courses For Time Convenience
Value For Money

Limited seats. We encourage you to register as soon as you can.
Register Now
For corporate bookings or to organize on-site training, email hello@persontyle.com or call +44 (0)20 3239 3141.

THE SCHOOL OF DATA SCIENCE
The School of Data Science, a project of Persontyle, specializes in designing and delivering structured, relevant, and practical learning experiences for all of us to understand data science in simple human terms.
/school
Follow us on Twitter @schooltds
Like us on Facebook
Get in touch! hello@persontyle.com