An interdisciplinary model for analytics education
Raffaella Settimi, PhD
School of Computing, DePaul University
Drew Conway's Data Science Venn Diagram
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Who is a data scientist?
- Curious and inquisitive
- A computer scientist
- A statistician
- A data miner
- Creative and with strong communication skills
- ...and let's not forget a subject-domain expert
Learning outcomes of an Analytics MS curriculum
- Database Processing
- Programming/Scripting
- Algorithms and data modeling
- Visualization and communication
- Applications
- Hands-on experience
- Domain-specific competence
Core subjects
- Database Processing: SQL queries, DB programming, NoSQL DBs (e.g. Hadoop)
- Data Mining: data cleaning, integration, and governance; association rules
- Basic Statistics & Data Visualization
- Statistical Analysis: multivariate statistics, time series analysis
- Machine Learning: classification techniques, clustering methods; supervised and unsupervised learning
NSPMA Workshop, May 28-29, 2013
Tools and Platforms
Students should master a core set of tools and platforms, both open source and commercial software, for
- Data storage and integration
- Modeling and analysis
- Data visualization and reporting
Programming: Python, Java
Data storage and integration:
- Relational databases (MySQL, Oracle, SQL Server)
- Hadoop, NoSQL, MongoDB
Modeling and analysis:
- R
- SPSS & SPSS Modeler
- SAS & SAS Enterprise Miner
- Matlab
- Weka
- pandas (Python Data Analysis Library)
Visualization: Tableau, MapPoint, ArcView, etc.
DePaul's MS degree in Predictive Analytics
- Originally a specialization in Machine Learning within our MS in Computer Science.
- Created in 2010 to address the increasing demand for graduates with deep technical and analytics skills to meet the challenge of mining Big Data.
[Figure: enrollment by academic year (AY 2010/11, AY 2011/12, AY 2012/13); vertical axis from 0 to 100]
From a 2012 survey of our students
Students are most interested in courses around working with data (including "big data") and data analysis, as well as gaining additional experience in programming and marketing.
[Figure: word cloud of survey responses*]
*Created on Wordle.net. Size is relative to the overall number of mentions/responses; position does not matter. Top 50 words/mentions shown.
Current positions held by our students
- Analytics positions: 30%
- Not working full time: 20%
- Not related position: 50%
Breakdown of industries among students with analytics positions: Banking, Consulting, Education, Food/Beverage, Insurance, IT/Technology, Marketing/Advertising, Not for Profit, Health Care, N/A
DePaul's MS in Predictive Analytics curriculum
- Prerequisite knowledge: Intro to Statistics, Python, Calculus & Linear Algebra (can be taken before MS)
- Common core
- Computational Methods concentration (Fall 2011)
- Health Care concentration (Winter 2014)
- Marketing concentration (Fall 2010)
- Hospitality concentration (Fall 2013)
- Practicum Course
Links to Curriculum
Course home page: http://www.cdm.depaul.edu/academics/pages/ms-in-predictive-Analytics.aspx
Concentrations:
- Computational methods: view requirements
- Marketing: view requirements
- Hospitality: view requirements
- Health Care: available in winter 2014
Common Core
Teaches the fundamental tools and techniques for Data Science.
- Database processing (SQL queries, relational databases, NoSQL DBs, data management and integration)
- Statistical modeling (regression analysis, multivariate statistics)
- Data mining and machine learning (data cleaning, association rules, clustering, classification techniques, etc.)
- Application of analytics in social networks, web data mining, text mining
Applications
- Social networks: analysis of network structure, data retrieval from networks, text analysis
- Web analytics: user behavior modeling, e-metrics for business intelligence, web personalization, recommender systems, privacy and ethical issues
- Text mining: information retrieval models, document clustering, taxonomies, sentiment analysis
Additional electives
- Image analysis: image representation, segmentation, pattern recognition
- Monte Carlo techniques
- Visualization techniques and design principles
- Data stream analysis
- ETL, data warehousing and business intelligence tools (dashboards, reporting, etc.)
Computational Methods concentration
Created in response to demand from students who wanted to develop the strong technical skills required for Big Data analytics.
Courses in:
- Mining Big Data
- Programming analytics applications in Python
- Advanced data mining techniques (matrix factorization, probabilistic networks, etc.)
- Machine Learning algorithms
Students learn how to apply advanced data mining and database processing techniques for the analysis and management of extremely large datasets.
Marketing concentration (jointly with the Marketing Department)
Every day the amount of data available to businesses increases, and more information is available about markets, products, competitors and customers. Companies gain a competitive advantage by using analytics to uncover insights about their markets and make smarter decisions.
Courses in:
- Customer Relationship Management
- Marketing analytics
- Internet marketing
- Customer service and analysis
Students learn how to:
- Apply analytics to mine marketing data
- Extract information from data to support business decision making and marketing decisions
Hospitality concentration (jointly with the School of Hospitality Leadership)
Organizations in the tourism industry (hotels, restaurants, travel) have access to an abundance of data, both internal and from third parties, available through social media channels such as TripAdvisor and Yelp.
Students learn how to:
- Apply analytics to mine hospitality data, incorporating revenue management principles and optimization techniques
- Assess hospitality global distribution system analytics and predict impacts on service-firm financial performance
- Identify revenue management principles and optimization models unique to the various service sectors within the hospitality industry
Health Care concentration (jointly with the Marketing Department and the Health Sector Management program)
The recent changes in healthcare have led to a paradigm shift in the healthcare industry and an increasing need to use data to predict trends in illness, disease, injury, utilization, and costs.
Students learn how to apply analytics to mine health care data, such as:
- Patient experience/satisfaction/outcomes
- Claim management and cost reduction
- Predictive modeling of care, costs and utilization
- Pharmacy data
in order to develop evidence-based business models that improve health care strategies, such as patient experience, clinical processes, and resource allocation.
Interdisciplinary center
In 2010, we created an interdisciplinary academic center to bring together the expertise of faculty from different schools and programs at DePaul University:
- Computing
- Marketing
- Hospitality leadership (added in 2012)
- Health sector management (added in 2013)
Aimed to be a "center without walls" facilitating:
- Faculty and student research across disciplines
- State-of-the-art curriculum for preparing a new generation of specialists in data mining and predictive analytics
- Faculty and student collaborations with industry
- Student/alumni matching application with employers' needs
- Networking events
Provide students with real-world experience: think outside the classroom
Data science cannot be learnt just by sitting in a classroom and listening to lectures.
Students should:
- Use real data in courses
- Work on large-scale projects
- Gain experience through internships or industry-sponsored projects
- Have access to a variety of platforms and tools
- Network with analytics professionals
Challenge: Access to real data
- It can be hard to get real data from companies, because the data often contain sensitive information about the company or its customers.
- Internships are easier to set up, as the data remain at the company site.
- Industry-sponsored projects are a win-win opportunity for companies, which can take advantage of a team of students and the expertise of a faculty member supervising the project, at no or relatively low cost.
DaMPA Industry Partnerships
- Data: Companies provide datasets to be used for class projects or student research projects.
- Software: Companies provide software or training material to be used for teaching or research.
- Education: Companies recruit students, serve as industry advisors, and guide the Center on curriculum development and long-term planning.
- Research: Partner to translate new science into novel technologies and to address unmet industry needs.
Education Advisory Board
Research Innovation Board
Examples of projects
Medical Informatics
- NSF REU program in medical informatics (joint with University of Chicago)
- Computer-aided detection, diagnosis, and characterization for lung nodules (joint with University of Chicago)
- Prediction of chronic fatigue syndrome (joint with DePaul Psychology Department)
- Tracking illness from Tweets
- Analysis of legionellosis occurrence (data from Chicago Public Health Office)
Web Data Mining, Web Personalization, and Recommender Systems
- Ontology-based user modeling for web personalization and recommendation
- Recommender Systems for the Social Web
- Trustworthy and Secure Recommender Systems for the Web
Urban studies
- Motor vehicle theft analysis (data from Chicago Police Dept.)
- A data-driven typology of urban communities in Cook County (joint with Institute of Housing Studies)
Hospitality Projects
- Food and Beverage Analytics and Optimization Modeling
- Restaurant Revenue Analytics and Predictive Profit Optimization Scenarios
Where are our graduates employed?
Internships or full time