Appendix III: Ten (10) Specialty Areas
Data Sciences Curriculum Mapping to Knowledge Units: Data Sciences Specialty Area

IX. Data Sciences Specialty Area

1. Knowledge Unit title: Research Design and Application for Data and Analysis

A. Knowledge Unit description and objective: Framing the analytic problem for a solution using critical thinking along with various statistical, mathematical, or algorithmic tools and software.

B. Requirement satisfaction: This KU is satisfied when at least seven (7) Topics and all Learning Objectives are met.

C. Topics:
IX.1C1  Fundamentals and historical context of data analytics and the data science pipeline
IX.1C2  Components of data sets
IX.1C3  Different data structures
IX.1C4  Common data-representation schemes and structures
IX.1C5  Scoping the resources required for a data science project
IX.1C6  Knowing what analyses are possible given a particular data set, including both the state of the art of the field and its inherent limitations
IX.1C7  Reproducible research and processes
IX.1C8  Basic statistical understanding, including probability distributions, hypothesis testing, linear regression, and causality
IX.1C9  Types of data science questions (e.g., descriptive, exploratory, inferential)
IX.1C10  Design of experiments
IX.1C11  Sampling
IX.1C12  Critical thinking and logic

D. Learning Objectives:
IX.1D1  Discuss what data represents
IX.1D2  Describe the components of data
IX.1D3  Identify common data structures used to collect data for analytic problems
IX.1D4  Discuss common data-representation schemes and structures, including unstructured and semi-structured data such as text, web logs, and HTML
IX.1D5  Explain the resources required to develop and complete a data science project, with a timeline and cost estimate
IX.1D6  Describe best practices of reproducible data analysis
IX.1D7  Identify various experimental designs and describe the benefits and constraints of each
IX.1D8  Explain various sampling schemes
IX.1D9  Describe common critical thinking techniques
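
Learning objective IX.1D8 asks for an explanation of sampling schemes. The following minimal Python sketch is illustrative only and not a prescribed method. It contrasts simple random sampling with proportional stratified sampling. The function names, the record population, and the 10% sampling fraction are hypothetical. A fixed seed makes the draw repeatable, which also touches on reproducible analysis (IX.1D6).

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, seed=0):
    """Draw n items without replacement; every item has the same chance."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return rng.sample(population, n)

def stratified_sample(population, key, fraction, seed=0):
    """Proportional stratified sampling (hypothetical helper):
    take the same fraction from every stratum defined by key()."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for label in sorted(strata):  # sorted so results do not depend on insertion order
        members = strata[label]
        k = max(1, round(len(members) * fraction))  # keep at least one item per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 90 records from region "A", 10 from region "B".
population = [("A", i) for i in range(90)] + [("B", i) for i in range(10)]
srs = simple_random_sample(population, 10)
strat = stratified_sample(population, key=lambda r: r[0], fraction=0.1)
print(len(srs), sum(1 for r in strat if r[0] == "B"))
```

A simple random sample of 10 may contain no region "B" records at all. The stratified design guarantees that the small stratum is represented in proportion to its size.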

2. Knowledge Unit title: Data Storage and Preparation

A. Knowledge Unit description and objective: Understand and be familiar with obtaining and cleaning data for analysis.

B. Requirement satisfaction: This KU is satisfied when all Topics and all Learning Objectives are met.

C. Topics:
IX.2C1  Data acquisition
IX.2C2  Dealing with big data sets: ETL, SQL, NoSQL, data nodes, data fusion/integration, data transformation
IX.2C3  Data cleaning
IX.2C4  Data recoding
IX.2C5  Specialized systems and algorithms developed to work with data at scale, including MapReduce and other software; core techniques in distributed systems; characteristics of high-performance computing (HPC) and cloud platforms; and important scalable algorithms for graphs, streams, and text
IX.2C6  Data munging/mining: PCA, feature extraction, binding, unbiased estimators, handling missing values and outliers, normalization, dimensionality reduction, denoising, sampling
IX.2C7  Tidy data
IX.2C8  CRISP-DM
IX.2C9  Database structures and trade-offs

D. Learning Objectives:
IX.2D1  Describe how to access data from a variety of sources, including relational databases, NoSQL data stores, and web-based APIs
IX.2D2  Demonstrate programming skills in R, Hadoop, and other languages and frameworks to mine massive amounts of information
IX.2D3  Prepare clean data
IX.2D4  Show how to reformat/recode data for analysis
IX.2D5  Apply dimensionality reduction techniques to big data sets
IX.2D6  Explain the CRISP-DM (Cross-Industry Standard Process for Data Mining) process model
IX.2D7  Explain different database structures and the benefits and drawbacks of each
IX.2D8  Describe the tidy data concept and employ it to produce a clean data set
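
Learning objective IX.2D8 refers to the tidy data concept: each variable forms a column, each observation forms a row, and each value occupies one cell. The sketch below is illustrative only. It reshapes a hypothetical "wide" table (one column per year) into tidy long form, trims stray whitespace, and recodes blank strings as explicit missing values. The `melt` helper and the column names are assumptions, loosely modeled on the melt/pivot operations found in common data frame libraries.

```python
# Hypothetical "wide" table: one row per country, one column per year.
wide = [
    {"country": "A", "2019": "10", "2020": " 12 "},
    {"country": "B", "2019": "", "2020": "7"},
]

def melt(rows, id_col, var_name, value_name):
    """Reshape wide rows into tidy long rows (hypothetical helper):
    one observation (country, year, value) per row."""
    tidy = []
    for row in rows:
        for col, raw in row.items():
            if col == id_col:
                continue
            value = raw.strip()  # clean stray whitespace
            tidy.append({
                id_col: row[id_col],
                var_name: int(col),                         # column header becomes a variable
                value_name: int(value) if value else None,  # blank -> explicit missing value
            })
    return tidy

long_rows = melt(wide, "country", "year", "cases")
for r in long_rows:
    print(r)
```

Keeping missing values as `None` rather than dropping them leaves the decision about imputation or removal to a later, explicit cleaning step.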

3. Knowledge Unit title: Exploring and Analyzing Data

A. Knowledge Unit description and objective: Understand and be familiar with applying analytic techniques and algorithms (including statistical and data mining approaches) to large data sets to extract meaningful insights.

B. Requirement satisfaction: This KU is satisfied when all Topics and all Learning Objectives are met.

C. Topics (IX.3C1 through IX.3C7):
Exploratory analysis and inferential hypothesis testing through the basics of statistical analysis
Data analysis using comparisons between batches, analysis of variance (ANOVA), and linear and logistic regression
Evaluation of assumptions; data transformation; reliability of statistical measures; resampling methods; validation of assumptions; interpretation; causation versus correlation
Principles of Bayesian statistics
Spatial statistics
Time-series analysis
Programming for data analysis (e.g., SAS, R, or Python), including data frames, vectors, matrices, reading and writing data, subsetting, regular expressions (regex), functions, and factor analysis
Text mining/NLP: corpus, text analysis, TF-IDF, SVM, feature extraction, sentiment analysis

D. Learning Objectives:
IX.3D1  Apply statistical methods and regression techniques to interpret data sets both large and small
IX.3D2  Demonstrate how to apply Bayesian statistics to solve problems
IX.3D3  Apply time-series analysis to temporal and spatiotemporal data
IX.3D4  Apply spatial statistics to spatial and spatiotemporal data
IX.3D5  Use various statistical packages or programs to conduct data analysis
IX.3D6  Apply text mining techniques to unstructured textual data
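
Among the topics above are resampling methods and the reliability of statistical measures. As a sketch only, the following shows one resampling method, the percentile bootstrap, used to estimate a confidence interval for a mean with nothing but the Python standard library. The data values, the 2,000 resamples, and the `bootstrap_ci` helper are hypothetical choices for illustration. In coursework, a statistical package (e.g., R or SAS, as this KU mentions) would typically be used instead.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the interval is reproducible
    n = len(data)
    # Recompute the statistic on many resamples drawn with replacement.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Take the alpha/2 and 1 - alpha/2 empirical percentiles of the estimates.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical measurements.
data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.9, 5.1, 4.4, 6.2, 5.0]
low, high = bootstrap_ci(data)
print(f"mean={statistics.mean(data):.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

The bootstrap makes no normality assumption about the data, which is one reason it appears alongside "evaluation of assumptions" in this KU. With only ten observations, the interval should still be read cautiously.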

4. Knowledge Unit title: Machine Learning and Statistical Models

A. Knowledge Unit description: Understand and be familiar with building appropriate machine learning applications for tasks.

B. Requirement satisfaction: This KU is satisfied when at least seven (7) Topics and all Learning Objectives are met.

C. Topics:
IX.4C1  Introduction to the theory and application of statistical machine learning, including supervised versus unsupervised learning; regression, classification, clustering, and dimensionality reduction
IX.4C2  Deep learning techniques, especially convolutional neural networks (CNNs) and computer vision
IX.4C3  Collaborative filtering/recommendation engines
IX.4C4  Model evaluation
IX.4C5  Machine learning applications
IX.4C6  Open-source programming tools and techniques available for implementing machine learning

D. Learning Objectives:
IX.4D1  Identify potential applications of machine learning
IX.4D2  Describe the differences in the types of analyses enabled by regression, classification, clustering, and dimensionality reduction
IX.4D3  Select the appropriate machine learning technique
IX.4D4  Explain the difference between machine learning and deep learning, and describe the structure of deep learning techniques
IX.4D5  Apply regression, classification, clustering, retrieval, recommender systems, and deep learning
IX.4D6  Assess model quality with relevant error metrics
IX.4D7  Use a fitted model to analyze new data
IX.4D8  Build an end-to-end application that uses machine learning at its core
IX.4D9  Implement these techniques in Python or R (or in another language of choice, though Python or R is highly recommended)
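
A deliberately small example illustrates learning objectives IX.4D6 and IX.4D7: assessing model quality with error metrics and using a fitted model on new data. This sketch assumes a one-variable linear model and made-up data. It fits ordinary least squares on a training split, applies the fitted line to held-out points, and reports mean absolute error (MAE) and root mean squared error (RMSE). `fit_line`, `mae`, and `rmse` are hypothetical helper names, not a library API.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical data: train on the first six points, evaluate on the last three.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.1, 18.2]
a, b = fit_line(xs[:6], ys[:6])
preds = [a + b * x for x in xs[6:]]  # apply the fitted model to new (held-out) data
print(f"intercept={a:.3f} slope={b:.3f}")
print(f"test MAE={mae(ys[6:], preds):.3f} RMSE={rmse(ys[6:], preds):.3f}")
```

The metrics are computed on points the model never saw during fitting. Measuring error on the training points alone would tend to overstate model quality.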

5. Knowledge Unit title: Data Visualization and Communication

A. Knowledge Unit description and objective: Understand and be familiar with modeling and communicating the results of analysis effectively (visually and verbally) to a broad audience.

B. Requirement satisfaction: This KU is satisfied when all Topics and all Learning Objectives are met.

C. Topics:
IX.5C1  Types of infographics: decision trees, neural networks, survey plots, timelines, bubble charts, scatterplots, tree maps, histograms, boxplots, etc.
IX.5C2  Communicating quantitative information through storytelling to impact the organization
IX.5C3  Understand the design and presentation of digital information using modern visualization software (e.g., Tableau, ggplot2, D3.js, matplotlib, QlikView)
IX.5C4  Identify common design principles for visualizations (e.g., Edward Tufte's The Visual Display of Quantitative Information)
IX.5C5  Presenting appropriate data visualizations for specific customers

D. Learning Objectives:
IX.5D1  Design and critique visualizations
IX.5D2  Prepare infographics and dashboards in at least one program (e.g., MATLAB, Tableau) and one programming language (e.g., R, Python)
IX.5D3  Construct streamlined analyses and highlight their implications efficiently using visualizations
IX.5D4  Produce effective visualizations that harness the human brain's innate perceptual and cognitive tendencies
IX.5D5  Explore methods of presenting complex information to enhance comprehension and analysis, and of incorporating visualization techniques into human-computer interfaces
IX.5D6  Explain the state of the art in privacy, ethics, and governance around big data and data science

