DATA SCIENCE ADVISING NOTES
David Wild - updated May 2015

GENERAL NOTES

Lots of information can be found on the website at http://datascience.soic.indiana.edu. Dr David Wild, Data Science Graduate Program Director, is the default advisor for all students. Email questions regarding advising should be sent to dsadvise@indiana.edu.

You can book an advising session in semester time (starting fall 2015) with David Wild on Tuesdays or Wednesday mornings at https://djwild.youcanbook.me/. Residential students should meet in Informatics West 207 (901 E 10th St); online students should provide a Skype or Google Hangouts username for an online session.

PATHS AND TRACKS

There is understandably some confusion about Paths and Tracks.

A path is not a formal part of your degree, but a guide to help you choose courses which are right for you. Courses are classed into decision maker or technical paths. Decision maker courses are focused on the skills needed by data science decision makers, such as utilization of data science techniques, social factors, and domain-specific applications. Technical courses are focused on the technical side of data science, often requiring strong programming skills. You do not have to declare a path for your degree, and being on one path doesn't mean you can't take courses from another path. In the course listings below, we tag the classes with paths (note that these tags are currently provisional, and some may change).

A track is a more formal specialization, and the track determines which courses you can take. The default is the general track, in which you can pick whichever courses you wish. Currently we have just one specialization track, computational & analytic, the requirements of which are listed at the end of this document. We intend to add more tracks in the future.

ONLINE COURSES - upcoming semester classes highlighted in green

Online classes are also available to residential students.
Some online classes have physical classroom meetings for residential students; others are held completely online.

Columns: Course; Next Class (O = Online, R = Residential); Instructor; Path / Specialization Track; Notes

INFO I590 Data Science in Drug Discovery, Health and Translational Medicine
    Instructor: Wild
    Website: http://dsdht.wikispaces.com
    Prerequisites: Ability to perform basic statistical tasks in R; conceptual understanding of machine learning.

INFO I590 Management, Access, and Use of Big and Complex Data
    Next class: #34881 (O), #34879 (R)
    Instructor: Plale &

INFO I590 Big Data Applications and Analytics
    Instructor: Fox
    Notes: Some programming experience required, Python preferred.

INFO I590 Big Data Open Source Software and Projects
    Next class: #34717 (O), #15590 (R)
    Instructor: Fox
    Notes: Familiarity with scripting languages such as Linux shell and especially Python. Knowledge of Java helpful but not required.

CSCI B649 Cloud Computing for Data Intensive Sciences
    Instructor: Qiu
    Notes: This is a programming-intensive course. It has similar requirements to the CS graduate-level residential version. Students are expected to have weekly (or biweekly) programming homework. General programming experience with Windows or Linux using Java (2-3 years) and scripts is required. A background in parallel and cluster computing is a plus, although not necessary.

CSCI B649 High Performance Computing
    Instructor: Sterling
    Notes: Intermediate C/C++ experience; familiarity with Linux/Unix command-line utilities.

ILS Z636 Data Semantics
    Next class: #34732 (O), #11972 (R)
    Instructor: Ding
    Notes: Basic knowledge of HTML and XML is necessary. Basic knowledge of Java can be helpful.

ILS Z637 Information Visualization
    Instructor: Börner

ILS Z604 Social and Organizational Informatics of Big Data
    Next class: #33198 (O), #33197 (R)
    Instructor: Rosenbaum & Fichman

RESIDENTIAL COURSES - upcoming semester classes highlighted in green

Columns: Course; Next Class; Instructor; Path / Specialization Track; Notes

CSCI B503: Algorithms Design and Analysis  #7461  Ergun
CSCI B534: Distributed Systems
CSCI B551: Elements of Artificial Intelligence  #3171  Leake
CSCI B552: Knowledge-Based Artificial Intelligence
CSCI B553: Neural and Genetic Approaches to Artificial Intelligence
CSCI B555: Machine Learning  #30744  White
CSCI B561: Advanced Database Concepts  #12330 or #3172  Zhang
CSCI B565: Data Mining  #35008
CSCI B649: Advanced Topics in Privacy  Dalkilic
CSCI B652: Computer Models of Symbolic Learning
CSCI B656: Web Mining
CSCI B659: Information Theory and Inference
CSCI B661: Database Theory and System Design
CSCI B662: Database Systems & Internal Design
CSCI B669: Topics in Database and Information Systems: Scientific Data Management and Preservation
INFO I519: Introduction to Bioinformatics  #8668  Ye
INFO I520: Security For Networked Systems  #31306  Camp
INFO I525: Organizational Informatics and Economics of Security
INFO I529: Machine Learning in Bioinformatics?
INFO I533: Systems & Protocol Security & Information Assurance
INFO I573: Programming for Science Informatics
Complex Networks and their Applications
Applied Machine Learning
Complex Systems
Mining the Social Web  #33588  Ferrara
Relational Probabilistic Models
INFO I590: Visual Analytics
CSCI P536: Advanced Operating Systems
CSCI P538: Computer Networks
ILS Z511: Database Design  #6149  Bourlai
ILS Z534: Information Retrieval: Theory and Practice  #14383 or #15713  Liu / Guo
ILS Z604: Topics in Library and Information Science: Data Curation
ILS Z604: Topics in Library and Information Science: Scholarly Communication
ILS Z604: Topics in Library and Information Science: Big Data Analysis for Web and Text
ILS Z605: Internship in Library and Information Science  #6154  Fichman  To be arranged with faculty advisor
ILS Z652: Digital Libraries  #7433  Walsh
STAT S520: Intro to Statistics  #13627  Luen
STAT S670: Exploratory Data Analysis
STAT S675: Statistical Learning & High-Dimensional Data Analysis  #8932 King  #14446 Trosset
STAT S681: Statistical Network Analysis

COMPUTATIONAL & ANALYTIC TRACK REQUIREMENTS

Requirements for the track:

1. A student must take at least 3 courses (9 credits) from Category 1 Core Courses. CSCI B503 is required.
2. A student must take at least 2 courses from Category 2 Data Systems AND at least 2 courses from Category 3 Data Analysis. Courses taken in Category 1 can be double counted if they are also listed in Category 2 or Category 3.
3. A student must take at least 3 courses from Category 2 Data Systems OR at least 3 courses from Category 3 Data Analysis. Again, courses taken in Category 1 can be double counted if they are also listed in Category 2 or Category 3.

Category 1: Core Courses

CSCI B503 Analysis of Algorithms (Data Analysis and Statistics) REQUIRED
CSCI B555 Machine Learning OR INFO I590 Applied Machine Learning (Data Lifecycle)
CSCI B561 Advanced Database Concepts (Data Management and Infrastructure)
STAT S520 Introduction to Statistics OR (New Course) Probabilistic Reasoning (Data Analysis and Statistics)

Category 2: Data Systems

CSCI B534 Distributed Systems (Data Management and Infrastructure)
CSCI B561 Advanced Database Concepts (Data Management and Infrastructure)
CSCI B662 Database Systems & Internal Design (Data Management and Infrastructure)
CSCI B649 Cloud Computing (Data Management and Infrastructure)
CSCI B649 Advanced Topics in Privacy (Data Management and Infrastructure)
CSCI P538 Computer Networks (Data Management and Infrastructure)
INFO I533 Systems & Protocol Security & Information Assurance (Application areas)
ILS Z534 Information Retrieval: Theory and Practice (Data Analysis and Statistics)

Category 3: Data Analysis

CSCI B565 Data Mining (Data Analysis and Statistics)
CSCI B555 Machine Learning (Data Lifecycle)
INFO I590 Applied Machine Learning (Data Lifecycle)
INFO I590 Complex Networks and Their Applications (Data Management and Infrastructure)
STAT S520 Introduction to Statistics (Data Analysis and Statistics)

Category 4: Elective Courses

CSCI B551 Elements of Artificial Intelligence (Data Lifecycle)
CSCI B553 Probabilistic Approaches to Artificial Intelligence (Data Analysis and Statistics)
CSCI B659 Information Theory and Inference (Data Analysis and Statistics)
CSCI B661 Database Theory and Systems Design (Data Management and Infrastructure)
INFO I519 Introduction to Bioinformatics (Application areas)
INFO I520 Security For Networked Systems (Data Management and Infrastructure)
INFO I529 Machine Learning in Bioinformatics (Application areas)
INFO I590 Relational Probabilistic Models (Data Analysis and Statistics)
ILS Z637 Information Visualization (Data Analysis and Statistics)
All courses from STAT that are 600 and above (Data Analysis and Statistics)
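
The three track requirements listed earlier in this section can be checked mechanically against a planned course list. The following is a minimal Python sketch, not an official degree audit. The names CORE_SLOTS, DATA_SYSTEMS, DATA_ANALYSIS and check_track are hypothetical, and the sketch assumes that two Category 1 courses joined by "OR" fill a single core slot; confirm that reading with your advisor. Courses are identified by code plus title because several share a code (for example CSCI B649 and INFO I590).

```python
# Illustrative sketch only: all names here are hypothetical and not part of
# any official advising tool. Course strings follow the category lists.

# Category 1. ASSUMPTION: courses joined by "OR" fill a single core slot.
CORE_SLOTS = [
    {"CSCI B503 Analysis of Algorithms"},
    {"CSCI B555 Machine Learning", "INFO I590 Applied Machine Learning"},
    {"CSCI B561 Advanced Database Concepts"},
    {"STAT S520 Introduction to Statistics", "(New Course) Probabilistic Reasoning"},
]

# Category 2: Data Systems
DATA_SYSTEMS = {
    "CSCI B534 Distributed Systems",
    "CSCI B561 Advanced Database Concepts",
    "CSCI B662 Database Systems & Internal Design",
    "CSCI B649 Cloud Computing",
    "CSCI B649 Advanced Topics in Privacy",
    "CSCI P538 Computer Networks",
    "INFO I533 Systems & Protocol Security & Information Assurance",
    "ILS Z534 Information Retrieval: Theory and Practice",
}

# Category 3: Data Analysis
DATA_ANALYSIS = {
    "CSCI B565 Data Mining",
    "CSCI B555 Machine Learning",
    "INFO I590 Applied Machine Learning",
    "INFO I590 Complex Networks and Their Applications",
    "STAT S520 Introduction to Statistics",
}


def check_track(taken):
    """Return a pass/fail flag for each of the three track requirements."""
    taken = set(taken)
    # Count core slots with at least one course taken.
    core_slots_filled = sum(1 for slot in CORE_SLOTS if slot & taken)
    # Core courses that also appear in Category 2 or 3 are double counted,
    # which falls out naturally from plain set intersection.
    systems = len(taken & DATA_SYSTEMS)
    analysis = len(taken & DATA_ANALYSIS)
    return {
        "rule_1": "CSCI B503 Analysis of Algorithms" in taken
        and core_slots_filled >= 3,
        "rule_2": systems >= 2 and analysis >= 2,
        "rule_3": systems >= 3 or analysis >= 3,
    }


if __name__ == "__main__":
    plan = {
        "CSCI B503 Analysis of Algorithms",
        "CSCI B555 Machine Learning",
        "CSCI B561 Advanced Database Concepts",
        "CSCI B534 Distributed Systems",
        "CSCI B565 Data Mining",
        "STAT S520 Introduction to Statistics",
    }
    for rule, ok in check_track(plan).items():
        print(rule, "satisfied" if ok else "NOT satisfied")
```

In the sample plan, CSCI B561 counts toward both the core and Data Systems, and CSCI B555 and STAT S520 count toward both the core and Data Analysis, which is the double counting that requirements 2 and 3 allow.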