IST687 Applied Data Science



Similar documents
IST687 Scientific Data Management

IST359 - INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS

IST659 Database Admin Concepts & Management Syllabus Spring Location: Time: Office Hours:

IST359 INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS

Syracuse University School of Information Studies. IST553 - Information Architecture for Internet Services. Tentative Syllabus - Spring 2015

IST565 M001 Yu Spring 2015 Syllabus Data Mining

Managing Information Systems Projects (IST645 M ) School Of Information Studies Fall 2010

COMP252: Systems Administration and Networking Online SYLLABUS COURSE DESCRIPTION OBJECTIVES

CSCI-599 DATA MINING AND STATISTICAL INFERENCE

LSC 740 Database Management Syllabus. Description

H. JOHN HEINZ III COLLEGE CARNEGIE MELLON UNIVERSITY PROJECT MANAGEMENT SPRING A3 / B3 COURSE SYLLABUS

PSY 3329 Educational Psychology Online Course Spring Week Course

PCA 342B, andia<at>post.harvard.edu, (305) Spring 2015

ENGR 102: Engineering Problem Solving II

MAT Elements of Modern Mathematics Syllabus for Spring 2011 Section 100, TTh 9:30-10:50 AM; Section 200, TTh 8:00-9:20 AM

This four (4) credit hour. Students will explore tools and techniques used penetrate, exploit and infiltrate data from computers and networks.

Video Game Programming ITP 380 (4 Units)

Fundamentals of Evaluation, Measurement & Research EMR 5400

UNIVERSITY OF MASSACHUSETTS BOSTON COLLEGE OF MANAGEMENT AF Theory of Finance SYLLABUS Spring 2013

DATABASE DESIGN AND IMPLEMENTATION II SAULT COLLEGE OF APPLIED ARTS AND TECHNOLOGY SAULT STE. MARIE, ONTARIO. Sault College

through d2l Phone: Office: Ewing 240 Office Hours: Online "Office Hours": Friday 11:00-12:00

IST659 Fall 2015 M003 Class Syllabus. Data Administration Concepts and Database Management

ISM 4210: DATABASE MANAGEMENT

Colorado School of Mines Spring 2014 Principles of Economics EBGN 201

16-week online course COURSE DESCRIPTION: Business 001 COURSE TEXT: ISBN COURSE OBJECTIVES

Psychological Testing (PSYCH 149) Syllabus

ITNW 1337 Introduction to the Internet Course Syllabus: Spring 2015

METHODS OF SOCIAL RESEARCH

IST 600: Advocacy for Academic, Public, and School Libraries Course Syllabus ~~ Spring Contact Information:

CS 1361-D10: Computer Science I

PSYC 3200-C Child Psychology 3 SEMESTER HOURS

CRN: STAT / CRN / INFO 4300 CRN

JOU 3411 DESIGN SYLLABUS

IT 145 Section 300 Fall 2013 Web Design Fundamentals: HTML and Style Sheets. Syllabus and Course Outline

Forensic Biology 3318 Syllabus

COURSE SYLLABUS EDG 6931: Designing Integrated Media Environments 2 Educational Technology Program University of Florida

Northern Virginia Community College: Hybrid Course Template

ISM and 05D, Online Class Business Processes and Information Technology SYLLABUS Fall 2015

BUS 140: Legal Environment of Business. Course Description. Course Details. Student Learning Outcomes (SLOs) & Course Objectives

MT120-ES: Topics in Applied College Math (4 credits; 100% online) Syllabus Fall 2013

Course information: Copy and paste current course information from Class Search/Course Catalog.


Introduction: How does a student get started? How much time does this course require per week?

MGMT 280 Impact Investing Ed Quevedo

MSIS 635 Session 1 Health Information Analytics Spring 2014

MATH Probability & Statistics - Fall Semester 2015 Dr. Brandon Samples - Department of Mathematics - Georgia College

WAYLAND BAPTIST UNIVERSITY VIRTUAL CAMPUS SCHOOL OF BUSINESS SYLLABUS

IST 645 Managing Information Technology Projects

College of Southern Maryland Fundamentals of Accounting Practice(ACC 1015) Course Syllabus Spring 2015

MKT 363 Professional Selling & Sales Management Course Syllabus (Spring 2011)

Thursday 11:00 a.m. - 12:00 p.m. and by appointment

POLS 209: Introduction to Political Science Research Methods

IST 755 Strategic Management of Information Resources School of Information Studies, Syracuse University Spring Semester 2015

COURSE SYLLABUS PHILOSOPHY 001 CRITICAL THINKING AND WRITING SPRING 2012

uncommon thinking ORACLE BUSINESS INTELLIGENCE ENTERPRISE EDITION ONSITE TRAINING OUTLINES

BUS Computer Concepts and Applications for Business Fall 2012

MLIS 7520 Syllabus_Fall 2013 Page 1 of 6

Intro. to Data Visualization Spring 2016

JACKSONVILLE STATE UNIVERSITY DEPARTMENT OF HEALTH, PHYSICAL EDUCATION, AND RECREATION HPE 567 Sport Facility Administration and Design

MAC 1105 FLEX SYLLABUS

Math 1302 (College Algebra) Syllabus Fall 2015 (Online)

Learning Web Page: Office Hours: I can be melvin.mays@hccs.edu or

MATH 205 STATISTICAL METHODS

MATH 1314 College Algebra Departmental Syllabus

Precalculus Algebra Online Course Syllabus

CSC-570 Introduction to Database Management Systems

ISM 4403 Section 001 Advanced Business Intelligence 3 credit hours. Term: Spring 2012 Class Location: FL 411 Time: Monday 4:00 6:50

LOS ANGELES VALLEY COLLEGE MATH 275. Ordinary Differential Equations (section # units) S16. MW 9:40 11:05a MS 108

Be sure to keep a copy of this Syllabus with you for easy reference.

JEFFERSON COLLEGE COURSE SYLLABUS CIS-236 SQL AND DATABASE DESIGN. 3 Credit Hours. Prepared by: Chris DeGeare CIS Instructor. Revised: 3/11/2013

Business Statistics MATH 222. Eagle Vision Home. Course Syllabus

Office: LSK 5045 Begin subject: [ISOM3360]...

Blackboard Development Checklist for Online Courses

Central Michigan University College of Business Administration Online MBA Program. MBA 620 Online: Managerial Accounting: A Management Perspective

Course Syllabus STA301 Statistics for Economics and Business (6 ECTS credits)

Physics 21-Bio: University Physics I with Biological Applications Syllabus for Spring 2012

MKT/IBUS 4321 International Marketing

FINC 6532-ADVANCED FINANCIAL MANAGEMENT Expanded Course Outline Spring 2007, Monday & Wednesday, 5:30-6:45 p.m.

PHOENIX COLLEGE ONLINE. SBS220 Internet Marketing for Small Business

English 1302 Writing Across the Curriculum Spring 2016

Fundamentals of Computer Programming CS 101 (3 Units)

Psychology 2510: Survey of Abnormal Psychology (Section 2) Fall 2015

GRAPHIC DESIGN 1. ART 115 Course Syllabus Fontbonne University, St. Louis, MO COURSE INFORMATION COURSE DESCRIPTION COURSE OBJECTIVES PREREQUISITES

How To Pass The Cis 50 Online Course

Introduction to Graphic Design Classroom and Grading Policies

MTH 110: Elementary Statistics (Online Course) Course Syllabus Fall 2012 Chatham University

GEOG 5200S Elements of Cartography : Serving the Community Through Cartography Spring 2015

IST 611: Information Technologies in Educational Organizations. IST 611: Information Technologies in Educational Organizations SPRING 2009 SYLLABUS

CS 394 Introduction to Computer Architecture Spring 2012

IST 754: Final Project in Telecommunications Systems

ECON 351: Microeconomics for Business

Digital Production Art 3-D AET Bringing ones imagination to life has never been easier. Fall 2015 CBA TTH 9:30-11:00 AM

Transcription:

1 IST687 Applied Data Science Course: Instructor: IST687 Applied Data Science Gary Krudys Semester: E-Mail: Spring 2015 gekrudys@syr.edu Office: 114 Hinds Hall Phone: 315-857-7243 (cell) Office hours: Meeting place: Online only Meeting time: Online Catalog Description Introduces fundamentals about data and the standards, technologies, and methods for organizing, managing, curating, preserving, and using data. Discusses broader issues relating to data management and use as well as quality control and publication of data. Course Description Data scientists play important roles in the four A's of data: data architecture, data acquisition, data analysis and data archiving. A data scientist provides input to system architects on how data needs to be routed and organized to support analysis, visualization and presentation of the data to the appropriate people. Data acquisition, from a data scientist's standpoint, is a critical phase in which data are obtained in the right format, linked to other relevant data and screened to ensure appropriateness for analysis. Analysis is where the data scientists are most heavily involved: summarization of the data, using portions of data (samples) to make inferences about the larger context and visualization of the data by presenting it in tables, graphs and even animations. Finally, the data scientist must become involved in the archiving of the data. Preservation of collected data is a form that makes it highly reusable - what you might think of as "data curation" - is a difficult challenge because it is so hard to anticipate all of the future uses of the data. This course provides an overview of all four areas, as well as the opportunity to ramp up on the popular open source data science tool, the R open source statistical analysis and visualization system. R is reckoned by many to be the most popular choice among data analysts worldwide; having some knowledge and skill with using it is considered a valuable and marketable job skill for most data scientists. The course will include applied examples of data collection, processing, transformation, management, and analysis to provide students with hands-on experience. Students will have opportunities to practice the methods and tools for planning a data analysis project from data collection to the end data products and reports with data quality management and data product development in between. The evaluation of each student will arise from exercises, reports, class discussion, and a course project. Learning Objectives At the end of the course, students are expected to understand: Essential concepts and characteristics of data Scripting/code development for data management Principles and practices in data screening, cleaning, and linking

2 Sampler of technologies used to manage, manipulate, and analyze data Communication of results to decision makers At the end of the course, students are expected to be able to: Identify a problem and the data needed for addressing the problem Perform basic computational scripting using R and other optional tools Transform data through processing, linking, aggregation, summarization, and searching Organize and manage data at various stages of a project lifecycle What does it take to succeed in the course? An interest and passion in data science career in the corporate, academic, or government sector Curiosity on natural and social phenomena and basic computer skills Close familiarity with algebra, geometry, and trigonometry Basic understanding of descriptive statistics Motivation to learn and excel Textbooks: Introduction to Data Science (2013) V3, by Jeffrey M. Stanton. Available for free in the itunes bookstore or as a PDF download at http://jsresearch.net. The instructor will provide additional and supplemental readings in Blackboard as electronic documents for downloading and printing. Students are expected to read the assigned materials for discussions and coursework. Contributions to Grade The work for this class will involve a mixture of individual assignments, reports, and a final project. Exercises (11 x 4% = 44%) are designed for you to practice the necessary skills in carrying out data processing, analysis, and management tasks. o 4 = Solid / no mistakes (or really minor), well commented/documented o 3 = Good / some mistakes o 2 = Fair / some major conceptual errors o 1 = Poor / did not finish o 0 = Did not participate / did not hand in Mid-term quiz (15%) is designed to evaluate your mastery of concepts, methods, and tools in data analysis and management. Note: quiz questions are graded out of 100. Your grade will be weighted by 15% o Content Coverage: Data Science ebook chapters Overview through chapter 8 o Format: multiple choice

3 Final group project (30%): For the final project you will identify a data set or a data source, transform the data as needed, and provide an analysis with visualizations. Final group project framework and deliverables are discussed in detail via the course Blackboard site. Weekly Class Discussions (11%) includes participation in weekly class discussions on Blackboard, both as discussion leader and participant. Discussion questions should be relevant to the chapter material. Contributions should be germane to the discussion question, well thought out, substantive, clear, and concise and will be assessed on these criteria. Discussion leaders are responsible for identifying and posting a chapter relevant discussion question. Discussion leaders are also responsible for monitoring and responding to postings in their respective discussion thread. Grading Policy Each assigned work will be graded on the scale as specified for the component, which will be summed at the end of the semester. Grade levels follow the scales below: A = 95-100, A- = 90-94.9, B+ = 85-89.9, B = 80-84.9, B- = 75-79.9, C+ = 70-74.9, C = 65-69.9, C- = 60-64.9, F = below 60 An incomplete grade, I, can be given only if the circumstances preventing the on-time completion of all course requirements were clearly unforeseeable and uncontrollable. If an incomplete is required a written contract must be completed which specifies the nature of the missing work, the date it will be completed, and the default grade that will be given if that deadline is missed. It is unethical to allow some students additional opportunities, such as extra credit assignments, without allowing the same options to all students. To discuss a grade, arrange for a private meeting in which you identify the sources of your concern. It is important to bring with you to that meeting the relevant materials (e.g., marked papers). Except for extraordinary circumstances, no appeal for an individual assignment or project will be considered later than two weeks after the graded assignment was returned. For final grades, no appeal will be considered after one week of final project submission date. Preferred contact methods: If you have a question that you don t mind sharing with the class, post in the Questions to the Professor discussion area. If you have a more private matter, you can email me at my syr.edu address: gekrudys@syr.edu Expect responses within 48 hours, Monday through Friday, primarily before 8:00a EST and after 5:00p EST. Occasionally I will be able to respond during normal business day hours of 8:00a EST 5:00p EST.. My response hours may seem a bit odd primarily because I work full time as a Sr. Business Intelligence Analyst for a Regional Health Information Organization. Academic Integrity The academic community of Syracuse University and of the School of Information Studies requires the highest standards of professional ethics and personal integrity from all members of the community.

4 Violations of these standards are violations of a mutual obligation characterized by trust, honesty, and personal honor. As a community, we commit ourselves to standards of academic conduct, impose sanctions against those who violate these standards, and keep appropriate records of violations. The academic integrity statement can be found at: http://www.ist.syr.edu/courses/advising/integrity.asp Student with Disabilities In compliance with section 504 of the Americans with Disabilities Act (ADA), Syracuse University is committed to ensure that no otherwise qualified individual with a disability shall, solely by reason of disability, be excluded from participation in, be denied the benefits of, or be subjected to discrimination under any program or activity If you feel that you are a student who may need academic accommodations due to a disability, you should immediately register with the Office of Disability Services (ODS) at 804 University Avenue, Room 308 3rd Floor, 315.443.4498 or 315.443.1371 (TTD only). ODS is the Syracuse University office that authorizes special accommodations for students with disabilities.

5 IST687 spring 2015 schedule as of 11.17.14 Unit/ Week 0 1/5/15 Topics Readings Activities/Assignments Course Introduction Syllabus Walk Through Course navigation Subject content Exercises/Assignments Discussion threads Grading Course communication Introduction Lecture Blackboard Complete and post Business Profile Access R-Bloggers site http://www.r-bloggers.com/ Follow instructions to subscribe to daily newsletter 1 1/12/15 Data Science Overview Overview to Data Science, pp. ii-7 Section 1: Data Science, Many Skills Install the R open source software package on your computer. Submit a screen shot of R to the instructor. 2 1/19/15 3 1/26/15 Essential concepts of data. Intro to DS, pp. 8-13 Chapter 1: About Data Using R to manipulate data. Intro to DS, pp. 14-23 Chapter 2: Identifying Data Problems; Chapter 3: Getting Started with R Exercise 1: Boolean logic. Install RStudio Submit a screen shot of RStudio control panels 4 2/2/15 Data frames and other data structures Intro to DS, pp. 24-36 Chapter 4: Follow the Data; Chapter 5: Rows and Columns Exercise 2: myfamily data structure Exercise 3: Manipulating data frames 5 2/9/15 Descriptive Statistics, Distributions, and Histograms Intro to DS, pp. 37-48 Chapter 6: Beer, Farms, and Peas Exercise 4: Generating distributions 6 2/16/15 Introduction to sampling and inferential statistics Intro to DS, pp. 49-59 Chapter 7: Sample in a Jar Exercise 5: Decision Making Thresholds 7 2/23/15 Identifying a data source for your final project Intro to DS, pp. 60-64 Chapter 8: Big Data: Big Deal Midterm Quiz 8 3/23/15 Code creation with RStudio Intro to DS, pp. 65-75 Chapter 9: Onward with R Studio Exercise 6: Code your own MySamplingDistribution() 3/9/15 Semester Break Semester Break Semester Break

6 9 3/16/15 Connecting with external data sources Intro to DS, pp. 132-147 Chapter 14: Storage Wars Exercise 7: Running SQL queries from R Submit 1 page proposal for the dataset or data source you plan to use for your final project. 10 3/23/15 Working with map data Intro to DS, pp. 148-159 Chapter 15: Map Mash-Up) Exercise 8: Color code map points 11 3/30/15 Linear modeling Intro to DS, pp. 160-169 (Chapter 16: Line Up, Please Exercise 9: Evaluating linear models Submit 2 page proposal outlining data analysis plan. 12 4/6/15 Association Rules Mining Intro to DS, pp. 170-182 Chapter 17: Hi Ho, Hi Ho-Data Mining We Go Exercise 10: Apply association rules mining to a novel data set 13 4/13/15 Support Vector Machines Intro to DS, pp. 183-193 Chapter 18: What s your vector, Victor? Exercise 11: Apply support vector analysis to a novel data set Submit 3 page project report describing results of data screening, cleaning, and linking. 14 4/20/15 Final project problem diagnosis and support No reading No exercise 15 4/27/15 Final project problem diagnosis and support No reading Final project submissions due.

7