CSE 6040 Computing for Data Analytics: Methods and Tools. Lecture 1 Course Overview

1 CSE 6040 Computing for Data Analytics: Methods and Tools
Lecture 1: Course Overview
Da Kuang, Polo Chau
Georgia Tech, Fall 2014
Fall 2014 CSE 6040 COMPUTING FOR DATA ANALYSIS 1

2 Course Staff
Instructor: Da Kuang
  Postdoctoral Researcher, CSE
  Office: Klaus 1305 (facing the kitchen door)
  Office hour: Thu 4-5pm, Klaus 1315
Instructor: Duen Horng (Polo) Chau
  Assistant Professor, CSE
  Office hour: Thu 4-5pm, Klaus 1315
TA: Lianxiao (Shawn) Qiu
  MS CS Student
  Office hour: Mon 1-2pm, Klaus 2108

3 MS Analytics Curriculum
Axis: Introductory to Advanced
Computing:
  Computing for Data Analysis: Methods and Tools
  Data and Visual Analytics
  Computational Data Analysis
  High Performance Computing
Statistics/Optimization:
  Introduction to Analytical Methods
  Regression Analysis
  Deterministic Optimization
  Probabilistic Models
  Data Mining and Statistical Learning
  Simulation
  Time Series Analysis
Business:
  Introduction to Business for Analytics
  Risk Analytics
  Project Management
  Pricing Analytics and Revenue Management
  Business Process Analysis and Design
  Customer Relationship Management

4 Data Analytics Problems
Regression: Predicting a numerical variable
[Figure: y-axis is the number of new homes sold in the US; shaded areas indicate US recessions]
[Hal Varian, Predicting the present with search engine data, 2013]

5 Data Analytics Problems
Regression: Predicting a numerical variable
Predictors: search frequencies on Google
Target variable: number of new homes sold in the US
[Hal Varian, Predicting the present with search engine data, 2013]
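As a rough illustration of regression with a single predictor, the sketch below fits a least-squares line in plain Python. The numbers are made up for the example (they are not Varian's data), and the names fit_line, search_index, and homes_sold are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: a search-frequency index (predictor) and home sales (target).
search_index = [1.0, 2.0, 3.0, 4.0, 5.0]
homes_sold = [3.0, 5.0, 7.0, 9.0, 11.0]  # constructed as 2 * x + 1

slope, intercept = fit_line(search_index, homes_sold)

# Predict the target for a new, unseen predictor value.
predicted = slope * 6.0 + intercept
```

With real data the points would not lie exactly on a line, and the fitted line would minimize the sum of squared prediction errors.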

6 Data Analytics Problems
Classification: Predicting a categorical variable (or its probability)
Examples: query classification, news classification, statistical machine translation
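One way to picture "predicting a categorical variable (or its probability)" is a logistic model, which turns a weighted score into a probability and then thresholds it. The sketch below is only an illustration: the weights, the bias, and the two class names are hand-picked assumptions, not learned from any data.

```python
import math


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class under a logistic model."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


# Hypothetical, hand-picked parameters (not learned from data).
weights = [1.5, -2.0]
bias = 0.5

p = predict_proba(weights, bias, [1.0, 1.0])  # score = 0.5 + 1.5 - 2.0 = 0.0

# Turn the probability into a categorical prediction with a 0.5 threshold.
label = "sports" if p >= 0.5 else "politics"
```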

7 Data Analytics Problems
Clustering: Finding patterns without human labeling
Both topic modeling and recommender systems can be viewed as clustering problems.
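To make "finding patterns without human labeling" concrete, here is a minimal k-means sketch on one-dimensional points in plain Python. The data and the starting centers are made up, and kmeans_1d is a hypothetical helper; a library implementation would handle initialization and convergence more carefully.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / float(len(c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Made-up data with two visible groups; no labels are given to the algorithm.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
```

The two centers end up near the means of the two groups, even though the algorithm was never told which point belongs where.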

8 Data Analytics Pipeline
Data storage/retrieval
Data collection
Data analysis
Data visualization

9 Data Analytics Pipeline
Data storage/retrieval: sqlite
Data collection: Scrapy, Selenium, BeautifulSoup
Data analysis: numpy, pandas, scikit-learn
Data visualization: igraph, bokeh
The names after each stage are Python packages.
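A small sketch of the storage/retrieval stage, using Python's built-in sqlite3 module with an in-memory database. The table name, column names, and rows are invented for illustration only.

```python
import sqlite3

# An in-memory database; pass a filename instead to persist data on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (business TEXT, stars INTEGER)")

# Made-up rows standing in for data gathered in the collection stage.
rows = [("Cafe A", 5), ("Cafe A", 3), ("Diner B", 4)]
conn.executemany("INSERT INTO reviews VALUES (?, ?)", rows)
conn.commit()

# Retrieval: average rating per business, in alphabetical order.
averages = conn.execute(
    "SELECT business, AVG(stars) FROM reviews "
    "GROUP BY business ORDER BY business"
).fetchall()
conn.close()
```

The query result is a list of (business, average) tuples that can be handed to the analysis or visualization stage.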

11 What you will learn in this course
Python programming (and a little bit of Java and Matlab): 4 lectures
  One of Google's 3 main languages
Python packages
  Data collection: 2 lectures
  Data storage and retrieval: 1 lecture
  Data analysis
  Data visualization: 2 lectures
Basic linear algebra (math tools, matrices, etc.): 2 lectures
Basic numerical computing (how to do math programmatically): 4 lectures
Several fundamental machine learning algorithms (focusing on intuitive ideas and software development for them)
  Linear regression: 2 lectures
  Logistic regression: 1 lecture
  K-means: 2 lectures
  Singular value decomposition: 4 lectures
(More detailed topics are in the online tentative syllabus.)

12 Logistics
Course website (with tentative schedule; slides and assignments will be posted here):
Discussion, Q&A, find teammates on Piazza (please sign up):
Homework/Project submissions on T-square (only for submission; use Piazza for discussion):

13 Logistics
3 homework assignments (30%)
Mid-term (20%)
Project (40%): more details coming soon!
Class and Piazza participation (10%)
No late homework allowed.
Start now to find project teammates: 2-3 people per team

14 What you will do in this course
Attend the lectures
ACTIVELY participate in class discussion
  Based on both in-class and Piazza activities
  Chat with / help out your classmates on Piazza (but DO NOT share your answers)
  10% of your grade
Read tutorials/references for programming languages
Read documentation for software packages
Solve simple math problems
  Included in the mid-term: 20% of your grade

15 What you will do in this course (cont'd)
Coding, of course!
  Homework #1: Collect real data online (10%)
  Homework #2: Visualize the data you collected (10%)
  Homework #3: Implement a machine learning algorithm (10%)
  Play with different machine learning frameworks/packages
Project (40%): Work on the Yelp Dataset
  Data for five cities (US, Canada, UK); four of them just released this month
  Includes businesses, attributes, check-ins, tips, users, user connections, reviews
  Work in teams of 2-3 students
  Get inspired: (Again, in Python!) (DO NOT copy these examples for your project)
  Write your own team proposal
  (Optional) Enter the challenge ($5K prize): Round 4 through Dec 31, 2014

16 Course Expectation
You will never say "I don't have data."
You will be exposed to the entire lifecycle of data analytics (in a simplified way).
You will be able to code in Python, a common scripting language for data analytics that is employed by many companies (e.g., one of Google's 3 main languages), and you will have experience with many useful packages.
You will know some of the most fundamental machine learning algorithms. If you already know them, you will gain a deeper understanding of them from the computational aspect.
Hopefully, you will know how to write fast code for data analytics.

17 Why Python?
One of Google's 3 main languages
Simpler code: Focus on concepts rather than machine details
More readable
Many useful packages
  Data manipulation
  Machine learning
  Image processing
  Natural language processing
  Spatial analysis
  Web application
  ...
Reasonably fast
Easier to parallelize than C++

18 Python Setup
A text editor + a terminal (command-line window)
  This is the convention for (Python) developers in companies
Text editor suggestions:
  Windows: Notepad++ (open source, with auto-indent and auto-fill)
  Linux: Vim, Emacs, Sublime
  Mac: Sublime, TextWrangler
We use Python 2.7, NOT the latest version 3.x
  Many packages support Python 2.x only

19 Go Jackets!
Everyone: Sign up on Piazza:
Windows users: Install Python on your own machine:
  Make sure it's Python 2.7.8, NOT Python 3.x
  Make sure python can be called on the command line (you may need to set up environment variables)
  Make sure the Python27 directory is located in a root directory, NOT in Program Files
Everyone: Set up your development environment
  See
Everyone: Download your own Yelp dataset: (423M tarball)
  We cannot share it by the terms and conditions
  Tip: Save the page that contains the Download Data button for future use
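One possible way to confirm which interpreter the python command starts is to inspect sys.version_info. The helper below is a hypothetical sketch, not part of the course materials; it is written so that it runs under both Python 2.7 and 3.x.

```python
import sys


def is_python_27(version_info=sys.version_info):
    """Return True if the given version tuple is Python 2.7.x."""
    return tuple(version_info[:2]) == (2, 7)


# Human-readable version of the interpreter running this script.
running_version = "%d.%d.%d" % tuple(sys.version_info[:3])

# True only when this script is run by a Python 2.7.x interpreter.
running_27 = is_python_27()
```

Running python --version in the terminal gives the same information without writing any code.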


More information

Data Science Certificate Program

Data Science Certificate Program Information Technologies Programs Data Science Certificate Program Accelerate Your Career extension.uci.edu/datascience Offered in partnership with University of California, Irvine Extension s professional

More information

Department of Electrical and Electronic Engineering, California State University, Sacramento

Department of Electrical and Electronic Engineering, California State University, Sacramento Department of Electrical and Electronic Engineering, California State University, Sacramento Engr 17 Introductory Circuit Analysis, graded, 3 units Instructor: Tatro - Spring 2016 Section 2, Call No. 30289,

More information

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Hadoop MapReduce and Spark Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Outline Hadoop Hadoop Import data on Hadoop Spark Spark features Scala MLlib MLlib

More information

UNIVERSITY OF LETHBRIDGE FACULTY OF MANAGEMENT. Management 3830 - Contemporary Database Applications (Using Access)

UNIVERSITY OF LETHBRIDGE FACULTY OF MANAGEMENT. Management 3830 - Contemporary Database Applications (Using Access) UNIVERSITY OF LETHBRIDGE FACULTY OF MANAGEMENT Management 3830 - Contemporary Database Applications (Using Access) Term: Spring 2014 Instructor: Brian Dobing, Room M-4053, 329-2492, brian.dobing@uleth.ca

More information

MSCA 31000 Introduction to Statistical Concepts

MSCA 31000 Introduction to Statistical Concepts MSCA 31000 Introduction to Statistical Concepts This course provides general exposure to basic statistical concepts that are necessary for students to understand the content presented in more advanced

More information

CSE454 Project Part4: Dealer s Choice Assigned: Monday, November 28, 2005 Due: 10:30 AM, Thursday, December 15, 2005

CSE454 Project Part4: Dealer s Choice Assigned: Monday, November 28, 2005 Due: 10:30 AM, Thursday, December 15, 2005 CSE454 Project Part4: Dealer s Choice Assigned: Monday, November 28, 2005 Due: 10:30 AM, Thursday, December 15, 2005 1 Project Description For the last part of your project, you should choose what to do.

More information

COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK DEPARTMENT OF INDUSTRIAL ENGINEERING AND OPERATIONS RESEARCH

COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK DEPARTMENT OF INDUSTRIAL ENGINEERING AND OPERATIONS RESEARCH Course: IEOR 4575 Business Analytics for Operations Research Lectures MW 2:40-3:55PM Instructor Prof. Guillermo Gallego Office Hours Tuesdays: 3-4pm Office: CEPSR 822 (8 th floor) Textbooks and Learning

More information

Session 85 IF, Predictive Analytics for Actuaries: Free Tools for Life and Health Care Analytics--R and Python: A New Paradigm!

Session 85 IF, Predictive Analytics for Actuaries: Free Tools for Life and Health Care Analytics--R and Python: A New Paradigm! Session 85 IF, Predictive Analytics for Actuaries: Free Tools for Life and Health Care Analytics--R and Python: A New Paradigm! Moderator: David L. Snell, ASA, MAAA Presenters: Brian D. Holland, FSA, MAAA

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

CAS CS 565, Data Mining

CAS CS 565, Data Mining CAS CS 565, Data Mining Course logistics Course webpage: http://www.cs.bu.edu/~evimaria/cs565-10.html Schedule: Mon Wed, 4-5:30 Instructor: Evimaria Terzi, evimaria@cs.bu.edu Office hours: Mon 2:30-4pm,

More information

Big-data Analytics: Challenges and Opportunities

Big-data Analytics: Challenges and Opportunities Big-data Analytics: Challenges and Opportunities Chih-Jen Lin Department of Computer Science National Taiwan University Talk at 台 灣 資 料 科 學 愛 好 者 年 會, August 30, 2014 Chih-Jen Lin (National Taiwan Univ.)

More information

MS1b Statistical Data Mining

MS1b Statistical Data Mining MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

More information

CS 378: Computer Game Technology

CS 378: Computer Game Technology CS 378: Computer Game Technology http://www.cs.utexas.edu/~fussell/courses/cs378/ Spring 2013 University of Texas at Austin CS 378 Game Technology Don Fussell Instructor and TAs! Instructor: Don Fussell!

More information

College/School: College of Science Department: Forensics Science Program Submitted by: Jason Kinser Ext: 3-3785 Email: jkinser@gmu.

College/School: College of Science Department: Forensics Science Program Submitted by: Jason Kinser Ext: 3-3785 Email: jkinser@gmu. Course Approval Form For approval of new courses and deletions or modifications to an existing course. More information is located on page 2. Action Requested: Course Level: X Create new course Delete

More information

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate

More information

ECE 297 Design and Communication. Course Syllabus, January 2015

ECE 297 Design and Communication. Course Syllabus, January 2015 ECE 297 Design and Communication Course Syllabus, January 2015 Lecturers and Office Hours: Design Communication Lecturer Vaughn Betz Ken Tallman Office Location 311 Engineering Annex Sanford Fleming, SF

More information

1.00 Lecture 1. Course information Course staff (TA, instructor names on syllabus/faq): 2 instructors, 4 TAs, 2 Lab TAs, graders

1.00 Lecture 1. Course information Course staff (TA, instructor names on syllabus/faq): 2 instructors, 4 TAs, 2 Lab TAs, graders 1.00 Lecture 1 Course Overview Introduction to Java Reading for next time: Big Java: 1.1-1.7 Course information Course staff (TA, instructor names on syllabus/faq): 2 instructors, 4 TAs, 2 Lab TAs, graders

More information

Course Title: Advanced Topics in Quantitative Methods: Educational Data Science Practicum

Course Title: Advanced Topics in Quantitative Methods: Educational Data Science Practicum COURSE NUMBER: APSTA- GE.2017 Course Title: Advanced Topics in Quantitative Methods: Educational Data Science Practicum Number of Credits: 2 Meeting Pattern: 3 hours per week, 7 weeks; first class meets

More information

Business Analytics Syllabus

Business Analytics Syllabus B6101 Business Analytics Fall 2014 Business Analytics Syllabus Course Description Business analytics refers to the ways in which enterprises such as businesses, non-profits, and governments can use data

More information

University of Washington, Tacoma TCSS 360 (Software Development and Quality Assurance Techniques), Spring 2005 Handout 1: Course Syllabus

University of Washington, Tacoma TCSS 360 (Software Development and Quality Assurance Techniques), Spring 2005 Handout 1: Course Syllabus University of Washington, Tacoma TCSS 360 (Software Development and Quality Assurance Techniques), Spring 2005 Handout 1: Course Syllabus Contact information: name: Marty Stepp email: stepp AT u washington

More information

How To Learn Data Analytics

How To Learn Data Analytics COURSE DESCRIPTION Spring 2014 COURSE NAME COURSE CODE DESCRIPTION Data Analytics: Introduction, Methods and Practical Approaches INF2190H The influx of data that is created, gathered, stored and accessed

More information

BIOM611 Biological Data Analysis

BIOM611 Biological Data Analysis BIOM611 Biological Data Analysis Spring, 2015 Tentative Syllabus Introduction BIOMED611 is a ½ unit course required for all 1 st year BGS students (except GCB students). It will provide an introduction

More information