?????? Data Analytics Prof. Dr.-Ing. Lars Linsen Prof. Dr. Adalbert FX Wilhelm Fall 2015
0. Organizational Stuff
0.1 Syllabus and Organization Data Analytics 3
Course website: http://www.faculty.jacobsuniversity.de/llinsen/teaching/??????.htm (will be accessible through CampusNet)
Course description: This course provides an introduction to data analytics concepts and methods. The objective of the course is to present methods for gaining insight from data and drawing conclusions for analytical reasoning and decision making. The course starts with real-world examples. Abstracting from these examples leads to a taxonomy of data types, their characteristics, and their relations. The course covers methods for the analysis of text or document data, image data, high-dimensional data, time-series data, and geospatial data. Concepts for the analysis of hierarchical, unilateral, or bilateral relations are also taught. Data visualization methods are used for visual data representations, visual encoding, and interaction mechanisms, leading to an interactive visual analytics process. Automatic analysis components such as data transformation, aggregation, classification, clustering, and outlier detection are an integral part of the analytics process.
Lectures
Times:
- Tuesday, 9:45am-11:00am
- Thursday, 8:15am-9:30am
Location: ???
Instructors
Lars Linsen (75%)
- Office: Res I, 128
- Phone: 3196
- E-Mail: l.linsen [@jacobs-university.de]
- Office hours: by appointment
Adalbert FX Wilhelm (25%)
- Office: Res IV, 111
- Phone: 3402
- E-Mail: a.wilhelm [@jacobs-university.de]
Lectures
Week      Date     Instructor
Week 2    Sep 8    Linsen
Week 2    Sep 10   Wilhelm
Week 3    Sep 17   Linsen
Week 4    Sep 24   Linsen
Week 5    Oct 1    Linsen
Week 6    Oct 8    Linsen
Week 7    Oct 15   Linsen
Week 8    Oct 22   Wilhelm
Week 9    Oct 29   Wilhelm
Week 10   Nov 5    Wilhelm
Week 11   Nov 12   Linsen
Week 12   Nov 19   Linsen
Week 13   Nov 26   Linsen
Week 14   Dec 3    Linsen
Tuesdays: Thursday lectures end with an assignment in which the taught material is applied to a real-world problem. Students work in groups on a solution. Group compositions are intended to change over the duration of the course. Students present their solutions in the Tuesday slots.
Exams: There will be a written final examination. Date of the exam: tbd (around finals week). There will be no quizzes or midterms.
Grading
- Assignments: 60%
- Final exam: 40%
Literature
- ???
- Alexandru Telea: Data Visualization: Principles and Practice. Wellesley, Mass.: AK Peters, 1st edition, 2008.
- Matthew Ward, Georges Grinstein, Daniel Keim: Interactive Data Visualization: Foundations, Techniques, and Applications. AK Peters, 1st edition, 2010.
Goal: This course provides an introduction to data analytics concepts and methods. The objective of the course is to present methods for gaining insight from data and drawing conclusions for analytical reasoning and decision making.
Topics
- Introductory examples
- Taxonomy for data types
- Supervised and unsupervised learning
- Visual analytics
- High-dimensional data analytics
- Aggregation, clustering, and classification
- Text and document data analytics
- Image data analytics
- Relations
- Time-series data analytics
- Geospatial data analytics
1. Introductory Examples and Taxonomy
1.1 Examples for the Digital Era
Social media [LinkedIn]
Twitter
Instagram
Some challenges
- bilateral relations
- huge network
- text & document data
- image data
- time-varying data
- geospatial data
- different heterogeneous sources
Tasks
- detect hot topics
- what goes viral?
- detect trends
- detect changes over time
- detect spatio-temporal patterns
Movies online
Netflix competition
Some challenges
- massive data: 500k users, 20k movies, 100m ratings
- many factors affect ratings: actors, directors, genres
- high-dimensional data
- data incomplete
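To make the "data incomplete" challenge concrete, the following minimal sketch computes how much of the user-by-movie rating matrix would be filled given the rounded counts above, and stores a few ratings sparsely. The toy ratings, the names `ratings` and `mean_rating`, and the dict-of-dicts layout are illustrative assumptions, not part of the competition data.

```python
# Rounded counts from the slide (not the exact competition figures).
NUM_USERS = 500_000
NUM_MOVIES = 20_000
NUM_RATINGS = 100_000_000

# Fraction of the user-by-movie matrix that actually holds a rating.
density = NUM_RATINGS / (NUM_USERS * NUM_MOVIES)
print(f"filled entries: {density:.1%}")

# Hypothetical toy data: store only observed ratings, keyed by user, then movie.
ratings = {
    "user_a": {"movie_1": 5, "movie_2": 3},
    "user_b": {"movie_1": 4},
    "user_c": {"movie_3": 2},
}


def mean_rating(ratings, movie):
    """Average the observed ratings of one movie; None if nobody rated it."""
    observed = [r[movie] for r in ratings.values() if movie in r]
    return sum(observed) / len(observed) if observed else None


print(mean_rating(ratings, "movie_1"))
```

With these counts only about one in a hundred possible user-movie pairs carries a rating, which is why a sparse layout that stores observed entries only is a natural starting point.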
Tasks
- detect correlations
- understand correlations
- make predictions
(related to many other applications, cf. online selling, e.g., Amazon)
Human genome
Microarrays
Sequencing
Sequencing costs
Genome data
Genome visualization
Personalized therapy: "10 years from now, each cancer patient is going to want to get a genomic analysis of their cancer and will expect customized therapy based on that information." (Director of The Cancer Genome Atlas, Time Magazine, June 13, 2011)
Connectome (Ramón y Cajal, 1905)
Connectome workflow
Ultra-thin electron microscopy sections
Automatic reconstruction
Connectome visualization
Crime prevention
Predictive policing [sueddeutsche.de]
Predictive policing using Tableau
Internet of things
Taxi data
1.2 Big data analytics
Big Data
"Between the dawn of civilization and 2003, we only created five exabytes of information; now we're creating that amount every two days." (Eric Schmidt, Google)
What is Big Data?
- Massive data? How many exabytes?
- Everything we cannot inspect manually.
- It's not just about the amount of data; it's also about the complexity of the data.
The big V's of Big Data
The fourth paradigm
Ubiquitous data: "The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it: that's going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high-school kids, for college kids. Because now we really do have essentially free and ubiquitous data." (Hal Varian, UC Berkeley)
Job growth
Processing pipeline
1.3 Taxonomy
Data example
Taxonomy
Data samples are items with attributes. Attributes (stored in tables) can be
- quantitative: continuous numbers (real) or discrete numbers (integer),
- ordinal (ordered sets, rating), or
- nominal / categorical (unordered sets).
Operations supported by each attribute type:
- Nominal / categorical: support the = relationship (e.g., oranges, apples, ...)
- Ordinal: obey the < relationship (e.g., small < medium < large)
- Quantitative: arithmetic can be done on them (e.g., cm, kg, ...)
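The distinction between the three attribute types can be sketched in code by choosing a summary value that uses only the operations each type supports. This is an illustrative sketch, not a method from the course: the names `AttributeType`, `SIZE_ORDER`, and `summarize`, as well as the choice of mode, median, and mean as summaries, are assumptions made for the example.

```python
from enum import Enum
from statistics import mean, mode


class AttributeType(Enum):
    NOMINAL = "nominal"            # only = is meaningful
    ORDINAL = "ordinal"            # = and < are meaningful
    QUANTITATIVE = "quantitative"  # arithmetic is meaningful


# Hypothetical ordinal scale, listed from lowest to highest.
SIZE_ORDER = ["small", "medium", "large"]


def summarize(values, kind, order=None):
    """Return a central value using only operations the type supports."""
    if kind is AttributeType.NOMINAL:
        # Counting equal values needs nothing beyond equality.
        return mode(values)
    if kind is AttributeType.ORDINAL:
        # Sorting needs the order relation; take the (lower) median.
        ranked = sorted(values, key=order.index)
        return ranked[(len(ranked) - 1) // 2]
    # Quantitative values allow arithmetic, so the mean is defined.
    return mean(values)


fruit_summary = summarize(["apple", "orange", "apple"], AttributeType.NOMINAL)
size_summary = summarize(["large", "small", "medium"], AttributeType.ORDINAL, SIZE_ORDER)
length_summary = summarize([10.0, 20.0, 30.0], AttributeType.QUANTITATIVE)
print(fruit_summary, size_summary, length_summary)
```

Averaging the fruit names or the size labels would not be meaningful, which is the practical consequence of the taxonomy.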
Data dimensions
- Uni-variate
- Bi-variate
- Tri-variate
- Multi-variate
Special attributes
- Geospatial: location (longitude, latitude)
- Time-varying: attributes change values over time (time series)
- Spatio-temporal: geospatial & time-varying
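A spatio-temporal sample can be sketched as a record that carries a location, a time stamp, and a measured value; sorting such records by time yields a time series. The class name `Sample`, the helper `time_series`, and all coordinates, dates, and values below are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sample:
    """One hypothetical spatio-temporal data sample."""
    longitude: float
    latitude: float
    time: datetime
    value: float


def time_series(samples):
    """Order samples by time and return (time, value) pairs."""
    return [(s.time, s.value) for s in sorted(samples, key=lambda s: s.time)]


# Made-up measurements at a single location, given out of order.
samples = [
    Sample(8.65, 53.17, datetime(2015, 9, 10, 8, 15), 3.0),
    Sample(8.65, 53.17, datetime(2015, 9, 8, 9, 45), 1.0),
    Sample(8.65, 53.17, datetime(2015, 9, 17, 8, 15), 2.0),
]

series = time_series(samples)
values_in_time_order = [value for _, value in series]
print(values_in_time_order)
```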
Other aspects
Complex data attributes
- Text / document (cf. Twitter)
- Image (cf. Instagram)
Relations: Data samples may have unilateral relations, bilateral relations, or hierarchical relations.
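The three kinds of relations can be sketched with simple data structures. The slides do not define the terms, so this sketch assumes that a unilateral relation holds in one direction only (a directed edge), a bilateral relation holds in both directions, and a hierarchical relation is a child-to-parent mapping. All names and sample data are hypothetical.

```python
# Hypothetical directed relations: (a, b) means "a relates to b".
follows = {("ann", "bob"), ("bob", "ann"), ("cat", "ann")}


def bilateral_pairs(edges):
    """Return pairs related in both directions, each as a sorted tuple."""
    return {tuple(sorted((a, b))) for a, b in edges if a != b and (b, a) in edges}


def unilateral_edges(edges):
    """Return edges whose reverse direction is absent."""
    return {(a, b) for a, b in edges if (b, a) not in edges}


# Hypothetical hierarchy stored as child -> parent; the root maps to None.
parent = {"root": None, "genre": "root", "drama": "genre", "comedy": "genre"}


def depth(node, parent):
    """Count the steps from a node up to the root of the hierarchy."""
    steps = 0
    while parent[node] is not None:
        node = parent[node]
        steps += 1
    return steps


print(bilateral_pairs(follows))
print(unilateral_edges(follows))
print(depth("drama", parent))
```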