Name of Module: Big Data
ECTS: 6
Module-ID:
Person Responsible for Module (Name, Mail address): Angel Rodríguez, arodri@fi.upm.es
University: UPM
Departments: DATSI, DLSIIS

1. Prerequisites for Participation
According to the general prerequisites for ICT KIC master programmes, this is the first course for students enrolled in the DS Master Programme. Students must have completed their degree project and should also have participated in the Freshers' Week.

2. Intended Learning Outcomes
After finishing the course, students will:
Be capable of processing and analysing massive data.
Be acquainted with how to apply computational data analysis techniques in a specific field of science or engineering.
Be acquainted with visual analytics techniques.

3. Content
Description of main issues to be covered:

Introduction and basics
o Data value chain
o CRISP-DM process
o Architectures and applications
o Data typologies
o Visual analytics

Data storage
o Data services basics and technologies
o Non-relational (NoSQL) data structure models
o Specialized models for data typologies
o Data storage performance improvement technologies
o Massive retrieval and processing technologies

Data analytics
o Big data
o Data types
o Data mining problem types
o Preprocessing of data mining project data
o Scalable data analysis

Information display
o Basics of information display
o Data abstractions
o Task abstractions
o Visual coding techniques
o Design of representation methods
o Interactive techniques
o Analysis of example systems
o General design principles

4. Teaching and Learning Methods
Teaching and learning approach:

Theory classes: During a theory class or lecture, lecturers give an oral presentation of the subject matter under consideration, providing students with essential, structured information from different sources for specific predefined goals (motivate students, present the contents of a topic, explain knowledge, give theoretical proofs, report experience, etc.). Apart from the oral presentation, they may use audiovisual teaching resources, documents, etc.

Problem-solving classes: This teaching method is used to round out theory classes (lectures) and is based on asking students to develop solutions suited to a particular purpose by running routines, applying formulae or algorithms, enacting information transformation procedures and interpreting results. The main aim is for students to apply what they have learned in order to improve their comprehension of both the importance and the content of a new topic, and to consolidate and apply knowledge and strategies in any practical situations that are raised.

Practicals: The aim is for students to complete medium-sized software development projects from beginning to end. Students shall work on a document containing the detailed description of the functional specifications to be met by the project. The final product must pass an exhaustive set of functional tests.

Independent work: These are activities that students must undertake individually without supervision from instructors, although they will receive feedback and support through non-scheduled tutorials. The main aim is to develop students' self-learning capability.

Group work: These are activities in which several students must perform a particular task or project as a group. Apart from the inherent complexity of the actual project, group work requires the students to divide the project into parts and manage the development of each part.
Tutorials: Personalized attention by means of a series of scheduled meetings targeting very small groups of students where they can interact with each other and with the instructor.

ECTS distribution (6 ECTS)
Individual/group work: 6

5. Assessment and Grading Procedures
The student assessment will be based on two main sources:
Development and presentation of individual work (80%) (projects)
Written exam (20%) (understanding of basic concepts)

The course shall be assessed by means of two types of tests:
a) Course examination. At the end of the course, an examination will be set on all of the course content.
b) Practical assignments. The assignment statements shall be presented in the assigned classroom on the dates specified in the course schedule, during the regular course timetable. These projects shall be developed both face-to-face in practical classes in the laboratory and as homework, using the resources offered by the Computer Centre for this purpose and with support during tutorials to clear up any questions related to project development. The submission dates shall be staggered throughout the course and shall be published in advance on the course web page.

The final course grade shall be calculated with the examination worth 20% and the average grade of the practical projects worth 80%. In order to pass the course, students must achieve a final grade of 5 or more and a grade of at least 4 in each of the two parts.

6. Workload calculation (contact hours, homework, exam preparation, ...)
36 hours for lectures
28 hours for project lectures
47 hours for individual project work
30 hours for exam preparation (including elevator pitch preparation)
2 hours for exam
Personal tuition will be offered to students or teams (average 1 hour/week)

7. Frequency and dates
This course shall be organised during the first semester of the 1st year.
4 hours per week

8. Max. Number of Participants
The course is limited to a maximum of 30 students.
Students shall be divided into teams of 2-3 (depending on the number of students on the course) in order to develop the scheduled group-based activities.

9. Enrolment Procedure
Enrolment is not independent of the general enrolment process for the DS Master Programme. E1 is a mandatory unit of the I&E minor. All students on the DS Master Programme must take this course.

10. Recommended Reading, Course Material
Students will use the following types of educational material:
  1. Slides used in the lectures.
  2. References to some case studies.

Furthermore, students should have access to the following selected reading:
  1. Henry Chesbrough, Open Innovation: The New Imperative for Creating and Profiting from Technology, HBS Press, 2003.
  2. Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann, ISBN: 1558609016, 2006.
  3. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Pearson Addison Wesley, ISBN: 0321321367, 2005.
  4. Ian Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Morgan Kaufmann, ISBN: 0120884070, 2005.
  5. Ian Witten, Eibe Frank, Mark Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Morgan Kaufmann, ISBN: 978-0-12-374856-0, 2011.
  6. D. Keim, J. Kohlhammer, G. Ellis, F. Mansmann, Mastering the Information Age: Solving Problems with Visual Analytics, Eurographics Association, 2010.
  7. Tamara Munzner, Visualization Analysis and Design, A K Peters Visualization Series, CRC Press, Nov. 2014.

In successive academic years, individual papers prepared by E1 students shall also be available to other student cohorts.

11. Other Information (e.g. module home page)
The course description is available on the E1 web page at the DS web site. Students shall also have access to documents available at the ICT Labs Master School web site.