CISC 432/CMPE 432/CISC 832 Advanced Database Systems




Course Info
Instructor: Patrick Martin, Goodwin Hall 630, 613 533 6063, martin@cs.queensu.ca
Office Hours: Wednesday 11:00-1:00 or by appointment
Schedule: Slot 22 (Monday 3:30-4:30, Wednesday 2:30-3:30 and Thursday 4:30-5:30), Botterell Hall B129
Prerequisites: CISC 332* or permission of the instructor.

Assumed Background
Previous course in DBMS or equivalent experience
  Relational model, SQL, relational algebra, schema design
  File structures, indexes (tree and hash)

Reference Materials
Textbook: Database System Concepts (6th Edition) by A. Silberschatz, H. Korth and S. Sudarshan, McGraw-Hill.
Research papers

Learning Outcomes
  Apply optimization algorithms to SQL queries to produce efficient query plans.
  Apply concurrency control and recovery algorithms to sample transaction workloads to ensure ACID properties are maintained.
  Assess the use of relational DBMSs and NoSQL systems for different types of data and applications.
  Apply a NoSQL system to the creation of a sample database.
  Apply the MapReduce framework to a sample big data problem.
  Evaluate the use of a big data approach to a sample application area or problem.

Marking Schemes
Undergraduate students (432)
  4 assignments (60%). Due dates: Oct 10, Oct 24, Nov 14, Nov 28
  2 term tests (30%).
  Class participation (10%).

Marking Schemes (Cont.)
Graduate students (832)
  4 assignments (40%). Due dates: Oct 10, Oct 24, Nov 14, Nov 28
  2 term tests (20%).
  Class participation (10%).
  Term project (30%). Proposal due Oct 17, report due Dec 5.

Marking Schemes (Cont.)
Grading Method
In this course, some components will be graded using numerical percentage marks. Other components will receive letter grades, which for purposes of calculating your course average will be translated into numerical equivalents using the Faculty of Arts and Science approved scale (see below). Your course average will then be converted to a final letter grade according to Queen's Official Grade Conversion Scale.
Late Policy
Assignments should be handed in by 4:00 pm on the day they are due. Late assignments are subject to a 10% per day late penalty, with weekends counted as one day. Late assignments will not be accepted more than 5 days past the due date.

Class Participation Rubric
Frequency and Quality
  9-10 marks: Attends class regularly and always contributes to the discussion by raising thoughtful questions, analyzing relevant issues, building on others' ideas, synthesizing across readings and discussions, expanding the class perspective.
  7-8 marks: Attends class regularly and sometimes contributes to the discussion in the aforementioned ways.
  5-6 marks: Attends class regularly but rarely contributes to the discussion in the aforementioned ways.
  < 5 marks: Attends class regularly but never contributes to the discussion in the aforementioned ways.
Readings and Surveys
  9-10 marks: Reads all of the assigned readings, demonstrates understanding of the material, completes all required surveys.
  7-8 marks: Reads most of the assigned readings, demonstrates some understanding of the material, completes all required surveys.
  5-6 marks: Reads some of the assigned readings, completes all required surveys.
  < 5 marks: Reads some of the assigned readings, completes some required surveys.

Academic Integrity
Academic integrity is constituted by the five core fundamental values of honesty, trust, fairness, respect and responsibility (see www.academicintegrity.org). These values are central to the building, nurturing and sustaining of an academic community in which all members of the community will thrive. Adherence to the values expressed through academic integrity forms a foundation for the "freedom of inquiry and exchange of ideas" essential to the intellectual life of the University (see the Senate Report on Principles and Priorities).
Students are responsible for familiarizing themselves with the regulations concerning academic integrity and for ensuring that their assignments conform to the principles of academic integrity. Information on academic integrity is available in the Arts and Science Calendar (see Academic Regulation 1), on the Arts and Science website (see http://www.queensu.ca/artsci/sites/default/files/academic%20regulations.pdf), and from the instructor of this course.
Departures from academic integrity include plagiarism, use of unauthorized materials, facilitation, forgery and falsification, and are antithetical to the development of an academic community at Queen's. Given the seriousness of these matters, actions which contravene the regulation on academic integrity carry sanctions that can range from a warning or the loss of grades on an assignment to the failure of a course to a requirement to withdraw from the university.

Course Content
Big Data
  Structured data: RDBMS (distributed, parallel, column-oriented, multi-tenant), NewSQL
  Unstructured data: NoSQL (key-value, document, graph, column-oriented)
  Streaming data: DSMS
Focus issues: data model, query performance, consistency, scalability, availability

Course Schedule
Structured Data (weeks 1-7)
  Relational DBMSs: query processing, query optimization, transaction management, concurrency control, recovery, BI & data warehousing, column-oriented storage, database architectures, SaaS & multi-tenant DBs
  NewSQL systems

Course Outline (cont.)
Unstructured Data (weeks 8-11)
  NoSQL database systems, MapReduce technology for analytics
Streaming Data (week 12)
  Data stream management systems

Focus Issues
Data model: language and logical constructs that determine the logical structure of a database and consequently how data is stored, organized, and manipulated.
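To make the data-model definition concrete, here is a minimal sketch that stores the same facts under two models: a relational schema (using Python's built-in `sqlite3`) and a nested document (plain JSON). The table names, the student "Alice", and the helper functions are hypothetical and chosen only for illustration; they are not taken from the course material.

```python
import json
import sqlite3

# Relational model: flat tables with a fixed schema; related rows are
# connected through keys and recombined with a join.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE enrolment (student_id INTEGER, course TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO enrolment VALUES (?, ?)",
    [(1, "CISC 432"), (1, "CISC 332")],
)


def courses_relational(student_name):
    """Return the student's courses by joining the two tables."""
    rows = conn.execute(
        "SELECT e.course FROM student s "
        "JOIN enrolment e ON e.student_id = s.id "
        "WHERE s.name = ? ORDER BY e.course",
        (student_name,),
    )
    return [course for (course,) in rows]


# Document model: one self-describing, nested record per student;
# the course list is embedded, so no join is needed.
document = json.loads('{"name": "Alice", "courses": ["CISC 432", "CISC 332"]}')


def courses_document(doc):
    """Return the student's courses by reading the embedded list."""
    return sorted(doc["courses"])
```

Both functions return the same answer; what differs is the logical structure the data model imposes and therefore how the data is stored and queried.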

Focus Issues (cont.)
Query performance
  Efficiency of a DBMS, typically expressed with metrics like query response time and/or query throughput
Consistency
  Guarantee concerning the state of a data item when accessed
  E.g. ACID, BASE
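The two query-performance metrics named above can be sketched as follows. This is an illustrative measurement loop, not a benchmark methodology from the course; `measure` and `run_query` are hypothetical names, and `run_query` stands in for any callable that executes one query.

```python
import time


def measure(run_query, queries):
    """Return (mean response time in seconds, throughput in queries/second).

    Queries are executed one after another, so throughput here is the
    sequential rate; a real benchmark would also issue concurrent queries.
    """
    if not queries:
        raise ValueError("at least one query is required")
    latencies = []
    start = time.perf_counter()
    for query in queries:
        t0 = time.perf_counter()
        run_query(query)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    mean_response = sum(latencies) / len(latencies)
    throughput = len(queries) / elapsed
    return mean_response, throughput
```

Response time describes what a single user experiences, while throughput describes how much work the system completes per unit of time; the two can diverge once queries run concurrently.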

Focus Issues (cont.)
Scalability
  Ability of a system to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth
Availability
  Proportion of time that requests received by a system receive a response.
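As a small worked example of the availability definition above, the sketch below approximates availability over an observation window as the fraction of received requests that were answered. The function name and the counts in the usage note are hypothetical.

```python
def availability(requests_received, requests_answered):
    """Fraction of received requests that got a response (0.0 to 1.0)."""
    if requests_received <= 0:
        raise ValueError("requests_received must be positive")
    if not 0 <= requests_answered <= requests_received:
        raise ValueError("requests_answered must be between 0 and requests_received")
    return requests_answered / requests_received
```

For instance, a system that answers 999 of 1000 requests in the window has an availability of 999/1000, that is, 99.9%.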

What is Big Data?
7 definitions of big data (http://timoelliott.com/blog/2013/07/7-definitions-of-big-data-you-should-knowabout.html)
  3 V's: Volume, Velocity, Variety
  New scale-out technologies like Hadoop and NoSQL
  Data distinctions: generated from transactions versus interactions (e.g. click on a web page) and observations (e.g. sensors)

What is Big Data?
7 definitions (cont.)
  Signals: data is real-time and facilitates real-time business decisions
  Opportunity: analyzing data that was previously ignored because of technology limitations
  Metaphor: process of helping the planet grow a nervous system, in which we are just another sensor
  New term for old stuff

Related Buzzwords
Data analytics (Business Intelligence)
  Analytics is the process of obtaining an optimal and realistic decision based on existing data.
  Data analytics (DA) is the science of examining raw data with the purpose of drawing conclusions about that information.

Related Buzzwords
Cloud computing
  A networking solution in which everything from computing power to computing infrastructure, applications, and business processes to personal collaboration is delivered as a service wherever and whenever you need it.

Related Buzzwords
NoSQL (Not Only SQL)
  NoSQL encompasses a wide range of technologies and architectures, not based on the relational model, that seek to solve the scalability and big data performance issues of relational databases.
  E.g. Cassandra, BigTable, SimpleDB, MongoDB, Voldemort

Related Buzzwords
NewSQL
  Encompasses solutions aimed at bringing to the relational model the benefits of horizontal scalability and fault tolerance provided by NoSQL solutions
  E.g. Google Spanner, VoltDB