Cleveland State University
|
|
|
- Aileen Cannon
- 10 years ago
- Views:
Transcription
1 Cleveland State University CIS 612 Modern Database Processing & Big Data (3-0-3) Fall 2015 Section 50 Class Nbr Tues, Thu 4:30 5:45 PM Prerequisites: CIS 505 and CIS 530. CIS 611 Preferred. Instructor: Dr. Sunnie S. Chung Office Location: BU0112 Phone: Webpage: Office Time: Mon, Wed 2:00 3:30 PM and 5:45 6:45 PM (or by appointment) Class Location: BU 0112 Section 50 Tue & Thu 4:30-5:45 PM Catalog Description: Detailed study of modern database processing and non-relational database systems for big data processing. First, the course presents practical approaches to advanced database processing features and its applications. The course advances with study of semi-structured/non-structured database processing, JSON, XML data processing, XPATH, and XQuery. The course continues study of building distributed database and web service systems with web data processing for data analytics. The course extends study with architectures and features of Massively Parallel Processing (MPP) systems for big data processing Parallel Data Warehouse (PDW) with OLAP, Map Reduce with Hadoop, and Cloud Computing. It continues an exploration of applications and big data processing systems of NoSQLand NewSQ such as Google s Big Table, Hive, HBase, PigLatin, Mongo DB, VoltDB. Finally, the course will explore the most significant industry research trend for big data processing and data analytics. Key Concepts: Modern Database Programming, Web Data Processing, Semi-structured database processing, JASON, XML data processing, XPath, XQuery, Unstructured data processing, web data processing, Massively Parallel Processing (MPP) systems, Map Reduce with Hadoop, Cloud Computing, Parallel Data Warehouse (PDW), OLAP Cube, Big Data Processing, Google s Big Table, Hive, HBase, PigLatin, ACID, MongoDB, VoltDB, NoSQL, NewSQL, and Cloud Computing for Data Analytics. Expected Outcomes: Upon successful completion of the course, the student will be able to create modern database applications to process non-traditional data such as web data, semi-structured/unstructured data, - JASON, XML data, or plain text data. The student will be able to design data warehouse, create OLAP Cubes and write analytical data mining queries. More importantly, the student will obtain comprehensive knowledge and analytical skills to build an infrastructure for a big data processing system for data analytics and its applications by understandings the problems of big data processing and its well known approaches in the modern database systems. The student will further extend knowledge with the architectures and the features of modern parallel computing systems for big data processing PDW, Map Reduce and Hadoop, and Cloud Computing. The student will obtain hands on experiences in major NoSQL/NewSQL systems to create Map Reduce applications for big data processing systems and further exposed to the significant current database industry research trends for big data processing. List of Required Materials: Any RDBMS: Oracle Database 11g or higher - This is available at Microsoft SQL Server 2012, Microsoft Visual Studio 2012 or any higher Microsoft SQL Server Business Intelligence Data Analytic Tool 2012 OLAP Server and SQL Server Data Tool They are available at the Microsoft Academic Alliance program: Adventure Works 2012 Data Warehouse Database for SQL Server 2012 Instruction will be given in class
2 Applied Analytics Using SAS Enterprise Miner Several NoSQL/NewSQL Open Source Systems Installation/Set Up details will be given in class Text Book: 1. Lecture Notes will be available on the class webpage 2. "Fundamentals of Database Systems". Elmasri / Navathe. Ed. Addison/Wesley Pub Co. 6th Edition, (2011). 3. List of Selected Database Industry R&D Papers on Big Data Processing and Data Analytics will be given in class Supplement Text Book: 1. Data Mining with Microsoft SQL Server BI Data Tool (2008 or 2012), Jamie MacLennan, Bogdan Crivat, Publisher: Wiley; 1 Edition ISBN-10: ISBN-13: Edition: 1 2. Database Management Systems 3rd Edition by Raghu Ramakrishnan and Johannes Gehrke. Ed. McGraw-Hill. Available at Official Calendar Please consult the university web page at: Final exam: Tues Dec 8 4:00-6:00 PM Grading: The course grade is based on a student's overall performance through the entire Semester. The final grade is distributed among the following components: 1. Exams (Midterm & Final) 40% (15% Midterm, 25% Final) 2. Computer Labs 30% (about 3-4 Assignments) 3. 1 Project on Big data processing: 2 person group project (20%) 4. Research Topic Presentation : 10% A 94% + A: Outstanding (student's performance is genuinely excellent) A- 90% - 93% B+ 87% - 89% B 80% - 86% B: Very Good (student's performance is clearly commendable but not necessarily outstanding) B- 70% - 79% C <70% C: Good (student's performance meets every course requirement and is acceptable; not distinguished) D F < 60% D: Below Average (student's performance fails to meet course objectives and standards) F: Failure (student's performance is unacceptable) Examination Policy: Students are allowed to bring to the tests a summary page (standard letter size) with their own notes. During the exams: (1) the use of books, cell phones, calculators, or any electronic devices is prohibited, and (2) students must not share any materials. Make-Up Exam Policy: No makeup exams will be given unless notified and agreed to in advance. Requests will be considered only in case of exceptional demonstrated need. Homework Policy: The students are expected to attend all classes. The students are responsible for collecting the notes, handouts and any other course material distributed during the class period. All
3 assignments must be individually and independently completed and must represent the effort of the student turning in the assignment. Should two or more students turn in substantially the same solution or output, in the judgment of the instructor, the solution will be considered group effort. All involved in group effort homework will receive a zero grade for that assignment. A student turning in a group effort assignment more than once will automatically receive an F grade for the course. Late Assignment: All lab assignments are due at the beginning of class on the date specified. Laboratory Assignments handed in after the class has begun will be accepted with a 25% grade penalty for up to a week and then not accepted at all. All laboratory assignments must be completed. Failure to do so will lower your course grade one additional letter grade. Student Conduct: Students are expected to do their own work. Academic misconduct, student misconduct, cheating and plagiarism will not be tolerated. Violations will be subject to disciplinary action as specified in the CSU Student Conduct Code. A copy can be obtained on the web page at: or by contacting Valerie Hinton Hannah, Judicial Affairs Officer in the Department of Student Life (MC 106 [email protected] ). For more information consult the following web page CSU Judicial Affairs available at Course Schedule: The schedule of topics and their order of coverage is given below. The schedule and topics to be covered may vary depending upon the progress made. Week of Topic Reading 1,2 Review: Database Foundations DBMS Architecture, Overview of Query Processing and Execution Steps Complex Queries, Transaction, Advanced Topics in Views Big Data Elmasri. Chp. 8,10,11 Lecture Notes 2, 3, 4 Modern Database Programming: Database Triggers Embedded SQL, Dynamic SQL, PHP Stored Procedure, User Defined Function (UDF), User Defined Type (UDT), User Defined Aggregate (UDA), Table Function, CLR, LINQ,.NET Database Programmability and Extensibility in Microsoft SQL Server by José A. Blakeley, et al (Microsoft Corporation) in the proceedings of SIGMOD , 5, 6 Modern Databases: Enhanced Data Models for Advanced Applications Semi Structured and Unstructured Databases: XML Data Processing: - XML Schema, Syntax/Semantics, Protocol - XPath - XQuery Data Transformation from Semi Structure to Relation JSON Data Processing Lecture Notes, Elmasri, 10, 11 Listed Papers Lecture Notes, Elmasri 12, 19, 20, 21 Listed papers. Introduction to Information Retrieval and Web Data Processing Data Models for Unstructured Data Processing Bigtable: A Distributed Storage System for Structured Data, by
4 Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Google, Inc. in the Proceedings of OSDI Overview of Query Processing Techniques and Operators: Join, Projection, Selection, Group By, Aggregations 8 Data Warehouse and OLAP (Tentative for the limited time) - Decision Support Technology - On Line Analytical Processing - Star Schema - OLAP Aggregation Operators: Data Cube, Roll Up, Drill Down - Building BI Data Analytics using DW and OLAP - MDX, DMX Ramakrishnan 11, 12 Lecture Notes. Selected Papers J. Han Chap 1,2,3. Listed Papers, Lecture Notes An Overview of Data Warehousing and OLAP Technology by Surajit Chaudhuri (Microsoft) and Umeshwar Dayal (HP Labs), in the proceedings of IEEE 1995 Data Cube: A Relational Aggregation Operator Generalizing Group By, Cross Tab, and SubTotals by Jim Gray (Microsoft), et al, in the proceedings of IEEE Data Mining for Data Analytics (Tentative for the limited time) - Concept & Architecture - Support and Confidence - Association Rules - Data Mining Algorithms: o Associate Rule Mining o APRIORI Algorithm and Optimization o Frequent Pattern Tree o Decision Tree - Lift and Other Correlation Measures J Han Chap 6. Listed Papers, Lecture Notes 9, 10, 11, 12, 13, 14 - Integrating association rule mining with relational database systems: Alternatives and implications by S. Sarawagi, et al (IBM Almaden Research Center) in the proceedings of SIGMOD'98 Big Data Processing and Parallel Computing: Introduction of Big Data Massively Parallel Processing Systems: Google Map Reduce Tutorial Apache Hadoop File System Pig Latin on Apache Hadoop by Yahoo and Apache Data Warehouse HIVE with Hadoop by Facebook HBase Key Value Stores Map Reduce Join Algorithms Parallel Data Warehouse with OLAP Query Processing: Microsoft Extended PDW with Map Reduce and Hadoop : Oracle, Teradata Columnar Databases : SAP HANA Databases Extended PDW with Columnar Data Processing :Teradata, Microsoft MongoDB Cassandra VoltDB NoSQL vs NewSQL ACID Listed papers. Lecture Notes
5 - MapReduce: Simplified Data Processing on Large Clusters by Jeffrey Dean (Google) and Sanjay Ghemawat (Google) in the proceedings of OSDI Apathy Hadoop in White Papers by Apache, Yahoo - Pig Latin: A Not-So-Foreign Language for Data Processing, Christopher Olston, et al. (Yahoo! Research) in the proceedings of SIGMOD Data Warehousing and Analytics Infrastructure at Facebook. by Ashish Thusoo, et al. (Facebook) in the proceedings of SIGMOD Petabyte Scale Databases and Storage Systems Deployed at Facebook, Dhruba Borthakur, et al. in the proceedings of SIGMOD Fast Data in the Era of Big Data: Twitter s Real-Time Related Query Suggestion Architecture, Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, Jimmy Lin (Twitter, Inc), SIGMOD The Big Data Ecosystem at LinkedIn - Roshan Sumbaly, Jay Kreps, and Sam Shah (LinkedIn), SIGMOD Avatara: OLAP for Webscale Analytics Products Lili Wu Roshan Sumbaly Chris Riccomini Gordon Koo Hyung Jin Kim Jay, Kreps Sam Shah (LinkedIn), SIGMOD More to be given in class Cloud Computing - SQL Azure as a Self-Managing Database Service: Lessons Learned and Challenges Ahead by Kunal Mukerjee, et al (Microsoft) in the proceedings of IEEE Computer Society Technical Committee on Data Engineering , 16 Presentation of Significant Database Industry Research Papers on Big Data Processing: List of Selected Papers will be given in class. Selected Papers Technical Topics: Select one and prepare a 20 min talk on the subject. 1. Semistructured/Unstructured Data Processing using Structured Data Model 2. Data Warehousing and Analytics Infrastructure at Facebook 3. Parallel Computing for Big Data Processing: Google Map Reduce with Apache Hadoop Parallel Data Warehouse (PDW) 4. MapReduce: Simplified Data Processing on Large Clusters by Google 5. Lammal, Ralf. Google's MapReduce Programming Model Revisited. 6. Open Source Apache Hadoop 7. Pig Latin, Hbase, Hive 8. Map Reduce Join Algorithmes, 9. Data Partition Techniques 10. SQL vs NoSQL 11. Processing MR/Hadoop with PDW : Oracle, Teradata 12. Columnar Databases : SAP Hana 13. Information Retrieval: How Google Search Engine Works 14. Cloud Computing : Microsoft AZURE, Google Cloud. More New Topics to come here.
6 NOTE: The instructor reserves the right to retain, for pedagogical reasons, either the original or a copy of your work submitted either individually or as a group project for this class. Students' names will be deleted from any retained items.
Cleveland State University
Cleveland State University CIS 695 Big Data Processing and Data Analytics (3-0-3) 2016 Section 51 Class Nbr. 5493. Tues, Thur TBA Prerequisites: CIS 505 and CIS 530. CIS 612, CIS 660 Preferred. Instructor:
Cleveland State University
Cleveland State University CIS 612 Modern Database Programming & Big Data Processing (3-0-3) Fall 2014 Section 50 Class Nbr. 2670. Tues, Thur 4:00 5:15 PM Prerequisites: CIS 505 and CIS 530. CIS 611 Preferred.
An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov
An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research
Sunnie Chung. Cleveland State University
Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:
Data Warehouses and Business Intelligence ITP 487 (3 Units) Fall 2013. Objective
Data Warehouses and Business Intelligence ITP 487 (3 Units) Objective Fall 2013 While the increased capacity and availability of data gathering and storage systems have allowed enterprises to store more
Sunnie Chung. Cleveland State University
Sunnie Chung Cleveland State University They are very new technologies to Computer Science in rise of Web Service on Internet (IoT) They were fast developed and fast evolving Research and Developments
Big Data Technologies Compared June 2014
Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development
Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard
Hadoop and Relational base The Best of Both Worlds for Analytics Greg Battas Hewlett Packard The Evolution of Analytics Mainframe EDW Proprietary MPP Unix SMP MPP Appliance Hadoop? Questions Is Hadoop
USC Viterbi School of Engineering
USC Viterbi School of Engineering INF 551: Foundations of Data Management Units: 3 Term Day Time: Spring 2016 MW 8:30 9:50am (section 32411D) Location: GFS 116 Instructor: Wensheng Wu Office: GER 204 Office
Fast Data in the Era of Big Data: Twitter s Real-
Fast Data in the Era of Big Data: Twitter s Real- Time Related Query Suggestion Architecture Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, Jimmy Lin Presented by: Rania Ibrahim 1 AGENDA Motivation
CISC 432/CMPE 432/CISC 832 Advanced Database Systems
CISC 432/CMPE 432/CISC 832 Advanced Database Systems Course Info Instructor: Patrick Martin Goodwin Hall 630 613 533 6063 [email protected] Office Hours: Wednesday 11:00 1:00 or by appointment Schedule:
AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW
AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this
PELLISSIPPI STATE COMMUNITY COLLEGE MASTER SYLLABUS ADVANCED DATABASE MANAGEMENT SYSTEMS CSIT 2510
PELLISSIPPI STATE COMMUNITY COLLEGE MASTER SYLLABUS ADVANCED DATABASE MANAGEMENT SYSTEMS CSIT 2510 Class Hours: 2.0 Credit Hours: 3.0 Laboratory Hours: 2.0 Revised: Fall 2012 Catalog Course Description:
Objectives of Lecture 1. Class and Office Hours. Labs and TAs. CMPUT 391: Introduction. Introduction
Database Management Systems Winter 2004 CMPUT 391: Introduction Dr. Osmar R. Zaïane Objectives of Lecture 1 Introduction Get a rough initial idea about the content of the course: Lectures Resources Activities
Enhancing Massive Data Analytics with the Hadoop Ecosystem
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3, Issue 11 November, 2014 Page No. 9061-9065 Enhancing Massive Data Analytics with the Hadoop Ecosystem Misha
Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012
Big Data Buzzwords From A to Z By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords Big data is one of the, well, biggest trends in IT today, and it has spawned a whole new generation
Advanced Database Management MISM Course F14-95704 A Fall 2014
Advanced Database Management MISM Course F14-95704 A Fall 2014 Carnegie Mellon University Instructor: Randy Trzeciak Office: Software Engineering Institute / CERT CIC Office hours: By Appointment Phone:
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
Course Design Document. IS417: Data Warehousing and Business Analytics
Course Design Document IS417: Data Warehousing and Business Analytics Version 2.1 20 June 2009 IS417 Data Warehousing and Business Analytics Page 1 Table of Contents 1. Versions History... 3 2. Overview
SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford
SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to
Native Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
Application Development. A Paradigm Shift
Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the
Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.
Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!
Objectives of Lecture 1. Labs and TAs. Class and Office Hours. CMPUT 391: Introduction. Introduction
Database Management Systems Winter 2003 CMPUT 391: Introduction Dr. Osmar R. Zaïane Objectives of Lecture 1 Introduction Get a rough initial idea about the content of the course: Lectures Resources Activities
Can the Elephants Handle the NoSQL Onslaught?
Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented
INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & MANAGEMENT INFORMATION SYSTEM (IJITMIS)
INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & MANAGEMENT INFORMATION SYSTEM (IJITMIS) International Journal of Information Technology & Management Information System (IJITMIS), ISSN 0976 ISSN 0976
Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru
Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop?
An Approach to Implement Map Reduce with NoSQL Databases
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
Tap into Hadoop and Other No SQL Sources
Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data
Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores
Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...
INTRODUCTION TO CASSANDRA
INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open
Data Modeling for Big Data
Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes
Introduction to Big Data! with Apache Spark" UC#BERKELEY#
Introduction to Big Data! with Apache Spark" UC#BERKELEY# So What is Data Science?" Doing Data Science" Data Preparation" Roles" This Lecture" What is Data Science?" Data Science aims to derive knowledge!
CS 564: DATABASE MANAGEMENT SYSTEMS
Fall 2013 CS 564: DATABASE MANAGEMENT SYSTEMS 9/4/13 CS 564: Database Management Systems, Jignesh M. Patel 1 Teaching Staff Instructor: Jignesh Patel, [email protected] Office Hours: Mon, Wed 1:30-2:30
Big Data and Hadoop with components like Flume, Pig, Hive and Jaql
Abstract- Today data is increasing in volume, variety and velocity. To manage this data, we have to use databases with massively parallel software running on tens, hundreds, or more than thousands of servers.
SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse
SQL Server 2012 PDW Ryan Simpson Technical Solution Professional PDW Microsoft Microsoft SQL Server 2012 Parallel Data Warehouse Massively Parallel Processing Platform Delivers Big Data HDFS Delivers Scale
Integrating Big Data into the Computing Curricula
Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big
BUDT 758B-0501: Big Data Analytics (Fall 2015) Decisions, Operations & Information Technologies Robert H. Smith School of Business
BUDT 758B-0501: Big Data Analytics (Fall 2015) Decisions, Operations & Information Technologies Robert H. Smith School of Business Instructor: Kunpeng Zhang ([email protected]) Lecture-Discussions:
BIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.
What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees
Azure Data Lake Analytics
Azure Data Lake Analytics Compose and orchestrate data services at scale Fully managed service to support orchestration of data movement and processing Connect to relational or non-relational data
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 15
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 15 Big Data Management V (Big-data Analytics / Map-Reduce) Chapter 16 and 19: Abideboul et. Al. Demetris
Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12
Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using
Apriori-Map/Reduce Algorithm
Apriori-Map/Reduce Algorithm Jongwook Woo Computer Information Systems Department California State University Los Angeles, CA Abstract Map/Reduce algorithm has received highlights as cloud computing services
A Comparative Study on Operational Database, Data Warehouse and Hadoop File System T.Jalaja 1, M.Shailaja 2
RESEARCH ARTICLE A Comparative Study on Operational base, Warehouse Hadoop File System T.Jalaja 1, M.Shailaja 2 1,2 (Department of Computer Science, Osmania University/Vasavi College of Engineering, Hyderabad,
Big Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 36 Outline
Bringing Big Data to People
Bringing Big Data to People Microsoft s modern data platform SQL Server 2014 Analytics Platform System Microsoft Azure HDInsight Data Platform Everyone should have access to the data they need. Process
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
Big Data on Microsoft Platform
Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4
Lecture Data Warehouse Systems
Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores
Big Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data
WHITE PAPER. Four Key Pillars To A Big Data Management Solution
WHITE PAPER Four Key Pillars To A Big Data Management Solution EXECUTIVE SUMMARY... 4 1. Big Data: a Big Term... 4 EVOLVING BIG DATA USE CASES... 7 Recommendation Engines... 7 Marketing Campaign Analysis...
Syllabus for Course 1-02-326: Database Systems Engineering at Kinneret College
Syllabus for Course 1-02-326: Database Systems Engineering at Kinneret College Instructor: Michael J. May Semester 2 of 5769 1 Course Details The course meets 9:00am 11:00am on Wednesdays. The Targil for
Instructor: Michael J. May Semester 1 of 5775. The course meets 9:00am 11:00am on Sundays. The Targil for the course is 12:00pm 2:00pm on Sundays.
Syllabus for ISM 14-324: Database Systems Department of Software and Information Systems Achi Racov School of Engineering Kinneret College on the Sea of Galilee Instructor: Michael J. May Semester 1 of
Keywords Big Data Analytic Tools, Data Mining, Hadoop and MapReduce, HBase and Hive tools, User-Friendly tools.
Volume 3, Issue 8, August 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Significant Trends
Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress
Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth
MAKING BIG DATA COME ALIVE Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth Steve Gonzales, Principal Manager [email protected]
Workshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
Business Intelligence and Column-Oriented Databases
Page 12 of 344 Business Intelligence and Column-Oriented Databases Kornelije Rabuzin Faculty of Organization and Informatics University of Zagreb Pavlinska 2, 42000 [email protected] Nikola Modrušan
BUSINESS INTELLIGENCE AND NOSQL DATABASES
INFORMATION SYSTEMS IN MANAGEMENT Information Systems in Management (2012) Vol. 1 (1) 25 37 BUSINESS INTELLIGENCE AND NOSQL DATABASES JERZY DUDA Department of Applied Computer Science, Faculty of Management,
American International Journal of Research in Science, Technology, Engineering & Mathematics
American International Journal of Research in Science, Technology, Engineering & Mathematics Available online at http://www.iasir.net ISSN (Print): 2328-3491, ISSN (Online): 2328-3580, ISSN (CD-ROM): 2328-3629
CSE 427 CLOUD COMPUTING WITH BIG DATA APPLICATIONS
CSE 427 CLOUD COMPUTING WITH BIG DATA APPLICATIONS COURSE OVERVIEW & STRUCTURE Fall 2015 Marion Neumann ABOUT Marion Neumann email: m dot neumann at wustl dot edu office: Jolley Hall 403 office hours:
Step by Step: Big Data Technology. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015
Step by Step: Big Data Technology Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015 Data Sources IT Infrastructure Analytics 2 B y 2015, 20% of Global 1000 organizations
Data Warehouse design
Data Warehouse design Design of Enterprise Systems University of Pavia 10/12/2013 2h for the first; 2h for hadoop - 1- Table of Contents Big Data Overview Big Data DW & BI Big Data Market Hadoop & Mahout
Big Data and Apache Hadoop s MapReduce
Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23
ANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
Big Data: Tools and Technologies in Big Data
Big Data: Tools and Technologies in Big Data Jaskaran Singh Student Lovely Professional University, Punjab Varun Singla Assistant Professor Lovely Professional University, Punjab ABSTRACT Big data can
Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy
Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI
SENG 520, Experience with a high-level programming language. (304) 579-7726, [email protected]
Course : Semester : Course Format And Credit hours : Prerequisites : Data Warehousing and Business Intelligence Summer (Odd Years) online 3 hr Credit SENG 520, Experience with a high-level programming
Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?
Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time? Kai Wähner [email protected] @KaiWaehner www.kai-waehner.de Disclaimer! These opinions are my own and do not necessarily
Big Data With Hadoop
With Saurabh Singh [email protected] The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
BIRT in the World of Big Data
BIRT in the World of Big Data David Rosenbacher VP Sales Engineering Actuate Corporation 2013 Actuate Customer Days Today s Agenda and Goals Introduction to Big Data Compare with Regular Data Common Approaches
Search and Real-Time Analytics on Big Data
Search and Real-Time Analytics on Big Data Sewook Wee, Ryan Tabora, Jason Rutherglen Accenture & Think Big Analytics Strata New York October, 2012 Big Data: data becomes your core asset. It realizes its
#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld
Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case
Testing 3Vs (Volume, Variety and Velocity) of Big Data
Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used
Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN
Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current
SQL Server 2012 Business Intelligence Boot Camp
SQL Server 2012 Business Intelligence Boot Camp Length: 5 Days Technology: Microsoft SQL Server 2012 Delivery Method: Instructor-led (classroom) About this Course Data warehousing is a solution organizations
TOP 8 TRENDS FOR 2016 BIG DATA
The year 2015 was an important one in the world of big data. What used to be hype became the norm as more businesses realized that data, in all forms and sizes, is critical to making the best possible
CSE 412/598 Database Management Spring 2012 Semester Syllabus http://my.asu.edu/
PROFESSOR: Dr. Hasan Davulcu OFFICE: BY 564 CSE 412/598 Database Management Spring 2012 Semester Syllabus http://my.asu.edu/ OFFICE HOURS: TBD Students should make every effort to utilize the scheduled
Cost-Effective Business Intelligence with Red Hat and Open Source
Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK OVERVIEW ON BIG DATA SYSTEMATIC TOOLS MR. SACHIN D. CHAVHAN 1, PROF. S. A. BHURA
Big Data solutions to support Intelligent Systems and Applications
Big solutions to support Intelligent Systems and Applications Luciana Lima, Filipe Portela, Manuel Filipe Santos, António Abelha and José Machado. Abstract in the last years the number of data available
P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland
P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland IBM Center of Excellence for Data Science, Cognitive
Big data for the Masses The Unique Challenge of Big Data Integration
Big data for the Masses The Unique Challenge of Big Data Integration White Paper Table of contents Executive Summary... 4 1. Big Data: a Big Term... 4 1.1. The Big Data... 4 1.2. The Big Technology...
LEARNING SOLUTIONS website milner.com/learning email [email protected] phone 800 875 5042
Course 20467A: Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Length: 5 Days Published: December 21, 2012 Language(s): English Audience(s): IT Professionals Overview Level: 300
Big Data Explained. An introduction to Big Data Science.
Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of
CS 649 Database Management Systems. Fall 2011
SCHOOL OF BUSINESS, PUBLIC ADMINISTRATION AND INFORMATION SCIENCES LONG ISLAND UNIVERSITY, BROOKLYN CAMPUS DEPARTMENT OF COMPUTER SCIENCE CS 649 Database Management Systems Fall 2011 Course Schedule: Thursday
www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage
www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage If every image made and every word written from the earliest stirring of civilization
This Symposium brought to you by www.ttcus.com
This Symposium brought to you by www.ttcus.com Linkedin/Group: Technology Training Corporation @Techtrain Technology Training Corporation www.ttcus.com Big Data Analytics as a Service (BDAaaS) Big Data
W H I T E P A P E R. Building your Big Data analytics strategy: Block-by-Block! Abstract
W H I T E P A P E R Building your Big Data analytics strategy: Block-by-Block! Abstract In this white paper, Impetus discusses how you can handle Big Data problems. It talks about how analytics on Big
Big Data Scenario mit Power BI vs. SAP HANA Gerhard Brückl
Big Data Scenario mit Power BI vs. SAP HANA Gerhard Brückl About me Gerhard Brückl Working with Microsoft BI since 2006 Started working with SAP HANA in 2013 focused on Analytics and Reporting Blog: email:
